During the last years, research on integrating Machine Learning (ML) into designing efficient and robust meta-heuristics became very popular. As former scientists and academics at Solvice, we believe in keeping up with the state-of-the-art. In  the authors make a distinction between 3 classes of ML methods for improving metaheuristics:
- Problem-level data-driven metaheuristics: ML can help in modeling the optimization problem to solve (e.g. objective function, constraints). It can also assist in landscape analysis and the decomposition of the problem.
- Low-level data-driven metaheuristics: a metaheuristic is composed of different search components. ML can drive any search component such as the initialization of solution(s) (i.e. the construction heuristics), and the search variation operators (e.g. neighborhoods in local search, mutation and crossover in evolutionary algorithms). It may also be used to tune the various parameters of a metaheuristic.
- High-level data-driven metaheuristics: this class of data-driven metaheuristics concerns the selection and generation of metaheuristics, and the design of hybrid and parallel cooperative metaheuristics.
On top of that distinction, we can generalize between offline and online learning time. In offline data-driven metaheuristics, the ML process occurs a priori before starting to solve the problem. In online data driven-metaheuristics, ML gathers knowledge during the search while solving the problem.
At Solvice, we build APIs around solvers that solve different challenging planning and optimisation problems. These APIs are being used by many different companies ranging from logistic providers, field service providers, warehouses, retail stores, to independent software vendors and software-as-a-service providers for planning systems. Our solver platform never “knows in advance” what the next request will be, so we are dealing with dynamic inputs of all sizes and forms. That is why our solvers need to be robust, next to being efficient and fast. How can a solver be robust? There are two levels that define this:
- Robustness on problem size: we should expect linear solve time differences on instances that vary linearly in size while holding quality.
- Robustness on problem variation: for the same problem size, the “flavor” of the problem could differ greatly because of different parameters
Both robustness criteria should hold for any optimization service, but are rarely achieved.
One way to achieve this is by continuously examining every optimization request that enters the system and offline testing for optimal parameterization in order to be always aware of the best parameter combination.
This approach typically falls under the 2nd category of “Low-level data-driven metaheuristic” as we are optimizing the hyperparameters and predicting the ideal hyperparameter set for any new solve request.
Algorithm selection is not new. As early as 1976, researchers have proposed a framework for the so-called algorithm selection problem (ASP), which predicts which algorithm from a portfolio is likely to perform best based on measurable features from a collection of problem instances. There are the four essential components of that model (Fig 1):
- The problem space P represents the set of instances of a problem;
- The feature space F contains measurable characteristics of the instances generated by a computational feature extraction process applied to P;
- The algorithm space A is the set (portfolio) of all considered algorithms for tackling the problem;
- The performance space Y represents the mapping of each algorithm result for a given problem instance to a vector of performance metrics (e.g. running time, solution quality, etc.).
However, that research provides no advice about the mapping from the problem instance space P to the feature space F. To our understanding, this is by far the most challenging task in this study.
Next to user and API generated solve requests, there is a library of problem instances whose goal is to cover all possible flavors of an optimization problem that our solver platform can handle. So generating instances for that problem space that should cover most of that space is the real challenge.
For instance, for a VRP (Vehicle Routing Problem) problem there are plenty of flavors such as capacitated, time windowed, pickup and deliveries, multi period and many others. So how can we automatically generate all possible flavors of a certain problem type? By introducing a feature set that can define a specific flavor.
For the sake of simplicity, we’ll introduce our problem instance generator only for the VRP problems type that is covered by our OnRoute API. How can we define instances by a feature set? Most research defines instance generation based upon structural characteristics of the problem. We suggest using constraint definitions as features for identification. Examples are:
- Time Window constraint (
ctTW): arrival time for any order should happen within the time window defined by a start time and an end time.
- Shift constraint (
ctS): shifts for drivers/vehicles are defined by their shift start and shift end definitions. These shifts form the supply as opposed to the demand given by the actual orders: their service durations and relative drive times.
- Type requirement constraint (
ctT): type or skill requirements are constraints that enforce some orders to be executed by vehicles that hold that skill.
- Capacity constraint (
ctC): orders can incur a certain load. Vehicles hold a capacity that defines the maximum level of load in one route.
For every constraint, we can define the level of constraintness between 0% and 100%. For instance, the broader the time window for an order, the more flexibility the solver has to plan this order. The definition of a feature’s constraintness level is not to generate instances that are harder to solve, merely to generate different instances.
Sure, ML can offer us improved suggestions of the ideal algorithm, but what is the level of that benefit?
We have seen that problem instance size is one of the key discriminators regarding a feature set definition. Let’s take a closer look at what a very typical benchmarking strategy would look like when it comes to comparing different construction heuristics within a metaheuristic. For VRP’s greedy First Fit (FF) assignment, heuristics are usually very fast and performant for small instances up to a few hundreds of orders/locations. However, when calculating larger sized instances, an alternative k-Nearest Neighbor (k-NN) algorithm greatly improves both computation time as well as solution quality.
In figure 2 we compare these two algorithms Greedy FF and k-NN based on their solution quality compared to the final solution quality as a percentage deviation. Clearly, for large instances, k-NN outperforms greedy FF with almost double the performance.
Real-time learning and predicting
To obtain an ML layer that suggests the ideal algorithm (here: hyper parameter combination) for an instance, we need to:
- Build a problem instance dataset
- Create an algorithm set
- Calculate the performance for every instance and algorithm combination (full factorial)
- Select the best performing algorithm
- Feed training set to an ML algorithm and build the model
- Deploy the ML model
Our input variables are all categorical and the goal is to predict the ideal algorithm configuration which is a vector of categorical variables as well. Clearly, this is the case of multi-class multi-level classification for which we can use well-known algorithms such as multinomial logistic regression.
Figure 3 shows that this ML classifier can suggest ideal algorithm configuration in real-time for a new instance based upon the feature set that identifies this instance. Then, the correct solver is chosen to process this solve request.
Enriching our Solver API with a ML layer to predict the optimal algorithm configuration set allows us to learn from our customer’s requests and greatly improve the solution quality as well as performance of the solver continuously.
In the future, we will improve upon our problem feature set definition to encompass more of the potential space of problems and thus be able to predict new instances better. Secondly, by improving our ML algorithm we aim to have higher accuracy in the prediction.
Updated over 2 years ago