Machine Learning for optimization

In recent years, research on integrating Machine Learning (ML) into the design of efficient and robust metaheuristics
has become very popular. As former scientists and academics at Solvice, we believe in keeping up with the state of the
art. In [1], the authors distinguish between three classes of ML methods for improving metaheuristics:

  1. Problem-level data-driven metaheuristics: ML can help in modeling the optimization problem to solve (e.g. the
    objective function and constraints). It can also assist in landscape analysis and in decomposing the problem.
  2. Low-level data-driven metaheuristics: a metaheuristic is composed of different search components. ML can drive
    any search component, such as the initialization of solutions (i.e. the construction heuristics) and the search
    variation operators (e.g. neighborhoods in local search, mutation and crossover in evolutionary algorithms). It may
    also be used to tune the various parameters of a metaheuristic.
  3. High-level data-driven metaheuristics: this class concerns the selection and generation of metaheuristics, and
    the design of hybrid and parallel cooperative metaheuristics.

On top of that distinction, we can differentiate between offline and online learning. In offline data-driven
metaheuristics, the ML process occurs a priori, before starting to solve the problem. In online data-driven
metaheuristics, ML gathers knowledge during the search, while solving the problem.

Towards robust solvers

At Solvice, we build APIs around solvers that tackle different challenging planning and optimization problems. These
APIs are used by many different companies, ranging from logistics providers, field service providers, warehouses, and
retail stores to independent software vendors and software-as-a-service providers for planning systems. Our solver
platform never “knows in advance” what the next request will be, so we are dealing with dynamic inputs of all sizes and
forms. That is why our solvers need to be robust, in addition to being efficient and fast. What makes a solver robust?
Two levels define this:

  • Robustness on problem size: solve time should scale roughly linearly on instances that vary linearly in size,
    while solution quality is maintained.
  • Robustness on problem variation: for the same problem size, the “flavor” of the problem can differ greatly because
    of different parameters.

Both robustness criteria should hold for any optimization service, but are rarely achieved.

One way to achieve this is to continuously examine every optimization request that enters the system and to test
offline for the optimal parameterization, so that we are always aware of the best parameter combination.

This approach falls under the second category, “low-level data-driven metaheuristics”, as we are optimizing the
hyperparameters and predicting the ideal hyperparameter set for any new solve request.
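
To make this concrete, below is a minimal sketch of such an offline parameter sweep. The parameter names, the `solve`
function, and the `objective_value` attribute are hypothetical stand-ins for our internal solver API, not its actual
interface.

```python
import itertools
import time

# Hypothetical hyperparameter grid; the names are illustrative only.
PARAM_GRID = {
    "construction_heuristic": ["greedy_ff", "knn"],
    "local_search": ["tabu", "late_acceptance"],
    "time_limit_s": [5, 30],
}

def sweep(instance, solve):
    """Run every parameter combination on one instance and record its performance."""
    results = []
    for combo in itertools.product(*PARAM_GRID.values()):
        params = dict(zip(PARAM_GRID.keys(), combo))
        start = time.perf_counter()
        solution = solve(instance, **params)          # hypothetical solver call
        results.append({
            "params": params,
            "solve_time_s": time.perf_counter() - start,
            "quality": solution.objective_value,      # assumed attribute; lower is better
        })
    return results
```

The best-performing combination per instance is stored, so the platform stays aware of the strongest parameterization
as new kinds of requests come in.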

Algorithm selection

Algorithm selection is not new. As early as 1976, researchers proposed a framework for the so-called algorithm
selection problem (ASP), which predicts which algorithm from a portfolio is likely to perform best, based on measurable
features of a collection of problem instances. The model has four essential components (Figure 1):

  • The problem space P represents the set of instances of a problem;
  • The feature space F contains measurable characteristics of the instances generated by a computational feature
    extraction process applied to P;
  • The algorithm space A is the set (portfolio) of all considered algorithms for tackling the problem;
  • The performance space Y represents the mapping of each algorithm's result on a given problem instance to a vector
    of performance metrics (e.g. running time and solution quality).

Figure 1. Algorithm Selection Problem

However, that research provides no guidance on the mapping from the problem instance space P to the feature space F.
In our experience, this is by far the most challenging task in this effort.
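
To illustrate how the four components fit together, here is a small sketch in Python. The type names and function
signatures are our own illustration, not part of the original framework.

```python
from typing import Callable, Sequence

FeatureVector = Sequence[float]          # an element of the feature space F
Algorithm = Callable[[object], object]   # an element of the algorithm space A

def select_and_run(
    instance: object,                                     # an element of P
    extract_features: Callable[[object], FeatureVector],  # the mapping P -> F
    selector: Callable[[FeatureVector], str],             # the learned selection mapping
    portfolio: dict[str, Algorithm],                      # the portfolio A
):
    """Run the algorithm that the selector predicts will score best in Y."""
    features = extract_features(instance)
    return portfolio[selector(features)](instance)
```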

Generating solver data

Next to user- and API-generated solve requests, we maintain a library of problem instances whose goal is to cover all
possible flavors of an optimization problem that our solver platform can handle. Generating instances that cover most
of that problem space is the real challenge.

For instance, the Vehicle Routing Problem (VRP) comes in plenty of flavors, such as capacitated, time-windowed, pickup
and delivery, multi-period, and many others. So how can we automatically generate all possible flavors of a certain
problem type? By introducing a feature set that can define a specific flavor.

For the sake of simplicity, we’ll introduce our problem instance generator only for the VRP problem type that is
covered by our OnRoute API. How can we define instances by a feature set? Most research bases instance generation on
structural characteristics of the problem. We suggest using constraint definitions as features for identification.
Examples are:

  • Time window constraint (ctTW): the arrival time for any order should fall within the time window defined by a
    start time and an end time.
  • Shift constraint (ctS): shifts for drivers/vehicles are defined by their shift start and shift end. These shifts
    form the supply, as opposed to the demand given by the actual orders: their service durations and relative drive
    times.
  • Type requirement constraint (ctT): type or skill requirements force some orders to be executed by vehicles that
    hold the required skill.
  • Capacity constraint (ctC): orders carry a certain load, and vehicles have a capacity that defines the maximum load
    in one route.

For every constraint, we can define a level of constrainedness between 0% and 100%. For instance, the broader the time
window of an order, the more flexibility the solver has to plan that order. The purpose of a feature’s constrainedness
level is not to generate instances that are harder to solve, but merely to generate different instances.
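
As a sketch of what this could look like for the time window feature, the generator below maps a ctTW level in [0, 1]
to the width of a randomly placed time window. The planning horizon and minimum width are illustrative assumptions.

```python
import random

HORIZON_MIN = 8 * 60   # assumed planning horizon of 8 hours, in minutes
MIN_WIDTH = 30         # assumed narrowest allowed window, in minutes

def generate_time_window(ct_tw: float) -> tuple[int, int]:
    """Generate an order time window whose tightness follows ct_tw in [0, 1].

    ct_tw = 0.0 -> the window spans the whole horizon (unconstrained);
    ct_tw = 1.0 -> the window shrinks to the minimum width (fully constrained).
    """
    width = int(MIN_WIDTH + (1.0 - ct_tw) * (HORIZON_MIN - MIN_WIDTH))
    start = random.randint(0, HORIZON_MIN - width)
    return start, start + width

# A flavor is then simply a vector of constrainedness levels, e.g.:
flavor = {"ctTW": 0.8, "ctS": 0.3, "ctT": 0.0, "ctC": 0.5}
```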

Influence of features on algorithm performance

Sure, ML can offer us better suggestions for the ideal algorithm, but how large is that benefit?

We have seen that problem instance size is one of the key discriminators in a feature set definition. Let’s take a
closer look at what a typical benchmarking strategy looks like when comparing different construction heuristics within
a metaheuristic. For VRP, greedy First Fit (FF) assignment heuristics are usually very fast and performant on small
instances of up to a few hundred orders/locations. On larger instances, however, an alternative k-Nearest Neighbor
(k-NN) construction greatly improves both computation time and solution quality.
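
For intuition, here is a minimal sketch of the nearest-neighbor idea behind such a construction heuristic, on plain
coordinates. A real implementation would use drive-time matrices and respect capacity and time-window constraints; a
first-fit heuristic would instead append each order to the first route where it fits.

```python
import math

def nearest_neighbor_route(depot, orders):
    """Build one route by repeatedly visiting the nearest unplanned order."""
    route, current, remaining = [], depot, list(orders)
    while remaining:
        nearest = min(remaining, key=lambda o: math.dist(current, o))  # closest order
        route.append(nearest)
        remaining.remove(nearest)
        current = nearest
    return route
```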

In Figure 2 we compare these two construction heuristics, greedy FF and k-NN, on the quality of the solution they
produce, expressed as a percentage deviation from the final solution quality. Clearly, for large instances, k-NN
outperforms greedy FF, with almost double the performance.


Figure 2: Influence of a feature (size) on algorithm performance (% of optimized solution)

Real-time learning and predicting

To obtain an ML layer that suggests the ideal algorithm (here: hyperparameter combination) for an instance, we need to:

  1. Build a problem instance dataset
  2. Create an algorithm set
  3. Calculate the performance for every instance and algorithm combination (full factorial)
  4. Select the best performing algorithm
  5. Feed the training set to an ML algorithm and build the model
  6. Deploy the ML model
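
Steps 1 through 5 condense into a loop like the one below. `generate_instances`, `solve`, `extract_features`, and
`objective_value` are hypothetical stand-ins for our internal tooling.

```python
def build_training_set(generate_instances, algorithms, solve, extract_features):
    """Full-factorial benchmark: label each instance with its best algorithm."""
    X, y = [], []
    for instance in generate_instances():               # step 1: instance dataset
        best_algo, best_quality = None, float("inf")
        for name, config in algorithms.items():         # steps 2-3: algorithm set, full factorial
            quality = solve(instance, config).objective_value
            if quality < best_quality:                   # step 4: keep the best performer
                best_algo, best_quality = name, quality
        X.append(extract_features(instance))             # features identify the instance
        y.append(best_algo)                              # label = best algorithm
    return X, y                                          # step 5: training data for the ML model
```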

Our input variables are all categorical, and the goal is to predict the ideal algorithm configuration, which is a
vector of categorical variables as well. Clearly, this is a case of multi-class, multi-output classification, for
which we can use well-known algorithms such as multinomial logistic regression.
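
As a minimal sketch, assuming scikit-learn and the training set from the previous snippet, this could look as follows:
one-hot encoding turns the categorical features into numeric inputs, and a multi-output wrapper fits one multinomial
classifier per hyperparameter to predict.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# X_train, y_train: from build_training_set in the previous sketch;
# y_train holds one column per hyperparameter to predict.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),                    # categorical -> binary columns
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),  # one multinomial model per output
)
model.fit(X_train, y_train)
suggested_config = model.predict([extract_features(new_instance)])[0]
```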

Figure 3 shows how this ML classifier can suggest the ideal algorithm configuration in real time for a new instance,
based on the feature set that identifies it. The correct solver is then chosen to process the solve request.
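
In code, the flow of Figure 3 might look like the snippet below, with `extract_features`, `model`, and `solve` as
defined in the earlier sketches.

```python
def handle_solve_request(request):
    """Classify the incoming instance online, then solve with the suggested configuration."""
    features = extract_features(request.instance)  # online feature extraction
    config = model.predict([features])[0]          # suggested algorithm configuration
    return solve(request.instance, config)         # dispatch to the solver
```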


Figure 3: Solver API using an ML classifier to suggest the ideal algorithm online.

Future roadmap

Enriching our Solver API with an ML layer that predicts the optimal algorithm configuration allows us to learn from
our customers’ requests and to continuously improve both the solution quality and the performance of the solver.

In the future, we will improve our problem feature set definition to cover more of the potential problem space and
thus predict new instances better. Secondly, by improving our ML algorithm, we aim for higher prediction accuracy.

Stay tuned!