Optimization meets Machine Learning

Marco Lübbecke
Lehrstuhl für Operations Research, RWTH Aachen University
@mluebbecke

OR 2017 · Berlin · September 6, 2017

Optimization is Everywhere

@mluebbecke · Optimization meets Machine Learning · 2/30

Machine Learning is Everywhere


Literally Everyone speaks about Machine Learning


New INFORMS Journal on Optimization

“One of the largest opportunities of the field of optimization is to embrace data in a protagonist role and combine it with machine learning.” (Dimitris Bertsimas, Editorial Statement)

My vision of the future for [...] optimization



What is Machine Learning?

Machine Learning
- Supervised Learning: Classification, Regression
- Unsupervised Learning: Clustering

source: www.mathworks.com

Supervised Learning: Classification

- data $X$, $d$ features, labels $Y$: samples $(x_i, y_i)$
- feature map $\varphi : X \to \mathbb{R}^d$ yields $(\varphi(x_i), y_i)$
- a black box “learns” $f : \mathbb{R}^d \to Y$ such that $\mathrm{error}(f(\varphi(x_i)), y_i)$ is “small” over all $x_i \in X$
- this is an optimization problem
- validate
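To make the "learning is an optimization problem" point concrete, here is a minimal sketch. Everything in it is an assumption chosen for illustration and does not come from the talk: the toy data, the feature map `phi` (string length, so $d = 1$), and the choice of a one-threshold classifier as the "black box". The learner picks the threshold that minimizes the number of training errors, then checks the result on held-out samples.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (raw item x_i, label y_i in {0, 1})


def phi(x: str) -> List[float]:
    """Hypothetical feature map phi: X -> R^d (here d = 1: the string length)."""
    return [float(len(x))]


def count_errors(f: Callable[[List[float]], int], data: List[Sample]) -> int:
    """Sum of 0/1 errors of f over the given samples."""
    return sum(1 for x, y in data if f(phi(x)) != y)


def learn_threshold(data: List[Sample]) -> Callable[[List[float]], int]:
    """Choose t minimizing the training error of f(v) = 1 if v[0] > t else 0."""
    values = sorted({phi(x)[0] for x, _ in data})
    candidates = [values[0] - 1.0] + values  # also allow "everything is class 1"
    best_t = min(
        candidates,
        key=lambda t: count_errors(lambda v: int(v[0] > t), data),
    )
    return lambda v: int(v[0] > best_t)


# Made-up training and validation samples (short strings are class 0).
train = [("ab", 0), ("abc", 0), ("abcdef", 1), ("abcdefg", 1)]
validation = [("a", 0), ("abcdefgh", 1)]

f = learn_threshold(train)
print("training errors:  ", count_errors(f, train))
print("validation errors:", count_errors(f, validation))
```

The exhaustive search over thresholds is only feasible because the model is tiny; richer model classes need continuous or discrete optimization methods.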

Binary Classification: Dog or Muffin? Owl or Apple?


Supervised Learning: Regression

- $y = mx + b$
- “from $\{0, 1\}$ to $[0, 1]$”
- black box: $\min \sum_i \varepsilon_i^2$ with $y_i = m \cdot x_i + b + \varepsilon_i$, $x_i \in X$

(plot: axes $x$ and $y$)
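For the one-dimensional model $y = mx + b$, the least-squares problem $\min \sum_i \varepsilon_i^2$ has a well-known closed-form solution. The sketch below computes it in plain Python. The data points are made up for illustration, and the function names are not from the talk.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (m, b) minimizing sum_i (y_i - (m * x_i + b))^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx  # assumes the x values are not all identical
    b = mean_y - m * mean_x
    return m, b


def sum_squared_residuals(m: float, b: float,
                          xs: List[float], ys: List[float]) -> float:
    """Objective value: the sum of squared residuals epsilon_i^2."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up noisy observations, roughly along y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

m, b = fit_line(xs, ys)
print("slope:", m, "intercept:", b)
print("objective:", sum_squared_residuals(m, b, xs, ys))
```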

Unsupervised Learning: Clustering

- feature map $\varphi : X \to \mathbb{R}^d$
- a black box “learns” $f : \mathbb{R}^d \to Y$ such that all $x \in X$ with $f(\varphi(x)) = y$ are “similar”

(plot: points in the $x_1$-$x_2$ plane, grouped into clusters $y_1$, $y_2$, $y_3$)
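One common way to fill the clustering "black box" is k-means (Lloyd's algorithm). The talk does not name a specific method, so this is a swapped-in example rather than the author's choice. The sketch takes "similar" to mean a small squared Euclidean distance to the cluster center; the points and initial centers are made up.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points: List[Point], centers: List[Point],
            max_iterations: int = 20) -> List[int]:
    """Return a cluster label for every point, starting from the given centers."""
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    return labels


# Two made-up, well-separated groups in the x1-x2 plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = k_means(points, centers=[points[0], points[3]])
print(labels)
```

k-means only finds a local optimum of its objective, which is one reason why discrete optimization methods are of interest for such problems.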

Optimization naturally appears in ML



“Optimization lies at the heart of ML. Most ML problems reduce to optimization problems.” (Bennett, Parrado-Hernández, 2006)

- minimize, e.g., prediction error
- continuous, convex optimization
- discrete, integer optimization



Example: A MIP in a black box: Classification Trees

- optimal classification trees, Bertsimas & Dunn (2017)
- use few nodes, shallow depth → formulated as MIP
- improves accuracy over classical CART method by 0.5–2%

source: www.edureka.co/blog/decision-trees

Many Opportunities for Discrete Optimization in ML

- within black boxes to capture combinatorial explosion
  → see also Andrea’s plenary on Friday
- feature selection
- outlier detection
- parameter tuning
- ...

How about the converse Direction?

- “emulating the expert”
- observe a decision maker
- learn their objective function
- online learning

Bärmann, Pokutta & Schneider (2017)

$\max \{ c_{\mathrm{true}}^{\top} x : x \in X(p) \}$, given observations $(p_t, x_t^*)_{t=1,\dots,T}$

ML may help improve Optimization Algorithms

- e.g., branching in B&B
- full strong branching gives locally perfect information
- predict the strong branching score of a variable
- features describe the state of a variable
- supervised learning: regression
  → promising proof of concept, Marcos Alvarez, Louveaux & Wehenkel (2017)
- survey on ML in branching/searching, Lodi & Zarpellon (2017)

A Progress Bar for Branch-and-Bound?

- predict the runtime of branch-and-bound algorithms, Hutter, Xu, Hoos & Leyton-Brown (2014)
  (plot: predicted vs. actual runtime, CPLEX 12.1 on 1510 publicly available MIPs)
- very preliminary experiments with Gurobi 7.5
- predict elapsed runtime percentage, Kruber, L, Obeloer genannt Bregenhorn (2017)

Learning when to solve a MIP by Branch-and-Price

- our MIP solver GCG detects many potential DW reformulations
- input (diagram): MIP + ... and CPU time ...
- $\varphi$: 100+ features (#conss, #vars, %constype, %vartype, #blocks, ...)
- k-NN learns a binary classifier $f$: “run SCIP or GCG?”

                 f: SCIP    f: GCG
  faster: SCIP    69.5%      9.9%
  faster: GCG      6.9%     13.7%

Kruber, L, Parmentier (2017)
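A k-nearest-neighbor classifier of the kind named on this slide can be sketched in a few lines. This is not the authors' implementation. The three-dimensional feature vectors (imagine scaled values of #conss, #vars, #blocks) and their labels are invented purely for illustration and carry no experimental meaning.

```python
from collections import Counter
from typing import List, Sequence, Tuple

LabeledInstance = Tuple[Sequence[float], str]  # (feature vector phi(MIP), label)


def knn_predict(train: List[LabeledInstance], query: Sequence[float],
                k: int = 3) -> str:
    """Majority vote among the k training instances closest to the query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: which solver was faster on each instance.
train = [
    ((0.10, 0.20, 0.00), "SCIP"),
    ((0.20, 0.10, 0.10), "SCIP"),
    ((0.15, 0.25, 0.00), "SCIP"),
    ((0.80, 0.90, 0.70), "GCG"),
    ((0.90, 0.80, 0.90), "GCG"),
    ((0.85, 0.70, 0.80), "GCG"),
]

print(knn_predict(train, (0.12, 0.18, 0.05)))
print(knn_predict(train, (0.88, 0.85, 0.75)))
```

With two labels and an odd `k`, the vote cannot tie. In practice the features would need scaling so that no single one dominates the distance.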

What ML Answers can we (Optimizers) expect?

- we get statistical answers → not what we are used to seeing
- we have domain/expert knowledge: e.g., pseudo-costs
- ML may give a better predictor, but no explanation
- some information can be extracted from the most influential features
  → interpretability is a huge theoretical and practical topic

Decision Making: Machine Learning



“Machine Learning and Artificial Intelligence delivers the most value when you need to make lots of similar decisions quickly.” (Ingo Mierswa, RapidMiner)

- simple decisions: e.g., auto-correct the current word
- solution: often a single score → greedy
- keep/learn habits: extrapolate from the past (!)



Typical Example: Predictive Maintenance

source: blog.capterra.com/should-you-invest-in-a-predictive-maintenance-strategy/


Exploit all Options: Prescriptive Maintenance

source: www.siemens.com/press/pool/de/pressebilder/photonews/pn200826/300dpi/pn200826-12 300dpi.jpg


Decision Making: Optimization



“How often can the result of an optimization model be captured in a single variable?” (Ed Rothberg, Gurobi)

- solution: not only the objective value!
- complex decisions/plans: e.g., timetables, crew schedules, ...
- global scope: models all (reasonable) interdependencies

Perfect Partners

“Current Standard”: Predictive then Prescriptive Analytics

ML harnesses the bigness of data (the past and present); Optimization captures the bigness of options (the future).

Learning (about) optimal Solutions

- in recurring complex decision situations
- learn what good (partial) solutions look like
- this may help find good solutions faster
- learn spatio-temporal patterns to generate effective schedules, Le, Liu, Lau (2016)

Learning (about) Optimization Models

- ML can make sense of data
- optimization models are also “data”
  → (how) can ML help us make sense of optimization models?
- can ML learn good modeling?

Vision: Learning (about) Optimization Problems

- learn the semantics of a MIP model (“the problem”)
  ⇒ e.g., help the modeler find a better formulation

The AI Umbrella?

- OR vs. analytics discussion ✓
- OR vs. AI discussion ?

Why is this Relevant?

source: blogs.worldbank.org/category/tags/artificial-intelligence

- if the fourth industrial revolution is about AI, OR should be part of it

Optimization met Machine Learning
Marco Lübbecke
Lehrstuhl für Operations Research, RWTH Aachen University
@mluebbecke

OR 2017 · Berlin · September 6, 2017
