A general linear program takes the form \[
\begin{array}{ll}
\text{minimize}& \mathbf{c}^T \mathbf{x} \\
\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b} \\
& \mathbf{G} \mathbf{x} \leq \mathbf{h}.
\end{array}
\] A linear program (LP) is a convex optimization problem: the objective is affine (hence convex), and the feasible set, the intersection of an affine set with halfspaces, is a convex polyhedron.
The standard form of an LP is \[
\begin{array}{ll}
\text{minimize}& \mathbf{c}^T \mathbf{z} \\
\text{subject to}& \mathbf{A} \mathbf{z} = \mathbf{b} \\
& \mathbf{z} \geq \mathbf{0}.
\end{array}
\]
To transform a general linear program into the standard form, we introduce slack variables \(\mathbf{s} \geq \mathbf{0}\) such that \(\mathbf{G} \mathbf{x} + \mathbf{s} = \mathbf{h}\). Then we write \(\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^-\), where \(\mathbf{x}^+ \geq \mathbf{0}\) and \(\mathbf{x}^- \geq \mathbf{0}\). This yields the problem \[
\begin{array}{ll}
\text{minimize}& \mathbf{c}^T (\mathbf{x}^+ - \mathbf{x}^-) \\
\text{subject to}& \mathbf{A} (\mathbf{x}^+ - \mathbf{x}^-) = \mathbf{b} \\
& \mathbf{G} (\mathbf{x}^+ - \mathbf{x}^-) + \mathbf{s} = \mathbf{h} \\
& \mathbf{x}^+ \geq \mathbf{0}, \mathbf{x}^- \geq \mathbf{0}, \mathbf{s} \geq \mathbf{0}
\end{array}
\] in \(\mathbf{x}^+\), \(\mathbf{x}^-\), and \(\mathbf{s}\). Letting \(\mathbf{z} = (\mathbf{x}^{+}, \mathbf{x}^{-}, \mathbf{s})\) makes the formulation standard: \[
\begin{array}{ll}
\text{minimize}& (\mathbf{c}^T, -\mathbf{c}^T, \mathbf{0}) \mathbf{z} \\
\text{subject to}& [\mathbf{A}, -\mathbf{A}, \mathbf{0}] \mathbf{z} = \mathbf{b} \\
& [\mathbf{G}, -\mathbf{G}, \mathbf{I}] \mathbf{z} = \mathbf{h} \\
& \mathbf{z} \geq \mathbf{0}
\end{array}
\]
Slack variables are often used to transform complicated inequality constraints into simple non-negativity constraints.
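As a concrete illustration, the block matrices above can be assembled directly. The following is a minimal sketch in Julia with hypothetical data \((\mathbf{c}, \mathbf{A}, \mathbf{b}, \mathbf{G}, \mathbf{h})\); all names are illustrative, not part of any package.
using LinearAlgebra

# hypothetical general LP data (illustrative only)
c = [1.0, 2.0]
A = [1.0 1.0];          b = [1.0]
G = [1.0 0.0; 0.0 1.0]; h = [0.75, 0.75]
n, m = length(c), length(h)

# standard-form data for z = (x⁺, x⁻, s)
c_std = [c; -c; zeros(m)]
A_std = [A  -A  zeros(size(A, 1), m);   # A x⁺ - A x⁻     = b
         G  -G  Matrix(1.0I, m, m)]     # G x⁺ - G x⁻ + s = h
b_std = [b; h]
# a solution z of the standard form yields x = z[1:n] - z[n+1:2n]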
The inequality form of an LP is \[
\begin{array}{ll}
\text{minimize}& \mathbf{c}^T \mathbf{x} \\
\text{subject to}& \mathbf{G} \mathbf{x} \leq \mathbf{h}.
\end{array}
\]
Some software, e.g., solveLP in R, requires that an LP be written in either standard or inequality form. Modeling tools such as Convex.jl perform these transformations for you.
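For example, with Convex.jl one can state the general form directly and let the modeling layer and solver handle the reformulation. A minimal sketch, assuming the data \(\mathbf{c}, \mathbf{A}, \mathbf{b}, \mathbf{G}, \mathbf{h}\) are already defined (the solver choice is illustrative):
using Convex, COSMO

# general-form LP stated directly; c, A, b, G, h are assumed predefined
x = Variable(length(c))
problem = minimize(c' * x, [A * x == b, G * x <= h])
solve!(problem, COSMO.Optimizer())
x.value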
Problems that can be formulated as LPs
Convex piecewise-linear minimization problem
\[
\begin{array}{ll}
\text{minimize}& \max_{i=1,\ldots,m} (\mathbf{a}_i^T \mathbf{x} + b_i)
\end{array}
\] can be transformed to an LP \[
\begin{array}{ll}
\text{minimize}& t \\
\text{subject to}& \mathbf{a}_i^T \mathbf{x} + b_i \le t, \quad i = 1,\ldots,m,
\end{array}
\] in \(\mathbf{x}\) and \(t\).
Similarly, \[
\begin{array}{ll}
\text{minimize}& \max_{i=1,\ldots,m} |\mathbf{a}_i^T \mathbf{x} + b_i|
\end{array}
\] and \[
\begin{array}{ll}
\text{minimize}& \max_{i=1,\ldots,m} (\mathbf{a}_i^T \mathbf{x} + b_i)_+
\end{array}
\] are also LPs.
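To see this, introduce an epigraph variable \(t\): the first problem becomes \[
\begin{array}{ll}
\text{minimize}& t \\
\text{subject to}& -t \le \mathbf{a}_i^T \mathbf{x} + b_i \le t, \quad i = 1,\ldots,m,
\end{array}
\] and the second becomes \[
\begin{array}{ll}
\text{minimize}& t \\
\text{subject to}& \mathbf{a}_i^T \mathbf{x} + b_i \le t, \quad i = 1,\ldots,m, \quad t \ge 0,
\end{array}
\] both in \(\mathbf{x}\) and \(t\).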
Linear fractional programming
\[
\begin{array}{ll}
\text{minimize}& \frac{\mathbf{c}^T \mathbf{x} + d}{\mathbf{e}^T \mathbf{x} + f} \\
\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b} \\
& \mathbf{G} \mathbf{x} \leq \mathbf{h} \\
& \mathbf{e}^T \mathbf{x} + f > 0
\end{array}
\] can be transformed to an LP \[
\begin{array}{ll}
\text{minimize}& \mathbf{c}^T \mathbf{y} + d z \\
\text{subject to}& \mathbf{G} \mathbf{y} - z \mathbf{h} \leq \mathbf{0} \\
& \mathbf{A} \mathbf{y} - z \mathbf{b} = \mathbf{0} \\
& \mathbf{e}^T \mathbf{y} + f z = 1 \\
& z \ge 0
\end{array}
\] in \(\mathbf{y}\) and \(z\), via transformation of variables \[
\begin{array}{ll}
\mathbf{y} = \frac{\mathbf{x}}{\mathbf{e}^T \mathbf{x} + f}, \quad z = \frac{1}{\mathbf{e}^T \mathbf{x} + f}.
\end{array}
\] See Section 4.3.2 of Boyd and Vandenberghe (2004) for a proof. Given an optimal \((\mathbf{y}, z)\) with \(z > 0\), the original solution is recovered as \(\mathbf{x} = \mathbf{y} / z\).
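As a quick illustration, the transformed LP can be written down directly in Convex.jl. A minimal sketch, assuming \(\mathbf{c}, d, \mathbf{e}, f, \mathbf{A}, \mathbf{b}, \mathbf{G}, \mathbf{h}\) are predefined:
using Convex, COSMO

# Charnes-Cooper-type reformulation; all data assumed predefined
y = Variable(length(c))
z = Variable()
problem = minimize(c' * y + d * z,
    [G * y - h * z <= 0,
     A * y - b * z == 0,
     e' * y + f * z == 1,
     z >= 0])
solve!(problem, COSMO.Optimizer())
x̂ = y.value / z.value   # recover the original variable x = y / z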
LP example 1: compressed sensing
Compressed sensing (Candes and Tao, 2006; Donoho, 2006) addresses a fundamental question: how can we compress and transmit a complex signal (e.g., musical clips, mega-pixel images) so that the original signal can be recovered from the compressed measurements?
Suppose a signal \(\mathbf{x} \in \mathbb{R}^n\) is sparse, in the sense that it has at most \(s\) non-zero components. We undersample the signal by multiplying it by a (fat) measurement matrix \(\mathbf{A} \in \mathbb{R}^{m\times n}\) with iid normal entries; that is, the measurements are \(\mathbf{y} = \mathbf{A} \mathbf{x}\) in the noiseless case. Recovering \(\mathbf{x}\) then amounts to solving an underdetermined linear system.
Candes, Romberg and Tao (2006) showed that the solution to \[
\begin{array}{ll}
\text{minimize}& \|\mathbf{x}\|_1 \\
\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{y}
\end{array}
\] exactly recovers the true signal under certain conditions on \(\mathbf{A}\) when \(n \gg s\) and \(m \approx s \ln(n/s)\). Why is sparsity a reasonable assumption? Virtually all real-world images have low information content; this is the basis of the JPEG and MPEG compression formats.
Compressed sensing is also used in fast medical imaging, e.g., “Fast and accurate reconstruction for susceptibility source separation in QSM”, Seyoon Ko, Jingu Lee, Joong-Ho Won, and Jongho Lee, ISMRM 2018.
Compared to the minimum \(\ell_2\) norm solution (Moore-Penrose), the minimum \(\ell_1\) norm solution does not have a closed form. However, it is the solution of a convex optimization problem.
The \(\ell_1\) minimization problem is readily cast as an LP by writing \(\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^-\): \[
\begin{array}{ll}
\text{minimize}& \mathbf{1}^T (\mathbf{x}^+ + \mathbf{x}^-) \\
\text{subject to}& \mathbf{A} (\mathbf{x}^+ - \mathbf{x}^-) = \mathbf{y} \\
& \mathbf{x}^+ \geq \mathbf{0}, \mathbf{x}^- \geq \mathbf{0}.
\end{array}
\]
Numerical examples:
Generate a sparse signal and subsample it
using Plots, Random

# random seed
Random.seed!(280)
# Size of signal
n = 1024
# Sparsity (# nonzeros) in the signal
s = 20
# Number of samples (undersample by a factor of 8)
m = 128

# Generate and display the signal
x0 = zeros(n)
x0[rand(1:n, s)] = randn(s)
# Generate the random sensing matrix
A = randn(m, n) / m
# Subsample by multiplexing
y = A * x0;
# plot the true signal
plot(x0, line=:stem, title="True signal x_0", label=false)
Solve the LP with the modeling tool Convex.jl
using Convex
# Use COSMO solver
using COSMO
solver = COSMO.Optimizer()
## Use SCS solver
# using SCS
# solver = SCS.Optimizer()
## Use Mosek solver
# using Mosek, MosekTools, MathOptInterface
# const MOI = MathOptInterface
# solver = Mosek.Optimizer()
## Use Hypatia solver
# using Hypatia
# solver = Hypatia.Optimizer()

# Set up optimization problem
x = Variable(n)
problem = minimize(norm(x, 1))
problem.constraints += A * x == y

# Solve the problem
@time solve!(problem, solver)
# Display the solution
plot(x0, line=:stem, label="true")
plot!(x.value, line=:stem, title="Reconstructed signal overlayed with x0", label="recon")
LP example 2: quantile regression
In linear regression, we model the mean of the response variable as a function of covariates.
In many situations, the error variance is not constant, the distribution of \(y\) may be asymmetric, or we simply care about the quantile(s) of the response variable. Quantile regression offers a better modeling tool in these applications.
For a cumulative distribution function \(F_Y\) of \(Y\) on \(\mathbb{R}\), the \(\tau\)-quantile is \(q_{\tau} = F_Y^{-1}(\tau) \triangleq \inf\{y: F_Y(y) \geq \tau\}\).
It can be shown that \(q_{\tau}\) minimizes \(L(q) = \mathbf{E}[\rho_{\tau}(Y - q)]\) where \(\rho_\tau(u) = u (\tau - 1_{\{u < 0\}})\).
The conditional quantile of \(Y\) given \(X=\mathbf{x}\) is then \(q_{\tau}(\mathbf{x}) = \inf\{y: F_{Y|X}(y | \mathbf{x}) \geq \tau \}\), where \(F_{Y|X}(\cdot | \mathbf{x})\) is the conditional CDF of \(Y\) given \(X=\mathbf{x}\).
We model \(q_{\tau}(\mathbf{x}) = \beta^T\mathbf{x}\) for each \(\tau\).
Thus in practice we minimize the sample loss function \[
L(\beta) = \frac{1}{n}\sum_{i=1}^n \rho_\tau (y_i - \mathbf{x}_i^T \beta).
\]
Writing \(\mathbf{y} - \mathbf{X} \beta = \mathbf{r}^+ - \mathbf{r}^-\) with \(\mathbf{r}^+, \mathbf{r}^- \geq \mathbf{0}\), and noting \(\rho_\tau(u) = \tau u_+ + (1 - \tau) u_-\) where \(u_- = \max(-u, 0)\), minimizing \(L(\beta)\) (the factor \(1/n\) does not affect the minimizer) is equivalent to the LP \[
\begin{array}{ll}
\text{minimize} & \tau \mathbf{1}^T \mathbf{r}^+ + (1-\tau) \mathbf{1}^T \mathbf{r}^- \\
\text{subject to} & \mathbf{r}^+ - \mathbf{r}^- = \mathbf{y} - \mathbf{X} \beta \\
& \mathbf{r}^+ \geq \mathbf{0}, \mathbf{r}^- \geq \mathbf{0}
\end{array}
\] in \(\mathbf{r}^+\), \(\mathbf{r}^-\), and \(\beta\).
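A minimal Convex.jl sketch of this quantile-regression LP, assuming a design matrix X, a response vector y, and a quantile level τ are already defined (variable names are illustrative):
using Convex, COSMO

# assumes X (n × p), y (length n), and τ ∈ (0, 1) are predefined
rpos = Variable(length(y))   # r⁺
rneg = Variable(length(y))   # r⁻
β    = Variable(size(X, 2))
problem = minimize(τ * sum(rpos) + (1 - τ) * sum(rneg),
    [rpos - rneg == y - X * β, rpos >= 0, rneg >= 0])
solve!(problem, COSMO.Optimizer())
β.value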
Special case: LAD (\(\ell_1\) regression)
A popular method in robust statistics is least absolute deviation (LAD) regression, which minimizes the sum of absolute deviations rather than the sum of squared deviations, i.e., the \(\ell_1\) norm of the residual vector \(\|\mathbf{y} - \mathbf{X} \beta\|_1\). It is the special case of quantile regression with \(\tau = 1/2\) (up to a constant factor) and is equivalent to the LP \[
\begin{array}{ll}
\text{minimize}& \mathbf{1}^T (\mathbf{r}^+ + \mathbf{r}^-) \\
\text{subject to} & \mathbf{r}^+ - \mathbf{r}^- = \mathbf{y} - \mathbf{X} \beta \\
& \mathbf{r}^+ \geq \mathbf{0}, \mathbf{r}^- \geq \mathbf{0}
\end{array}
\] in \(\mathbf{r}^+\), \(\mathbf{r}^-\), and \(\beta\).
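With a modeling tool the explicit splitting is unnecessary; a minimal Convex.jl sketch, assuming X and y are given:
using Convex, COSMO

# minimize ‖y - Xβ‖₁; the modeling layer handles the LP reformulation
β = Variable(size(X, 2))
problem = minimize(norm(y - X * β, 1))
solve!(problem, COSMO.Optimizer())
β.value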
LP Example 3: \(\ell_\infty\) regression (Chebyshev approximation)
Minimizing the worst possible residual \(\|\mathbf{y} - \mathbf{X} \beta\|_\infty\) is equivalent to the LP \[
\begin{array}{ll}
\text{minimize}& t \\
\text{subject to}& -t \le y_i - \mathbf{x}_i^T \beta \le t, \quad i = 1,\dots,n
\end{array}
\] in variables \(\beta\) and \(t\).
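Likewise a minimal Convex.jl sketch, assuming X and y are given:
using Convex, COSMO

# minimize the largest absolute residual ‖y - Xβ‖∞
β = Variable(size(X, 2))
problem = minimize(norm(y - X * β, Inf))
solve!(problem, COSMO.Optimizer())
β.value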
LP Example 4: Dantzig selector
Candes and Tao (2007) proposed a variable selection method called the Dantzig selector that solves \[
\begin{array}{ll}
\text{minimize}& \|\mathbf{X}^T (\mathbf{y} - \mathbf{X} \beta)\|_\infty \\
\text{subject to}& \sum_{j=2}^p |\beta_j| \le t,
\end{array}
\] which can be transformed to an LP. They named the method after George Dantzig, who invented the simplex method for efficiently solving LPs in 1947.
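A minimal Convex.jl sketch of the constrained form above, assuming X, y, and the bound t are given and that the first column of X is the intercept (so the \(\ell_1\) bound starts at \(j = 2\)):
using Convex, COSMO

# assumes X, y, t are predefined; column 1 of X is the intercept
β = Variable(size(X, 2))
problem = minimize(norm(X' * (y - X * β), Inf), [norm(β[2:end], 1) <= t])
solve!(problem, COSMO.Optimizer())
β.value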
LP Example 5: 1-norm SVM
In two-class classification problems, we are given training data \((\mathbf{x}_i, y_i)\), \(i=1,\ldots,n\), where \(\mathbf{x}_i \in \mathbb{R}^p\) are feature vectors and \(y_i \in \{-1, 1\}\) are class labels. Zhu, Rosset, Tibshirani, and Hastie (2004) proposed the 1-norm support vector machine (SVM) that achieves the dual purpose of classification and feature selection. Denote the solution of the optimization problem \[
\begin{array}{ll}
\text{minimize}& \sum_{i=1}^n \left[ 1 - y_i \left( \beta_0 + \sum_{j=1}^p x_{ij} \beta_j \right) \right]_+ \\
\text{subject to}& \|\beta\|_1 = \sum_{j=1}^p |\beta_j| \le t
\end{array}
\] by \(\hat \beta_0(t)\) and \(\hat \beta(t)\).
Note \([x]_+ = \max(0, x) = \rho_1(x)\). This 1-norm SVM problem is an LP.
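Concretely, introducing hinge slack variables \(\xi_i \geq 0\) and writing \(\beta = \beta^+ - \beta^-\) with \(\beta^+, \beta^- \geq \mathbf{0}\) gives the LP \[
\begin{array}{ll}
\text{minimize}& \mathbf{1}^T \boldsymbol{\xi} \\
\text{subject to}& \xi_i \geq 1 - y_i \left( \beta_0 + \mathbf{x}_i^T (\beta^+ - \beta^-) \right), \quad i = 1,\ldots,n \\
& \mathbf{1}^T (\beta^+ + \beta^-) \leq t \\
& \boldsymbol{\xi} \geq \mathbf{0}, \ \beta^+ \geq \mathbf{0}, \ \beta^- \geq \mathbf{0}
\end{array}
\] in \(\beta_0\), \(\beta^+\), \(\beta^-\), and \(\boldsymbol{\xi}\).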
1-norm SVM classifies a future feature vector \(\mathbf{x}\) by the sign of fitted model \[
\begin{array}{ll}
\hat f(\mathbf{x}) = \hat \beta_0 + \mathbf{x}^T \hat \beta.
\end{array}
\]
using Convex   # for Variable, minimize, pos, norm, solve!
using Random, LinearAlgebra, SparseArrays
using DataFrames
using Mosek, MosekTools, MathOptInterface
const MOI = MathOptInterface

Random.seed!(123) # seed
n, p = 100, 10
# Design matrix
X = [ones(n, 1) randn(n, p)]
# True regression coefficients (first 5 are non-zero)
β = [1.0; randn(5); zeros(p - 5)]
Y = sign.(X * β + 5 * randn(n))

# solve at a grid of λ
# opt = () -> COSMO.Optimizer(verbose=false)
λgrid = 0:0.5:10
β̂svmpath = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂svm = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # objective
    problem = minimize(sum(pos(1 - Y .* (X * β̂svm))) + λ * norm(β̂svm[2:end], 1))
    solver = Mosek.Optimizer() # MOSEK this time!
    MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 0) # keep silent
    solve!(problem, solver)
    β̂svmpath[i, :] = β̂svm.value
end
0.208457 seconds (299.61 k allocations: 28.063 MiB)
There are many more applications of LP, especially assignment and scheduling problems: airport scheduling (Copenhagen airport uses Gurobi), airline flight scheduling, NFL scheduling, match.com, …
In fact, any loss/penalty or loss/constraint combination of the form \[
\{\ell_1, \ell_\infty, \text{quantile}\} \times \{\ell_1, \ell_\infty, \text{quantile}\},
\] possibly with affine (equality and/or inequality) constraints, can be formulated as an LP.