A (linearly constrained) quadratic program (QP) has a quadratic objective function and affine constraint functions \[
\begin{array}{ll}
&\text{minimize}& (1/2) \mathbf{x}^T \mathbf{P} \mathbf{x} + \mathbf{q}^T \mathbf{x} + r \\
&\text{subject to}& \mathbf{G} \mathbf{x} \leq \mathbf{h} \\
& & \mathbf{A} \mathbf{x} = \mathbf{b},
\end{array}
\] where we require \(\mathbf{P} \in \mathbb{S}_+^d\) (why?). Note that LP is a special case of QP with \(\mathbf{P} = \mathbf{0}_{d \times d}\).
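A brief justification (a standard argument, not spelled out above): the gradient and Hessian of the objective \(f(\mathbf{x}) = (1/2) \mathbf{x}^T \mathbf{P} \mathbf{x} + \mathbf{q}^T \mathbf{x} + r\) are \[
\nabla f(\mathbf{x}) = \mathbf{P} \mathbf{x} + \mathbf{q}, \qquad \nabla^2 f(\mathbf{x}) = \mathbf{P},
\] so the objective is convex exactly when \(\mathbf{P} \succeq \mathbf{0}\), which is what makes the QP a convex program.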
Example 1: least squares
The least squares problem minimizes \(\|\mathbf{y} - \beta_0\mathbf{1} - \mathbf{X} \beta\|_2^2\), which is an (unconstrained) QP: \[
\frac{1}{2}\|\mathbf{y} - \beta_0\mathbf{1} - \mathbf{X} \beta\|_2^2 = \frac{1}{2}\beta^T\mathbf{X}^T\mathbf{X}\beta - (\mathbf{y} - \beta_0\mathbf{1})^T\mathbf{X}\beta + \frac{1}{2}\|\mathbf{y} - \beta_0\mathbf{1}\|_2^2.
\]
Usually the intercept \(\beta_0\) is not penalized. If the data is centered (\(\mathbf{1}^T\mathbf{X} = \mathbf{0}\)), then \(\beta_0 = \frac{1}{n}\mathbf{1}^T\mathbf{y}\). (Why?)
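To see this (a step not written out above), set the partial derivative with respect to \(\beta_0\) to zero: \[
-\mathbf{1}^T(\mathbf{y} - \beta_0\mathbf{1} - \mathbf{X}\beta) = 0
\quad\Longrightarrow\quad
\beta_0 = \frac{1}{n}\left(\mathbf{1}^T\mathbf{y} - \mathbf{1}^T\mathbf{X}\beta\right) = \frac{1}{n}\mathbf{1}^T\mathbf{y}
\] when \(\mathbf{1}^T\mathbf{X} = \mathbf{0}\).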
```julia
using Statistics
using Random, LinearAlgebra, SparseArrays

Random.seed!(123) # seed

n, p = 100, 10
# Design matrix
X0 = randn(n, p)
X = [ones(n, 1) (X0 .- mean(X0, dims=1)) ./ std(X0, dims=1)] # design matrix is standardized and includes intercept
# True regression coefficients (first 5 are non-zero)
β = [1.0; randn(5); zeros(p - 5)]
# Responses
y = X * β + randn(n)

# plot the true β
using Plots
plt = plot(1:p, β[2:end], line=:stem, marker=2, legend=:none)
xlabel!(plt, "coordinate")
ylabel!(plt, "beta_j")
title!(plt, "True beta")
```
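As a quick sanity check (not part of the original notes), the unconstrained least-squares QP on this simulated data has a closed-form solution via the normal equations; the name `β̂ls` below is illustrative.

```julia
# Quick check: solve the unconstrained least-squares QP in closed form.
# Since X here already contains the intercept column, the backslash operator
# returns the minimizer of ½‖y - Xβ‖₂².
β̂ls = X \ y
norm(β̂ls - β)   # should be small because y = Xβ + noise
```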
Ridge regression adds an \(\ell_2\) penalty \(\lambda \|\beta\|_2^2\) (not penalizing the intercept) to the least-squares objective and remains an unconstrained QP. We compute the solution path over a grid of \(\lambda\) using Convex.jl with the MOSEK solver:

```julia
using Convex, MathOptInterface
using Mosek, MosekTools
const MOI = MathOptInterface

# solve at a grid of λ
λgrid = 0:20:1000
ridgepath = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂ridge = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # objective
    problem = minimize(sumsquares(y - X * β̂ridge) + λ * sumsquares(β̂ridge[2:end]))
    solver = Mosek.Optimizer() # MOSEK this time!
    MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 0) # keep silent
    solve!(problem, solver)
    ridgepath[i, :] = β̂ridge.value
end

plt = plot(collect(λgrid), ridgepath, legend=:none)
xlabel!(plt, "lambda")
ylabel!(plt, "beta_hat")
title!(plt, "Ridge")
```
1.251917 seconds (807.74 k allocations: 66.132 MiB, 1.98% gc time, 71.62% compilation time)
Example 2: constrained least squares
Least squares with linear constraints is also a QP. For example, nonnegative least squares (NLS): \[
\begin{array}{ll}
&\text{minimize}& \frac 12 \|\mathbf{y} - \beta_0\mathbf{1} - \mathbf{X} \beta\|_2^2 \\
&\text{subject to}& \beta \geq \mathbf{0}.
\end{array}
\]
```julia
# Use COSMO solver
using COSMO
solver = COSMO.Optimizer()

## Use SCS solver
# using SCS
# solver = SCS.Optimizer()

## Use Mosek solver
# using Mosek, MosekTools
# solver = Mosek.Optimizer()

## Use Hypatia solver
# using Hypatia
# solver = Hypatia.Optimizer()

# Set up optimization problem
β̂nonneg = Variable(size(X, 2))
problem = minimize(0.5sumsquares(y - X * β̂nonneg))
problem.constraints += β̂nonneg[2:end] >= 0

# Solve the problem
@time solve!(problem, solver)
```
We have seen this example (the zero-sum lasso) before. Here we solve a related constrained lasso, combining the nonnegativity constraint above with an \(\ell_1\) penalty, over a grid of \(\lambda\):
```julia
using Mosek, MosekTools

# solve at a grid of λ
λgrid = 0:10:150
β̂conpath = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂conlasso = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # objective
    problem = minimize(0.5sumsquares(y - X * β̂conlasso) + λ * norm(β̂conlasso[2:end], 1))
    problem.constraints += β̂conlasso[2:end] >= 0
    solver = Mosek.Optimizer() # MOSEK this time!
    MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 0) # keep silent
    solve!(problem, solver)
    β̂conpath[i, :] = β̂conlasso.value
end

plt = plot(collect(λgrid), β̂conpath, legend=:none)
xlabel!(plt, "lambda")
ylabel!(plt, "beta_hat")
title!(plt, "Constrained Lasso")
```
0.115292 seconds (120.25 k allocations: 11.904 MiB)
QP example 7: robust regression
The Huber loss function \[\begin{eqnarray*}
\phi(r) = \begin{cases}
r^2 & |r| \le M \\
M(2|r| - M) & |r| > M
\end{cases}
\end{eqnarray*}\] is commonly used in robust statistics. The robust regression problem \[\begin{eqnarray*}
&\text{minimize}& \sum_{i=1}^n \phi(y_i - \beta_0 - \mathbf{x}_i^T \beta)
\end{eqnarray*}\] can be transformed to a QP \[\begin{eqnarray*}
&\text{minimize}& \mathbf{u}^T \mathbf{u} + 2 M \mathbf{1}^T \mathbf{v} \\
&\text{subject to}& - \mathbf{u} - \mathbf{v} \leq \mathbf{y} - \beta_0 \mathbf{1} - \mathbf{X} \beta \leq \mathbf{u} + \mathbf{v} \\
& & \mathbf{0} \leq \mathbf{u} \leq M \mathbf{1}, \quad \mathbf{v} \geq \mathbf{0}
\end{eqnarray*}\] in the variables \(\mathbf{u}, \mathbf{v} \in \mathbb{R}^n\), \(\beta_0 \in \mathbb{R}\), and \(\beta \in \mathbb{R}^p\). Hint: write \(|r_i| = (|r_i| \wedge M) + (|r_i| - M)_+ = u_i + v_i\).
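A minimal Convex.jl sketch of this QP on the simulated data above (not in the original notes; it reuses `X`, `y`, `n`, and the MOSEK setup from earlier cells, and since `X` already contains the intercept column, \(\beta_0\) is absorbed into the first coefficient; the cutoff `M = 1` is an arbitrary assumption):

```julia
# Hypothetical sketch: Huber (robust) regression via the QP formulation above.
# Assumes X, y, n, Convex.jl, MOI, and Mosek are already loaded as in earlier cells.
M = 1.0                        # assumed Huber cutoff
β̂huber = Variable(size(X, 2))
u = Variable(n)                # at the optimum, u_i = |r_i| ∧ M
v = Variable(n)                # at the optimum, v_i = (|r_i| - M)₊
problem = minimize(sumsquares(u) + 2 * M * sum(v))
problem.constraints += y - X * β̂huber <= u + v
problem.constraints += -(u + v) <= y - X * β̂huber
problem.constraints += u >= 0
problem.constraints += u <= M
problem.constraints += v >= 0
solver = Mosek.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 0) # keep silent
solve!(problem, solver)
```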
QP example 8: SVM
In two-class classification problems, we are given training data \((\mathbf{x}_i, y_i)\), \(i=1,\ldots,n\), where \(\mathbf{x}_i \in \mathbb{R}^p\) are feature vectors and \(y_i \in \{-1, 1\}\) are class labels. The support vector machine (SVM) solves the optimization problem \[
\begin{array}{ll}
&\text{minimize}& \sum_{i=1}^n \left[ 1 - y_i \left( \beta_0 + \sum_{j=1}^p x_{ij} \beta_j \right) \right]_+ + \lambda \|\beta\|_2^2,
\end{array}
\] where \(\lambda \ge 0\) is a tuning parameter. This is a QP (why?).
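One standard way to see this (a reformulation not spelled out above): introduce slack variables \(\xi_i \geq 0\) for the hinge terms, giving \[
\begin{array}{ll}
&\text{minimize}& \mathbf{1}^T \boldsymbol{\xi} + \lambda \|\beta\|_2^2 \\
&\text{subject to}& \xi_i \geq 1 - y_i \left( \beta_0 + \mathbf{x}_i^T \beta \right), \quad i = 1, \ldots, n \\
& & \boldsymbol{\xi} \geq \mathbf{0},
\end{array}
\] in the variables \((\beta_0, \beta, \boldsymbol{\xi})\); the objective is convex quadratic and all constraints are affine.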
```julia
# convert to classification problem
Y = sign.(X * β + 5 * randn(n))

# solve at a grid of λ
λgrid = 0:10:100
β̂svmpath = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂svm = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # objective
    problem = minimize(sum(pos(1 - Y .* (X * β̂svm))) + λ * sumsquares(β̂svm[2:end]))
    solver = Mosek.Optimizer() # MOSEK this time!
    MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 0) # keep silent
    solve!(problem, solver)
    β̂svmpath[i, :] = β̂svm.value
end

plt = plot(collect(λgrid), β̂svmpath, legend=:none)
xlabel!(plt, "lambda")
ylabel!(plt, "beta_hat")
title!(plt, "SVM")
```
1.881291 seconds (2.51 M allocations: 167.199 MiB, 4.83% gc time, 93.43% compilation time)