Consider a linear map from \(\mathbb{R}^n\) to \(\mathbb{S}^d\)\[
\mathcal{A}: \mathbf{x} \mapsto x_1 \mathbf{F}_1 + \cdots + x_n \mathbf{F}_n.
\] A linear matrix inequality (LMI) refers to \(\mathcal{A}\mathbf{x} + \mathbf{G} \preceq \mathbf{0}\) or \(\mathcal{A}\mathbf{x} + \mathbf{G} \succeq \mathbf{0}\). Here, \(\mathbf{G}, \mathbf{F}_1, \ldots, \mathbf{F}_n \in \mathbb{S}^d\).
A semidefinite program (SDP) has the form \[\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& \mathcal{A}\mathbf{x} + \mathbf{G} \preceq \mathbf{0}\\
& & \mathbf{A} \mathbf{x} = \mathbf{b},
\end{eqnarray*}\] where \(\mathbf{A} \in \mathbb{R}^{p \times n}\), and \(\mathbf{b} \in \mathbb{R}^p\). The optimization variable is \(\mathbf{x}\).
When \(\mathbf{G}, \mathbf{F}_1, \ldots, \mathbf{F}_n\) are all diagonal, SDP reduces to LP. (Why?)
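One way to see this reduction: if \(\mathbf{F}_i = \text{diag}(\mathbf{f}_i)\) and \(\mathbf{G} = \text{diag}(\mathbf{g})\), then \(\mathcal{A}\mathbf{x} + \mathbf{G}\) is diagonal, and a diagonal matrix is negative semidefinite exactly when all its diagonal entries are nonpositive, so the LMI collapses to \(d\) scalar linear inequalities \[
x_1 f_{1j} + \cdots + x_n f_{nj} + g_j \le 0, \quad j = 1, \ldots, d,
\] where \(f_{ij}\) and \(g_j\) denote the \(j\)th diagonal entries of \(\mathbf{F}_i\) and \(\mathbf{G}\). This is exactly the constraint set of an LP.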
The standard form SDP has the form \[\begin{eqnarray*}
&\text{minimize}& \text{tr}(\mathbf{C} \mathbf{X}) \\
&\text{subject to}& \text{tr}(\mathbf{A}_i \mathbf{X}) = b_i, \quad i = 1, \ldots, p \\
& & \mathbf{X} \succeq \mathbf{0},
\end{eqnarray*}\] where \(\mathbf{C}, \mathbf{A}_1, \ldots, \mathbf{A}_p \in \mathbb{S}^d\). Note both the objective and the equality constraint are linear in the optimization variable \(\mathbf{X}\in\mathbb{S}^d\): \(\text{tr}(\mathbf{A}\mathbf{X})=\langle \mathbf{A}, \mathbf{X} \rangle\). (Why?)
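As a concrete illustration, here is a minimal sketch of a standard form SDP in Julia using Convex.jl with the SCS solver; the data \(\mathbf{C}\), \(\mathbf{A}_i\), \(b_i\) below are made up, and the exact `solve!` syntax may differ across package versions.

```julia
using Convex, SCS, LinearAlgebra

# made-up data: d = 3, p = 2
C  = [2.0 1.0 0.0; 1.0 2.0 1.0; 0.0 1.0 2.0]
A1 = Matrix(1.0I, 3, 3)                       # tr(A1 * X) = tr(X)
A2 = [1.0 0.0 1.0; 0.0 0.0 0.0; 1.0 0.0 1.0]
b  = [1.0, 0.5]

X = Semidefinite(3)                           # X ⪰ 0, X ∈ S^3
problem = minimize(tr(C * X),
                   [tr(A1 * X) == b[1],
                    tr(A2 * X) == b[2]])
solve!(problem, SCS.Optimizer)                # solver call may vary by Convex.jl version
problem.optval, evaluate(X)
```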
An inequality form SDP has the form \[\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& x_1 \mathbf{A}_1 + \cdots + x_n \mathbf{A}_n \preceq \mathbf{B},
\end{eqnarray*}\] with variable \(\mathbf{x} \in \mathbb{R}^n\), and parameters \(\mathbf{B}, \mathbf{A}_1, \ldots, \mathbf{A}_n \in \mathbb{S}^d\), \(\mathbf{c} \in \mathbb{R}^n\).
Semidefinite Representability
A convex set \(S \subseteq \mathbb{R}^n\) is said to be semidefinite representable (SDr) if \(S\) is the projection of a set in a higher-dimensional space that can be described by a set of LMIs. Specifically, \[
\mathbf{x}\in S \iff \exists\mathbf{u}\in \mathbb{R}^m: \mathcal{A}\begin{pmatrix}\mathbf{x}\\\mathbf{u}\end{pmatrix} - \mathbf{B} \succeq \mathbf{0},
\] where \(\mathcal{A}: \mathbb{R}^n\times \mathbb{R}^m \to \mathbb{S}^d\) is a linear map and \(\mathbf{B} \in \mathbb{S}^d\).
A convex function \(f: \mathbb{R}^n \to \mathbb{R}\) is said to be SDr if and only if \(\text{epi}f\) is SDr.
Any SOCr set is SDr, because \[
\|\mathbf{x}\|_2 \le t
\iff
\mathbf{x}^T\mathbf{x} \le t^2, ~ t \ge 0
\iff
\begin{bmatrix} t\mathbf{I} & \mathbf{x} \\ \mathbf{x}^T & t \end{bmatrix} \succeq \mathbf{0}
\] The last expression is an LMI in \((\mathbf{x}, t)\); the second equivalence follows from the Schur complement lemma.
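A sketch of the Schur complement step: for \(t > 0\), \[
\begin{bmatrix} t\mathbf{I} & \mathbf{x} \\ \mathbf{x}^T & t \end{bmatrix} \succeq \mathbf{0}
\iff
t - \mathbf{x}^T (t\mathbf{I})^{-1} \mathbf{x} \ge 0
\iff
t^2 \ge \mathbf{x}^T\mathbf{x},
\] while \(t = 0\) forces \(\mathbf{x} = \mathbf{0}\) in both the LMI and the SOC constraint.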
Consequence: any SOCP is SDP.
Exercise. Write LP, QP, QCQP, and SOCP in form of SDP.
Example 1: nearest correlation matrix
Let \(\mathbb{C}^p\) be the convex set of \(p \times p\) correlation matrices \[
\mathbb{C}^p = \{ \mathbf{X} \in \mathbb{S}_+^p: x_{ii}=1, i=1,\ldots,p\}.
\] Given \(\mathbf{A} \in \mathbb{S}^p\), often we need to find the closest correlation matrix to \(\mathbf{A}\)\[
\begin{array}{ll}
\text{minimize}& \|\mathbf{A} - \mathbf{X}\|_{\text{F}} \\
\text{subject to}& \mathbf{X} \in \mathbb{C}^p.
\end{array}
\]
This projection problem can be solved via an SDP \[
\begin{array}{ll}
\text{minimize}& t \\
\text{subject to}& \|\mathbf{A} - \mathbf{X}\|_{\text{F}} \le t \\
& \text{diag}(\mathbf{X}) = \mathbf{1} \\
& \mathbf{X} \succeq \mathbf{0}
\end{array}
\] in variables \(\mathbf{X} \in \mathbb{S}^{p}\) and \(t \in \mathbb{R}\).
The SOC constraint \(\|\mathbf{A} - \mathbf{X}\|_{\text{F}} \le t\) can be written as an LMI \[
\begin{bmatrix}
t \mathbf{I} & \text{vec} (\mathbf{A} - \mathbf{X}) \\
\text{vec} (\mathbf{A} - \mathbf{X})^T & t
\end{bmatrix} \succeq \mathbf{0}
\] (why?).
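A minimal Convex.jl sketch of the projection problem above (the input \(\mathbf{A}\) is a made-up symmetric matrix that is not positive semidefinite; `norm(vec(A - X))` encodes the Frobenius norm, and solver/syntax details are version-dependent assumptions):

```julia
using Convex, SCS, LinearAlgebra

A = [1.0  0.9  0.6;
     0.9  1.0  0.9;
     0.6  0.9  1.0]            # made-up symmetric input (indefinite, so not a correlation matrix)
p = size(A, 1)

X = Semidefinite(p)            # X ⪰ 0
problem = minimize(norm(vec(A - X)),   # Frobenius norm ‖A - X‖_F
                   diag(X) == 1)       # unit diagonal
solve!(problem, SCS.Optimizer)
evaluate(X)                    # nearest correlation matrix to A
```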
Example 2: eigenvalue problems
Suppose \[
\mathbf{A}(\mathbf{x}) = \mathbf{A}_0 + x_1 \mathbf{A}_1 + \cdots + x_n \mathbf{A}_n,
\] where \(\mathbf{A}_i \in \mathbb{S}^m\), \(i = 0, \ldots, n\). Let \(\lambda_1(\mathbf{x}) \ge \lambda_2(\mathbf{x}) \ge \cdots \ge \lambda_m(\mathbf{x})\) be the ordered eigenvalues of \(\mathbf{A}(\mathbf{x})\).
Minimizing the largest eigenvalue is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \mathbf{A}(\mathbf{x}) \preceq t \mathbf{I}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t \in \mathbb{R}\).
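A minimal sketch of this eigenvalue SDP in Convex.jl, writing \(\mathbf{A}(\mathbf{x}) \preceq t\mathbf{I}\) directly as a PSD constraint (the data are made up; the `⪰` constraint syntax and the `lambda_max`-type atoms mentioned below are assumptions that may vary by package version):

```julia
using Convex, SCS, LinearAlgebra

# made-up symmetric data A0, A1, A2
A0 = [1.0 0.2; 0.2 3.0]
A1 = [0.0 1.0; 1.0 0.0]
A2 = [1.0 0.0; 0.0 -1.0]
Id = Matrix(1.0I, 2, 2)

x = Variable(2)
t = Variable()
Ax = A0 + x[1] * A1 + x[2] * A2
problem = minimize(t, (t * Id - Ax) ⪰ 0)    # A(x) ⪯ t I
solve!(problem, SCS.Optimizer)
problem.optval                              # minimized largest eigenvalue
```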
Minimizing the sum of \(k\) largest eigenvalues is an SDP too. How about minimizing the sum of all eigenvalues?
Maximizing the minimum eigenvalue is an SDP as well.
Minimizing the spread of the eigenvalues \(\lambda_1(\mathbf{x}) - \lambda_m(\mathbf{x})\) is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& t_1 - t_m \\
&\text{subject to}& t_m \mathbf{I} \preceq \mathbf{A}(\mathbf{x}) \preceq t_1 \mathbf{I}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t_1, t_m \in \mathbb{R}\).
Minimizing the spectral radius (or spectral norm) \(\rho(\mathbf{x}) = \max_{i=1,\ldots,m} |\lambda_i(\mathbf{x})|\) is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& - t \mathbf{I} \preceq \mathbf{A}(\mathbf{x}) \preceq t \mathbf{I}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t \in \mathbb{R}\).
To minimize the condition number \(\kappa(\mathbf{x}) = \lambda_1(\mathbf{x}) / \lambda_m(\mathbf{x})\) (assuming \(\mathbf{A}(\mathbf{x}) \succ \mathbf{0}\)), note that \(\lambda_1(\mathbf{x}) / \lambda_m(\mathbf{x}) \le t\) if and only if there exists a \(\mu > 0\) such that \(\mu \mathbf{I} \preceq \mathbf{A}(\mathbf{x}) \preceq \mu t \mathbf{I}\), or equivalently, \(\mathbf{I} \preceq \mu^{-1} \mathbf{A}(\mathbf{x}) \preceq t \mathbf{I}\). With the change of variables \(y_i = x_i / \mu\) and \(s = 1/\mu\), we can solve the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \mathbf{I} \preceq s \mathbf{A}_0 + y_1 \mathbf{A}_1 + \cdots + y_n \mathbf{A}_n \preceq t \mathbf{I} \\
& & s \ge 0,
\end{eqnarray*}\] in variables \(\mathbf{y} \in \mathbb{R}^n\) and \(s, t \ge 0\). In other words, we normalize the spectrum by the smallest eigenvalue and then minimize the largest eigenvalue of the normalized LMI.
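A minimal Convex.jl sketch of this condition number SDP (a single made-up coefficient matrix \(\mathbf{A}_1\) is used; recall \(s = 1/\mu\) and \(\mathbf{y} = \mathbf{x}/\mu\); syntax details are version-dependent assumptions):

```julia
using Convex, SCS, LinearAlgebra

# made-up symmetric data with A0 positive definite
A0 = [2.0 0.3; 0.3 1.0]
A1 = [0.0 1.0; 1.0 0.0]
Id = Matrix(1.0I, 2, 2)

y = Variable()           # y = x / mu
s = Variable()           # s = 1 / mu
t = Variable()
B = s * A0 + y * A1      # = A(x) / mu
problem = minimize(t, [(B - Id) ⪰ 0,       # I ⪯ B
                       (t * Id - B) ⪰ 0,   # B ⪯ t I
                       s >= 0])
solve!(problem, SCS.Optimizer)
problem.optval           # minimized condition number of A(x)
```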
Minimizing the \(\ell_1\) norm of the eigenvalues \(|\lambda_1(\mathbf{x})| + \cdots + |\lambda_m(\mathbf{x})|\) is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& \text{tr} (\mathbf{A}^+) + \text{tr}(\mathbf{A}^-) \\
&\text{subject to}& \mathbf{A}(\mathbf{x}) = \mathbf{A}^+ - \mathbf{A}^- \\
& & \mathbf{A}^+ \succeq \mathbf{0}, \mathbf{A}^- \succeq \mathbf{0},
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(\mathbf{A}^+, \mathbf{A}^- \in \mathbb{S}_+^m\).
Roots of determinant. The determinant of a positive semidefinite matrix \(\text{det} (\mathbf{A}(\mathbf{x})) = \prod_{i=1}^m \lambda_i (\mathbf{x})\) is neither convex nor concave, but rational powers of the determinant can be modeled using linear matrix inequalities. For a rational power \(0 \le q \le 1/m\), the function \(\text{det} (\mathbf{A}(\mathbf{x}))^q\) is concave and we have \[\begin{eqnarray*}
& & t \le \text{det} (\mathbf{A}(\mathbf{x}))^q\\
&\Leftrightarrow& \begin{bmatrix}
\mathbf{A}(\mathbf{x}) & \mathbf{L} \\
\mathbf{L}^T & \text{diag}(\mathbf{L})
\end{bmatrix} \succeq \mathbf{0}, \quad (\ell_{11} \ell_{22} \cdots \ell_{mm})^q \ge t,
\end{eqnarray*}\] where \(\mathbf{L} \in \mathbb{R}^{m \times m}\) is a lower-triangular matrix. The first inequality is an LMI, and the second is SOCr.
Similarly for any rational \(q>0\), we have \[\begin{eqnarray*}
& & t \ge \text{det} (\mathbf{A}(\mathbf{x}))^{-q} \\
&\Leftrightarrow& \begin{bmatrix}
\mathbf{A}(\mathbf{x}) & \mathbf{L} \\
\mathbf{L}^T & \text{diag}(\mathbf{L})
\end{bmatrix} \succeq \mathbf{0}, \quad (\ell_{11} \ell_{22} \cdots \ell_{mm})^{-q} \le t
\end{eqnarray*}\] for a lower triangular \(\mathbf{L}\).
lambda_max and lambda_min are implemented in the Convex.jl package for Julia.
Statistical applications of eigenvalue problems
Covariance matrix estimation
Suppose \(\mathbf{x}_1, \dotsc, \mathbf{x}_n \in \mathbb{R}^p\) are i.i.d. \(\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})\) and we want to estimate the covariance matrix \(\boldsymbol{\Sigma}\) by maximum likelihood. If \(n \ge p\), then the sample covariance matrix \(\mathbf{S}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T\) is the MLE. However, if \(n < p\) (high-dimensional setting), the MLE is not defined (why?).
A possibility is to regularize the MLE, e.g., by restricting the condition number of \(\boldsymbol\Sigma\) to be at most \(\kappa\). This approach leads to an SDP \[
\begin{array}{ll}
\text{maximize} & \log\text{det}\boldsymbol{\Sigma}^{-1} - \text{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}) \\
\text{subject to} & \mu\mathbf{I} \preceq \boldsymbol{\Sigma}^{-1} \preceq \kappa\mu\mathbf{I}
\end{array}
\] where the optimization variables are \(\boldsymbol{\Sigma}^{-1}\in\mathbb{S}^p\) and \(\mu\in\mathbb{R}_+\). See
Won, J. H., Lim, J., Kim, S. J. and Rajaratnam, B., 2013. Condition-number-regularized covariance estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), pp. 427-450.
An alternative is ridge-type regularization, which yields \(\mathbf{S} + \kappa\mathbf{I}\) as a solution. See
Warton, D. I., 2008. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481), pp. 340-349. https://doi.org/10.1198/016214508000000021
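A minimal Convex.jl sketch of the condition-number-regularized estimation problem above (random made-up data with \(n < p\); the `logdet` atom and solver support for the required cones are assumptions, and syntax may vary by version):

```julia
using Convex, SCS, LinearAlgebra

# made-up data: n = 5 samples in p = 10 dimensions, so S is singular
p, n = 10, 5
Y = randn(n, p)
S = Y' * Y / n                    # sample covariance matrix
kappa = 10.0                      # condition number bound
Id = Matrix(1.0I, p, p)

Omega = Semidefinite(p)           # Omega plays the role of Sigma^{-1}
mu = Variable()
problem = maximize(logdet(Omega) - tr(Omega * S),
                   [(Omega - mu * Id) ⪰ 0,          # mu I ⪯ Omega
                    (kappa * mu * Id - Omega) ⪰ 0,  # Omega ⪯ kappa mu I
                    mu >= 0])
solve!(problem, SCS.Optimizer)
Sigma_hat = inv(evaluate(Omega))  # condition-number-regularized estimate
```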
Optimal experiment design
Consider a linear model \[
y_i = \mathbf{x}_i^T\boldsymbol{\beta} + \varepsilon_i, \quad i=1,\dotsc,n,
\] where \(\varepsilon_i\) are independent Gaussian errors with common variance \(\sigma^2\). It is well known that the least squares estimate \(\hat{\boldsymbol{\beta}}\) is unbiased and has covariance \(\sigma^2(\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T)^{-1}\).
In experimental design, given a total number of allowable experiments, we want to decide how many experiments to run at each of \(m\) candidate design points \(\{\mathbf{x}_1,\dotsc,\mathbf{x}_m\}\) so that the covariance matrix of \(\hat{\boldsymbol{\beta}}\) is minimized in some sense. This exact design problem is a combinatorial optimization problem, which is hard to solve.
Instead, a weight \(w_i\) is assigned to each design point \(\mathbf{x}_i\), and we seek a probability vector \(\mathbf{w}\in\Delta_{m-1}\) such that the matrix \(\mathbf{V} = (\sum_{i=1}^m w_i\mathbf{x}_i\mathbf{x}_i^T)^{-1}\) is “small”. For example, the E-optimal design maximizes the smallest eigenvalue of \(\sum_{i=1}^m w_i\mathbf{x}_i\mathbf{x}_i^T\) (equivalently, minimizes the largest eigenvalue of \(\mathbf{V}\)), which is an SDP by the eigenvalue results above; see the sketch below.
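A minimal Convex.jl sketch of the relaxed E-optimal design problem with made-up candidate points (the `⪰` constraint syntax and solver call are version-dependent assumptions):

```julia
using Convex, SCS, LinearAlgebra

# made-up candidate design points x_1, ..., x_m in R^2 (one per row)
pts = [1.0 0.0; 0.0 1.0; 1.0 1.0; 1.0 -1.0]
m, d = size(pts)
Id = Matrix(1.0I, d, d)

w = Variable(m)                   # design weights
t = Variable()
M = sum(w[i] * (pts[i, :] * pts[i, :]') for i in 1:m)   # information matrix
problem = maximize(t, [(M - t * Id) ⪰ 0,   # t ≤ λ_min(M): E-optimality
                       w >= 0, sum(w) == 1])
solve!(problem, SCS.Optimizer)
evaluate(w)                       # optimal design weights
```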
Example 3: matrix norm minimization
Let \(\mathbf{A}(\mathbf{x}) = \mathbf{A}_0 + x_1 \mathbf{A}_1 + \cdots + x_n \mathbf{A}_n\), where \(\mathbf{A}_i \in \mathbb{R}^{p \times q}\), \(i = 0, \ldots, n\), and let \(\sigma_1(\mathbf{x}) \ge \cdots \ge \sigma_{\min\{p,q\}}(\mathbf{x}) \ge 0\) be the ordered singular values of \(\mathbf{A}(\mathbf{x})\).
Spectral norm (or operator norm or matrix-2 norm) minimization. Consider minimizing the spectral norm \(\|\mathbf{A}(\mathbf{x})\|_2 = \sigma_1(\mathbf{x})\). Note \(\|\mathbf{A}\|_2 \le t\) if and only if \(\mathbf{A}^T \mathbf{A} \preceq t^2 \mathbf{I}\) (and \(t \ge 0\)) if and only if \(\begin{bmatrix} t\mathbf{I} & \mathbf{A} \\ \mathbf{A}^T & t \mathbf{I} \end{bmatrix} \succeq \mathbf{0}\). This results in the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \begin{bmatrix} t\mathbf{I} & \mathbf{A}(\mathbf{x}) \\ \mathbf{A}(\mathbf{x})^T & t \mathbf{I} \end{bmatrix} \succeq \mathbf{0}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t \in \mathbb{R}\).
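A minimal Convex.jl sketch of this spectral norm SDP, building the block LMI explicitly (a single made-up nonsquare coefficient matrix is used; the block-matrix and `⪰` syntax are assumptions that may vary by version):

```julia
using Convex, SCS, LinearAlgebra

# made-up 2 x 3 data A0, A1
A0 = [1.0 0.0 2.0; 0.0 1.0 0.0]
A1 = [0.0 1.0 0.0; 1.0 0.0 1.0]
Ip = Matrix(1.0I, 2, 2)
Iq = Matrix(1.0I, 3, 3)

x = Variable()
t = Variable()
Ax = A0 + x * A1
problem = minimize(t, [(t * Ip) Ax; Ax' (t * Iq)] ⪰ 0)
solve!(problem, SCS.Optimizer)
problem.optval            # minimized spectral norm ‖A(x)‖_2
```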
Nuclear norm minimization. Minimization of the nuclear norm (or trace norm) \(\|\mathbf{A}(\mathbf{x})\|_* = \sum_i \sigma_i(\mathbf{x})\) can be formulated as an SDP.
The singular values of \(\mathbf{A}\) are closely related to the eigenvalues of the symmetric matrix \[
\bar{\mathbf{A}} =
\begin{bmatrix}
\mathbf{0} & \mathbf{A} \\
\mathbf{A}^T & \mathbf{0}
\end{bmatrix},
\] whose eigenvalues are \(\pm\sigma_1, \ldots, \pm\sigma_{\min\{p,q\}}\) together with \(|p-q|\) zeros. Therefore minimizing the nuclear norm of \(\mathbf{A}\) is the same as minimizing the \(\ell_1\) norm of the eigenvalues of \(\bar{\mathbf{A}}\) (the two differ by a factor of two), which we know is an SDP.
Minimizing the sum of \(k\) largest singular values is an SDP as well.
operator_norm and nuclear_norm are implemented in Convex.jl.
Example 4: cone of nonnegative polynomials
Consider a nonnegative polynomial of degree \(2n\), written in terms of the monomial vector \(\mathbf{v}(t) = (1, t, \ldots, t^{2n})^T\): \[
f(t) = \mathbf{x}^T \mathbf{v}(t) = x_0 + x_1 t + \cdots + x_{2n} t^{2n} \ge 0, \text{ for all } t.
\] The cone \[
\mathbf{K}^n = \{\mathbf{x} \in \mathbb{R}^{2n+1}: f(t) = \mathbf{x}^T \mathbf{v}(t) \ge 0, \text{ for all } t \in \mathbb{R}\}
\] can be characterized by an LMI: \[
f(t) \ge 0 \text{ for all } t \Leftrightarrow x_i = \langle \mathbf{X}, \mathbf{H}_i \rangle, \quad i=0,\ldots,2n, \quad \text{for some } \mathbf{X} \in \mathbb{S}_+^{n+1},
\] where \(\mathbf{H}_i \in \mathbb{R}^{(n+1) \times (n+1)}\) are Hankel matrices with entries \((\mathbf{H}_i)_{kl} = 1\) if \(k+l=i\) and \((\mathbf{H}_i)_{kl} = 0\) otherwise. Here \(k, l \in \{0, 1, \ldots, n\}\).
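As a sanity check in the simplest case \(n = 1\) (quadratics), the characterization specializes to \[
x_0 + x_1 t + x_2 t^2 \ge 0 \text{ for all } t
\iff
\begin{bmatrix} x_0 & x_1/2 \\ x_1/2 & x_2 \end{bmatrix} \succeq \mathbf{0},
\] since the constraints \(x_0 = \langle \mathbf{X}, \mathbf{H}_0\rangle = X_{00}\), \(x_1 = \langle \mathbf{X}, \mathbf{H}_1\rangle = 2X_{01}\), and \(x_2 = \langle \mathbf{X}, \mathbf{H}_2\rangle = X_{11}\) determine \(\mathbf{X} \in \mathbb{S}_+^2\) completely in this case.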
Similarly the cone of nonnegative polynomials on a finite interval \[
\begin{eqnarray*}
\mathbf{K}_{a,b}^n = \{\mathbf{x} \in \mathbb{R}^{n+1}: f(t) = \mathbf{x}^T \mathbf{v}(t) \ge 0, \text{ for all } t \in [a,b]\}
\end{eqnarray*}
\] can be characterized by LMIs as well.
Example. Polynomial curve fitting. We want to fit a univariate polynomial of degree \(n\)\[
\begin{eqnarray*}
f(t) = x_0 + x_1 t + x_2 t^2 + \cdots + x_n t^n
\end{eqnarray*}
\] to a set of measurements \((t_i, y_i)\), \(i=1,\ldots,m\), such that \(f(t_i) \approx y_i\). Define the Vandermonde matrix \[
\begin{eqnarray*}
\mathbf{A} = \begin{pmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^n \\
1 & t_2 & t_2^2 & \cdots & t_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^n
\end{pmatrix},
\end{eqnarray*}
\]
Then we wish \(\mathbf{A} \mathbf{x} \approx \mathbf{y}\). Using the least squares criterion, we obtain the solution \(\mathbf{x}_{\text{LS}} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{y}\) (assuming \(\mathbf{A}\) has full column rank). With various shape constraints on \(f\), the optimal \(\mathbf{x}\) can be found by solving an SDP.
Nonnegativity. To ensure \(f(t) \ge 0\) on \([a,b]\), we require \(\mathbf{x} \in \mathbf{K}_{a,b}^n\).
Monotonicity. We can ensure that \(f(t)\) is monotone on \([a,b]\) by requiring \(f'(t) \ge 0\) or \(f'(t) \le 0\) on \([a,b]\), that is, \((x_1, 2x_2, \ldots, nx_n) \in \mathbf{K}_{a,b}^{n-1}\) or \(-(x_1, 2x_2, \ldots, nx_n) \in \mathbf{K}_{a,b}^{n-1}\).
Convexity or concavity. Convexity or concavity of \(f(t)\) on \([a,b]\) corresponds to \(f''(t) \ge 0\) or \(f''(t) \le 0\), that is, \((2x_2, 6x_3, \ldots, n(n-1)x_n) \in \mathbf{K}_{a,b}^{n-2}\) or \(-(2x_2, 6x_3, \ldots, n(n-1)x_n) \in \mathbf{K}_{a,b}^{n-2}\).
nonneg_poly_coeffs() and convex_poly_coeffs() are implemented in cvx, but not in Convex.jl yet.
Statistical Applications
Kim, S.-J., Lim, J. and Won, J. H., 2018. Nonparametric Sharpe Ratio Function Estimation in Heteroscedastic Regression Models via Convex Optimization. PMLR 84:1495-1504. http://proceedings.mlr.press/v84/kim18b.html
Papp, D. and F. Alizadeh (2014). Shape-constrained estimation using nonnegative splines. Journal of Computational and Graphical Statistics 23(1), 211– 231. https://doi.org/10.1080/10618600.2012.707343
SDP relaxation of binary LP
Consider a binary linear programming problem \[
\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b}, \quad \mathbf{x} \in \{0,1\}^n.
\end{eqnarray*}
\] Note \[
\begin{eqnarray*}
& & x_i \in \{0,1\},~\forall i
\Leftrightarrow x_i^2 = x_i ,~\forall i
\Leftrightarrow \mathbf{X} = \mathbf{x} \mathbf{x}^T, \text{diag}(\mathbf{X}) = \mathbf{x}.
\end{eqnarray*}
\] By relaxing the rank-1 constraint \(\mathbf{X} = \mathbf{x}\mathbf{x}^T\) to \(\mathbf{X} \succeq \mathbf{x}\mathbf{x}^T\), we obtain an SDP relaxation \[
\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b}, \text{diag}(\mathbf{X}) = \mathbf{x}, \mathbf{X} \succeq \mathbf{x} \mathbf{x}^T.
\end{eqnarray*}
\]
Note \(\mathbf{X} \succeq \mathbf{x} \mathbf{x}^T\) is equivalent to the LMI \[
\begin{eqnarray*}
\begin{bmatrix}
1 & \mathbf{x}^T \\ \mathbf{x} &\mathbf{X}
\end{bmatrix} \succeq \mathbf{0}.
\end{eqnarray*}
\]
This SDP can be solved efficiently and provides a lower bound on the optimal value of the original problem (why?). If the optimal \(\mathbf{X}\) has rank 1, then the corresponding \(\mathbf{x}\) solves the original binary LP.
We can tighten the relaxation by adding other constraints that cut away part of the feasible set, without excluding rank 1 solutions. For instance, \(0 \le x_i \le 1\) and \(0 \le X_{ij} \le 1\).
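A minimal Convex.jl sketch of the relaxation on a tiny made-up instance, using the LMI form of \(\mathbf{X} \succeq \mathbf{x}\mathbf{x}^T\) (the block-matrix and `⪰` syntax are assumptions that may vary by version):

```julia
using Convex, SCS, LinearAlgebra

# made-up instance: minimize c'x over x ∈ {0,1}^3 with A x = b
c = [1.0, 2.0, 3.0]
A = [1.0 1.0 0.0; 0.0 1.0 1.0]
b = [1.0, 1.0]
n = length(c)

x = Variable(n)
X = Semidefinite(n)               # symmetric; X ⪰ 0 is implied by the LMI below
problem = minimize(c' * x,
                   [A * x == b,
                    diag(X) == x,
                    ([1 x'; x X]) ⪰ 0])   # ⟺ X ⪰ x x'
# optional tightening: append constraints such as x >= 0, x <= 1, X >= 0, X <= 1
solve!(problem, SCS.Optimizer)
problem.optval                    # lower bound on the binary LP optimum
```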