Consider a linear map from \(\mathbb{R}^n\) to \(\mathbb{S}^d\)\[
\mathcal{A}: \mathbf{x} \mapsto x_1 \mathbf{F}_1 + \cdots + x_n \mathbf{F}_n.
\] A linear matrix inequality (LMI) refers to \(\mathcal{A}\mathbf{x} + \mathbf{G} \preceq \mathbf{0}\) or \(\mathcal{A}\mathbf{x} + \mathbf{G} \succeq \mathbf{0}\). Here, \(\mathbf{G}, \mathbf{F}_1, \ldots, \mathbf{F}_n \in \mathbb{S}^d\).
A semidefinite program (SDP) has the form \[\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& \mathcal{A}\mathbf{x} + \mathbf{G} \preceq \mathbf{0}\\
& & \mathbf{A} \mathbf{x} = \mathbf{b},
\end{eqnarray*}\] where \(\mathbf{A} \in \mathbb{R}^{p \times n}\), and \(\mathbf{b} \in \mathbb{R}^p\). The optimization variable is \(\mathbf{x}\).
When \(\mathbf{G}, \mathbf{F}_1, \ldots, \mathbf{F}_n\) are all diagonal, SDP reduces to LP. (Why?)
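One way to see this reduction: if \(\mathbf{F}_i = \text{diag}(\mathbf{f}_i)\) and \(\mathbf{G} = \text{diag}(\mathbf{g})\), then \(\mathcal{A}\mathbf{x} + \mathbf{G}\) is diagonal, and a diagonal matrix is negative semidefinite exactly when all its diagonal entries are nonpositive, so the LMI collapses to \(d\) scalar linear inequalities \[
x_1 f_{1j} + \cdots + x_n f_{nj} + g_j \le 0, \quad j = 1, \ldots, d,
\] where \(f_{ij}\) and \(g_j\) denote the \(j\)th diagonal entries of \(\mathbf{F}_i\) and \(\mathbf{G}\). This is exactly the constraint set of an LP.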
The standard form SDP has the form \[\begin{eqnarray*}
&\text{minimize}& \text{tr}(\mathbf{C} \mathbf{X}) \\
&\text{subject to}& \text{tr}(\mathbf{A}_i \mathbf{X}) = b_i, \quad i = 1, \ldots, p \\
& & \mathbf{X} \succeq \mathbf{0},
\end{eqnarray*}\] where \(\mathbf{C}, \mathbf{A}_1, \ldots, \mathbf{A}_p \in \mathbb{S}^d\). Note both the objective and the equality constraint are linear in the optimization variable \(\mathbf{X}\in\mathbb{S}^d\): \(\text{tr}(\mathbf{A}\mathbf{X})=\langle \mathbf{A}, \mathbf{X} \rangle\). (Why?)
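As a concrete illustration, here is a minimal sketch of a standard form SDP in Julia using Convex.jl with the SCS solver; the data \(\mathbf{C}\), \(\mathbf{A}_i\), \(b_i\) below are made up, and the exact `solve!` syntax may differ across package versions.

```julia
using Convex, SCS, LinearAlgebra

# made-up data: d = 3, p = 2
C  = [2.0 1.0 0.0; 1.0 2.0 1.0; 0.0 1.0 2.0]
A1 = Matrix(1.0I, 3, 3)                       # tr(A1 * X) = tr(X)
A2 = [1.0 0.0 1.0; 0.0 0.0 0.0; 1.0 0.0 1.0]
b  = [1.0, 0.5]

X = Semidefinite(3)                           # X ⪰ 0, X ∈ S^3
problem = minimize(tr(C * X),
                   [tr(A1 * X) == b[1],
                    tr(A2 * X) == b[2]])
solve!(problem, SCS.Optimizer)                # solver call may vary by Convex.jl version
problem.optval, evaluate(X)
```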
An inequality form SDP has the form \[\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& x_1 \mathbf{A}_1 + \cdots + x_n \mathbf{A}_n \preceq \mathbf{B},
\end{eqnarray*}\] with variable \(\mathbf{x} \in \mathbb{R}^n\), and parameters \(\mathbf{B}, \mathbf{A}_1, \ldots, \mathbf{A}_n \in \mathbb{S}^d\), \(\mathbf{c} \in \mathbb{R}^n\).
Semidefinite Representability
A convex set \(S \subseteq \mathbb{R}^n\) is said to be semidefinite representable (SDr) if \(S\) is the projection of a set in a higher-dimensional space that can be described by a set of LMIs. Specifically, \[
\mathbf{x}\in S \iff \exists\mathbf{u}\in \mathbb{R}^m: \mathcal{A}\begin{pmatrix}\mathbf{x}\\\mathbf{u}\end{pmatrix} - \mathbf{B} \succeq \mathbf{0},
\] where \(\mathcal{A}: \mathbb{R}^n\times \mathbb{R}^m \to \mathbb{S}^d\) is a linear map and \(\mathbf{B} \in \mathbb{S}^d\).
A convex function \(f: \mathbb{R}^n \to \mathbb{R}\) is said to be SDr if and only if \(\text{epi}f\) is SDr.
Any SOCr set is SDr, because \[
\|\mathbf{x}\|_2 \le t
\iff
\mathbf{x}^T\mathbf{x} \le t^2, ~ t \ge 0
\iff
\begin{bmatrix} t\mathbf{I} & \mathbf{x} \\ \mathbf{x}^T & t \end{bmatrix} \succeq \mathbf{0}
\] The last expression is an LMI in \((\mathbf{x}, t)\); the second equivalence follows from the Schur complement lemma.
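A sketch of the Schur complement step: for \(t > 0\), \[
\begin{bmatrix} t\mathbf{I} & \mathbf{x} \\ \mathbf{x}^T & t \end{bmatrix} \succeq \mathbf{0}
\iff
t - \mathbf{x}^T (t\mathbf{I})^{-1} \mathbf{x} \ge 0
\iff
t^2 \ge \mathbf{x}^T\mathbf{x},
\] while \(t = 0\) forces \(\mathbf{x} = \mathbf{0}\) in both the LMI and the SOC constraint.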
Consequence: any SOCP is SDP.
Exercise. Write LP, QP, QCQP, and SOCP in form of SDP.
Example 1: nearest correlation matrix
Let \(\mathbb{C}^p\) be the convex set of \(p \times p\) correlation matrices \[
\mathbb{C}^p = \{ \mathbf{X} \in \mathbb{S}_+^p: x_{ii}=1, i=1,\ldots,p\}.
\] Given \(\mathbf{A} \in \mathbb{S}^p\), often we need to find the closest correlation matrix to \(\mathbf{A}\)\[
\begin{array}{ll}
\text{minimize}& \|\mathbf{A} - \mathbf{X}\|_{\text{F}} \\
\text{subject to}& \mathbf{X} \in \mathbb{C}^p.
\end{array}
\]
This projection problem can be solved via an SDP \[
\begin{array}{ll}
\text{minimize}& t \\
\text{subject to}& \|\mathbf{A} - \mathbf{X}\|_{\text{F}} \le t \\
& \text{diag}(\mathbf{X}) = \mathbf{1} \\
& \mathbf{X} \succeq \mathbf{0}
\end{array}
\] in variables \(\mathbf{X} \in \mathbb{S}^{p}\) and \(t \in \mathbb{R}\).
The SOC constraint \(\|\mathbf{A} - \mathbf{X}\|_{\text{F}} \le t\) can be written as an LMI \[
\begin{bmatrix}
t \mathbf{I} & \text{vec} (\mathbf{A} - \mathbf{X}) \\
\text{vec} (\mathbf{A} - \mathbf{X})^T & t
\end{bmatrix} \succeq \mathbf{0}
\] (why?).
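A minimal Convex.jl sketch of the projection problem above (the input \(\mathbf{A}\) is a made-up symmetric matrix that is not positive semidefinite; `norm(vec(A - X))` encodes the Frobenius norm, and solver/syntax details are version-dependent assumptions):

```julia
using Convex, SCS, LinearAlgebra

A = [1.0  0.9  0.6;
     0.9  1.0  0.9;
     0.6  0.9  1.0]            # made-up symmetric input (indefinite, so not a correlation matrix)
p = size(A, 1)

X = Semidefinite(p)            # X ⪰ 0
problem = minimize(norm(vec(A - X)),   # Frobenius norm ‖A - X‖_F
                   diag(X) == 1)       # unit diagonal
solve!(problem, SCS.Optimizer)
evaluate(X)                    # nearest correlation matrix to A
```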
Example 2: eigenvalue problems
Suppose \[
\mathbf{A}(\mathbf{x}) = \mathbf{A}_0 + x_1 \mathbf{A}_1 + \cdots + x_n \mathbf{A}_n,
\] where \(\mathbf{A}_i \in \mathbb{S}^m\), \(i = 0, \ldots, n\). Let \(\lambda_1(\mathbf{x}) \ge \lambda_2(\mathbf{x}) \ge \cdots \ge \lambda_m(\mathbf{x})\) be the ordered eigenvalues of \(\mathbf{A}(\mathbf{x})\).
Minimizing the largest eigenvalue is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \mathbf{A}(\mathbf{x}) \preceq t \mathbf{I}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t \in \mathbb{R}\).
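A minimal sketch of this eigenvalue SDP in Convex.jl, writing \(\mathbf{A}(\mathbf{x}) \preceq t\mathbf{I}\) directly as a PSD constraint (the data are made up; the `⪰` constraint syntax and the `lambda_max`-type atoms mentioned below are assumptions that may vary by package version):

```julia
using Convex, SCS, LinearAlgebra

# made-up symmetric data A0, A1, A2
A0 = [1.0 0.2; 0.2 3.0]
A1 = [0.0 1.0; 1.0 0.0]
A2 = [1.0 0.0; 0.0 -1.0]
Id = Matrix(1.0I, 2, 2)

x = Variable(2)
t = Variable()
Ax = A0 + x[1] * A1 + x[2] * A2
problem = minimize(t, (t * Id - Ax) ⪰ 0)    # A(x) ⪯ t I
solve!(problem, SCS.Optimizer)
problem.optval                              # minimized largest eigenvalue
```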
Minimizing the sum of \(k\) largest eigenvalues is an SDP too. How about minimizing the sum of all eigenvalues?
Maximizing the minimum eigenvalue is an SDP as well.
Minimizing the spread of the eigenvalues \(\lambda_1(\mathbf{x}) - \lambda_m(\mathbf{x})\) is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& t_1 - t_m \\
&\text{subject to}& t_m \mathbf{I} \preceq \mathbf{A}(\mathbf{x}) \preceq t_1 \mathbf{I}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t_1, t_m \in \mathbb{R}\).
Minimizing the spectral radius (or spectral norm) \(\rho(\mathbf{x}) = \max_{i=1,\ldots,m} |\lambda_i(\mathbf{x})|\) is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& - t \mathbf{I} \preceq \mathbf{A}(\mathbf{x}) \preceq t \mathbf{I}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t \in \mathbb{R}\).
To minimize the condition number \(\kappa(\mathbf{x}) = \lambda_1(\mathbf{x}) / \lambda_m(\mathbf{x})\) (assuming \(\mathbf{A}(\mathbf{x}) \succ \mathbf{0}\)), note that \(\lambda_1(\mathbf{x}) / \lambda_m(\mathbf{x}) \le t\) if and only if there exists a \(\mu > 0\) such that \(\mu \mathbf{I} \preceq \mathbf{A}(\mathbf{x}) \preceq \mu t \mathbf{I}\), or equivalently, \(\mathbf{I} \preceq \mu^{-1} \mathbf{A}(\mathbf{x}) \preceq t \mathbf{I}\). With the change of variables \(y_i = x_i / \mu\) and \(s = 1/\mu\), we can solve the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \mathbf{I} \preceq s \mathbf{A}_0 + y_1 \mathbf{A}_1 + \cdots + y_n \mathbf{A}_n \preceq t \mathbf{I} \\
& & s \ge 0,
\end{eqnarray*}\] in variables \(\mathbf{y} \in \mathbb{R}^n\) and \(s, t \ge 0\). In other words, we normalize the spectrum by the smallest eigenvalue and then minimize the largest eigenvalue of the normalized LMI.
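A minimal Convex.jl sketch of this condition number SDP (a single made-up coefficient matrix \(\mathbf{A}_1\) is used; recall \(s = 1/\mu\) and \(\mathbf{y} = \mathbf{x}/\mu\); syntax details are version-dependent assumptions):

```julia
using Convex, SCS, LinearAlgebra

# made-up symmetric data with A0 positive definite
A0 = [2.0 0.3; 0.3 1.0]
A1 = [0.0 1.0; 1.0 0.0]
Id = Matrix(1.0I, 2, 2)

y = Variable()           # y = x / mu
s = Variable()           # s = 1 / mu
t = Variable()
B = s * A0 + y * A1      # = A(x) / mu
problem = minimize(t, [(B - Id) ⪰ 0,       # I ⪯ B
                       (t * Id - B) ⪰ 0,   # B ⪯ t I
                       s >= 0])
solve!(problem, SCS.Optimizer)
problem.optval           # minimized condition number of A(x)
```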
Minimizing the \(\ell_1\) norm of the eigenvalues \(|\lambda_1(\mathbf{x})| + \cdots + |\lambda_m(\mathbf{x})|\) is equivalent to the SDP \[\begin{eqnarray*}
&\text{minimize}& \text{tr} (\mathbf{A}^+) + \text{tr}(\mathbf{A}^-) \\
&\text{subject to}& \mathbf{A}(\mathbf{x}) = \mathbf{A}^+ - \mathbf{A}^- \\
& & \mathbf{A}^+ \succeq \mathbf{0}, \mathbf{A}^- \succeq \mathbf{0},
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(\mathbf{A}^+, \mathbf{A}^- \in \mathbb{S}_+^m\).
Roots of determinant. The determinant of a positive semidefinite matrix \(\text{det} (\mathbf{A}(\mathbf{x})) = \prod_{i=1}^m \lambda_i (\mathbf{x})\) is neither convex nor concave, but rational powers of the determinant can be modeled using linear matrix inequalities. For a rational power \(0 \le q \le 1/m\), the function \(\text{det} (\mathbf{A}(\mathbf{x}))^q\) is concave and we have \[\begin{eqnarray*}
& & t \le \text{det} (\mathbf{A}(\mathbf{x}))^q\\
&\Leftrightarrow& \begin{bmatrix}
\mathbf{A}(\mathbf{x}) & \mathbf{L} \\
\mathbf{L}^T & \text{diag}(\mathbf{L})
\end{bmatrix} \succeq \mathbf{0}, \quad (\ell_{11} \ell_{22} \cdots \ell_{mm})^q \ge t,
\end{eqnarray*}\] where \(\mathbf{L} \in \mathbb{R}^{m \times m}\) is a lower-triangular matrix. The first inequality is an LMI, and the second is SOCr.
Similarly for any rational \(q>0\), we have \[\begin{eqnarray*}
& & t \ge \text{det} (\mathbf{A}(\mathbf{x}))^{-q} \\
&\Leftrightarrow& \begin{bmatrix}
\mathbf{A}(\mathbf{x}) & \mathbf{L} \\
\mathbf{L}^T & \text{diag}(\mathbf{L})
\end{bmatrix} \succeq \mathbf{0}, \quad (\ell_{11} \ell_{22} \cdots \ell_{mm})^{-q} \le t
\end{eqnarray*}\] for a lower triangular \(\mathbf{L}\).
lambda_max and lambda_min are implemented in the Convex.jl package for Julia.
Statistical applications of eigenvalue problems
Covariance matrix estimation
Suppose \(\mathbf{x}_1, \dotsc, \mathbf{x}_n \in \mathbb{R}^p\) are i.i.d. \(\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})\) and we want to estimate the covariance matrix \(\boldsymbol{\Sigma}\) by maximum likelihood. If \(n \ge p\), then the sample covariance matrix \(\mathbf{S}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T\) is the MLE. However, if \(n < p\) (high-dimensional setting), the MLE is not defined (why?).
A possibility is to regularize the MLE, e.g., by restricting the condition number of \(\boldsymbol\Sigma\) to be at most \(\kappa\). This approach leads to an SDP \[
\begin{array}{ll}
\text{maximize} & \log\text{det}\boldsymbol{\Sigma}^{-1} - \text{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}) \\
\text{subject to} & \mu\mathbf{I} \preceq \boldsymbol{\Sigma}^{-1} \preceq \kappa\mu\mathbf{I}
\end{array}
\] where the optimization variables are \(\boldsymbol{\Sigma}^{-1}\in\mathbb{S}^p\) and \(\mu\in\mathbb{R}_+\). See
Won, J. H., Lim, J., Kim, S. J. and Rajaratnam, B., 2013. Condition-number-regularized covariance estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), pp. 427-450.
An alternative is ridge-type regularization, which yields \(\mathbf{S} + \kappa\mathbf{I}\) as a solution. See
Warton, D. I., 2008. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481), pp. 340-349. https://doi.org/10.1198/016214508000000021
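A minimal Convex.jl sketch of the condition-number-regularized estimation problem above (random made-up data with \(n < p\); the `logdet` atom and solver support for the required cones are assumptions, and syntax may vary by version):

```julia
using Convex, SCS, LinearAlgebra

# made-up data: n = 5 samples in p = 10 dimensions, so S is singular
p, n = 10, 5
Y = randn(n, p)
S = Y' * Y / n                    # sample covariance matrix
kappa = 10.0                      # condition number bound
Id = Matrix(1.0I, p, p)

Omega = Semidefinite(p)           # Omega plays the role of Sigma^{-1}
mu = Variable()
problem = maximize(logdet(Omega) - tr(Omega * S),
                   [(Omega - mu * Id) ⪰ 0,          # mu I ⪯ Omega
                    (kappa * mu * Id - Omega) ⪰ 0,  # Omega ⪯ kappa mu I
                    mu >= 0])
solve!(problem, SCS.Optimizer)
Sigma_hat = inv(evaluate(Omega))  # condition-number-regularized estimate
```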
Optimal experiment design
Consider a linear model \[
y_i = \mathbf{x}_i^T\boldsymbol{\beta} + \varepsilon_i, \quad i=1,\dotsc,n,
\] where \(\varepsilon_i\) are independent Gaussian errors with common variance \(\sigma^2\). It is well known that the least squares estimate \(\hat{\boldsymbol{\beta}}\) is unbiased and has covariance \(\sigma^2(\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T)^{-1}\).
In experimental design, given a total number of allowable experiments, we want to decide how many experiments to run at each of \(m\) candidate design points \(\{\mathbf{x}_1,\dotsc,\mathbf{x}_m\}\) so that the covariance matrix of \(\hat{\boldsymbol{\beta}}\) is minimized in some sense. This exact design problem is a combinatorial optimization problem, which is hard to solve.
Instead, a weight \(w_i\) is assigned to each design point \(\mathbf{x}_i\), and we seek a probability vector \(\mathbf{w}\in\Delta_{m-1}\) such that the matrix \(\mathbf{V} = (\sum_{i=1}^m w_i\mathbf{x}_i\mathbf{x}_i^T)^{-1}\) is “small”. For example, the E-optimal design maximizes the smallest eigenvalue of \(\sum_{i=1}^m w_i\mathbf{x}_i\mathbf{x}_i^T\) (equivalently, minimizes the largest eigenvalue of \(\mathbf{V}\)), which is an SDP by the eigenvalue results above; see the sketch below.
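A minimal Convex.jl sketch of the relaxed E-optimal design problem with made-up candidate points (the `⪰` constraint syntax and solver call are version-dependent assumptions):

```julia
using Convex, SCS, LinearAlgebra

# made-up candidate design points x_1, ..., x_m in R^2 (one per row)
pts = [1.0 0.0; 0.0 1.0; 1.0 1.0; 1.0 -1.0]
m, d = size(pts)
Id = Matrix(1.0I, d, d)

w = Variable(m)                   # design weights
t = Variable()
M = sum(w[i] * (pts[i, :] * pts[i, :]') for i in 1:m)   # information matrix
problem = maximize(t, [(M - t * Id) ⪰ 0,   # t ≤ λ_min(M): E-optimality
                       w >= 0, sum(w) == 1])
solve!(problem, SCS.Optimizer)
evaluate(w)                       # optimal design weights
```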
Example 3: matrix norm minimization
Let \(\mathbf{A}(\mathbf{x}) = \mathbf{A}_0 + x_1 \mathbf{A}_1 + \cdots + x_n \mathbf{A}_n\), where \(\mathbf{A}_i \in \mathbb{R}^{p \times q}\), \(i = 0, \ldots, n\), and let \(\sigma_1(\mathbf{x}) \ge \cdots \ge \sigma_{\min\{p,q\}}(\mathbf{x}) \ge 0\) be the ordered singular values of \(\mathbf{A}(\mathbf{x})\).
Spectral norm (or operator norm or matrix-2 norm) minimization. Consider minimizing the spectral norm \(\|\mathbf{A}(\mathbf{x})\|_2 = \sigma_1(\mathbf{x})\). Note \(\|\mathbf{A}\|_2 \le t\) if and only if \(\mathbf{A}^T \mathbf{A} \preceq t^2 \mathbf{I}\) (and \(t \ge 0\)) if and only if \(\begin{bmatrix} t\mathbf{I} & \mathbf{A} \\ \mathbf{A}^T & t \mathbf{I} \end{bmatrix} \succeq \mathbf{0}\). This results in the SDP \[\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \begin{bmatrix} t\mathbf{I} & \mathbf{A}(\mathbf{x}) \\ \mathbf{A}(\mathbf{x})^T & t \mathbf{I} \end{bmatrix} \succeq \mathbf{0}
\end{eqnarray*}\] in variables \(\mathbf{x} \in \mathbb{R}^n\) and \(t \in \mathbb{R}\).
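A minimal Convex.jl sketch of this spectral norm SDP, building the block LMI explicitly (a single made-up nonsquare coefficient matrix is used; the block-matrix and `⪰` syntax are assumptions that may vary by version):

```julia
using Convex, SCS, LinearAlgebra

# made-up 2 x 3 data A0, A1
A0 = [1.0 0.0 2.0; 0.0 1.0 0.0]
A1 = [0.0 1.0 0.0; 1.0 0.0 1.0]
Ip = Matrix(1.0I, 2, 2)
Iq = Matrix(1.0I, 3, 3)

x = Variable()
t = Variable()
Ax = A0 + x * A1
problem = minimize(t, [(t * Ip) Ax; Ax' (t * Iq)] ⪰ 0)
solve!(problem, SCS.Optimizer)
problem.optval            # minimized spectral norm ‖A(x)‖_2
```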
Nuclear norm minimization. Minimization of the nuclear norm (or trace norm) \(\|\mathbf{A}(\mathbf{x})\|_* = \sum_i \sigma_i(\mathbf{x})\) can be formulated as an SDP.
The singular values of \(\mathbf{A}\) are closely related to the eigenvalues of the symmetric matrix \[
\bar{\mathbf{A}} =
\begin{bmatrix}
\mathbf{0} & \mathbf{A} \\
\mathbf{A}^T & \mathbf{0}
\end{bmatrix},
\] whose eigenvalues are \(\pm\sigma_1, \ldots, \pm\sigma_{\min\{p,q\}}\) together with \(|p-q|\) zeros. Therefore minimizing the nuclear norm of \(\mathbf{A}\) is the same as minimizing the \(\ell_1\) norm of the eigenvalues of \(\bar{\mathbf{A}}\) (the two differ by a factor of two), which we know is an SDP.
Minimizing the sum of \(k\) largest singular values is an SDP as well.
operator_norm and nuclear_norm are implemented in Convex.jl.
Example 4: cone of nonnegative polynomials
Consider a nonnegative polynomial of degree \(2n\), written in terms of the monomial vector \(\mathbf{v}(t) = (1, t, \ldots, t^{2n})^T\): \[
f(t) = \mathbf{x}^T \mathbf{v}(t) = x_0 + x_1 t + \cdots + x_{2n} t^{2n} \ge 0, \text{ for all } t.
\] The cone \[
\mathbf{K}^n = \{\mathbf{x} \in \mathbb{R}^{2n+1}: f(t) = \mathbf{x}^T \mathbf{v}(t) \ge 0, \text{ for all } t \in \mathbb{R}\}
\] can be characterized by an LMI: \[
f(t) \ge 0 \text{ for all } t \Leftrightarrow x_i = \langle \mathbf{X}, \mathbf{H}_i \rangle, \quad i=0,\ldots,2n, \quad \text{for some } \mathbf{X} \in \mathbb{S}_+^{n+1},
\] where \(\mathbf{H}_i \in \mathbb{R}^{(n+1) \times (n+1)}\) are Hankel matrices with entries \((\mathbf{H}_i)_{kl} = 1\) if \(k+l=i\) and \((\mathbf{H}_i)_{kl} = 0\) otherwise. Here \(k, l \in \{0, 1, \ldots, n\}\).
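As a sanity check in the simplest case \(n = 1\) (quadratics), the characterization specializes to \[
x_0 + x_1 t + x_2 t^2 \ge 0 \text{ for all } t
\iff
\begin{bmatrix} x_0 & x_1/2 \\ x_1/2 & x_2 \end{bmatrix} \succeq \mathbf{0},
\] since the constraints \(x_0 = \langle \mathbf{X}, \mathbf{H}_0\rangle = X_{00}\), \(x_1 = \langle \mathbf{X}, \mathbf{H}_1\rangle = 2X_{01}\), and \(x_2 = \langle \mathbf{X}, \mathbf{H}_2\rangle = X_{11}\) determine \(\mathbf{X} \in \mathbb{S}_+^2\) completely in this case.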
Similarly the cone of nonnegative polynomials on a finite interval \[
\begin{eqnarray*}
\mathbf{K}_{a,b}^n = \{\mathbf{x} \in \mathbb{R}^{n+1}: f(t) = \mathbf{x}^T \mathbf{v}(t) \ge 0, \text{ for all } t \in [a,b]\}
\end{eqnarray*}
\] can be characterized by LMIs as well.
Example. Polynomial curve fitting. We want to fit a univariate polynomial of degree \(n\)\[
\begin{eqnarray*}
f(t) = x_0 + x_1 t + x_2 t^2 + \cdots + x_n t^n
\end{eqnarray*}
\] to a set of measurements \((t_i, y_i)\), \(i=1,\ldots,m\), such that \(f(t_i) \approx y_i\). Define the Vandermonde matrix \[
\begin{eqnarray*}
\mathbf{A} = \begin{pmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^n \\
1 & t_2 & t_2^2 & \cdots & t_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^n
\end{pmatrix},
\end{eqnarray*}
\]
Then we wish \(\mathbf{A} \mathbf{x} \approx \mathbf{y}\). Using the least squares criterion, we obtain the solution \(\mathbf{x}_{\text{LS}} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{y}\) (assuming \(\mathbf{A}\) has full column rank). With various shape constraints on \(f\), the optimal \(\mathbf{x}\) can be found by solving an SDP.
Nonnegativity. To ensure \(f(t) \ge 0\) on \([a,b]\), we require \(\mathbf{x} \in \mathbf{K}_{a,b}^n\).
Monotonicity. We can ensure that \(f(t)\) is monotone on \([a,b]\) by requiring \(f'(t) \ge 0\) or \(f'(t) \le 0\) on \([a,b]\), that is, \((x_1, 2x_2, \ldots, nx_n) \in \mathbf{K}_{a,b}^{n-1}\) or \(-(x_1, 2x_2, \ldots, nx_n) \in \mathbf{K}_{a,b}^{n-1}\).
Convexity or concavity. Convexity or concavity of \(f(t)\) on \([a,b]\) corresponds to \(f''(t) \ge 0\) or \(f''(t) \le 0\), that is, \((2x_2, 6x_3, \ldots, n(n-1)x_n) \in \mathbf{K}_{a,b}^{n-2}\) or \(-(2x_2, 6x_3, \ldots, n(n-1)x_n) \in \mathbf{K}_{a,b}^{n-2}\).
nonneg_poly_coeffs() and convex_poly_coeffs() are implemented in cvx, but not in Convex.jl yet.
Statistical Applications
Kim, S.-J., Lim, J. and Won, J. H., 2018. Nonparametric Sharpe Ratio Function Estimation in Heteroscedastic Regression Models via Convex Optimization. PMLR 84:1495-1504. http://proceedings.mlr.press/v84/kim18b.html
Papp, D. and F. Alizadeh (2014). Shape-constrained estimation using nonnegative splines. Journal of Computational and Graphical Statistics 23(1), 211– 231. https://doi.org/10.1080/10618600.2012.707343
SDP relaxation of binary LP
Consider a binary linear programming problem \[
\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b}, \quad \mathbf{x} \in \{0,1\}^n.
\end{eqnarray*}
\] Note \[
\begin{eqnarray*}
& & x_i \in \{0,1\},~\forall i
\Leftrightarrow x_i^2 = x_i ,~\forall i
\Leftrightarrow \mathbf{X} = \mathbf{x} \mathbf{x}^T, \text{diag}(\mathbf{X}) = \mathbf{x}.
\end{eqnarray*}
\] By relaxing the rank-1 constraint \(\mathbf{X} = \mathbf{x}\mathbf{x}^T\) to \(\mathbf{X} \succeq \mathbf{x}\mathbf{x}^T\), we obtain an SDP relaxation \[
\begin{eqnarray*}
&\text{minimize}& \mathbf{c}^T \mathbf{x} \\
&\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{b}, \text{diag}(\mathbf{X}) = \mathbf{x}, \mathbf{X} \succeq \mathbf{x} \mathbf{x}^T.
\end{eqnarray*}
\]
Note \(\mathbf{X} \succeq \mathbf{x} \mathbf{x}^T\) is equivalent to the LMI \[
\begin{eqnarray*}
\begin{bmatrix}
1 & \mathbf{x}^T \\ \mathbf{x} &\mathbf{X}
\end{bmatrix} \succeq \mathbf{0}.
\end{eqnarray*}
\]
This SDP can be solved efficiently and provides a lower bound on the optimal value of the original problem (why?). If the optimal \(\mathbf{X}\) has rank 1, then the corresponding \(\mathbf{x}\) solves the original binary LP.
We can tighten the relaxation by adding other constraints that cut away part of the feasible set, without excluding rank 1 solutions. For instance, \(0 \le x_i \le 1\) and \(0 \le X_{ij} \le 1\).
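A minimal Convex.jl sketch of the relaxation on a tiny made-up instance, using the LMI form of \(\mathbf{X} \succeq \mathbf{x}\mathbf{x}^T\) (the block-matrix and `⪰` syntax are assumptions that may vary by version):

```julia
using Convex, SCS, LinearAlgebra

# made-up instance: minimize c'x over x ∈ {0,1}^3 with A x = b
c = [1.0, 2.0, 3.0]
A = [1.0 1.0 0.0; 0.0 1.0 1.0]
b = [1.0, 1.0]
n = length(c)

x = Variable(n)
X = Semidefinite(n)               # symmetric; X ⪰ 0 is implied by the LMI below
problem = minimize(c' * x,
                   [A * x == b,
                    diag(X) == x,
                    ([1 x'; x X]) ⪰ 0])   # ⟺ X ⪰ x x'
# optional tightening: append constraints such as x >= 0, x <= 1, X >= 0, X <= 1
solve!(problem, SCS.Optimizer)
problem.optval                    # lower bound on the binary LP optimum
```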