Advanced Statistical Computing
Seoul National University
November 2023
Fact. Any local solution of a convex optimization problem is a global solution.
The underlying space is a vector space \(V\) unless otherwise stated.
Line segment: for given \(\mathbf{x}\) and \(\mathbf{y}\), the set \(\{\alpha \mathbf{x} + (1-\alpha)\mathbf{y}: 0 \le \alpha \le 1\}\).
A set \(C\) is convex if for every pair of points \(\mathbf{x}\) and \(\mathbf{y}\) lying in \(C\) the entire line segment connecting them also lies in \(C\).
Singleton: \(\{\mathbf{a}\}\).
Euclidean space \(\mathbb{R}^d\).
Norm ball: \(B_r(\mathbf{c})=\{\mathbf{x}: \|\mathbf{x}-\mathbf{c}\| \le r\}\) for any proper norm \(\|\cdot\|\).
\(\ell_p\) “norm” \(\|\mathbf{x}\|_p = (\sum_{i=1}^d |x_i|^p)^{1/p}\), \(0<p<1\), in \(\mathbb{R}^d\) is not a proper norm: it violates the triangle inequality, so its unit ball is not convex (see the numerical check after these examples).
Hyperplane: \(\{\mathbf{x}: \langle \mathbf{a}, \mathbf{x} \rangle = c \}\).
Halfspace: \(\{\mathbf{x}: \langle \mathbf{a}, \mathbf{x} \rangle \le c \}\) or \(\{\mathbf{x}: \langle \mathbf{a}, \mathbf{x} \rangle < c \}\).
Polyhedron: \(\{\mathbf{x}: \langle \mathbf{a}_j, \mathbf{x} \rangle \le b_j,~j=1,\dots,m\} = \{\mathbf{x}: \mathbf{A}\mathbf{x} \le \mathbf{b}\}\).
Positive semidefinite cone \(\mathbb{S}^d_{+}=\{\mathbf{X}\in\mathbb{R}^{d\times d}: \mathbf{X} = \mathbf{X}^T, ~\mathbf{X} \succeq \mathbf{0} \}\) and set of positive definite matrices \(\mathbb{S}^d_{++}=\{\mathbf{X}\in\mathbb{R}^{d\times d}: \mathbf{X}=\mathbf{X}^T, ~\mathbf{X} \succ \mathbf{0} \}\). (What is \(V\)?)
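Returning to the \(\ell_p\) case with \(0<p<1\): a quick numerical illustration (a minimal sketch; the chosen vectors are arbitrary) that the triangle inequality fails and that, equivalently, the "unit ball" is not convex.

```python
import numpy as np

def lp(x, p):
    """(sum_i |x_i|^p)^(1/p); a proper norm only when p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p = 0.5
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Triangle inequality fails: ||e1 + e2||_{1/2} = 4 > 2 = ||e1||_{1/2} + ||e2||_{1/2}.
print(lp(e1 + e2, p), lp(e1, p) + lp(e2, p))

# Equivalently, the "unit ball" is not convex: e1 and e2 lie in it,
# but their midpoint has quasi-norm 2 > 1.
print(lp((e1 + e2) / 2, p))
```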
Translation: \(C + \mathbf{a} = \{\mathbf{x} + \mathbf{a}: \mathbf{x}\in C\}\) if \(C\) is convex.
Minkowski sum: \(C + D=\{\mathbf{x}+\mathbf{y}: \mathbf{x}\in C, \mathbf{y}\in D\}\) if \(C\) and \(D\) are convex.
Cartesian product: \(C\times D=\{(\mathbf{x},\mathbf{y}): \mathbf{x}\in C, \mathbf{y}\in D\}\) if \(C\) and \(D\) are convex.
Image \(\mathcal{A}C\) of a convex set \(C\) under a linear map \(\mathcal{A}\).
Inverse image \(\mathcal{A}^{-1}C\) of a convex set \(C\) under a linear map \(\mathcal{A}\).
Any subspace.
Any hyperplane passing through the origin.
Any halfspace whose boundary hyperplane passes through the origin.
\(\mathbb{S}_{+}^d\), set of positive semidefinite matrices.
Second-order cone (Lorentz cone): \(\{(\mathbf{x}, t): \|\mathbf{x}\|_2 \le t \}\).
Norm cone: \(\{(\mathbf{x}, t): \|\mathbf{x}\| \le t \}\), where \(\|\cdot\|\) is any norm.
The intersection of an arbitrary collection of convex (resp. affine, conical) sets is convex (resp. affine, conical).
Convex combination of \(\mathbf{x}_1,\dotsc,\mathbf{x}_m\): \(\sum_{i=1}^m \alpha_i \mathbf{x}_i\) with \(\alpha_i \ge 0\), \(\sum_{i=1}^m\alpha_i = 1\).
Convex sets are closed under convex combinations: any convex combination of points from a convex set \(C\) belongs to \(C\).
Conic combinations and affine combinations are defined similarly; analogous closure properties hold.
The convex hull of a nonempty set \(C\) is the set of all convex combinations of points in \(C\):
\[
\text{conv}C = \left\{\sum_{i=1}^m \alpha_i \mathbf{x}_i: \mathbf{x}_i\in C,~ \alpha_i \ge 0,~ i=1,\dotsc,m,~ \sum_{i=1}^m\alpha_i = 1, \text{ for some } m \right\}
\] which is the smallest convex set containing \(C\).
The conic hull \(\text{cone}C\) drops the condition \(\sum_{i=1}^m\alpha_i = 1\), and the affine hull \(\text{aff}C\) drops the nonnegativity conditions on the \(\alpha_i\).
\(\text{dim}(C) \triangleq \text{dim}(\text{aff}C)\).
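A small computational illustration (a sketch assuming SciPy is available; the 30 random points are arbitrary): for a finite set \(C\), \(\text{conv}C\) is a polytope, and its extreme points can be found with `scipy.spatial.ConvexHull`.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points = rng.standard_normal((30, 2))   # a finite set C of 30 points in R^2

hull = ConvexHull(points)
# hull.vertices lists the indices of the extreme points of conv C;
# every other point of C is a convex combination of these.
print(points[hull.vertices])
```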
Many constraint sets in optimization do not have an interior, e.g., the probability simplex. It is useful to define the interior relative to the affine hull: \[ \text{relint}C = \{\mathbf{x}\in C: (\exists r>0)~ B_r(\mathbf{x}) \cap \text{aff}C \subset C \} \]
Recall that a real-valued function \(f\) is convex if \(\text{dom}f\) is convex and \[ f(\alpha\mathbf{x} + (1-\alpha)\mathbf{y}) \le \alpha f(\mathbf{x}) + (1-\alpha) f(\mathbf{y}), \quad \forall\mathbf{x}, \mathbf{y} \in \text{dom}f,~ \forall\alpha \in [0, 1]. \]
Extended-value functions: it is often convenient to extend the domain of \(f\) to the whole space \(V\) and allow \(f\) to take the value \(+\infty\). Then \(f:V \to \mathbb{R}\cup\{\infty\}\) and \[ \text{dom} f = \{\mathbf{x}: f(\mathbf{x}) < \infty \} \] is the essential domain of \(f\).
This extension allows us to consider the indicator function of a set: \[ \iota_C(\mathbf{x}) = \begin{cases} 0 & \mathbf{x} \in C \\ \infty & \mathbf{x} \notin C \end{cases} \] so that a constrained optimization problem is converted to an unconstrained problem: \[ \min_{\mathbf{x}\in C} f(\mathbf{x}) = \min_{\mathbf{x}} f(\mathbf{x}) + \iota_C(\mathbf{x}) \]
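A minimal numerical sketch of this conversion (the objective \(f(x)=(x-2)^2\) and the set \(C=[-1,1]\) are arbitrary choices): a brute-force grid minimization of \(f+\iota_C\) recovers the constrained minimizer.

```python
import numpy as np

def f(x):
    return (x - 2.0) ** 2          # an arbitrary convex objective

def iota_C(x, lo=-1.0, hi=1.0):
    """Indicator of C = [lo, hi]: 0 inside, +inf outside."""
    return 0.0 if lo <= x <= hi else np.inf

grid = np.linspace(-3.0, 3.0, 6001)
values = np.array([f(x) + iota_C(x) for x in grid])
print(grid[np.argmin(values)])     # ~1.0: the constrained minimizer of f over C
```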
Properness: a function \(f\) is proper if \(\text{dom}f \neq \emptyset\) and \(f(\mathbf{x}) > -\infty\) for all \(\mathbf{x}\).
First-order condition, a.k.a. the supporting hyperplane inequality:
If \(f\) is differentiable (i.e., its gradient \(\nabla f\) exists at each point in \(\text{dom}f\), which is open), then \(f\) is convex if and only if \(\text{dom} f\) is convex and \[ f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x} \rangle \] for all \(\mathbf{x}, \mathbf{y} \in \text{dom}f\).
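A quick numerical check of the inequality (a sketch; the log-sum-exp function is an arbitrary smooth convex example, whose gradient is the softmax map).

```python
import numpy as np
from scipy.special import logsumexp, softmax

# f(x) = log(sum_i exp(x_i)) is smooth and convex; grad f(x) = softmax(x).
rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)

lhs = logsumexp(y)
rhs = logsumexp(x) + softmax(x) @ (y - x)   # f(x) + <grad f(x), y - x>
print(lhs >= rhs - 1e-12)                   # True: the supporting hyperplane inequality holds
```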
The epigraph of a function \(f\) is the set \[ \text{epi}f = \{(\mathbf{x}, t): \mathbf{x}\in\text{dom}f,~ f(\mathbf{x}) \le t \}. \]
A function \(f\) is convex if and only if \(\text{epi}f\) is convex.
If \((\mathbf{y}, t)\in\text{epi}f\), then from the supporting hyperplane inequality, \[ t \ge f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle \] or \[ \left\langle (\nabla f(\mathbf{x}), -1), (\mathbf{y}, t) - (\mathbf{x}, f(\mathbf{x})) \right\rangle \le 0. \] This means that the hyperplane defined by \((\nabla f(\mathbf{x}), -1)\) supports \(\text{epi}f\) at the boundary point \((\mathbf{x},f(\mathbf{x}))\).
An extended-value function \(f\) is called closed if \(\text{epi}f\) is closed.
Let \(A\) and \(B\) be two disjoint nonempty convex subsets of \(\mathbb{R}^n\). Then there exist a nonzero vector \(\mathbf{v}\) and a real number \(c\) such that \[ \mathbf{a}^T\mathbf{v} \le c \le \mathbf{b}^T\mathbf{v}, \quad \forall\mathbf{a} \in A,~ \forall\mathbf{b} \in B . \] That is, the hyperplane \(\{\mathbf{x}\in\mathbb{R}^n: \mathbf{x}^T\mathbf{v} = c\}\) normal to \(\mathbf{v}\) separates \(A\) and \(B\).
If in addition both \(A\) and \(B\) are closed and one of them is bounded, then the separation is strict, i.e., there exist a nonzero vector \(\mathbf{v}\) and real numbers \(c_1\) and \(c_2\) such that \[ \mathbf{a}^T\mathbf{v} < c_1 < c_2 < \mathbf{b}^T\mathbf{v}, \quad \forall\mathbf{a} \in A,~ \forall\mathbf{b} \in B . \]
Thus, if \(\varphi\) is closed and convex, any point \((y, \zeta)\notin\text{epi}\varphi\) can be strictly separated from \(\text{epi}\varphi\) by a hyperplane. (Why?)
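A concrete instance of the separation theorem (a sketch; the two disjoint unit discs are arbitrary): taking \(\mathbf{v}\) along the line joining the centers separates the discs strictly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two disjoint closed discs: A = B_1((0,0)), B = B_1((4,0)).
a0, b0, r, s = np.array([0.0, 0.0]), np.array([4.0, 0.0]), 1.0, 1.0
v = b0 - a0                                   # normal vector of a separating hyperplane

def sample_disc(center, radius, n):
    """Random samples inside a disc (exact uniformity is not needed here)."""
    u = rng.standard_normal((n, 2))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    return center + radius * rng.random((n, 1)) * u

A, B = sample_disc(a0, r, 1000), sample_disc(b0, s, 1000)
print(np.max(A @ v), np.min(B @ v))
# Every value in A @ v is below every value in B @ v; the exact extremes are
# sup_{a in A} <a, v> = a0.v + r||v|| = 4 and inf_{b in B} <b, v> = b0.v - s||v|| = 12,
# so any c1, c2 with 4 < c1 < c2 < 12 gives strict separation.
```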
\(\alpha\)-sublevel set of an extended-value function \(f\) is \[ S_{\alpha} = \{\mathbf{x}\in\text{dom}f : f(\mathbf{x}) \le \alpha \}. \]
If \(f\) is convex, then \(S_{\alpha}\) is convex for all \(\alpha\).
Further, if \(f\) is closed (in particular, if \(f\) is continuous and \(\text{dom}f\) is closed), then all sublevel sets are closed.
(Nonnegative weighted sums) If \(f\) and \(g\) are convex and \(\alpha\) and \(\beta\) are nonnegative constants, then \(\alpha f + \beta g\) is convex, with domain \(\text{dom}f \cap \text{dom}g\).
(Composition with affine mapping) If \(f\) is convex, then the composition \(g(\mathbf{x}) = f(\mathbf{A}\mathbf{x} + \mathbf{b})\) of \(f\) with the affine map \(\mathbf{x}\mapsto \mathbf{A}\mathbf{x} + \mathbf{b}\) is convex, with \(\text{dom}g=\{\mathbf{x}:\mathbf{Ax}+\mathbf{b}\in\text{dom}f\}\).
(Pointwise maximum and supremum) If \(f_i\) is convex for each \(i=1,\dotsc,m\), then \(g(\mathbf{x}) = \max_{i=1,\dotsc,m} f_i(\mathbf{x})\) is convex, with \(\text{dom}g = \cap_{i=1}^m\text{dom}f_i\). More generally, if \(f(\mathbf{x},\mathbf{y})\) is convex in \(\mathbf{x}\) for each \(\mathbf{y}\in\mathcal{A}\), then \(g(\mathbf{x})=\sup_{\mathbf{y}\in\mathcal{A}} f(\mathbf{x},\mathbf{y})\) is convex.
(Composition) For \(h:\mathbb{R}\to\mathbb{R}\cup\{\infty\}\) and \(g:V\to\mathbb{R}\cup\{\infty\}\), if \(h\) is convex and nondecreasing, and \(g\) is convex, then \(f = h \circ g\) is convex, with \(\text{dom}f = \{\mathbf{x}\in\text{dom}g: g(\mathbf{x})\in\text{dom}h\}\).
(Partial minimization) If \(f(\mathbf{x},\mathbf{y})\) is jointly convex in \((\mathbf{x},\mathbf{y})\), then \(g(\mathbf{x}) = \inf_{\mathbf{y}\in C} f(\mathbf{x},\mathbf{y})\) is convex provided it is proper and \(C\) is nonempty convex, with \(\text{dom}g = \{\mathbf{x}: (\exists\mathbf{y}\in C)(\mathbf{x},\mathbf{y})\in\text{dom}f\}\) (projection).
(Perspective) If \(f(\mathbf{x})\) is convex and finite, then its perspective \(g(\mathbf{x}, t)=t f(t^{-1}\mathbf{x})\) is convex, with \(\text{dom}g=\{(\mathbf{x},t): t^{-1}\mathbf{x}\in\text{dom}f,~ t>0\}\).
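A numerical spot-check of the perspective operation (a sketch; \(f(\mathbf{x})=\|\mathbf{x}\|_2^2\) is an arbitrary choice, so its perspective is \(g(\mathbf{x},t)=\|\mathbf{x}\|_2^2/t\)): joint midpoint convexity holds on random samples.

```python
import numpy as np

def g(x, t):
    """Perspective of f(x) = ||x||_2^2, i.e., g(x, t) = ||x||_2^2 / t for t > 0."""
    return x @ x / t

rng = np.random.default_rng(3)
violations = 0
for _ in range(10000):
    x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
    t1, t2 = rng.random() + 0.1, rng.random() + 0.1   # keep t bounded away from 0
    mid = g((x1 + x2) / 2, (t1 + t2) / 2)
    violations += mid > (g(x1, t1) + g(x2, t2)) / 2 + 1e-10
print(violations)   # 0: midpoint convexity holds on every sampled pair
```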
A sum of convex functions is convex.
\(f(\mathbf{x})=x_{(1)}+x_{(2)}+\dotsb+x_{(k)}\), the sum of the \(k\) largest components of \(\mathbf{x}\in\mathbb{R}^d\), is convex, because \[ f(\mathbf{x})=\max\{x_{i_1} +\dotsb + x_{i_k} : 1\le i_1 <i_2 < \dotsb < i_k \le d\}, \] a pointwise maximum of linear functions.
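A quick numerical check of this max representation (a sketch; the dimension \(d=6\) and \(k=3\) are arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
d, k = 6, 3
x = rng.standard_normal(d)

sum_k_largest = np.sum(np.sort(x)[::-1][:k])                 # sum of the k largest entries
max_over_subsets = max(x[list(idx)].sum() for idx in combinations(range(d), k))
print(np.isclose(sum_k_largest, max_over_subsets))           # True
```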
Maximum eigenvalue of a symmetric matrix \(\lambda_{\max}: \mathbb{S}^d \to \mathbb{R}\) is convex, because \[ \lambda_{\max}(\mathbf{X}) = \max_{\mathbf{v}^T\mathbf{v}=1} \mathbf{v}^T\mathbf{X}\mathbf{v}. \] (Rayleigh quotient). The maximand is linear (hence convex) in \(\mathbf{X}\) for each \(\mathbf{v}\).
Sum of \(k\) largest eigenvalues of a symmetric matrix is convex, because \[ \sum_{i=1}^k\lambda_{i}(\mathbf{X}) = \max_{\mathbf{V}^T\mathbf{V}=\mathbf{I}_k, \mathbf{V}\in\mathbb{R}^{d\times k}} \text{tr}(\mathbf{V}^T\mathbf{X}\mathbf{V}) \] (Ky Fan, 1949).
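Both variational formulas are easy to spot-check numerically (a sketch; the symmetric matrix is random, and any feasible \(\mathbf{v}\) or \(\mathbf{V}\) can only lower-bound the maximum, which is attained by eigenvectors).

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 6, 2
A = rng.standard_normal((d, d))
X = (A + A.T) / 2                              # random symmetric matrix
eigvals, eigvecs = np.linalg.eigh(X)           # eigenvalues in ascending order

# lambda_max(X) = max_{v'v=1} v'Xv, attained at the leading eigenvector.
v = eigvecs[:, -1]
print(np.isclose(v @ X @ v, eigvals[-1]))      # True

# Ky Fan: sum of k largest eigenvalues = max over orthonormal V of tr(V'XV),
# attained at the top-k eigenvectors; a random orthonormal V only lower-bounds it.
V_opt = eigvecs[:, -k:]
print(np.isclose(np.trace(V_opt.T @ X @ V_opt), eigvals[-k:].sum()))   # True
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))                       # random orthonormal V
print(np.trace(Q.T @ X @ Q) <= eigvals[-k:].sum() + 1e-10)             # True
```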
Support function of a set: \(\sigma_C(\mathbf{x}) = \sup_{\mathbf{y}\in C} \langle \mathbf{x}, \mathbf{y} \rangle\) is convex (with an obvious domain), because \(\langle \mathbf{x}, \mathbf{y} \rangle\) is convex (linear) in \(\mathbf{x}\) for each \(\mathbf{y}\in C\).
Matrix fractional function (a matrix analogue of the perspective \(x^2/t\)): \(f(\mathbf{x},\mathbf{Y}) = \mathbf{x}^T\mathbf{Y}^{-1}\mathbf{x}\) is convex with \(\text{dom}f=\mathbb{R}^d\times\mathbb{S}_{++}^d\) because \[ \mathbf{x}^T\mathbf{Y}^{-1}\mathbf{x} = \sup_{\mathbf{z}\in\mathbb{R}^d}\{2\mathbf{x}^T\mathbf{z} - \mathbf{z}^T\mathbf{Y}\mathbf{z}\}, \] a pointwise supremum of functions affine in \((\mathbf{x},\mathbf{Y})\).
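The supremum is attained at \(\mathbf{z}^\star=\mathbf{Y}^{-1}\mathbf{x}\) (set the gradient \(2\mathbf{x}-2\mathbf{Y}\mathbf{z}\) to zero), which a quick numerical check confirms (a sketch with a random positive definite \(\mathbf{Y}\)).

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
A = rng.standard_normal((d, d))
Y = A @ A.T + np.eye(d)                 # random positive definite Y
x = rng.standard_normal(d)

z_star = np.linalg.solve(Y, x)          # maximizer of 2 x'z - z'Yz
sup_value = 2 * x @ z_star - z_star @ Y @ z_star
print(np.isclose(sup_value, x @ np.linalg.solve(Y, x)))   # equals x' Y^{-1} x: True

# A random z never exceeds the supremum.
z = rng.standard_normal(d)
print(2 * x @ z - z @ Y @ z <= sup_value + 1e-10)          # True
```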
Kullback-Leibler divergence of two \(d\)-dimensional probability vectors \[ D(\mathbf{x} || \mathbf{y}) = \sum_{i=1}^d x_i\log(x_i/y_i) \] (taking \(0\log 0= 0\) and \(\log 0=\infty\) by continuity) is convex on \(\Delta_{d-1}\times\Delta_{d-1}\), because each term \(x_i\log(x_i/y_i)\) is the perspective \(t\,h(u/t) = t\log(t/u)\) of the convex function \(h(u)=-\log u\), evaluated at \((u,t)=(y_i,x_i)\), hence jointly convex in \((x_i, y_i)\); a sum of convex functions is convex.
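A numerical spot-check of this joint convexity (a sketch; the probability vectors are random normalized positive vectors):

```python
import numpy as np

def kl(x, y):
    """KL divergence of probability vectors, with 0 log 0 = 0."""
    mask = x > 0
    return np.sum(x[mask] * np.log(x[mask] / y[mask]))

def random_simplex(d, rng):
    w = rng.random(d)
    return w / w.sum()

rng = np.random.default_rng(7)
d, violations = 5, 0
for _ in range(10000):
    x1, x2 = random_simplex(d, rng), random_simplex(d, rng)
    y1, y2 = random_simplex(d, rng), random_simplex(d, rng)
    mid = kl((x1 + x2) / 2, (y1 + y2) / 2)
    violations += mid > (kl(x1, y1) + kl(x2, y2)) / 2 + 1e-10
print(violations)   # 0: D(.||.) is jointly convex in (x, y) on all sampled pairs
```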