A function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) with \(\text{dom}\, f = \mathbb{R}_{++}^n\) defined as \[\begin{eqnarray*}
f(\mathbf{x}) = c x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n},
\end{eqnarray*}\] where \(c>0\) and \(a_i \in \mathbb{R}\), is called a monomial.
A sum of monomials, i.e., \[\begin{eqnarray*}
f(\mathbf{x}) = \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}},
\end{eqnarray*}\] where \(c_k > 0\), is called a posynomial.
Examples
\(0.23\), \(2z\sqrt{x/y}\), \(3x^2y^{-0.12}z\) are monomials (hence also posynomials)
\(2x + 3y - 2z\), \(x^2 + \tan x\) are not posynomials
Monomials are closed under multiplication and division.
Posynomials are closed under addition, multiplication, and positive scaling.
If \(\gamma\) is a nonnegative integer and \(f\) is a posynomial, then \(f^{\gamma}\) is a posynomial.
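For instance, closure under multiplication is immediate from the definition: \[\begin{eqnarray*}
\left(c\, x_1^{a_1} \cdots x_n^{a_n}\right)\left(d\, x_1^{b_1} \cdots x_n^{b_n}\right) = (cd)\, x_1^{a_1+b_1} \cdots x_n^{a_n+b_n},
\end{eqnarray*}\] which is again a monomial since \(cd > 0\); multiplying two posynomials and expanding the product gives a sum of such terms, hence a posynomial.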
GP
A geometric program is of form \[\begin{eqnarray*}
&\text{minimize}& f_0(\mathbf{x}) \\
&\text{subject to}& f_i(\mathbf{x}) \le 1, \quad i=1,\ldots,m \\
& & h_i(\mathbf{x}) = 1, \quad i=1,\ldots,p
\end{eqnarray*}\] where \(f_0, \ldots, f_m\) are posynomials and \(h_1, \ldots, h_p\) are monomials. The constraint \(\mathbf{x} > \mathbf{0}\) is implicit.
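A small illustrative instance (constructed here for concreteness, not taken from the source): with variables \(x, y > 0\), \[\begin{eqnarray*}
&\text{minimize}& x^{-1} y^{-1/2} \\
&\text{subject to}& x + y \le 1 \\
& & xy = 0.1
\end{eqnarray*}\] is a GP: the objective is a monomial (hence a posynomial), the inequality constraint function is a posynomial, and the equality constraint function is a monomial.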
Q: Is GP a convex optimization problem?
With the change of variables \(y_i = \log x_i\), a monomial \[\begin{eqnarray*}
f(\mathbf{x}) = c x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}
\end{eqnarray*}\] can be written as \[\begin{eqnarray*}
f(\mathbf{x}) = f(e^{y_1}, \ldots, e^{y_n}) = c (e^{y_1})^{a_1} \cdots (e^{y_n})^{a_n} = e^{\mathbf{a}^T \mathbf{y} + b},
\end{eqnarray*}\] where \(b = \log c\). Similarly, we can write a posynomial as \[\begin{eqnarray*}
f(\mathbf{x}) &=& \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}} \\
&=& \sum_{k=1}^K e^{\mathbf{a}_k^T \mathbf{y} + b_k},
\end{eqnarray*}\] where \(\mathbf{a}_k = (a_{1k}, \ldots, a_{nk})\) and \(b_k = \log c_k\).
The original GP can be expressed in terms of the new variable \(\mathbf{y}\): \[\begin{eqnarray*}
&\text{minimize}& \sum_{k=1}^{K_0} e^{\mathbf{a}_{0k}^T \mathbf{y} + b_{0k}} \\
&\text{subject to}& \sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}} \le 1, \quad i = 1,\ldots,m \\
& & e^{\mathbf{g}_i^T \mathbf{y} + h_i} = 1, \quad i=1,\ldots,p,
\end{eqnarray*}\] where \(\mathbf{a}_{ik}, \mathbf{g}_i \in \mathbb{R}^n\).
Taking the log of the objective and constraint functions, we obtain the geometric program in convex form: \[\begin{eqnarray*}
&\text{minimize}& \log \left(\sum_{k=1}^{K_0} e^{\mathbf{a}_{0k}^T \mathbf{y} + b_{0k}}\right) \\
&\text{subject to}& \log \left(\sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}}\right) \le 0, \quad i = 1,\ldots,m \\
& & \mathbf{g}_i^T \mathbf{y} + h_i = 0, \quad i=1,\ldots,p.
\end{eqnarray*}\]
Why do we require monomials for equality constraints? Because after the change of variables an equality constraint must be affine in \(\mathbf{y}\) for the problem to remain convex; the log of a monomial is affine, whereas the log of a general posynomial (a log-sum-exp) is convex but not affine.
If all \(f_i\) are monomials, GP reduces to an LP.
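When every \(K_i = 1\), each log-sum-exp term in the convex form collapses to a single affine function of \(\mathbf{y}\), so the problem becomes \[\begin{eqnarray*}
&\text{minimize}& \mathbf{a}_0^T \mathbf{y} + b_0 \\
&\text{subject to}& \mathbf{a}_i^T \mathbf{y} + b_i \le 0, \quad i = 1,\ldots,m \\
& & \mathbf{g}_i^T \mathbf{y} + h_i = 0, \quad i=1,\ldots,p,
\end{eqnarray*}\] a linear program in \(\mathbf{y}\).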
Example 1: Multinomial probabilities with order constraints
Let the random variable \(X\) have a multinomial distribution with parameters \(n\) and \(\mathbf{p} = (p_1, \dotsc , p_r)\), subject to the order constraints \(p_1 \le \dotsb \le p_r\).
Observed frequencies of the classes are \((n_1, \dotsc, n_r)\).
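The MLE maximizes the likelihood \(\prod_{j=1}^r p_j^{n_j}\) subject to the order constraints and \(\sum_j p_j = 1\). The simplex equality is not a monomial, so it cannot appear as a GP equality constraint; relaxing it to \(\sum_j p_j \le 1\) is harmless because scaling all \(p_j\) up improves the objective, so the constraint is tight at the optimum. One way to cast the problem as a GP (whose convex form is what the code later in this section solves) is \[\begin{eqnarray*}
&\text{minimize}& \prod_{j=1}^r p_j^{-n_j} \\
&\text{subject to}& p_j \, p_{j+1}^{-1} \le 1, \quad j = 1,\ldots,r-1 \\
& & \sum_{j=1}^r p_j \le 1.
\end{eqnarray*}\]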
Logistic regression as GP
For logistic regression with predictors \(x_{ij}\), binary responses \(y_i \in \{0,1\}\), \(i=1,\ldots,n\), and coefficients \(\beta_1, \ldots, \beta_p\), maximizing the likelihood is equivalent to minimizing \(\prod_{i:y_i=1} e^{-\sum_j x_{ij}\beta_j} \prod_{i=1}^n \left(1 + e^{\sum_j x_{ij}\beta_j}\right)\). Let \(z_j = e^{\beta_j}\), \(j=1,\ldots,p\). The objective becomes \[
\small
\begin{eqnarray*}
\prod_{i:y_i=1} \prod_{j=1}^p z_j^{-x_{ij}} \prod_{i=1}^n \left( 1 + \prod_{j=1}^p z_j^{x_{ij}} \right).
\end{eqnarray*}
\] This leads to a GP \[
\small
\begin{eqnarray*}
&\text{minimize}& \prod_{i:y_i=1} s_i \prod_{i=1}^n t_i \\
&\text{subject to}& \prod_{j=1}^p z_j^{-x_{ij}} \le s_i, \quad i: y_i = 1 \\
& & 1 + \prod_{j=1}^p z_j^{x_{ij}} \le t_i, \quad i = 1, \ldots, n,
\end{eqnarray*}
\] in variables \(\mathbf{s} \in \mathbb{R}^{m}\), \(\mathbf{t} \in \mathbb{R}^n\), and \(\mathbf{z} \in \mathbb{R}^p\). Here \(m\) is the number of observations with \(y_i=1\).
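To see that this matches the standard GP form above, divide each constraint by its right-hand side (a monomial), which gives posynomial constraints in \((\mathbf{z}, \mathbf{s}, \mathbf{t})\): \[
\small
\begin{eqnarray*}
s_i^{-1} \prod_{j=1}^p z_j^{-x_{ij}} \le 1, \qquad
t_i^{-1} + t_i^{-1} \prod_{j=1}^p z_j^{x_{ij}} \le 1.
\end{eqnarray*}
\]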
Regularized logistic regression as GP
How do we incorporate the lasso penalty? Write \(\beta_j = \beta_j^+ - \beta_j^-\) with \(\beta_j^+, \beta_j^- \ge 0\), and let \(z_j^+ = e^{\beta_j^+}\), \(z_j^- = e^{\beta_j^-}\). The lasso penalty then takes the form \(e^{\lambda |\beta_j|} = (z_j^+ z_j^-)^\lambda\).
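A minimal sketch of how the penalized problem remains a GP (the split \(\beta_j = \beta_j^+ - \beta_j^-\) together with the constraints \(1/z_j^+ \le 1\), \(1/z_j^- \le 1\), which encode \(\beta_j^\pm \ge 0\), is spelled out here as an assumption rather than taken verbatim from the original): each \(z_j\) in the earlier constraints is replaced by the monomial \(z_j^+/z_j^-\), and the objective picks up the penalty factors, \[
\small
\begin{eqnarray*}
&\text{minimize}& \prod_{i:y_i=1} s_i \prod_{i=1}^n t_i \prod_{j=1}^p (z_j^+ z_j^-)^{\lambda} \\
&\text{subject to}& \prod_{j=1}^p (z_j^+/z_j^-)^{-x_{ij}} \le s_i, \quad i: y_i = 1 \\
& & 1 + \prod_{j=1}^p (z_j^+/z_j^-)^{x_{ij}} \le t_i, \quad i = 1,\ldots,n \\
& & 1/z_j^+ \le 1, \quad 1/z_j^- \le 1, \quad j = 1,\ldots,p.
\end{eqnarray*}
\] Since \((z_j^+ z_j^-)^\lambda = e^{\lambda(\beta_j^+ + \beta_j^-)} \ge e^{\lambda |\beta_j|}\), with equality when one of \(\beta_j^+, \beta_j^-\) is zero, and the minimization drives the smaller of the pair to zero, the GP optimum coincides with the lasso-penalized MLE.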
A function \(f\) is a generalized posynomial if it can be formed from posynomials using addition, multiplication, positive (real) powers, and maximum; the class of generalized posynomials is therefore closed under these operations. For example, \(\max\left\{1 + x_1,\ 2x_1 + x_2^{0.2} x_3^{-3.9}\right\}^{1.5}\) is a generalized posynomial but not a posynomial.
A generalized geometric program is of form \[\begin{eqnarray*}
&\text{minimize}& f_0(\mathbf{x}) \\
&\text{subject to}& f_i(\mathbf{x}) \le 1, \quad i=1,\ldots,m \\
& & h_i(\mathbf{x}) = 1, \quad i=1,\ldots,p
\end{eqnarray*}\] where \(f_0, \ldots, f_m\) are generalized posynomials and \(h_1, \ldots, h_p\) are monomials. The constraint \(\mathbf{x} > \mathbf{0}\) is implicit.
Any GP is a GGP, and any GGP can be converted to an equivalent GP by introducing auxiliary variables for the maxima and powers; hence the two classes are equivalent. Typical conversions are sketched below.
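For instance (illustrative conversions with an auxiliary variable \(t > 0\); the specific constraints are constructed here to show the mechanics): \[\begin{eqnarray*}
\max\{f_1(\mathbf{x}), f_2(\mathbf{x})\} \le 1 &\Longleftrightarrow& f_1(\mathbf{x}) \le 1, \quad f_2(\mathbf{x}) \le 1, \\
f_1(\mathbf{x}) + \left(f_2(\mathbf{x})\right)^{1.5} \le 1 &\Longleftrightarrow& f_1(\mathbf{x}) + t^{1.5} \le 1, \quad f_2(\mathbf{x}) \le t,
\end{eqnarray*}\] where \(f_1, f_2\) are posynomials; in each case the right-hand side consists of ordinary posynomial constraints (the second after introducing \(t\)).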
Mosek is capable of solving GP. cvx has a GP mode that recognizes and transforms GP problems.
Unfortunately, Convex.jl does not directly support GP. However, it supports the more general class of exponential cone representable problems.
Doubly unfortunately, Mosek does not support general exponential cone constraints. So we have to turn to SCS, COSMO, or ECOS.
Exponential Cone
The exponential cone is defined as the set \(K_{\exp}=\{(x,y,t): ye^{x/y} \le t, y > 0 \}\). Note this is the epigraph of the perspective of \(f(x)=e^{x}\). Since \(K_{\exp}\) is not closed, we instead consider the closure: \[
\bar{K}_{\exp} = K_{\exp} \cup \{(x,0,t): x \le 0, t \ge 0\}.
\]
A constraint or set is exponential cone representable if it can be expressed using affine maps together with constraints of the form \((x, y, t) \in \bar{K}_{\exp}\).
The GP constraints \(\log \left(\sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}}\right) \le 0\) are log-sum-exp functions composed with an affine map, and hence are exponential cone representable.
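Concretely, one standard representation introduces an auxiliary variable per exponential term: the constraint \(\log \left(\sum_{k=1}^{K_i} e^{\mathbf{a}_{ik}^T \mathbf{y} + b_{ik}}\right) \le 0\) is equivalent to \[\begin{eqnarray*}
\sum_{k=1}^{K_i} u_k \le 1, \qquad \left(\mathbf{a}_{ik}^T \mathbf{y} + b_{ik},\, 1,\, u_k\right) \in \bar{K}_{\exp}, \quad k = 1,\ldots,K_i,
\end{eqnarray*}\] since \((x, 1, t) \in \bar{K}_{\exp}\) simply says \(e^x \le t\).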
Code example
Multinomial MLE with order constraints
Data: \((n_1, n_2, n_3) = (4, 2, 374)\), with \(n = 380\).
Constraints: \(p_1 \le p_2 \le p_3\).
```julia
using Convex, MathOptInterface
using LinearAlgebra   # for dot
const MOI = MathOptInterface

## Use SCS solver
# using SCS
# solver = SCS.Optimizer()
# MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), 1)

## Use COSMO solver
# using COSMO
# solver = COSMO.Optimizer()
# MOI.set(solver, MOI.RawOptimizerAttribute("max_iter"), 5000)

# Use ECOS solver
using ECOS
solver = ECOS.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), true)

n = [4, 2, 374]          # trinomial with n = 380
x = Variable(length(n))  # x = log(p)
problem = maximize(dot(n, x))
# order constraints p_1 <= p_2 <= p_3, i.e., x_1 <= x_2 <= x_3
for i = 1:length(n)-1
    problem.constraints += x[i] - x[i+1] <= 0
end
# sum(p) <= 1, i.e., logsumexp(x) <= 0
problem.constraints += logsumexp(x) <= 0
# Solve the problem
@time solve!(problem, solver)
```
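After the solve, one way to inspect the estimates (a small add-on, not part of the original code; evaluate and optval are standard Convex.jl accessors):

```julia
# x = log(p), so exponentiate the solution to recover the probability estimates
p̂ = exp.(evaluate(x))
@show p̂ sum(p̂)        # should respect the order constraints and sum to ≈ 1
@show problem.optval   # equals dot(n, log.(p̂)), the log-likelihood up to a constant
```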
```julia
### https://github.com/JuliaOpt/Convex.jl/blob/master/docs/examples_literate/general_examples/logistic_regression.jl
using DataFrames
using RDatasets

iris = dataset("datasets", "iris")
## outcome variable: +1 for versicolor, -1 otherwise
Y = [species == "versicolor" ? 1.0 : -1.0 for species in iris.Species]
## create data matrix with one column for each feature (first column corresponds to offset)
X = hcat(ones(size(iris, 1)), iris.SepalLength, iris.SepalWidth, iris.PetalLength, iris.PetalWidth);

using ECOS
solver = ECOS.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), true)

## solve the logistic regression problem
n, p = size(X)
beta = Variable(p)
problem = minimize(logisticloss(-Y .* (X * beta)))
solve!(problem, solver)
```
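Connecting back to the lasso discussion above, here is a sketch of an \(\ell_1\)-penalized fit with Convex.jl; the penalty weight \(\lambda = 1.0\) and the choice to leave the intercept (first column) unpenalized are illustrative assumptions, not from the original example.

```julia
## lasso-regularized logistic regression (sketch; λ = 1.0 is an arbitrary choice)
λ = 1.0
beta_l1 = Variable(p)
problem_l1 = minimize(logisticloss(-Y .* (X * beta_l1)) + λ * sum(abs(beta_l1[2:end])))
solve!(problem_l1, ECOS.Optimizer())
@show evaluate(beta_l1)
```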