We do not assume convexity by default here.
Zeroth-order optimality condition: $\mathbf{x}^{\star}$ is a global minimizer if $$ f(\mathbf{x}^{\star}) \le f(\mathbf{x}), \quad \forall \mathbf{x}\in\mathbb{R}^d. $$
First-order optimality condition: assume $f$ is continuously differentiable. If $\mathbf{x}^{\star}$ is a local minimizer, then $$ \nabla f(\mathbf{x}^\star) = \mathbf{0}. $$
Second-order optimality condition: assume $f$ is twice continuously differentiable. If $\mathbf{x}^{\star}$ is a local minimizer, then $$ \nabla^2 f(\mathbf{x}^\star) \succeq 0. $$
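As a quick numerical sanity check, here is a minimal numpy sketch with a toy quadratic of my own choosing, $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x} + \mathbf{q}^T\mathbf{x}$, whose unique stationary point solves $\mathbf{H}\mathbf{x} = -\mathbf{q}$:

```python
import numpy as np

# Toy quadratic f(x) = 1/2 x^T H x + q^T x with H symmetric positive definite.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([1.0, -1.0])

x_star = np.linalg.solve(H, -q)                    # first-order: grad f(x*) = H x* + q = 0
print(np.allclose(H @ x_star + q, 0.0))            # True: stationary point
print(bool(np.all(np.linalg.eigvalsh(H) >= 0)))    # second-order: Hessian H >= 0, True
```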
Now consider constrained optimization. Assume the objective and constraint functions $f_0, f_1, \ldots, f_m, h_1, \ldots, h_p$ below share some open set $U \subset \mathbb{R}^d$ as their domain.
Zeroth-order optimality condition: $\mathbf{x}^{\star} \in C$ is a global minimizer if $$ f_0(\mathbf{x}^{\star}) \le f_0(\mathbf{x}), \quad \forall \mathbf{x}\in C = \{\mathbf{x}: f_i(\mathbf{x})\le 0,~i=1,\dotsc,m,~h_i(\mathbf{x})=0,~i=1,\dotsc,p\}. $$
First-order optimality condition: Karush-Kuhn-Tucker (KKT) conditions
Kuhn, H. W.; Tucker, A. W., Nonlinear Programming. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 481--492, University of California Press, Berkeley, Calif., 1951. https://projecteuclid.org/euclid.bsmsp/1200500249
Karush, W., Minima of Functions of Several Variables with Inequalities as Side Constraints. M.Sc. Dissertation. Dept. of Mathematics, Univ. of Chicago, Chicago, Illinois, 1939.
Define the Lagrangian \begin{eqnarray*} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = f_0(\mathbf{x}) + \sum_{i=1}^m \lambda_i f_i(\mathbf{x}) + \sum_{i=1}^p \nu_i h_i(\mathbf{x}). \end{eqnarray*} The vectors $\boldsymbol{\lambda} = (\lambda_1,\ldots, \lambda_m)^T$ and $\boldsymbol{\nu} = (\nu_1,\ldots,\nu_p)^T$ are called the Lagrange multiplier vectors or dual variables.
Let $\mathbf{x}^\star$ be a local minimizer of $f_0$ in the feasible region $C$.
Assume $f_0, f_1, \ldots,f_m,h_1,\ldots,h_p$ are continuously differentiable near $\mathbf{x}^{\star}$.
(Mangasarian-Fromovitz constraint qualification). Assume the gradients $\nabla h_1(\mathbf{x}^{\star}), \dotsc, \nabla h_p(\mathbf{x}^{\star})$ are linearly independent, and that there exists a vector $\mathbf{v}$ such that $\langle \nabla h_i(\mathbf{x}^{\star}), \mathbf{v} \rangle = 0$ for $i = 1, \dotsc, p$ and $\langle \nabla f_i(\mathbf{x}^{\star}), \mathbf{v} \rangle < 0$ for every active inequality constraint, i.e., every $i$ with $f_i(\mathbf{x}^{\star}) = 0$.
(The vector $\mathbf{v}$ is a tangent vector in the sense that infinitesimal motion from $\mathbf{x}^{\star}$ along $\mathbf{v}$ stays within the feasible region.)
(Lagrange multiplier rule). Suppose that the objective function $f_0$ of the constrained optimization problem above has a local minimum at the feasible point $\mathbf{x}^{\star}$. If $f_0$ and the constraint functions are continuously differentiable near $\mathbf{x}^{\star}$, and the Mangasarian-Fromovitz constraint qualification holds at $\mathbf{x}^{\star}$, then there exist Lagrange multipliers $\lambda_1^{\star}, \dotsc, \lambda_m^{\star}$ and $\nu_1^{\star}, \dotsc, \nu_p^{\star}$ such that $$ \nabla_{\mathbf{x}}\mathcal{L}(\mathbf{x}^{\star},\boldsymbol{\lambda}^{\star}, \boldsymbol{\nu}^{\star})= \nabla f_0(\mathbf{x}^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(\mathbf{x}^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(\mathbf{x}^\star) = \mathbf{0}. $$ Moreover, each of the multipliers $\lambda_i^{\star}$ associated with the inequality constraints is nonnegative, and $\lambda_i^{\star} = 0$ whenever $f_i(\mathbf{x}^{\star}) < 0$.
We can summarize the KKT conditions as \begin{eqnarray*} f_i(\mathbf{x}^\star) &\le& 0, \quad i = 1,\ldots,m \\ h_i(\mathbf{x}^\star) &=& 0, \quad i = 1,\ldots,p \\ \lambda_i^\star &\ge& 0, \quad i = 1,\ldots,m \\ \lambda_i^\star f_i(\mathbf{x}^\star) &=& 0, \quad i=1,\ldots,m \\ \nabla f_0(\mathbf{x}^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(\mathbf{x}^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(\mathbf{x}^\star) &=& \mathbf{0}. \end{eqnarray*}
The fourth condition $\lambda_i^\star f_i(\mathbf{x}^\star) = 0$ is called complementary slackness: at most one of $\lambda_i^{\star}$ and $f_i(\mathbf{x}^{\star})$ is nonzero.
Remember that, under a constraint qualification such as Mangasarian-Fromovitz, the KKT conditions are collectively a necessary condition for local optimality.
If the optimization problem is convex, they become a necessary and sufficient condition, i.e., any triple $(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})$ that satisfies the KKT conditions guarantees global optimality of $\mathbf{x}$.
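Numerically, the five conditions above amount to a residual check at a candidate triple. Below is a minimal sketch; the function name `kkt_residuals` and its calling convention are my own, not from any particular library:

```python
import numpy as np

def kkt_residuals(x, lam, nu, grad_f0, fs, grad_fs, hs, grad_hs, tol=1e-8):
    """Check the five KKT conditions at a candidate triple (x, lam, nu).

    fs/grad_fs: lists of inequality constraints f_i and their gradients;
    hs/grad_hs: lists of equality constraints h_i and their gradients.
    """
    primal_ineq = all(f(x) <= tol for f in fs)               # f_i(x) <= 0
    primal_eq   = all(abs(h(x)) <= tol for h in hs)          # h_i(x) = 0
    dual_feas   = bool(np.all(np.asarray(lam) >= -tol))      # lambda_i >= 0
    comp_slack  = all(abs(l * f(x)) <= tol for l, f in zip(lam, fs))
    grad = np.array(grad_f0(x), dtype=float)                 # stationarity of the Lagrangian
    for l, gf in zip(lam, grad_fs):
        grad = grad + l * np.asarray(gf(x))
    for v, gh in zip(nu, grad_hs):
        grad = grad + v * np.asarray(gh(x))
    stationarity = bool(np.allclose(grad, 0.0, atol=tol))
    return primal_ineq, primal_eq, dual_feas, comp_slack, stationarity
```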
(Slater's constraint qualification condition). Assume there exists a feasible point $\mathbf{x}_0$ such that $f_i(\mathbf{x}_0) < 0$ for all active inequality constraints, i.e., for all $i$ with $f_i(\mathbf{x}^{\star})=0$.
The likelihood of observation $(n_1, \dotsc, n_r)$ from $\text{mult}(n, (p_1, \dotsc, p_r))$ is proportional to $\prod_{i=1}^r p_i^{n_i}$. The MLE problem is
\begin{eqnarray*} &\text{minimize}& -\sum_{i=1}^r n_i\log p_i \\ &\text{subject to}& -p_i \le 0, \quad i = 1,\dotsc,r \\ & & \sum_{i=1}^r p_i = 1. \end{eqnarray*}This is a convex optimization problem. The Lagrangian is $$ \mathcal{L}(\mathbf{p}, \boldsymbol{\lambda}, \nu) = -\sum_{i=1}^r n_i\log p_i - \sum_{i=1}^r \lambda_i p_i + \nu\left(\sum_{i=1}^r p_i - 1\right). $$ Differentiating with respect to $p_i$ gives $$ 0 = -\frac{n_i}{p_i} - \lambda_i + \nu, \quad i=1,\dotsc, r. $$ Multiplying both sides by $p_i$ and summing over $i$ yields $$ 0 = -n - \sum_{i=1}^r \lambda_i p_i + \nu \sum_{i=1}^r p_i = -n - 0 + \nu, $$ where the middle term vanishes by complementary slackness ($\lambda_i p_i = -\lambda_i f_i(\mathbf{p}) = 0$) and the last term uses the constraint $\sum_{i=1}^r p_i = 1$; hence $\nu = n$. Setting all $\lambda_i=0$ then gives $p_i=n_i/n$. The triple $((n_1/n, \dotsc, n_r/n), \mathbf{0}, n)$ satisfies the KKT conditions and hence is a global solution (even if some $n_i=0$).
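To illustrate, we can verify this triple numerically by reusing the `kkt_residuals` sketch above (the counts below are hypothetical):

```python
import numpy as np

# Hypothetical counts; candidate KKT triple ((n_i/n), 0, n) from the derivation.
counts = np.array([3.0, 5.0, 2.0])
n, r = counts.sum(), len(counts)
p, lam, nu = counts / n, np.zeros(r), np.array([n])

grad_f0 = lambda p: -counts / p                      # gradient of -sum_i n_i log p_i
fs      = [lambda p, i=i: -p[i] for i in range(r)]   # inequality constraints -p_i <= 0
grad_fs = [lambda p, i=i: -np.eye(r)[i] for i in range(r)]
hs      = [lambda p: p.sum() - 1.0]                  # equality constraint sum_i p_i = 1
grad_hs = [lambda p: np.ones(r)]

print(kkt_residuals(p, lam, nu, grad_f0, fs, grad_fs, hs, grad_hs))
# (True, True, True, True, True): all five KKT conditions hold
```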
Suppose the Lagrange multiplier rule is satisfied at a point $\mathbf{x}^{\star}$ with associated multipliers $\boldsymbol{\lambda}$ and $\boldsymbol{\nu}$. Assume the Lagrangian $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})$ is twice continuously differentiable in $\mathbf{x}$ near $\mathbf{x}^{\star}$. If $\mathbf{v}^T\nabla^2_{\mathbf{x}}\mathcal{L}(\mathbf{x}^{\star}, \boldsymbol{\lambda}, \boldsymbol{\nu})\mathbf{v} > 0$ for all $\mathbf{v}\neq\mathbf{0}$ satisfying $\langle \nabla h_i(\mathbf{x}^{\star}), \mathbf{v} \rangle = 0$ for all $i$ and $\langle \nabla f_i(\mathbf{x}^{\star}), \mathbf{v} \rangle \le 0$ for all active inequality constraints, then $\mathbf{x}^{\star}$ is a local minimum of $f_0$.
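One way to check this condition numerically is to restrict the Hessian of the Lagrangian to the tangent space and test positive definiteness there. The sketch below (the name `second_order_sufficient` is my own) simplifies by handling only the equality-constraint part of the condition, ignoring active inequalities:

```python
import numpy as np
from scipy.linalg import null_space

# Check v^T (grad^2_x L) v > 0 for all nonzero v with <grad h_i(x*), v> = 0,
# i.e., jac_h @ v = 0, by reducing the Hessian to the null space of jac_h.
def second_order_sufficient(hess_L, jac_h):
    Z = null_space(jac_h)              # columns span {v : jac_h @ v = 0}
    if Z.size == 0:
        return True                    # no nonzero tangent directions to test
    reduced = Z.T @ hess_L @ Z         # Hessian restricted to the tangent space
    return bool(np.all(np.linalg.eigvalsh(reduced) > 0))

# Toy usage: minimize x1^2 + x2^2 subject to x1 + x2 = 1; Hessian of L is 2I.
print(second_order_sufficient(2 * np.eye(2), np.array([[1.0, 1.0]])))  # True
```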
Define the Lagrange dual function $$ g(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}). $$ Denote the optimal value of the original problem by $p^\star=\inf_{\mathbf{x}\in C}f_0(\mathbf{x})$. For any $\boldsymbol{\lambda} \ge \mathbf{0}$ and any $\boldsymbol{\nu}$, we have \begin{eqnarray*} g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \le p^\star. \end{eqnarray*} Proof: For any feasible point $\tilde{\mathbf{x}}$, \begin{eqnarray*} \mathcal{L}(\tilde{\mathbf{x}}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = f_0(\tilde{\mathbf{x}}) + \sum_{i=1}^m \lambda_i f_i(\tilde{\mathbf{x}}) + \sum_{i=1}^p \nu_i h_i(\tilde{\mathbf{x}}) \le f_0(\tilde{\mathbf{x}}) \end{eqnarray*} because the second term is non-positive (since $\boldsymbol{\lambda}\ge\mathbf{0}$ and $f_i(\tilde{\mathbf{x}}) \le 0$) and the third term is zero. Then \begin{eqnarray*} g(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) \le \mathcal{L}(\tilde{\mathbf{x}}, \boldsymbol{\lambda}, \boldsymbol{\nu}) \le f_0(\tilde{\mathbf{x}}). \end{eqnarray*} Taking the infimum over feasible $\tilde{\mathbf{x}}$ gives the claim.
Each pair $(\boldsymbol{\lambda}, \boldsymbol{\nu})$ with $\boldsymbol{\lambda} \geq \mathbf{0}$ thus gives a lower bound on the optimal value $p^\star$, so it is natural to ask for the best possible lower bound for $p^{\star}$ the Lagrange dual function can provide. This leads to the Lagrange dual problem \begin{eqnarray*} &\text{maximize}& g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \\ &\text{subject to}& \boldsymbol{\lambda} \geq \mathbf{0}, \end{eqnarray*} which is a convex problem whether or not the primal problem is convex: $g$ is a pointwise infimum of affine functions of $(\boldsymbol{\lambda}, \boldsymbol{\nu})$ and hence concave.
Let $d^\star=\sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}}g(\boldsymbol{\lambda},\boldsymbol{\nu})$. Then we have weak duality \begin{eqnarray*} d^\star \le p^\star. \end{eqnarray*} The difference $p^\star - d^\star$ is called the duality gap.
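A one-dimensional toy problem (my own choice) makes weak duality concrete:

```python
import numpy as np

# Toy problem: minimize x^2 subject to 1 - x <= 0, so p* = 1 at x = 1.
# The dual function is g(lam) = inf_x { x^2 + lam (1 - x) } = lam - lam^2 / 4.
lam = np.linspace(0.0, 4.0, 9)
g = lam - lam**2 / 4.0
print(bool(np.all(g <= 1.0)))   # True: every g(lam) lower-bounds p* = 1
print(g.max())                  # 1.0, attained at lam = 2: zero duality gap
```

Since this toy problem is convex, the maximal lower bound actually attains $p^{\star}$, anticipating the strong duality discussion below.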
Strong duality refers to zero duality gap. This occurs when the problem is convex and the Lagrange multiplier rule holds at a feasible point $\hat{\mathbf{x}}$. Let the corresponding Lagrange multipliers be $(\hat{\boldsymbol{\lambda}}, \hat{\boldsymbol{\nu}})$. In this case, $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})$ is convex in $\mathbf{x}$. Since the Lagrange multiplier rule states that $\nabla_{\mathbf{x}}\mathcal{L}(\hat{\mathbf{x}}, \hat{\boldsymbol{\lambda}}, \hat{\boldsymbol{\nu}}) = \mathbf{0}$, the point $\hat{\mathbf{x}}$ furnishes an unconstrained minimum of the Lagrangian. Hence $$ p^{\star} \le f_0(\hat{\mathbf{x}}) = \mathcal{L}(\hat{\mathbf{x}}, \hat{\boldsymbol{\lambda}}, \hat{\boldsymbol{\nu}}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \hat{\boldsymbol{\lambda}}, \hat{\boldsymbol{\nu}}) = g(\hat{\boldsymbol{\lambda}}, \hat{\boldsymbol{\nu}}) \le d^{\star}. $$ (Why the first equality? Feasibility of $\hat{\mathbf{x}}$ makes the equality-constraint terms vanish, and complementary slackness kills the inequality-constraint terms.)
Combined with weak duality, $d^{\star} \le p^{\star}$, this gives $d^{\star}=p^{\star}$.
For convex problems, any condition that guarantees the Lagrange multiplier rule, e.g., Slater's condition, therefore also yields strong duality.
Consider the dual of the equality-constrained QP
\begin{eqnarray*} &\text{minimize}& \frac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x} \\ &\text{subject to}& \mathbf{A}\mathbf{x} = \mathbf{b}, \end{eqnarray*}where $\mathbf{P}\succ\mathbf{0}$.
The Lagrangian is $$ \mathcal{L}(\mathbf{x}, \boldsymbol{\nu}) = \frac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x} + \boldsymbol{\nu}^T(\mathbf{A}\mathbf{x}-\mathbf{b}). $$
The dual function is \begin{eqnarray*} g(\boldsymbol{\nu}) &=& \inf_{\mathbf{x}} \frac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x} + \boldsymbol{\nu}^T(\mathbf{A}\mathbf{x}-\mathbf{b}) \\ &=& \inf_{\mathbf{x}} \frac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + (\mathbf{A}^T\boldsymbol{\nu}+\mathbf{q})^T\mathbf{x} - \boldsymbol{\nu}^T\mathbf{b} \\ &=& -\frac{1}{2}(\mathbf{A}^T\boldsymbol{\nu}+\mathbf{q})^T\mathbf{P}^{-1}(\mathbf{A}^T\boldsymbol{\nu}+\mathbf{q}) - \boldsymbol{\nu}^T\mathbf{b}. \end{eqnarray*}
The infimum is attained at $$ \mathbf{x} = -\mathbf{P}^{-1}(\mathbf{A}^T\boldsymbol{\nu}+\mathbf{q}), $$ found by setting the gradient $\mathbf{P}\mathbf{x} + \mathbf{A}^T\boldsymbol{\nu} + \mathbf{q}$ of the strictly convex inner objective to zero.
Thus the dual problem is an unconstrained QP, which is easier to solve than the primal.
The maximum of $g(\boldsymbol{\nu})$ occurs where $$ \nabla g(\boldsymbol{\nu}) = -\mathbf{b} - \mathbf{A}\mathbf{P}^{-1}(\mathbf{A}^T\boldsymbol{\nu}+\mathbf{q}) = \mathbf{0}, $$ or $$ \boldsymbol{\nu} = -(\mathbf{A}\mathbf{P}^{-1}\mathbf{A}^T)^{-1}(\mathbf{b}+\mathbf{A}\mathbf{P}^{-1}\mathbf{q}). $$
Therefore the primal optimum is attained at $$ \mathbf{x}^{\star} = -\mathbf{P}^{-1}(-\mathbf{A}^T(\mathbf{A}\mathbf{P}^{-1}\mathbf{A}^T)^{-1}(\mathbf{b}+\mathbf{A}\mathbf{P}^{-1}\mathbf{q})+\mathbf{q}) = \mathbf{P}^{-1}\mathbf{A}^T(\mathbf{A}\mathbf{P}^{-1}\mathbf{A}^T)^{-1}(\mathbf{b}+\mathbf{A}\mathbf{P}^{-1}\mathbf{q}) - \mathbf{P}^{-1}\mathbf{q}. $$
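These closed-form expressions are easy to verify numerically against the KKT system $\mathbf{P}\mathbf{x} + \mathbf{q} + \mathbf{A}^T\boldsymbol{\nu} = \mathbf{0}$, $\mathbf{A}\mathbf{x} = \mathbf{b}$; the random data below are my own choosing:

```python
import numpy as np

# Random problem data; P is made symmetric positive definite by construction.
rng = np.random.default_rng(0)
n_var, n_eq = 5, 2
M = rng.standard_normal((n_var, n_var))
P = M @ M.T + n_var * np.eye(n_var)
q = rng.standard_normal(n_var)
A = rng.standard_normal((n_eq, n_var))
b = rng.standard_normal(n_eq)

Pinv = np.linalg.inv(P)
nu = -np.linalg.solve(A @ Pinv @ A.T, b + A @ Pinv @ q)   # dual optimum
x = -Pinv @ (A.T @ nu + q)                                # primal optimum

# x and nu together solve the KKT system: P x + q + A^T nu = 0 and A x = b.
print(np.allclose(P @ x + q + A.T @ nu, 0.0), np.allclose(A @ x, b))  # True True
```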
Note
$$ \sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}}\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = \sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}} f_0(\mathbf{x}) + \sum_{i=1}^m \lambda_i f_i(\mathbf{x}) + \sum_{i=1}^p \nu_i h_i(\mathbf{x}) = \begin{cases} f_0(\mathbf{x}) & \text{if } \mathbf{x} \in C \\ \infty & \text{otherwise} \end{cases} = f_0(\mathbf{x}) + \iota_{C}(\mathbf{x}) $$where $$ C = \{\mathbf{x}: f_i(\mathbf{x})\le 0,~i=1,\dotsc,m,~h_i(\mathbf{x})=0,~i=1,\dotsc,p\}. $$
Thus weak duality reads
$$ d^{\star} = \sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}} g(\boldsymbol{\lambda}, \boldsymbol{\nu}) = \sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}}\inf_{\mathbf{x}}\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) \le \inf_{\mathbf{x}}\sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}}\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}\in C} f_0(\mathbf{x}) = p^{\star}, $$which is an instance of the min-max inequality.
Strong duality amounts to proving the minimax equality
$$ \sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}}\inf_{\mathbf{x}}\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}}\sup_{\boldsymbol{\lambda}\ge\mathbf{0}, \boldsymbol{\nu}}\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}). $$
Boyd, S. and Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press. (Ch. 5)
Lange, K., 2016. MM optimization algorithms (Vol. 147). SIAM. (Ch. 3)
Lange, K., 2010. Numerical analysis for statisticians. Springer Science & Business Media. (Ch. 11)