Goal

Given a differentiable function $f : M \to R$ , what is the condition for the minimizer of $f$ ? What if there are constraints? The general answer is the Karush-Kuhn-Tucker (KKT) condition, which is typically written in a coordinate-heavy form.

Is there a high-level understanding of the KKT condition using the notions of pullback and pushforward?

Karush-Kuhn-Tucker (KKT) Condition

The problem is that we want to

\min_{x \in R^{n}} E (x)

subject to $g_{i} (x) = 0$ for $i = 1, \dots, m$ and $h_{j} (x) \leq 0$ for $j = 1, \dots, ℓ$ .

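The KKT Theorem states that, under a suitable constraint qualification, for an optimizer $x_{0}$ of the above problem there exist multipliers $λ_{i}$ and $μ_{j}$ satisfying the following three conditions: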

\nabla E ∣_{x_{0}} + \sum_{i = 1}^{m} λ_{i} \nabla g_{i} ∣_{x_{0}} + \sum_{j = 1}^{ℓ} μ_{j} \nabla h_{j} ∣_{x_{0}} = 0

μ_{1}, \dots, μ_{ℓ} \geq 0

\sum_{j = 1}^{ℓ} μ_{j} h_{j} (x_{0}) = 0

High Level Understanding of KKT Condition

Imagine you are a ball rolling down a 3D hill and you want to minimize your energy function $E (x)$ . Reaching the bottom of the hill is the optimal solution. However, there are some constraints on where you can go. Consider when $g_{i} (x) = 0$ . This is like saying there are rigid train tracks. The ball must stay exactly on this line. Consider the constraint $h_{j} (x) \leq 0$ . These are like fences. The ball can roll freely inside the fenced yard ( $h_{j} (x) < 0$ ) , but cannot cross the boundary ( $h_{j} (x) = 0$ ).

The KKT Theorem states that if the ball is at the optimal solution $x_{0}$ , those three specific conditions must hold.

The first condition balances the forces. The negative gradient $- \nabla E ∣_{x_{0}}$ is gravity pulling the ball down the hill. Without constraints, the ball would stop where gravity is zero ( $\nabla E ∣_{x_{0}} = 0$ ). However, the train tracks and fences exert forces on the ball. The condition says that the tracks (along $\nabla g_{i}$ ) and the fences (along $\nabla h_{j}$ ) push the ball with exactly the force needed to cancel gravity. The scalars $λ_{i}$ and $μ_{j}$ are the strengths of those forces. The net force must be $0$ for the ball to be at rest.
The second condition says that a fence can only push inward, never pull. If $μ_{j} < 0$ , the fence would have to pull the ball toward itself to hold it in place, which means gravity is pushing the ball away from the fence and back into the yard. This cannot be optimal because the ball can roll freely inside the fenced yard, so it can always find a better solution by rolling inside the yard.
For the third condition, we know $μ_{j} \geq 0$ from condition 2 and $h_{j} (x_{0}) \leq 0$ from the constraint. Since every single term in the sum $\sum_{j = 1}^{ℓ} μ_{j} h_{j} (x_{0})$ is non-positive, the only way for the sum to be $0$ is that every single term is $0$ . In particular, for $μ_{j} h_{j} (x_{0})$ to equal $0$ , one of two cases must hold.
1. The ball is away from the fence, meaning $h_{j} (x_{0}) < 0$ . Because the ball is not touching the fence, there is no force from the fence, so $μ_{j} = 0$ . Thus the product is $0$ .
2. The ball is touching the fence, meaning $h_{j} (x_{0}) = 0$ . The product is $0$ regardless of the force, so the fence is allowed to push ( $μ_{j} \geq 0$ , possibly with $μ_{j} = 0$ ).
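As a rough numerical illustration of the three conditions, here is a minimal sketch on a toy problem that is not from the notes: minimize $E (x) = (x_{1} - 2)^{2} + (x_{2} - 1)^{2}$ subject to the single fence $h (x) = x_{1}^{2} + x_{2}^{2} - 1 \leq 0$ . The problem, the names `grad_E`, `grad_h`, and the use of least squares to recover $μ$ are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the notes):
#   minimize E(x) = (x1 - 2)^2 + (x2 - 1)^2  subject to  h(x) = x1^2 + x2^2 - 1 <= 0
target = np.array([2.0, 1.0])

def grad_E(x):
    return 2.0 * (x - target)

def h(x):
    return x @ x - 1.0

def grad_h(x):
    return 2.0 * x

# The minimizer is the point of the unit disk closest to `target`.
x0 = target / np.linalg.norm(target)

# Condition 1: solve grad_E + mu * grad_h = 0 for the single multiplier mu.
sol, *_ = np.linalg.lstsq(grad_h(x0).reshape(-1, 1), -grad_E(x0), rcond=None)
mu = float(sol[0])
stationarity_residual = np.linalg.norm(grad_E(x0) + mu * grad_h(x0))

print("stationarity residual:", stationarity_residual)  # close to 0
print("condition 2, mu >= 0:", mu >= 0)
print("condition 3, |mu * h(x0)|:", abs(mu * h(x0)))     # close to 0
```

Here the ball rests against the fence, so the multiplier is strictly positive and $h (x_{0}) = 0$ makes the product vanish.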

Definition (Annihilator Subspace)

The annihilator subspace of a subspace $W \subset U$ is a subspace $W^{\circ} \subset U^{*}$ defined by

W^{\circ} := {λ \in U^{*} ∣ ⟨ λ ∣ w ⟩ = 0, \forall w \in W}

It is the set of all covectors that vanish on $W$ , the dual-space counterpart of an orthogonal complement.

Definition (Polar Cone)

The polar cone of a subset $S \subset U$ is a subset $S^{\circ} \subset U^{*}$ defined by

S^{\circ} := {λ \in U^{*} ∣ ⟨ λ ∣ s ⟩ \leq 0, \forall s \in S}

Proposition (Convexity of Polar Cone)

The polar cone of any set is a convex cone.

Proposition (Polar Cone of a Subspace)

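The polar cone of a subspace is the annihilator subspace. Indeed, if $w$ is in the subspace then so is $- w$ , so $⟨ λ ∣ w ⟩ \leq 0$ and $⟨ λ ∣ - w ⟩ \leq 0$ together force $⟨ λ ∣ w ⟩ = 0$ .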
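A small sketch of these definitions for cones generated by finitely many vectors in $R^{2}$ , identifying covectors with coordinate vectors. The helper name `in_polar_cone` and the example cones are assumptions for illustration. Because the pairing is linear, it is enough to test the inequality on the generators.

```python
import numpy as np

def in_polar_cone(lam, generators, tol=1e-12):
    """True if <lam | s> <= 0 for every generator s (the rows of `generators`)."""
    return bool(np.all(generators @ lam <= tol))

# Cone generated by the two coordinate directions: the first quadrant of the plane.
quadrant = np.array([[1.0, 0.0], [0.0, 1.0]])

# The subspace spanned by (1, 1), encoded as a cone with generators +v and -v.
line = np.array([[1.0, 1.0], [-1.0, -1.0]])

print(in_polar_cone(np.array([-1.0, -2.0]), quadrant))  # True
print(in_polar_cone(np.array([1.0, -2.0]), quadrant))   # False
print(in_polar_cone(np.array([1.0, -1.0]), line))       # True: it annihilates (1, 1)
print(in_polar_cone(np.array([1.0, 0.0]), line))        # False
```

For the line, the only covectors that pass the test are those that pair to exactly zero with $(1, 1)$ , which matches the proposition on subspaces.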

Definition (Four Fundamental Subspaces)

The four fundamental subspaces of a linear map $A : U \to V$ are the following four subspaces:

Kernel

ker (A) := {u \in U ∣ A u = 0} \subset U

Image

im (A) := {A u \in V ∣ u \in U} \subset V

Cokernel

ker (A^{*}) := {λ \in V^{*} ∣ A^{*} λ = 0} \subset V^{*}

Coimage

im (A^{*}) := {A^{*} λ \in U^{*} ∣ λ \in V^{*}} \subset U^{*}

Theorem (Fundamental Theorem of Linear Maps)

The four fundamental subspaces of a linear map $A : U \to V$ satisfy the following properties:

1. $ker (A)^{\circ} = im (A^{*})$ . The coimage perfectly annihilates the kernel.
2. $ker (A^{*})^{\circ} = im (A)$ . The image perfectly annihilates the cokernel.
3. $im (A)^{\circ} = ker (A^{*})$ . The cokernel annihilates the image.
4. $im (A^{*})^{\circ} = ker (A)$ . The kernel annihilates the coimage.

Recall that $A^{*}$ is the adjoint linear map of $A$ , where $A^{*} : V^{*} \to U^{*}$ .

Proof: Use the dual pairing.

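We prove the first property. Let $λ \in V^{*}$ , so that $A^{*} λ \in im (A^{*})$ . By definition of the adjoint,

⟨ A^{*} λ ∣ u ⟩ = ⟨ λ ∣ A u ⟩

Let $u \in ker (A)$ . As $A u = 0$ , the RHS is $0$ and thus the LHS is $0$ . This means $A^{*} λ \in ker (A)^{\circ}$ , as it annihilates the kernel, so $im (A^{*}) \subset ker (A)^{\circ}$ . In finite dimensions both subspaces have dimension $rank (A)$ , so they are equal.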

The proof is similar for the other three properties.
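The first property can be checked numerically in coordinates, where the adjoint is the matrix transpose. The following is a minimal sketch with a randomly chosen $2 \times 4$ matrix; the matrix size, the seed, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # a linear map A : R^4 -> R^2

# Orthonormal basis of ker(A): right singular vectors beyond the rank.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
kernel_basis = Vt[rank:].T        # columns span ker(A)

# Every covector A^T lam pairs to zero with every kernel vector ...
lam = rng.standard_normal(2)
pairing = (A.T @ lam) @ kernel_basis
print(np.allclose(pairing, 0.0))

# ... and the dimensions agree: dim im(A^*) = n - dim ker(A) = dim ker(A)^o.
print(rank == A.shape[1] - kernel_basis.shape[1])
```

The first check is the inclusion $im (A^{*}) \subset ker (A)^{\circ}$ from the proof; the second is the dimension count that upgrades it to equality.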

Unconstrained Optimization

Let $M$ be a domain without boundary (e.g. $M = R^{n}$ ). Let $E : M \to R$ be a smooth function. We want to minimize $E (x)$ over $x \in M$ . If $x_{0}$ is a minimizer, then

d E ∣_{x_{0}} [[\mathring{x}]] = 0, \forall \mathring{x} \in T_{x_{0}} M

Equivalently, $d E ∣_{x_{0}} = 0$ .

Equality Constraints

Here, we discuss the “train tracks” and how they affect the optimality condition.

Let $M$ be an $n$ -dimensional domain without boundary (e.g. $M = R^{n}$ ). Let $S \subset M$ be a surface without boundary. Typically,

S = {x \in M ∣ g (x) = 0} for some g : M \to R^{m}

In that case, $S$ is $(n - m)$ -dimensional, provided the differential of $g$ has full rank $m$ along $S$ . We want to minimize $E (x)$ over $x \in S$ . Viewing this as an unconstrained problem on $S$ , the condition for an optimal $x_{0} \in S$ is

d E ∣_{x_{0}} [[\mathring{x}]] = 0, \forall \mathring{x} \in T_{x_{0}} S \subset T_{x_{0}} M

Equivalently, $d E ∣_{x_{0}} \in (T_{x_{0}} S)^{\circ}$ .

Now suppose $g : M \to Y$ where $Y$ is a vector space, and $S$ is given by

S = g^{- 1} ({0_{Y}}) = {x \in M ∣ g (x) = 0_{Y}}

What is $(T_{x_{0}} S)^{\circ}$ in terms of $g$ ? Assuming $d g ∣_{x_{0}}$ is surjective, we observe that

T_{x_{0}} S = ker (d g ∣_{x_{0}})

Therefore by Theorem (Fundamental Theorem of Linear Maps),

(T_{x_{0}} S)^{\circ} = (ker (d g ∣_{x_{0}}))^{\circ} = im (d g ∣_{x_{0}}^{*}) = {d g ∣_{x_{0}}^{*} [[λ]] \in T_{x_{0}} M^{*} ∣ λ \in Y^{*}}

The optimality condition is that there exists $λ \in Y^{*}$ such that

d E ∣_{x_{0}} = d g ∣_{x_{0}}^{*} [[λ]]

In coordinate form,

\frac{\partial E}{\partial x_{i}} ∣_{x_{0}} = \sum_{α} λ_{α} \frac{\partial g_{α}}{\partial x_{i}} ∣_{x_{0}}

The idea is that if we have found the minimizer $x_{0}$ while trapped on a surface $S$ , then no matter what direction we take on $S$ from $x_{0}$ (i.e. any vector in the tangent space $T_{x_{0}} S$ ), the energy function $E$ does not change to first order (i.e. $d E ∣_{x_{0}} [[\mathring{x}]] = 0$ ). This means that $d E ∣_{x_{0}}$ lies in the annihilator subspace of the tangent space. By the fundamental theorem of linear maps, $d E ∣_{x_{0}}$ is in the image of $d g ∣_{x_{0}}^{*}$ , so there exists a covector $λ \in Y^{*}$ such that $d E ∣_{x_{0}} = d g ∣_{x_{0}}^{*} [[λ]]$ . The resulting equation is the classic Lagrange multiplier form.
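To make the annihilator argument concrete, here is a minimal sketch on a toy problem that is not from the notes: minimize $E (x) = x_{1} + x_{2}$ on the unit circle $g (x) = x_{1}^{2} + x_{2}^{2} - 1 = 0$ . The problem and the variable names are assumptions; in coordinates the adjoint $d g^{*}$ is the transposed Jacobian.

```python
import numpy as np

# Hypothetical example (an assumption, not from the notes):
#   minimize E(x) = x1 + x2 on the unit circle g(x) = x1^2 + x2^2 - 1 = 0.
x0 = -np.ones(2) / np.sqrt(2.0)      # the minimizer on the circle
dE = np.array([1.0, 1.0])            # differential of E at x0
dg = (2.0 * x0).reshape(1, 2)        # Jacobian of g at x0, shape (m, n) = (1, 2)

# The tangent space T_{x0} S = ker(dg) is spanned by the rotated position vector.
tangent = np.array([-x0[1], x0[0]])
print(np.allclose(dg @ tangent, 0.0))   # tangent lies in ker(dg)
print(np.isclose(dE @ tangent, 0.0))    # dE annihilates the tangent space

# Solve dE = dg^T lam for the Lagrange multiplier lam.
lam, *_ = np.linalg.lstsq(dg.T, dE, rcond=None)
print(np.allclose(dg.T @ lam, dE))      # dE lies in im(dg^*)
```

The least-squares solve reproduces $d E$ exactly, which is the statement that $d E ∣_{x_{0}}$ is in the image of the adjoint.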

Inequality Constraints

Here we discuss the “fences” and how they affect the optimality condition.

Definition (Tangent Cone)

Let $M$ be a domain without boundary (e.g. $M = R^{n}$ ). Let $S \subset M$ be a region with boundary. Typically,

S = {x \in M ∣ h_{i} (x) \leq 0, i = 1, \dots, ℓ}

We want to minimize $E (x)$ over $x \in S$ .

For each $x \in S$ , we define the tangent cone of $S$ at $x$ by

C_{x} := {\mathring{x} \in T_{x} M ∣ \mathring{x} = \frac{d γ (t)}{d t} ∣_{t = 0} for some curve γ : [0, 1] \to S with γ (0) = x}

A tangent cone may be a subspace, the entire tangent space $T_{x} M$ , or some non-convex cone. If $x_{0}$ is the minimizer of $E$ , then there is no admissible direction that decreases $E$ to first order. In particular, for every $\mathring{x} \in C_{x_{0}}$ ,

d E ∣_{x_{0}} [[\mathring{x}]] \geq 0

or that

- d E ∣_{x_{0}} [[\mathring{x}]] \leq 0

which by definition of polar cone implies that

- d E ∣_{x_{0}} \in C_{x_{0}}^{\circ}

Physically, the covector $- d E$ points in the “downhill” direction. Since it lives inside the polar cone, gravity is either zero or actively trying to push the ball through the fence, away from the valid yard. The fence stops the ball from falling further.

Translating to KKT via Pullbacks

We can map this back to our constraint functions $h_{1}, \dots, h_{ℓ}$ . Think of $h = (h_{1}, \dots, h_{ℓ}) : M \to R^{ℓ}$ as a map. Since the rule is that $h (x) \leq 0$ for all $x \in S$ , we have an “admissible set” in $R^{ℓ}$ :

A := {z \in R^{ℓ} ∣ z_{1} \leq 0, \dots, z_{ℓ} \leq 0}

Write $C_{S, x}$ for the tangent cone of $S$ at $x$ and $C_{A, h (x)}$ for the tangent cone of $A$ at $h (x)$ . Under a suitable regularity assumption on the constraints (a constraint qualification), the differential map $d h_{x}$ pushes the complicated tangent cone $C_{S, x}$ forward, flattening it into the simple, negative-orthant cone $C_{A, h (x)}$ in Euclidean space.

(d h_{x})_{*} C_{S, x} = C_{A, h (x)}

To find the polar cone on the manifold $M$ , we find the polar cone of the negative orthant in $R^{ℓ}$ and pull it back via the adjoint map $(d h_{x})^{*}$ .

C_{S, x}^{\circ} = (d h_{x})^{*} C_{A, h (x)}^{\circ}

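What is the polar cone of the negative orthant? By definition, it is the non-negative orthant (zero entries are allowed). Therefore, any covector living in the polar cone $C_{A, h (x)}^{\circ}$ must be a vector of non-negative numbers. Call this vector $μ$ , where $μ_{j} \geq 0$ for all $j$ . If constraint $j$ is inactive ( $h_{j} (x) < 0$ ), the cone $C_{A, h (x)}$ is unrestricted in the $z_{j}$ direction, which forces $μ_{j} = 0$ ; this is the complementary slackness condition. Therefore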

- d E = (d h_{x})^{*} [[μ]]

where $μ_{j} \geq 0$ . So, when we write this in standard coordinate notation, the adjoint $(d h_{x})^{*}$ becomes the sum of the constraint gradients, yielding

- d E = \sum_{j = 1}^{ℓ} μ_{j} d h_{j}
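The pullback formula can be tried on a toy problem with two active fences and one inactive fence. This is a sketch under assumptions: the problem below is not from the notes, the active set is detected by testing $h_{j} (x_{0}) = 0$ numerically, and the inactive multiplier is set to zero.

```python
import numpy as np

# Hypothetical example (an assumption, not from the notes):
#   minimize E(x) = (x1 - 2)^2 + (x2 - 2)^2 subject to
#   h1 = x1 - 1 <= 0,  h2 = x2 - 1 <= 0,  h3 = -x1 - 5 <= 0.
x0 = np.array([1.0, 1.0])                      # minimizer: a corner of the yard
dE = 2.0 * (x0 - np.array([2.0, 2.0]))         # differential of E at x0
h = np.array([x0[0] - 1.0, x0[1] - 1.0, -x0[0] - 5.0])
dh = np.array([[1.0, 0.0],                     # Jacobian of h, one row per fence
               [0.0, 1.0],
               [-1.0, 0.0]])

# Only fences the ball touches can push: inactive multipliers stay zero.
active = np.isclose(h, 0.0)
mu_active, *_ = np.linalg.lstsq(dh[active].T, -dE, rcond=None)
mu = np.zeros(len(h))
mu[active] = mu_active

print(np.allclose(dh.T @ mu, -dE))   # -dE = (dh)^* mu
print(bool(np.all(mu >= 0)))         # mu lies in the non-negative orthant
print(np.isclose(mu @ h, 0.0))       # complementary slackness
```

Both touched fences push with positive strength here, while the far fence contributes nothing.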

Example 1

Consider the following diagram.

[Figure: a rectangular box tilted on flat ground and resting on one corner. From the origin $O$ , the vector $c_{world}$ points to the center of the box; from the center, the vector labeled $d_{body}$ points to the contact corner, which sits at $c_{world} + d_{world}$ , where $d_{world} = R^{θ} d_{body}$ .]

Let the configuration space consist of all positions of the center of mass together with all rotation angles.

M = {(c_{world}, θ) \in R^{2} \times S^{1}}

Here, $c_{world} = [c_{1}, c_{2}]^{⊤}$ is the position of the center of mass in the world frame, and $θ$ is the rotation angle. We see that $R^{θ}$ is the rotation matrix and $d_{body}$ is the vector from the center of mass to the corner in the body frame. The corner of the box is

x = c_{world} + R^{θ} d_{body}

There is an inequality constraint that the corner must be above the ground. When the corner is in contact with the ground, what is the polar tangent cone in terms of the center of mass and the rotation?

So,

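x (c, θ) = \begin{bmatrix} x \\ y \end{bmatrix} = c_{world} + R^{θ} d_{body} = \begin{bmatrix} c_{1} \\ c_{2} \end{bmatrix} + \begin{bmatrix} cos θ & - sin θ \\ sin θ & cos θ \end{bmatrix} \begin{bmatrix} d_{1} \\ d_{2} \end{bmatrix}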

and so the height of the corner is represented by the following. Note that it must always be non-negative (in the air) and is equal to $0$ at the optimal solution (on the ground).

y (c, θ) = d_{1} sin θ + d_{2} cos θ + c_{2}

\begin{bmatrix} \frac{\partial y}{\partial c_{1}} & \frac{\partial y}{\partial c_{2}} & \frac{\partial y}{\partial θ} \end{bmatrix} = \begin{bmatrix} 0 & 1 & d_{1} cos θ - d_{2} sin θ \end{bmatrix}

Here we calculated the pushforward, which in coordinates is the Jacobian. The partial derivatives tell us exactly how much the corner moves up or down if we nudge the box horizontally, vertically, or rotationally.

To find the geometric KKT conditions, we must pull the 1D force $μ$ back into our 3D configuration space. We do this by multiplying the transposed Jacobian (the adjoint map, written with $^{*}$ ) by $μ$ . Therefore, the polar tangent cone in the height space is $\{μ \leq 0\}$ , and in $(c, θ)$ space it is

\left\{ \begin{bmatrix} 0 \\ 1 \\ d_{1} cos θ - d_{2} sin θ \end{bmatrix} μ ∣ μ \leq 0 \right\}

This gives the set of vectors

\left\{ \begin{bmatrix} 0 \\ μ \\ μ (d_{1} cos θ - d_{2} sin θ) \end{bmatrix} ∣ μ \leq 0 \right\}

where the horizontal force is $0$ . This represents the fact that the ground does not push the box horizontally (i.e. no friction causing it to slide left or right). The vertical component (along the $c_{2}$ axis) is $μ \leq 0$ : it is the downward force $- d E$ , and the normal force from the ground is its exact opposite, pushing up with strength $| μ |$ . The rotational component is along the $θ$ axis; its magnitude is precisely the torque about the center of mass produced by the ground pushing up on the corner when the box is at angle $θ$ .

Recall that torque is the cross product of the lever arm and the force. The force has magnitude $| μ |$ and is vertical, so only the horizontal part of the lever arm matters. The horizontal distance from the center of mass to the corner is precisely $d_{1} cos θ - d_{2} sin θ$ .
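The Jacobian and the pulled-back force in this example can be checked numerically. The sketch below compares the analytic Jacobian with central finite differences and then forms one element of the polar cone; the values of `d_body`, `theta`, and `mu`, as well as the function names, are assumptions chosen for illustration.

```python
import numpy as np

def corner_height(q, d):
    """Height y of the corner for configuration q = (c1, c2, theta)."""
    _, c2, theta = q
    return d[0] * np.sin(theta) + d[1] * np.cos(theta) + c2

def jacobian_analytic(q, d):
    """Row vector [dy/dc1, dy/dc2, dy/dtheta] from the formula in the text."""
    theta = q[2]
    return np.array([0.0, 1.0, d[0] * np.cos(theta) - d[1] * np.sin(theta)])

def jacobian_numeric(q, d, eps=1e-6):
    """Central finite differences of corner_height, one coordinate at a time."""
    J = np.zeros(3)
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        J[i] = (corner_height(q + step, d) - corner_height(q - step, d)) / (2 * eps)
    return J

d_body = np.array([1.0, -0.5])   # assumed corner offset in the body frame
theta = 0.3                      # assumed rotation angle
# Place the center so that the corner touches the ground (y = 0).
c2 = -(d_body[0] * np.sin(theta) + d_body[1] * np.cos(theta))
q = np.array([0.0, c2, theta])

J = jacobian_analytic(q, d_body)
print(np.allclose(J, jacobian_numeric(q, d_body), atol=1e-6))

# Pull a 1D covector mu <= 0 back to (c, theta) space with the transposed Jacobian.
mu = -9.81                       # assumed magnitude, for illustration only
pulled_back = J * mu
horizontal_offset = d_body[0] * np.cos(theta) - d_body[1] * np.sin(theta)
print(pulled_back[0] == 0.0, np.isclose(pulled_back[2], mu * horizontal_offset))
```

The first component is zero (no horizontal push), the second equals $μ$ , and the third is $μ$ times the horizontal offset of the corner, matching the torque interpretation.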
