Goal
Given a differentiable function , what is the condition for the minimizer of ? What if there are constraints? The general answer is the Karush-Kuhn-Tucker (KKT) condition, which is typically written in a coordinate-heavy form.
Is there a high level understanding of KKT condition using the notion of pullback and pushforward?
Karush-Kuhn-Tucker (KKT) Condition
The problem is that we want to
subject to for and for .
The KKT Theorem states that the optimizer of the above problem satisfies
High Level Understanding of KKT Condition
Imagine you are a ball rolling down a 3D hill and you want to minimize your energy function . Reaching the bottom of the hill is the optimal solution. However, there are some constraints on where you can go. Consider when . This is like saying there are rigid train tracks. The ball must stay exactly on this line. Consider the constraint . These are like fences. The ball can roll freely inside the fenced yard () , but cannot cross the boundary ().
The KKT Theorem states that if the ball is at the optimal solution , those three specific conditions must hold.
- This balances the forces. The gradient is gravity pulling the ball down the hill. Without constraints, the ball would stop where gravity is zero (). However, the train tracks and fences exert forces on the ball. It means the track and the fence are pushing the ball with exact, equal, and opposite forces to balance the gravity. The scalars and are the strength of the forces. The net force must be for the ball to be at rest.
- The second condition says the force from the fence must be pushing inward. If , the force from the fence is pushing outward, which means the ball is trying to cross the fence. This cannot be optimal because the ball can roll freely inside the fenced yard, so it can always find a better solution by rolling inside the yard.
- We know from condition 2 and from the constraint. Since every single term in the sum is non-positive, the only way for the sum to be is that every single term is . In particular, for to equal , one of two must be true.
- The ball is far away from the fence, meaning . Because the ball is not touching the fence, there is no force from the fence, so . Thus the product is .
- The ball is touching the fence, meaning . The product is still . (Note that ).
Definition (Annihilator Subspace)
The annihilator subspace of a subspace is a subspace defined by
It’s like the dual of the null space.
Definition (Polar Cone)
The polar cone of a subset is a subset defined by
Proposition (Convexity of Polar Cone)
The polar cone of any set is a convex cone.
Proposition (Polar Cone of a Subspace)
The polar cone of a subspace is the annihilator space.
Definition (Four Fundamental Subspaces)
The four fundamental subspaces of a linear map are the following four subspaces:
- Kernel
- Image
- Cokernel
- Coimage
Theorem (Fundamental Theorem of Linear Maps)
The four fundamental subspaces of a linear map satisfy the following properties:
-
- The coimage perfectly annihilates the kernel.
-
- The image perfectly annihilates the cokernel.
-
- The cokernel annihilates the image.
-
- The kernel annihilates the coimage. Recall that is the adjoint linear map of , where .
Proof: Use the dual pairing.
Let . We have . Since is adjoint, then
Let be some vector. As , we have RHS is and thus LHS is . In particular, it means as it annihilates the kernel.
The proof is similar for the other three properties.
Unconstrained Optimization
Let be a domain without boundary (e.g. ). Let be a smooth function. The problem is that we want to minimize for all . If is a minimizer, then
Equivalently, .
Equality Constraints
Here, we discuss the “train tracks” and how they affect the optimality condition.
Let be an -dimensional domain without boundary (e.g. M = ). Let be a surface without boundary. Typically,
In that case, is -dimensional. The problem is that we want to minimize for all . For an unconstrained problem on , the condition for an optimal is
Equivalently, .
Now suppose where is a vector space, and is given by
What is in terms of ? We observe that
Therefore by Theorem (Fundamental Theorem of Linear Maps),
The optimality condition is that there exists such that
In coordinate form,
The idea is that if we have found the minimizer point while trapped on a surface , then no matter what direction we take on from (i.e. the tangent space ), the energy function does not change (i.e. ). However, this means that is actually in the annihilator subspace of the tangent space. Via the fundamental theorem of linear maps, we get that is in the image of , such that there exists covector such that . The resulting equation is the classic Lagrange multiplier form.
Inequality Constraints
Here we discuss the “fences” and how they affect the optimality condition.
Definition (Tangent Cone)
Let be a domain without boundary (e.g. ). Let be a surface with boundary. Typically,
The problem is that we want to minimize for all .
For each , we define the tangent cone of at by
A tangent cone may be a subspace, the entire tangent space , or some non-convex cone. If is the minimizer of , then it means that there is no direction to go that decreases . In particular,
or that
which by definition of polar cone implies that
Physically, the covector points in the “downhill” direction. Since it lives inside the polar cone, then gravity is actively trying to push the ball through the fence, away from the valid yard. The fence stops the ball from falling further.
Translating to KKT via Pullbacks
We can map this back to our constraint functions . Think of as a map. Since the rule is that for all , we have an “admissible set” in :
- The differential map pushes the complex tangent cone forward, flattening it into the simple, negative-quadrant cone in Euclidean space.
- To find the polar cone on the manifold , we find the polar cone on the negative quadrant in and pullback via the adjoint map .
What is the polar cone of the negative quadrant? By definition, it is a strictly positive quadrant. Therefore, any covector living in polar cone must be a vector of positive numbers. Call this positive vector where for all . Therefore
where . So, when we write this in standard coordinate notation, the adjoint becomes the sum of the constraint gradients, yielding
Example 1
Consider the following diagram.
Let the configuration space be all center of mass and rotation.
Here, is the position of the center of mass in the world frame, and is the rotation angle. We see that is the rotation matrix and is the vector from the center of mass to the corner in the body frame. The corner of the box is
There is an inequality constraint that the corner must be above the ground. When the corner is contact with the ground, what is the polar tangent cone in center of mass and rotation?
So,
and so the height of the corner is represented by the following. Note that it must always be non-negative (in the air) and is equal to at the optimal solution (on the ground).
Here we calculated the pushforward which is the Jacobian. The partial derivatives tell us exactly how much the corner moves up or down if we nudge the box horizontally, vertically, or rotatationally.
To find the geometric KKT conditions, we must pull that 1D force backward into our 3D configuration space. We do this by multiplying the transposed Jacobian (the adjoint map!) by . Therefore, the polar tangent cone
- in the height space is .
- in space is
This gives the set of vectors
where the horizontal force is . This represents the fact that the ground does not push the box horizontally (i.e. no friction causing it to slide left or right). The vertical force (in the axis) is exactly the normal force from the ground, which is pushing up. It is the equal and exact opposite of the downward gravity force . The rotational force is on axis; it is precisely the torque from the ground pushing up on the corner of angle from the ground.
Recall that torque is the cross product of force and distance. The force is . The horizontal distance from the center of the mass to the corner is precisely .