In calculus, we take derivatives of functions of a few variables. In the calculus of variations, we take derivatives of functions of functions (also called functionals). The main purpose of this is to formulate optimization problems over function spaces and derive their optimality conditions (KKT conditions).
These optimality conditions are often differential equations, called Euler-Lagrange Equations, and most physical equations arise as Euler-Lagrange equations of some optimization problem. That way, we can “design one optimal functional” instead of “modeling forces”.
Example 1
Let
$$M = \{\, y : [0,T] \to \mathbb{R} \mid y(0) = y_0,\ y(T) = y_T \,\}$$
be the manifold of all possible valid trajectories of a particle in one dimension, from $y_0$ to $y_T$ in time $T$. A single "point" on this manifold is an entire function $y(t)$. Because it takes infinitely many values (one for each $t$), we can think of $M$ as an infinite-dimensional manifold.
The problem is, what is the optimality condition for
$$\min_{y \in M} E(y)$$
If we fix some $y \in M$, the tangent space $T_y M$ represents all valid perturbations we can apply to $y$ without leaving the manifold. Call such a perturbation $\mathring{y}$. For the perturbed path $y(t) + \varepsilon \mathring{y}(t)$ to remain in $M$, it must satisfy the fixed boundary conditions. In particular,
$$T_y M = \{\, \mathring{y} : [0,T] \to \mathbb{R} \mid \mathring{y}(0) = 0,\ \mathring{y}(T) = 0 \,\}$$
Visually, the blue curve at the bottom represents one example of a perturbation $\mathring{y}$, and the purple curves represent some examples of $y$.
The optimality condition is
$$dE_y[\mathring{y}] = 0$$
for all $\mathring{y} \in T_y M$, just like the first derivative test for optimality in regular calculus. Geometrically, this means the covector $dE_y$ itself is zero. I.e. if we were at the bottom of some "energy bowl", no matter which direction $\mathring{y}$ we perturb, the energy will not change (to first order). Indeed, for the energy $E(y) = \int_{t=0}^{T} \left( \frac{1}{2} y'(t)^2 + \cos(y(t)) \right) dt$,
$$
\begin{aligned}
0 = dE_y[\mathring{y}] &= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} E(y + \varepsilon \mathring{y}) \\
&= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_{t=0}^{T} \left( \frac{1}{2}\bigl(y'(t) + \varepsilon \mathring{y}'(t)\bigr)^2 + \cos\bigl(y(t) + \varepsilon \mathring{y}(t)\bigr) \right) dt \\
&= \int_{t=0}^{T} \bigl( y'\,\mathring{y}' - \sin(y)\,\mathring{y} \bigr)\, dt \\
&= -\int_{t=0}^{T} \bigl( y'' + \sin(y) \bigr)\,\mathring{y}\, dt \quad \text{(via integration by parts)}
\end{aligned}
$$

$$\implies 0 = y''(t) + \sin(y(t))$$
1. We expanded the definition of the directional derivative.
2. We applied the multivariate chain rule, since $\varepsilon$ appears in both $y$ and $y'$.
3. We took the partial derivative of $L$ with respect to $\varepsilon$. The use of $\langle \cdot, \cdot \rangle$ just denotes the dot product between the gradient and the perturbation; here, it is the inner product in $\mathbb{R}^m$.
4. We applied integration by parts to move the derivative from $\mathring{y}'$ to $\partial L / \partial y'$. The boundary term vanishes since $\mathring{y}(0) = \mathring{y}(T) = 0$.
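As a sanity check, the derivation above can be verified numerically. The following sketch (my own discretization — the grid size, test path, and perturbation are arbitrary illustrative choices, not from the text) compares $dE_y[\mathring{y}]$, computed by finite differences on a discretized energy, with the integrated-by-parts expression $-\int_0^T (y'' + \sin y)\,\mathring{y}\,dt$:

```python
import numpy as np

# Sketch: discretize E(y) = ∫ (1/2 y'^2 + cos y) dt on a uniform grid.
T, n = 1.0, 2001
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

def trapezoid(g):
    # trapezoid rule on the uniform grid
    return dt * (g[0] / 2 + g[1:-1].sum() + g[-1] / 2)

def E(y):
    yp = np.gradient(y, dt)
    return trapezoid(0.5 * yp**2 + np.cos(y))

y = np.sin(np.pi * t)   # some path with y(0) = y(T) = 0
ring = t * (T - t)      # a perturbation vanishing at both endpoints

# Directional derivative dE_y[ẙ] via central finite differences in ε
eps = 1e-6
dE = (E(y + eps * ring) - E(y - eps * ring)) / (2 * eps)

# The same quantity via the integrated-by-parts form -∫ (y'' + sin y) ẙ dt
ypp = np.gradient(np.gradient(y, dt), dt)
rhs = -trapezoid((ypp + np.sin(y)) * ring)

print(dE, rhs)  # the two values should agree up to discretization error
```

The agreement of the two numbers is exactly the content of the integration-by-parts step: pairing the perturbation with $-(y'' + \sin y)$ computes the directional derivative of the energy.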
The Euler-Lagrange equations are foundational to physics. For detailed derivations of Newton’s laws, conservation of energy, and rotational dynamics from variational principles, see Lagrangian Mechanics.
Example 2 (Hanging Chain)
What is the shape of a hanging chain between two poles? It is the curve with the lowest potential energy for a fixed total length.
Let the shape be the function graph
$$y = f(x)$$
First, we want to describe the physical geometry of the curve. Microscopically, any small section of the chain forms a right triangle with legs $dx$ and $dy$. By the Pythagorean theorem, the length of that section is
$$ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\, dx = \sqrt{1 + f'^2}\, dx$$
The total length is found by integrating $ds$ along the curve:
$$L(f) = \int_a^b ds = \int_a^b \sqrt{1 + f'^2}\, dx$$
The total gravitational potential energy (taking unit mass density and unit gravity, so each section contributes height times length) is
$$E(f) = \int_a^b f \sqrt{1 + f'^2}\, dx$$
We must minimize $E$. However, without any constraints, the chain would drop straight down (achieving arbitrarily negative potential energy). So we must constrain it to have a fixed length $\ell$. Thus, the constraint is $L(f) - \ell = 0$. The KKT condition for minimizing $E$ with constraint $L - \ell = 0$ says that there exists some $\lambda$ such that
$$dE_f[\mathring{f}] = \lambda\, dL_f[\mathring{f}] \quad \text{for all perturbations } \mathring{f}$$
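The two functionals can be sanity-checked numerically. A minimal sketch (my own discretization; $f(x) = \cosh(x)$ is chosen purely because $\sqrt{1 + f'^2} = \cosh(x)$, giving closed forms to compare against):

```python
import numpy as np

# Sketch: evaluate L(f) and E(f) numerically for f(x) = cosh(x) on [-1, 1],
# where sqrt(1 + f'^2) = sqrt(1 + sinh^2 x) = cosh x, so L = 2 sinh(1) exactly.
a, b, n = -1.0, 1.0, 20001
x = np.linspace(a, b, n)
dx = x[1] - x[0]
f, fp = np.cosh(x), np.sinh(x)   # f and its analytic derivative f'

def trapezoid(g):
    return dx * (g[0] / 2 + g[1:-1].sum() + g[-1] / 2)

length = trapezoid(np.sqrt(1 + fp**2))      # L(f) = ∫ sqrt(1 + f'^2) dx
energy = trapezoid(f * np.sqrt(1 + fp**2))  # E(f) = ∫ f sqrt(1 + f'^2) dx

print(length, 2 * np.sinh(1))   # numeric length vs closed form
```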
Example 3 (Spring Network)
Consider a graph $G = (V, E)$ where each vertex is assigned a mass $m_i$ and position $x_i$, and each edge $e$ is assigned a spring with spring constant $k_e$. The total kinetic and potential energies are
$$K = \sum_{i \in V} \frac{1}{2} m_i |\dot{x}_i|^2 \qquad U = \sum_{e=(i,j) \in E} \frac{1}{2} k_e |x_i - x_j|^2$$
The Euler-Lagrange equation for the Lagrangian $L = K - U$ gives
$$\frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} = \frac{\partial L}{\partial x_i} \implies m_i \ddot{x}_i = -\frac{\partial U}{\partial x_i}$$
We can find the derivative with respect to one vertex position:
$$\frac{\partial U}{\partial x_i} = \sum_{e = (i,j) \in E} k_e (x_i - x_j)$$
where the sum runs over the edges incident to vertex $i$.
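A minimal sketch of this per-vertex derivative of $U$ (the toy graph, positions, and spring constants below are made-up illustrative values), checked against a finite-difference derivative:

```python
import numpy as np

# Hypothetical toy graph: 3 vertices in 2D, springs on edges (0,1) and (1,2).
edges = [(0, 1), (1, 2)]
k = {(0, 1): 2.0, (1, 2): 3.0}
x = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

def U(x):
    # U = Σ_e (1/2) k_e |x_i - x_j|^2
    return sum(0.5 * k[e] * np.sum((x[e[0]] - x[e[1]])**2) for e in edges)

def dU_dxi(x, i):
    # ∂U/∂x_i = Σ_{edges touching i} k_e (x_i - x_j)
    g = np.zeros(2)
    for (a, b) in edges:
        if a == i:
            g += k[(a, b)] * (x[a] - x[b])
        elif b == i:
            g += k[(a, b)] * (x[b] - x[a])
    return g

# Check against a central finite-difference derivative of U at vertex i
i, eps = 1, 1e-6
fd = np.zeros(2)
for d in range(2):
    xp, xm = x.copy(), x.copy()
    xp[i, d] += eps
    xm[i, d] -= eps
    fd[d] = (U(xp) - U(xm)) / (2 * eps)

print(dU_dxi(x, i), fd)  # the two gradients should match
```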
One interesting thing to note is that this is similar to a feedforward neural network. We input raw vertex positions and they transform step by step into a single scalar output $U$:
1. The first transformation layer subtracts the position of the source vertex from the position of the destination vertex to get the edge vector.
2. The second transformation layer takes the magnitude of the edge vector to get the current physical length of the spring.
3. The third transformation layer computes the local energy of each edge via the formula $\frac{1}{2} k_e \ell_e^2$.
4. The sum of the individual spring energies gives us the total potential energy of the system.
Now, like in machine learning, we do a backward pass. To find the force on a specific vertex $i$, we must compute $\partial U / \partial x_i$. Because the energy was calculated through a chain of operations, we must use the multivariate chain rule to backpropagate the gradient through the computational graph.
On each edge $e$ of the graph, we can compute $\sigma_e = k_e (dx)_e$, which is the stress (stiffness times strain) on that edge. To get the actual force $f_i$ acting on each node, the chain rule requires us to sum up the stress contributions from all the edges that touch that specific node. Indeed,
$$f_i = \operatorname{div}(\sigma)_i$$
where div is the graph divergence operator. Thus, the force on a node is the aggregate of the stress from all edges touching that node.
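The forward and backward passes can be sketched on a toy 1D chain (all values, and the sign convention $f_i = -\partial U / \partial x_i$, are my own illustrative choices; sign conventions for the divergence vary):

```python
import numpy as np

# Hypothetical chain of 4 vertices in 1D with unit springs.
edges = [(0, 1), (1, 2), (2, 3)]
k = np.array([1.0, 1.0, 1.0])
x = np.array([0.0, 1.5, 2.0, 3.0])

# Forward pass: edge vectors -> local energies -> total energy U
dx = np.array([x[j] - x[i] for (i, j) in edges])  # "first layer": d x
U = np.sum(0.5 * k * dx**2)

# Backward pass: stress per edge, then accumulate into incident nodes
sigma = k * dx                  # σ_e = k_e (dx)_e
grad = np.zeros_like(x)         # ∂U/∂x_i, assembled edge by edge
for e, (i, j) in enumerate(edges):
    grad[j] += sigma[e]         # each edge contributes to both endpoints,
    grad[i] -= sigma[e]         # with opposite signs
force = -grad                   # f_i = -∂U/∂x_i

print(force)
```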
Remark
But of course, since this is just like machine learning, we can efficiently compute the forces on all nodes by matrix multiplications. Truly, we have
$$M\ddot{x} = -d^\top K\, d\, x$$
where the left side is global Newton's second law $F = ma$ and the right side is the restorative spring forces. Note that $K(dx)$ here is Hooke's law (forces on linear springs). The $d^\top$ is actually the adjoint linear map (recall that the transpose is the adjoint of a linear map with respect to the standard inner product), acting as the discrete divergence. The multiplication $d^\top(K\,dx)$ sums up all the converging spring tensions onto their shared nodes, calculating the net force on each node.
It’s particularly important to mention the Graph Laplacian shown in the diagram. The final system becomes
$$M\ddot{x} = -Lx$$
where $L$ encodes the connectivity of the graph and the stiffness of each edge.
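A minimal matrix-form sketch (a made-up 1D chain; `d` below is the signed edge-vertex incidence matrix acting as the discrete difference):

```python
import numpy as np

# Toy 1D chain: build d, K = diag(k_e), and the Laplacian L = dᵀ K d,
# so the net spring forces are -Lx = -dᵀ K d x.
edges = [(0, 1), (1, 2), (2, 3)]
k = np.array([1.0, 1.0, 1.0])
x = np.array([0.0, 1.5, 2.0, 3.0])

d = np.zeros((len(edges), len(x)))
for e, (i, j) in enumerate(edges):
    d[e, i], d[e, j] = -1.0, 1.0   # (d x)_e = x_j - x_i

K = np.diag(k)
L = d.T @ K @ d                    # weighted graph Laplacian
force = -L @ x                     # right-hand side of M ẍ = -L x

print(force)
```

Here `L` is symmetric (an adjoint composed with its transpose around a diagonal matrix), which is what makes the spring system's dynamics well-behaved.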