Nonlinear Equations#
Information | Details
---|---
Lead Author | Hantao Cui
Learning Objectives | • Understand nonlinear equation fundamentals
Prerequisites | Python basics, NumPy knowledge
Estimated Time | 120 minutes
Topics | Nonlinear equations, Newton-Raphson method, numerical methods, root finding, residual functions, SciPy solvers
Introduction to Nonlinear Equations#
Nonlinear equations are ubiquitous in scientific computing. In power systems, AC power flow equations are nonlinear equations.
The standard form of nonlinear equations is:

\[f(x) = 0\]

where \(f\) is a nonlinear function and \(x\) is the variable. In other words, solving nonlinear equations means finding the roots of a function. This function is called the residual function; it becomes small enough (zero at the tolerance level) when a solution is found.
How Computers Solve Equations#
It's important to understand that computers don't directly "solve" equations in the way humans might algebraically manipulate them. Instead, computers find solutions through numerical methods that locate where functions equal zero. This key insight is fundamental to computational approaches:
To solve an equation f(x) = g(x), computers reformulate it as finding roots of f(x) - g(x) = 0.
This reformulated function is called a residual function, which should approach zero (within tolerance) when a solution is found.
For example, to solve the equation x² = 4, a computer actually finds where the residual function r(x) = x² - 4 equals zero. The roots of this residual function (x = 2 and x = -2) are the solutions to the original equation.
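As a concrete sketch of this idea in code (using scipy.optimize.fsolve, which is introduced in detail later in this chapter; the starting guesses are arbitrary choices):

import numpy as np
from scipy.optimize import fsolve

def r(x):
    return x**2 - 4  # residual of the original equation x**2 = 4

print(fsolve(r, x0=3.0))   # converges to the root at x = 2
print(fsolve(r, x0=-3.0))  # converges to the root at x = -2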
Standard Form of Nonlinear Equations#
The standard form used in numerical computing is:

\[f(x) = 0\]

where:
\(f\) is a nonlinear function (or system of functions)
\(x\) is the variable (or vector of variables) we're solving for
When \(f(x)\) has multiple input variables, we're dealing with a system of nonlinear equations, which is written as:

\[\mathbf{f}(\mathbf{x}) = \mathbf{0}, \qquad \mathbf{f} = [f_1, f_2, \ldots, f_n]^T, \quad \mathbf{x} = [x_1, x_2, \ldots, x_n]^T\]
Importance of the Residual Function#
The residual function measures how far we are from a solution. A perfect solution would give a residual of exactly zero, but in numerical computing, we typically accept solutions where the residual is smaller than some tolerance (e.g., \(10^{-6}\)).
As we work through numerical solution methods, keep this concept in mind: we're not "solving" equations directly; we're finding inputs that make the residual function sufficiently close to zero.
Note that the function argument x is implicitly a 1D NumPy array, although a list of numbers will also work.
Theoretical Approaches (Newton Method)#
The Newton-Raphson method is one of the most powerful techniques for solving nonlinear equations. Before we explore fsolve in SciPy, it's worth understanding this fundamental algorithm, as many nonlinear solvers use variations of it.
Basic Concept#
Newton-Raphson leverages linear approximations to iteratively approach the solution of nonlinear equations. For a single equation \(f(x) = 0\), the iteration is given by:

\[x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}\]

where \(f'(x_k)\) is the derivative evaluated at the current iterate \(x_k\). This formula can be derived by approximating the function with its tangent line and finding where this line crosses the x-axis.
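To make the tangent-line derivation explicit, replace \(f\) by its linearization at the current iterate \(x_k\) and solve for the root of that line:

\[0 = f(x) \approx f(x_k) + f'(x_k)\,(x - x_k) \quad\Longrightarrow\quad x = x_k - \frac{f(x_k)}{f'(x_k)},\]

and the next iterate \(x_{k+1}\) is taken as this root of the tangent line.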
Visual Example#
Let's visualize Newton-Raphson for finding a root of \(f(x) = \cos(x) - x/2\), which has more curvature and shows the iterations more clearly:
The residual function \(f(x)\) and the gradient \(f'(x)\) are given below:
import numpy as np

def f(x):
    return np.cos(x) - x / 2  # A function with more curvature

def df(x):
    return -np.sin(x) - 1 / 2  # Derivative of f(x)

def newton_step(x):
    return x - f(x) / df(x)
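The printout below was presumably produced by a short driver loop; the exact cell is not shown here, but a sketch that reproduces it is:

# Sketch: run a few Newton-Raphson steps from x = 2.0 and print the progress
x = 2.0
for k in range(5):
    print(f"Iteration {k + 1}: x = {x:.8f}, f(x) = {f(x):.8f}")
    x = newton_step(x)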
Iteration 1: x = 2.00000000, f(x) = -1.41614684
Iteration 2: x = 0.99513984, f(x) = 0.04681567
Iteration 3: x = 1.03010731, f(x) = -0.00032681
Iteration 4: x = 1.02986654, f(x) = -0.00000001
Iteration 5: x = 1.02986653, f(x) = 0.00000000

This example clearly shows how Newton-Raphson uses tangent lines at each iteration to find the next approximation to the root. Note how the convergence accelerates as we approach the solution near x ≈ 1.03.
Extension to Systems of Equations#
For systems of nonlinear equations, the Newton-Raphson method extends naturally by using the Jacobian matrix:

\[\mathbf{x}_{k+1} = \mathbf{x}_k - J(\mathbf{x}_k)^{-1}\, \mathbf{f}(\mathbf{x}_k)\]
Where:
\(\mathbf{x}_k\) is the vector of variables at iteration \(k\)
\(\mathbf{f}(\mathbf{x}_k)\) is the vector of function values (residuals)
\(J(\mathbf{x}_k)\) is the Jacobian matrix of partial derivatives
In practice, we don't explicitly calculate the inverse. Instead, we solve the linear system:

\[J(\mathbf{x}_k)\, \Delta \mathbf{x}_k = -\mathbf{f}(\mathbf{x}_k)\]
And then update: \(\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta \mathbf{x}_k\)
Two-Equation Example#
Consider this problem from the scipy.optimize.fsolve documentation:

\[
\begin{aligned}
x_0 \cos(x_1) &= 4 \\
x_0 x_1 - x_1 &= 5
\end{aligned}
\]

We can define the residual function and the analytical Jacobian matrix (more on this later) as follows.
def residual(x):
    return np.array([x[0] * np.cos(x[1]) - 4, x[0] * x[1] - x[1] - 5])

def J(x):
    return np.array([[np.cos(x[1]), -x[0] * np.sin(x[1])], [x[1], x[0] - 1]])
Next, we write the Newton method as a function, using NumPy's linear system solver.
from numpy.linalg import solve

def newton_raphson(F, J, x0, tol=1e-6, max_iter=20):
    """
    Solve a system of nonlinear equations using the Newton-Raphson method.

    Parameters
    ----------
    F : callable
        Function that returns the residuals.
    J : callable
        Function that returns the Jacobian matrix.
    x0 : ndarray
        Initial guess.
    tol : float
        Convergence tolerance on the residual norm.
    max_iter : int
        Maximum number of iterations.

    Returns
    -------
    x : ndarray
        Solution vector.
    iterations : int
        Number of iterations performed.
    """
    x = x0.copy()

    for iteration in range(max_iter):
        f = F(x)

        # Early termination if tolerance met
        if np.linalg.norm(f) < tol:
            return x, iteration

        jacobian = J(x)

        try:
            # Solve J(x) * dx = -f(x) for the Newton step
            dx = solve(jacobian, -f)
        except np.linalg.LinAlgError:
            print("Jacobian matrix is singular.")
            return x, iteration

        x = x + dx
        print(f"Iteration {iteration+1}: ||f(x)|| = {np.linalg.norm(f):.8f}")

    print("Maximum iterations reached without convergence.")
    return x, max_iter
# Initial guess
x0 = np.array([1.0, 1.0])

# Solve the system
solution, iterations = newton_raphson(residual, J, x0)

print(f"\nSolution: x = {solution[0]:.6f}, y = {solution[1]:.6f}")
print(f"Check: f₁ = {residual(solution)[0]:.8f}, f₂ = {residual(solution)[1]:.8f}")
Iteration 1: ||f(x)|| = 6.08025560
Iteration 2: ||f(x)|| = 4.91726997
Iteration 3: ||f(x)|| = 2.10781410
Iteration 4: ||f(x)|| = 0.38973132
Iteration 5: ||f(x)|| = 0.00769602
Iteration 6: ||f(x)|| = 0.00000361
Solution: x = 6.504097, y = 0.908414
Check: f₁ = -0.00000000, f₂ = 0.00000000
Convergence Properties#
The Newton-Raphson method has several important convergence characteristics:
Quadratic Convergence: Near a solution, the error approximately squares with each iteration. This means the number of correct digits roughly doubles each step, leading to very fast convergence once you're close enough.
Initial Guess Sensitivity: The method can be sensitive to the starting point:
With a good initial guess, convergence is rapid
With a poor initial guess, the method might converge slowly or even diverge
Near points where the derivative is zero, the method can behave erratically
Convergence Criteria: In practice, we typically check both of the following (a minimal combined check is sketched after this list):
The residual norm: \(||\mathbf{f}(\mathbf{x}_k)|| < \epsilon_f\)
The step size: \(||\Delta \mathbf{x}_k|| < \epsilon_x\)
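As a sketch of how these two criteria could be combined in a custom solver (the helper name and default tolerances are illustrative, not from the original):

def converged(f_val, dx, ftol=1e-8, xtol=1e-8):
    """Return True when both the residual norm and the step size are small."""
    return np.linalg.norm(f_val) < ftol and np.linalg.norm(dx) < xtol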
Numerical Solution using SciPy#
Basic Usage#
In practice, we can use off-the-shelf solvers in Python for simple problems.
fsolve is a function in scipy.optimize for root finding. We can check its docstring by typing ?fsolve in a Jupyter cell or by visiting the SciPy documentation.
from scipy.optimize import fsolve
x0 = np.array([1, 1])
sol = fsolve(residual, x0)
print(sol)
[6.50409711 0.90841421]
# The residual at the solution is expected to be close to 0.0.
np.isclose(residual(sol), [0.0, 0.0])
array([ True, True])
Jacobian Matrix#
SciPy's fsolve uses Powell's hybrid method, which combines:
Newton-Raphson steps for fast convergence when near the solution
Gradient descent characteristics for improved stability when far from the solution
This hybrid approach retains the rapid convergence of Newton-Raphson while improving reliability for a wider range of initial guesses. This method requires the Jacobian matrix.
Calculating the Jacobian matrix is computationally expensive. By default, fsolve uses the hybrd routine in MINPACK, which approximates the Jacobian numerically by forward divided differences. Each column of the Jacobian requires one evaluation of the residual function at a perturbed point, so, on top of the unperturbed residual, building the full Jacobian costs roughly N additional residual evaluations for an N-dimensional problem.
Providing the Jacobian matrix can speed up the calculation. This is one of the reasons why texts on power system analysis always show the Jacobian matrix.
fsolve allows specifying the function that provides the Jacobian matrix through the fprime argument.
To clarify, the Jacobian function returns the first-order derivatives of the residual function with respect to the unknown variables. It is not necessarily the derivative of the original equations with respect to the unknowns.
In our example, the residual function is:

\[\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_0 \cos(x_1) - 4 \\ x_0 x_1 - x_1 - 5 \end{bmatrix}\]

and the unknowns are

\[\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}\]

Therefore, the analytical Jacobian matrix is:

\[J(\mathbf{x}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \cos(x_1) & -x_0 \sin(x_1) \\ x_1 & x_0 - 1 \end{bmatrix}\]
It is called the analytical Jacobian matrix because it is derived based on calculus.
def jacobian_analytical(x):
    return np.array([[np.cos(x[1]), -x[0] * np.sin(x[1])], [x[1], x[0] - 1]])
sol = fsolve(residual, x0, fprime=jacobian_analytical)
print(sol)

# the residual at the solution should be almost 0.0
np.isclose(residual(sol), [0.0, 0.0])
[6.50409711 0.90841421]
array([ True, True])
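As an optional cross-check (this sketch uses SymPy, which is not otherwise required in this chapter), the analytical Jacobian above can be verified symbolically:

import sympy as sp

x0s, x1s = sp.symbols("x0 x1")
f_sym = sp.Matrix([x0s * sp.cos(x1s) - 4, x0s * x1s - x1s - 5])

# symbolic Jacobian of the residual with respect to the unknowns
print(f_sym.jacobian(sp.Matrix([x0s, x1s])))
# Matrix([[cos(x1), -x0*sin(x1)], [x1, x0 - 1]])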
In this small example, we cannot see the performance gain from providing the Jacobian matrix. However, this will be illustrated by a power flow problem.
Numerical Jacobian Calculation#
When we don't specify the Jacobian matrix, fsolve (which uses MINPACK under the hood) approximates it using a divided difference approach. Each element of the Jacobian matrix is calculated as:

\[J_{ij} \approx \frac{f_i(\mathbf{x} + h_j \mathbf{e}_j) - f_i(\mathbf{x})}{h_j}\]
where:
\(J_{ij}\) is the \((i,j)\) element of the Jacobian matrix
\(f_i\) is the \(i\)-th function in the system
\(x_j\) is the \(j\)-th variable
\(h_j\) is a small step size for variable \(j\)
\(e_j\) is the unit vector in the direction of \(x_j\)
Let's implement this numerical approximation to see how it works:
def jacobian_numerical(func, x, eps=1e-6):
    """
    Calculate the Jacobian matrix using finite differences.
    """
    n = len(x)
    f0 = func(x)
    m = len(f0)

    J = np.zeros((m, n))

    for j in range(n):
        # Create a copy of x with a small perturbation in the j-th element
        x_perturbed = x.copy()
        x_perturbed[j] += eps

        # Calculate the perturbed function values
        f1 = func(x_perturbed)

        # Compute the j-th column of the Jacobian
        J[:, j] = (f1 - f0) / eps

    return J
# Example: Calculate the Jacobian for our simple function
x = np.array([1.0, 1.0])
J = jacobian_numerical(residual, x)
print("Numerical Jacobian matrix:")
print(J)
# Compare with the analytical Jacobian at the same point
J_analytical = jacobian_analytical(x)
print("\nAnalytical Jacobian matrix:")
print(J_analytical)
Numerical Jacobian matrix:
[[ 0.54030231 -0.84147125]
[ 1. 0. ]]
Analytical Jacobian matrix:
[[ 0.54030231 -0.84147098]
[ 1. 0. ]]
Trade-offs of Numerical Jacobian#
While calculating the Jacobian numerically is convenient, it has several drawbacks:
Computational Cost: For a system with \(n\) variables, computing the numerical Jacobian requires at least \(n\) additional function evaluations.
Accuracy: The accuracy of the approximation depends on the choice of step size. Too small a step may lead to numerical errors due to floating-point precision, while too large a step may not capture the local derivative correctly (see the sketch after this list).
Performance Impact: For large systems like power flow equations, providing an analytical Jacobian can significantly improve computational efficiency.
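A quick sketch of the step-size trade-off mentioned above, reusing jacobian_numerical and jacobian_analytical from this chapter (the specific step sizes are illustrative):

x = np.array([1.0, 1.0])
J_exact = jacobian_analytical(x)

# compare the finite-difference error for several step sizes
for eps in (1e-2, 1e-6, 1e-12):
    J_approx = jacobian_numerical(residual, x, eps=eps)
    print(f"eps = {eps:.0e}, max error = {np.max(np.abs(J_approx - J_exact)):.2e}")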
Handling Convergence Challenges#
Convergence#
We talked about "convergence", but what does it mean? Intuitively, it means that the solution does not change much between iterations. That can be seen from the previous figure.
Typically, convergence is checked on the variable between consecutive
iterations. For a specified xtol
, if the solution satisfies \(|x_{k+1} - x_k| <
\text{xtol}\), the solution is deemed converged. This is what is checked by
fsolve
.
In addition, the residual should be small enough when converged. The tolerance
is often named ftol
. If the residual satisfies \(|f(x_{k+1})| < \text{ftol}\), the solution
is deemed converged.
It is beneficial to check both of them if you are writing a custom solver.
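For fsolve, the tolerance on the variables is exposed through the xtol keyword argument (its default is about 1.5e-8 according to the SciPy documentation). A small illustrative example:

# tighten the convergence tolerance on the variables (illustrative value)
sol = fsolve(residual, np.array([1.0, 1.0]), xtol=1e-12)
print(sol)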
Initial Guess#
The initial guess plays an important role in solver convergence. An improper initial guess may result in failure to converge. Such guesses are said to lie outside the region of attraction of the solution.
For example, if we use the initial guess [0, 0], the solver will fail to converge.
sol = fsolve(residual, [0, 0])
/tmp/ipykernel_57937/2019346469.py:1: RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
sol = fsolve(residual, [0, 0])
The warning message indicates that the iterations have not made good progress over the last ten steps.
In other cases, a different initial guess may result in a different solution. Consider the simple equation

\[x^2 - 1 = 0\]
Starting from 1.2 and -1.2 will converge to 1 and -1, respectively.
def simple_residual(x):
    return np.array([x[0] ** 2 - 1])
sol = fsolve(simple_residual, [1.2])
print("Starting from 1.2, the solution is:", sol)
sol = fsolve(simple_residual, [-1.2])
print("Starting from -1.2, the solution is:", sol)
Starting from 1.2, the solution is: [1.]
Starting from -1.2, the solution is: [-1.]
Storing History Using a Python Decorator#
Let's dig into fsolve to examine the residual at the solution and along the iterations.
First, let's enable full_output=True to get extra information, such as the number of residual function evaluations. Options like full_output and the corresponding output format can be looked up in the SciPy documentation.
sol, infodict, ier, mesg = fsolve(residual, [0, 0], full_output=True)
print(f"Number of function evaluations: {infodict['nfev']}")
Number of function evaluations: 33
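The infodict dictionary carries other entries as well; for instance, fvec is the residual evaluated at the returned point (see the fsolve documentation for the full list):

# residual at the returned point (nonzero here, since this run did not converge)
print(infodict["fvec"])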
Now let's store x and the residual at each step. We will implement this using a technique called a decorator.
Decorators can be used to modify the behavior of a function. They are often used to add extra functionality to a function.
A decorator is a function which takes a function as input and returns a modified function.
residual_history = []
x_history = []

# `with_history` is a decorator to wrap the callback function
def with_history(func):  # `func` is the function to modify (wrap)
    def wrapper(*args, **kwargs):  # `wrapper` is the modified function to be returned
        result = func(*args, **kwargs)
        residual_history.append(result)  # record the residual returned by `func`
        x_history.append(args[0])  # record the point at which `func` was evaluated
        return result

    return wrapper
Below is the basic way to use the decorator: call it with the function to be modified. In this way, we did not modify any of the residual code but added extra functionality. Decorators are most useful when you need to add functionality around an existing, possibly long, function without touching its internals.
residual_with_history = with_history(residual)
sol = fsolve(residual_with_history, [0, 0])
/tmp/ipykernel_57937/865456783.py:3: RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
sol = fsolve(residual_with_history, [0, 0])
# plot the residual history and `x` history in two figures, side by side
import matplotlib.pyplot as plt
import seaborn as sns

def plot_history(residual_history, x_history):
    plt.figure(figsize=(8, 3))

    plt.subplot(1, 2, 1)
    plt.plot(residual_history)
    plt.xlabel("Iteration")
    plt.ylabel("Residual")
    plt.title("Residual History")

    plt.subplot(1, 2, 2)
    plt.plot(x_history)
    plt.xlabel("Iteration")
    plt.ylabel("x")
    plt.title("x History")

plot_history(residual_history, x_history)

You can tell that the residual does not converge to zero, and the values of x jump back and forth.
Now let's check the residual and x using the initial guess [1, 1].
# don't forget to reset the history
residual_history = []
x_history = []
sol = fsolve(residual_with_history, [1, 1])
plot_history(residual_history, x_history)

More on Decorators
Decorators can be reused to wrap different functions. Suppose we have a different residual function for which the history is needed. In addition to writing out the residual function, we can place @with_history directly above its definition to wrap it. This is syntactic sugar provided by Python.
@with_history
def residual2(x):
    return [x[0] ** 2 + x[1] ** 2 - 1,
            x[0] + x[1] - 1]
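As a quick usage sketch (the starting guess and the reuse of plot_history are illustrative, not from the original), the decorated residual2 can be passed to fsolve directly, and the shared history lists are filled automatically:

# reset the shared history lists before the new run
residual_history = []
x_history = []

sol2 = fsolve(residual2, [0.8, 0.2])
print(sol2)  # converges to one of the two solutions, (1, 0) or (0, 1)

plot_history(residual_history, x_history)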
Conclusion#
While numerical Jacobian calculation offers convenience, analytical Jacobians provide better performance for complex systems. Understanding both approaches gives you flexibility:
Use numerical Jacobians for rapid prototyping and simple problems
Implement analytical Jacobians for production code and large-scale systems where performance matters
The choice depends on your specific needs in balancing development time against execution speed.
For our nonlinear equation solving needs, it's valuable to know that even without providing a Jacobian, we can still find solutions efficiently using tools like fsolve. However, as problems grow more complex, the analytical Jacobian can provide a useful computational speed-up.