Numerical Analysis

Least Squares Method

Polynomial Approximation: An algorithm to extract continuous truth from discrete data.

1. Introduction: The Purpose of Polynomial Approximation

Suppose we have $N$ discrete data points $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ obtained from observations or experiments. The goal of polynomial approximation is to derive an $M$-th degree polynomial $P(x)$ that best represents the underlying trend of this data.

$$ P(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_M x^M = \sum_{j=0}^{M} a_j x^j $$

In the Least Squares Method, we search for a coefficient vector $\boldsymbol{a} = (a_0, a_1, \dots, a_M)^T$ that minimizes the sum of squared residuals $S$ between the actual value $y_i$ and the polynomial predicted value $P(x_i)$ at each data point.

$$ S = \sum_{i=1}^{N} (y_i - P(x_i))^2 $$
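As a first illustration, $S$ is straightforward to evaluate for any candidate coefficient vector. The sketch below uses made-up data points and a first-degree candidate, purely to show the formula in code:

```python
import numpy as np

# Hypothetical data and a first-degree candidate P(x) = 1 + 2x
# (values chosen only to illustrate the computation of S).
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.1, 2.9, 5.2])
a = np.array([1.0, 2.0])  # coefficients a_0, a_1 (lowest order first)

P = np.polynomial.polynomial.polyval(x, a)  # P(x_i) at each data point
S = np.sum((y - P) ** 2)                    # sum of squared residuals
```

The least squares method searches over all such candidate vectors for the one that makes $S$ smallest.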

2. Derivation of Normal Equations & Reduction to LU Decomposition

To minimize the sum of squares $S$, we take the partial derivative of $S$ with respect to each coefficient $a_k \ (k = 0, 1, \dots, M)$ and set it equal to $0$.

$$ \frac{\partial S}{\partial a_k} = -2 \sum_{i=1}^{N} x_i^k \left( y_i - \sum_{j=0}^{M} a_j x_i^j \right) = 0 $$

Expanding this and rearranging for $\boldsymbol{a}$ yields the following Normal Equations expressed in matrix form.

$$ \begin{pmatrix} N & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^M \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{M+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sum x_i^M & \sum x_i^{M+1} & \sum x_i^{M+2} & \cdots & \sum x_i^{2M} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_M \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^M y_i \end{pmatrix} $$
Solving as a System of Linear Equations:
Although this may look like a complicated nonlinear problem at first glance, every element of the left-hand matrix (such as $\sum x_i^k$) and of the right-hand vector is a constant uniquely determined by the given data set.
In other words, the problem reduces to a standard system of linear equations, $\boldsymbol{A}\boldsymbol{a} = \boldsymbol{b}$. In an implementation, we can apply LU Decomposition to the constant matrix $\boldsymbol{A}$ and recover the unknown coefficients $\boldsymbol{a}$ efficiently and deterministically via forward and backward substitution.
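The whole procedure can be sketched in a few lines. The helper below (the name `fit_polynomial` is my own) assembles the normal-equation matrix and right-hand side directly from the power sums and solves the system with SciPy's LU routines. As a caveat, for large $M$ or ill-conditioned data, a QR factorization of the Vandermonde matrix is numerically safer than the normal equations:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def fit_polynomial(x, y, M):
    """Least-squares fit of an M-th degree polynomial via the normal equations.

    Builds A[k][j] = sum_i x_i^(k+j) and b[k] = sum_i x_i^k * y_i,
    then solves A a = b by LU decomposition with partial pivoting.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    powers = [np.sum(x ** p) for p in range(2 * M + 1)]
    A = np.array([[powers[k + j] for j in range(M + 1)] for k in range(M + 1)])
    b = np.array([np.sum(x ** k * y) for k in range(M + 1)])
    return lu_solve(lu_factor(A), b)  # coefficients a_0 .. a_M
```

For data that a degree-$M$ polynomial fits exactly, the residual is zero and the routine recovers the polynomial's coefficients; for noisy data it returns the least-squares compromise.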

3. Concrete Calculation Example (4th-Degree Polynomial)

Let's look at the process of deriving a 4th-degree polynomial ($M=4$) using the least squares method from the following 7 noisy observation data points.

$i$    $x_i$ (Input)    $y_i$ (Observed)
1      -3.0             47.5
2      -2.0             7.5
3      -1.0             -0.2
4      0.0              1.2
5      1.0              -0.8
6      2.0              -4.5
7      3.0              -0.1

First, we calculate the necessary components ($\sum x_i^k$ and $\sum x_i^k y_i$) to construct the normal equations. Since the inputs $x_i$ are symmetrically distributed from $-3$ to $3$ in this case, the sums of odd powers ($\sum x_i, \sum x_i^3, \dots$) will all be $0$.

  • $N = 7$
  • $\sum x_i^2 = (-3)^2 + \dots + 3^2 = 28$
  • $\sum x_i^4 = (-3)^4 + \dots + 3^4 = 196$
  • $\sum x_i^6 = 1588, \quad \sum x_i^8 = 13636$
  • $\sum y_i = 47.5 + 7.5 - 0.2 + 1.2 - 0.8 - 4.5 - 0.1 = 50.6$
  • $\sum x_i y_i = -167.4$
  • $\sum x_i^2 y_i = 437.6$
  • $\sum x_i^3 y_i = -1381.8$
  • $\sum x_i^4 y_i = 3886.4$
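These sums are easy to double-check numerically; a minimal sketch:

```python
import numpy as np

# The 7 observation points from the table above
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([47.5, 7.5, -0.2, 1.2, -0.8, -4.5, -0.1])

# Even-power sums (odd powers vanish because the x_i are symmetric about 0)
even_sums = {k: np.sum(x ** k) for k in (2, 4, 6, 8)}  # 28, 196, 1588, 13636

# Right-hand-side sums  sum_i x_i^k * y_i  for k = 0 .. 4
rhs = [np.sum(x ** k * y) for k in range(5)]  # 50.6, -167.4, 437.6, -1381.8, 3886.4
```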

Applying these constants to the matrix $\boldsymbol{A}\boldsymbol{a} = \boldsymbol{b}$ of the normal equations gives us the following $5 \times 5$ system of linear equations.

$$ \begin{pmatrix} 7 & 0 & 28 & 0 & 196 \\ 0 & 28 & 0 & 196 & 0 \\ 28 & 0 & 196 & 0 & 1588 \\ 0 & 196 & 0 & 1588 & 0 \\ 196 & 0 & 1588 & 0 & 13636 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} = \begin{pmatrix} 50.6 \\ -167.4 \\ 437.6 \\ -1381.8 \\ 3886.4 \end{pmatrix} $$
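This concrete system can be handed straight to an LU solver; a sketch using SciPy's factorization routines:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Normal-equation system assembled from the power sums above
A = np.array([
    [7.0,   0.0,   28.0,   0.0,    196.0],
    [0.0,   28.0,  0.0,    196.0,  0.0],
    [28.0,  0.0,   196.0,  0.0,    1588.0],
    [0.0,   196.0, 0.0,    1588.0, 0.0],
    [196.0, 0.0,   1588.0, 0.0,    13636.0],
])
b = np.array([50.6, -167.4, 437.6, -1381.8, 3886.4])

# LU-factorize A, then solve by forward and backward substitution
a = lu_solve(lu_factor(A), b)  # coefficient vector a_0 .. a_4
```

Because of the checkerboard of zeros, the system also decouples by hand into a $3 \times 3$ block for the even coefficients $(a_0, a_2, a_4)$ and a $2 \times 2$ block for the odd ones $(a_1, a_3)$.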

Now all that remains is to solve this system of linear equations with an algorithm such as LU decomposition. Carrying out the calculation yields the following coefficient vector (rounded to three decimal places).

Derived 4th-Degree Polynomial

$P(x) = 0.992 + 0.827x - 1.819x^2 - 0.972x^3 + 0.483x^4$

The graph below plots the observed data (red points) against the approximate curve of the 4th-degree polynomial derived by LU decomposition (blue line). The curve visibly captures the underlying trend of the data while smoothing out the effects of noise.