Least Squares Method
Polynomial Approximation: An algorithm to extract continuous truth from discrete data.
1. Introduction: The Purpose of Polynomial Approximation
Suppose we have $N$ discrete data points $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ obtained from observations or experiments. The goal of polynomial approximation is to derive an $M$-th degree polynomial $P(x)$ that best represents the underlying trend of this data.
In the Least Squares Method, we search for a coefficient vector $\boldsymbol{a} = (a_0, a_1, \dots, a_M)^T$ that minimizes the sum of squared residuals $S$ between the actual value $y_i$ and the polynomial predicted value $P(x_i)$ at each data point.
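Written out explicitly, with $P(x)$ the model polynomial, the quantity to minimize is:

```latex
P(x) = \sum_{j=0}^{M} a_j x^j, \qquad
S(\boldsymbol{a}) = \sum_{i=1}^{N} \bigl( y_i - P(x_i) \bigr)^2
```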
2. Derivation of Normal Equations & Reduction to LU Decomposition
To find the extremum where the sum of squares $S$ is minimized, we take the partial derivative of $S$ with respect to each coefficient $a_k \ (k = 0, 1, \dots, M)$ and set each derivative to $0$.

$$ \frac{\partial S}{\partial a_k} = -2 \sum_{i=1}^{N} x_i^k \left( y_i - \sum_{j=0}^{M} a_j x_i^j \right) = 0 $$

Expanding this and rearranging for $\boldsymbol{a}$ yields the Normal Equations: the $(k, j)$ entry of the left-hand matrix is the power sum $\sum_{i=1}^{N} x_i^{k+j}$, and the $k$-th entry of the right-hand vector is $\sum_{i=1}^{N} x_i^k y_i$.

$$ \sum_{j=0}^{M} \left( \sum_{i=1}^{N} x_i^{k+j} \right) a_j = \sum_{i=1}^{N} x_i^k y_i \qquad (k = 0, 1, \dots, M) $$
Although it may look like a complex nonlinear problem at first glance, each element of the left-hand matrix (such as $\sum x_i^k$) and the right-hand vector are mere constants that can be uniquely calculated from the given data set.
In other words, this problem ultimately reduces to a standard system of linear equations, $\boldsymbol{Ax} = \boldsymbol{b}$. When implementing this in a program, we can apply LU Decomposition to the constructed constant matrix $\boldsymbol{A}$, and by performing forward and backward substitution, we can calculate the unknown coefficients $\boldsymbol{a}$ highly efficiently and deterministically.
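As a minimal sketch of this reduction, the following Python function (the name `fit_polynomial` is ours, not a standard API) builds the normal-equation matrix and solves it with SciPy's LU routines:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def fit_polynomial(x, y, M):
    """Least-squares fit of an M-th degree polynomial via the normal equations.

    Builds A[k, j] = sum_i x_i^(k+j) and b[k] = sum_i x_i^k * y_i,
    then solves A a = b using LU decomposition.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns of V are the powers x_i^0, x_i^1, ..., x_i^M.
    V = np.vander(x, M + 1, increasing=True)   # shape (N, M+1)
    A = V.T @ V                                # A[k, j] = sum_i x_i^(k+j)
    b = V.T @ y                                # b[k]   = sum_i x_i^k * y_i
    lu, piv = lu_factor(A)                     # LU factorization with pivoting
    return lu_solve((lu, piv), b)              # forward/backward substitution
```

Because $\boldsymbol{A}$ and $\boldsymbol{b}$ are plain constants once the data are fixed, the fit is a single deterministic linear solve.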
3. Concrete Calculation Example (4th-Degree Polynomial)
Let's walk through deriving a 4th-degree polynomial ($M = 4$) by least squares from the following 7 noisy observations.
| $i$ | $x_i$ (Input) | $y_i$ (Observed) |
|---|---|---|
| 1 | -3.0 | 47.5 |
| 2 | -2.0 | 7.5 |
| 3 | -1.0 | -0.2 |
| 4 | 0.0 | 1.2 |
| 5 | 1.0 | -0.8 |
| 6 | 2.0 | -4.5 |
| 7 | 3.0 | -0.1 |
First, we calculate the necessary components ($\sum x_i^k$ and $\sum x_i^k y_i$) to construct the normal equations. Since the inputs $x_i$ are symmetrically distributed from $-3$ to $3$ in this case, the sums of odd powers ($\sum x_i, \sum x_i^3, \dots$) will all be $0$.
- $N = 7$
- $\sum x_i^2 = (-3)^2 + \dots + 3^2 = 28$
- $\sum x_i^4 = (-3)^4 + \dots + 3^4 = 196$
- $\sum x_i^6 = 1588, \quad \sum x_i^8 = 13636$
- $\sum y_i = 47.5 + 7.5 - 0.2 + 1.2 - 0.8 - 4.5 - 0.1 = 50.6$
- $\sum x_i y_i = -167.4$
- $\sum x_i^2 y_i = 437.6$
- $\sum x_i^3 y_i = -1381.8$
- $\sum x_i^4 y_i = 3886.4$
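These sums can be checked mechanically; a short NumPy sketch over the table above:

```python
import numpy as np

# Observation table from the example.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([47.5, 7.5, -0.2, 1.2, -0.8, -4.5, -0.1])

# Power sums sum_i x_i^k for the left-hand matrix
# (odd k vanish because the x_i are symmetric about 0).
pow_sums = {k: np.sum(x**k) for k in range(9)}

# Moment sums sum_i x_i^k y_i for the right-hand vector.
mom_sums = {k: np.sum(x**k * y) for k in range(5)}

print([pow_sums[k] for k in (2, 4, 6, 8)])
print([round(mom_sums[k], 1) for k in range(5)])
```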
Applying these constants to the normal equations $\boldsymbol{A}\boldsymbol{a} = \boldsymbol{b}$ gives us the following $5 \times 5$ system of linear equations.

$$ \begin{pmatrix} 7 & 0 & 28 & 0 & 196 \\ 0 & 28 & 0 & 196 & 0 \\ 28 & 0 & 196 & 0 & 1588 \\ 0 & 196 & 0 & 1588 & 0 \\ 196 & 0 & 1588 & 0 & 13636 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} = \begin{pmatrix} 50.6 \\ -167.4 \\ 437.6 \\ -1381.8 \\ 3886.4 \end{pmatrix} $$
Now all that remains is to solve this system with an algorithm such as LU decomposition. Doing so yields the following coefficient vector.
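As a concrete sketch, the $5 \times 5$ system built from the sums above can be solved with SciPy's LU routines:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Normal-equation matrix for the example; the zeros are the odd power sums,
# which vanish because the x_i are symmetric about 0.
A = np.array([
    [   7.0,    0.0,    28.0,     0.0,    196.0],
    [   0.0,   28.0,     0.0,   196.0,      0.0],
    [  28.0,    0.0,   196.0,     0.0,   1588.0],
    [   0.0,  196.0,     0.0,  1588.0,      0.0],
    [ 196.0,    0.0,  1588.0,     0.0,  13636.0],
])
b = np.array([50.6, -167.4, 437.6, -1381.8, 3886.4])

lu, piv = lu_factor(A)        # LU factorization with partial pivoting
a = lu_solve((lu, piv), b)    # coefficients a_0 .. a_4
print(np.round(a, 3))
```

Note that the symmetry also decouples the system: the even coefficients $(a_0, a_2, a_4)$ and the odd coefficients $(a_1, a_3)$ can be solved independently.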
Derived 4th-Degree Polynomial
$P(x) = 0.992 + 0.827x - 1.819x^2 - 0.972x^3 + 0.483x^4$
The graph below plots the observed data (red points) and the approximate curve of the 4th-degree polynomial derived by LU decomposition (blue line). It can be confirmed visually that the curve captures the underlying trend of the data while smoothing out the effects of noise.