Consider the nonlinear matrix problem
\[ \left[\left(\begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix}\right) + \Vert \mathbf{x} \Vert \left(\begin{matrix} 0 & 1 \\ -1 & 0 \end{matrix}\right) \right] \mathbf{x} = \left(\begin{matrix} 0 \\ \frac{1}{2} \end{matrix}\right) \] Construct a sequence of approximate solutions \(\mathbf{x}_n \) by solving the linear problems
\[ \left[\left(\begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix}\right) + \Vert \mathbf{x}_{n-1} \Vert \left(\begin{matrix} 0 & 1 \\ -1 & 0 \end{matrix}\right) \right] \mathbf{x}_n = \left(\begin{matrix} 0 \\ \frac{1}{2} \end{matrix}\right) \ \ , \ \ \mathbf{x}_0 = \left(\begin{matrix} 0 \\ 0 \end{matrix}\right) \] What happens if the right-hand side is \(\left(\begin{matrix} 0 \\ 1 \end{matrix}\right)\) instead of \(\left(\begin{matrix} 0 \\ \frac{1}{2} \end{matrix}\right)\)?
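A minimal sketch of this fixed-point iteration in Python is given below; the tolerance-based stopping test and the final residual check are additions of mine, while the complete scripts used for the figures (with a fixed number of iterations) are listed at the end of this section.

import numpy as np

# Matrices and right-hand side of the nonlinear problem [A + ||x|| B] x = b
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [-1.0, 0.0]])
b = np.array([0.0, 0.5])

x = np.zeros(2)                                     # starting guess x_0 = (0, 0)^T
for n in range(100):
    # Solve the linearized problem [A + ||x_{n-1}|| B] x_n = b
    x_new = np.linalg.solve(A + np.linalg.norm(x) * B, b)
    if np.linalg.norm(x_new - x) < 1e-12:           # stop once successive iterates agree
        x = x_new
        break
    x = x_new

residual = A @ x + np.linalg.norm(x) * (B @ x) - b
print(x, np.linalg.norm(residual))                  # converged iterate and nonlinear residual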
If we apply the matrix \(\mathbf{A} \) to an arbitrary vector \(\mathbf{y}\), what happens?
First we expand the vector in the eigenbasis of \(\mathbf{A} \) \[ \mathbf{y} = \sum_{n} \alpha_n \mathbf{x}_n \] and then apply the matrix \(\mathbf{A} \): \[ \mathbf{A} \mathbf{y} = \sum_{n} \alpha_n \mathbf{A} \mathbf{x}_n = \sum_{n} \alpha_n \lambda_n \mathbf{x}_n \] So applying the matrix to an arbitrary vector amplifies the components along the eigenvectors with the largest (in magnitude) eigenvalues that contribute to the vector.
The algorithm therefore becomes: start from an arbitrary vector, repeatedly apply \(\mathbf{A} \) and normalize the result, so that the iterate aligns more and more with the eigenvector of the largest (in magnitude) eigenvalue (the power iteration).
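A minimal sketch of this power iteration in Python; the 2-norm normalization at every step, the random start vector, and the final Rayleigh-quotient read-off of the eigenvalue are my own choices.

import numpy as np

def power_iteration(A, n_iter=50):
    """Repeatedly apply A and normalize: the iterate converges to the eigenvector
    of the largest-magnitude eigenvalue."""
    y = np.random.default_rng(0).standard_normal(A.shape[0])  # arbitrary start vector
    for _ in range(n_iter):
        y = A @ y
        y /= np.linalg.norm(y)          # keep the iterate at unit length
    return y @ A @ y, y                 # eigenvalue estimate (Rayleigh quotient) and eigenvector

print(power_iteration(np.array([[1.0, 2.0], [3.0, 4.0]])))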
If we wish to change the eigenvalues but not the eigenvectors of a matrix (a so-called spectral transformation), we can apply a matrix inversion: \[\mathbf{A} \mathbf{x} = \lambda \mathbf{x} \ \ \rightarrow \ \ \mathbf{A}^{-1} \mathbf{x} = \lambda^{-1} \mathbf{x} \] So we apply the fixed-point iteration to the inverse matrix \(\tilde{\mathbf{A}} = \mathbf{A}^{-1} \).
This means that the largest eigenvalue \(\tilde{\lambda} \) of \(\mathbf{\tilde{A}} \) always corresponds to the smallest (in magnitude) eigenvalue of \(\mathbf{A} \), since \(\tilde{\lambda} = \lambda^{-1} \).
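A corresponding sketch of the inverse power iteration; solving a linear system with \(\mathbf{A} \) at every step instead of forming \(\mathbf{A}^{-1} \) explicitly is a standard implementation choice, not something prescribed above.

import numpy as np

def inverse_power_iteration(A, n_iter=50):
    """Power iteration applied to A^{-1}: the iterate converges to the eigenvector
    of the smallest-magnitude eigenvalue of A."""
    y = np.ones(A.shape[0])
    for _ in range(n_iter):
        y = np.linalg.solve(A, y)       # y <- A^{-1} y, via a linear solve
        y /= np.linalg.norm(y)
    return y @ A @ y, y                 # smallest-magnitude eigenvalue of A and its eigenvector

print(inverse_power_iteration(np.array([[1.0, 2.0], [3.0, 4.0]])))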
Another spectral transformation is adding a constant \(\mu \) to the diagonal: \[\mathbf{A} \mathbf{x} = \lambda \mathbf{x} \ \ \rightarrow \ \ \left(\mathbf{A} + \mu \mathbf{I} \right) \mathbf{x} = \mathbf{A} \mathbf{x} + \mu \mathbf{x} = \lambda \mathbf{x} + \mu \mathbf{x} = \left(\lambda + \mu \right) \mathbf{x} \] We can combine the two spectral transformations by subtracting a constant \(\mu \) from the diagonal and then inverting the shifted matrix: \[ \tilde{\mathbf{A}} = \left(\mathbf{A} - \mu \mathbf{I} \right)^{-1} \] The largest eigenvalue \(\tilde{\lambda} \) of \(\mathbf{\tilde{A}} \) corresponds to the eigenvalue \(\lambda = \mu + \tilde{\lambda}^{-1} \) of \(\mathbf{A} \) that is closest to \(\mu \).
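A sketch of the resulting shift-and-invert iteration; the back-transformation at the end follows the formula \(\lambda = \mu + \tilde{\lambda}^{-1} \) above, while the concrete shift \(\mu = 5 \) in the last line is only an illustrative assumption.

import numpy as np

def shift_invert_iteration(A, mu, n_iter=50):
    """Power iteration applied to (A - mu*I)^{-1}: converges to the eigenpair of A
    whose eigenvalue lies closest to the shift mu."""
    n = A.shape[0]
    y = np.ones(n)
    for _ in range(n_iter):
        y = np.linalg.solve(A - mu * np.eye(n), y)           # apply (A - mu I)^{-1} via a solve
        y /= np.linalg.norm(y)
    lam_tilde = y @ np.linalg.solve(A - mu * np.eye(n), y)   # dominant eigenvalue of the shifted inverse
    return mu + 1.0 / lam_tilde, y                           # map back: lambda = mu + 1/lambda_tilde

print(shift_invert_iteration(np.array([[1.0, 2.0], [3.0, 4.0]]), mu=5.0))  # mu = 5 is only an example shift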
This implementation involves three iterative methods for finding eigenvalues and eigenvectors of a matrix \(\mathbf{A}\).
The general considerations for all the algorithms are:
The specific considerations for the Inverse Power Iteration and Shift-and-Invert are:
What I did:
So using this we can write the eigenvalue estimate as the Rayleigh quotient:
\[ \lambda = \frac{\mathbf{y}^{\intercal} \mathbf{A} \mathbf{y}}{\mathbf{y}^{\intercal} \mathbf{y}} \] Implementing the three iteration methods (variations of fixed-point iteration):
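As an illustration, here is a sketch of how the Rayleigh quotient can supply the per-iteration eigenvalue estimate for all three fixed-point variants; the generic operator argument, the starting vector, and the shift \(\mu = 5 \) are my own assumptions and not taken from the actual implementation.

import numpy as np

def iterate_with_rayleigh(apply_op, A, y0, n_iter=10):
    """Generic fixed-point loop: apply the chosen operator, normalize, and record
    the Rayleigh quotient of A as the eigenvalue estimate at every iteration."""
    y = y0 / np.linalg.norm(y0)
    estimates = []
    for _ in range(n_iter):
        y = apply_op(y)
        y /= np.linalg.norm(y)
        estimates.append((y @ A @ y) / (y @ y))   # Rayleigh quotient
    return np.array(estimates), y

A = np.array([[1.0, 2.0], [3.0, 4.0]])
mu = 5.0   # example shift, assumed here
ops = {
    "power":        lambda y: A @ y,                                   # Power Iteration
    "inverse":      lambda y: np.linalg.solve(A, y),                   # Inverse Power Iteration
    "shift-invert": lambda y: np.linalg.solve(A - mu * np.eye(2), y),  # Shift-and-Invert
}
for name, apply_op in ops.items():
    estimates, _ = iterate_with_rayleigh(apply_op, A, np.ones(2))
    print(name, estimates[-1])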
The results from the command line are shown in Figure 2 and correspond to the given values of \(\lambda \).
The convergence plot (over 10 iterations) for the three different iteration methods is shown in Figure 3. The results from the command line are shown in Figure 4.
import numpy as np
import matplotlib.pyplot as plt

# Parameters
deflation = 0.1

# Matrices and vectors
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [-1, 0]])
x = np.array([1, -1])
b = np.array([0, 0.5])
# b = np.array([0, 1])

history = []

# Iterative process
for ii in range(10):
    x = np.linalg.inv(A + np.linalg.norm(x) * B).dot(b)
    # x = (1 - deflation) * x + deflation * np.linalg.inv(A + np.linalg.norm(x) * B).dot(b)
    history.append(x)

# Convert history to a numpy array for plotting
history = np.array(history).T

# Plotting the history
plt.plot(history[0], 'ko-', label='x1')
plt.plot(history[1], 'bo-', label='x2')
plt.xlabel('Iteration')
plt.ylabel('Value')
plt.legend()
plt.show()

# Uncomment the following lines to plot the semilog graph of the differences
# diff_history = np.diff(history)
# plt.semilogy(np.abs(diff_history[0]), label='diff x1')
# plt.semilogy(np.abs(diff_history[1]), label='diff x2')
# plt.xlabel('Iteration')
# plt.ylabel('Absolute Difference')
# plt.legend()
# plt.show()

# Print the final value of x
print(history[:, -1])
clear all; close all;

deflation = 0.1;

A = [1 2; 3 4];
B = [0 1; -1 0];
x = [1; -1];
b = [0; 0.5];
%b = [0; 1];

history = [];

for ii = 1:100
    x = inv(A + norm(x) * B) * b;
    %x = (1 - deflation) * x + deflation * inv(A + norm(x) * B) * b;
    history = [history, x];
end

plot(history')
%semilogy(abs(diff(history')))

history(:, end)
#include <iostream>
#include <Eigen/Dense>
#include <vector>
#include "matplotlibcpp.h"