|
The algorithm for LU factorization is given in
Figure 1.4 and the graphical representation of the
algorithm is given in Figure 1.5, as it would look part
way through the computation. The black square represents the current
pivot element. The horizontal shaded rectangle is a row from the upper
triangle of the matrix. The vertical shaded rectangle is a column from
the lower triangle of the matrix. The L and U labeled regions are
the portions of the matrix that have already been updated by the
algorithm. The algorithm has not yet reached the region labeled A'.
Figure 1.4:
LU factorization pseudo-code.
- find maximum element in column(i) (starting below the diagonal)
- swap the row of maximum element with row(i)
- scale column(i) by 1/A(i,i)
- let A' = A(i:N,i:N)
- A' <- A' + x y^T (rank one update)
|
Figure 1.5:
Diagram for LU factorization.
|
We create an upper and lower triangle view of the original matrix
A. We use typedefs to create the specific triangular
matrix type from the MTL class triangle_view. We then create the U and
L matrix objects and pass the matrix A to their
constructors. The L and U objects are just handles,
so their creation is inexpensive (constant time complexity). Every
component and algorithm in MTL is inside the mtl:: namesapce.
If you prefer not to have to specify the namespace each time you can
do using namespace mtl;.
typedef mtl::row_type<Matrix>::type RowMatrix;
typedef mtl::column_type<Matrix>::type ColumnMatrix;
typedef mtl::triangle_view<RowMatrix, unit_upper>::type Unit_Upper;
typedef mtl::triangle_view<ColumnMatrix, unit_lower>::type Unit_Lower;
Unit_Upper U(A);
Unit_Lower L(A);
We can access the rows and columns from L and U
through their TwoD iterators, which are created with the
begin() method. Dereferencing the TwoD iterators gives a row
or column.
Unit_Upper::iterator rowi = U.begin();
Unit_Lower::iterator columni = L.begin();
The first operation in the LU factorization is to find the maximum
element in the column. When we search for the maximum element, we want
to include the element in the pivot row, so we cannot use the
L triangle object defined above since it is unit
diagonal. Instead we create another lower triangle D. We can
then simply apply the max_index() function to the
dcolumn object.
typedef mtl::triangle_view<Matrix, lower>::type D(A);
Lower::iterator dcoli = D.begin();
jp = mtl::max_index(*dcoli);
The next operation in the LU factorization is to swap the current row
with the row that has the maximum element. The following code snippet
shows how to call the swap() function to interchange the
rows. The pivot variable is the row index for the current
pivot element. Note how rows from the matrix A are being
used as arguments in the swap() algorithm. In MTL, rows
and columns are first class vector objects (with some caveats for
sparse rows and columns). Similarly, submatrices in MTL are first
class matrix objects, and can be used in the same way as a matrix.
if (jp != j)
mtl::swap(rows(A)[j], rows(A)[jp]);
The third operation in the LU factorization is to scale the column
under the pivot by 1/A(i,i), which can be
performed with the scale() MTL
algorithm, which we apply to the column dereferenced by
*columni.
if (j < M - 1)
mtl::scale(*columni, T(1) / A(j,j));
The final operation in the LU factorization is to update the
trailing submatrix according to A' = A' + x y^T A' =
A' + x yT.
We first create the submatrix A' from matrix A using
the sub_matrix() method. The algorithm
rank_one_update() can then be applied to the submatrix. We
do not use *rowi and *columni directly because their
indexing is in terms of A and not subA. A future
release of MTL will provide a better way of adjusting the indices than
this method of copying.
subA = A.sub_matrix(j+1, M, j+1, N);
copy(*columni, c); copy(*rowi, r);
mtl::rank_one_update(subA, scaled(c, T(-1)), r);
The complete implementation of this example LU factorization can be
found in lu.h.
The execution time of many linear algebra operations on modern
computer architectures can be decreased dramatically through blocking
to increase cache utilization [6,7].
In algorithms where repeated matrix-vector operations are done ( rank_one_update() for instance), it is beneficial to convert
the algorithm to use matrix-matrix operations to introduces more
opportunities for blocking.
The LU factorization algorithm can be reformulated to use more
matrix-matrix operations [16]. First we
split matrix A into four submatrices, using a blocking factor of
r (i.e., A11 is r x r).
A = [ | ]
[ A_11 | A_12 ] r
[-------|------]
[ A_21 | A_22 ] n - r
[ | ]
r n - r
Then we formulate A = LU in terms of the blocks.
[ A_11 A_12 ] = [ L_11 0 ] [ U_11 U_12 ]
[ A_21 A_22 ] [ L_21 L_22 ] [ 0 U_22 ]
From this we can derive the following equations for the submatrices of
A. The matrix on the right shows the values that should occupy Aafter one step of the blocked LU factorization.
We find L11, U11,
and L21 by applying the pointwise LU factorization
to the combined region of A11 and
A21. We then calculate U12 with a
triangular solve applied to A12. Finally we
calculate A*22 with a matrix product of
L21 and U12. The algorithm is then
applied recursively to A*22 .
In the implementation of block LU factorization MTL can be used to
create a partitioning of the matrix into submatrices. The use of the
submatrix objects throughout the algorithm removes the redundant
indexing that a programmer would typically have to do to specify the
regions for each submatrix for each operation. The partition() method of the MTL
matrix cuts the original matrix along particular rows and columns. The
partitioning results in a new matrix whose entries are the
submatrices. In the code below we create the partitioning that
corresponds to Figure 1.7
with the matrix object Ap. The partitioning for
Figure 1.8 is created with
matrix object As.
int row_sep1[] = { j, j + jb };
int col_sep1[] = { j };
Matrix Ap = A.partition(array_to_vec(row_sep1),
array_to_vec(col_sep1));
int row_sep2[] = { j, j + jb };
int col_sep2[] = { j, j + jb };
Matrix As = A.partition(array_to_vec(row_sep2),
array_to_vec(col_sep2));
triangle_view<Matrix, unit_lower>::type L_11(As(1,1));
Figure 1.7 depicts the block factorization part
way through the computation. The matrix is divided up for the
pointwise factorization step. The region including A11 and
A21 is labeled A1. Since there is pivoting involved, the rows
in the regions labeled A0 and A2 must be swapped according to
the pivots used in A1.
Figure 1.7:
Pointwise step in block LU factorization.
|
The implementation of this step in MTL is very concise. The Ap(1,1)
submatrix object is passed to the lu_factorize() algorithm. We
then use the multi_swap() function on Ap(1,0) and Ap(1,2)
to pivot their rows to match Ap(1,1).
Pvector pivots(jb);
lu_factorize(Ap(1,1), pivots);
multi_swap(Ap(1,0), pivots);
multi_swap(Ap(1,2), pivots);
Once A1 has been factored, the A12 and A22 regions must
be updated. The submatrices involved are depicted in
Figure 1.8. The A12 region needs to be
updated with a triangular solve. This version of triangular solve can
be found in the matmat namespace of the
MTL.
Figure 1.8:
Update steps in block LU factorization.
|
To apply the tri_solve() algorithm to L11 and A12, we
merely call the MTL routine and pass in the L_11 and As(1,2) matrix objects.
mtl::tri_solve(L_11, As(1,2), mtl::left_side());
The last step is to calculate A*22 with a
matrix-matrix multiply according to A*22 <-
A22 - L21U12.
To implement this operation we use the As(1,2) and
As(2,1) matrix objects, which have been overwritten in the
previous steps with U12 and L21,
and make a call to the mult() function.
mtl::mult(scaled(As(2,1), T(-1)), As(1,2), As(2,2));
The complete version of the MTL block LU factorization algorithm
is given in blocked_lu.h.
|