Why was row (rather than column) pivoting chosen for LU factorisation? #1224

johnomotani · 2026-04-07T09:19:13Z

Does anyone know why LAPACK designers chose to use row-pivoting rather than column-pivoting in the LU factorisation algorithm? I would have thought that when using a column-major matrix format it would be cheaper to swap columns than to swap rows, and as far as I could see (although my brief googling failed to find a reference with a definitive-sounding statement) row pivoting or column pivoting are as good as each other from a numerical stability point of view (anyone know if that is true?).

I'm interested because for various reasons I'm implementing a parallelised LU solver, and considering switching from row pivoting to column pivoting for efficiency. I am wondering if I am missing some reason why this is a bad idea, so interested in the original design decision in LAPACK (or any other thoughts/opinions on this issue).

mgates3 (Maintainer) · 2026-04-08T01:52:44Z

A few things come to mind:

There's a strong bias in the literature towards row pivoting. Most of the analysis I have seen is on row pivoting, with a small amount on rook pivoting and complete pivoting.
LAPACK factors the matrix A one block column at a time (a "panel"), then updates the trailing matrix to the right of the panel, then factors the next block column, and so on. Thus, while searching for a pivot in column j, all the previous updates from columns < j have already been applied to column j. If it used column pivoting instead, it would be searching for a pivot along a row of the trailing matrix, which has updates from previous panels applied but no updates from the current panel. This would make the LU factorization dependent on the block size; changing the block size would yield different L and U factors.
One strategy, employed by MAGMA, is to store the panel column-major, so pivot searches on contiguous columns are fast, but store the trailing matrix row-major, so row swaps on contiguous rows are fast. This involves transposing each panel.
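
The block-size point can be illustrated with a small sketch. The pure-Python code below is not LAPACK's implementation; `blocked_lu_row_pivot` and its `nb` parameter are hypothetical names for a toy right-looking factorisation that factors one panel with row pivoting, then updates the trailing matrix. It assumes a square, nonsingular matrix stored as a list of rows. Because every pivot search runs down a column that already has all earlier updates applied, the pivot order should not depend on the block size.

```python
def blocked_lu_row_pivot(a, nb):
    """Toy right-looking blocked LU with row pivoting (illustrative only).

    Returns (lu, perm): L (unit diagonal) and U packed in one matrix, and
    perm[i] = index of the original row that ended up in row i (P A = L U).
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Panel factorisation: columns k0..k1-1, updates kept inside the panel.
        for k in range(k0, k1):
            # Pivot search down column k, which is fully updated at this point.
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            lu[k], lu[p] = lu[p], lu[k]          # swap whole rows
            perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]
                for j in range(k + 1, k1):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Triangular solve for the block row of U to the right of the panel.
        for j in range(k1, n):
            for i in range(k0, k1):
                for m in range(k0, i):
                    lu[i][j] -= lu[i][m] * lu[m][j]
        # Trailing matrix update: A22 -= L21 * U12.
        for i in range(k1, n):
            for j in range(k1, n):
                for m in range(k0, k1):
                    lu[i][j] -= lu[i][m] * lu[m][j]
    return lu, perm


A = [[2.0, 1.0, 1.0, 0.0],
     [4.0, 3.0, 3.0, 1.0],
     [8.0, 7.0, 9.0, 5.0],
     [6.0, 7.0, 9.0, 8.0]]

# Compare the pivot order for several block sizes.
pivot_orders = {nb: blocked_lu_row_pivot(A, nb)[1] for nb in (1, 2, 3, 4)}
print(pivot_orders)
```

A column-pivoting variant of the same loop would have to search along row k across columns outside the current panel, which have not yet received the panel's updates; that is the source of the block-size dependence described above.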

johnomotani (Author) · 2026-04-08T09:33:06Z

Thanks @mgates3! The thing that jumps out to me from your comments is that, with a column-major matrix, the pivot search is faster when searching a column rather than a row, so there is a trade-off (that I was not aware of before!) between efficient column/row searches and efficient row/column swaps. For my application the pivot searches are parallelised (ScaLAPACK-style). I wonder if that tips the balance for me towards wanting to optimise the row/column swaps, as the pivot searches are done in parallel on smaller segments. I will have to benchmark more, but now have a better idea what to look out for when making changes.
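
To make the access patterns in this trade-off concrete, here is a minimal sketch of column-major indexing on a flat Python list. The helper names (`idx`, `swap_rows`, `swap_cols`, `pivot_in_column`) and the leading dimension `lda` are illustrative assumptions, not taken from LAPACK or ScaLAPACK. The code only shows which elements each operation touches; it says nothing about actual run time.

```python
def idx(i, j, lda):
    """Offset of element (i, j) in a column-major array with leading dimension lda."""
    return i + j * lda


def swap_rows(a, lda, ncols, r1, r2):
    """Row swap: touches one element per column, lda entries apart (strided)."""
    for j in range(ncols):
        k1, k2 = idx(r1, j, lda), idx(r2, j, lda)
        a[k1], a[k2] = a[k2], a[k1]


def swap_cols(a, lda, nrows, c1, c2):
    """Column swap: exchanges two contiguous runs of nrows elements."""
    s1, s2 = c1 * lda, c2 * lda
    a[s1:s1 + nrows], a[s2:s2 + nrows] = a[s2:s2 + nrows], a[s1:s1 + nrows]


def pivot_in_column(a, lda, nrows, k):
    """Row-pivot search: scans the contiguous tail of column k."""
    return max(range(k, nrows), key=lambda i: abs(a[idx(i, k, lda)]))


# 3x3 matrix [[1, 2, 3], [4, 5, 6], [7, 8, 9]] stored column by column.
m = [1, 4, 7, 2, 5, 8, 3, 6, 9]

rows_swapped = m[:]
swap_rows(rows_swapped, 3, 3, 0, 2)

cols_swapped = m[:]
swap_cols(cols_swapped, 3, 3, 0, 1)
```

With row pivoting the search is the contiguous operation and the swap is the strided one; with column pivoting the roles are reversed.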

johnomotani (Author) · 2026-04-09T08:26:12Z

After some experimentation with 'equivalent' implementations that use row swaps or column swaps, the row-swap version that LAPACK uses seems more efficient (roughly 2x). I suspect it might be related to the 'recursive algorithm' where the memory layouts that occur in the row-swap version are a bit more efficient. The time for pivot searches does not seem to be a significant part of the run time.

ilayn · 2026-04-09T08:35:40Z

Another detail about column swaps is that the permutation appears on the right-hand side of A (A Q = L U rather than P A = L U). In case you want to use this in linear solves (which is mostly what LU is for), it would convert the problem to a permuted solve. Permutations are sometimes not needed for the solves until some goal is achieved (say a parametric solve for many RHS). And permuted solves are not as cache friendly.

Even if you remove the recursion overhead for small problems (see https://github.com/ilayn/semicolon-lapack/blob/main/src/d/dgetrf2.c as an example that uses CBLAS interface) I don't think you can make it as fast as the unpermuted solve.

johnomotani (Author) · 2026-04-09T09:48:12Z

@ilayn but with row swaps, you still have to permute the right hand side before the solve. Why is permuting the solution after the solve slower than permuting the right hand side before the solve?

1 reply

ilayn Apr 9, 2026

I wanted to emphasize that you do not always need the permuted result; you can check the solve first and permute afterwards only if needed. With column swaps the permutation becomes mandatory.
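
For reference, the two solve paths discussed in this exchange can be sketched as follows. This is a minimal unblocked illustration in pure Python, assuming a square nonsingular matrix; `lu_factor`, `lu_solve`, and the `pivot` argument are hypothetical names, not LAPACK routines. With row pivoting (P A = L U) the right-hand side is permuted before the triangular solves; with column pivoting (A Q = L U) the solution is un-permuted after them.

```python
def lu_factor(a, pivot="row"):
    """Unblocked LU with partial pivoting; returns (lu, perm).

    pivot="row": P A = L U, perm[i] is the original row now in row i.
    pivot="col": A Q = L U, perm[j] is the original column now in column j.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        if pivot == "row":
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            lu[k], lu[p] = lu[p], lu[k]
        else:
            p = max(range(k, n), key=lambda j: abs(lu[k][j]))
            for row in lu:
                row[k], row[p] = row[p], row[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b, pivot="row"):
    """Solve A x = b from the packed factors produced by lu_factor."""
    n = len(b)
    # Row pivoting: permute the right-hand side BEFORE the triangular solves.
    y = [b[perm[i]] for i in range(n)] if pivot == "row" else list(b)
    for i in range(n):                 # forward substitution, unit-diagonal L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):       # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    if pivot == "row":
        return y
    # Column pivoting: un-permute the solution AFTER the triangular solves.
    x = [0.0] * n
    for j in range(n):
        x[perm[j]] = y[j]
    return x


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [7.0, 19.0, 49.0]                  # equals A applied to [1, 2, 3]

x_row = lu_solve(*lu_factor(A, "row"), b, pivot="row")
x_col = lu_solve(*lu_factor(A, "col"), b, pivot="col")
```

In both variants the two triangular solves are identical; only the position of the permutation step differs, which is the point under discussion above.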
