这个说得比较清楚。内存效率提升,但是也没说计算浮点操作数减少。
%pylab
Using matplotlib backend: Qt5Agg
Populating the interactive namespace from numpy and matplotlib
A矩阵是对角占优的矩阵
A=array([5 ,-2 ,0 ,0 ,0,
-2 ,5,-2,0,0,
0 , -2, 5,-2,0,
0,0,-2,5,-2,
0,0,0,-2,5])
A = A.reshape([5,5])
A
array([[ 5, -2, 0, 0, 0],
[-2, 5, -2, 0, 0],
[ 0, -2, 5, -2, 0],
[ 0, 0, -2, 5, -2],
[ 0, 0, 0, -2, 5]])
P矩阵是排列矩阵,交换第一个元素和第5个元素
P=array([0 ,0 ,0 ,0 ,1,
0,1,0,0,0,
0 , 0, 1,0,0,
0,0,0,1,0,
1,0,0,0,0])
P = P.reshape([5,5])
P
array([[0, 0, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[1, 0, 0, 0, 0]])
排列矩阵的转置是自身的逆
P.T.dot(P)
array([[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]])
作用到矩阵A上,生成矩阵B
B=P.T.dot(A).dot(P)
B
array([[ 5, 0, 0, -2, 0],
[ 0, 5, -2, 0, -2],
[ 0, -2, 5, -2, 0],
[-2, 0, -2, 5, 0],
[ 0, -2, 0, 0, 5]])
上下三角矩阵和对角矩阵
#GS iteration
L=tril(B,-1)
L
array([[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0],
[ 0, -2, 0, 0, 0],
[-2, 0, -2, 0, 0],
[ 0, -2, 0, 0, 0]])
D=diag(diag(B))
D
array([[5, 0, 0, 0, 0],
[0, 5, 0, 0, 0],
[0, 0, 5, 0, 0],
[0, 0, 0, 5, 0],
[0, 0, 0, 0, 5]])
U=triu(B,1)
U
array([[ 0, 0, 0, -2, 0],
[ 0, 0, -2, 0, -2],
[ 0, 0, 0, -2, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0]])
M_GS is Gauss-Seidel iter. matrix
M_GS=inv(L+D).dot(U)
M_GS
array([[ 0. , 0. , 0. , -0.4 , 0. ],
[ 0. , 0. , -0.4 , 0. , -0.4 ],
[ 0. , 0. , -0.16 , -0.4 , -0.16 ],
[ 0. , 0. , -0.064, -0.32 , -0.064],
[ 0. , 0. , -0.16 , 0. , -0.16 ]])
spectral radius of GS iter.
spectral_radius_GS=max(abs(linalg.eig(M_GS)[0]))
spectral_radius_GS
0.48000000000000026
M_J is Jacobi iter. matrix
M_J=inv(D).dot(U+L)
M_J
array([[ 0. , 0. , 0. , -0.4, 0. ],
[ 0. , 0. , -0.4, 0. , -0.4],
[ 0. , -0.4, 0. , -0.4, 0. ],
[-0.4, 0. , -0.4, 0. , 0. ],
[ 0. , -0.4, 0. , 0. , 0. ]])
spectral radius of iter. matrix
spectral_radius_J=max(abs(linalg.eig(M_J)[0]))
spectral_radius_J
0.69282032302755092
Compare A with B
L,D,U=tril(A,-1),diag(diag(A)),triu(A,1)
M_GS=inv(L+D).dot(U)
spectral_radius_GS=max(abs(linalg.eig(M_GS)[0]))
spectral_radius_GS
0.47999999999999987
对比可以看出来,reorder之后迭代矩阵谱半径几乎没变化。说明加速主要来源于内存访问效率的提高。
某种意义上来说,说明CPU没吃饱,上菜速度太慢了。以后pde模拟程序的优化方向大概应该是加快上菜速度。。。