首页 > 硕士 > 工学 > 正文

基于GPU平台的KLU并行算法的研究:对角线块的LU分解

Research of KLU Algorithm Based on GPU: LU Factorization of Diagonal Blocks

作者: 专业:计算机系统结构 导师:何立强 年度:2011 学位:硕士  院校: 内蒙古大学

Keywords

KLU algorithm, numeric factorization, GPU, parallel algorithm

        随着电子科学技术的发展,为设计高质量的电路,电路模拟必不可少。在电路模拟的过程中要涉及到稀疏线性方程组的求解。而随着电路矩阵规模的不断增大,对电路矩阵的求解已经成为电路模拟过程的一个瓶颈。针对电路模拟过程中产生的电路矩阵的特点,通常均采用直接法来进行此类线性方程组的求解。目前常用的求解器有sparse 1.3、superLU、KLU,其中以Timothy Davis教授开发的KLU最为有效率。KLU主要由预处理,首次LU分解,再分解和回代求解这几个部分组成。再分解部分是算法的重要组成部分。在一次电路模拟过程中,就是通过多次调用该部分来完成对稀疏矩阵的数值LU分解。因此,本文主要对这部分的算法进行了并行研究和探索,并提出了基于GPU平台的可行的并行算法。KLU算法在LU分解过程中,采用的算法是基于高斯消去法的Gilbert-Peierls算法。我们通过研究串行算法和程序提出了两种不同的并行设想,并在GPU平台上设计和实现了四种不同的并行算法P_Llen算法、P_Ulen算法、P nk算法和P_stream算法。我们在实验平台Ⅰ上对这四种并行算法进行了性能测试和分析,通过分析我们发现P_stream算法较前三种并行算法在性能上有较大的优势,但由于该并行算法受限于实验平台Ⅰ中GPU显存的限制,导致并行度较低,性能较串行算法有所下降。为提高P_stream算法的并行度,我们在GPU显存容量更大的实验平台Ⅱ上对其进行了性能的测试和分析。通过分析,我们发现随着并行度的提高,P_stream并行算法性能也随之得到提升,但还是受限于显存容量的限制,导致性能较串行算法并没得到提升。由于我们是首次尝试对KLU算法在GPU平台上进行并行算法的实现,加之稀疏矩阵数据的稀疏性、LU分解数据的前后依赖性、硬件限制以及自身编程经验的不足,导致并行算法性能较原有串行算法略有下降,但在本文中我们提出的一些并行设想以及尝试也能为同方向的研究者提供很好的借鉴。
    Along with the development of electronic science and technology, circuit simulation is the necessary part for designing high quality circuit. In the process of simulating circuits, the solving of the sparse linear equations is involved. With the increasing scale of circuit matrix, the solving of circuit matrix has become a bottleneck in circuit simulation process. In view of the characteristics of the circuit matrix, produced at circuit simulation process, people usually adopts the direct method for solving such linear equations. Now, the solvers, people commonly used, includes sparsel.3, superLU, KLU. And the KLU developed by Timothy Davis is most efficient.KLU mainly consist of the pretreatment part, the first LU factorization part, refactorization part and back and solving part. And refactorization part is an important part of the algorithm. In a circuit simulation process, people employ exactly multiple calling this part of sparse matrix to complete the numerical LU decomposition. Therefore, this paper mainly research and exploer the parallel algorithm of this part based on GPU platform.In the LU decomposition process, KLU adopt Gilbert-Peierls algorithm, based on the gaussian elimination method. The research have studied the serial algorithm and program, put forward two different parallel ideas, and design and realization o the four different parallel algorithm P_Llen algorithm, PUlen algorithm, P_nl algorithm and P_stream algorithm on the GPU platform, we tested and analysed the performance of the four parallel algorithms in experimental platform I. This pape found that Pstream algorithm has larger advantage in performance than previous three parallel algorith in performance by analyzing. But the P_stream algorithm restricted by the GPU memory limit, the perfomance is lower than serial algorithm For improving the parallel degree of the P_stream, this paper tested and analysed the performance of the algorithm in the experimental platformⅡ, which has the large GPU memory capacity. Through analysis, this paper found that Pstream algorithm performance ascension with the parallel rise, but the performance is not promotec serial algorithm, still limited by the limitation of the GPU memory.Since we are first attempt to research and realize the parallel KLU algorithm on the GPU platform, and the sparseness of sparse matrix data, data dependence at LU factorization, hardware constraints and my poor programming experience, resulting in parallel algorithm performance lower slightly than original serial algorithm. But Some parallel ideas and trying, which we proposed in this paper, can also provide very good reference for the same direction researchers.
        

基于GPU平台的KLU并行算法的研究:对角线块的LU分解

摘要4-6
ABSTRACT6-7
第一章 绪论11-14
    1.1 研究背景11-12
    1.2 国内外研究现状12
    1.3 研究目标与内容12-13
        1.3.1 研究目标12-13
        1.3.2 研究内容13
    1.4 论文结构13-14
第二章 GPU通用计算与CUDA简介14-18
    2.1 GPU通用计算14-16
    2.2 CUDA简介16-17
        2.2.1 CUDA开发16
        2.2.2 CUDA编程模型16-17
    2.3 本章小结17-18
第三章 KLU18-23
    3.1 KLU简介18-19
    3.2 KLU求解稀疏线性方程组流程19-21
    3.3 KLU在电路模拟中的应用21-22
        3.3.1 电路矩阵的特点21
        3.3.2 电路模拟中的线性系统21-22
    3.4 本章小结22-23
第四章 KLU之LU分解23-31
    4.1 稠密LU分解23-24
    4.2 稀疏LU分解24-25
    4.3 Left Looking高斯消去法25-27
    4.4 Gilbert-Peierls算法27-29
        4.4.1 符号分析27-29
        4.4.2 数值分解29
    4.5 KLU再分解机制29-30
    4.6 本章小结30-31
第五章 KLU分解阶段的并行算法研究31-47
    5.1 KLU_refactor算法31-34
        5.1.1 算法描述31-33
        5.1.2 算法分析33-34
    5.2 P_Llen并行算法34-35
        5.2.1 算法描述34-35
        5.2.2 并行性分析35
    5.3 P_Ulen并行算法35-39
        5.3.1 P_Ulen算法描述35-38
        5.3.2 优化策略38-39
    5.4 P_nk并行算法39-42
        5.4.1 P_nk算法描述39-42
        5.4.2 并行性分析42
    5.5 P_stream并行算法42-46
        5.5.1 P_stream算法描述42-45
        5.5.2 并行性分析45-46
    5.6 本章小结46-47
第六章 实验环境和实验方法47-50
    6.1 实验环境47-48
    6.2 实验方法48-50
        6.2.1 测试矩阵48-49
        6.2.2 测试方法49-50
第七章 实验结果和性能分析50-55
    7.1 klu_refactor串/并行算法性能比较50-52
    7.2 并行算法性能分析52-55
第八章 全文总结与进一步工作55-57
    8.1 全文总结55-56
    8.2 进一步工作56-57
参考文献57-59
致谢59
        下载全文需50


本文地址:

上一篇:基于图像识别与匹配技术的奶牛保险系统研究
下一篇:海拉尔河流域景观分布格局变化及其驱动因子分析

分享到: 分享基于GPU平台的KLU并行算法的研究:对角线块的LU分解到腾讯微博           收藏
评论排行
公告