
Research and Application of Association Rule Mining Algorithms in Data Mining

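Author: Mei Jun  Major: Computer Application Technology  Supervisor: Zheng Gang  Year: 2010  Degree: Master's  Institution: Anhui Polytechnic University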

Keywords

data mining, association rule, maximum frequent item sets, incremental updating, frequent pattern

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. Its main goal is to uncover information of value to users in large databases. Association rule mining is an important branch of data mining, used mainly to discover relationships among the items in a data set. Because association rules are concise in form, easy to interpret and understand, and effective at capturing important relationships in data, mining association rules from large databases has become the most mature, important, and active research topic in data mining. This thesis analyzes data mining technology comprehensively, association rule mining in particular, and, building on previous research, proposes association rule mining algorithms that address the problems identified. The thesis covers the following four aspects.

First, data mining and association rule mining techniques are analyzed. The thesis introduces the basic concepts of data mining; classifies and summarizes the data mining process, its application areas, and its common techniques; and reviews the state of research in China and abroad. It also gives a systematic account of the definition, properties, and basic steps of association rule mining, analyzes the classic Apriori algorithm and the improvements based on it, and describes in detail the procedure and ideas of the FP-growth algorithm, which mines frequent itemsets without generating candidates.

Second, the thesis studies maximal frequent itemsets in depth and proposes DMFIA-D, an algorithm for mining maximal frequent patterns based on the FP-tree structure. DMFIA-D modifies the FP-tree structure, makes full use of its structural features, and applies a bidirectional search strategy: candidate maximal frequent itemsets are selected top-down, then counted and pruned bottom-up to determine the final maximal frequent itemsets. Because fewer candidates are generated and the candidates are pruned effectively, mining time is reduced and efficiency improves. A worked example illustrates the execution of DMFIA-D, and experiments show that it outperforms the DMFIA algorithm and confirm its scalability.

Third, the thesis studies the incremental updating algorithm FUP and proposes MFUP, an improved algorithm based on a temporary table. After analyzing the strengths and weaknesses of FUP, the thesis addresses its shortcomings with MFUP, which builds a temporary table to store the frequent itemsets of the incremental database. MFUP makes full use of the results already mined from the original database and removes the infrequent itemsets of the updated database as early as possible, which greatly reduces repeated scans of the data and improves mining efficiency. A worked example illustrates the execution of MFUP, and experiments confirm its advantage over FUP.

Fourth, the thesis explores a trial application of DMFIA-D to analyzing member purchasing behavior in a supermarket system, providing decision-support information for sales strategies and promotional campaigns aimed at members.

Mei Jun (Computer Application Technology), supervised by Zheng Gang
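The incremental-updating idea summarized in the abstract can be illustrated with a small Python sketch. It is not the thesis's MFUP or DMFIA-D algorithm, whose details are not given on this page. It only demonstrates the property that FUP-style methods rely on: an itemset that is frequent in the updated database must be frequent in the original database or in the increment. The function names, the brute-force counting, the `max_size` limit, and the toy transactions are all assumptions made for illustration.

```python
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Count every itemset of up to max_size items (brute force, toy data only)."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def frequent(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets with relative support >= min_support."""
    threshold = min_support * len(transactions)
    counts = count_itemsets(transactions, max_size)
    return {s: c for s, c in counts.items() if c >= threshold}


def incremental_update(old_db, old_frequent, increment, min_support, max_size=3):
    """Update frequent itemsets after appending `increment` to `old_db`."""
    # Hypothetical "temporary table": itemsets frequent in the increment alone.
    temp_table = frequent(increment, min_support, max_size)
    threshold = min_support * (len(old_db) + len(increment))
    updated = {}
    # Previously frequent itemsets: old counts are known, so scan only the increment.
    for s, old_count in old_frequent.items():
        total = old_count + sum(1 for t in increment if s <= t)
        if total >= threshold:
            updated[s] = total
    # Itemsets frequent only in the increment: these require a scan of the old data.
    for s, inc_count in temp_table.items():
        if s in old_frequent:
            continue
        total = inc_count + sum(1 for t in old_db if s <= t)
        if total >= threshold:
            updated[s] = total
    return updated


def maximal_itemsets(freq):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in freq if not any(s < other for other in freq)]


# Toy supermarket-style transactions (invented for this example).
old_db = [{"bread", "milk"}, {"bread", "beer"}, {"milk", "beer"},
          {"bread", "milk", "beer"}]
increment = [{"bread", "milk"}, {"milk", "eggs"}]

old_frequent = frequent(old_db, 0.5)
updated = incremental_update(old_db, old_frequent, increment, 0.5)
maximal = maximal_itemsets(updated)
```

On this toy data the incremental result agrees with mining the combined database from scratch, while the old data is rescanned only for itemsets that were not already frequent. A production algorithm would replace the brute-force enumeration with level-wise or FP-tree-based counting.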
        

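Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
    1.1 Research Background
    1.2 Overview of Data Mining
        1.2.1 Definition of Data Mining
        1.2.2 The Data Mining Process
        1.2.3 Applications of Data Mining
        1.2.4 Common Data Mining Techniques
    1.3 Research Status in China and Abroad
    1.4 Research Content of the Thesis
    1.5 Organization of the Thesis
Chapter 2 Association Rule Mining
    2.1 Description of Association Rules
    2.2 Basic Steps of Association Rule Mining
    2.3 Association Rule Mining Algorithms
        2.3.1 The Classic Association Rule Mining Algorithm Apriori
        2.3.2 Improvements Based on the Apriori Algorithm
        2.3.3 FP-growth: Mining Frequent Itemsets Without Candidate Generation
    2.4 Chapter Summary
Chapter 3 DMFIA-D: An FP-tree-Based Algorithm for Mining Maximal Frequent Itemsets
    3.1 Overview of Maximal Frequent Itemset Mining
        3.1.1 The Concept of Maximal Frequent Itemset Mining
        3.1.2 Research Status of Maximal Frequent Itemset Mining Algorithms
    3.2 Design and Construction of the New FP-tree
        3.2.1 Definition of the New FP-tree
        3.2.2 Construction Algorithm for the New FP-tree
        3.2.3 Properties of the New FP-tree
    3.3 The DMFIA-D Algorithm
        3.3.1 Search Strategy of the DMFIA-D Algorithm
        3.3.2 Core Ideas of the DMFIA-D Algorithm
        3.3.3 Description of the DMFIA-D Algorithm
    3.4 Worked Example
    3.5 Algorithm Analysis, Comparison, and Experimental Performance Tests
    3.6 Chapter Summary
Chapter 4 MFUP: An Incremental Updating Algorithm Based on a Temporary Table
    4.1 Overview of Incremental Updating Algorithms
        4.1.1 Classification of Incremental Updating Algorithms
        4.1.2 Research Status of Incremental Association Rule Updating Algorithms
        4.1.3 The FUP Algorithm
    4.2 The Temporary-Table-Based Incremental Updating Algorithm MFUP
        4.2.1 Properties of Frequent Itemsets Under Incremental Updating
        4.2.2 Basic Idea of the MFUP Algorithm
        4.2.3 Description of the MFUP Algorithm
    4.3 Worked Example
    4.4 Experiments and Algorithm Performance Analysis
    4.5 Chapter Summary
Chapter 5 A Trial Application of Association Rule Techniques to Supermarket System Analysis
    5.1 The Data Mining Process
    5.2 Introduction to the Supermarket System
    5.3 Data Preprocessing
    5.4 Experimental Results
    5.5 Association Rule Analysis
    5.6 Chapter Summary
Chapter 6 Conclusions and Outlook
References
Academic Papers Published During the Degree Program
Acknowledgements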