
The Research on Customer Behavior Analysis and Forecast Based on Data Mining

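Author:  Major: Computer Application Technology  Advisor: Hua Rong (花嵘)  Year: 2010  Degree: Master's  Institution: Shandong University of Science and Technology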

Keywords

classification and prediction, multilayer perceptrons, support vector machine, logistic model tree

Economic globalization and diversification have shifted enterprises from a "product-centric" to a "customer-centric" model, and customer relationship management (CRM) has become an important aspect of enterprise competitiveness. Applying data mining techniques to massive CRM data can uncover potentially useful knowledge about customers. This helps an enterprise understand the buying habits of its existing customers and offer personalized services that better meet their needs. CRM based on data mining also helps the enterprise find, attract, and develop potential customers, and so maximize the business profit they bring. The study of data mining technology in CRM therefore has both theoretical and practical value.

Classification and prediction are important research topics in data mining, and many related results have already been applied to CRM. Using the data set provided by the French telecom operator Orange (the KDD Cup 2009 data set), this thesis builds a data mining process for customer classification. After preprocessing the data, we implement and modify three classification algorithms and propose four combined (ensemble) classifiers to classify and predict three aspects of customer behavior: appetency (willingness to buy), churn (loyalty), and up-selling (value-added services). Finally, we design experiments to evaluate the classifiers and compare and analyze the results. The main work of this thesis is as follows.

Data preprocessing: Data preprocessing is a key step in the data mining process, and its quality directly affects the final results. In this thesis it is divided into two steps, simple preprocessing and deep preprocessing. Simple preprocessing covers data exploration, data cleaning, discretization, and attribute/feature selection. Deep preprocessing depends on the requirements of the specific classification model.

Construction of the classification models: On the Orange customer data set, we first study the multilayer perceptron (MLP) algorithm and implement it as our first classifier. The second classifier is built with the classic support vector machine (SVM) algorithm. We then implement and modify the logistic model tree (LMT) algorithm to build the third classifier. To improve classification performance, we design and implement four combined classifiers: combined posteriors, combined votes, weighted posteriors, and weighted votes.

Experiment design and analysis: We first present the overall experimental framework and then run the algorithms described in Chapter 4. The area under the ROC curve (AUC) is adopted as the evaluation criterion. Among the three single classifiers, the modified LMT achieves the highest AUC and performs clearly better than MLP and SVM. Among the combined classifiers, weighted posteriors and weighted votes perform somewhat better than their unweighted counterparts.

This thesis applies data mining techniques to the customer data set provided by Orange. By designing the data mining process and constructing the classifiers, we classify and predict customer appetency, churn, and up-selling. The experimental results show that the single and combined classifiers implemented here are sound and effective and broadly consistent with practical application. The models constructed in this thesis therefore have potential significance for CRM.
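The four combination rules and the AUC criterion named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything below is an assumption: the function names are hypothetical, each classifier is taken to output a positive-class probability per customer, votes are cast at a 0.5 threshold, and the weights are simply supplied by the caller (for example, each classifier's validation AUC). The scores in the demo are made-up illustration values, not results from the thesis.

```python
from typing import List, Sequence

Scores = Sequence[float]  # one positive-class score per customer


def weighted_posteriors(probs: Sequence[Scores], weights: Sequence[float]) -> List[float]:
    """Weighted mean of the classifiers' positive-class probabilities."""
    total = float(sum(weights))
    n = len(probs[0])
    return [sum(w * p[i] for w, p in zip(weights, probs)) / total for i in range(n)]


def combined_posteriors(probs: Sequence[Scores]) -> List[float]:
    """Unweighted case: plain average of the probabilities."""
    return weighted_posteriors(probs, [1.0] * len(probs))


def weighted_votes(probs: Sequence[Scores], weights: Sequence[float],
                   threshold: float = 0.5) -> List[float]:
    """Share of total weight held by classifiers that vote 'positive'."""
    total = float(sum(weights))
    n = len(probs[0])
    return [sum(w for w, p in zip(weights, probs) if p[i] >= threshold) / total
            for i in range(n)]


def combined_votes(probs: Sequence[Scores], threshold: float = 0.5) -> List[float]:
    """Unweighted case: fraction of classifiers that vote 'positive'."""
    return weighted_votes(probs, [1.0] * len(probs), threshold)


def auc(scores: Scores, labels: Sequence[int]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the sample count, so this is
    only suitable for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up scores from three hypothetical classifiers on four customers.
    labels = [1, 0, 1, 0]
    mlp = [0.6, 0.4, 0.3, 0.5]
    svm = [0.7, 0.6, 0.4, 0.2]
    lmt = [0.9, 0.2, 0.8, 0.1]
    members = [mlp, svm, lmt]
    # Assumed weighting scheme: each classifier's own AUC.
    weights = [auc(m, labels) for m in members]
    for name, scores in [
        ("combined posteriors", combined_posteriors(members)),
        ("combined votes", combined_votes(members)),
        ("weighted posteriors", weighted_posteriors(members, weights)),
        ("weighted votes", weighted_votes(members, weights)),
    ]:
        print(name, auc(scores, labels))
```

In this sketch the unweighted rules are written as the weighted rules with equal weights, which makes the only difference between the two families explicit. In practice the weights would be estimated on held-out data rather than on the data being scored, as the demo does for brevity.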
        


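Abstract (Chinese)
Abstract (English)
1 Introduction
    1.1 Background and Significance of the Topic
    1.2 Research Status at Home and Abroad
    1.3 Research Content of the Thesis
    1.4 Organization of the Thesis
2 Related Work
    2.1 Customer Relationship Management
    2.2 Data Mining Techniques
    2.3 Data Mining Software
    2.4 Chapter Summary
3 Data Preprocessing
    3.1 Data Exploration
    3.2 Data Cleaning
    3.3 Discretization
    3.4 Attribute/Feature Selection
    3.5 Chapter Summary
4 Model Construction and Algorithms
    4.1 Multilayer Perceptron (MLP) Algorithm
    4.2 Support Vector Machine (SVM)
    4.3 Logistic Model Tree (LMT)
    4.4 Design of the Combined Classifiers
    4.5 Chapter Summary
5 Experiments and Evaluation of Results
    5.1 Experimental Objectives and Evaluation Criteria
    5.2 Experimental Framework
    5.3 Experimental Environment, Results, and Analysis
    5.4 Chapter Summary
6 Conclusions and Future Work
    6.1 Conclusions
    6.2 Future Work
References
Main Achievements During the Master's Program
Acknowledgements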