Data Mining

Major: Software Engineering
Code of Subject: 7.121.02.E.26
Credits: 5
Department: Software
Lecturer: Doctor of Sciences, Professor Hrytsyuk Yu. I.
Semester: 2
Mode of Study: full-time
Learning outcomes:
As a result of studying the discipline, the specialist should know:
- the differences between Data Mining and classical statistical analysis methods and OLAP systems;
- the types of Data Mining (association, classification, sequence, clustering, forecasting);
- the scope of application of Data Mining;
- Data Mining methods: neural networks, decision trees, limited enumeration methods, genetic algorithms, evolutionary programming, cluster models, combined methods.
The trained specialist should be able to:
1) solve practical tasks with the help of a tool that uses Data Mining technology;
2) implement the data analysis process using Data Mining technology, including all stages of this process.
Required prior and related subjects:
Databases
Data Warehouses
Data structures
Fundamentals of artificial intelligence
Summary of the subject:
The concept of Data Mining. The emergence, prospects, and problems of Data Mining. Data Mining as part of the IT market.
Data Mining stages and the actions performed within these stages. Classification of Data Mining methods. Comparative characteristics of some methods based on their properties.
Data Mining tasks. The essence of Data Mining tasks and their classification. The notions of "information" and "knowledge" and a comparison of these concepts.
Classification and clustering tasks. The essence of the tasks, the solution process, solution methods, and applications. Comparison of the considered tasks.
The essence of the forecasting task. The concept of a time series, its components, forecasting parameters, and types of forecasts. Data visualization tasks.
Basics of data analysis. The main characteristics of descriptive statistics; the essence of correlation and regression analysis. Examples of problem solving in Microsoft Excel.
Applications of Data Mining technology. The concepts of Web Mining, Text Mining, and Call Mining.
Methods of classification and forecasting. The decision tree method. Elements of a decision tree and the process of its construction. Examples of trees that solve classification problems. The CART and C4.5 algorithms for constructing decision trees.
The support vector machine method, the nearest neighbor method, and Bayesian classification methods. Advantages and disadvantages of these methods.
The neural network method. Elements and architecture, the learning process, and the phenomenon of overfitting (overtraining) a neural network. The perceptron as a neural network model. An example of solving a problem using neural networks.
Working with neural networks; classification of neural networks. The process of preparing data for training. Self-organizing Kohonen maps and an example of solving a task with them.
Fundamentals of cluster analysis; mathematical characteristics of a cluster.
Two groups of hierarchical cluster analysis methods: agglomerative and divisive. An example of hierarchical cluster analysis in SPSS.
Iterative methods illustrated by the k-means algorithm. Fundamentals of factor analysis and iterative clustering in SPSS. The process of cluster analysis. A comparative analysis of hierarchical and non-hierarchical methods and some new algorithms.
Methods of searching for association rules. The essence of the association rule mining task. The Apriori algorithm. The essence of some other algorithms. An example of solving a problem in the Deductor analytical package.
Ways of visually representing data. Visualization methods. Methods and means of visual representation of information. Ways of presenting information in one, two, and three dimensions, and ways of displaying information in more than three dimensions. Principles of high-quality visualization. Main trends in visualization.
A comprehensive approach to Data Mining, OLAP, and data warehousing in decision support systems (DSS). DSS-type information systems, their types and components. The main ideas of OLAP technology, the architecture of OLAP servers, and the integration of Data Mining and OLAP. Data warehouse technology and the benefits of using it, in particular for the Data Mining process.
The initial stages of the Data Mining process. The data preparation process, the concept of data quality, dirty data, and the stages of data cleaning. Two classifications of tools for cleaning and editing data, the main functions of data cleaning tools, and the classification of errors in data that arise from the use of data cleaning tools.
The stages of the Data Mining process related to the construction, verification, evaluation, selection, and correction of models. The concepts of "model" and "modeling".
Organizational and human factors in Data Mining. Data Mining standards. The Data Mining process in terms of organizational factors, as well as the well-known methodologies CRISP-DM and SEMMA.
Standards that relate directly and indirectly to Data Mining.
The market for Data Mining tools: its development, tool providers, and classification of tools. The criteria by which Data Mining tools can be compared and chosen.
Data Mining tools: SAS Enterprise Miner. The SAS Enterprise Miner 5.1 package. Overview of the software product, its main features and technical requirements. SAS's approach to creating information and analytical systems.
The PolyAnalyst system. Architecture, analytical tools, and a short description of the PolyAnalyst mathematical algorithms. Characteristics of the WebAnalyst system.
Cognos software and STATISTICA Data Miner. The Cognos software suite; features of the modeling methodology in the system. The STATISTICA Data Miner tool, its analysis tools and workflow.
The Oracle Data Mining and Deductor tools. Differences between Oracle Data Mining and Deductor. Characteristics of Oracle Data Mining, its implemented algorithms and functionality. The Deductor analytical platform, its system architecture and analytical algorithms.
The KXEN tool. Differences between the KXEN approach and the traditional Data Mining approach. Prerequisites for creating the KXEN system and its specifications. Key components of the KXEN system. IOLAP technology.
Data Mining consulting. The concept of Data Mining consulting and the tasks of services for the effective implementation of this technology. The benefits of this option. The procedure by which the SnowCactus consulting company works with a client.
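As an illustration of one algorithm named in the summary above, the Apriori search for frequent itemsets can be sketched in a few lines of Python. This is a minimal sketch, not course material: the function name `apriori`, the sample baskets, and the choice to express `min_support` as an absolute number of transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With these baskets every single item and every pair reaches the threshold, while the three-item set appears in only two baskets and is discarded. The prune step is what distinguishes Apriori from plain enumeration: a candidate is counted only if all of its smaller subsets are already known to be frequent.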
Recommended Books:
Literature for the theoretical course
1. Barseghyan A. A. Methods and models of data analysis: OLAP and Data Mining: textbook / A. A. Barseghyan, M. S. Kupriyanov, V. V. Stepanenko, I. I. Kholod. - 2nd ed., revised and expanded. - St. Petersburg: BHV-Petersburg, 2004. - 336 p. (+ CD-ROM).
2. Barseghyan A. A. Data analysis technologies: Data Mining, Visual Mining, Text Mining, OLAP: textbook / A. A. Barseghyan, M. S. Kupriyanov, V. V. Stepanenko, I. I. Kholod. - 2nd ed., revised and expanded. - St. Petersburg: BHV-Petersburg, 2007. - 384 p. (+ CD-ROM).
3. Barseghyan A. A. Analysis of data and processes: textbook / A. A. Barseghyan, M. S. Kupriyanov, I. I. Kholod, M. D. Tess, S. I. Elizarov. - 3rd ed., revised and expanded. - St. Petersburg: BHV-Petersburg, 2009. - 512 p. (+ CD-ROM).
4. CRISP-DM 1.0: Step-by-step Data Mining Guide. - SPSS, 2000.
5. Negnevitsky M. Artificial Intelligence: A Guide to Intelligent Systems. - Addison-Wesley, Pearson Education Limited, 2002.
6. Luger G. Artificial Intelligence. - Moscow: Publishing House "Mir", 2003. - 690 p.
7. Gavrilova T. A., Khoroshevsky V. F. Knowledge bases of intelligent systems. - St. Petersburg: Piter, 2001. - 384 p.
8. Winston P. H. Artificial Intelligence (3rd Edition). - Addison-Wesley, 1992. - 691 p.
9. Russell S. J., Norvig P. Artificial Intelligence: A Modern Approach (2nd Edition). - Prentice Hall, 2002. - 1132 p.
10. Artificial Intelligence: reference book in 3 volumes / ed. by V. N. Zakharov and V. F. Khoroshevsky. - Moscow: Publishing House "Radio and Communication", 1990.
11. Nilsson N. Principles of Artificial Intelligence. - Moscow: Publishing House "Mir", 1985. - 374 p.
12. Pospelov D. A. From the history of artificial intelligence: the history of artificial intelligence until the mid-1980s // Artificial Intelligence News. - 1994. - No. 4. - P. 70-90.
Literature for laboratory classes
1. XELOPES Library Documentation. Version 1.1. - Chemnitz, Germany: prudsys AG, May 26, 2003. - 126 p.
Assessment methods and criteria:
Laboratory classes: 50
Participation in seminars: 10
Reports: 10
Presentations: 10
Control work (KR): 20
Total points: 100