data

A brief introduction to data mining
The process of turning data into knowledge

data -> target data -> preprocessed data -> transformed data -> patterns -> knowledge
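The chain above can be sketched as a sequence of small functions, one per stage. This is only a toy illustration: the records, the attribute choice, the min-max scaling, and the "pattern" being counted are all assumptions made up for the example, not part of any standard KDD implementation.

```python
# Hypothetical raw records: (customer_id, age, monthly_spend); None marks a missing value.
raw_data = [
    (1, 25, 120.0),
    (2, None, 80.0),
    (3, 41, 300.0),
    (4, 38, None),
    (5, 52, 310.0),
]

def select_target(records):
    """Selection: keep only the attributes relevant to the task (age, spend)."""
    return [(age, spend) for _, age, spend in records]

def preprocess(records):
    """Preprocessing: drop instances that have a missing value."""
    return [r for r in records if None not in r]

def transform(records):
    """Transformation: min-max scale each attribute to the range [0, 1]."""
    columns = list(zip(*records))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(r, lows, highs))
        for r in records
    ]

def mine_patterns(records):
    """Data mining: a toy pattern, the share of instances with both scaled values above 0.5."""
    high = [r for r in records if all(v > 0.5 for v in r)]
    return {"high_age_high_spend_share": len(high) / len(records)}

target = select_target(raw_data)
clean = preprocess(target)
scaled = transform(clean)
patterns = mine_patterns(scaled)
print(patterns)
```

The last step, turning patterns into knowledge, is the interpretation by a person and is not something the code does.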

What is KDD? Knowledge Discovery in Databases
Definition: KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
valid: the discovered patterns should also hold for new, previously unseen problem instances.
novel: at least to the system and preferably to the user
potentially useful: they should lead to some benefit to the user or task
ultimately understandable (the user must be able to understand them): the end user should be able to interpret the patterns either immediately or after some post-processing

A dataset consists of instances, and each instance is described by attributes (features).
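A minimal sketch of this structure, using made-up attribute names and values: each instance is a mapping from attribute name to value, and the whole dataset can equally be viewed as a table with one row per instance and one column per attribute.

```python
# Hypothetical attribute names and values, for illustration only.
attributes = ["age", "income", "owns_home"]

dataset = [
    {"age": 25, "income": 32000, "owns_home": False},
    {"age": 41, "income": 58000, "owns_home": True},
    {"age": 37, "income": 45000, "owns_home": True},
]

# Every instance is described by the same set of attributes.
for instance in dataset:
    assert set(instance) == set(attributes)

# The same data as a table: one row per instance, one column per attribute.
rows = [[instance[name] for name in attributes] for instance in dataset]
n_instances, n_attributes = len(rows), len(attributes)
print(n_instances, n_attributes)
```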

Types of learning task
Supervised learning: with feedback
Unsupervised learning: without feedback

Direct feedback means the instances have labels.

1-Unsupervised learning: no labels
clustering
Applications: Market segmentation, document clustering
Typical subtasks: Clustering, association rules (e.g. market basket analysis), outlier detection
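As an illustration of the association-rule subtask, the sketch below computes the two usual measures for a rule "A => C" in market basket analysis: support (how often the items occur together) and confidence (how often C appears in baskets that contain A). The transactions are invented, and the helper names `support` and `confidence` are just chosen for this example.

```python
# Hypothetical transactions; each one is the set of items in a single basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C together) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Rule under test: bread => milk
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

No labels are used anywhere: the rule is judged only by how often the items co-occur, which is what makes this an unsupervised task.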

2-Supervised learning: with labels; models can be combined (ensembles)
Typical subtasks: Classification, regression, outlier detection
Classification
Applications: Fraud detection, churn prediction in telecom
e.g. Google News: the items on the left all have labels; the task is to predict a label such as Farming
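A minimal sketch of classification, assuming a fraud-detection setting with two made-up features per transaction. It uses a 1-nearest-neighbour rule, chosen here only because it is short; the notes do not prescribe any particular classifier.

```python
import math

# Hypothetical labeled training data: (features, label).
# Features are (transaction_amount, transactions_in_last_hour); all values are made up.
training = [
    ((20.0, 1), "legit"),
    ((35.0, 2), "legit"),
    ((15.0, 1), "legit"),
    ((900.0, 9), "fraud"),
    ((750.0, 7), "fraud"),
]

def predict(features, labeled):
    """1-nearest-neighbour: return the label of the closest training instance."""
    _, label = min(labeled, key=lambda pair: math.dist(features, pair[0]))
    return label

print(predict((30.0, 1), training))
print(predict((800.0, 8), training))
```

The labels in `training` are the direct feedback: the model is only able to answer "legit" or "fraud" because those answers were supplied for the training instances. In practice the features would usually be scaled first, since the amount dominates the distance here.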

Both supervised and unsupervised learning include outlier (anomaly) detection.
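A rough sketch of the unsupervised variant, assuming a simple z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The data and the threshold of 2.0 are assumptions for the example. In the supervised variant, known outliers would instead carry a label and be learned like any other class.

```python
import statistics

# Hypothetical measurements with one obviously unusual value.
values = [10, 12, 11, 13, 12, 11, 95]

def zscore_outliers(data, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can stand out
    return [x for x in data if abs(x - mean) / std > threshold]

outliers = zscore_outliers(values)
print(outliers)
```

With very small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with care; more robust rules (for example based on the median) exist but are outside these notes.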