Modeling Data
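Data Space $\mathcal{X}$ contains all possible data samples $x \in \mathcal{X}$
Data Distribution $p(x)$ assigns a probability (or density) to every sample in $\mathcal{X}$
Dataset $\mathcal{D} = \{x_1, \dots, x_N\}$ is a collection of $N$ data samples drawn i.i.d. from $p(x)$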
Unified Learning Setting
General Learning Problem aims to learn the data distribution $p(x)$ from a dataset $\mathcal{D}$
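A minimal Python sketch of this setting, assuming a hypothetical three-sample data space with a known $p(x)$: it draws an i.i.d. dataset $\mathcal{D}$ and then "learns" $p(x)$ back from $\mathcal{D}$ by relative frequencies. The names `DATA_SPACE`, `P_TRUE`, `draw_dataset` and `estimate_distribution` are illustrative only.

```python
import random
from collections import Counter

# Hypothetical toy data space X and a known data distribution p(x) over it.
DATA_SPACE = ["a", "b", "c"]
P_TRUE = {"a": 0.5, "b": 0.3, "c": 0.2}


def draw_dataset(n, seed=0):
    """Draw a dataset D of n i.i.d. samples x_i ~ p(x)."""
    rng = random.Random(seed)
    weights = [P_TRUE[x] for x in DATA_SPACE]
    return rng.choices(DATA_SPACE, weights=weights, k=n)


def estimate_distribution(dataset):
    """Estimate p(x) from D by the relative frequency of each sample."""
    counts = Counter(dataset)
    n = len(dataset)
    return {x: counts[x] / n for x in DATA_SPACE}


D = draw_dataset(10_000)
print(estimate_distribution(D))  # approaches P_TRUE as n grows
```

In real problems $p(x)$ is unknown and only $\mathcal{D}$ is available; the known `P_TRUE` here just makes the estimate checkable.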
Supervised Learning
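- uses a labeled dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$
- computational view: learn a function $f: \mathcal{X} \to \mathcal{Y}$ where $\hat{y} = f(x)$ is the predicted label of $x$
- statistical view: learn the conditional distribution $p(y \mid x)$ where $(x_i, y_i) \sim p(x, y)$
Classification maps x to discrete y, e.g. $y \in \{1, \dots, K\}$
Regression maps x to continuous y, e.g. $y \in \mathbb{R}$
- both learn $p(y \mid x)$
Discriminative Model describes the conditional label distribution $p(y \mid x)$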
Unsupervised Learning
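- uses an unlabeled dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$
- computational view: learn a function $f$ where $z = f(x)$ captures hidden structure of $x$, e.g. a cluster index or a low-dimensional representation
- statistical view: learn the data distribution $p(x)$ where $x_i \sim p(x)$
Generative Model describes the complete data distribution $p(x)$, or the joint distribution $p(x, y)$ when labels are present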
Generic Framework
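- generative learning is used to learn the joint distribution $p(x, y)$ from the dataset $\mathcal{D}$
- it can be converted to the label posterior $p(y \mid x)$ with Bayes rule: $p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$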
Components of Generative Model
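- marginal sample distribution $p(x)$, computed with the marginalization rule $p(x) = \sum_{y} p(x, y)$
- label generative model $p(x \mid y)$, computed with conditioning $p(x \mid y) = \frac{p(x, y)}{p(y)}$
- label prior distribution $p(y) = \sum_{x} p(x, y)$ (an integral for continuous $x$)
- label posterior distribution $p(y \mid x) = \frac{p(x, y)}{p(x)}$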
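The four components can all be read off a joint table. A minimal sketch, assuming a hypothetical joint $p(x, y)$ over two samples and two labels stored as a Python dict; the posterior is computed through Bayes rule from the other components rather than directly from the joint.

```python
# Hypothetical joint distribution p(x, y) over samples x in {"s", "t"} and labels y in {0, 1}.
JOINT = {
    ("s", 0): 0.1,
    ("s", 1): 0.3,
    ("t", 0): 0.4,
    ("t", 1): 0.2,
}
XS = sorted({x for x, _ in JOINT})
YS = sorted({y for _, y in JOINT})


def marginal_x(x):
    """Marginal sample distribution: p(x) = sum_y p(x, y)."""
    return sum(JOINT[(x, y)] for y in YS)


def prior_y(y):
    """Label prior: p(y) = sum_x p(x, y)."""
    return sum(JOINT[(x, y)] for x in XS)


def likelihood(x, y):
    """Label generative model by conditioning: p(x | y) = p(x, y) / p(y)."""
    return JOINT[(x, y)] / prior_y(y)


def posterior(y, x):
    """Label posterior via Bayes rule: p(y | x) = p(x | y) p(y) / p(x)."""
    return likelihood(x, y) * prior_y(y) / marginal_x(x)


print(posterior(1, "s"))  # 0.75 up to floating-point rounding
```

Mathematically `posterior(1, "s")` is $0.3 / 0.4 = 0.75$, the same value as $p(x, y) / p(x)$ taken straight from the table.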
Generative Model
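Tabular Model estimates the distribution directly from the dataset by counting, $\hat{p}(x, y) = \frac{\#(x, y)}{N}$, where $\#(\cdot)$ is the number of occurrences in $\mathcal{D}$
- discriminative: $\hat{p}(y \mid x) = \frac{\#(x, y)}{\#(x)}$
- a specific sample $x$ is very unlikely to occur in $\mathcal{D}$, so $\#(x)$ is usually 0 and the estimate is undefined or unreliable
- generative: $\hat{p}(y) = \frac{\#(y)}{N}$ and $\hat{p}(x \mid y) = \frac{\#(x, y)}{\#(y)}$
- a specific label $y$ is likely to occur many times, so $\#(y)$ is large and both estimates are defined, although $\#(x, y)$ is still mostly 0 for high-dimensional $x$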
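A minimal counting sketch of the tabular model, assuming a hypothetical four-sample dataset whose samples are tuples of three binary entries. It shows the asymmetry above: the discriminative estimate is undefined for an unseen sample, while the label-conditioned estimates stay defined (but can be 0).

```python
from collections import Counter


def tabular_model(dataset):
    """Build count-based estimates from a labeled dataset of (x, y) pairs."""
    n = len(dataset)
    count_xy = Counter(dataset)                 # #(x, y)
    count_x = Counter(x for x, _ in dataset)    # #(x)
    count_y = Counter(y for _, y in dataset)    # #(y)

    def p_y_given_x(y, x):
        """Discriminative estimate #(x, y) / #(x); None if x never occurs in D."""
        if count_x[x] == 0:
            return None
        return count_xy[(x, y)] / count_x[x]

    def p_y(y):
        """Label prior #(y) / N."""
        return count_y[y] / n

    def p_x_given_y(x, y):
        """Generative estimate #(x, y) / #(y); None if y never occurs in D."""
        if count_y[y] == 0:
            return None
        return count_xy[(x, y)] / count_y[y]

    return p_y_given_x, p_y, p_x_given_y


# Hypothetical dataset: samples are tuples of three binary entries.
D = [((0, 1, 1), "spam"), ((1, 0, 1), "spam"), ((0, 0, 0), "ham"), ((1, 0, 0), "ham")]
p_y_given_x, p_y, p_x_given_y = tabular_model(D)
print(p_y_given_x("spam", (1, 1, 1)))  # None: this sample was never observed
print(p_y("spam"))                     # 0.5
print(p_x_given_y((1, 1, 1), "spam"))  # 0.0
```

Returning `None` for an empty denominator is a design choice for this sketch; it makes the "undefined" case explicit instead of raising a division error.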
Naive Bayes Model
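- assume all entries $x_1, \dots, x_M$ of a sample are conditionally independent given the label: $p(x \mid y) = \prod_{m=1}^{M} p(x_m \mid y)$
- use Bayes rule to get $p(y \mid x) = \frac{p(y) \prod_{m} p(x_m \mid y)}{\sum_{y'} p(y') \prod_{m} p(x_m \mid y')}$
- each factor $p(x_m \mid y)$ is estimated from counts of single entries, $\#(x_m, y)$, which are far larger than counts of whole samples; this lets a generative model solve a discriminative problem even with small datasets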
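A minimal Naive Bayes sketch for discrete entries, assuming a hypothetical four-sample dataset of three binary entries. The additive (Laplace) smoothing parameter `alpha` is an assumption not in these notes; it keeps every factor $p(x_m \mid y)$ positive so a single unseen entry value does not zero out a label. Unlike a tabular $\hat{p}(y \mid x)$, the posterior is defined even for a sample that never appears in $\mathcal{D}$.

```python
from collections import Counter


def fit_naive_bayes(dataset, alpha=1.0):
    """Fit Naive Bayes on (x, y) pairs where x is a tuple of discrete entries x_m.

    alpha > 0 is additive (Laplace) smoothing, an assumption not in the notes; it
    keeps p(x_m | y) positive for entry values never seen together with label y.
    """
    n = len(dataset)
    num_entries = len(dataset[0][0])
    label_counts = Counter(y for _, y in dataset)                 # #(y)
    entry_counts = Counter(
        (m, xm, y) for x, y in dataset for m, xm in enumerate(x)  # #(x_m, y)
    )
    # Number of distinct values observed for each entry m (used by the smoothing).
    num_values = [len({x[m] for x, _ in dataset}) for m in range(num_entries)]

    def posterior(x):
        """Return p(y | x) for every label via Bayes rule."""
        scores = {}
        for y, ny in label_counts.items():
            score = ny / n  # prior p(y)
            for m, xm in enumerate(x):
                # Smoothed p(x_m | y); multiplying the factors is the independence assumption.
                score *= (entry_counts[(m, xm, y)] + alpha) / (ny + alpha * num_values[m])
            scores[y] = score
        evidence = sum(scores.values())  # p(x) by marginalizing over y
        return {y: s / evidence for y, s in scores.items()}

    return posterior


# Hypothetical dataset: samples are tuples of three binary entries.
D = [((0, 1, 1), "spam"), ((1, 0, 1), "spam"), ((0, 0, 0), "ham"), ((1, 0, 0), "ham")]
posterior = fit_naive_bayes(D)
print(posterior((1, 1, 1)))  # spam 6/7, ham 1/7 (up to floating-point rounding)
```

For the unseen sample $(1, 1, 1)$ the unnormalized scores are $3/32$ for spam and $1/64$ for ham, which normalize to $6/7$ and $1/7$.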