Modeling Data

Data Space contains all possible data samples.
Data Distribution is the probability of a data sample being drawn from the data space.
Dataset is a collection of data samples drawn from the data distribution.

Unified Learning Setting

General Learning Problem aims to learn the data distribution

Supervised Learning

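  • uses a labeled dataset of sample-label pairs $(x, y)$

  • computational view: learn a function $f$ where $y = f(x)$

  • statistical view: learn a distribution $p(y \mid x)$ where $y \sim p(y \mid x)$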

  • Classification: maps x to discrete y

  • Regression: maps x to continuous y

  • both learn a mapping from x to y

  • Discriminative Model describes the conditional label distribution $p(y \mid x)$

Unsupervised Learning

  • uses an unlabeled dataset

  • computational view: learn where

  • statistical view: learn a distribution $p(x)$ where $x \sim p(x)$

  • Generative Model describes the complete data distribution $p(x)$ or the joint distribution $p(x, y)$

Generic Framework

  • generative learning is used to learn the joint distribution $p(x, y)$ from the dataset

  • $p(x, y)$ can be converted to $p(y \mid x)$ with Bayes' rule

  • Components of Generative Model

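    • marginal sample distribution $p(x)$

    • computed with the marginalization rule: $p(x) = \sum_y p(x, y)$

    • label generative model $p(x \mid y)$

    • computed with conditioning: $p(x \mid y) = p(x, y) / p(y)$

    • label prior distribution $p(y)$

    • label posterior distribution $p(y \mid x) = p(x \mid y)\, p(y) / p(x)$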
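
The components listed above can be illustrated on a small joint table. The sketch below is only an illustration under stated assumptions: the table `joint` and its numbers are made up, x and y are assumed to be discrete, and the function names are hypothetical. It computes the marginal, the label prior, the label generative model, and the label posterior from the joint distribution.

```python
# Hypothetical joint distribution p(x, y) over 3 sample values and 2 labels.
# Rows index x, columns index y; the numbers are invented and sum to 1.
joint = [
    [0.10, 0.20],
    [0.30, 0.05],
    [0.10, 0.25],
]


def marginal_x(joint):
    """Marginal sample distribution p(x) = sum_y p(x, y)."""
    return [sum(row) for row in joint]


def marginal_y(joint):
    """Label prior p(y) = sum_x p(x, y)."""
    return [sum(col) for col in zip(*joint)]


def likelihood(joint):
    """Label generative model p(x | y) = p(x, y) / p(y), indexed [x][y]."""
    p_y = marginal_y(joint)
    return [[p_xy / p_y[j] for j, p_xy in enumerate(row)] for row in joint]


def posterior(joint):
    """Label posterior p(y | x) = p(x | y) p(y) / p(x), indexed [x][y]."""
    p_x = marginal_x(joint)
    p_y = marginal_y(joint)
    lik = likelihood(joint)
    return [
        [lik[i][j] * p_y[j] / p_x[i] for j in range(len(p_y))]
        for i in range(len(p_x))
    ]


post = posterior(joint)
for i, row in enumerate(post):
    print(f"x={i}: p(y | x) = {[round(p, 3) for p in row]}")
```

Each row of `post` sums to 1, since the posterior is a distribution over labels for a fixed sample.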

Generative Model

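  • Tabular Model estimates the distribution directly from counts in the dataset

    • discriminative: estimate $p(y \mid x)$ from the samples equal to $x$

      • a specific sample is very unlikely to appear in the dataset, so there are few or no counts to estimate from
    • generative: estimate $p(x \mid y)$ from the samples with label $y$

      • a specific label is likely to appear in the dataset, so there are samples to estimate from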
  • Naive Bayes Model

    • assume all entries in a sample are independent given the label: $p(x \mid y) = \prod_i p(x_i \mid y)$
    • use Bayes' rule to get the label posterior $p(y \mid x)$
    • this approach enables a generative model to be used for a discriminative problem when working with small datasets
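
A minimal sketch of the Naive Bayes idea above, assuming binary entries and a tiny invented dataset. The function names, the data, and the add-one (Laplace) smoothing controlled by `alpha` are assumptions made for illustration and are not part of the notes. The queried sample never appears in the dataset, so a tabular estimate of $p(y \mid x)$ would have no counts to work with, while the factorized model still yields a posterior.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(samples, labels, alpha=1.0):
    """Estimate the label prior p(y) and per-entry p(x_i = 1 | y).

    Entries are assumed binary (0 or 1). alpha is an add-alpha smoothing
    constant, an assumption added here so no probability is exactly zero.
    """
    label_counts = Counter(labels)
    n_entries = len(samples[0])
    ones = defaultdict(lambda: [0] * n_entries)  # ones[y][i]: count of x_i == 1 with label y
    for x, y in zip(samples, labels):
        for i, value in enumerate(x):
            ones[y][i] += value
    prior = {y: count / len(labels) for y, count in label_counts.items()}
    cond = {
        y: [(ones[y][i] + alpha) / (label_counts[y] + 2 * alpha)
            for i in range(n_entries)]
        for y in label_counts
    }
    return prior, cond


def predict_posterior(x, prior, cond):
    """Label posterior p(y | x) via Bayes' rule with the independence assumption."""
    joint = {}
    for y, p_y in prior.items():
        p = p_y
        for i, value in enumerate(x):
            p *= cond[y][i] if value == 1 else 1.0 - cond[y][i]
        joint[y] = p  # p(x | y) p(y) = p(x, y)
    evidence = sum(joint.values())  # p(x) by marginalizing over labels
    return {y: p / evidence for y, p in joint.items()}


# Invented dataset: six samples with three binary entries each.
samples = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
labels = ["a", "a", "a", "b", "b", "b"]

prior, cond = fit_naive_bayes(samples, labels)
query = (1, 0, 1)  # not present in the dataset
result = predict_posterior(query, prior, cond)
print(result)
```

The model stores one probability per entry and label instead of one per full sample, which is why it needs far fewer samples than a table over all possible values of x.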