-
Why not a multiple regression (or multifactor ANOVA)?
$$
Y=\beta_1X_1+\beta_2X_2+\beta_3X_3+...+\epsilon
$$
-
Curse of dimensionality
- In many real-world cases, $X_1,X_2,X_3...$ variables have some degrees of correlation with each other
- Datasets are typically high dimensional, but true dimensionality is often much lower
- Example: handwritten digits in a bitmap have many possible combinations but the true dimensionality is the possible variation of the pen-stroke
- As dimensionality grows, there are fewer observations per partition
-
Also what really are dependent and independent variables?
- Example: finding signals of adaptation in ~1 million SNPs with 46 environmental variables
-
Dimensionality reduction
- Feature elimination: picking a subset of the original dimensions
- Feature extraction: construct a new set of dimensions
-
Many multivariate methods are not based on probability theories but a set of algorithms
-
Applications of multivariate methods
- Multivariate hypothesis testing
- Dimensionality reduction
- Clustering
- Accounting for confounding variable
- Data mining