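The data analysis methods used in big data analytics, machine learning, and data mining
are also known in statistics as statistical learning.
Statistics is mainly concerned with methods of data analysis,
so this introductory course in statistical learning focuses on data analysis.
The management of and access to big data databases, and the differences among algorithms,
are not covered in this course. In addition, linear regression, logistic regression,
and data visualization have already been taught in the introductory statistics courses.
Software such as Python, R, SAS, and IBM-SPSS can be used for machine learning and deep learning.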
Software
- R:
- RStudio
- tidyverse
- ggplot2, lattice
- caret, tidymodels, mlr3
- Python:
- Jupyter
- Jupyter notebooks
- NumPy, SciPy, Pandas
- Matplotlib, Seaborn
- Scikit-learn
Fundamentals of Big Data Analysis
- Introduction
- Tidy Data
- Data Preprocessing
- Descriptive Statistics
- Missing Data
- Categorical Variables
- Dummy Variables
- Combine Levels
- Binning Predictors
- Continuous Variables
- Near-zero Variance
- Collinearity
- Linear Dependencies
- Outliers
- Data Transformations
- Putting It All Together
Big Data Visualization
- Exploratory Data Analysis
- R: Basic Graphics
- R: lattice package
- R: ggplot2 package
- Analytic Visualization
Unsupervised Learning
- PCA (Principal Component Analysis)
- Cluster Analysis
- Basket Analysis and Association Analysis
- Recommendation System
Supervised Learning
- Prediction: Theory and Evaluation
- Classification: Theory and Evaluation
- Linear Regression
- Logistic Regression
- Penalized Regression (L1, L2, Elastic Net)
- Non-parametric Regression (LOWESS, Spline, GAM)
- Robust Regression
- Principal Component Regression (PCR)
- Partial Least Squares Regression (PLSR)
- Discriminant Analysis (LDA, QDA)
- Naive Bayes Classifier (NB)
- k-Nearest Neighbor Classification (KNN)
- Regression Tree and Classification Tree (Tree)
- Support Vector Machines (SVM)
- Bagging
- Random Forest (RF)
- Boosting
Text Mining
- R: tidytext package
- Word and Document Frequency
- Relationships Between Words
- Mining the Corpus
- Sentiment Analysis
- Topic Modeling
Deep Learning
- Introduction to Neural Network
- Recurrent Neural Network (RNN) and Elman Neural Networks (ENN)
- Jordan Neural Networks (JNN)
- Autoencoder
- Stacked Autoencoder and Denoising Autoencoder (DA)
- Restricted Boltzmann Machines (RBM)
- Deep Belief Networks (DBN)
- Convolutional Neural Network (CNN)
- Natural Language Processing (NLP)