Big Data Analysis, Machine Learning, Data Mining


Date
Aug 20, 2020 12:00 AM — 3:49 PM

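The data analysis methods used in big data analysis, machine learning, and data mining are also known in statistics as statistical learning. Statistics is chiefly concerned with methods of data analysis, so this introductory statistical learning course focuses on data analysis. The management of and access to big data databases, and the differences between algorithms, are not covered in this course. In addition, linear regression, logistic regression, and data visualization have already been taught in the introductory statistics courses. Machine learning and deep learning can be carried out with software such as Python, R, SAS, and IBM SPSS.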

Software

  • R:
    • RStudio
    • tidyverse
    • ggplot2, lattice
    • caret, tidymodels, mlr3
  • Python:
    • Jupyter
    • Jupyter notebooks
    • NumPy, SciPy, Pandas
    • Matplotlib, Seaborn
    • Scikit-learn

Big Data Analysis Basics

  • Introduction
  • Tidy Data
  • Data Preprocessing
    • Descriptive Statistics
    • Missing Data
    • Categorical Variables
    • Dummy Variables
      • Combine Levels
      • Binning Predictors
    • Continuous Variables
      • Near-zero Variance
      • Collinearity
      • Linear Dependencies
      • Outliers
    • Data Transformations
    • Putting It All Together

Big Data Visualization

  • Exploratory Data Analysis
  • R: Basic Graphics
  • R: lattice package
  • R: ggplot2 package
  • Analytic Visualization

Unsupervised Learning

  • PCA (Principal Component Analysis)
  • Cluster Analysis
  • Basket Analysis and Association Analysis
  • Recommendation System

Supervised Learning

  • Prediction: Theory and Evaluation
  • Classification: Theory and Evaluation
  • Linear Regression
  • Logistic Regression
  • Penalized Regression (L1, L2, Elastic Net)
  • Non-parametric Regression (LOWESS, Spline, GAM)
  • Robust Regression
  • Principal Component Regression (PCR)
  • Partial Least Squares Regression (PLSR)
  • Discriminant Analysis (LDA, QDA)
  • Naive Bayes Classifier (NB)
  • k-Nearest Neighbor Classification (KNN)
  • Regression Tree and Classification Tree (Tree)
  • Support Vector Machines (SVM)
  • Bagging
  • Random Forest (RF)
  • Boosting

Text Mining

  • R: tidytext package
  • Word and Document Frequency
  • Relationships Between Words
  • Mining the Corpus
  • Sentiment Analysis
  • Topic Modeling

Deep Learning

  • Introduction to Neural Network
  • Recurrent Neural Network (RNN) and Elman Neural Networks (ENN)
  • Jordan Neural Networks (JNN)
  • Autoencoder
  • Stacked Autoencoder and Denoising Autoencoder (DA)
  • Restricted Boltzmann Machines (RBM)
  • Deep Belief Networks (DBN)
  • Convolutional Neural Network (CNN)
  • Natural Language Processing (NLP)

Jeff Lin
Orthopedic Surgeon / Medical Statistics Consultant