Taylor and Francis Book Publication (Routledge) : DIMENSIONALITY REDUCTION ALGORITHMS IN APPLIED MACHINE LEARNING

khanfarhan10/DIMENSIONALITY_REDUCTION

DSDA_2020 (Data Science and Data Analytics: Opportunities and Challenges)

High Dimensionality Dataset Reduction Methodologies in Applied Machine Learning

Farhan Hai Khan (a), Tannistha Pal (b)

a. Department of Electrical Engineering, Institute of Engineering & Management, Kolkata, India, khanfarhanpro@gmail.com
b. Department of Electronics and Communication Engineering, Institute of Engineering & Management, Kolkata, India, paltannistha@gmail.com

Abstract

A common problem when handling multi-featured datasets is their high dimensionality, which is a barrier to generalized, hands-on Machine Learning. Such datasets also degrade the performance of Machine Learning algorithms: they are memory inefficient and frequently lead to model overfitting. It also becomes difficult to visualize the data or to gain insight into its features, such as the presence of outliers.

This chapter will help data analysts reduce data dimensionality using various methodologies such as:

  1. Feature Selection using Covariance Matrix
  2. Principal Component Analysis (PCA)
  3. t-distributed Stochastic Neighbour Embedding (t-SNE)

Under applications of Dimensionality Reduction Algorithms with Visualizations, we first introduce the Boston Housing Dataset and use the Correlation Matrix to apply Feature Selection on the strongly positively correlated data, then perform Simple Linear Regression over the new features. Next, we use the UCI Breast Cancer Dataset to perform PCA with Support Vector Machine (SVM) classification. Lastly, we apply t-SNE to the MNIST Handwritten Digits Dataset and use k-Nearest Neighbours (kNN) for classification.
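As a rough illustration of the correlation-based Feature Selection step, the sketch below keeps only the features whose Pearson correlation with the target exceeds a threshold. It is not the chapter's own code: the data is synthetic rather than the Boston Housing Dataset, and the names `pearson`, `select_features`, `rooms`, `noise`, and the `0.5` threshold are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep features whose correlation with the target exceeds `threshold`.

    `columns` maps a feature name to its list of values. Only strongly
    positive correlations are kept; the threshold is an assumed value.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if r > threshold]
    return selected, scores


# Synthetic stand-in data (hypothetical, NOT the Boston Housing Dataset).
rng = random.Random(0)
n = 200
rooms = [rng.gauss(6, 1) for _ in range(n)]      # informative feature
noise = [rng.gauss(0, 1) for _ in range(n)]      # unrelated feature
price = [5 * r + rng.gauss(0, 1) for r in rooms]  # target driven by `rooms`

columns = {"rooms": rooms, "noise": noise}
selected, scores = select_features(columns, price, threshold=0.5)
print(selected)
```

The selected features can then be passed to a regression model in place of the full feature set. In practice the threshold is a judgment call, and strongly negative correlations may be just as informative as positive ones.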

Finally, we explore the benefits of using Dimensionality Reduction Methods and provide a comprehensive overview of storage space reduction, efficient models, feature selection guidelines, redundant data removal, and outlier analysis.

Keywords: Dimensionality Reduction, Feature Selection, Covariance Matrix, PCA, t-SNE

Table of Contents

  1. Problems faced with Multi-Dimensional Datasets
    1. Data Intuition
    2. Data Visualization Constraints
    3. Outlier Detection
  2. Dimensionality Reduction Algorithms with Visualizations
    1. Feature Selection using Covariance Matrix
    2. Principal Component Analysis (PCA)
    3. t-distributed Stochastic Neighbour Embedding (t-SNE)
  3. Benefits of Dimensionality Reduction
    1. Storage Space Reduction
    2. Computation Time Optimization
    3. Redundant Feature Removal
    4. Incorrect Data Removal
