Data Science Using R
Overview
This comprehensive R training course teaches hands-on Data Science skills with R. Over the past several years R has gained immense popularity among Data Science practitioners, and it is often referred to as the lingua franca of Data Science. The course covers basic data analytics, statistical predictive modelling and machine learning through practical examples and projects.
CASE STUDIES:
1. Banking Case study
Understand customer spending and repayment behavior, and evaluate areas such as bankruptcy, fraud and collections. Also, respond to customer requests for help with proactive offers and service.
2. Retail Case study
A retail store needs to analyze its day-to-day transactions and keep track of its customers across various locations, along with their purchases and returns across various categories. The objective of this case study is to understand customer behavior in terms of purchases and returns through various data manipulation steps in R.
3. Auto Insurance Case study
Perform data cleaning, basic data exploration and data mining tasks on the two given datasets (customer demographics and claims data).
4. Visualization Case study
Perform different graphical analyses (bar charts, pie charts, box plots, histograms, stacked charts, heat maps, scatter plots, panel charts, etc.) to solve different business problems.
5. Credit Card Customers Segmentation
A credit card company wants to understand its customers' behavior in order to build an enriched customer profile using intelligent KPIs. The idea is to apply advanced techniques such as factor analysis and cluster analysis for data reduction and customer segmentation based on behavioral data.
6. Proactive Attrition Management
A wireless telecom company wants to reduce customer churn by developing a proactive churn management model. The idea is to build a predictive model based on logistic regression and use it to develop an incentive plan that entices would-be churners to remain with the company.
7. Predicting Loan Default
A bank would like to build a credit risk model (an application scorecard using probability of default (PD) models) to accept or reject loan applications. It also wants to understand the key drivers of default or delinquency.
8. Key Drivers of Customer Credit Card Spending
The objective of this case study is to understand what drives total credit card spend (primary card + secondary card) and to identify the key spend drivers. Candidates will apply OLS (linear) regression, follow the end-to-end model-building process, and use the results to help set credit limits and design new product offerings.
9. Time Series Forecasting
Use time series analysis to forecast outbound passenger movement for the next few quarters.
10. Sentiment Analysis
The objective of this analysis is to obtain data from Twitter and examine how sentiment toward a particular brand, keyword or company varies by country.
11. Social Media Analytics Case Study
The objective of this analysis is to obtain data from social media platforms such as Twitter, Facebook and YouTube, and to perform different analyses using text mining and machine learning techniques.
Duration
60 hours
Intended Audience
Candidates from quantitative backgrounds such as Engineering, Finance, Mathematics, Statistics and Business Management who want R training with a detailed focus on Data Science and Machine Learning applications
Course outline:
1. Introduction to Data Science With R
- What is analytics & Data Science?
- Common Terms in Analytics
- Analytics vs. Data warehousing, OLAP, MIS Reporting
- Relevance in industry and need of the hour
- Types of problems and business objectives in various industries
- How are leading companies harnessing the power of analytics?
- Critical success drivers
- Overview of analytics tools & their popularity
- Analytics Methodology & problem solving framework
- List of steps in Analytics projects
- Identify the most appropriate solution design for the given problem statement
- Project plan for Analytics project & key milestones based on effort estimates
- Build a resource plan for an analytics project
- Why R for data science?
2. Introduction – Data Importing/Exporting
- Introduction to R/RStudio - GUI
- Concept of Packages - Useful Packages (Base & Other packages)
- Data Structures & Data Types (Vectors, Matrices, Factors, Data Frames and Lists)
- Importing Data from various sources (txt, dlm, Excel, sas7bdat, db, etc.)
- Database Input (Connecting to database)
- Exporting Data to various formats
- Viewing Data (Viewing partial data and full data)
- Variable & Value Labels – Date Values
3. Data Manipulation
- Data Manipulation steps
- Creating New Variables (calculations & Binning)
- Dummy variable creation
- Applying transformations
- Handling duplicates
- Handling missing values
- Sorting and Filtering
- Subsetting (Rows/Columns)
- Appending (Row appending/column appending)
- Merging/Joining (Left, right, inner, full, outer etc)
- Data type conversions
- Renaming
- Formatting
- Reshaping data
- Sampling
- Data manipulation tools
- Operators
- Functions
- Packages
- Control Structures (if, if else)
- Loops (Conditional, iterative loops, apply functions)
- Arrays
- R Built-in Functions (Text, Numeric, Date, utility)
- Numerical Functions
- Text Functions
- Date Functions
- Utility Functions
- R User Defined Functions
- R Packages for data manipulation (base, dplyr, plyr, data.table, reshape, car, sqldf, etc)
4. Data Analysis - Visualization
- Introduction to exploratory data analysis
- Descriptive statistics, Frequency Tables and summarization
- Univariate Analysis (Distribution of data & Graphical Analysis)
- Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
- Creating Graphs (bar/pie/line chart, histogram, boxplot, scatter, density, etc.)
- R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych, doBy, etc.)
- R Packages for Graphical Analysis (base, ggplot2, lattice, etc.)
5. Introduction to Statistics
- Basic Statistics - Measures of Central Tendency and Variance
- Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
- Inferential Statistics - Sampling - Concept of Hypothesis Testing
- Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlations and Chi-square
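The one-sample and paired t-tests listed above reduce to one statistic: the gap between the sample mean and the hypothesised mean, measured in standard errors. The course itself works in R (where `t.test()` also reports the p-value); the sketch below is written in Python only as a self-contained illustration, and the function names `one_sample_t` and `paired_t` are hypothetical. It returns the t statistic and degrees of freedom, not the p-value.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean == mu0.

    t = (x_bar - mu0) / (s / sqrt(n)), where s is the sample standard
    deviation (n - 1 denominator) and df = n - 1.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


def paired_t(before, after):
    """A paired t-test is a one-sample t-test on the differences against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Illustrative data only (hypothetical measurements).
t, df = one_sample_t([5.1, 4.8, 5.6, 5.3, 4.9], mu0=5.0)
print(f"t = {t:.3f}, df = {df}")
```

The p-value then comes from the t distribution with `df` degrees of freedom.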
6. Introduction to Predictive Modeling
- Concept of model in analytics and how it is used?
- Common terminology used in analytics & modeling process
- Popular modeling algorithms
- Types of Business problems - Mapping of Techniques
- Different Phases of Predictive Modeling
7. Data Exploration for Modeling
8. Data Preparation
- Need of Data preparation
- Consolidation/Aggregation - Outlier treatment - Flat Liners - Missing values - Dummy creation - Variable Reduction
- Variable Reduction Techniques - Factor Analysis & PCA
9. Segmentation: Solving Segmentation problems
- Introduction to Segmentation
- Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
- Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
- Behavioral Segmentation Techniques (K-Means Cluster Analysis)
- Cluster evaluation and profiling - Identify cluster characteristics
- Interpretation of results - Implementation on new data
10. Linear regression: Solving regression problems
- Introduction - Applications
- Assumptions of Linear Regression
- Building Linear Regression Model
- Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
- Assess the overall effectiveness of the model
- Validation of Models (Re-running Vs. Scoring)
- Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
- Interpretation of Results - Business Validation - Implementation on new data
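To show what the R-square and Adjusted R-square metrics in the linear regression module measure, here is a minimal sketch of a one-predictor OLS fit. It is written in Python purely for illustration (in R, `lm()` and `summary()` report these values). The function names are hypothetical, and the example handles only a single predictor.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_hat):
    """R-square = 1 - SSE / SST (share of variance explained by the model)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Penalise R-square for p predictors (intercept excluded), n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(b0, b1, r2, adjusted_r_squared(r2, n=len(x), p=1))
```

Adjusted R-square can only fall below R-square. This makes it the fairer way to compare models that contain different numbers of drivers.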
11. Logistic Regression: Solving Classification Problems
- Introduction - Applications
- Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
- Building Logistic Regression Model (Binary Logistic Model)
- Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
- Validation of Logistic Regression Models (Re-running Vs. Scoring)
- Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)
- Interpretation of Results - Business Validation - Implementation on new data
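Three of the logistic regression metrics listed above can be computed directly from predicted probabilities. These are concordance (equal to the area under the ROC curve when ties count as one half), Gini (2 × AUC − 1) and KS. The sketch below is written in Python for illustration only; in the course these would be produced in R. The function names are hypothetical, and the pairwise concordance loop is O(n²), so it suits small examples only.

```python
def concordance_auc(scores, labels):
    """Share of (event, non-event) pairs where the event scores higher; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient of a scorecard: 2 * AUC - 1."""
    return 2 * concordance_auc(scores, labels) - 1


def ks_statistic(scores, labels):
    """Max gap between cumulative % of events (1s) and non-events (0s),
    scanning from the highest score down; tied scores are evaluated together."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    cum_pos = cum_neg = 0
    ks = 0.0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # only measure the gap once a block of tied scores is finished
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            ks = max(ks, abs(cum_pos / n_pos - cum_neg / n_neg))
    return ks


# Illustrative predicted probabilities and actual outcomes (hypothetical).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
actual = [1, 1, 0, 1, 0, 0]
print(concordance_auc(probs, actual), gini(probs, actual), ks_statistic(probs, actual))
```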
12. Time Series Forecasting: Solving Forecasting Problems
- Introduction - Applications
- Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
- Classification of Techniques (Pattern based - Pattern less)
- Basic Techniques - Averages, Smoothing, etc.
- Advanced Techniques - AR Models, ARIMA, etc
- Understanding Forecasting Accuracy - MAPE, MAD, MSE, etc
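The forecast accuracy measures named above (MAPE, MAD, MSE) are all averages of the forecast errors. The sketch below is in Python for illustration only, and `forecast_accuracy` is a hypothetical name. It assumes MAD means the mean absolute deviation of the forecast errors, as is common in forecasting texts; some sources define MAD differently. It also assumes all actual values are non-zero, because MAPE divides by them.

```python
def forecast_accuracy(actual, forecast):
    """Return MAD, MSE and MAPE (%) for paired actual/forecast values.

    Assumes every actual value is non-zero (MAPE divides by it).
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n              # mean absolute deviation
    mse = sum(e * e for e in errors) / n               # mean squared error
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAD": mad, "MSE": mse, "MAPE": mape}


# Illustrative quarterly passenger counts and forecasts (hypothetical numbers).
print(forecast_accuracy([100, 200, 400], [110, 190, 400]))
```

MSE weights large misses more heavily than MAD. MAPE is scale-free, so it allows comparison across series of different sizes.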
13. Machine Learning - Predictive Modeling – Basics
- Introduction to Machine Learning & Predictive Modeling
- Types of Business problems - Mapping of Techniques - Regression vs. classification vs. segmentation vs. Forecasting
- Major Classes of Learning Algorithms - Supervised vs. Unsupervised Learning
- Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
- Overfitting (Bias-Variance Trade off) & Performance Metrics
- Feature engineering & dimension reduction
- Concept of optimization & cost function
- Overview of gradient descent algorithm
- Overview of Cross-validation (Bootstrapping, K-Fold validation, etc.)
- Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
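Several of the classification metrics listed above come from the four cells of a confusion matrix. The sketch below is in Python for illustration only (the course uses R), and the function names are hypothetical. It assumes binary labels coded 1 (event) and 0 (non-event). Recall and sensitivity are the same quantity.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels coded 1/0."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall_sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Illustrative labels only.
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 1, 1, 0, 0, 1]))
```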
14. Unsupervised Learning: Segmentation
- What is segmentation & Role of ML in Segmentation?
- Concept of Distance and related math background
- K-Means Clustering
- Expectation Maximization
- Hierarchical Clustering
- Spectral Clustering and Density-Based Clustering (DBSCAN)
- Principal Component Analysis (PCA)
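K-Means clustering, listed above, alternates between two steps. Each point is assigned to its nearest centroid, and each centroid is then moved to the mean of its members, until the assignments stop changing. The sketch below is written in Python for illustration only; in R this is `kmeans()`. The function name `kmeans` here is hypothetical. The initial centroids are passed in explicitly so that results are reproducible. Real implementations choose them randomly or with k-means++ and use several restarts.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids) for points given as tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid (Euclidean distance)
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            k = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[k].append(p)
            labels.append(k)
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Illustrative 2-D customer data (hypothetical), with two starting centroids.
pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centres = kmeans(pts, [(1, 1), (8, 8)])
print(labels, centres)
```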
15. Supervised Learning: Decision Trees
- Decision Trees - Introduction - Applications
- Types of Decision Tree Algorithms
- Construction of Decision Trees through Simplified Examples; Choosing the "Best" attribute at each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees
- Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness
- Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
- Decision Trees - Validation
- Overfitting - Best Practices to avoid
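Entropy, Information Gain and the Gini Index, named in the decision tree module, are the impurity measures used to choose the "best" attribute at each non-leaf node. The sketch below is in Python for illustration only, and the function names are hypothetical. Entropy uses log base 2, so a 50/50 node has entropy 1.

```python
import math
from collections import Counter


def entropy(labels):
    """-sum(p * log2(p)) over the class proportions p in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_index(labels):
    """1 - sum(p^2): probability that two random draws disagree."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


# Illustrative split of 4 records on a hypothetical attribute.
parent = ["yes", "yes", "no", "no"]
print(entropy(parent), gini_index(parent),
      information_gain(parent, [["yes", "yes"], ["no", "no"]]))
```

A split that produces pure children recovers all of the parent's entropy. That is why the attribute with the highest gain is chosen.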
16. Supervised Learning: Ensemble Learning
- Concept of Ensembling
- Manual Ensembling Vs. Automated Ensembling
- Methods of Ensembling (Stacking, Mixture of Experts)
- Bagging (Logic, Practical Applications)
- Random forest (Logic, Practical Applications)
- Boosting (Logic, Practical Applications)
- Ada Boost
- Gradient Boosting Machines (GBM)
- XGBoost
17. Supervised Learning: Artificial Neural Networks (ANN)
- Motivation for Neural Networks and Its Applications
- Perceptron and Single Layer Neural Network, and Hand Calculations
- Learning in a Multi-Layered Neural Net: Backpropagation and Conjugate Gradient Techniques
- Neural Networks for Regression
- Neural Networks for Classification
- Interpretation of Outputs and Fine-Tuning the Models with Hyperparameters
- Validating ANN models
18. Supervised Learning: Support Vector Machines
- Motivation for Support Vector Machine & Applications
- Support Vector Regression
- Support vector classifier (Linear & Non-Linear)
- Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
- Interpretation of Outputs and Fine-Tuning the Models with Hyperparameters
- Validating SVM models
19. Supervised Learning: KNN
- What is KNN & Applications?
- KNN for missing value treatment
- KNN for solving regression problems
- KNN for solving classification problems
- Validating KNN model
- Model fine-tuning with hyperparameters
- Supervised Learning: Naïve Bayes
- Concept of Conditional Probability
- Bayes Theorem and Its Applications
- Naïve Bayes for classification
- Applications of Naïve Bayes in Classifications
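For the KNN topics in module 19, the same neighbour search serves both classification (a majority vote) and regression (an average of the neighbours' targets). The sketch below is written in Python for illustration only (the course uses R), and the function names are hypothetical. It assumes numeric features on comparable scales, so features should be standardised first. An odd k avoids voting ties between two classes.

```python
import math
from collections import Counter


def _nearest(train_X, train_y, query, k):
    """Return the targets of the k training rows closest to query (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return [y for _, y in ranked[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k nearest neighbours."""
    votes = Counter(_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Average target value of the k nearest neighbours."""
    neighbours = _nearest(train_X, train_y, query, k)
    return sum(neighbours) / len(neighbours)


# Illustrative training data (hypothetical).
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
classes = ["A", "A", "A", "B", "B", "B"]
values = [1, 2, 3, 10, 11, 12]
print(knn_classify(X, classes, (1.5, 1.5)), knn_regress(X, values, (1.5, 1.5)))
```

Here k is the main hyperparameter to tune. A small k follows noise in the training data, while a large k smooths over real structure.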
20. Text Mining & Analytics
- Taming big text, Unstructured vs. Semi-structured Data; Fundamentals of information retrieval, Properties of words; Creating Term-Document (TxD) Matrices; Similarity measures; Low-level processes (Sentence Splitting, Tokenization, Part-of-Speech Tagging, Stemming, Chunking)
- Finding patterns in text: text mining, text as a graph
- Natural Language processing (NLP)
- Text Analytics – Sentiment Analysis using R
- Text Analytics – Word cloud analysis using R
- Text Analytics - Segmentation using K-Means/Hierarchical Clustering
- Text Analytics - Classification (Spam/Not spam)
- Applications of Social Media Analytics
- Metrics (Measures Actions) in social media analytics
- Examples & Actionable Insights using Social Media Analytics
- Important R packages for Machine Learning (caret, H2O, randomForest, nnet, tm, etc.)
- Fine-tuning the models using hyperparameters, grid search, piping, etc.
21. Project - Consolidate Learnings:
- Applying different algorithms to solve business problems and benchmark the results
Study online
Study in Ho Chi Minh City
Study in Hanoi