Data Science using SAS

Overview:

This SAS Data Science training covers everything from basic statistical concepts to advanced analytics and predictive modelling techniques using SAS (Statistical Analysis System).

This course is designed with specific industry segments in mind, where SAS is preferred for reporting analytics and predictive modelling, while tools like R and Python have an edge when it comes to advanced data science, machine learning and AI applications.

After completing this SAS Data Science training, candidates can also sit for the optional Global Certification (certification fee not included in the course fee).

Crafted and delivered by a team of industry experts, this comprehensive SAS data science training has all the components required to give you a head start in the field of advanced analytics!

Duration:

80 hours

Intended Audience:

Candidates from various quantitative backgrounds, such as Engineering, Finance, Maths, Statistics and Business Management, who want to learn SAS & R for advanced analytics job roles.

Course outline:

1.      Introduction to the Analytics World and ETL

  • Analytics World
    • Introduction to Analytics
    • Concept of ETL
    • SAS in advanced analytics
    • Global Certification: Induction and walk through
      • Getting Started
      • Software installation
      • Introduction to GUI
      • Different components of the language
      • All programming windows
      • Concept of Libraries and Creating Libraries
      • Variable Attributes - (Name, Type, Length, Format, Informat, Label)
      • Importing Data and Entering data manually
      • Understanding Datasets
        • Descriptor Portion of a Dataset (Proc Contents)
        • Data Portion of a Dataset
        • Variable Names and Values
        • Data Libraries

2.      Base SAS - Accessing the data

  • Understanding Data Step Processing
    • Data Step and Proc Step
    • Data step execution
    • Compilation and execution phase
    • Input buffer and concept of PDV
    • Importing Raw Data Files
      • Column Input, List Input and Formatted Input methods
      • Delimiters, Reading missing and non-standard values
      • Reading one to many and many to one records
      • Reading Hierarchical files
      • Creating raw data files and put statement
      • Formats / Informats
      • Importing and Exporting Data (Fixed Format / Delimited)
      • Proc Import / Delimited text files
      • Proc Export / Exporting Data
      • Datalines / Cards;
      • Atypical importing cases (mixing different style of inputs)
        • Reading Multiple Records per Observation
        • Reading “Mixed Record Types”
        • Sub-setting from a Raw Data File
        • Multiple Observations per Record
        • Reading Hierarchical Files
        • Importing Tips

3.      Data Understanding, Managing and Manipulation

  • Understanding and Exploring Data
    • Introduction to basic Procedures - Proc Contents, Proc Print
    • Understanding and Exploring Data
      • Operators and Operands
      • Conditional Statements (WHERE, IF, IF-THEN-ELSE, IF-THEN-DO and SELECT-WHEN)
      • Difference between WHERE and IF statements and limitations of the WHERE statement
      • Labels, Commenting
      • System Options (OBS, FIRSTOBS, NOOBS, etc.)
      • Data Manipulation
        • Proc Sort - with options / De-Duping
        • Accumulator variable and By-Group processing
        • Explicit Output Statements
        • Nesting Do loops
        • Do While and Do Until Statement
        • Array elements and Range
        • Combining Datasets (Appending and Merging)
          • Concatenation
          • Interleaving
          • Proc Append
          • One To One Merging
          • Match Merging
          • IN= option: controlling the merge with indicator variables

4.      Data Mining with Proc SQL

  • Introduction to Databases
  • Introduction to Proc SQL
  • Basics of General SQL language
  • Creating table and Inserting Values
  • Retrieve & Summarize data
  • Group, Sort & Filter
  • Using Joins (Full, Inner, Left, Right and Outer)
  • Reporting and summary analysis
  • Concept of Indexes and creating Indexes (simple and composite)
  • Connecting SAS to external Databases
  • Implicit and Explicit pass through methods

5.      Macros for Automation

  • Macro Parameters and Variables
  • Different types of Macro Creation
  • Defining and calling a macro
  • Using CALL SYMPUT and SYMGET
  • Macro options (MPRINT, SYMBOLGEN, MLOGIC, MERROR, SERROR)

6.      Fundamentals of Statistics

  • Basic Statistics - Measures of Central Tendency and Variance
  • Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
  • Inferential Statistics - Sampling - Concept of Hypothesis Testing
  • Statistical Methods - Z/t-tests (one sample, independent, paired), ANOVA, Correlations and Chi-square

7.      Data preparation

  • Need for Data preparation
  • Data Audit Report and its importance
  • Consolidation/Aggregation - Outlier treatment - Flat Liners - Missing values - Dummy creation - Variable Reduction
  • Variable Reduction Techniques - Factor Analysis & PCA

8.      Segmentation

  • Introduction to Segmentation
  • Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
  • Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
  • Behavioural Segmentation Techniques (K-Means Cluster Analysis)
  • Cluster evaluation and profiling
  • Interpretation of results - Implementation on new data

9.      Linear Regression

  • Introduction - Applications
  • Assumptions of Linear Regression
  • Building Linear Regression Model
  • Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  • Validation of Models (Re-running Vs. Scoring)
  • Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
  • Interpretation of Results - Business Validation - Implementation on new data

10.  Logistic Regression

  • Introduction - Applications
  • Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
  • Building Logistic Regression Model
  • Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, etc.)
  • Validation of Logistic Regression Models (Re-running Vs. Scoring)
  • Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers, etc.)
  • Interpretation of Results - Business Validation - Implementation on new data

11.  Time Series forecasting

  • Introduction - Applications
  • Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
  • Classification of Techniques (Pattern-based vs. Pattern-less)
  • Basic Techniques - Averages, Smoothing, etc.
  • Advanced Techniques - AR Models, ARIMA, etc.
  • Understanding Forecasting Accuracy - MAPE, MAD, MSE, etc.

12.  Introduction to Machine Learning

  • Statistical learning vs. Machine learning
  • Major Classes of Learning Algorithms - Supervised vs. Unsupervised Learning
  • Concept of Overfitting and Underfitting (Bias-Variance Trade-off) & Performance Metrics
  • Types of Cross-validation (Train & Test, Bootstrapping, K-Fold validation, etc.)

13.  Regression & Classification model building

  • Recursive Partitioning (Decision Trees)
  • Ensemble Models (Random Forest, Bagging & Boosting)
  • K-Nearest Neighbours