Cloudera Data Science Workbench Training

Overview:

Cloudera Data Science Workbench Training prepares learners to complete exploratory data science and machine learning projects using Cloudera Data Science Workbench (CDSW).

Delivery Method and Course Duration:

OnDemand: 180 days

Objectives:

Through narrated demonstrations and hands-on exercises, learners gain familiarity with CDSW and develop the skills required to:

  • Navigate CDSW’s options and interfaces with confidence
  • Create projects in CDSW and collaborate securely with other users and teams
  • Develop and run reproducible Python and R code
  • Customize projects by installing packages and setting environment variables
  • Connect to a secure (Kerberized) Cloudera cluster
  • Work with large-scale data using Apache Spark 2 with PySpark and sparklyr
  • Perform full exploratory data science and machine learning workflows in CDSW using Python or R—read, inspect, transform, visualize, and model data
  • Work collaboratively using CDSW together with Git

What To Expect:

This course is designed for learners at organizations using CDSW under a Cloudera Enterprise license or a trial license. Learners must have access to a CDSW environment on a Cloudera cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.

Course Outline:

1.      Overview of CDSW

  • Introduction to CDSW
  • How to Access CDSW
  • Navigating around CDSW
  • User Settings
  • Hadoop Authentication

2.      Projects in CDSW

  • Creating a New Project
  • Navigating around a Project
  • Project Settings

3.      The CDSW Workbench Interface

  • Using the Workbench
  • Using the Sidebar
  • Using the Code Editor
  • Engines and Sessions

4.      Running Python and R Code in CDSW

  • Running Code
  • Using the Session Prompt
  • Using the Terminal
  • Installing Packages
  • Using Markdown in Comments

5.      Using Apache Spark 2 in CDSW

  • Scenario and Dataset
  • Copying Files to HDFS
  • Interfaces to Apache Spark 2
  • Connecting to Spark
  • Reading Data
  • Inspecting Data

6.      Exploratory Data Science in CDSW

  • Transforming Data
  • Using SQL Queries
  • Visualizing Data from Spark
  • Machine Learning with MLlib
  • Session History

7.      Teams and Collaboration in CDSW

  • Collaboration in CDSW
  • Teams in CDSW
  • Using Git for Collaboration

8.      Conclusion
