Data Science Workbench Training
Overview:
Cloudera Data Science Workbench Training prepares learners to complete exploratory data science and machine learning projects using Cloudera Data Science Workbench (CDSW).
Delivery Method and Course Duration:
OnDemand: 180 days
Objectives:
Through narrated demonstrations and hands-on exercises, learners gain familiarity with CDSW and develop the skills required to:
- Navigate CDSW’s options and interfaces with confidence
- Create projects in CDSW and collaborate securely with other users and teams
- Develop and run reproducible Python and R code
- Customize projects by installing packages and setting environment variables
- Connect to a secure (Kerberized) Cloudera cluster
- Work with large-scale data using Apache Spark 2 with PySpark and sparklyr
- Perform full exploratory data science and machine learning workflows in CDSW using Python or R: read, inspect, transform, visualize, and model data
- Work collaboratively using CDSW together with Git
What To Expect:
This course is designed for learners at organizations using CDSW under a Cloudera Enterprise license or a trial license. Learners must have access to a CDSW environment on a Cloudera cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.
Course Outline:
1. Overview of CDSW
- Introduction to CDSW
- How to Access CDSW
- Navigating around CDSW
- User Settings
- Hadoop Authentication
2. Projects in CDSW
- Creating a New Project
- Navigating around a Project
- Project Settings
3. The CDSW Workbench Interface
- Using the Workbench
- Using the Sidebar
- Using the Code Editor
- Engines and Sessions
4. Running Python and R Code in CDSW
- Running Code
- Using the Session Prompt
- Using the Terminal
- Installing Packages
- Using Markdown in Comments
5. Using Apache Spark 2 in CDSW
- Scenario and Dataset
- Copying Files to HDFS
- Interfaces to Apache Spark 2
- Connecting to Spark
- Reading Data
- Inspecting Data
6. Exploratory Data Science in CDSW
- Transforming Data
- Using SQL Queries
- Visualizing Data from Spark
- Machine Learning with MLlib
- Session History
7. Teams and Collaboration in CDSW
- Collaboration in CDSW
- Teams in CDSW
- Using Git for Collaboration
8. Conclusion