Accelerating Data Engineering Pipelines

3 Lessons

8h

Duration

English

Language

Prerequisites:

Intermediate knowledge of Python (list comprehension, objects)
Familiarity with pandas is a plus
Introductory statistics (mean, median, mode)

Technologies: pandas, cuDF, Dask, NVTabular, Plotly

Assessment Type: Skills-based coding assessments evaluate your ability to efficiently filter through millions of data points in the context of an interactive dashboard.

Certificate: Upon successful completion of the assessment, you’ll receive an NVIDIA DLI certificate to recognize your subject matter competency and support your professional career growth.

Hardware Requirements: You’ll need a desktop or laptop computer capable of running the latest version of Chrome or Firefox. You’ll be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

Data engineering is the foundation of data science and lays the groundwork for analysis and modeling. In order for organizations to extract knowledge and insights from structured and unstructured data, fast access to accurate and complete datasets is critical. Working with massive amounts of data from disparate sources requires complex infrastructure and expertise. Minor inefficiencies can result in major costs, both in terms of time and money, when scaled across millions to trillions of data points.

In this workshop, we’ll explore how GPUs can improve data pipelines and how using advanced data engineering tools and techniques can result in significant performance acceleration. Faster pipelines produce fresher dashboards and machine learning (ML) models, so users can have the most current information at their fingertips.

Learning Objectives

By participating in this workshop, you’ll learn:

How data moves within a computer. How to build the right balance between CPU, DRAM, Disk Memory, and GPUs.
How different file formats can be read and manipulated by hardware.
How to scale an ETL pipeline with multiple GPUs using NVTabular.
How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.

Learning Path

Introduction

Meet the instructor.
Create an account at courses.nvidia.com/join

15 mins

Lesson 1 - Data on the Hardware Level

Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:

- pandas
- cuDF
- Dask

Break

15 mins

Lesson 2 - ETL with NVTabular

Learn how to scale an ETL pipeline from 1 GPU to many with NVTabular through the perspective of a big data recommender system.

- Transform raw JSON into analysis-ready Parquet files
- Learn how to quickly add features to a dataset with operators such as Categorify and Lambda
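As a rough illustration of the two steps above, the following pure-Python sketch parses raw JSON lines into columns and then applies a Categorify-style encoding and a Lambda-style transform. It is not NVTabular code and does not use its API: the sample records, the helper names `categorify` and `lambda_op`, and the choice to start ids at 1 are all assumptions made for this example.

```python
import json

# Hypothetical raw records, one JSON object per line (not workshop data).
RAW_LINES = [
    '{"user": "u1", "category": "books", "price": 12.5}',
    '{"user": "u2", "category": "games", "price": 40.0}',
    '{"user": "u1", "category": "books", "price": 8.0}',
]


def categorify(values):
    """Map each distinct value to an integer id in first-seen order.

    Ids start at 1; leaving 0 free for unseen values is an assumption
    of this sketch.
    """
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
        encoded.append(mapping[value])
    return encoded, mapping


def lambda_op(values, fn):
    """Apply an arbitrary element-wise function to a column."""
    return [fn(value) for value in values]


# Step 1: parse the JSON lines and pivot the rows into columns, the
# layout that columnar formats such as Parquet store on disk.
rows = [json.loads(line) for line in RAW_LINES]
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Step 2: derive new features from the existing columns.
category_ids, category_map = categorify(columns["category"])
price_cents = lambda_op(columns["price"], lambda p: int(round(p * 100)))
```

Here `category_ids` holds the integer-encoded category column and `category_map` records which id was assigned to which string.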

Break

60 mins

Lesson 3 - Data Visualization

Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.

- Learn how to use descriptive statistics and plots like histograms in order to assess data quality
- Learn effective memory usage, so users can quickly filter data through a graphical interface
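A minimal sketch of the data quality check described above, using only the Python standard library. The precipitation readings, the `-999.0` missing-value sentinel, and the helper names `summarize` and `histogram` are hypothetical and chosen only for illustration.

```python
import statistics
from collections import Counter

# Hypothetical daily precipitation readings in millimetres.
# -999.0 is an assumed sentinel for a missing reading.
readings = [0.0, 1.2, 0.0, 3.4, -999.0, 0.8, 12.6, 0.0]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter((value // bin_width) * bin_width for value in values)


# An impossible minimum in the raw summary exposes the sentinel value.
raw_summary = summarize(readings)

# Drop physically impossible readings, then summarize and bin again.
valid = [value for value in readings if value >= 0]
clean_summary = summarize(valid)
bins = histogram(valid, 5.0)
```

Comparing `raw_summary` with `clean_summary` shows how a single bad reading distorts the minimum and mean, which is the kind of problem a histogram makes visible at a glance.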

Final Project: Data Detective

Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.

60 mins

Final Review

Review key learnings and answer questions.
Complete the assessment and earn your certificate.
Complete the workshop survey.
Learn how to set up your own AI application development environment.

More Courses

Course 2

Fundamentals of Deep Learning

In this workshop, you’ll learn how deep learning works through hands-on exercises in computer vision and natural language processing. You’ll train deep learning models from scratch, learning tools and tricks to achieve highly accurate results.

3 lessons - 8 hours

Course 3

Fundamentals of Accelerated Computing with CUDA Python

This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA® GPUs and the Numba compiler.

3 lessons - 8 hours
