Scalable data preparation & ML using Apache Spark on AWS (Level 200)

Analyzing, transforming, and preparing large amounts of data is a foundational step of any data science and machine learning (ML) workflow. This session shows how to build end-to-end data preparation and ML workflows. We explain how to connect from Amazon SageMaker Studio to Apache Spark in your data processing environments on Amazon EMR and AWS Glue interactive sessions for fast data preparation. Learn how to access data governed by AWS Lake Formation so that you can interactively query, explore, and visualize data, and run and debug Spark jobs, as you prepare large-scale data for use in ML.
Speaker: Suman Debnath, Principal Developer Advocate, Data Engineering, AWS
Duration: 30 minutes