This week, we're starting Module 6: Batch Processing.
Reminder: the previous homework deadline is in less than 24 hours.
In this module, you'll learn how batch processing works with Spark and PySpark.
You'll cover:
• Batch processing fundamentals and Spark basics
• Installing and running Spark locally or in Colab
• Working with Spark SQL and DataFrames
• Handling schemas and processing NYC taxi data
• How Spark clusters, joins, and groupBy work internally
• Running Spark in the cloud with Dataproc and BigQuery
Homework deadline: 10 March, 12 AM CET
We also recently had a workshop with dlt on AI-assisted data ingestion. Watch the recording and check out the code if you missed it.
Practice what you learned in the homework assignment and submit it here.