Data Engineering Zoomcamp
@dezoomcamp
02.03.2026 08:39
This week, we're starting Module 6: Batch Processing.

Reminder: the previous homework deadline is in less than 24 hours.

In this module, you'll learn how batch processing works with Spark and PySpark.

You'll cover:

• Batch processing fundamentals and Spark basics
• Installing and running Spark locally or in Colab
• Working with Spark SQL and DataFrames
• Handling schemas and processing NYC taxi data
• How Spark clusters, joins, and groupBy work internally
• Running Spark in the cloud with Dataproc and BigQuery

Homework deadline: 10 March, 12 AM CET

We also recently had a workshop with dlt on AI-assisted data ingestion. Watch the recording and check out the code if you missed it. Practice what you learned in the homework assignment, and submit it here.
GitHub
data-engineering-zoomcamp/06-batch at main · DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.