Data Engineering Zoomcamp (@dezoomcamp)

This week, we're starting Module 6: Batch Processing.

Reminder: the previous homework deadline is in less than 24 hours.

In this module, you'll learn how batch processing works with Spark and PySpark.

You'll cover:

• Batch processing fundamentals and Spark basics
• Installing and running Spark locally or in Colab
• Working with Spark SQL and DataFrames
• Handling schemas and processing NYC taxi data
• How Spark clusters, joins, and groupBy work internally
• Running Spark in the cloud with Dataproc and BigQuery

Homework deadline: 10 March, 12 AM CET

We also recently had a workshop with dlt on AI-assisted data ingestion. Watch the recording and check out the code if you missed it. Practice what you learned in the homework assignment, and submit it here.
