Data Engineering Bootcamp - Series 1
Get Started Today and Build Your Career in Data Engineering!

Take your first step into the world of data engineering and future-proof your career with this hands-on, project-based bootcamp built on the modern data stack. Taught by a seasoned data architect with over 11 years of industry experience, the course blends theory with practice and is designed for aspiring data engineers, software engineers, analysts, and anyone eager to learn how to build real-world data pipelines.
You will learn to design scalable data lakes, build dimensional data models, implement data quality frameworks, and orchestrate pipelines with Apache Airflow, all grounded in a real-life ride-hailing application use case that simulates enterprise-scale systems.
What You’ll Learn
Section 1: Context Setup
Build your foundation with the Modern Data Stack, understand OLTP systems, and explore real-world data platform architectures.
Gain clarity on how data flows in data-driven companies
Learn using a ride-hailing app scenario
Get properly onboarded into the bootcamp journey
Section 2: Data Lake Essentials
Learn how to build and manage scalable data lakes on AWS S3.
S3 architecture, partitioning, layers, and schema evolution
IAM, encryption, storage classes, event notifications
Lifecycle management, backup & recovery
Hands-on with Boto3 S3 APIs
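To make the partitioning and Boto3 topics concrete, here is a minimal sketch of how a lake layer might lay out Hive-style partitioned keys and upload files with Boto3. The layer/table names and the `dt=` partition convention are illustrative assumptions, not the course's exact layout:

```python
from datetime import date

def partitioned_key(layer: str, table: str, run_date: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key, e.g. raw/trips/dt=2024-06-01/part-0.parquet."""
    return f"{layer}/{table}/dt={run_date.isoformat()}/{filename}"

def upload_to_lake(bucket: str, local_path: str, key: str) -> None:
    """Upload a local file into the data lake (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the key-building helper works without boto3 installed
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

# Example: where today's raw trips file would land in the lake
key = partitioned_key("raw", "trips", date(2024, 6, 1), "part-0.parquet")
print(key)  # raw/trips/dt=2024-06-01/part-0.parquet
```

Partitioning keys by date like this is what lets Athena and Spark prune whole prefixes instead of scanning the entire bucket.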
Section 3: Data Modeling
Master star schema design and implement SCD Type 1 and Type 2 dimensions.
Dimensional & Fact modeling
ETL development for analytical reporting
Build end-to-end models and data marts with hands-on labs
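The SCD Type 2 idea above can be sketched in a few lines: instead of overwriting a changed dimension attribute (Type 1), the current row is expired and a new version is inserted. This plain-Python sketch over dicts assumes a hypothetical driver dimension with `valid_from`/`valid_to`/`is_current` tracking columns:

```python
from datetime import date

def scd2_apply(dim_rows, incoming, today):
    """Apply an SCD Type 2 change for one incoming record.

    dim_rows: list of dicts with keys driver_id, city, valid_from, valid_to, is_current
    incoming: dict with keys driver_id, city
    """
    out, changed = [], False
    for row in dim_rows:
        if row["driver_id"] == incoming["driver_id"] and row["is_current"] and row["city"] != incoming["city"]:
            # Expire the old version instead of overwriting it (Type 2, not Type 1)
            out.append({**row, "valid_to": today, "is_current": False})
            changed = True
        else:
            out.append(row)
    is_new_key = not any(r["driver_id"] == incoming["driver_id"] for r in dim_rows)
    if changed or is_new_key:
        # Insert the new current version of the row
        out.append({"driver_id": incoming["driver_id"], "city": incoming["city"],
                    "valid_from": today, "valid_to": None, "is_current": True})
    return out
```

The same close-and-insert logic is what a MERGE statement or Spark job expresses at scale; history is preserved because old versions keep their validity window.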
Section 4: Data Quality
Ensure trust and integrity in your data pipelines.
Understand accuracy, completeness, and consistency
Implement DQ checks using industry best practices
Use data contracts for accountability
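As a flavor of what such checks look like, here is a minimal sketch of completeness and accuracy validations over a batch of records. The column names and fare range are hypothetical stand-ins for whatever a real data contract would specify:

```python
def dq_report(rows, required_cols, valid_fare_range=(0.0, 1000.0)):
    """Run simple completeness and accuracy checks over a batch of trip records.

    Returns a dict mapping check name -> list of failing row indexes.
    """
    failures = {"completeness": [], "accuracy": []}
    for i, row in enumerate(rows):
        # Completeness: every required column must be present and non-null
        if any(row.get(col) is None for col in required_cols):
            failures["completeness"].append(i)
            continue
        # Accuracy: fare must fall inside the range the data contract allows
        lo, hi = valid_fare_range
        if not (lo <= row["fare"] <= hi):
            failures["accuracy"].append(i)
    return failures
```

A pipeline can then fail fast (or quarantine rows) whenever the report is non-empty, which is the accountability a data contract is meant to enforce.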
Section 5: AWS Athena
Query massive datasets with serverless power using AWS Athena.
Learn DDL, Glue Catalog, and workgroup management
Automate queries using Boto3 APIs
Compare Athena vs Presto vs Trino
Optimize queries with best practices
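To illustrate the Boto3 automation bullet, the sketch below composes an Athena CTAS statement (writing Parquet to S3) and submits it via the Athena API. The database, table, and S3 locations are made-up examples:

```python
def build_ctas(table: str, database: str, source_sql: str, output_location: str) -> str:
    """Compose a CREATE TABLE AS SELECT statement that writes Parquet to S3."""
    return (
        f"CREATE TABLE {database}.{table} "
        f"WITH (format = 'PARQUET', external_location = '{output_location}') "
        f"AS {source_sql}"
    )

def run_athena_query(sql: str, database: str, output_location: str) -> str:
    """Submit a query to Athena and return its execution id (requires boto3 and AWS credentials)."""
    import boto3  # only needed when actually calling AWS
    client = boto3.client("athena")
    resp = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return resp["QueryExecutionId"]

sql = build_ctas("daily_trips", "analytics", "SELECT * FROM raw_trips", "s3://my-lake/marts/daily_trips/")
```

In practice you would poll `get_query_execution` until the state is SUCCEEDED before reading results; CTAS into Parquet is also one of the standard Athena cost optimizations, since columnar output cuts the bytes scanned by downstream queries.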
Section 6: Apache Spark
Build production-grade data pipelines with PySpark on AWS EMR.
Learn Spark architecture and PySpark APIs
Build data pipelines using the WAP (Write-Audit-Publish) pattern
Run scalable jobs on AWS EMR
Apply UDFs and data quality within transformation logic
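The Write-Audit-Publish pattern named above can be shown with in-memory "tables": data is written to staging, audited with quality checks, and only then swapped into production. This is a conceptual plain-Python sketch, not the course's Spark implementation:

```python
def write_audit_publish(transform, source_rows, audit_checks, published):
    """Minimal Write-Audit-Publish sketch over in-memory 'tables'.

    transform:    function producing the transformed rows
    audit_checks: predicates that must all pass on the staged rows
    published:    dict acting as the production table store
    """
    # Write: land the transformed data in a staging area, not in production
    staged = transform(source_rows)
    # Audit: run data quality checks against the staged copy
    if not all(check(staged) for check in audit_checks):
        raise ValueError("audit failed: staged data not published")
    # Publish: swap the audited data into the production table in one step
    published["trips"] = staged
    return staged
```

Because the audit runs before the publish step, a failed quality check leaves the production table untouched, which is the whole point of WAP: consumers never see unvalidated data.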
Section 7: Apache Airflow
Orchestrate workflows using Airflow and build custom plugins.
Design DAGs, schedule pipelines, manage dependencies
Automate Spark jobs using custom AWS EMR plugin
Hands-on labs for ingestion and transformation DAGs
Build reliable, reusable orchestration solutions
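To give a feel for how a DAG resolves its run order from task dependencies, here is a conceptual plain-Python sketch (not Airflow's actual API) that topologically orders tasks and rejects cycles, the same invariant Airflow enforces on every DAG:

```python
def run_order(deps):
    """Resolve an execution order from task dependencies, as a DAG run would.

    deps maps each task name to the set of tasks it depends on (its upstreams).
    """
    order, done = [], set()

    def visit(task, path):
        if task in done:
            return
        if task in path:
            raise ValueError(f"cycle involving {task}: not a valid DAG")
        path.add(task)
        for upstream in deps.get(task, ()):  # run every upstream first
            visit(upstream, path)
        done.add(task)
        order.append(task)

    for task in deps:
        visit(task, set())
    return order
```

In Airflow itself the same dependencies are declared with operators and `>>` (e.g. `ingest >> transform >> publish`), and the scheduler derives an ordering like this one before dispatching tasks.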
What You’ll Build
A production-style data platform for a ride-hailing company, including:
Data lake on AWS S3
Dimensional data model with SCD logic
Spark-based transformation pipelines
Automated orchestration with Airflow
Query layer with Athena
Built-in data quality validations