Azure DataBricks - Data Engineering With Real Time Project

Real Time Project on Retail Data - PySpark, SQL, Delta/Delta Live Tables, Unity Catalog, Auto Loader & Performance Tuning

By completing this course, you will be equipped with the following Data Engineer roles and responsibilities in a real-time project:

• Designing and Configuring Unity Catalog for Better Access Control & Connecting to External Data Stores

• Designing and Developing Databricks (PySpark) Notebooks to Ingest Data from Web (HTTP) Services

• Designing and Developing Databricks (PySpark) Notebooks to Ingest Data from SQL Databases

• Designing and Developing Databricks (PySpark) Notebooks to Ingest Data from API Source Systems

• Designing and Developing Spark SQL External and Managed Tables

• Developing Reusable Databricks Spark SQL Notebooks to Create and Populate Delta Lake Tables

• Developing Databricks SQL Code to Populate Reporting Dimension Tables

• Developing Databricks SQL Code to Populate SCD Type 2 Reporting Dimension Tables
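
The SCD Type 2 load is typically a Databricks SQL MERGE; as a minimal pure-Python sketch of the row-versioning logic it implements (the customer dimension, its `city` attribute, and all column names here are illustrative assumptions, not the course's actual schema):

```python
from datetime import date

# Hypothetical current state of the customer dimension (assumed schema).
dim = [
    {"customer_id": "C01", "city": "Austin",
     "start_date": date(2023, 1, 1), "end_date": None, "is_current": True},
]

def apply_scd2(dim_rows, change, load_date):
    """Expire the current version of a changed key and append a new
    version -- the same effect a SQL MERGE has on a Type 2 dimension."""
    out = []
    for row in dim_rows:
        if row["customer_id"] == change["customer_id"] and row["is_current"]:
            # Close out the old version instead of updating it in place,
            # so the full history of the attribute is preserved.
            out.append({**row, "end_date": load_date, "is_current": False})
        else:
            out.append(row)
    # Insert the new version as the current row.
    out.append({"customer_id": change["customer_id"], "city": change["city"],
                "start_date": load_date, "end_date": None, "is_current": True})
    return out

dim = apply_scd2(dim, {"customer_id": "C01", "city": "Dallas"}, date(2024, 6, 1))
# dim now holds two versions: the expired Austin row and the current Dallas row.
```

In the Delta Lake version, the same effect is achieved with a single `MERGE INTO` that updates the matched current row and inserts the new one.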

• Developing Databricks SQL Code to Populate the Reporting Fact Table

• Designing and Developing Databricks (PySpark) Notebooks to Process and Flatten Semi-Structured JSON Data Using the EXPLODE Function
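
What the EXPLODE step does can be sketched in plain Python (the order document and its field names `order_id`, `items`, `sku`, `qty` are assumptions for illustration):

```python
import json

# Hypothetical semi-structured order with a nested "items" array.
order = json.loads(
    '{"order_id": 101, "customer": "C01",'
    ' "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}'
)

# EXPLODE emits one output row per array element, repeating the
# parent columns next to each element -- i.e. it flattens the array.
flat_rows = [
    {"order_id": order["order_id"], "customer": order["customer"],
     "sku": item["sku"], "qty": item["qty"]}
    for item in order["items"]
]
# flat_rows -> two flat records, one per line item
```

In PySpark the equivalent is `df.select("order_id", "customer", explode("items"))` followed by selecting the struct fields out of the exploded column.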

• Designing and Developing Databricks (PySpark) Notebooks to Integrate (JOIN) Data and Load into the Data Lake Gold Layer
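
The gold-layer integration is essentially a join; a tiny pure-Python stand-in for it (the sales/store datasets and their column names are illustrative assumptions):

```python
# Hypothetical silver-layer sales rows and a small store dimension,
# joined on store_id the way the gold-layer notebook joins datasets.
sales = [
    {"store_id": 1, "amount": 50.0},
    {"store_id": 2, "amount": 75.0},
]
stores = {1: {"store_name": "Downtown"}, 2: {"store_name": "Airport"}}

# Inner join: enrich each sale with its store attributes. In Spark this
# would be sales_df.join(stores_df, "store_id"), written to the gold layer.
gold = [
    {**row, **stores[row["store_id"]]}
    for row in sales
    if row["store_id"] in stores
]
```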

• Designing and Developing Databricks (PySpark) Notebooks to Process Semi-Structured JSON Data in the Data Lake Silver Layer

• Designing and Developing Databricks (SQL) Notebooks to Integrate Data and Load into the Data Lake Gold Layer

• Developing Databricks Jobs to Schedule the Data Ingestion and Transformation Notebooks

• Designing and Configuring Delta Live Tables in All Layers for Seamless Data Integration

• Setting Up Azure Monitor and Log Analytics for Automated Monitoring of Job Runs and Storing Extended Log Details

• Setting Up Azure Key Vault and Configuring Key Vault-Backed Secret Scopes in the Databricks Workspace

• Configuring a GitHub Repository and Creating Git Repo Folders in the Databricks Workspace

• Designing and Configuring CI/CD Pipelines to Release Code into Multiple Environments

• Identifying Performance Bottlenecks and Performing Performance Tuning Using ZORDER BY, BROADCAST JOIN, ADAPTIVE QUERY EXECUTION, DATA SALTING, and LIQUID CLUSTERING
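
Of the tuning techniques above, data salting is the least self-explanatory; here is a minimal pure-Python sketch of the idea (a deterministic round-robin salt is used for illustration, where a real Spark job would typically use `rand()`; the data and `NUM_SALTS` value are assumptions):

```python
NUM_SALTS = 4  # number of salt buckets -- a tuning knob, chosen arbitrarily here

# A skewed key: most rows share customer "HOT", so in Spark they would
# all land in one partition during a join or aggregation.
rows = [{"customer": "HOT", "amount": 1}] * 8 + [{"customer": "C01", "amount": 1}]

# Salting appends a suffix to the key so the hot key's rows spread over
# NUM_SALTS buckets; the other side of the join is duplicated once per
# suffix so every original match is still found.
salted = [
    {**r, "salted_key": f"{r['customer']}_{i % NUM_SALTS}"}
    for i, r in enumerate(rows)
]

hot_buckets = {s["salted_key"] for s in salted if s["customer"] == "HOT"}
# The 8 "HOT" rows are now spread across 4 distinct salted keys,
# so Spark can process them in parallel instead of in one partition.
```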