Azure Data Engineering Full Stack

Overview

Azure data engineers are responsible for data-related tasks that include provisioning data storage services, ingesting streaming and batch data, implementing security requirements, transforming data, implementing data retention policies, identifying performance bottlenecks, and accessing external data sources. In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn’t have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision makers.

Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that’s built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

For example, imagine a gaming company that collects petabytes of game logs that are produced by games in the cloud. The company wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers.

Azure Data Factory:

Module 1: Azure Data Factory Introduction

What is Azure Data Factory (ADF)?
Azure Data Factory Key Components
- Pipeline
- Activity
- Linked Service
- Dataset
- Integration Runtime
- Triggers
- Data Flows
Create Azure Blob Storage Account
Create Azure Data Lake Storage Gen2 Account
Create Azure SQL Database
Creation of Azure Data Factory Resource

Module 2: Working with Copy Data Activity

Understanding Azure Data Factory UI
Data Ingestion from Blob Storage Service to Azure SQL Database
Data Ingestion from Azure Blob Storage to Data Lake Storage Gen2
Create Linked service for various data stores and compute
Creation of Datasets that point to files and tables
Design Pipelines with various activities
Create SQL Server on Virtual Machines (On-Premises)
Define Copy Activity and its features
Copy Activity - Copy Behaviour
Copy Activity - Data Integration Units
Copy Activity - User Properties
Copy Activity - Number of parallel copies

Module 3 : Azure Data Factory- General Activities

Lookup Activity
Get Metadata Activity
Stored Procedure Activity
Execute Pipeline Activity
Delete Activity
Set Variable Activity
Script Activity
Validation Activity
Web Activity
Wait Activity
Understanding of Each Activity
Filter Activity

Module 4 : Azure Data Factory – Iteration & Conditionals

Filter Activity
ForEach Activity
Switch Activity
If Condition Activity
Until Activity

Module 5 : Azure Data Factory – Types of Integration Runtimes

Azure IR (Auto Resolve Integration Runtime)
Self-hosted IR
SSIS IR

Module 6 : Azure Data Factory – Types of Triggers

Storage Event Trigger
Schedule Trigger
Tumbling Window Trigger
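
The idea behind a tumbling window trigger can be illustrated outside ADF. The plain-Python sketch below is not ADF code; it only shows how a time range is cut into fixed-size, contiguous, non-overlapping windows, each of which would correspond to one pipeline run. The function name `tumbling_windows` and the sample dates are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Return (window_start, window_end) pairs covering [start, end).

    Windows are fixed-size, contiguous, and non-overlapping; a window is
    emitted only when it fits completely before `end`.
    """
    windows = []
    window_start = start
    while window_start + size <= end:
        windows.append((window_start, window_start + size))
        window_start += size
    return windows


# Hypothetical example: hourly windows over a three-hour period.
windows = tumbling_windows(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 3, 0),
    timedelta(hours=1),
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window's end is the next window's start, so every slice of time is processed exactly once. A schedule trigger, by contrast, fires at wall-clock times and carries no window state.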

Module 7 : Introduction to DataFlows

Filter Transformation
Select Transformation
Derived Column Transformation
Aggregator Transformation
Join Transformation
Union Transformation

Module 8 : Practical Scenarios and Use Cases

Practice_Session1_Copy Data from File System to Azure SQL Database
Practice_Session2_Copy Data from Multiple Files(ADLS Gen2)To Multiple Tables(Azure SQL Database)
Practice_Session3_Copy Data from Multiple Files(ADLS Gen2)To Multiple Tables(Azure SQL Database) Using Parameters
Practice_Session4_Dynamically Copy Multiple Files(ADLS Gen2)To Multiple Tables(Azure SQL Database)
Practice_Session5_Dynamically Copy Multiple Files To Multiple Tables_Lookup_GetMetadata_For-Each_If Condition Activities
Practice_Session6_Copy Multiple CSV Files with Same Structure To Single Table
Practice_Session7_FilteringFileFormats Using Getmetadata_Filter_ForEach_Copy_Activity
Practice_Session8_Bulk Copy from Tables to Files Using Config Table
Practice_Session9_Bulk Copy from Tables to Files Using Lookup Activity_Custom SQL Query
Practice_Session10_Container Parameterization_Using Lookup and For-Each Activity
Practice_Session11_Azure Key Vault Integration with ADF Resource
Practice_Session12_Pipeline Execution_Success Audit log and Failure Audit Log
Practice_Session13_Pipeline Execution Automation_Schedule Trigger_Storage Event Trigger
Practice_Session14_Copy Data from On-Premise SQL Server to ADLS Gen2 using Self hosted IR
Practice_Session15_Email Notifications_Logic Apps
Practice_Session16_Incremental OR Delta Load Implementation
Practice_Session17_ADF_Designing DataFlows

Module 9: ADF_Assignments & Case Studies

ADF_Assignment1_Create Azure Blob Storage Account_Data Lake Storage Gen2 Account
ADF_Assignment2_Create Azure SQL Database Instance
ADF_Assignment3_Data Ingestion_Copy Data Tool(CDT)
ADF_Assignment4_Add New Columns While Copying Data
ADF_Assignment5_CopyData Activity_Executepipeline Activity_ADLS Gen2_SQLDB
ADF_Assignment6_FilterFileFormats based on File Size and Delete Files from Source Storage
ADF_Assignment7_Insert Metadata_Get Metadata_Stored Procedure Activity
ADF_Assignment8_Insert Metadata_About CSV Files in Azure Storage_Get Metadata_Stored Procedure Activity
ADF_Assignment9_CopyData Activity_Linked Service_Dataset_Pipeline Parameters_Copy Multiple Files_To_Tables
ADF_Assignment10_Copy Data Activity_Copy Behaviour
ADF_Assignment11_Snowflake_Integration
ADF_Assignment12_Snowflake_To_ADLS_Gen2_StagedCopy
ADF_Assignment13_ADF_AWS_S3_Bucket_Integration
ADF_Assignment14_GCP_To_ADLS_Gen2_Integration
ADF_Assignment15_Dataflows_Rank Transformation
ADF_Assignment16_Dataflows_Parse Transformation
ADF_Assignment17_Dataflows_Stringify Transformation
ADF_Assignment18_Dataflows_SurrogateKey_Transformation
ADF_Assignment19_Dataflows_Window Transformation
ADF_Assignment20_Dataflows_Conditional Split_Transformation
ADF_Assignment21_Dataflows_Aggregator_Sorter Transformation
ADF_Assignment22_Dataflows_Lookup Transformation
ADF_Assignment23_Dataflows_Exists Transformation
ADF_Assignment24_REST API Integration
ADF_Assignment25_Data Activity_Filter By Last Modified Date
ADF_Assignment26_Data Activity_Copy behaviour_Preserve Hierarchy_Flatten Hierarchy_Merge Files
ADF_Assignment27_Copy Data Activity_Filter By Last Modified Date_Dynamic Date Expressions
ADF_Assignment28_Copy Data from JSON File To Azure SQL Database Table
ADF_Assignment29_Execute Copy Data Activity based on File Count in the Container
ADF_Assignment30_Copy Data Activity_List of Files Configuration
ADF_Assignment31_Dataflows_Flatten Transformations
ADF_Assignment32_Dataflows_Pivot Transformations
ADF_Assignment33_Databricks Notebook_Integration with Azure Data Factory
ADF_Assignment34_Tumbling Window Trigger_Introduction
ADF_Assignment35_Implement_Tumbling Window Trigger
ADF_Assignment36_Differences Between Debug VS Trigger Now
ADF_Assignment37_Row Format Storage Internals
ADF_Assignment38_Columnar Format Storage Internals
ADF_Assignment39_Copy Data_On-premise File System To ADLS Gen2
ADF_Assignment40_Copy Data from On-premise To Azure Cloud Storages
ADF_Assignment41_Copy Data Activity_Excel File Formats
ADF_Assignment42_Copy Data Activity_Excel File Formats_Lookup Activity_Pipeline Variables
ADF_Assignment43_Copy Data Activity_XML File Formats
ADF_Assignment44_Insert the Metadata about a storage Container Dynamically using Parameterized Stored Procedure
ADF_Assignment45_Introduction To Slowly Changing Dimensions
ADF_Assignment46_Implementation of SCD Type1 Dimension
ADF_Assignment47_SCD Type2 Introduction
ADF_Assignment48_SCD Type2 Implementation

Azure Synapse Analytics:

Module 1: Processing Data Using Azure Synapse Analytics

Provisioning an Azure Synapse Analytics Workspace
Analyzing data using serverless SQL pool
Provisioning and configuring Spark pools
Processing data using Spark pools and a lake database
Querying the data in a lake database from serverless SQL pool
Scheduling notebooks to process data incrementally

Module 2: Synapse DataFlows

Copying data using a Synapse data flow
Performing data transformation using activities such as join, sort, and filter
Monitoring data flows and pipelines
Configuring partitions to optimize data flows
Parameterizing mapping data flows
Handling schema changes dynamically in data flows using schema drift

Module 3: Azure Synapse SQL Pool

Loading data into dedicated SQL pools using PolyBase and T-SQL
Loading data into dedicated SQL pools using COPY INTO
Creating distributed tables and modifying table distribution
Creating statistics and automating the update of statistics

Module 4: Monitoring Synapse SQL and Spark Pools

Configuring a Log Analytics workspace for Synapse SQL Pools
Configuring a Log Analytics workspace for Synapse Spark Pools
Using Kusto queries to monitor SQL and Spark Pools
Creating workbooks in a Log Analytics workspace to visualize monitoring data
Monitoring table distribution, data skew, and index health using Synapse DMVs
Building monitoring dashboards for Synapse with Azure Monitor

Module 5: Synapse Pipelines to Orchestrate Data

Introducing Synapse Pipelines
- Integration runtime
  - Azure IR
  - Self Hosted IR
- Activities
- Pipelines
- Triggers
  - Scheduled trigger
  - Storage event Trigger
  - Tumbling window Trigger
Creating linked services
Defining source and target datasets
Using various activities in Synapse pipelines
Scheduling Synapse pipelines

Module 6: Working with Python and Spark SQL in Azure Synapse

PySpark (Python)
Spark (Scala)
.NET Spark (C#)
Spark SQL

Module 7: Azure Synapse dedicated SQL Pool

Hash-distributed tables
Round-robin-distributed tables
Replicated tables
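
The difference between hash and round-robin distribution can be sketched in plain Python. This is only an illustration of the idea, not Synapse code: MD5 stands in for the internal hash function, `row_index % 60` stands in for round-robin placement, and the customer IDs are made-up sample data. Replicated tables are not modelled, because they keep a full copy of the table on every compute node.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool spreads each table over 60 distributions.
NUM_DISTRIBUTIONS = 60


def hash_distribution(key):
    """Map a distribution-column value to a distribution deterministically.

    MD5 is only a stand-in; the hash function Synapse uses internally differs.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS


def round_robin_distribution(row_index):
    """Assign rows to distributions in turn, ignoring column values."""
    return row_index % NUM_DISTRIBUTIONS


# Hypothetical skewed data: 900 rows for one customer, 100 rows for others.
customer_ids = ["C001"] * 900 + [f"C{n:03d}" for n in range(100, 200)]

hash_counts = Counter(hash_distribution(c) for c in customer_ids)
rr_counts = Counter(round_robin_distribution(i) for i in range(len(customer_ids)))

print("largest hash distribution:", max(hash_counts.values()))
print("largest round-robin distribution:", max(rr_counts.values()))
```

With hashing, all rows that share a key land in the same distribution, which helps joins and aggregations on that key but causes data skew when one value dominates. Round-robin spreads rows evenly regardless of their values.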

Azure Databricks:

Module 1: Introduction to Azure Databricks

Introduction to Databricks
Azure Databricks Architecture
Azure Databricks Main Concepts
Types of Data Processing Paradigms_Traditional Data Processing Approach
Traditional Data Processing vs Distributed Computing Framework
Different Distributed Computing Frameworks_Hadoop vs Apache Spark
Evolution of Azure Databricks History

Module 2: Core Databricks Concepts

Workspace
Notebooks
Library
Folder
Repos
Data
Compute
Workflows

Module 3: Types Of Clusters

All-Purpose Clusters
Job Clusters
Pools

Module 4: Databricks – Internal Storage

Databricks File System (DBFS)

Module 5: Databricks – External Storage

Azure Blob Storage
Azure Data Lake Storage Gen2
Azure SQL Database
Azure Synapse Dedicated SQL Pool
Snowflake

Module 6: Storages – Azure Credentials

Account Access Key
Shared Access Signature Token
OAuth 2.0 with Azure Service Principal

Module 7: Databricks Notebooks – Magic Commands

%python or %py
%r
%scala
%sql

Module 8: Databricks Utilities

File System Utility
Widgets Utility
Secrets Utility
Notebook Utility

Module 9: Big Data File Formats

Row-Based File Formats
- CSV, TSV, and Avro
Columnar File Formats
- Parquet, Delta, and ORC

Module 10: CSV File Format

Reading Data
Reading Data from Multiple CSV Files
Writing Data

Module 11: JSON File Format

Single Line JSON
Multi Line JSON
Complex Multi Line JSON
- Arrays
- Struct Fields

Module 12: Excel File Format

Single Sheet Reading
Multiple Sheet Reading Using List object
Dynamically Reading Multiple Sheets

Module 13: XML File Format

Simple XML Files
Complex XML Files

Module 14: Libraries

Install Cluster Libraries
- Maven Package
- PyPI Package
- CRAN Package

Module 15: Databricks – Big Data Workloads

Batch Processing
Structured Streaming (Real-Time Processing)

Module 16: Databricks – Accessing Azure Data Lake

Account Access Key
Shared Access Signature Token
Mounting Azure Data Lake (Service Principal)

Module 17: Spark Structured Streaming

ReadStream
WriteStream
Output Modes
Triggers
- Fixed Interval
- One Time
- Continuous
Managing Streams

Module 18: Azure Databricks – Types of Loads

History Load
Incremental Load
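
The two load types can be contrasted with a small watermark sketch. It is written in plain Python rather than Spark so that it runs anywhere; the function `incremental_load`, the `modified` column, and the sample rows are assumptions made for illustration. It only appends new rows, so changed rows would still need a merge step.

```python
from datetime import date


def incremental_load(source_rows, target_rows, watermark):
    """Copy rows modified after `watermark` and return the new watermark."""
    new_rows = [row for row in source_rows if row["modified"] > watermark]
    target_rows.extend(new_rows)
    if new_rows:
        watermark = max(row["modified"] for row in new_rows)
    return watermark


# Hypothetical source table with a last-modified column.
source = [
    {"id": 1, "modified": date(2024, 1, 1)},
    {"id": 2, "modified": date(2024, 1, 2)},
]
target = []

# History load: start from the earliest possible watermark, so every row is copied.
watermark = incremental_load(source, target, date.min)

# A new row arrives; the incremental load copies only that row.
source.append({"id": 3, "modified": date(2024, 1, 3)})
watermark = incremental_load(source, target, watermark)

print([row["id"] for row in target], watermark)
```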

Module 19: Notebook – Code Modularity

%run
dbutils.notebook.run()

Module 20: Introduction To Spark SQL Module

Managed Tables (Internal Tables)
- DataFrame API
- Spark SQL API
Unmanaged Tables (External Tables)
- DataFrame API
- Spark SQL API
Temporary Views (Temporary Tables)
Global Temporary Views

Module 21: Introduction To Delta Lake

Delta Lake Features
- ACID transactions
- Handling metadata
- Streaming and batch workloads
- Schema enforcement
- Time travel
- Upserts and deletes
Delta Lake Components
- _delta_log(Transaction log)
- Versioned parquet files
Delta Lake Operations
- Create Table
- Upsert to a table
- Read a table
- Update a table
- Delete from a table
- Display table history
- Time travel
- Clean up snapshots with VACUUM
- Delta Lake table history
- Restore a Delta table to an earlier state
- Vacuum unused data files

Module 22: Delta Lake – Slowly Changing Dimension

Type1 Dimension
Type2 Dimension
Type3 Dimension
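
The difference between Type 1 and Type 2 handling can be sketched in plain Python. In Databricks this would normally be done with a Delta Lake merge; the version below uses lists of dictionaries only so that it is self-contained. The column names (`key`, `city`, `start_date`, `end_date`, `is_current`) and the sample values are assumptions made for illustration. Type 3, which keeps the previous value in an extra column, is omitted.

```python
from datetime import date


def apply_scd_type1(dimension, key, new_attrs):
    """Type 1: overwrite attributes in place; no history is kept."""
    for row in dimension:
        if row["key"] == key:
            row.update(new_attrs)
            return
    dimension.append({"key": key, **new_attrs})


def apply_scd_type2(dimension, key, new_attrs, effective):
    """Type 2: close the current row and append a new version."""
    for row in dimension:
        if row["key"] == key and row["is_current"]:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return  # nothing changed, keep the current version
            row["is_current"] = False
            row["end_date"] = effective
    dimension.append(
        {"key": key, **new_attrs, "start_date": effective,
         "end_date": None, "is_current": True}
    )


# Hypothetical customer who moves from Pune to Hyderabad.
type1 = [{"key": 1, "city": "Pune"}]
apply_scd_type1(type1, 1, {"city": "Hyderabad"})

type2 = [{"key": 1, "city": "Pune", "start_date": date(2023, 1, 1),
          "end_date": None, "is_current": True}]
apply_scd_type2(type2, 1, {"city": "Hyderabad"}, date(2024, 6, 1))

print(type1)
print(type2)
```

After the change, the Type 1 table holds a single overwritten row, while the Type 2 table holds the expired Pune row and a new current Hyderabad row.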

Module 23: Databricks – Azure SQL Database

Reading Data With JDBC Driver
Writing Data With JDBC Driver

Module 24: Databricks – Synapse Dedicated SQL Pool

Reading Data From Synapse Table
Writing Data To Synapse Table

Module 25: Databricks – Snowflake

Reading Data From Snowflake Table
Writing Data To Snowflake Table

Module 26: Delta Lake – Performance Optimization Techniques

OPTIMIZE a Table
Z-ORDER by Columns

Module 27: Databricks Integration With Azure Data Factory

Call a Notebook using Notebook Activity
SetVariable Activity
Trigger ADF Pipeline

Module 28: Azure Key Vault Integration With Databricks

Create Secrets
Create Secret Scope

Azure Databricks Practice Sessions :

ADB_Session1_Types of Data Processing Paradigms_Traditional Data Processing Approach
ADB_Session2_Traditional Data Processing vs Distributed Computing Framework
ADB_Session3_Different Distributed Computing Frameworks_Hadoop vs Apache Spark
ADB_Session4_Evolution of Azure Databricks History
ADB_Session5_Introduction to Azure Databricks_Create Azure Databricks Workspace
ADB_Session6_Azure Databricks Workspace Assets
ADB_Session7_Azure Databricks_Magic Commands
ADB_Session8_Azure Databricks File System(DBFS)
ADB_Session9_DBFS_dbutils.fs Utility_%fs Magic Command
ADB_Session10_DBFS_dbutils.fs Utility_%fs Magic Command_%sh Shell Command
ADB_Session11_Azure Databricks_dbutils_Widgets Utility
ADB_Session12_Reading Data from CSV File Format
ADB_Session13_Reading Data from Simple Single Line JSON File Format
ADB_Session14_Reading Data from Simple Multi Line JSON File Format
ADB_Session15_Reading Data from Complex Multi Line JSON File_Flattening Arrays_Struct fields
ADB_Session16_Reading Data from Excel File Format
ADB_Session17_Reading Data from XML File Format
ADB_Session18_Azure Databricks_Batch Data Processing
ADB_Session19_Azure Databricks_Structured Streaming API
ADB_Session20_Azure Databricks_History Load_Incremental Load
ADB_Session21_Azure Databricks Integration with Azure Data Factory
ADB_Session22_Calling a Notebook from Another Notebook using %run
ADB_Session23_Calling a Notebook from Another Notebook using dbutils.notebook.run()
ADB_Session24_Introduction to Spark SQL Module
ADB_Session25_Create Managed Tables Using DataFrame API and Spark SQL API
ADB_Session26_Create Un-Managed Tables Using DataFrame API and Spark SQL API
ADB_Session27_Introduction to Delta Lake
ADB_Session28_Schema Validation_Schema Evolution_DeltaTableBuilder API
ADB_Session29_Accessing Azure Blob Storage Using Account Access Key_SecretScope
ADB_Session30_Accessing Azure Blob Storage Using Shared Access Signature Token_SecretScope
ADB_Session31_Create Mount Points to Azure BlobStorage_ADLS Gen2
ADB_Session32_Accessing Azure SQL Database_JDBC Driver
ADB_Session33_Reading_Writing Data to Azure Synapse Dedicated SQL Pool using JDBC ConnectionString
ADB_Session34_Implementation of Slowly Changing Dimension Type1
ADB_Session35_Implementation of Slowly Changing Dimension Type3

₹15,000.00

Course Features

Lectures 25
Quizzes 0
Duration 4 weeks
Skill level All levels
Language English
Students 50
Assessments Yes