Azure Databricks with PySpark
Overview
About The Course
This Azure Databricks (ADB) training in Hyderabad covers Azure Databricks, an easy, fast, and collaborative Apache Spark-based analytics platform. It accelerates innovation by bringing data science, data engineering, and business together, making the process of data analytics more productive, more secure, more scalable, and optimized for Azure.
The Databricks cloud service is built by the team that started the Spark research project at UC Berkeley, which later became Apache Spark, and it is the leading Spark-based analytics platform. The service, named Microsoft Azure Databricks, provides data science and data engineering teams with a fast, easy, and collaborative Spark-based platform on Azure. It gives Azure users a single platform for Big Data processing and Machine Learning.
Azure Databricks is a “first-party” Microsoft service, the result of a unique year-long collaboration between the Microsoft and Databricks teams to provide Databricks’ Apache Spark-based analytics service as an integral part of the Microsoft Azure platform.
Azure Databricks leverages Azure’s security and seamlessly integrates with Azure services such as Azure Active Directory, SQL Data Warehouse, and Power BI.
- Databricks + Apache Spark + enterprise cloud = Azure Databricks
- It is a fully managed version of the open-source Apache Spark analytics engine, and it features optimized connectors to storage platforms for the quickest possible data access.
- It offers a notebook-oriented Apache Spark-as-a-service workspace environment, which makes it easy to explore data interactively and manage clusters.
- It is a secure, cloud-based machine learning and big data platform.
- It supports multiple languages such as Scala, Python, R, Java, and SQL.
Azure Databricks Course Curriculum
Module 1: Big Data Analytics
- What is Big Data Analytics
- Data Analytics Platform
- Storage
- Compute
- Data Processing Paradigms
- Monolithic Computing
- Distributed Computing
- Distributed Computing Frameworks
- Hadoop MapReduce
- Apache Spark
- Distributed Storage
- Big Data Analytics: Data Lakes
- Tightly Coupled Data Lake
- Loosely Coupled Data Lake
Module 3: Core Databricks Concepts
- Workspace
- Notebooks
- Library
- Folder
- Repos
- Data
- Compute
- Workflows
Module 5: Databricks – Internal Storage
- Databricks File System (DBFS)
Module 7: Storages – Azure Credentials
- Account Access Key
- Shared Access Signature Token
- OAuth2.0 Azure Service Principal
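As a rough illustration of how the three credential types in this module differ, the sketch below builds the Spark configuration entries commonly used with the ABFS driver for Azure Data Lake Storage Gen2. The helper names, the storage account name, and all sample values are hypothetical; in a notebook each key/value pair would typically be passed to `spark.conf.set(key, value)`, with the secret values read from a secret scope rather than written as literals.

```python
# Sketch only: helper names and sample values are hypothetical.
# The configuration keys follow the ABFS driver naming for ADLS Gen2.

def _host(account: str) -> str:
    """Fully qualified ADLS Gen2 endpoint for a storage account."""
    return f"{account}.dfs.core.windows.net"


def access_key_conf(account: str, key: str) -> dict:
    """Account access key: a single setting grants full account access."""
    return {f"fs.azure.account.key.{_host(account)}": key}


def sas_token_conf(account: str, token: str) -> dict:
    """Shared access signature: scoped, time-limited access."""
    h = _host(account)
    return {
        f"fs.azure.account.auth.type.{h}": "SAS",
        f"fs.azure.sas.token.provider.type.{h}":
            "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
        f"fs.azure.sas.fixed.token.{h}": token,
    }


def service_principal_conf(account: str, client_id: str,
                           client_secret: str, tenant_id: str) -> dict:
    """OAuth 2.0 with an Azure service principal (client credentials)."""
    h = _host(account)
    return {
        f"fs.azure.account.auth.type.{h}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{h}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{h}": client_id,
        f"fs.azure.account.oauth2.client.secret.{h}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{h}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Hypothetical usage: inspect the settings for a made-up account.
conf = service_principal_conf("mystorageacct", "app-id", "app-secret", "tenant-id")
for name, value in conf.items():
    print(name, "=", value if "secret" not in name else "***")
```

Keeping the settings in a dictionary makes it easy to compare the three approaches side by side before applying any of them to a cluster or session.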
Module 9: Databricks Utilities
- File System Utility
- Widgets Utility
- Secrets Utility
- Notebook Utility
Module 11: CSV File Format
- Reading Data
- Reading Data from Multiple CSV Files
- Writing Data
Module 13: Excel File Format
- Single Sheet Reading
- Multiple Sheet Reading Using List object
- Dynamically Reading Multiple Sheets
Module 15: Libraries
- Install Cluster Libraries
- Maven Package
- PyPI Package
- CRAN Package
Module 17: Databricks – Accessing Azure Data Lake
- Account Access Key
- Shared Access Signature Token
- Mounting Azure Data Lake (Service Principal)
Module 19: Notebook – Code Modularity
- %run
- dbutils.notebook.run()
Module 21: Introduction To Delta Lake
- Delta Lake Features
- ACID transactions
- Handling metadata
- Streaming and batch workloads
- Schema enforcement
- Time travel
- Upserts and deletes
- Delta Lake Components
- _delta_log (Transaction log)
- Versioned parquet files
- Delta Lake Operations
- Create Table
- Upsert to a table
- Read a table
- Update a table
- Delete from a table
- Display table history
- Time travel
- Clean up snapshots with VACUUM
- Delta Lake table history
- Restore a Delta table to an earlier state
- Vacuum unused data files
Module 27: Databricks Integration With Azure Data Factory
- Call a Notebook using Notebook Activity
- SetVariable Activity
- Trigger ADF Pipeline
Module 2: Introduction to Azure Databricks
- Introduction to Databricks
- Azure Databricks Architecture
- Azure Databricks Main Concepts
Module 4: Types Of Clusters
- All-Purpose Clusters
- Job Clusters
- Pools
Module 6: Databricks – External Storage
- Azure Blob Storage
- Azure Data Lake Storage Gen2
- Azure SQL Database
- Azure Synapse Dedicated SQL Pool
- Snowflake
Module 8: Databricks Notebooks – Magic Commands
- %python or %py
- %r
- %scala
- %sql
Module 10: Big Data File Formats
- Row-Based File Formats
- CSV, TSV, and Avro
- Columnar File Formats
- Parquet, Delta, and ORC
Module 12: JSON File Format
- Single Line JSON
- Multi Line JSON
- Complex Multi Line JSON
- Arrays
- Struct Fields
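To make the JSON topics above more concrete, here is a minimal plain-Python sketch (standard library only, not the Spark reader itself) of the difference between single-line and multi-line JSON, and of flattening struct fields and arrays into rows. The function names and the sample records are hypothetical and exist only for illustration.

```python
import json


def read_single_line(text: str) -> list:
    """Single-line JSON: every line holds one complete JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_multi_line(text: str) -> list:
    """Multi-line JSON: the whole document is one value (object or array)."""
    doc = json.loads(text)
    return doc if isinstance(doc, list) else [doc]


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested objects (struct fields) into dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def explode(record: dict, array_field: str) -> list:
    """One output row per array element; records without elements yield no rows."""
    return [{**record, array_field: item} for item in record.get(array_field, [])]


# Hypothetical sample data.
single = '{"id": 1}\n{"id": 2}\n'
multi = """
[
  {
    "id": 1,
    "customer": {"name": "Asha", "city": "Hyderabad"},
    "items": ["pen", "book"]
  }
]
"""

orders = read_multi_line(multi)
rows = [flatten(r) for order in orders for r in explode(order, "items")]
for row in rows:
    print(row)
```

The same two steps, expanding arrays into rows and promoting nested fields to columns, are what a complex multi-line JSON file usually needs before it fits a tabular DataFrame.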
Module 14: XML File Format
- Simple XML Files
- Complex XML Files
Module 16: Spark Structured Streaming
- ReadStream
- WriteStream
- Output Modes
- Triggers
- Fixed Interval
- One Time
- Continuous
- Managing Streams
Module 18: Azure Databricks – Types of Loads
- History Load
- Incremental Load
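The two load types can be contrasted with a small plain-Python sketch. It assumes, purely for illustration, that every source row carries a `modified` timestamp and that the pipeline stores a watermark (the latest timestamp already loaded). The function names and sample rows are hypothetical.

```python
from datetime import datetime


def history_load(source: list) -> list:
    """History (full) load: copy every row from the source."""
    return list(source)


def incremental_load(source: list, watermark: datetime) -> tuple:
    """Incremental load: pick up only rows modified after the watermark.

    Returns the new rows and the watermark to store for the next run.
    """
    new_rows = [row for row in source if row["modified"] > watermark]
    next_watermark = max((row["modified"] for row in new_rows), default=watermark)
    return new_rows, next_watermark


# Hypothetical source table.
source = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 1, 5)},
    {"id": 3, "modified": datetime(2024, 1, 9)},
]

# First run: load everything and remember the latest timestamp seen.
target = history_load(source)
watermark = max(row["modified"] for row in target)

# A new row arrives; the next run picks up only that row.
source.append({"id": 4, "modified": datetime(2024, 1, 12)})
delta, watermark = incremental_load(source, watermark)
target.extend(delta)
print([row["id"] for row in delta], watermark)
```

When no rows are newer than the watermark, the function returns an empty list and leaves the watermark unchanged, so repeated runs are safe.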
Module 20: Introduction To Spark SQL Module
- Managed Tables (Internal Tables)
- DataFrame API
- Spark SQL API
- Un-Managed Tables (External Tables)
- DataFrame API
- Spark SQL API
- Temporary Views (Temporary Tables)
- Global Temporary Views
Module 22: Delta Lake – Slowly Changing Dimension
- Type1 Dimension
- Type2 Dimension
- Type3 Dimension
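The three dimension types differ only in how much history they keep. The sketch below shows that difference on plain Python lists of dictionaries rather than Delta tables. The column names (`id`, `city`, `start_date`, `end_date`, `is_current`) and the function names are assumptions made for the example; on Delta Lake the same logic is normally expressed with a MERGE statement.

```python
from datetime import date


def scd_type1(dim: list, key: int, updates: dict) -> list:
    """Type 1: overwrite the attribute in place; no history is kept."""
    return [{**row, **updates} if row["id"] == key else row for row in dim]


def scd_type2(dim: list, key: int, updates: dict, effective: date) -> list:
    """Type 2: expire the current row and append a new current version."""
    out = []
    for row in dim:
        if row["id"] == key and row["is_current"]:
            # Close the old version.
            out.append({**row, "end_date": effective, "is_current": False})
            # Open the new version with the changed attributes.
            out.append({**row, **updates, "start_date": effective,
                        "end_date": None, "is_current": True})
        else:
            out.append(row)
    return out


def scd_type3(dim: list, key: int, column: str, new_value) -> list:
    """Type 3: keep only the previous value, in an extra column."""
    return [
        {**row, f"previous_{column}": row[column], column: new_value}
        if row["id"] == key else row
        for row in dim
    ]


# Hypothetical dimension rows.
customers = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]
history = [{"id": 1, "city": "Pune", "start_date": date(2023, 1, 1),
            "end_date": None, "is_current": True}]

t1 = scd_type1(customers, 1, {"city": "Hyderabad"})
t2 = scd_type2(history, 1, {"city": "Hyderabad"}, date(2024, 1, 1))
t3 = scd_type3(customers, 1, "city", "Hyderabad")
for label, result in (("Type 1", t1), ("Type 2", t2), ("Type 3", t3)):
    print(label, result)
```

Type 1 leaves the row count unchanged, Type 2 adds a row per change, and Type 3 adds a column, which is usually the deciding factor when choosing between them.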
Module 23: Databricks – Azure SQL Database
- Reading Data With JDBC Driver
- Writing Data With JDBC Driver
Module 24: Databricks – Synapse Dedicated SQL Pool
- Reading Data From Synapse Table
- Writing Data To Synapse Table
Module 25: Databricks – Snowflake
- Reading Data From Snowflake Table
- Writing Data To Snowflake Table
Module 26: Delta Lake – Performance Optimization Techniques
- OPTIMIZE a Table
- Z-ORDER by Columns
Module 28: Azure Key Vault Integration With Databricks
- Create Secrets
- Create Secret Scope
Azure Databricks Regular Class Practice Sessions
- Session1_Introduction to Big Data Analytics Platform
- Session2_Big Data Analytics_Data Processing Paradigms (Compute)
- Session3_Distributed Computing Frameworks_Apache Hadoop vs Apache Spark
- Session4_Big Data Analytics_Distributed Storage_Key Takeaways
- Session5_Big Data Analytics_Tightly and Loosely Coupled Data Lakes
- Session6_Distributed Computing Cluster_Scalability
- Session7_Introduction to Azure Databricks
- Session8_Create Azure Databricks Workspace
- Session9_Azure Databricks_Types of Clusters_Configurations
- Session10_Creation of All-Purpose Cluster_Databricks Pools
- Session11_Introduction to Databricks File System(DBFS)
- Session12_Databricks File System(DBFS)_dbutils.fs Utility_%fs Magic Command
- Session13_Databricks_Spark Data APIs_RDD_DataFrame_Dataset
- Session14_Databricks_Different Ways of Creating DataFrame
- Session15_Reading Data from Single CSV File_DataFrame API
- Session16_Reading Data from Single CSV File_User Defined Schema
- Session17_Reading Data from CSV Files_Data Parsing Modes
- Session18_Reading Data from Single Line JSON File Format
- Session19_Reading Data from Multi Line JSON with Explicit Schema
- Session20_Reading Data from Multi Line Complex JSON File Format
- Session21_Reading Data from Multiple Excel Sheets Dynamically
- Session22_Databricks_Reading Data from XML File Format
- Session23_Databricks_Batch Data Processing
- Session24_Databricks_Batch Data Processing_Transformations
- Session25_Databricks_Batch Data Processing_Narrow_Wide Transformations
- Session26_Databricks_Data Merging_Joining Two DataFrames_Types of Joins
- Session27_Databricks_Data Merging_Union_UnionAll_UnionByName
- Session28_Databricks_Batch Data Processing_DataFrame Writer API_Save Modes
- Session29_Databricks_Spark Structured Streaming API_Real-Time Processing
- Session30_Databricks_Calling a Notebook from another Notebook using %run magic Command
- Session31_Databricks_Calling a Notebook from another Notebook using run() Method
- Session32_Databricks_Introduction to Spark SQL Module
- Session33_Databricks_Spark SQL_Create Global Managed Tables_DataFrame API_SQL API
- Session34_Databricks_Spark SQL_Create Global Un-Managed Tables_DataFrame API_SQL API
- Session35_Spark SQL_Types of Views_Local_Global Temporary Views
- Session36_Introduction to Delta Lake
- Session37_Create Delta Lake Tables_Explore Components of Delta Lake
- Session38_Databricks_Delta Lake_Time Travel Using Version and TimeStamp
- Session39_Databricks_Delta Lake_Schema Validation_Enforcement
- Session40_Databricks_Delta Lake_Schema Evolution Using mergeSchema Option
- Session41_Databricks_Delta Lake_Updates_Deletes in Data Lake with Delta Lake
- Session42_Databricks_Delta Lake_OPTIMIZE_ZORDER
- Session43_Databricks_Delta Table_Vacuum Command
- Session44_Databricks_Designing Workflow to Orchestrate Multiple Tasks
- Session45_Databricks_Implementation of History Load_Incremental Load
- Session46_Calling a Databricks Notebook from ADF Pipeline
- Session47_Databricks_Reading_Writing Data To Azure Blob Storage_Account Access Key
- Session48_Databricks_Reading_Writing Data To Azure Data Lake Gen2_Azure Service Principal
- Session49_Databricks_Create Mount Point to Azure Blob Storage_Data Lake Storage Gen2
- Session50_Databricks_Read_Write_Azure SQL Database
- Session51_Create Snowflake Free Trial Account
- Session52_Read and write data from Snowflake
- Session53_Read and write data from Synapse Dedicated SQL Pool
- Session54_Databricks_Introduction to Slowly Changing Dimension(SCD)
- Session55_Databricks_Implementation of SCD Type 0 Dimension
- Session56_Databricks_Implementation of SCD Type 1 Dimension
- Session57_Databricks_Introduction to SCD Type 2 Dimension
- Session58_Databricks_Implementation of SCD Type 2 Dimension
- Session59_Databricks_Implementation of SCD Type 3 Dimension
- Session60_Data Engineering_Medallion Project Architecture
Azure Databricks_Assignments & Case Studies
- ADB_Assignment1_Azure Databricks_Types of Clusters
- ADB_Assignment2_Azure Databricks_Cluster_Pools
- ADB_Assignment3_Azure Databricks_Compute_On-Demand vs Azure Spot VM Instances
- ADB_Assignment4_Azure Databricks_Bigdata File formats
- ADB_Assignment5_Reading Data from Multiple CSV Files With the Same Structure
- ADB_Assignment1_Reading TSV Files_User Defined Schema
- ADB_Assignment6_Apache Spark_Transformations_Actions
- ADB_Assignment7_Create DataFrame Using Python Collection Objects_List_Tuple_Dictionary
- ADB_Assignment8_Create DataFrame_Define Schema Programmatically Using StructType() & StructField()
- ADB_Assignment9_Reading Single_Double_PIPE Delimited Files
- ADB_Assignment10_Reading_Multiple_Different_Delimiter CSV Files
- ADB_Assignment11_Spark Low Level APIs vs Structured APIs
- ADB_Assignment12_Creation of Structured API_DataFrame
- ADB_Assignment13_Creation of DataFrame_Schemas
- ADB_Assignment14_Python Functions
- ADB_Assignment15_Python Dictionaries_Functions_Widgets
- ADB_Assignment16_Flatten Multi Line Complex JSON Files_Python User Defined Function
- ADB_Assignment17_Flatten Arrays_Maps_explode()_explode_outer() Functions
- ADB_Assignment18_Batch ETL Processing_Replace Nulls with Literals
- ADB_Assignment19_Batch ETL Processing_GroupBy_Aggregation Processing
- ADB_Assignment20_Batch ETL Processing_PySpark_Join Types
- ADB_Assignment21_Batch ETL Processing_PySpark_Union_UnionAll
- ADB_Assignment22_Batch ETL Processing_PySpark_Distinct_DropDuplicates Methods
- ADB_Assignment23_Batch ETL Processing_GroupBy_Aggregation Processing
- ADB_Assignment24_Create Workflow to Orchestrate Multiple Tasks
- ADB_Assignment25_Implement Slowly Changing Dimension Type1 and Type3
- ADB_Assignment26_Batch Processing_Data Processing Techniques_Python List Comprehension
- ADB_Assignment27_Batch Processing_Sorting on Single Column_sort() method
- ADB_Assignment28_Batch Processing_Sorting on Multiple Columns
- ADB_Assignment29_Batch Processing_PySpark_Date Functions
- ADB_Assignment30_Batch Processing_PySpark_Date Functions
- ADB_Assignment31_Batch Processing_PySpark_Identify or Check Duplicates in DataFrame
- ADB_Assignment32_Batch Processing_PySpark_Dropping Rows that Contain Null Values using dropna() & na.drop() Methods
- ADB_Assignment33_Batch Processing_PySpark_Replacing Nulls with another Value Using fillna() Method_na.fill() Method
- ADB_Assignment34_Batch Processing_Reading and Writing Data to Snowflake Cloud Data Platform
- ADB_Assignment35_Delta Lake_Schema Validation_Enforcement
- ADB_Assignment36_Delta Lake_Schema Evolution
- ADB_Assignment37_Update_Delete Operations in data lake with Delta Lake
- ADB_Assignment38_Auditing Data Changes with Operation History
Curriculum
- 1 Section
- 2 Lessons
- 4 Weeks






