Overview

About The Course

This Azure Databricks (ADB) training in Hyderabad covers Azure Databricks, an easy, fast, and collaborative Apache Spark-based analytics platform. The platform accelerates innovation by bringing data science, data engineering, and business together, making data analytics more productive, more secure, more scalable, and optimized for Azure.

The Databricks cloud service is built by the team that started the Spark research project at UC Berkeley, which later became Apache Spark, and it is a leading Spark-based analytics platform. The service, named Microsoft Azure Databricks, provides data science and data engineering teams with a fast, easy, and collaborative Spark-based platform on Azure. It gives Azure users a single platform for big data processing and machine learning.

Azure Databricks is a “first-party” Microsoft service, the result of a unique year-long collaboration between the Microsoft and Databricks teams to provide Databricks’ Apache Spark-based analytics service as an integral part of the Microsoft Azure platform.

Azure Databricks leverages Azure’s security and integrates seamlessly with Azure services such as Azure Active Directory, SQL Data Warehouse, and Power BI.

Databricks + Apache Spark + enterprise cloud = Azure Databricks
It is a fully managed version of the open-source Apache Spark analytics engine, with optimized connectors to storage platforms for the quickest possible data access.
It offers a notebook-oriented Apache Spark as-a-service workspace that makes it easy to explore data interactively and manage clusters.
It is a secure, cloud-based machine learning and big data platform.
It supports multiple languages, including Scala, Python, R, Java, and SQL.

Azure Databricks Course Curriculum

Module 1: Big Data Analytics

What is Big Data Analytics
Data Analytics Platform

Storage
Compute
Data Processing Paradigms

Monolithic Computing
Distributed Computing
Distributed Computing Frameworks

Hadoop MapReduce
Apache Spark
Distributed Storage
Big Data Analytics: Data Lakes

Tightly Coupled Data Lake
Loosely Coupled Data Lake

Module 3: Core Databricks Concepts

Workspace
Notebooks
Library
Folder
Repos
Data
Compute
Workflows

Module 5: Databricks – Internal Storage

Databricks File System (DBFS)

Module 7: Storages – Azure Credentials

Account Access Key
Shared Access Signature Token
OAuth 2.0 Azure Service Principal

Module 9: Databricks Utilities

File System Utility
Widgets Utility
Secrets Utility
Notebook Utility

Module 11: CSV File Format

Reading Data
Reading Data from Multiple CSV Files
Writing Data

Module 13: Excel File Format

Single Sheet Reading
Multiple Sheet Reading Using List object
Dynamically Reading Multiple Sheets

Module 15: Libraries

Install Cluster Libraries
- Maven Package
- PyPI Package
- CRAN Package

Module 17: Databricks – Accessing Azure Data Lake

Account Access Key
Shared Access Signature Token
Mounting Azure Data Lake (Service Principal)

Module 19: Notebook – Code Modularity

%run
dbutils.notebook.run()

Module 21: Introduction To Delta Lake

Delta Lake Features
- ACID transactions
- Handling metadata
- Streaming and batch workloads
- Schema enforcement
- Time travel
- Upserts and deletes
Delta Lake Components
- _delta_log (transaction log)
- Versioned Parquet files
Delta Lake Operations
- Create Table
- Upsert to a table
- Read a table
- Update a table
- Delete from a table
- Display table history
- Time travel
- Clean up snapshots with VACUUM
- Delta Lake table history
- Restore a Delta table to an earlier state
- Vacuum unused data files
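
The relationship between the transaction log, versioned data files, and time travel listed above can be sketched in plain Python. This is a toy model under stated assumptions, not Delta Lake's actual implementation: the commit structure, the file names, and the `snapshot` helper are all hypothetical, and real commits are JSON files stored under `_delta_log`.

```python
# Toy model of a transaction log: each commit records data files added and removed.
# File names are hypothetical; real Delta Lake commits are JSON files under _delta_log.
commits = [
    {"add": ["part-0.parquet"], "remove": []},                  # version 0: create and insert
    {"add": ["part-1.parquet"], "remove": []},                  # version 1: append
    {"add": ["part-2.parquet"], "remove": ["part-0.parquet"]},  # version 2: update rewrites a file
]

def snapshot(log, version):
    """Replay commits 0..version to find the files that make up that table version."""
    live = set()
    for commit in log[: version + 1]:
        live -= set(commit["remove"])
        live |= set(commit["add"])
    return sorted(live)

latest = snapshot(commits, len(commits) - 1)  # files in the current version
as_of_v1 = snapshot(commits, 1)               # "time travel" to version 1

print(latest)
print(as_of_v1)
```

In this model, reading an older version only works while the files it references still exist. That is why VACUUM, which physically deletes files that are no longer referenced and are older than the retention threshold, limits how far back time travel can reach.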

Module 27: Databricks Integration With Azure Data Factory

Call a Notebook using Notebook Activity
SetVariable Activity
Trigger ADF Pipeline

Module 2: Introduction to Azure Databricks

Introduction to Databricks
Azure Databricks Architecture
Azure Databricks Main Concepts

Module 4: Types Of Clusters

All-Purpose Clusters
Job Clusters
Pools

Module 6: Databricks – External Storage

Azure Blob Storage
Azure Data Lake Storage Gen2
Azure SQL Database
Azure Synapse Dedicated SQL Pool
Snowflake

Module 8: Databricks Notebooks – Magic Commands

%python or %py
%r
%scala
%sql

Module 10: Big Data File Formats

Row-Based File Formats
- CSV, TSV, and Avro
Columnar File Formats
- Parquet, Delta, and ORC
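
To make the row-based versus columnar distinction concrete, here is a minimal plain-Python sketch that needs no Spark cluster. The record fields and values are invented for illustration, and real formats such as Avro or Parquet add encoding, compression, and metadata that this sketch ignores.

```python
# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "city": "Hyderabad", "amount": 120.0},
    {"id": 2, "city": "Pune", "amount": 75.5},
    {"id": 3, "city": "Hyderabad", "amount": 42.0},
]

def to_row_layout(rows):
    """Row-based layout: all fields of one record are stored together."""
    return [tuple(r.values()) for r in rows]

def to_column_layout(rows):
    """Columnar layout: all values of one field are stored together."""
    columns = {}
    for r in rows:
        for name, value in r.items():
            columns.setdefault(name, []).append(value)
    return columns

row_layout = to_row_layout(records)
column_layout = to_column_layout(records)

# An aggregate over one field reads a single list in the columnar layout,
# whereas the row layout has to walk every full record.
total_from_columns = sum(column_layout["amount"])
total_from_rows = sum(row[2] for row in row_layout)

print(total_from_columns, total_from_rows)
```

The same trade-off is the usual reason columnar formats suit analytical queries that scan a few columns, while row-based formats suit workloads that write or read whole records.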

Module 12: JSON File Format

Single Line JSON
Multi Line JSON
Complex Multi Line JSON
- Arrays
- Struct Fields
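
Flattening a complex multi-line JSON document means turning each array element into its own row and promoting struct fields to top-level columns. The plain-Python sketch below shows the idea using only the standard library. The document, its field names, and the `flatten_order` helper are hypothetical. In PySpark the equivalent step is typically done with `explode()` and dotted column paths.

```python
import json

# Hypothetical multi-line JSON document; the field names are illustrative only.
raw = """
{
  "order_id": 101,
  "customer": {"name": "Asha", "city": "Hyderabad"},
  "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1}
  ]
}
"""

def flatten_order(doc):
    """Return one flat row per array element, with struct fields promoted to columns."""
    rows = []
    for item in doc["items"]:  # array: one output row per element
        rows.append({
            "order_id": doc["order_id"],
            "customer_name": doc["customer"]["name"],  # struct field
            "customer_city": doc["customer"]["city"],  # struct field
            "sku": item["sku"],
            "qty": item["qty"],
        })
    return rows

flat_rows = flatten_order(json.loads(raw))
for row in flat_rows:
    print(row)
```

As written, a document with an empty `items` array produces no rows at all, so a variant that keeps such documents would need to emit a row with empty item fields instead.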

Module 14: XML File Format

Simple XML Files
Complex XML Files

Module 16: Spark Structured Streaming

ReadStream
WriteStream
Output Modes
Triggers
- Fixed Interval
- One Time
- Continuous
Managing Streams

Module 18: Azure Databricks – Types of Loads

History Load
Incremental Load
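
The difference between a history load and an incremental load can be sketched with a stored watermark. This is a minimal plain-Python illustration, not the course's implementation: the `modified` column, the sample rows, and both helper functions are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical source rows; in practice these would come from a file or table.
source = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 1, 5)},
    {"id": 3, "modified": datetime(2024, 1, 9)},
]

def history_load(rows):
    """History (full) load: copy everything and record the highest timestamp seen."""
    watermark = max(r["modified"] for r in rows)
    return list(rows), watermark

def incremental_load(rows, watermark):
    """Incremental load: copy only rows modified after the stored watermark."""
    changed_rows = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in changed_rows), default=watermark)
    return changed_rows, new_watermark

target, watermark = history_load(source)

# A new row arrives in the source after the history load.
source.append({"id": 4, "modified": datetime(2024, 1, 12)})
changed_rows, watermark = incremental_load(source, watermark)
target.extend(changed_rows)

print([r["id"] for r in changed_rows])
```

The history load runs once to seed the target, and each later run moves only the rows past the watermark, which keeps repeated loads proportional to the amount of change rather than to the full table size.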

Module 20: Introduction To Spark SQL Module

Managed Tables (Internal Tables)
- DataFrame API
- Spark SQL API
Unmanaged Tables (External Tables)
- DataFrame API
- Spark SQL API
Temporary Views (Temporary Tables)
Global Temporary Views

Module 22: Delta Lake – Slowly Changing Dimension

Type1 Dimension
Type2 Dimension
Type3 Dimension
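
The three dimension types differ in how much history they keep when an attribute changes. The plain-Python sketch below illustrates the logic only. The column names (`city`, `previous_city`, `is_current`, `start_date`, `end_date`) and the helper functions are hypothetical, and on Delta Lake the same effect is normally achieved with a MERGE statement rather than Python loops.

```python
from datetime import date

def apply_scd1(dimension, key, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in dimension:
        if row["customer_id"] == key:
            row["city"] = new_city
    return dimension

def apply_scd2(dimension, key, new_city, change_date):
    """Type 2: close the current row and append a new current row."""
    for row in dimension:
        if row["customer_id"] == key and row["is_current"]:
            if row["city"] == new_city:
                return dimension  # nothing changed, keep the current row
            row["is_current"] = False
            row["end_date"] = change_date
    dimension.append({
        "customer_id": key, "city": new_city,
        "start_date": change_date, "end_date": None, "is_current": True,
    })
    return dimension

def apply_scd3(dimension, key, new_city):
    """Type 3: keep only the previous value in an extra column."""
    for row in dimension:
        if row["customer_id"] == key and row["city"] != new_city:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return dimension

scd1 = apply_scd1([{"customer_id": 7, "city": "Pune"}], 7, "Hyderabad")
scd2 = apply_scd2(
    [{"customer_id": 7, "city": "Pune", "start_date": date(2023, 1, 1),
      "end_date": None, "is_current": True}],
    7, "Hyderabad", date(2024, 6, 1),
)
scd3 = apply_scd3([{"customer_id": 7, "city": "Pune", "previous_city": None}], 7, "Hyderabad")

print(scd1, scd2, scd3, sep="\n")
```

Type 1 ends with one row and no trace of the old value, Type 2 ends with two rows and full history, and Type 3 ends with one row that remembers only the immediately preceding value.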

Module 23: Databricks – Azure SQL Database

Reading Data With JDBC Driver
Writing Data With JDBC Driver

Module 24: Databricks – Synapse Dedicated SQL Pool

Reading Data From Synapse Table
Writing Data To Synapse Table

Module 25: Databricks – Snowflake

Reading Data From Snowflake Table
Writing Data To Snowflake Table

Module 26: Delta Lake – Performance Optimization Techniques

OPTIMIZE a Table
Z-ORDER by Columns

Module 28: Azure Key Vault Integration With Databricks

Create Secrets
Create Secret Scope

Azure Databricks Regular Class Practice Sessions

Session1_Introduction to Big Data Analytics Platform
Session2_Big Data Analytics_Data Processing Paradigms (Compute)
Session3_Distributed Computing Frameworks_Apache Hadoop vs Apache Spark
Session4_Big Data Analytics_Distributed Storage_Key Takeaways
Session5_Big Data Analytics_Tightly and Loosely Coupled Data Lakes
Session6_Distributed Computing Cluster_Scalability
Session7_Introduction to Azure Databricks
Session8_Create Azure Databricks Workspace
Session9_Azure Databricks_Types of Clusters_Configurations
Session10_Creation of All-Purpose Cluster_Databricks Pools
Session11_Introduction to Databricks File System(DBFS)
Session12_Databricks File System(DBFS)_dbutils.fs Utility_%fs Magic Command
Session13_Databricks_Spark Data APIs_RDD_DataFrame_Dataset
Session14_Databricks_Different Ways of Creating DataFrame
Session15_Reading Data from Single CSV File_DataFrame API
Session16_Reading Data from Single CSV File_User Defined Schema
Session17_Reading Data from CSV Files_Data Parsing Modes
Session18_Reading Data from Single Line JSON File Format
Session19_Reading Data from Multi Line JSON with Explicit Schema
Session20_Reading Data from Multi Line Complex JSON File Format
Session21_Reading Data from Multiple Excel Sheets Dynamically
Session22_Databricks_Reading Data from XML File Format
Session23_Databricks_Batch Data Processing
Session24_Databricks_Batch Data Processing_Transformations
Session25_Databricks_Batch Data Processing_Narrow_Wide Transformations
Session26_Databricks_Data Merging_Joining Two DataFrames_Types of Joins
Session27_Databricks_Data Merging_Union_UnionAll_UnionByName
Session28_Databricks_Batch Data Processing_DataFrame Writer API_Save Modes
Session29_Databricks_Spark Structured Streaming API_Real-Time Processing
Session30_Databricks_Calling a Notebook from another Notebook using %run magic Command
Session31_Databricks_Calling a Notebook from another Notebook using run() Method
Session32_Databricks_Introduction to Spark SQL Module
Session33_Databricks_Spark SQL_Create Global Managed Tables_DataFrame API_SQL API
Session34_Databricks_Spark SQL_Create Global Un-Managed Tables_DataFrame API_SQL API
Session35_Spark SQL_Types of Views_Local_Global Temporary Views
Session36_Introduction to Delta Lake
Session37_Create Delta Lake Tables_Explore Components of Delta Lake
Session38_Databricks_Delta Lake_Time Travel Using Version and TimeStamp
Session39_Databricks_Delta Lake_Schema Validation_Enforcement
Session40_Databricks_Delta Lake_Schema Evolution Using mergeSchema Option
Session41_Databricks_Delta Lake_Updates_Deletes in Data Lake with Delta Lake
Session42_Databricks_Delta Lake_OPTIMIZE_ZORDER
Session43_Databricks_Delta Table_Vacuum Command
Session44_Databricks_Designing Workflow to Orchestrate Multiple Tasks
Session45_Databricks_Implementation of History Load_Incremental Load
Session46_Calling a Databricks Notebook from ADF Pipeline
Session47_Databricks_Reading_Writing Data To Azure Blob Storage_Account Access Key
Session48_Databricks_Reading_Writing Data To Azure Data Lake Gen2_Azure Service Principal
Session49_Databricks_Create Mount Point to Azure Blob Storage_Data Lake Storage Gen2
Session50_Databricks_Read_Write_Azure SQL Database
Session51_Create Snowflake Free Trial Account
Session52_Read and write data from Snowflake
Session53_Read and write data from Synapse Dedicated SQL Pool
Session54_Databricks_Introduction to Slowly Changing Dimension(SCD)
Session55_Databricks_Implementation of SCD Type 0 Dimension
Session56_Databricks_Implementation of SCD Type 1 Dimension
Session57_Databricks_Introduction to SCD Type 2 Dimension
Session58_Databricks_Implementation of SCD Type 2 Dimension
Session59_Databricks_Implementation of SCD Type 3 Dimension
Session60_Data Engineering_Medallion Project Architecture

Azure Databricks_Assignments & Case Studies

ADB_Assignment1_Azure Databricks_Types of Clusters
ADB_Assignment2_Azure Databricks_Cluster_Pools
ADB_Assignment3_Azure Databricks_Compute_On-Demand vs Azure Spot VM Instances
ADB_Assignment4_Azure Databricks_Bigdata File formats
ADB_Assignment5_Reading Data from Multiple CSV Files With the Same Structure
ADB_Assignment1_Reading TSV Files_User Defined Schema
ADB_Assignment6_Apache Spark_Transformations_Actions
ADB_Assignment7_Create DataFrame Using Python Collection Objects_List_Tuple_Dictionary
ADB_Assignment8_Create DataFrame_Define Schema Programmatically Using StructType() & StructField()
ADB_Assignment9_Reading Single_Double_PIPE Delimited Files
ADB_Assignment10_Reading_Multiple_Different_Delimiter CSV Files
ADB_Assignment11_Spark Low Level APIs vs Structured APIs
ADB_Assignment12_Creation of Structured API_DataFrame
ADB_Assignment13_Creation of DataFrame_Schemas
ADB_Assignment14_Python Functions
ADB_Assignment15_Python Dictionaries_Functions_Widgets
ADB_Assignment16_Flatten Multi Line Complex JSON Files_Python User Defined Function
ADB_Assignment17_Flatten Arrays_Maps_explode()_explode_outer() Functions
ADB_Assignment18_Batch ETL Processing_Replace Nulls with Literals
ADB_Assignment19_Batch ETL Processing_GroupBy_Aggregation Processing
ADB_Assignment20_Batch ETL Processing_PySpark_Join Types
ADB_Assignment21_Batch ETL Processing_PySpark_Union_UnionAll
ADB_Assignment22_Batch ETL Processing_PySpark_Distinct_DropDuplicates Methods
ADB_Assignment23_Batch ETL Processing_GroupBy_Aggregation Processing
ADB_Assignment24_Create Workflow to Orchestrate Multiple Tasks
ADB_Assignment25_Implement Slowly Changing Dimension Type1 and Type3
ADB_Assignment26_Batch Processing_Data Processing Techniques_Python List Comprehension
ADB_Assignment27_Batch Processing_Sorting on Single Column_sort() method
ADB_Assignment28_Batch Processing_Sorting on Multiple Columns
ADB_Assignment29_Batch Processing_PySpark_Date Functions
ADB_Assignment30_Batch Processing_PySpark_Date Functions
ADB_Assignment31_Batch Processing_PySpark_Identify or Check Duplicates in DataFrame
ADB_Assignment32_Batch Processing_PySpark_Dropping Rows that Contain Null Values using dropna() & na.drop() Methods
ADB_Assignment33_Batch Processing_PySpark_Replacing Nulls with another Value Using fillna() Method_na.fill() Method
ADB_Assignment34_Batch Processing_Reading and Writing Data to Snowflake Cloud Data Platform
ADB_Assignment35_Delta Lake_Schema Validation_Enforcement
ADB_Assignment36_Delta Lake_Schema Evolution
ADB_Assignment37_Update_Delete Operations in data lake with Delta Lake
ADB_Assignment38_Auditing Data Changes with Operation History

Curriculum

1 Section
2 Lessons
4 Weeks


Session (2 lessons)
- 1.0 Self-paced Video Learning Assignments
- 1.1 Azure Databricks Course Curriculum

₹16,000.00

Course Features

Lectures 2
Quizzes 0
Duration 4 weeks
Skill level All levels
Language English
Students 50
Assessments Yes

Azure Databricks with PySpark

