Hands-On Guide: Build Your First AWS Data Lake with Glue & S3
Author: ETL SQL
Uploaded: 2024-11-17
Views: 1535
If you liked this video and my teaching style and wish to join my live crash course training, fill in the form below:
https://forms.gle/aAFLAg2u4TokxsB19
I keep batches small and group students with similar aspirations and current knowledge levels. If you have more than 5 years of data engineering experience and wish to learn AWS, this crash course is just for you.
Fill in the form mentioned above and I will set up an introduction call soon.
AWS Data Engineering crash course covering Amazon Redshift, AWS Glue, Amazon EMR and Airflow to run an end-to-end pipeline: • End-to-End ETL Pipeline in AWS: Redshift, ...
In this video I explain how you can use AWS Glue with the Apache Hudi format to build a data lake. This is a beginner video, and I have intentionally kept it simple so that it is easy to follow.
The video gives an overview of AWS Glue and Apache Hudi, followed by a demo that builds an SCD-1 type dimension table and shows how to run updates and inserts on a Hudi table.
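To make the SCD-1 idea concrete, here is a minimal pure-Python sketch of upsert semantics: a row whose key already exists is overwritten, and a row with a new key is inserted. This is not the script used in the video and it does not call Hudi; the function name `scd1_upsert`, the column names and the sample rows are all hypothetical, and Hudi's actual merge behavior depends on how the table is configured.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def scd1_upsert(table: List[Row], batch: Iterable[Row], key: str, ordering: str) -> List[Row]:
    """Return a new table in which batch rows overwrite rows with the same key (SCD-1).

    No history is kept: the latest version of each key replaces the old one.
    """
    merged: Dict[Any, Row] = {row[key]: row for row in table}
    for row in batch:
        current = merged.get(row[key])
        # Insert when the key is new; overwrite when the incoming row is not older.
        if current is None or row[ordering] >= current[ordering]:
            merged[row[key]] = row
    return list(merged.values())


# Hypothetical sample data standing in for the two input files of the demo.
first_file = [
    {"customer_id": 1, "city": "Delhi", "updated_at": "2024-01-01"},
    {"customer_id": 2, "city": "Mumbai", "updated_at": "2024-01-01"},
]
second_file = [
    {"customer_id": 2, "city": "Pune", "updated_at": "2024-02-01"},     # update
    {"customer_id": 3, "city": "Chennai", "updated_at": "2024-02-01"},  # insert
]

table = scd1_upsert([], first_file, key="customer_id", ordering="updated_at")
table = scd1_upsert(table, second_file, key="customer_id", ordering="updated_at")
```

After the second run the table holds three customers: customer 2 carries the new city and customer 3 is newly inserted, with no trace of the old value for customer 2. That loss of history is what distinguishes SCD-1 from history-keeping dimension types.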
Video Timeline:
00:00 Introduction
00:35 What is AWS Glue
00:50 Glue Data Catalog
01:25 Glue Crawlers
02:25 Glue Studio visual builder
02:45 Apache Hudi
04:12 My intention of making this video
05:05 AWS Glue - serverless service
05:51 AWS Glue - Spark ETL
05:57 AWS Glue - Python Jobs
06:10 AWS Glue vs AWS Lambda for Python jobs
06:46 AWS Glue Data Catalog
07:01 What is Metadata
07:20 Metadata - business, technical, operational
08:10 Glue Data Catalog - why is it so powerful
08:40 AWS Glue crawlers for automated data discovery
09:50 Do I use Glue crawlers a lot?
10:20 Glue Studio visual builder
10:42 Do I use the visual builder a lot?
12:14 Apache Hudi - open source data lake format
12:58 Data lake vs data warehouse
14:22 Hudi ACID
14:44 Hudi versioning
14:52 Hudi integration with Glue, Athena, Redshift
15:37 Revisit the concepts before Demo
15:50 Demo (2 input files)
16:25 Demo - create Glue job
20:04 Demo - save Glue job and run it
20:18 Demo - Glue job input arguments
21:02 Demo - Glue job script walkthrough
22:42 Job complete, check table data
23:41 Demo - run second file for UPSERT (SCD-1)
25:38 Demo - Glue continuous driver logs
26:24 Demo - Hive style partitioning in Hudi
27:01 Demo - 2nd run complete, check data
27:18 Demo - end of demo
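The demo chapters above mention an upsert run and Hive-style partitioning in Hudi. As a rough sketch of what such a write might be configured with, the snippet below builds a dictionary of Hudi Spark datasource write options. The helper name `hudi_upsert_options` and the table and column names are hypothetical, and this is not the job script from the video; only the `hoodie.*` keys are standard Hudi configuration names.

```python
from typing import Dict


def hudi_upsert_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    partition_field: str,
) -> Dict[str, str]:
    """Build Hudi write options for an upsert with Hive-style partition paths."""
    return {
        "hoodie.table.name": table_name,
        # Upsert: update rows whose record key exists, insert the rest.
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest record when keys collide.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Hive style: partition folders are named like country=IN instead of IN.
        "hoodie.datasource.write.hive_style_partitioning": "true",
    }


# Hypothetical table and column names, for illustration only.
options = hudi_upsert_options("dim_customer", "customer_id", "updated_at", "country")

# In a Spark job with Hudi available, a DataFrame `df` would typically be written as:
#   df.write.format("hudi").options(**options).mode("append").save(target_path)
# where `df` and `target_path` (for example an S3 location) are assumed to exist.
```

Keeping the options in one function makes it easy to reuse the same settings for the first load and the later upsert run, since only the input file changes between the two.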
Would you be interested in an AWS data engineering session with me?
If you wish to download the presentation slides, sample data files and source code for the AWS Glue job, Amazon EMR PySpark application, Amazon Redshift SQL script and Managed Airflow DAG code used in the crash course video, check the link below:
https://mailchi.mp/45b9673b727b/aws-d...