Backfill Streaming Data Pipelines in Kappa Architecture
Author: Databricks
Uploaded: 2022-07-19
Views: 7548
Streaming data pipelines can fail for various reasons. Because source data such as Kafka topics typically has limited retention, a prolonged job failure can lead to data loss. Streaming jobs therefore need to be backfillable at all times so that data can be recovered after a failure.
One solution is to increase the source's retention so that backfilling is simply a matter of replaying the source streams, but extending Kafka retention is very costly at Netflix's data sizes. Another is to reuse the source data stored in the data warehouse, commonly known as the Lambda architecture. However, this approach introduces significant code duplication, since engineers must maintain a separate, equivalent batch job.
At Netflix, we have created the Iceberg Source Connector to provide backfilling capabilities to Flink streaming applications. It allows Flink to stream data stored in Apache Iceberg while mirroring Kafka's ordering semantics, enabling us to backfill large-scale stateful Flink pipelines at low retention cost.
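Netflix's Iceberg Source Connector itself is internal, but the general pattern is available in the open-source Apache Iceberg Flink integration. As a rough illustration only, the sketch below uses the public FlinkSource.forRowData() builder to read an Iceberg table as a bounded stream for a backfill run; the table path, class name, and print sink are illustrative assumptions, and the Kafka-mirroring ordering semantics described in the talk are not reproduced here.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.FlinkSource;

public class IcebergBackfillJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical table location: in practice this points at the
        // warehouse copy of the Kafka topic being backfilled.
        TableLoader tableLoader =
                TableLoader.fromHadoopTable("hdfs://namenode:8020/warehouse/db/events");

        // streaming(false) reads the table as a bounded source, replaying
        // historical data through the same pipeline that normally consumes Kafka.
        DataStream<RowData> backfill = FlinkSource.forRowData()
                .env(env)
                .tableLoader(tableLoader)
                .streaming(false)
                .build();

        backfill.print();  // stand-in for the production sink

        env.execute("iceberg-backfill");
    }
}

With streaming(false), the source performs a bounded scan over the table's current snapshots, so the same downstream operators that normally consume Kafka can be pointed at the warehouse copy for the duration of a backfill.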
Connect with us:
Website: https://databricks.com
Facebook: /databricksinc
Twitter: /databricks
LinkedIn: /data. .
Instagram: /databricksinc