Bringing DuckDB to the Cloud: Dual Execution Explained
Author: MotherDuck
Uploaded: 2024-06-27
Views: 1117
Don't miss this special episode! Stephanie, a founding engineer at MotherDuck, will talk about what it takes to put a database in the cloud, specifically DuckDB. Mehdi and Steph will explain dual execution, showing how it works and what users need to know about running things in the cloud or locally.
--------------------------------------
Join MotherDuck founding engineer Stephanie for a deep dive into the architecture of MotherDuck, the serverless data warehouse built on DuckDB. This video demystifies what makes MotherDuck different from a self-hosted DuckDB instance by breaking down its three key components: the client layer (including DuckDB in WASM for browsers), the server-side compute layer, and the cloud storage layer. We explore how MotherDuck leverages the DuckDB extension system, a crucial design choice that avoids forking the open-source project and allows for rapid adoption of new DuckDB features. This approach extends DuckDB's parser, optimizer, and storage to create a robust, cloud-native experience for data analysts and developers.
Discover the power of dual execution, MotherDuck's hybrid query model that intelligently decides whether to run operations locally on your machine or remotely in the cloud. We provide a hands-on demonstration showing how to use `EXPLAIN` to analyze a query plan, revealing how a join between a local Parquet file and a remote cloud table is optimized. You'll learn how this unique DuckDB optimization logic minimizes data transfer and leverages your local compute for maximum efficiency. We'll also show you how to take control with the `md_run` parameter to force local or remote execution for specific scans, including Parquet, CSV, and Delta Lake files.
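As a rough illustration of the `EXPLAIN` and `md_run` workflow described above, the following Python sketch only builds the SQL text; it does not connect to MotherDuck or reproduce the demo from the video. The helper names, the file `orders.parquet`, the table `my_db.customers`, and the join key are hypothetical, and the accepted `md_run` values (`AUTO`, `LOCAL`, `REMOTE`) and the spelling `MD_RUN = ...` inside the scan function are assumptions to verify against the MotherDuck documentation.

```python
# Sketch only: builds SQL strings, never opens a database connection.
# Assumed set of values for the md_run scan parameter.
ALLOWED_MODES = {"AUTO", "LOCAL", "REMOTE"}


def scan_with_md_run(path: str, mode: str = "AUTO",
                     reader: str = "read_parquet") -> str:
    """Return a scan expression that pins where the file scan should run."""
    mode = mode.upper()
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported md_run mode: {mode!r}")
    escaped = path.replace("'", "''")  # escape single quotes for SQL
    return f"{reader}('{escaped}', MD_RUN = {mode})"


def explain_join(local_file: str, remote_table: str, key: str) -> str:
    """Build an EXPLAIN statement for a local-file / cloud-table join."""
    scan = scan_with_md_run(local_file, "LOCAL")
    return (
        f"EXPLAIN SELECT * FROM {scan} AS l "
        f"JOIN {remote_table} AS r ON l.{key} = r.{key}"
    )


if __name__ == "__main__":
    # Hypothetical file, table, and column names.
    print(explain_join("orders.parquet", "my_db.customers", "customer_id"))
```

The resulting string could then be passed to a DuckDB connection attached to MotherDuck (for example one opened with `duckdb.connect("md:")`), and the returned plan inspected to see which operators were placed locally and which remotely.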
This session also covers the practical challenges of running DuckDB at scale and how MotherDuck solves them. Learn about our secure, pluggable secret manager for easily querying data in S3, GCS, and Azure without exposing credentials. We'll touch on our differential storage implementation, which enables powerful features like database sharing and time travel, transforming DuckDB from a single-player tool into a collaborative data platform. We cap it off with a performance comparison, showing the speed benefits of querying large S3 files with MotherDuck's cloud compute versus a local DuckDB client. Finally, we discuss how MotherDuck contributes back to the DuckDB open-source project and how you can get involved.