DuckDB and recommenders : a lightning fast synergy ft. Khalil Muhammad

Author: MotherDuck

Uploaded: 2024-02-19

Views: 3156

Description:

Talk from the DuckDB user meetup that happened in Dublin on 23 January 2024!

Future events: https://motherduck.com/events/

☁️🦆 Start using DuckDB in the Cloud for FREE with MotherDuck : https://hubs.la/Q02QnFR40

📓 Resources
Slides : https://docs.google.com/presentation/...
Khalil Linkedin :   / mihai-bojin

➡️ Follow Us
LinkedIn:   / motherduck
Twitter :   / motherduck
Blog: https://motherduck.com/blog/

#datascience #dataengineering #duckdb

--------------------------------------

Discover how DuckDB speeds up machine learning workflows, particularly for building recommender systems. This talk moves beyond simple SQL queries to show how DuckDB accelerates development. It starts with a primer on recommender systems, explaining how they learn user preferences from "positive samples" (items users interacted with) and the harder-to-obtain "negative samples" (items users did not interact with, which usually have to be generated). You'll see the common challenges in ML projects: keeping datasets reproducible across a data science team, scaling as data grows, and avoiding I/O bottlenecks that leave the GPU waiting during model training.

Learn how DuckDB acts as the glue in your data engineering pipeline to solve collaboration and scale. The talk demonstrates a practical architecture that uses a "dataset spec" to create reproducible snapshots of training data from various cloud data sources, so teammates can train on exactly the same data. For datasets larger than memory, it covers a key technique for PyTorch and TensorFlow model training: creating an iterable dataset. With the `fetch_record_batch` method of DuckDB's Python API, which returns query results as a stream of Arrow record batches, you can feed data to your model in chunks, keep the GPU supplied, and train on large datasets without loading them fully into memory.

Get more speed with DuckDB performance tuning and advanced features. The talk shows why DuckDB is significantly faster than Pandas for many data manipulation tasks and why proper memory configuration matters. A major highlight is a custom negative sampling algorithm implemented directly within SQL using a DuckDB Python UDF (user-defined function), a task that is often complex in other systems. Concrete benchmarks show a potential 10x performance gain. The talk also shares practical DuckDB optimization tips, including how to analyze the memory impact of window functions and how to set memory limits to prevent out-of-memory errors.

Finally, the talk covers best practices for productionizing a DuckDB-powered ML pipeline: keep your data clean, and create every DuckDB connection through a single, configured entry point so that settings stay consistent across the codebase. The takeaway is that DuckDB offers not just raw speed but also convenience and cost savings for machine learning work, which makes it a useful tool for anyone building and deploying recommender systems.
