Gianluca Campanella: The unreasonable effectiveness of feature hashing | PyData London 2019

Author: PyData

Uploaded: 2019-07-18

Views: 4844

Description:

Feature hashing is a computationally efficient pre-processing technique for sparse, high-dimensional features. Starting from an overview of the method, this talk covers: the impact of hash functions, hash size and collisions on statistical performance; three libraries for model training with feature hashing; hash reversibility and its implications for model interpretability.
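As a rough illustration of the method described above, here is a minimal sketch of the hashing trick in plain Python. It is not taken from the talk: the helper names (`stable_hash`, `hash_features`, `count_collisions`), the use of MD5 as the hash function, and the salt strings are all assumptions chosen for the example. It shows a bag of tokens projected onto a fixed number of buckets, a second hash used as a sign function, and a simple count of collisions for a given hash size.

```python
import hashlib


def stable_hash(token: str, salt: str) -> int:
    """Deterministic 64-bit hash of a Unicode string (encoded as UTF-8 first).

    MD5 is an arbitrary choice for this sketch; any well-mixed hash would do.
    """
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash_features(tokens, n_buckets=16):
    """Project a bag of tokens onto a fixed-length vector of size n_buckets."""
    vector = [0.0] * n_buckets
    for token in tokens:
        index = stable_hash(token, "index:") % n_buckets
        # A second hash (different salt) picks a sign, so that tokens
        # colliding in one bucket tend to cancel rather than accumulate.
        sign = 1.0 if stable_hash(token, "sign:") % 2 == 0 else -1.0
        vector[index] += sign
    return vector


def count_collisions(vocabulary, n_buckets):
    """Count distinct tokens that land in a bucket already taken by another."""
    distinct = set(vocabulary)
    used = {stable_hash(token, "index:") % n_buckets for token in distinct}
    return len(distinct) - len(used)


if __name__ == "__main__":
    doc = ["feature", "hashing", "is", "fast", "hashing"]
    print(hash_features(doc, n_buckets=8))

    # A smaller hash size forces more collisions for the same vocabulary.
    vocab = [f"word{i}" for i in range(1000)]
    print(count_collisions(vocab, 256), count_collisions(vocab, 2**20))
```

The vector length depends only on `n_buckets`, never on the vocabulary size, which is what makes the technique cheap for sparse, high-dimensional features. The cost is that the mapping is not directly reversible: several tokens can share one bucket.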

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

0:00 - Welcome!
1:00 - Introduction
1:53 - Background - Supervised ML
3:29 - Categorical Features
4:00 - One-hot encoding
4:36 - Bag of words
5:20 - High dimensional feature space
9:20 - Feature Hashing
11:26 - Hash function
12:24 - Feature Hashing in Python
13:27 - Hashing of Unicode Strings
15:05 - Projection
15:06 - Collisions
17:58 - Sign Functions
20:26 - Feature Hashing - Example
26:40 - Feature Hashing - Use Case
30:10 - Library Support
31:27 - Recap
32:20 - Q&A

Shout-out to https://github.com/Cyborg-vs-Droids for the video timestamps!

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
