
1 + 1 = 1 or Record Deduplication with Python

Author: PyGotham 2018

Uploaded: Oct 27, 2018

Views: 15,566

Description:

Speaker: Flávio Juvenal

Record Deduplication, or more generally Record Linkage, is the task of finding which records refer to the same entity, such as a person or a company. It is used mainly when records lack a unique identifier, such as a Social Security Number for US citizens. Without one, you can't trivially find duplicate records in a single dataset, nor easily link records from different datasets. Instead, record linkage looks for matches by cleaning record attributes and comparing them in a fuzzy way. Imagine you have two datasets with information about people, but without any unique identifier in the records. You have to compare attributes like name, date of birth, and address in a smart way to find which records from the two datasets refer to the same person. A similar approach must be used to dedupe records in a single dataset, so Record Deduplication is a kind of Record Linkage.
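To make the idea of fuzzy attribute comparison concrete, here is a minimal sketch using only the Python standard library. It is not taken from the talk: the field names, the sample records, the equal weighting of attributes, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Assumed attribute names; real datasets will differ.
FIELDS = ("name", "date_of_birth", "address")


def normalize(value: str) -> str:
    """Lowercase the value and collapse punctuation and whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_score(left: dict, right: dict) -> float:
    """Average the per-attribute similarities (equal weights assumed)."""
    return sum(similarity(left[f], right[f]) for f in FIELDS) / len(FIELDS)


def link_records(left_records, right_records, threshold=0.85):
    """Compare every cross-dataset pair and keep those above the threshold."""
    matches = []
    for i, left in enumerate(left_records):
        for j, right in enumerate(right_records):
            score = record_score(left, right)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Made-up sample data with no shared identifier.
dataset_a = [
    {"name": "Jon Smith", "date_of_birth": "1980-02-11",
     "address": "12 Main St., Springfield"},
    {"name": "Maria Garcia", "date_of_birth": "1975-07-30",
     "address": "4 Oak Avenue, Portland"},
]
dataset_b = [
    {"name": "John Smith", "date_of_birth": "1980-02-11",
     "address": "12 Main St Springfield"},
    {"name": "Wei Chen", "date_of_birth": "1990-12-01",
     "address": "88 Pine Road, Austin"},
]

for i, j, score in link_records(dataset_a, dataset_b):
    print(f"dataset_a[{i}] matches dataset_b[{j}] (score {score:.2f})")
```

Comparing every pair is quadratic in the number of records, so this sketch only suits small inputs. A fixed threshold is also a simplification; a real project would normally tune it or replace it with a trained classifier.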

Data deduplication has a number of important applications in government and business. For example, by deduping records from Census data, the Australian government found that there were 250,000 fewer people in the country than previously thought. This reduction affected the estimates of government agencies and even led to the revision of economic projections. Similarly, businesses can use record linkage techniques to enrich their customers' data with publicly available datasets.

In this talk, you'll learn the main concepts of Record Deduplication through Python examples: what kinds of problems it can solve, what the most common workflow looks like, which algorithms are involved, and which tools and libraries you can use. Although some of the concepts discussed are related to data mining, any intermediate-level Python developer will be able to learn the basics of deduping data with Python.
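The description does not spell out the workflow, so the following is only a sketch of one commonly described pipeline (clean, block, compare, classify, cluster), not necessarily the one presented in the talk. The blocking rule, the field names, the 0.8 threshold, and the sample records are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def clean(text: str) -> str:
    """Step 1, cleaning: lowercase, drop dots and commas, collapse spaces."""
    return " ".join(text.lower().replace(".", " ").replace(",", " ").split())


def blocking_key(record: dict) -> str:
    """Step 2, blocking: hypothetical rule using the first letter of the name."""
    return clean(record["name"])[:1]


def candidate_pairs(records):
    """Only records that share a blocking key are compared with each other."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


def is_match(a: dict, b: dict, threshold: float) -> bool:
    """Steps 3 and 4: compare attributes, then classify with a threshold."""
    scores = [
        SequenceMatcher(None, clean(a[field]), clean(b[field])).ratio()
        for field in ("name", "address")
    ]
    return sum(scores) / len(scores) >= threshold


def dedupe(records, threshold=0.8):
    """Step 5, clustering: merge matched pairs with a small union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in candidate_pairs(records):
        if is_match(records[i], records[j], threshold):
            parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Made-up sample data: the first two rows describe the same company.
records = [
    {"name": "Acme Corp.", "address": "1 Market Street, San Francisco"},
    {"name": "ACME Corp", "address": "1 Market St, San Francisco"},
    {"name": "Bolt Industries", "address": "77 Harbor Road, Boston"},
]

print(dedupe(records))  # each inner list holds indices of one entity
```

Blocking trades recall for speed: a typo in the first letter of a name would put two duplicates in different blocks, so they would never be compared. Dedicated libraries offer more robust blocking and learned classifiers than this hand-rolled version.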


