The Elephant in the room between data collection and data science with Katya Kovalenko
Author: Data Science and AI Hub
Uploaded: 2025-11-12
Views: 26
Whether you call it wrangling, cleaning, or preprocessing, data prep is often the most expensive and time-consuming part of the analytical pipeline. It may involve converting data into machine-readable formats, integrating many datasets, or detecting outliers, and it can be a large source of error when done manually. A lack of machine-readable or integrated data limits connectivity across fields as well as data accessibility, sharing, and reuse, making it a significant contributor to research waste.
For students, it is perhaps the greatest barrier to adopting quantitative tools and advancing their coding and analytical skills. AI tools are available for automating the cleanup and integration, but due to the one-of-a-kind nature of these problems, these approaches still require extensive human collaboration and testing. I review some of the common challenges in data cleanup and integration, approaches for understanding dataset structures, and strategies for developing and testing workflows.