Data Preprocessing for Machine Learning | Go Beyond Basic Cleaning
Author: Data Geek is my name
Uploaded: 2025-09-01
Views: 348
Most beginners stop at dropping duplicates and nulls. But if you want machine learning models to perform at their best, you need more thorough data preparation. In this video, I'll show you step by step how to prepare datasets for ML using Python, including imputation, scaling, encoding, outlier treatment, and pipelines. By the end, you'll know how to turn messy data into clean, consistent input that a model can learn from.
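As a rough illustration of how imputation, scaling, encoding, and a model fit together in one pipeline, here is a minimal sketch assuming pandas and scikit-learn. The toy data and the column names (age, income, city, churned) are made up for this example and are not the video's dataset. KNN imputation for numeric columns, a constant fill for categorical columns, and Logistic Regression follow the steps named in the timestamps; the specific parameter values are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with gaps in both numeric and categorical columns.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 55, 38, np.nan],
    "income": [40_000, 52_000, 61_000, np.nan, 45_000, 90_000, 58_000, 47_000],
    "city": pd.Series(["A", "B", np.nan, "A", "C", "B", np.nan, "A"], dtype=object),
    "churned": [0, 0, 1, 1, 0, 1, 0, 1],
})

# Split features and target.
X = df.drop(columns="churned")
y = df["churned"]

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric branch: fill gaps from the nearest rows, then standardize.
numeric_pipe = Pipeline([
    ("impute", KNNImputer(n_neighbors=2)),
    ("scale", StandardScaler()),
])

# Categorical branch: fill gaps with a constant label, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Route each column group to its own branch.
preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Preprocessing and model live in one object, so fit and predict apply
# exactly the same transformations.
model = Pipeline([
    ("preprocess", preprocessor),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
predictions = model.predict(X)
```

Keeping the preprocessing inside the pipeline means the imputer and scaler statistics are learned only from the data passed to fit, which helps avoid leaking test-set information once you add a train/test split.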
🔗 Download code & sample DB here: https://github.com/data-geek-lab/beyo...
How to download Anaconda Navigator to use Jupyter Notebook and many other tools for data analytics: • How to Download Anaconda for Jupyter Noteb...
==== Support my channel ====
🔔 Don’t forget to LIKE & SUBSCRIBE for more Python & Data Analysis tutorials!
☕ Want to Buy Me A Coffee: https://buymeacoffee.com/datageekismy...
💎 Donate on PayPal : https://www.paypal.com/donate/?hosted...
== Great Books For Mastering Data Science and Data Cleaning ==
*Data Science and Machine Learning: Mathematical and Statistical Methods* (includes Python code): https://amzn.to/41AqOfT
*Linear Algebra for Data Science, Machine Learning, and Signal Processing*: https://amzn.to/3JFm4Q4
*Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow*: https://amzn.to/45YSE73
Disclaimer: This content is for educational purposes only. Affiliate links may be included, and I may earn a small commission at no extra cost to you. Thank you for supporting the channel!
Timestamps:
0:00 - Intro
1:00 - Review of the CSV dataset
1:56 - Step 1 (in Jupyter Notebook): Import libraries, then load and preview the dataset
3:37 - Step 2: Quick EDA and data types (dataset info summary)
5:26 - Step 3: Fix dates and basic schema
6:33 - Step 4: Feature engineering from dates
12:23 - Step 5: Split features and target
17:29 - Step 6: Outlier exploration (numeric), using z-score/IQR to detect outliers
19:30 - Step 7: Preprocessing pipeline, imputing missing values (KNN for numeric; constant for categorical)
27:08 - Optional step: Outlier capping (winsorization), adding a custom transformer to cap extreme values after imputation
29:33 - Step 8: Train a model with the preprocessing pipeline, using Logistic Regression as a baseline to demonstrate how preprocessing and modeling fit together
35:05 - Compare with the winsorization variant
38:55 - Step 9: Export the preprocessing pipeline (save the fitted preprocessor + model for reuse)
40:17 - Outro
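
Several of the steps listed in the timestamps (date features, IQR outlier detection, capping after imputation, and exporting a fitted pipeline) can be sketched roughly as follows. This is not the code from the video: the data, the column names (signup_date, amount, visits), the QuantileCapper class, and the 5%/95% cap limits are all hypothetical, chosen only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Steps 3-4: parse dates (unparseable values become NaT), then derive features.
dates = pd.DataFrame({"signup_date": ["2024-01-15", "2024-03-02", "not a date"]})
dates["signup_date"] = pd.to_datetime(dates["signup_date"], errors="coerce")
dates["signup_month"] = dates["signup_date"].dt.month
dates["signup_dayofweek"] = dates["signup_date"].dt.dayofweek  # Monday = 0

# Hypothetical numeric features with one gap and one extreme value.
X_num = pd.DataFrame({
    "amount": [120.0, 135.0, np.nan, 110.0, 142.0, 131.0, 125.0, 5000.0],
    "visits": [3.0, 4.0, 4.0, 2.0, 5.0, 4.0, 3.0, 4.0],
})

# Step 6: IQR rule, flagging values beyond 1.5 * IQR from the quartiles.
q1, q3 = X_num["amount"].quantile(0.25), X_num["amount"].quantile(0.75)
iqr = q3 - q1
is_outlier = (X_num["amount"] < q1 - 1.5 * iqr) | (X_num["amount"] > q3 + 1.5 * iqr)


class QuantileCapper(TransformerMixin, BaseEstimator):
    """Hypothetical winsorizer: clip each column to quantiles learned in fit."""

    def __init__(self, lower=0.05, upper=0.95):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.lower_ = np.quantile(X, self.lower, axis=0)
        self.upper_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


# Baseline numeric pipeline and a variant that caps right after imputation.
baseline_pipe = Pipeline([
    ("impute", KNNImputer(n_neighbors=3)),
    ("scale", StandardScaler()),
])
capped_pipe = Pipeline([
    ("impute", KNNImputer(n_neighbors=3)),
    ("cap", QuantileCapper(lower=0.05, upper=0.95)),
    ("scale", StandardScaler()),
])
X_base = baseline_pipe.fit_transform(X_num)
X_capped = capped_pipe.fit_transform(X_num)

# Step 9: save a fitted pipeline to disk and load it back for reuse.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "preprocessor.joblib")
    joblib.dump(baseline_pipe, path)
    restored = joblib.load(path)
```

A pipeline that contains a custom class such as QuantileCapper can be saved the same way, but the class definition has to be importable wherever the file is loaded. Because the cap limits are learned in fit, the same thresholds are reused on new data instead of being recomputed.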