Data Science EDA with code (Exploratory Data Analysis)

Author: Balaji Shetty

Uploaded: 2025-11-12

Views: 268

Description:

🔥 Master Data Science EDA (Exploratory Data Analysis) in Python! Complete hands-on tutorial with real student dataset analysis.

📊 PROJECT: Analyzing Performance Data for 10 Engineering Students
• Student records with branches, semesters, subject marks, grades
• Missing values handling • Statistical analysis • Pattern discovery

⏱️ DETAILED TIMESTAMPS:

📚 INTRODUCTION (00:00 - 02:51)
00:00 - Introduction to EDA
00:42 - What is EDA? (Being a detective with data)
01:21 - Why EDA? (Find patterns, spot errors, make decisions)
01:44 - Real-Life Examples (Schools, Netflix, Sports, Weather)
02:15 - Our Student Dataset Overview

💻 DATA LOADING AND SETUP (02:51 - 04:58)
02:51 - Starting with Google Colab
02:59 - Import Libraries | import pandas as pd, numpy, matplotlib, seaborn
03:26 - Creating Student Dataset | Dictionary to DataFrame conversion
04:22 - Adding Missing Values | np.nan values
04:46 - Creating DataFrame | pd.DataFrame(data)
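
The dictionary-to-DataFrame step can be sketched as follows. This is a minimal illustration, not the notebook from the video: the column names and every value here are made up, and only the pattern (a dictionary of lists, np.nan for a missing mark, pd.DataFrame(data)) comes from the outline above.

```python
import numpy as np
import pandas as pd

# Hypothetical records: names, branches and marks are illustrative only.
data = {
    "student_id": [101, 102, 103, 104],
    "name": ["Asha", "Ravi", "Meera", "John"],
    "branch": ["CSE", "ECE", "CSE", "MECH"],
    "subject1": [88, 72, 95, 64],
    "subject2": [91.0, np.nan, 89.0, 70.0],  # np.nan marks a missing mark
}

# Each dictionary key becomes a column; each list supplies that column's rows.
df = pd.DataFrame(data)
print(df)
```

All lists must have the same length, otherwise pd.DataFrame raises a ValueError.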

👀 VIEWING DATA (05:06 - 07:50)
05:06 - First 5 Records | df.head()
06:36 - Last 3 Records | df.tail(3)
07:03 - Dataset Dimensions | df.shape (10 rows x 15 columns)
07:34 - Shape Attribute | df.shape[0] for rows, df.shape[1] for columns
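
A short sketch of the viewing commands, assuming a hypothetical 10-row frame with two made-up columns in place of the real 15-column dataset:

```python
import pandas as pd

# Hypothetical stand-in for the student dataset (values are illustrative).
df = pd.DataFrame({
    "student_id": range(1, 11),
    "average": [82.5, 91.0, 67.0, 74.5, 88.0, 95.5, 59.0, 79.0, 85.0, 70.5],
})

print(df.head())    # first 5 rows by default
print(df.tail(3))   # last 3 rows

# shape is an attribute (a tuple), so it is read without parentheses.
rows, cols = df.shape
print(f"{rows} rows x {cols} columns")
```

df.shape[0] and df.shape[1] give the same two numbers individually.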

🔍 DATA TYPES (07:52 - 08:44)
07:52 - Check Column Types | df.dtypes
• float64 = decimal numbers (marks, averages); object = text data (names, branches)
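
A small sketch of the type check, using made-up columns. Text columns usually appear as object, although newer pandas versions may report a dedicated string dtype instead, so the example only relies on the numeric types.

```python
import pandas as pd

# Hypothetical columns: one text, one integer, one decimal.
df = pd.DataFrame({
    "name": ["Asha", "Ravi"],
    "semester": [3, 5],
    "average": [88.5, 72.0],
})

print(df.dtypes)  # one dtype per column

# Pick out the numeric columns, which is often the next step before statistics.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```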

🚨 MISSING VALUES (08:46 - 11:18)
08:46 - Count Missing Values | df.isnull().sum()
09:00 - Identify Missing Data (Subject 4: 5 missing, Subject 5: 5 missing)
10:05 - Understanding NaN (Not a Number = Missing value)
10:53 - Missing Value Summary | missing_values.sum()
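
The two-step count (per column, then overall) might look like this. The column names and values are hypothetical; the variable name missing_values follows the outline above.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with a few gaps.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "subject4": [78.0, np.nan, 85.0, np.nan],
    "subject5": [np.nan, 90.0, 66.0, 71.0],
})

# isnull() gives a True/False table; sum() counts the True values per column.
missing_values = df.isnull().sum()
print(missing_values)

# Summing the per-column counts gives the total number of missing cells.
total_missing = missing_values.sum()
print("Total missing:", total_missing)
```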

📊 STATISTICAL ANALYSIS (11:20 - 12:57)
11:20 - Complete Statistics | df.describe() (count, mean, std, min, 25%, 50%, 75%, max)
11:44 - Understanding df.describe() (ONE command for all stats!)
12:14 - Individual Statistics:
• Mean | df['average'].mean()
• Median | df['average'].median()
• Maximum | df['average'].max()
• Minimum | df['average'].min()
• Std Deviation | df['average'].std()
• Variance | df['average'].var()
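
As a rough illustration of the individual statistics, here they are on four made-up averages. Note that pandas computes std() and var() as sample statistics (dividing by n - 1) by default.

```python
import pandas as pd

# Hypothetical averages chosen so the results are easy to check by hand.
df = pd.DataFrame({"average": [60.0, 70.0, 80.0, 90.0]})

print(df.describe())  # count, mean, std, min, quartiles, max in one call

stats = {
    "mean": df["average"].mean(),      # (60 + 70 + 80 + 90) / 4
    "median": df["average"].median(),  # midpoint of 70 and 80
    "max": df["average"].max(),
    "min": df["average"].min(),
    "std": df["average"].std(),        # sample standard deviation (ddof=1)
    "var": df["average"].var(),        # sample variance: 500 / 3
}
print(stats)
```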

🎯 BRANCH ANALYSIS (13:00 - 16:12)
13:00 - Average by Branch | df.groupby('branch')['average'].mean()
13:22 - Top 3 Students Overall | df.nlargest(3, 'average')
13:35 - Students Per Branch | df['branch'].value_counts()
14:05 - Count Frequency | .value_counts() function
14:28 - Branch Performance | .groupby().mean().sort_values()
14:54 - Top N Students | df.nlargest(n, 'column')
15:56 - Select Columns | df[['student_id', 'name', 'branch', 'average']]
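
The branch-level commands can be combined as in the sketch below. The six students and their averages are invented for illustration; the column names follow the ones listed in the outline.

```python
import pandas as pd

# Hypothetical students (not the data from the video).
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "name": ["Asha", "Ravi", "Meera", "John", "Sara", "Vik"],
    "branch": ["CSE", "ECE", "CSE", "MECH", "ECE", "CSE"],
    "average": [90.0, 70.0, 82.0, 65.0, 80.0, 85.0],
})

# Mean average per branch, best branch first.
branch_avg = df.groupby("branch")["average"].mean().sort_values(ascending=False)
print(branch_avg)

# Number of students in each branch.
branch_counts = df["branch"].value_counts()
print(branch_counts)

# Top 3 students overall, showing only the columns of interest.
top3 = df.nlargest(3, "average")[["student_id", "name", "branch", "average"]]
print(top3)
```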

📈 GRADE DISTRIBUTION (16:14 - 17:42)
16:14 - Grade Count | df['grade'].value_counts()
16:28 - Grade Percentage | df['grade'].value_counts(normalize=True) * 100
16:42 - Filter A+ Students | df[df['grade'] == 'A+']
17:18 - Filtering Technique | df[condition]
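
A minimal sketch of the grade counts, percentages, and filter, assuming a hypothetical grade column with invented values:

```python
import pandas as pd

# Hypothetical grades for four students.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "grade": ["A+", "A", "A+", "B"],
})

grade_counts = df["grade"].value_counts()                   # raw counts
grade_pct = df["grade"].value_counts(normalize=True) * 100  # share in percent
print(grade_counts)
print(grade_pct)

# Boolean filtering: the condition builds a True/False mask,
# and df[mask] keeps only the rows where it is True.
a_plus = df[df["grade"] == "A+"]
print(a_plus)
```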

🆘 STRUGGLING STUDENTS (17:46 - 18:52)
17:46 - Below 75% Students | df[df['average'] < 75]
18:18 - Branch Performance | df.groupby('branch')['average'].mean()
18:30 - UG vs PG Analysis | df.groupby('level')['average'].mean()
18:45 - Semester Comparison | df.groupby('semester')['average'].mean()
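
The below-75 filter and the group comparisons might be written like this. The rows are invented, and the level and semester columns are assumed to exist under those names, as the outline suggests.

```python
import pandas as pd

# Hypothetical students with a study level (UG/PG) and semester.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "level": ["UG", "PG", "UG", "PG"],
    "semester": [3, 1, 3, 1],
    "average": [82.0, 74.5, 68.0, 91.0],
})

# Students whose average is strictly below 75.
struggling = df[df["average"] < 75]
print(struggling)

# The same groupby pattern works for any categorical column.
by_level = df.groupby("level")["average"].mean()
by_semester = df.groupby("semester")["average"].mean()
print(by_level)
print(by_semester)
```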

📋 EDA FUNCTIONS SUMMARY (18:55 - 22:58)
19:01 - Data Loading | pd.read_csv(), pd.read_excel()
19:33 - Quick Overview | df.head(), df.shape
19:45 - Missing Values | df.isnull().sum()
20:00 - Duplicates | df.duplicated().sum()
20:15 - Data Types | df.dtypes
20:27 - Statistics | df.describe()
20:40 - Outliers | sns.boxplot()
21:03 - Correlation | df.corr(numeric_only=True), sns.heatmap()
21:21 - Distribution | plt.hist()
21:36 - Relationships | sns.scatterplot()
21:53 - Categories | df['column'].value_counts()
22:10 - Grouping | df.groupby('column').mean(numeric_only=True)
22:29 - Trends | Time series analysis
22:39 - Cleaning | df.fillna(), df.drop_duplicates()
22:52 - Export | df.to_excel(), df.to_csv()
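
A few of the summary functions chained together, as a sketch on made-up data. Filling missing marks with the column mean is only one possible choice and is an assumption here, not necessarily what the video does. to_csv() is called without a path so it returns text instead of writing a file.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicated row and one missing mark.
df = pd.DataFrame({
    "name": ["A", "B", "B", "C"],
    "marks": [80.0, 70.0, 70.0, np.nan],
})

n_duplicates = df.duplicated().sum()  # rows identical to an earlier row
deduped = df.drop_duplicates()

# Fill the missing mark with the mean of the remaining marks (an assumption).
filled = deduped.fillna({"marks": deduped["marks"].mean()})

# With no path argument, to_csv returns the CSV content as a string.
csv_text = filled.to_csv(index=False)
print(n_duplicates)
print(csv_text)
```

Passing a filename, for example filled.to_csv('students_clean.csv', index=False), would write the same content to disk; that filename is hypothetical.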

📊 VISUALIZATION TYPES (23:04 - 24:13)
23:08 - Histogram | plt.hist() - Data distribution
23:16 - Box Plot | sns.boxplot() - Outlier detection
23:28 - Scatter Plot | sns.scatterplot() - Variable relationships
23:43 - Heat Map | sns.heatmap() - Correlation matrix
23:51 - Pair Plot | sns.pairplot() - Multiple comparisons
23:57 - Bar Chart | plt.bar() - Category comparison
24:06 - Pie Chart | plt.pie() - Percentage distribution
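
Two of the chart types above, sketched with matplotlib on invented data. The output filename is hypothetical; in a notebook such as Google Colab, plt.show() would display the figure inline instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical averages and branches for six students.
df = pd.DataFrame({
    "branch": ["CSE", "ECE", "CSE", "MECH", "ECE", "CSE"],
    "average": [90.0, 70.0, 82.0, 65.0, 80.0, 85.0],
})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how the averages are distributed across 5 bins.
ax_hist.hist(df["average"], bins=5)
ax_hist.set_title("Distribution of averages")
ax_hist.set_xlabel("average")

# Bar chart: number of students per branch.
counts = df["branch"].value_counts()
ax_bar.bar(counts.index.tolist(), counts.values)
ax_bar.set_title("Students per branch")

fig.tight_layout()
fig.savefig("eda_plots.png")  # hypothetical filename
```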

24:13 - Next Lecture: Graphs with Live Examples!

📚 ESSENTIAL COMMANDS:
✅ df.head() / df.tail() - View data
✅ df.shape - Dimensions
✅ df.dtypes - Data types
✅ df.isnull().sum() - Missing values
✅ df.describe() - Statistics
✅ df['col'].value_counts() - Frequencies
✅ df[condition] - Filtering
✅ df.groupby('col').mean(numeric_only=True) - Grouping
✅ df.nlargest(n, 'col') - Top values

💻 SOURCE CODE:
📂 Colab Notebook 1:
https://colab.research.google.com/dri...

📂 Colab Notebook 2:
https://colab.research.google.com/dri...

🎯 Perfect for: Data Science Beginners | Python Students | Engineering Students | Data Analysts

👍 LIKE if helpful! 🔔 SUBSCRIBE for more! 💬 COMMENT your questions!

#DataScience #Python #EDA #PandasTutorial #DataAnalysis #MachineLearning #PythonProgramming #LearnPython #DataScience2024
