Data Science EDA with code (Exploratory Data Analysis)
Author: Balaji Shetty
Uploaded: 2025-11-12
Views: 268
🔥 Master Data Science EDA (Exploratory Data Analysis) in Python! Complete hands-on tutorial with real student dataset analysis.
📊 PROJECT: Analyzing Performance Data for 10 Engineering Students
• Student records with branches, semesters, subject marks, grades
• Missing values handling • Statistical analysis • Pattern discovery
⏱️ DETAILED TIMESTAMPS:
📚 INTRODUCTION (00:00 - 02:51)
00:00 - Introduction to EDA
00:42 - What is EDA? (Being a detective with data)
01:21 - Why EDA? (Find patterns, spot errors, make decisions)
01:44 - Real-Life Examples (Schools, Netflix, Sports, Weather)
02:15 - Our Student Dataset Overview
💻 DATA LOADING AND SETUP (02:51 - 04:58)
02:51 - Starting with Google Colab
02:59 - Import Libraries | import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns
03:26 - Creating Student Dataset | Dictionary to DataFrame conversion
04:22 - Adding Missing Values | np.nan values
04:46 - Creating DataFrame | pd.DataFrame(data)
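Below is a minimal sketch of the dictionary-to-DataFrame step. It uses a much smaller hypothetical table than the video's 10 × 15 dataset, so the column names and values are illustrative and are not the tutorial's actual data. Each dictionary key becomes a column, and `np.nan` marks a missing mark.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records (NOT the video's dataset): one key per column,
# every list the same length. np.nan marks a missing mark.
data = {
    "student_id": [101, 102, 103, 104],
    "name": ["Asha", "Ravi", "Meera", "Karan"],
    "branch": ["CSE", "IT", "CSE", "ENTC"],
    "subject4": [78.0, np.nan, 91.0, np.nan],
    "subject5": [np.nan, 66.0, 88.0, 72.0],
}

# pd.DataFrame turns each dictionary key into a column
df = pd.DataFrame(data)
print(df)
```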
👀 VIEWING DATA (05:06 - 07:50)
05:06 - First 5 Records | df.head()
06:36 - Last 3 Records | df.tail(3)
07:03 - Dataset Dimensions | df.shape (10 rows x 15 columns)
07:34 - Shape Attribute | df.shape[0] for rows, df.shape[1] for columns
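The viewing commands can be tried on a hypothetical 10-row table (the values are made up; only the row count matches the video). `head()` returns 5 rows by default, and `shape` is a `(rows, columns)` tuple.

```python
import pandas as pd

# Hypothetical 10-row table; values are illustrative only
df = pd.DataFrame({
    "student_id": range(1, 11),
    "average": [82.5, 74.0, 91.2, 68.4, 88.0, 79.5, 95.1, 71.3, 85.6, 77.8],
})

print(df.head())        # first 5 rows (default n=5)
print(df.tail(3))       # last 3 rows
rows, cols = df.shape   # shape is a (rows, columns) tuple
print(rows, cols)       # 10 2  (same as df.shape[0], df.shape[1])
```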
🔍 DATA TYPES (07:52 - 08:44)
07:52 - Check Column Types | df.dtypes
• int64 & float64 = Numeric data (a column containing NaN becomes float64)
• object = Text data (names, branches)
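A small hypothetical frame shows which dtype pandas infers for each kind of column, including the detail that a single NaN turns a whole-number column into float64.

```python
import numpy as np
import pandas as pd

# Hypothetical columns chosen to show three different dtypes
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],  # text -> object (newer pandas may show a string dtype)
    "semester": [3, 5, 7],              # whole numbers -> int64
    "subject4": [78, np.nan, 91],       # one NaN forces float64
})
print(df.dtypes)
```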
🚨 MISSING VALUES (08:46 - 11:18)
08:46 - Count Missing Values | df.isnull().sum()
09:00 - Identify Missing Data (Subject 4: 5 missing, Subject 5: 5 missing)
10:05 - Understanding NaN (Not a Number = Missing value)
10:53 - Missing Value Summary | missing_values.sum() (total NaN count, where missing_values = df.isnull().sum())
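The missing-value steps can be sketched on hypothetical data: `isnull().sum()` counts NaN cells per column, and summing that result gives the total. The last line shows why a dedicated check is needed, since NaN is never equal to itself.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with some gaps
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Karan"],
    "subject4": [78.0, np.nan, 91.0, np.nan],
    "subject5": [np.nan, 66.0, np.nan, np.nan],
})

missing_values = df.isnull().sum()  # per-column count of NaN cells
print(missing_values)
print(missing_values.sum())         # total NaN cells in the DataFrame: 5
print(np.nan == np.nan)             # False: NaN never equals itself, so use isnull()
```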
📊 STATISTICAL ANALYSIS (11:20 - 12:57)
11:20 - Complete Statistics | df.describe() (count, mean, std, min, 25%, 50%, 75%, max)
11:44 - Understanding df.describe() (ONE command for the summary statistics of every numeric column!)
12:14 - Individual Statistics:
• Mean | df['average'].mean()
• Median | df['average'].median()
• Maximum | df['average'].max()
• Minimum | df['average'].min()
• Std Deviation | df['average'].std()
• Variance | df['average'].var()
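To check what `describe()` and the individual methods return, here is a sketch on five hypothetical averages. Note that pandas' `std()` and `var()` use the sample formula (ddof=1) by default, and `std()` is the square root of `var()`.

```python
import pandas as pd

# Hypothetical averages for 5 students
df = pd.DataFrame({"average": [70.0, 75.0, 80.0, 85.0, 90.0]})

print(df.describe())          # count, mean, std, min, 25%, 50%, 75%, max

avg = df["average"]
print(avg.mean())             # 80.0
print(avg.median())           # 80.0 (same as the 50% row of describe())
print(avg.max(), avg.min())   # 90.0 70.0
print(avg.var())              # 62.5: squared deviations sum to 250, divided by n - 1 = 4
print(avg.std())              # square root of var(), about 7.906
```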
🎯 BRANCH ANALYSIS (13:00 - 16:12)
13:00 - Average by Branch | df.groupby('branch')['average'].mean()
13:22 - Top 3 Students Overall | df.nlargest(3, 'average')
13:35 - Students Per Branch | df['branch'].value_counts()
14:05 - Count Frequency | .value_counts() function
14:28 - Branch Performance | .groupby().mean().sort_values()
14:54 - Top N Students | df.nlargest(n, 'column')
15:56 - Select Columns | df[['student_id', 'name', 'branch', 'average']]
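The branch-analysis commands combine naturally. The sketch below runs them on hypothetical students: the mean average per branch sorted best-first, the head count per branch, and the top 3 rows restricted to selected columns.

```python
import pandas as pd

# Hypothetical students; names, branches and averages are illustrative
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "name": ["Asha", "Ravi", "Meera", "Karan", "Neha", "Amit"],
    "branch": ["CSE", "IT", "CSE", "ENTC", "IT", "CSE"],
    "average": [88.0, 72.0, 94.0, 65.0, 81.0, 79.0],
})

# Mean average per branch, best branch first
branch_avg = df.groupby("branch")["average"].mean().sort_values(ascending=False)
print(branch_avg)

counts = df["branch"].value_counts()  # students per branch
print(counts)

# Top 3 rows by 'average', keeping only the selected columns
top3 = df.nlargest(3, "average")[["student_id", "name", "branch", "average"]]
print(top3)
```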
📈 GRADE DISTRIBUTION (16:14 - 17:42)
16:14 - Grade Count | df['grade'].value_counts()
16:28 - Grade Percentage | df['grade'].value_counts(normalize=True) * 100
16:42 - Filter A+ Students | df[df['grade'] == 'A+']
17:18 - Filtering Technique | df[condition]
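The `df[condition]` technique works because the comparison produces a boolean Series, and indexing with it keeps only the rows marked True. A sketch on hypothetical grades:

```python
import pandas as pd

# Hypothetical grades for 5 students
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Karan", "Neha"],
    "grade": ["A+", "B", "A+", "C", "A"],
})

print(df["grade"].value_counts())                     # count per grade
pct = df["grade"].value_counts(normalize=True) * 100  # share of students, in percent
print(pct)

mask = df["grade"] == "A+"  # boolean Series: True where the condition holds
a_plus = df[mask]           # df[condition] keeps only the True rows
print(a_plus)
```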
🆘 STRUGGLING STUDENTS (17:46 - 18:52)
17:46 - Below 75% Students | df[df['average'] < 75]
18:18 - Branch Performance | df.groupby('branch')['average'].mean()
18:30 - UG vs PG Analysis | df.groupby('level')['average'].mean()
18:45 - Semester Comparison | df.groupby('semester')['average'].mean()
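A sketch of the struggling-student filter plus the UG/PG and semester comparisons, assuming hypothetical `level` and `semester` columns shaped like the ones named above:

```python
import pandas as pd

# Hypothetical data with level and semester columns
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Karan", "Neha", "Amit"],
    "level": ["UG", "UG", "PG", "UG", "PG", "UG"],
    "semester": [3, 5, 1, 3, 1, 5],
    "average": [88.0, 72.0, 94.0, 65.0, 81.0, 79.0],
})

struggling = df[df["average"] < 75]  # students averaging below 75
print(struggling)

by_level = df.groupby("level")["average"].mean()        # UG vs PG
by_semester = df.groupby("semester")["average"].mean()  # semester comparison
print(by_level)
print(by_semester)
```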
📋 EDA FUNCTIONS SUMMARY (18:55 - 22:58)
19:01 - Data Loading | pd.read_csv(), pd.read_excel()
19:33 - Quick Overview | df.head(), df.shape
19:45 - Missing Values | df.isnull().sum()
20:00 - Duplicates | df.duplicated().sum()
20:15 - Data Types | df.dtypes
20:27 - Statistics | df.describe()
20:40 - Outliers | sns.boxplot()
21:03 - Correlation | df.corr(numeric_only=True), sns.heatmap()
21:21 - Distribution | plt.hist()
21:36 - Relationships | sns.scatterplot()
21:53 - Categories | df['column'].value_counts()
22:10 - Grouping | df.groupby('column').mean(numeric_only=True)
22:29 - Trends | Time series analysis
22:39 - Cleaning | df.fillna(), df.drop_duplicates()
22:52 - Export | df.to_excel(), df.to_csv()
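Several summary steps (duplicates, cleaning, correlation, export and loading) can be chained in one short sketch. The data is hypothetical, filling with the column mean is only one possible cleaning choice, and an in-memory `io.StringIO` buffer stands in for a real CSV file path.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical marks; the third row repeats the second on purpose
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Ravi", "Meera"],
    "subject1": [80.0, 70.0, 70.0, 90.0],
    "subject2": [85.0, np.nan, np.nan, 95.0],
})

dup_count = df.duplicated().sum()  # rows identical to an earlier row (NaNs compare equal here)
print(dup_count)                   # 1
df = df.drop_duplicates()          # keeps the first copy of each row

# Fill the missing mark with the column mean (one simple strategy among many)
df = df.fillna({"subject2": df["subject2"].mean()})

# numeric_only=True skips the text column 'name'; pandas 2.x raises on text columns otherwise
corr = df.corr(numeric_only=True)
print(corr)

# Export, then load back; a file path works the same way as this buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)
reloaded = pd.read_csv(io.StringIO(buffer.getvalue()))
print(reloaded)
```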
📊 VISUALIZATION TYPES (23:04 - 24:13)
23:08 - Histogram | plt.hist() - Data distribution
23:16 - Box Plot | sns.boxplot() - Outlier detection
23:28 - Scatter Plot | sns.scatterplot() - Variable relationships
23:43 - Heat Map | sns.heatmap() - Correlation matrix
23:51 - Pair Plot | sns.pairplot() - Multiple comparisons
23:57 - Bar Chart | plt.bar() - Category comparison
24:06 - Pie Chart | plt.pie() - Percentage distribution
24:13 - Next Lecture: Graphs with Live Examples!
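The box plot's outlier detection follows a rule the list does not spell out. By default, matplotlib's boxplot (which `sns.boxplot()` builds on) extends the whiskers to the furthest data points within 1.5 × IQR of the quartiles and draws anything beyond them as separate points. This sketch computes those same fences with pandas alone on hypothetical averages, so the outlier logic can be checked without plotting.

```python
import pandas as pd

# Hypothetical averages; 35.0 is deliberately far from the rest
avg = pd.Series([72.0, 75.0, 78.0, 80.0, 82.0, 85.0, 88.0, 35.0])

q1, q3 = avg.quantile(0.25), avg.quantile(0.75)  # linear interpolation by default
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # box-plot outlier fences

outliers = avg[(avg < lower) | (avg > upper)]
print(outliers)
```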
📚 ESSENTIAL COMMANDS:
✅ df.head() / df.tail() - View data
✅ df.shape - Dimensions
✅ df.dtypes - Data types
✅ df.isnull().sum() - Missing values
✅ df.describe() - Statistics
✅ df['col'].value_counts() - Frequencies
✅ df[condition] - Filtering
✅ df.groupby('col').mean(numeric_only=True) - Grouping
✅ df.nlargest(n, 'col') - Top values
💻 SOURCE CODE:
📂 Colab Notebook 1:
https://colab.research.google.com/dri...
📂 Colab Notebook 2:
https://colab.research.google.com/dri...
🎯 Perfect for: Data Science Beginners | Python Students | Engineering Students | Data Analysts
👍 LIKE if helpful! 🔔 SUBSCRIBE for more! 💬 COMMENT your questions!
#DataScience #Python #EDA #PandasTutorial #DataAnalysis #MachineLearning #PythonProgramming #LearnPython #DataScience2024