
Python tutorial: Cumulative Distribution Functions

Author: DataCamp

Uploaded: 2016-11-09

Views: 85241

Description:

Learn about empirical cumulative distribution functions: https://www.datacamp.com/courses/stat...

We saw in the last video the clarity of bee swarm plots. However, there is a limit to their efficacy. For example, imagine we wanted to plot the county-level voting data for all states east of the Mississippi River and all states west. We make the swarm plot as before, but using a DataFrame that contains all states, with each classified as being east or west of the Mississippi.

The bee swarm plot has a real problem. The edges have overlapping data points, which was necessary in order to fit all points onto the plot. We are now obfuscating data. So, using a bee swarm plot here is not the best option.

As an alternative, we can compute an empirical cumulative distribution function, or ECDF. Again, this is best explained by example. Here is a picture of an ECDF of the percentage of swing state votes that went to Obama. The x-value of an ECDF is the quantity you are measuring, in this case the percent of the vote that went to Obama. The y-value is the fraction of data points that have a value less than or equal to the corresponding x-value. For example, 20% of counties in swing states had 36% or less of their people vote for Obama. Similarly, 75% of counties in swing states had 50% or less of their people vote for Obama.
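To make the definition concrete, here is a minimal sketch of reading a single y-value off an ECDF: the fraction of observations at or below a chosen x. The helper name fraction_at_or_below and the ten numbers are made up for illustration; they are not the election data shown in the video.

```python
import numpy as np

def fraction_at_or_below(data, x):
    """Return the ECDF value at x: the fraction of observations <= x."""
    data = np.asarray(data, dtype=float)
    # Comparing gives an array of booleans; their mean is the fraction that are True
    return np.mean(data <= x)

# Hypothetical vote percentages for ten counties (illustrative only)
dem_share = np.array([22.0, 31.5, 36.0, 41.2, 44.8, 47.9, 50.0, 53.3, 58.1, 64.7])

print(fraction_at_or_below(dem_share, 36.0))  # 3 of the 10 values are <= 36
print(fraction_at_or_below(dem_share, 50.0))  # 7 of the 10 values are <= 50
```

Evaluating this fraction at every observed value, in sorted order, traces out the full ECDF.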

Let's look at how to make one of these from our data. The x-axis is the sorted data. We need to generate it using the NumPy function sort, so we need to import NumPy, which we do using the alias np, as is commonly done. Then we can use np.sort() to generate our x-data. The y-axis is evenly spaced data points with a maximum of one, which we can generate by counting from 1 up to the number of data points with the np.arange() function and then dividing by the total number of data points. Once we specify the x and y values, we plot the points.
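The two steps described above can be sketched as a small helper. The function name ecdf is our own choice, and starting np.arange() at 1 is an assumption made so that the largest y-value is exactly one, as the description requires.

```python
import numpy as np

def ecdf(data):
    """Return sorted x-values and cumulative fractions y for a 1-D data set."""
    x = np.sort(np.asarray(data))
    n = len(x)
    # Counting 1, 2, ..., n and dividing by n gives evenly spaced values ending at 1
    y = np.arange(1, n + 1) / n
    return x, y

x, y = ecdf([3.0, 1.0, 2.0, 4.0])
print(x)  # the data in ascending order
print(y)  # one evenly spaced fraction per data point
```

Because np.sort() returns a sorted copy, the original data are left untouched.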

By default, plt.plot() plots lines connecting the data points. To plot our ECDF, we just want points. To achieve this, we pass the string '.' and the string 'none' to the keyword arguments marker and linestyle, respectively.

As you remember from my forceful reminder in an earlier video, we label the axes. Finally, we use the plt.margins() function to make sure none of the data points run over the side of the plot area. Choosing a value of 0.02 gives a 2% buffer all around the plot.
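Putting the plotting steps together might look like the sketch below. The data are randomly generated placeholders rather than the swing-state data from the video, the axis label text is our own wording, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for the county-level vote percentages
rng = np.random.default_rng(0)
dem_share = rng.normal(loc=45, scale=10, size=200)

# x is the sorted data, y the evenly spaced cumulative fractions
x = np.sort(dem_share)
y = np.arange(1, len(x) + 1) / len(x)

plt.figure()
# marker='.' draws a point per observation; linestyle='none' suppresses the connecting lines
lines = plt.plot(x, y, marker=".", linestyle="none")

# Label the axes
plt.xlabel("percent of vote for Obama")
plt.ylabel("ECDF")

# Keep a 2% buffer around the data so no point sits on the plot edge
plt.margins(0.02)
```

Calling plt.show() with an interactive backend, or plt.savefig() with a file name of your choice, would then display or save the figure.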

The result is the beautiful ECDF I just showed you. We can also easily plot multiple ECDFs on the same plot. For example, here are the ECDFs for the three swing states. We see that Ohio and Pennsylvania were similar, with Pennsylvania having slightly more Democratic counties. Florida, on the other hand, had a greater fraction of heavily Republican counties.
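Overlaying several ECDFs only requires repeating the plot call once per group. The sketch below assumes three made-up groups of random numbers with generic labels; it does not reproduce the Ohio, Pennsylvania, or Florida data, and the ecdf helper is our own naming.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

def ecdf(data):
    """Return sorted x-values and cumulative fractions y for a 1-D data set."""
    x = np.sort(np.asarray(data))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Hypothetical groups; sizes and parameters are arbitrary placeholders
rng = np.random.default_rng(1)
groups = {
    "group A": rng.normal(44, 10, size=80),
    "group B": rng.normal(46, 10, size=60),
    "group C": rng.normal(40, 12, size=70),
}

plt.figure()
for label, values in groups.items():
    x, y = ecdf(values)
    # Each call adds one ECDF to the same axes, with its own color and legend entry
    plt.plot(x, y, marker=".", linestyle="none", label=label)

plt.xlabel("percent of vote for Obama")
plt.ylabel("ECDF")
plt.legend(loc="lower right")
plt.margins(0.02)
```

Because every ECDF runs from near 0 to 1 on the y-axis, groups of different sizes can be compared directly on one plot.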

In my workflow, I almost always plot the ECDF first. It shows all the data and gives a complete picture of how the data are distributed. But don't take my word for how great ECDFs are. You can see for yourself in the exercises!
