R code for Big Data: New Tricks for Econometrics
Author: Brian Byrne
Uploaded: 2021-02-26
Views: 448
https://sites.google.com/view/vinegar...
The Munnell et al. (1996) HMDA Data: Machine Learning with ctree, Logit Modelling and Random Forests
Varian (2014) uses the following snippets of R script to compare the relative performance of three approaches: (1) ctree, (2) logit, and (3) random forests.
Decision trees often suffer from variable selection bias and overfitting. One of the more recent algorithms developed to mitigate these problems relies on the self-pruning embedded in Conditional Inference Trees (CTREE), created by Hothorn, Hornik, and Zeileis (2006). The CTREE algorithm is considered unbiased because it selects predictors through a "... global null hypothesis of independence between any of the m covariates and the response" (Hothorn et al., 2006, p. 2), and then uses statistical hypothesis tests and their p-values to choose the best predictor for each split of the data and, in this way, build the tree. According to the authors: "If the global hypothesis can be rejected, we measure the association between Y and each of the covariates Xj, j = 1, . . . , m, by test statistics or P-values indicating the deviation from the partial hypotheses." (Hothorn et al., 2006, p. 3).
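Since the R snippets themselves live in the video, here is a minimal sketch of the ctree step. It assumes the Boston HMDA data are loaded as a data frame named hmda with a factor outcome deny; the Ecdat package ships a version of these data as Hdma (note the spelling), and ctree() comes from the partykit package.

# Conditional inference tree on the Boston HMDA data (a sketch)
library(partykit)   # ctree() of Hothorn, Hornik & Zeileis
library(Ecdat)      # supplies the Hdma data frame

hmda <- na.omit(Hdma)                 # drop incomplete cases
fit  <- ctree(deny ~ ., data = hmda)  # test-based splits, self-pruning

pred <- predict(fit)                            # in-sample predictions
table(Predicted = pred, Actual = hmda$deny)     # confusion matrix
mean(pred != hmda$deny)                         # misclassification rate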
Logistic regression provides a more traditional framework for solving classification problems.
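As a hedged illustration of the logit benchmark, the same hypothetical hmda data frame can be fed to glm(); the factor levels "no"/"yes" for deny are an assumption carried over from the Ecdat version of the data.

# Logit benchmark with a 0.5 classification threshold (a sketch)
logit <- glm(deny ~ ., data = hmda, family = binomial)

p    <- predict(logit, type = "response")       # fitted P(deny = "yes")
pred <- ifelse(p > 0.5, "yes", "no")            # hard classification
mean(pred != hmda$deny)                         # misclassification rate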
Random forests generate many classification trees. To classify a new object from an input vector, we run that vector through each tree in the forest. Each tree casts a vote for a class, and the majority vote wins; for regression problems, the trees' predictions are averaged instead. Random forests introduce additional randomness when growing the trees: rather than searching for the most important feature when splitting a node, they search for the best feature within a random subset of features. This yields greater diversity among the trees.

When using the Boston HMDA data, the ctree misclassifies 228 of the 2,380 observations, producing an error rate of 9.6 percent. In comparison, a straight logit model does somewhat better, misclassifying 225, for an error rate of 9.5 percent. The random forest method misclassified 223 of the 2,380 cases. Overall, the random forest approach produced a marginally better performance than the ctree.
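A minimal sketch of the final step, again assuming the hypothetical hmda data frame from above; randomForest() implements Breiman's algorithm and reports an out-of-bag error rate as a by-product.

# Random forest of 500 trees on the Boston HMDA data (a sketch)
library(randomForest)

set.seed(123)                                   # reproducible tree growing
rf <- randomForest(deny ~ ., data = hmda, ntree = 500)

rf$confusion                                    # out-of-bag confusion matrix
mean(predict(rf) != hmda$deny)                  # out-of-bag error rate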