Data Transformation (Log, Square Root, Cube Root, Tukey Ladder, and Box-Cox Methods) in RStudio

Author: Wakjira Tesfahun

Uploaded: 2021-12-02

Views: 6352

Description:

Data Transforming
Most parametric tests require that residuals be normally distributed and that the residuals be homoscedastic. One approach when residuals fail to meet these conditions is to transform one or more variables to better follow a normal distribution. Often, just the dependent variable in a model will need to be transformed. However, in complex models and multiple regression, it is sometimes helpful to transform both dependent and independent variables that deviate greatly from a normal distribution.

There is nothing illicit in transforming variables, but you must be careful about how the results from analyses with transformed variables are reported. For example, looking at the turbidity of water across three locations, you might report, “Locations showed a significant difference in log-transformed turbidity.” To present means or other summary statistics, you might present the mean of transformed values, or back transform means to their original units.
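As a minimal sketch of the two reporting options, the following uses made-up turbidity values (hypothetical, not taken from any real data set). It shows the mean on the log scale and its back-transformed value. Note that back-transforming the mean of log values gives the geometric mean, which is generally smaller than the arithmetic mean of the raw values.

```r
# Hypothetical turbidity readings (illustrative values only)
turbidity <- c(1.2, 2.5, 3.1, 4.8, 9.6, 22.0, 51.3)

# Mean on the log scale
mean_log <- mean(log(turbidity))

# Back-transform to the original units: this is the geometric mean
mean_back <- exp(mean_log)

# Arithmetic mean of the raw values, for comparison
mean_raw <- mean(turbidity)

print(c(mean_log = mean_log, back_transformed = mean_back, raw = mean_raw))
```

When reporting a back-transformed mean, it helps to say so explicitly, since it is not the same quantity as the ordinary mean.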

Some measurements in nature are naturally normally distributed. Other measurements are naturally log-normally distributed. These include some natural pollutants in water: There may be many low values with fewer high values and even fewer very high values.

For right-skewed data (tail on the right, positive skew), common transformations include square root, cube root, and log.

For left-skewed data (tail on the left, negative skew), common transformations include square root (constant - x), cube root (constant - x), and log (constant - x), where the constant is chosen so that constant - x is positive for every value.
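The reflected transformations for left-skewed data might be sketched as follows. The data values are invented for illustration, and taking the constant as max(x) + 1 is an assumption of this sketch (one common choice), not a rule from the text.

```r
# Hypothetical left-skewed scores (illustrative values only)
x <- c(55, 78, 85, 88, 90, 92, 94, 95, 96, 97, 98, 99)

# Assumption: use max(x) + 1 as the constant so every reflected value is >= 1
k <- max(x) + 1

x_sqrt <- sqrt(k - x)      # square root (constant - x)
x_cub  <- (k - x)^(1/3)    # cube root (constant - x)
x_log  <- log(k - x)       # log (constant - x)

# Smallest and largest reflected values
print(range(k - x))
```

Reflection reverses the order of the values (the largest x becomes the smallest transformed value), so the direction of any effect has to be interpreted accordingly.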

Because log(0) is undefined, as is the log of any negative number, a constant should be added to all values to make them positive before a log transformation is applied. It is also sometimes helpful to add a constant when using other transformations.
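A short sketch of adding a constant before taking logs, using invented values. The choice of 1 as the constant, and of shifting by the minimum for data with negative values, are assumptions made here for illustration.

```r
# Hypothetical counts that include zeros (illustrative values only)
counts <- c(0, 0, 1, 3, 7, 20, 150)

# In R, log(0) returns -Inf, which breaks most downstream analyses
log(counts)

# Adding a constant of 1 before the log keeps every value finite
counts_log <- log(counts + 1)

# log1p(x) computes log(1 + x) and is equivalent here
all.equal(counts_log, log1p(counts))

# For data with negative values, one option is to shift by the minimum
# (assumption: shift so that the smallest value becomes 1)
y <- c(-4, -1, 0, 2, 10)
y_log <- log(y - min(y) + 1)
```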

Another approach is to use a general power transformation, such as Tukey’s Ladder of Powers or a Box-Cox transformation. These determine a lambda value, which is used as the power coefficient to transform values: X.new = X ^ lambda for Tukey, and X.new = (X ^ lambda - 1) / lambda for Box-Cox. In both cases, lambda = 0 corresponds to the log transformation.
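The two formulas can be written as small R functions. This is only a sketch: the names `tukey_power` and `box_cox` are hypothetical helpers, not functions from any package. The handling of lambda = 0 (log) and of negative lambda for Tukey (multiplying by -1 so the ordering of values is preserved) follows the usual convention, which is also the one used by transformTukey in rcompanion.

```r
# Tukey ladder of powers (hypothetical helper)
tukey_power <- function(x, lambda) {
  if (lambda > 0) {
    x^lambda
  } else if (lambda == 0) {
    log(x)
  } else {
    -1 * x^lambda   # sign flip keeps larger x mapped to larger values
  }
}

# Box-Cox transformation (hypothetical helper); lambda = 0 is defined as log(x)
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 4, 9, 16)
tukey_power(x, 0.5)   # square roots of x
box_cox(x, 0.5)
box_cox(x, 0)         # same as log(x)
```

The Box-Cox form differs from the plain power only by a shift and a rescaling, so both give the same shape of distribution for a given lambda.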

The function transformTukey in the rcompanion package finds the lambda which makes a single vector of values (that is, one variable) as normally distributed as possible with a simple power transformation.
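One way to picture such a search is a grid search over candidate powers that scores each transformed vector with a normality statistic. The sketch below uses the Shapiro-Wilk W statistic from base R. The helper name `find_tukey_lambda` and the grid from -2 to 2 are hypothetical choices for illustration; this is an approximation of the idea, not the rcompanion implementation.

```r
# Hypothetical helper: pick the power whose transformed values give the
# largest Shapiro-Wilk W statistic (W closer to 1 means closer to normal)
find_tukey_lambda <- function(x, lambdas = seq(-2, 2, by = 0.025)) {
  w <- sapply(lambdas, function(l) {
    if (abs(l) < 1e-8) {
      z <- log(x)          # treat lambda of (numerically) zero as the log
    } else if (l > 0) {
      z <- x^l
    } else {
      z <- -1 * x^l        # sign flip preserves the ordering of values
    }
    unname(shapiro.test(z)$statistic)
  })
  lambdas[which.max(w)]
}

set.seed(1)
x <- rlnorm(50)            # simulated log-normal data
lam <- find_tukey_lambda(x)
lam
```

The tolerance check around zero is there because a sequence built with seq() may not contain an exact 0 in floating point.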

The Box–Cox procedure is included in the MASS package with the function boxcox. It uses a log-likelihood procedure to find the lambda to use to transform the dependent variable for a linear model (such as an ANOVA or linear regression). It can also be used on a single vector.
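As a sketch of the linear-model use case, the following applies boxcox to a simple regression on simulated data. The data, the variable names, and the model are assumptions made for illustration only. The model is fitted with y = TRUE and qr = TRUE so that boxcox does not need to refit it.

```r
library(MASS)

# Simulated data (illustrative only): the response is right-skewed
set.seed(42)
dat <- data.frame(x = runif(60, min = 1, max = 10))
dat$y <- exp(0.3 * dat$x + rnorm(60, sd = 0.4))

# Keep the response and QR decomposition in the fitted object
model <- lm(y ~ x, data = dat, y = TRUE, qr = TRUE)

# Profile the log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(model, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# Lambda with the highest log-likelihood
lambda <- bc$x[which.max(bc$y)]
lambda

# Refit with the transformed response; lambda near 0 corresponds to log(y)
dat$y_bc <- if (abs(lambda) < 1e-8) log(dat$y) else (dat$y^lambda - 1) / lambda
model_bc <- lm(y_bc ~ x, data = dat)
```

In practice the selected lambda is often rounded to a nearby interpretable value (for example 0 for log or 0.5 for square root) rather than used to several decimals.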
Packages used in this tutorial
The packages used in this tutorial include:
• MASS
• rcompanion
• psych
The following commands will install these packages if they are not already installed:
if(!require(MASS)){install.packages("MASS")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(psych)){install.packages("psych")}
The script for this tutorial:
# Data transformation
# Example data: strongly right-skewed
data = c(1, 3, 4, 5, 6, 100, 233, 1000, 1500, 2000, 10000, 45000, 9000, 12000, 20000)
library(rcompanion)
library(psych)
# Inspect the raw data: histogram with normal curve, Q-Q plot, and skewness
plotNormalHistogram(data)
qqnorm(data)
qqline(data, col = "blue")
skew(data)
# Square root transformation
data_sqrt = sqrt(data)
plotNormalHistogram(data_sqrt)
# Cube root transformation (sign and abs make it safe for negative values)
data_cub = sign(data) * abs(data)^(1/3)
plotNormalHistogram(data_cub)
# Log transformation
data_log = log(data)
plotNormalHistogram(data_log)
# Tukey's Ladder of Powers transformation
data_tuk = transformTukey(data, plotit = TRUE)
plotNormalHistogram(data_tuk)
# Box-Cox transformation
library(MASS)
Box = boxcox(data ~ 1, lambda = seq(-2, 2, 0.1))
# Create a data frame with the results
Cox = data.frame(Box$x, Box$y)
# Order the new data frame by decreasing y (log-likelihood)
Cox2 = Cox[order(-Cox$Box.y), ]
# Display the lambda with the greatest log-likelihood
Cox2[1, ]
# Extract that lambda
lambda = Cox2[1, "Box.x"]
# Transform the original data
data_box = (data^lambda - 1) / lambda
plotNormalHistogram(data_box)
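
Besides looking at histograms, the candidate transformations can be compared numerically. The sketch below is an addition to the script, not part of it: it uses the same example vector and the Shapiro-Wilk W statistic from base R as one possible yardstick, where values closer to 1 indicate closer agreement with a normal distribution.

```r
data <- c(1, 3, 4, 5, 6, 100, 233, 1000, 1500, 2000, 10000, 45000, 9000, 12000, 20000)

# Candidate versions of the variable
candidates <- list(
  raw       = data,
  sqrt      = sqrt(data),
  cube_root = data^(1/3),
  log       = log(data)
)

# Shapiro-Wilk W statistic for each candidate
w <- sapply(candidates, function(v) unname(shapiro.test(v)$statistic))
print(round(w, 3))

# Name of the candidate with the largest W
names(w)[which.max(w)]
```

With only 15 observations, such a statistic is a rough guide; the plots remain worth checking.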
