Linear Regression Empirical Analysis in R

  1. Getting to Know the Data
  2. Relationship Between Price and Mpg
  3. Linear Regression
  4. Conclusion

Today we will use R to estimate the linear relationship between the price of a car and its mpg (miles per gallon).

Getting to Know the Data

First, we import auto.dta and view the first six observations to check that the data loaded correctly.

library(haven)                 # haven provides read_dta() for Stata .dta files
auto <- read_dta("auto.dta")
head(auto)
## # A tibble: 6 x 12
##   make  price   mpg rep78 headroom trunk weight length  turn displacement
##   <chr> <dbl> <dbl> <dbl>    <dbl> <dbl>  <dbl>  <dbl> <dbl>        <dbl>
## 1 AMC ~  4099    22     3      2.5    11   2930    186    40          121
## 2 AMC ~  4749    17     3      3      11   3350    173    40          258
## 3 AMC ~  3799    22    NA      3      12   2640    168    35          121
## 4 Buic~  4816    20     3      4.5    16   3250    196    40          196
## 5 Buic~  7827    15     4      4      20   4080    222    43          350
## 6 Buic~  5788    18     3      4      21   3670    218    43          231
## # ... with 2 more variables: gear_ratio <dbl>, foreign <dbl+lbl>
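
Before looking at the relationship itself, it can help to glance at simple summary statistics for the two variables of interest. A minimal sketch (output omitted):

summary(auto[, c("price", "mpg")])   # min, quartiles, mean and max of price and mpg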

Relationship Between Price and Mpg

We calculate the correlation between price and mpg. Furthermore, we can draw a scatter plot to visualize their relationship.

cor(auto$price, auto$mpg)
## [1] -0.4685967
scatter.smooth(x=auto$mpg, y=auto$price, main="price ~ mpg", xlab="mpg",
     ylab="price")

[Figure: scatter plot of price against mpg with a smoothed fit]
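
The correlation of about -0.47 indicates a moderate negative association. If we also want a formal test of whether this correlation differs from zero, base R's cor.test() provides one; a minimal sketch (output omitted):

cor.test(auto$price, auto$mpg)   # Pearson correlation with t test and 95% confidence interval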

We draw a density plot to examine the distribution of price.

library(e1071)   # provides skewness()
plot(density(auto$price), main="Density Plot: price", ylab="Density",
     sub=paste("Skewness:", round(e1071::skewness(auto$price), 2)))
polygon(density(auto$price), col="blue")

[Figure: density plot of price, with its skewness shown in the subtitle]
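
The density plot suggests that price is right-skewed. As a quick check of whether a log transformation reduces this skewness (which is what motivates the log(price) model used below), we can compare the two skewness values; a minimal sketch (output omitted):

e1071::skewness(auto$price)        # skewness of raw price
e1071::skewness(log(auto$price))   # skewness after the log transformation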

Linear Regression

The correlation and the scatter plot show that price and mpg have a negative relationship: as mpg increases, price tends to decrease. To examine this relationship more closely, we fit a linear regression of price on mpg.

linearMod <- lm(price ~ mpg, data=auto)
summary(linearMod)
## 
## Call:
## lm(formula = price ~ mpg, data = auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3184.2 -1886.9  -959.8  1359.7  9669.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11253.06    1170.81   9.611 1.53e-14 ***
## mpg          -238.89      53.08  -4.501 2.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2624 on 72 degrees of freedom
## Multiple R-squared:  0.2196,	Adjusted R-squared:  0.2087 
## F-statistic: 20.26 on 1 and 72 DF,  p-value: 2.546e-05
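
To read the fit more directly, we can extract the coefficients, compute their confidence intervals, and overlay the fitted line on a scatter plot. A minimal sketch (output omitted):

coef(linearMod)                    # intercept and slope estimates
confint(linearMod, level = 0.95)   # 95% confidence intervals for both coefficients
plot(price ~ mpg, data = auto)     # scatter plot of price against mpg
abline(linearMod, col = "blue")    # add the fitted regression line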

Next, we replace price with log(price) and repeat the same steps.

auto$lp <- log(auto$price)
scatter.smooth(x=auto$mpg, y=auto$lp, main="log(price) ~ mpg",
     xlab="mpg", ylab="log(price)")

[Figure: scatter plot of log(price) against mpg with a smoothed fit]

linearMod <- lm(lp ~ mpg, data=auto)
summary(linearMod)
## 
## Call:
## lm(formula = lp ~ mpg, data = auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58486 -0.25525 -0.08595  0.22935  1.02393 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.349342   0.153489  60.912  < 2e-16 ***
## mpg         -0.033277   0.006958  -4.782 8.93e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.344 on 72 degrees of freedom
## Multiple R-squared:  0.2411,	Adjusted R-squared:  0.2305 
## F-statistic: 22.87 on 1 and 72 DF,  p-value: 8.93e-06
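
In this log-linear model the mpg coefficient is approximately the proportional change in price for a one-unit increase in mpg; the implied percentage change can be computed from the estimate. A minimal sketch (output omitted):

b <- coef(linearMod)["mpg"]   # estimated slope on mpg (about -0.033)
100 * (exp(b) - 1)            # implied percentage change in price per extra mpg (about -3.3%)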

Conclusion

As seen in the results above, the mpg coefficient is statistically significant in both models: the t statistics are large and the p-values are far below 0.05. However, simple linear regression is not enough to identify a causal effect; we will discuss that next time.
