University of Applied Sciences, Frankfurt
Department of Computer Science & Engineering
Md. Kabir Hosen
Idea: Form a model equation using multiple regression analysis on the observed data and the predicted values, and compute residual errors, scatter plots, descriptive statistics (mean, median, ratio) and correlation in R.
Questionnaire:
1. Respondent Name: ………………………………………………….
2. What is your age group?
   i.   Under 18
   ii.  18-26
   iii. 27-35
   iv.  36 and above
3. Living place: ………………………………………………………….
4. Gender
   i.   Male
   ii.  Female
   iii. Others ………………………………………………………
5. Occupation
   i.   Student
   ii.  Employee
   iii. Others
6. Your favorite fast food?
   i.   KFC
   ii.  McDonald's
   iii. Pizza Hut
   iv.  Burger King
   v.   Others ………………………………………………………
7. Price is
   i.   Cheap
   ii.  Average
   iii. Good
   iv.  Outstanding
8. Quality of service
   i.   Good
   ii.  Very good
   iii. Excellent
   iv.  Others ……………………………………………………..
9. Taste of food
   i.   Good
   ii.  Very good
   iii. Excellent
   iv.  Others ………………………………………………………
10. How many times do you go to your fast food restaurant per month?
   i.   1-2
   ii.  3-5
   iii. 6-10
   iv.  More than 10
11. For which of these reasons do you go to your restaurant?
   i.   Special occasion (birthday, holiday)
   ii.  Regular meal
   iii. Business lunch
   iv.  Just for the food
12. Overall satisfaction
   i.   Good
   ii.  Very good
   iii. Excellent
   iv.  Others ………………………………………………….
Response Variable:
The response variable is "Favorite Fast Food".

Prediction:
We are going to predict "Favorite Fast Food" from the customer feedback data, e.g. "Age", "Gender", "Occupation", "Price", "Quality of Service", "Taste of Food", "Monthly Restaurant Visit", "Reasons for Restaurant Visit" and "Satisfaction".
Aims of a Successful Guest Survey:
The survey will undertake to:
1. Measure overall customer satisfaction.
2. Learn about the customer.
3. Identify buying habits and dining patterns.
4. Find out why customers visit the restaurant.
5. Learn what influences guest purchase decisions.
6. Learn what guests believe you do well and not so well.
7. Discover what we can do to improve operations.
8. Identify processes for change that will improve customer satisfaction.
9. Learn how to increase customer loyalty.
10. Finally, determine which fast food we are going to launch.
“Favorite Fast Food Prediction with Live Data”
Introduction:
A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is now modeled as a function of several explanatory variables. The function lm can be used to perform multiple linear regression in R, and much of the syntax is the same as that used for fitting simple linear regression models. To perform multiple linear regression with p explanatory variables, use the command:

> lm(response ~ explanatory_1 + explanatory_2 + … + explanatory_p)
Here the terms response and explanatory_i
in the function should be replaced by the names of the response and
explanatory variables, respectively, used in the analysis.
Ex. Data was collected from 50 guests recently surveyed in Frankfurt. It consisted of the variables "Age", "Gender", "Occupation", "Fav_FastFood", "Price", "Q_Service", "Taste_Food", "Monthly_Visit", "Reasons_Visit" and "Satisfaction".
The following program reads in the data.

> data1 <- read.csv(file.choose(), header = T)    # Read data from the guest feedback CSV file
> data1
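As a quick check before modelling (a small addition, not in the original write-up), the imported data frame can be inspected with the usual base-R functions:

> head(data1)       # first few rows of the guest feedback data
> str(data1)        # variable types (the answers are coded as numbers)
> summary(data1)    # descriptive statistics: min, quartiles, mean and median per column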
Suppose we are only interested in working with a subset of the variables (e.g. "Fav_FastFood", "Price", and "Age"). It is possible (but not necessary) to construct a new data frame consisting solely of these values using the commands:

> myvars <- c('Fav_FastFood', 'Age', 'Price')
> Guestdata <- data1[myvars]
> names(Guestdata)
[1] "Fav_FastFood" "Age"          "Price"
> Guestdata
  Fav_FastFood Age Price
1            3   3     2
2            2   2     2
3            2   2     2
4            2   3     2
5            1   2     2
6            3   2     2
7            4   2     2
…                          (rows 8 to 50 omitted)
Before fitting our regression model we want to investigate how the variables are related to one another. We can do this graphically by constructing scatter plots of all pair-wise combinations of variables in the data frame. Since Guestdata was created above as a data frame, this can be done by typing:

> plot(Guestdata)    # pairwise scatter plots of Fav_FastFood, Age and Price
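Equivalently (a small sketch added here, not part of the original post), pairs() produces the same scatter-plot matrix, and cor() gives the pairwise correlations mentioned in the idea section:

> pairs(Guestdata)   # scatter plots of all pair-wise combinations of the three variables
> cor(Guestdata)     # pairwise correlation matrix for Fav_FastFood, Age and Price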
To fit a multiple linear regression model with "Fav_FastFood" as the response / dependent variable and "Age" and "Price" as the explanatory / independent variables, use the commands:

> attach(data1)      # make the columns of data1 accessible by name in the formulas below
> Guestdata <- lm(Fav_FastFood ~ Age + Price)
> Guestdata
Call:
lm(formula = Fav_FastFood ~ Age + Price)

Coefficients:
(Intercept)          Age        Price
     4.5163      -0.9469       0.2334
This output indicates that the fitted value is given by ŷ = 4.5163 − 0.9469·x₁ + 0.2334·x₂, where x₁ is Age and x₂ is Price.
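As a hedged illustration (not part of the original analysis), the same fitted value can be computed by hand from the stored coefficients, here for a guest with Age level 2 and Price level 2:

> b <- coef(Guestdata)                              # named vector: (Intercept), Age, Price
> b["(Intercept)"] + b["Age"] * 2 + b["Price"] * 2
# 4.5163 - 0.9469*2 + 0.2334*2 ≈ 3.0893, matching the predict() result shown later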
Inference in the multiple regression setting is typically performed in a number of steps. We begin by testing whether the explanatory variables collectively have an effect on the response variable, i.e.
H0: β1 = β2 = … = βp = 0
If we can reject this hypothesis, we continue by testing whether the individual regression coefficients are significant while controlling for the other variables in the model.
We can access the results of each test by typing:

> Guestdata <- lm(Fav_FastFood ~ Age + Price)    # reduced model
> summary(Guestdata)
Call:
lm(formula = Fav_FastFood ~ Age + Price)

Residuals:
    Min      1Q  Median      3Q     Max
-2.3226 -1.0892 -0.1158  1.4459  2.0910

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.5163     0.9586   4.711 2.22e-05 ***
Age          -0.9469     0.3353  -2.824  0.00694 **
Price         0.2334     0.2822   0.827  0.41237
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.373 on 47 degrees of freedom
Multiple R-squared:  0.1478,    Adjusted R-squared:  0.1115
F-statistic: 4.075 on 2 and 47 DF,  p-value: 0.02333
The output shows that F = 4.075 (p-value = 0.02333), so we reject the null hypothesis that all coefficients are zero: the explanatory variables collectively have an effect on Fav_FastFood. Looking at the individual coefficients, Age is significant (p = 0.00694) while Price is not (p = 0.41237), so Price appears to have no significant effect on the response variable. In addition, the output shows that R² = 0.1478 and adjusted R² = 0.1115.
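If desired, the overall F statistic and its p-value can also be extracted programmatically from the summary object (a small sketch, not in the original post):

> fs <- summary(Guestdata)$fstatistic            # F value, numerator df, denominator df
> pf(fs[1], fs[2], fs[3], lower.tail = FALSE)    # p-value of the overall F-test, ≈ 0.02333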
Testing a subset of variables using a partial F-test
Sometimes we are interested in simultaneously testing whether a certain subset of the coefficients are equal to 0 (e.g. β3 = β4 = 0). We can do this using a partial F-test. This test involves comparing the SSE from a reduced model (excluding the parameters we hypothesize are equal to zero) with the SSE from the full model (including all of the parameters).
In R we can perform partial F-tests by
fitting both the reduced and full models separately and thereafter comparing
them using the anova function. 
Ex. Suppose we include the variables "Age", "Price", "Gender", "Occupation", "Q_Service", "Taste_Food", "Monthly_Visit", "Reasons_Visit" & "Satisfaction" in our model and are interested in testing whether "Gender", "Occupation", "Q_Service", "Taste_Food", "Monthly_Visit", "Reasons_Visit" & "Satisfaction" are significant after taking "Age" and "Price" into consideration.
# Reduced model
> reduced <- lm(Fav_FastFood ~ Price + Age)
> reduced
Call:
lm(formula = Fav_FastFood ~ Price + Age)

Coefficients:
(Intercept)        Price          Age
     4.5163       0.2334      -0.9469
> summary(reduced)

Call:
lm(formula = Fav_FastFood ~ Age + Price)

Residuals:
    Min      1Q  Median      3Q     Max
-2.3226 -1.0892 -0.1158  1.4459  2.0910

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.5163     0.9586   4.711 2.22e-05 ***
Age          -0.9469     0.3353  -2.824  0.00694 **
Price         0.2334     0.2822   0.827  0.41237
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.373 on 47 degrees of freedom
Multiple R-squared:  0.1478,    Adjusted R-squared:  0.1115
F-statistic: 4.075 on 2 and 47 DF,  p-value: 0.02333
# Full model (data1 is already attached, so all columns are accessible by name)
> full <- lm(Fav_FastFood ~ Price + Age + Gender + Occupation + Q_Service + Taste_Food + Monthly_Visit + Reasons_Visit + Satisfaction)
> full
Call:
lm(formula = Fav_FastFood ~ Price + Age + Gender + Occupation +
    Q_Service + Taste_Food + Monthly_Visit + Reasons_Visit + Satisfaction)

Coefficients:
  (Intercept)          Price            Age         Gender     Occupation      Q_Service
      5.14578        0.17980       -1.10841       -0.24159       -0.57401       -0.13015
   Taste_Food  Monthly_Visit  Reasons_Visit   Satisfaction
      0.09593       -0.25246        0.22270        0.29820
> summary(full)

Call:
lm(formula = Fav_FastFood ~ Price + Age + Gender + Occupation +
    Q_Service + Taste_Food + Monthly_Visit + Reasons_Visit + Satisfaction)

Residuals:
     Min       1Q   Median       3Q      Max
-2.31351 -1.12243 -0.06685  0.87608  2.15450

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    5.14578    2.97431   1.730  0.09133 .
Price          0.17980    0.30538   0.589  0.55933
Age           -1.10841    0.38296  -2.894  0.00613 **
Gender        -0.24159    0.41801  -0.578  0.56654
Occupation    -0.57401    1.71982  -0.334  0.74030
Q_Service     -0.13015    0.24642  -0.528  0.60029
Taste_Food     0.09593    0.29604   0.324  0.74760
Monthly_Visit -0.25246    0.36233  -0.697  0.48997
Reasons_Visit  0.22270    0.27519   0.809  0.42314
Satisfaction   0.29820    0.29773   1.002  0.32257
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.437 on 40 degrees of freedom
Multiple R-squared:  0.2054,    Adjusted R-squared:  0.02659
F-statistic: 1.149 on 9 and 40 DF,  p-value: 0.3531
# Compare the models
> anova(reduced, full)

Analysis of Variance Table

Model 1: Fav_FastFood ~ Price + Age
Model 2: Fav_FastFood ~ Price + Age + Gender + Occupation + Q_Service +
    Taste_Food + Monthly_Visit + Reasons_Visit + Satisfaction
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     47 88.562
2     40 82.577  7    5.9849 0.4142 0.8878
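As a cross-check (an addition, not in the original post), the partial F statistic in the table can be reproduced by hand from the two residual sums of squares:

> sse_reduced <- sum(residuals(reduced)^2)            # 88.562
> sse_full    <- sum(residuals(full)^2)               # 82.577
> ((sse_reduced - sse_full) / 7) / (sse_full / 40)    # (5.9849 / 7) / 2.0644 ≈ 0.4142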
The output shows the results of the partial F-test. Since F = 0.4142 (p-value = 0.8878), we cannot reject the null hypothesis (β3 = β4 = … = β9 = 0) at the 5% level of significance. It appears that the variables "Gender", "Occupation", "Q_Service", "Taste_Food", "Monthly_Visit", "Reasons_Visit" & "Satisfaction" do not contribute significant additional information about the "Favorite Fast Food" once the variables "Age" and "Price" have been taken into consideration.
Confidence and Prediction Intervals
We often use our regression models to estimate the mean response or predict future values of the response variable for certain values of the explanatory variables. The function predict() can be used to make both confidence intervals for the mean response and prediction intervals. To make confidence intervals for the mean response use the option interval="confidence". To make a prediction interval use the option interval="prediction". By default this makes 95% confidence and prediction intervals. If you instead want to make a 99% confidence or prediction interval, use the option level=0.99.
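For example (a small sketch, not shown in the original post), a 99% prediction interval for the Age = 2, Price = 2 guest used in the examples below would be obtained with:

> predict(reduced, data.frame(Age = 2, Price = 2), interval = "prediction", level = 0.99)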
Ex. Obtain a 95% confidence interval for the mean Fav_FastFood of guests whose Age level is 2 and Price level is 2.

> reduced <- lm(Fav_FastFood ~ Price + Age)
> predict(reduced, data.frame(Age = 2, Price = 2), interval = "confidence")
      fit      lwr     upr
1 3.08924 2.615599 3.56288

A 95% confidence interval is given by (2.615599, 3.56288).
Ex. Obtain a 95% prediction interval for the Fav_FastFood of a guest whose Age level is 2 and Price level is 2.

> predict(reduced, data.frame(Age = 2, Price = 2), interval = "prediction")
      fit      lwr      upr
1 3.08924 0.287413 5.891067

A 95% prediction interval is given by (0.287413, 5.891067).
Note that this is quite a bit wider than the confidence interval, indicating that the variation about the mean is fairly large.
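The idea section also mentions residual errors and descriptive statistics; a minimal sketch of how these could be obtained for the reduced model (added for completeness, using only the objects defined above):

> res <- residuals(reduced)           # residual errors of the fitted model
> mean(res); median(res)              # descriptive statistics of the residuals
> plot(fitted(reduced), res)          # residuals vs fitted values scatter plot
> abline(h = 0, lty = 2)              # dashed reference line at zero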
Conclusion:
After considering all scenarios, we formulated our multiple regression model equation and observed that only "Age" (independent variable) has a significant impact on the choice of Favorite Fast Food (response variable).
More: Contact: kabircse115@gmail.com
 