Statistics Homework Solutions
Problem
#9643

Multiple Regression

q4data.xls could not uploaded but I copied the data into a Word doc data.doc for you.

Question 4 : The file Q4Data.xlscontains sales data (in thousands of $) for a medical supplies company that sells its products in three regions South (coded 1), West (coded 2) and Midwest (coded 3). Each region is divided into a number of sales territories with a total of 25 territories. Data are also given for advertising (in hundreds of $) and bonuses paid to the salespeople (in hundreds of $). Your task is to prepare for management an analysis showing how the sales vary with advertising and bonus amounts, and across the three sales regions.

a) As a first step, compute summary statistics (mean and five number summary) for sales separately for the three regions. Compare the sales for the three regions based on these statistics.
b) Make side-by-side box plots of sales for the three regions and compare. Are there any outlier sales territories?
c) Run a regression of sales on advertising, bonus and region. Explain why the 1-2-3 coding for region is not correct for fitting the model. What is the correct way to code the regions? Use the correct coding. Choose Midwest as the base region. Is the model a good fit to the data?
d) From the regression output, write separate equations for the three regions. For any fixed amounts spent on advertising and bonuses, which region has the highest sales and by how much compared to the other two regions?
e) Suppose we chose South as the base region instead of Midwest. Without actually doing the regression, tell what will be the new coefficients for Midwest and West. Explain your answer:

Attached file(s):
Attachments
MULTIPLE.doc  View File
data.doc  View File

Attachment Content Summary (Note: view attachment at the above link before purchasing. Actual attachment content may vary slightly from that shown below.)

MULTIPLE.doc
MULTIPLE REGRESSION HANDOUT

sets of observations on all variables we want to fit a model of the
form



), then the multiple regression model is:

.

represents the average difference in salary between males and females,
keeping all other variables fixed.

is say $1000 then it would mean that contribution to your salary is
$1000 if you are in Dept. 1, $2000 if you are in Dept. 2, $3000 if you
are in Dept. 3 and $4000 if you are in Dept. 4. This is certainly not
the case, especially since the labels 1, 2, 3, 4 given to the
departments are completely arbitrary. Note that the Dept is a nominal
variable and there is no implied order (certainly not linearly
increasing order) among the departments.

are say $1000, -$2000 and -$500, then we can say that, keeping
everything else the same (e.g., years of experience, education, gender,
etc.) the advertising salaries are on the average $1000 more than the
sales salaries, the engineering salaries are on the average $2000 less
than the sales salaries and the purchase salaries are on the average
$500 less than the sales salaries.

The p-values for the coefficients given in the output are measures of
statistical significance of the coefficients, the smaller the p-value
(closer to 0) the more significantly different the coefficient is from
zero. Nonsignificant coefficients (those with large p-values, generally
taken to be larger than .05 or .10) may be effectively set equal to 0.
Thus those independent variables may be dropped from the regression
equation. This is not a good method in general to decide which
independent variables to keep in the equation and which ones to drop.
There are better methods available, e.g., stepwise regression. However,
we will not have time to study these more advanced methods. If you want
to know about them for your project or otherwise, please talk to me.

Example 1 (Regression of Salary for Temco Data)

Regression Analysis: Salary versus YrsEm, PriorYr, Educ, Super

The regression equation is

Salary = 24496 + 655 YrsEm - 161 PriorYr + 1636 Educ + 179 Super

Predictor Coef SE Coef T P

Constant 24496 2049 11.95 0.000

YrsEm 655.3 139.6 4.69 0.000

PriorYr -161.1 231.5 -0.70 0.490

Educ 1636.4 406.1 4.03 0.000

Super 178.80 97.59 1.83 0.074

S = 5643 R-Sq = 76.0% R-Sq(adj) = 73.7%

Analysis of Variance

Source DF SS MS F P

Regression 4 4138631127 1034657782 32.49 0.000

Residual Error 41 1305620276 31844397

Total 45 5444251403

Source DF Seq SS

YrsEm 1 3187557844

PriorYr 1 160191

Educ 1 844012874

Super 1 106900218

Unusual Observations

Obs YrsEm Salary Fit SE Fit Residual St
Resid

1 18.0 38985 50785 1745 -11800
-2.20R

2 15.0 32920 49285 1700 -16365
-3.04R

40 22.0 69246 62839 3535 6407
1.46 X

41 27.0 65487 69693 3779 -4206
-1.00 X

42 6.0 48695 45610 4249 3085
0.83 X

R denotes an observation with a large standardized residual

X denotes an observation whose X value gives it large influence.

Since PriorYr and Super are statistically nonsignificant (have p-values
=0.490 and 0.074) let us drop them from the equation, and refit the
model.

Regression Analysis

The regression equation is

Salary = 23177 + 672 YrsEm + 1916 Educ

Predictor Coef StDev T P

Constant 23177 1770 13.10 0.000

YrsEm 672.3 141.7 4.75 0.000

Educ 1916.5 379.3 5.05 0.000

S = 5738 R-Sq = 74.0% R-Sq(adj) = 72.8%

Analysis of Variance

Source DF SS MS F P

Regression 2 4028348130 2014174065 61.17 0.000

Residual Error 43 1415903273 32927983

Total 45 5444251403

Source DF Seq SS

YrsEm 1 3187557844

Educ 1 840790286

Unusual Observations

Obs YrsEm Salary Fit StDev Fit Residual St
Resid

1 18.0 38985 52528 1468 -13543
-2.44R

2 15.0 32920 50511 1477 -17591
-3.17R

40 22.0 69246 57134 1787 12112
2.22R

R denotes an observation with a large standardized residual

Make Indicator Variables option we code the Gender variable to 2 dummy
(0-1) variables: Female in column C9 and Male in column C10. Similarly,
we code the Dept variable to four dummy variables for the four
departments. However, as stated at the beginning, when we do regression
we must include one less dummy variable than those just created. For
gender we use the Female dummy (thus comparing Females with Males) and
for Dept. we use the dummies for Purchase, Advertising and Engineering
(thus comparing these three depts. with Sales). The regression output is
as follows:

Regression Analysis: Salary versus YrsEm, Educ, ...

The regression equation is

Salary = 16286 + 722 YrsEm + 1750 Educ + 1935 Female + 5234 Advertse

+ 8561 Engg + 8636 Purchase

Predictor Coef SE Coef T P

Constant 16286 2050 7.94 0.000

YrsEm 722.4 121.4 5.95 0.000

Educ 1750.2 314.2 5.57 0.000

Female 1935 1427 1.36 0.183

Advertse 5234 2322 2.25 0.030

Engg 8561 1806 4.74 0.000

Purchase 8636 2290 3.77 0.001

S = 4682 R-Sq = 84.3% R-Sq(adj) = 81.9%

Analysis of Variance

Source DF SS MS F P

Regression 6 4589320575 764886762 34.89 0.000

Residual Error 39 854930828 21921303

Total 45 5444251403

Source DF Seq SS

YrsEm 1 3187557844

Educ 1 840790286

Female 1 18019710

Advertse 1 5116152

Engg 1 226193685

Purchase 1 311642897

Unusual Observations

Obs YrsEm Salary Fit SE Fit Residual St
Resid

2 15.0 32920 42875 2069 -9955
-2.37R

40 22.0 69246 60178 1656 9068
2.07R

R denotes an observation with a large standardized residual

Notice that there is not a statistically significant difference between
Males and Females (p = 0.183). Also, all three departments differ
significantly from Sales.

Example 2 (P/E Ratio of Stocks)

Here is some data on 19 company stocks. We want to develop a model for
P/E Ratio as a function of Profit, Growth and Type of Industry.

Company

Profit Growth Industry P/E Ratio

Exxon

6.5 10 1 11.3

Chevron

7 5 1 10

Texaco

3.9 5 1 9.9

Mobil

4.3 7 1 9.7

Amoco

9.8 8 1 10

Pfizer

14.7 12 2 11.9

Bristol Meyers

13.9 14 2 16.2

Merck

20.3 16 2 21

American Home Products 16.9 11 2 13.3

Abbott Laboratories

15.2 18 2 15.5

Eli Lilly

18.7 11 2 18.9

Upjohn

12.8 10 2 14.6

Warner-Lambert

8.7 7 2 16

Amdahl

11.9 4 3 8.4

Digital

9.8 19 3 10.4

Hewlett-Packard

8.1 18 3 14.8

NCR

7.3 6 3 10.1

Unisys

6.9 6 3 7

IBM

9.2 6 3 11.8

Regression Analysis: PERATIO versus PROFIT, GROWTH, Oil, Drug

The regression equation is

PERATIO = 6.70 + 0.183 PROFIT + 0.213 GROWTH + 0.84 Oil + 3.82 Drug

Predictor Coef SE Coef T P

Constant 6.704 2.178 3.08 0.008

PROFIT 0.1827 0.2092 0.87 0.397

GROWTH 0.2128 0.1311 1.62 0.127

Oil 0.835 1.509 0.55 0.589

Drug 3.819 1.779 2.15 0.050

S = 2.309 R-Sq = 69.5% R-Sq(adj) = 60.8%

Analysis of Variance

Source DF SS MS F P

Regression 4 169.887 42.472 7.97 0.001

Residual Error 14 74.650 5.332

Total 18 244.537

Source DF Seq SS

PROFIT 1 131.398

GROWTH 1 13.194

Oil 1 0.724

Drug 1 24.570

The only significant variable is Drug (dummy variable for drug stocks)
which means that the drug stocks differ significantly from computer
stocks.
data.doc
SALES ADV BONUS REGION

963.5 374.27 230.98 1

893 408.5 236.28 1

1057.25 414.31 271.57 1

1183.25 448.42 291.2 2

1419.5 517.88 282.17 3

1547.75 637.6 321.16 3

1580 635.72 294.32 3

1071.5 446.86 305.69 1

1078.25 489.59 238.41 1

1122.5 500.56 271.38 2

1304.75 484.18 332.64 3

1552.25 618.07 261.8 3

1040 453.39 235.63 1

1045.25 440.86 249.68 2

1102.25 487.79 232.99 2

1225.25 537.67 272.2 2

1508 612.21 266.64 3

1564.25 601.46 277.44 3

1634.75 585.1 312.35 3

1159.25 524.56 292.87 1

1202.75 535.17 268.27 2

1294.25 486.03 309.85 2

1467.5 540.17 291.03 3

1583.75 583.85 289.29 3

1124.75 499.15 272.55 2



Solution Summary

The solution answers the question(s) below.

Solution
What is this?
By OTA - Overall OTA Rating
Purchase Cost Now
$2.19 CAD (was ~$59.85)
Included in Download
  • Plain text response
  • Attached file(s):
    • multiple regress.doc
$2.19 Instant Download
Add to Cart
Why you can trust BrainMass.com
  • Your Information is Secure
  • Best Online Academic Help Service
  • Students find real academic Success
Related Solutions
Browse