Principal Component Analysis and Regression

TQ4: Principal Component Analysis and Regression
NOTE: Use MATLAB, MINITAB, or SAS, and include all your code.
The attached table (see Excel file) contains performance and success statistics for LPGA golfers in 2009. The matrix X contains 11 predictor variables:
1. Average drive (yards)
2. Percent of fairways hit
3. Percent of greens reached in regulation
4. Average putts per round
5. Percent of sand saves (2 shots to hole)
6. Tournaments played in
7. Green in regulation putts per hole
8. Completed tournaments
9. Average percentile in tournaments (high is good)
10. Rounds completed
11. Average strokes per round
The column vector y contains the output variable, prize winnings ($1000s). For each variable in X and y.
Explore the input space using PCA. Then use the principal components to predict prize winnings ($1000s) with linear regression (principal component regression, PCR).
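
The PCA-then-PCR workflow can be sketched as follows. This is only an illustrative Python sketch using the standard library, not the MINITAB procedure the solution uses, and every function name in it is hypothetical. It standardizes the inputs with training statistics, so the PCA is an eigen-decomposition of the training correlation matrix and the eigenvalues sum to the number of predictors (11 here). The eigen-decomposition uses a hand-written Jacobi routine rather than a library call (an assumption made to keep the block dependency-free). It then projects rows onto a chosen set of PCs and fits least squares on the scores.

```python
import math


def standardize(X, means=None, stds=None):
    """Z-score each column of X (a list of rows).

    With no means/stds given, they are estimated from X itself (use this on the
    training rows); pass the training values back in for test/validation rows.
    """
    n, p = len(X), len(X[0])
    if means is None or stds is None:
        means = [sum(row[j] for row in X) / n for j in range(p)]
        stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
                for j in range(p)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    return Z, means, stds


def jacobi_eigen(A, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue, where
    vectors[k] is the unit eigenvector paired with values[k].
    """
    m = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(m) for j in range(m) if i != j) < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):  # column update: M <- M P
                    for r in range(m):
                        mp, mq = M[r][p], M[r][q]
                        M[r][p], M[r][q] = c * mp - s * mq, s * mp + c * mq
                for r in range(m):  # row update: A <- P^T A
                    ap, aq = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(m), key=lambda k: A[k][k], reverse=True)
    return [A[k][k] for k in order], [[V[r][k] for r in range(m)] for k in order]


def pca_fit(X_train):
    """PCA of the standardized training inputs, i.e. of their correlation matrix."""
    Z, means, stds = standardize(X_train)
    n, p = len(Z), len(Z[0])
    R = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(p)] for i in range(p)]
    eigvals, loadings = jacobi_eigen(R)
    return {"means": means, "stds": stds, "eigvals": eigvals, "loadings": loadings}


def pc_scores(model, X, components):
    """Scores of the rows of X on the chosen PCs (0-based indices), using training scaling."""
    Z, _, _ = standardize(X, model["means"], model["stds"])
    vecs = [model["loadings"][k] for k in components]
    return [[sum(zj * vj for zj, vj in zip(z, v)) for v in vecs] for z in Z]


def pcr_fit(T, y):
    """Least squares of y on *training* PC scores T, with an intercept.

    Training scores are zero-mean and mutually orthogonal, so the normal
    equations are diagonal: intercept = mean(y), slope_k = <t_k, y - ybar> / <t_k, t_k>.
    """
    y_mean = sum(y) / len(y)
    coefs = []
    for k in range(len(T[0])):
        tk = [row[k] for row in T]
        ss = sum(v * v for v in tk)
        coefs.append(sum(a * (b - y_mean) for a, b in zip(tk, y)) / ss if ss > 1e-12 else 0.0)
    return y_mean, coefs


def pcr_predict(intercept, coefs, T):
    """Predictions from a fitted PCR model for any rows' PC scores T."""
    return [intercept + sum(b * t for b, t in zip(coefs, row)) for row in T]
```

Typical use: `model = pca_fit(X_train)`, inspect `model["eigvals"]` (scree values) and `model["loadings"]` (variable weights per PC), then `T = pc_scores(model, X_train, [0, 1, 2])`, `b0, b = pcr_fit(T, y_train)`, and score test rows with `pcr_predict(b0, b, pc_scores(model, X_test, [0, 1, 2]))`. The diagonal shortcut in `pcr_fit` holds only for the training scores, so test and validation scores should only ever be passed to `pcr_predict`.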
1. Divide the data into training, test, and validation data sets. Use the same training, test, and validation data sets that you used in TQ3.
2. Perform PCA of the standardized input training data. Analyze the results of the PCA. What can you say about the true dimension of the input space? What do the PCA loadings tell you about relationships between the variables?

3. Develop several competing regression models using the PC scores of the data (PCR). First, select enough PCs to explain at least 95% of the information in the input space. Then, choose the significant PCs based on eigenvalues. Finally, select the PCs most useful for predicting prize winnings ($1000s). You should end up with at least three PCR models.
4. Compare the performance of the PCR models using the root mean squared error (RMSE) of the test data set. Select the best model. Explain why it is the best.

5. Find the validation error of your best model.

6. Compare the validation performance of your best PCR model with that of your best regression model from TQ3.
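
The selection and comparison steps in items 1 to 5 reduce to a few small computations, sketched below in Python (standard library only; all function names are hypothetical, and this is not the MINITAB procedure the solution uses). Several readings are assumptions, flagged again in the code comments: the 60/20/20 shuffle with a fixed seed is only a placeholder, since the TQ3 split itself is not given here and matching it means reusing the exact TQ3 rows; "significant PCs based on eigenvalues" is read as the Kaiser rule (eigenvalue > 1 on a correlation matrix), a common but not the only choice; and "most useful for predicting" is read as ranking PCs by the absolute correlation of their training scores with prize winnings.

```python
import math
import random


def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=2009):
    """Shuffle row indices once with a fixed seed and cut them into
    training, test, and validation index lists.

    ASSUMPTION: proportions and seed are placeholders for the TQ3 split.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def n_components_for_variance(eigvals, threshold=0.95):
    """Smallest k such that the first k PCs explain at least `threshold` of the
    total variance (eigvals assumed sorted in decreasing order)."""
    total = sum(eigvals)
    cumulative = 0.0
    for k, lam in enumerate(eigvals, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(eigvals)


def kaiser_components(eigvals):
    """ASSUMED reading of 'significant PCs': indices with eigenvalue > 1 (Kaiser rule)."""
    return [k for k, lam in enumerate(eigvals) if lam > 1.0]


def rank_by_correlation(T, y):
    """ASSUMED usefulness criterion: PC indices ordered by |correlation|
    between their training scores (columns of T) and y."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    syy = sum(v * v for v in yc)
    strength = []
    for k in range(len(T[0])):
        col = [row[k] for row in T]
        c_mean = sum(col) / n
        tc = [v - c_mean for v in col]
        stt = sum(v * v for v in tc)
        r = sum(a * b for a, b in zip(tc, yc)) / math.sqrt(stt * syy) if stt > 0 and syy > 0 else 0.0
        strength.append(abs(r))
    return sorted(range(len(strength)), key=lambda k: strength[k], reverse=True)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def best_model(test_rmse):
    """Name of the candidate with the lowest test RMSE (dict: name -> RMSE)."""
    return min(test_rmse, key=test_rmse.get)
```

One way to obtain the three required models: the first `n_components_for_variance(eigvals)` PCs, the `kaiser_components(eigvals)` set, and the top few PCs from `rank_by_correlation`. Fit each on the training rows, compute `rmse` on the test rows, pick the winner with `best_model`, and evaluate that single model's `rmse` on the validation rows once, as the validation error asked for in item 5.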

Solution Summary

The solution uses MINITAB to analyze the LPGA golfer data provided.

Solution provided by:
Education
  • BSc, University of Bucharest
  • MSc, Ovidius
  • MSc, Stony Brook
  • PhD (IP), Stony Brook