Applied Multivariate Statistical Analysis
RICHARD A. JOHNSON
University of Wisconsin-Madison
DEAN W. WICHERN
Texas A&M University
Contents
1 ASPECTS OF MULTIVARIATE ANALYSIS 1
1.1 Introduction 1
1.2 Applications of Multivariate Techniques 3
1.3 The Organization of Data 5
Arrays, 5
Descriptive Statistics, 6
Graphical Techniques, 11
1.4 Data Displays and Pictorial Representations 19
Linking Multiple Two-Dimensional Scatter Plots, 20
Graphs of Growth Curves, 24
Stars, 25
Chernoff Faces, 28
1.5 Distance 30
1.6 Final Comments 38
Exercises 38
References 48
2 MATRIX ALGEBRA AND RANDOM VECTORS 50
2.1 Introduction 50
2.2 Some Basics of Matrix and Vector Algebra 50
Vectors, 50
Matrices, 55
2.3 Positive Definite Matrices 61
2.4 A Square-Root Matrix 66
2.5 Random Vectors and Matrices 67
2.6 Mean Vectors and Covariance Matrices 68
Partitioning the Covariance Matrix, 74
The Mean Vector and Covariance Matrix
for Linear Combinations of Random Variables, 76
Partitioning the Sample Mean Vector
and Covariance Matrix, 78
2.7 Matrix Inequalities and Maximization 79
Supplement 2A: Vectors and Matrices: Basic Concepts 84
Vectors, 84
Matrices, 89
Exercises 104
References 111
3 SAMPLE GEOMETRY AND RANDOM SAMPLING 112
3.1 Introduction 112
3.2 The Geometry of the Sample 112
3.3 Random Samples and the Expected Values of the Sample Mean and
Covariance Matrix 120
3.4 Generalized Variance 124
Situations in which the Generalized Sample Variance Is Zero, 130
Generalized Variance Determined by |R|
and Its Geometrical Interpretation, 136
Another Generalization of Variance, 138
3.5 Sample Mean, Covariance, and Correlation
as Matrix Operations 139
3.6 Sample Values of Linear Combinations of Variables 141
Exercises 145
References 148
4 THE MULTIVARIATE NORMAL DISTRIBUTION 149
4.1 Introduction 149
4.2 The Multivariate Normal Density and Its Properties 149
Additional Properties of the Multivariate
Normal Distribution, 156
4.3 Sampling from a Multivariate Normal Distribution
and Maximum Likelihood Estimation 168
The Multivariate Normal Likelihood, 168
Maximum Likelihood Estimation of μ and Σ, 170
Sufficient Statistics, 173
4.4 The Sampling Distribution of X̄ and S 173
Properties of the Wishart Distribution, 174
4.5 Large-Sample Behavior of X̄ and S 175
4.6 Assessing the Assumption of Normality 177
Evaluating the Normality of the Univariate Marginal Distributions, 178
Evaluating Bivariate Normality, 183
4.7 Detecting Outliers and Cleaning Data 189
Steps for Detecting Outliers, 190
4.8 Transformations to Near Normality 194
Transforming Multivariate Observations, 198
Exercises 202
References 209
5 INFERENCES ABOUT A MEAN VECTOR 210
5.1 Introduction 210
5.2 The Plausibility of μ₀ as a Value for a Normal
Population Mean 210
5.3 Hotelling's T² and Likelihood Ratio Tests 216
General Likelihood Ratio Method, 219
5.4 Confidence Regions and Simultaneous Comparisons
of Component Means 220
Simultaneous Confidence Statements, 223
A Comparison of Simultaneous Confidence Intervals
with One-at-a-Time Intervals, 229
The Bonferroni Method of Multiple Comparisons, 232
5.5 Large Sample Inferences about a Population Mean Vector 234
5.6 Multivariate Quality Control Charts 239
Charts for Monitoring a Sample of Individual Multivariate Observations
for Stability, 241
Control Regions for Future Individual Observations, 247
Control Ellipse for Future Observations, 248
T²-Chart for Future Observations, 248
Control Charts Based on Subsample Means, 249
Control Regions for Future Subsample Observations, 251
5.7 Inferences about Mean Vectors
when Some Observations Are Missing 252
5.8 Difficulties Due to Time Dependence
in Multivariate Observations 256
Supplement 5A: Simultaneous Confidence Intervals and Ellipses
as Shadows of the p-Dimensional Ellipsoids 258
Exercises 260
References 270
6 COMPARISONS OF SEVERAL MULTIVARIATE MEANS 272
6.1 Introduction 272
6.2 Paired Comparisons and a Repeated Measures Design 272
Paired Comparisons, 272
A Repeated Measures Design for Comparing Treatments, 278
6.3 Comparing Mean Vectors from Two Populations 283
Assumptions Concerning the Structure of the Data, 283
Further Assumptions when n1 and n2 Are Small, 284
Simultaneous Confidence Intervals, 287
The Two-Sample Situation When Σ₁ ≠ Σ₂, 290
6.4 Comparing Several Multivariate Population Means
(One-Way MANOVA) 293
Assumptions about the Structure of the Data for One-Way MANOVA, 293
A Summary of Univariate ANOVA, 293
Multivariate Analysis of Variance (MANOVA), 298
6.5 Simultaneous Confidence Intervals for Treatment Effects 305
6.6 Two-Way Multivariate Analysis of Variance 307
Univariate Two-Way Fixed-Effects Model with Interaction, 307
Multivariate Two-Way Fixed-Effects Model with Interaction, 309
6.7 Profile Analysis 318
6.8 Repeated Measures Designs and Growth Curves 323
6.9 Perspectives and a Strategy for Analyzing
Multivariate Models 327
Exercises 332
References 352
7 MULTIVARIATE LINEAR REGRESSION MODELS 354
7.1 Introduction 354
7.2 The Classical Linear Regression Model 354
7.3 Least Squares Estimation 358
Sum-of-Squares Decomposition, 360
Geometry of Least Squares, 361
Sampling Properties of Classical Least Squares Estimators, 363
7.4 Inferences About the Regression Model 365
Inferences Concerning the Regression Parameters, 365
Likelihood Ratio Tests for the Regression Parameters, 370
7.5 Inferences from the Estimated Regression Function 374
Estimating the Regression Function at z0, 374
Forecasting a New Observation at z0, 375
7.6 Model Checking and Other Aspects of Regression 377
Does the Model Fit?, 377
Leverage and Influence, 380
Additional Problems in Linear Regression, 380
7.7 Multivariate Multiple Regression 383
Likelihood Ratio Tests for Regression Parameters, 392
Other Multivariate Test Statistics, 395
Predictions from Multivariate Multiple Regressions, 395
7.8 The Concept of Linear Regression 398
Prediction of Several Variables, 403
Partial Correlation Coefficient, 406
7.9 Comparing the Two Formulations of the Regression Model 407
Mean Corrected Form of the Regression Model, 407
Relating the Formulations, 409
7.10 Multiple Regression Models with Time Dependent Errors 410
Supplement 7A: The Distribution of the Likelihood Ratio
for the Multivariate Multiple Regression Model 415
Exercises 417
References 424
8 PRINCIPAL COMPONENTS 426
8.1 Introduction 426
8.2 Population Principal Components 426
Principal Components Obtained from Standardized Variables, 432
Principal Components for Covariance Matrices
with Special Structures, 435
8.3 Summarizing Sample Variation by Principal Components 437
The Number of Principal Components, 440
Interpretation of the Sample Principal Components, 444
Standardizing the Sample Principal Components, 445
8.4 Graphing the Principal Components 450
8.5 Large Sample Inferences 452
Large Sample Properties of λ̂ᵢ and êᵢ, 452
Testing for the Equal Correlation Structure, 453
8.6 Monitoring Quality with Principal Components 455
Checking a Given Set of Measurements for Stability, 455
Controlling Future Values, 459
Supplement 8A: The Geometry of the Sample Principal
Component Approximation 462
The p-Dimensional Geometrical Interpretation, 464
The n-Dimensional Geometrical Interpretation, 465
Exercises 466
References 475
9 FACTOR ANALYSIS AND INFERENCE
FOR STRUCTURED COVARIANCE MATRICES 477
9.1 Introduction 477
9.2 The Orthogonal Factor Model 478
9.3 Methods of Estimation 484
The Principal Component (and Principal Factor) Method, 484
A Modified Approach: The Principal Factor Solution, 490
The Maximum Likelihood Method, 492
A Large Sample Test for the Number of Common Factors, 498
9.4 Factor Rotation 501
Oblique Rotations, 509
9.5 Factor Scores 510
The Weighted Least Squares Method, 511
The Regression Method, 513
9.6 Perspectives and a Strategy for Factor Analysis 517
9.7 Structural Equation Models 524
The LISREL Model, 525
Construction of a Path Diagram, 525
Covariance Structure, 526
Estimation, 527
Model-Fitting Strategy, 529
Supplement 9A: Some Computational Details
for Maximum Likelihood Estimation 530
Recommended Computational Scheme, 531
Maximum Likelihood Estimators of ρ = L_z L_z′ + Ψ_z, 532
Exercises 533
References 541
10 CANONICAL CORRELATION ANALYSIS 543
10.1 Introduction 543
10.2 Canonical Variates and Canonical Correlations 543
10.3 Interpreting the Population Canonical Variables 551
Identifying the Canonical Variables, 551
Canonical Correlations as Generalizations
of Other Correlation Coefficients, 553
The First r Canonical Variables as a Summary of Variability, 554
A Geometrical Interpretation of the Population Canonical
Correlation Analysis, 555
10.4 The Sample Canonical Variates and Sample
Canonical Correlations 556
10.5 Additional Sample Descriptive Measures 564
Matrices of Errors of Approximations, 564
Proportions of Explained Sample Variance, 567
10.6 Large Sample Inferences 569
Exercises 573
References 580
11 DISCRIMINATION AND CLASSIFICATION 581
11.1 Introduction 581
11.2 Separation and Classification for Two Populations 582
11.3 Classification with Two Multivariate Normal Populations 590
Classification of Normal Populations When Σ₁ = Σ₂ = Σ, 590
Scaling, 595
Classification of Normal Populations When Σ₁ ≠ Σ₂, 596
11.4 Evaluating Classification Functions 598
11.5 Fisher's Discriminant Function: Separation of Populations 609
11.6 Classification with Several Populations 612
The Minimum Expected Cost of Misclassification Method, 613
Classification with Normal Populations, 616
11.7 Fisher's Method for Discriminating
among Several Populations 628
Using Fisher's Discriminants to Classify Objects, 635
11.8 Final Comments 641
Including Qualitative Variables, 641
Classification Trees, 641
Neural Networks, 644
Selection of Variables, 645
Testing for Group Differences, 645
Graphics, 646
Practical Considerations Regarding Multivariate Normality, 646
Exercises 647
References 666
12 CLUSTERING, DISTANCE METHODS, AND ORDINATION 668
12.1 Introduction 668
12.2 Similarity Measures 670
Distances and Similarity Coefficients for Pairs of Items, 670
Similarities and Association Measures
for Pairs of Variables, 676
Concluding Comments on Similarity, 677
12.3 Hierarchical Clustering Methods 679
Single Linkage, 681
Complete Linkage, 685
Average Linkage, 689
Ward's Hierarchical Clustering Method, 690
Final Comments: Hierarchical Procedures, 693
12.4 Nonhierarchical Clustering Methods 694
K-means Method, 694
Final Comments: Nonhierarchical Procedures, 698
12.5 Multidimensional Scaling 700
The Basic Algorithm, 700
12.6 Correspondence Analysis 709
Algebraic Development of Correspondence Analysis, 711
Inertia, 718
Interpretation in Two Dimensions, 719
Final Comments, 719
12.7 Biplots for Viewing Sampling Units and Variables 719
Constructing Biplots, 720
12.8 Procrustes Analysis: A Method
for Comparing Configurations 723
Constructing the Procrustes Measure of Agreement, 724
Supplement 12A: Data Mining 731
Introduction, 731
The Data Mining Process, 732
Model Assessment, 733
Exercises 738
References 745
APPENDIX 748
DATA INDEX 758
SUBJECT INDEX 761