The correspondence analysis of a two-way contingency table is now accepted as a very versatile tool for helping users to understand the structure of the association in their data. In cases where the variables consist of ordered categories, there are a number of approaches that can be employed and these generally involve an adaptation of singular value decomposition. Over the last few years, an alternative decomposition method has been used for cases where the row and column variables of a two-way contingency table have an ordinal structure. A version of this approach is also available for a two-way table where one variable has a nominal structure and the other variable has an ordinal structure. However, such an approach does not take into consideration the presence of the nominal variable. This paper explores an approach to correspondence analysis using an amalgamation of singular value decomposition and bivariate moment decomposition. A benefit of this technique is that it combines the classical technique with the ordinal analysis by determining the structure of the variables in terms of singular values and location, dispersion and higher-order moments.

The analysis of categorical data is a very important
component in statistics, and the presence of ordered variables is a common
feature. Models and measures of association for ordinal categorical variables
have been extensively discussed in the literature, and are the subject of
classic texts including Agresti [

The visual description of the association between two
or more variables is a vital tool for the analyst since it can often provide a
more intuitive view of the nature of the association, or interaction, between
categorical variables than numerical summaries alone. One such tool is
correspondence analysis. However, except in a few cases ([

The correspondence analysis approach described here,
referred to as singly ordered correspondence analysis, is shown to be
mathematically similar to the doubly ordered approach. The singly ordered and
doubly ordered approaches share many of the features that make the classical
approach popular. Details of classical correspondence analysis can be found by
referring to, for example, Beh [

This paper is divided into seven further sections.
Section

Consider a two-way contingency table

For the

A more formal approach to determine whether there
exists an association between the row and column categories involves
decomposing the matrix of Pearson ratios,

Classically, correspondence analysis involves
decomposing the matrix of Pearson ratios using singular value decomposition
(SVD) so that

For the decomposition of (

Suppose we omit the trivial column vector from

The SVD of these contingencies leads to the Pearson
chi-squared statistic being expressed in terms of the sum of squares of the
singular values such that

When a two-way contingency table consists of at least
one ordered variable, the ordinal structure of the variable needs to be taken
into consideration. Over the past few decades, there have been a number of
correspondence analysis procedures developed that take into account the ordinal
structure of the variables; see, for example, [

For a doubly ordered two-way contingency table, the
correspondence analysis approach of Beh [

For the decomposition of (

The matrix

By considering the BMD (

Another type of decomposition, and one that was
briefly discussed by Beh [

If one considers the decomposition of the matrix of
Pearson contingencies using the hybrid decomposition of (

The effect of the column location component on the
two-way association in the contingency table is measured by

The first-order row location component on the two-way
association in the contingency table is measured by

Partitions of other measures of association using
orthogonal polynomials have also been considered. D'Ambra et al. [

One system of coordinates that could be used to
visualize the association between the row and column categories is to plot
along the

However, standard coordinates imply that each axis is given an equal weight of 1. Thus, while differences within the row or column variable can be described by the differences between points, the plot will not graphically depict the association between the rows and columns. Therefore, alternative plotting systems should be considered.

Analogous to the derivation of profile coordinates in
Beh [

The relationship between the row (and column) profile
coordinates and the Pearson chi-squared statistic can be shown to be

Consider the

The squared Euclidean distance between two row profile
coordinates,

By considering the definition of the row profile
coordinates given by (

Therefore, if two row categories have similar profiles, their positions in the correspondence plot will be very similar. Conversely, if two row categories have different profiles, their coordinates will lie at a distance from one another in the correspondence plot.
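The link between plotted distances and profile differences can be checked numerically. The following is a minimal sketch, assuming the classical profile (principal) coordinates obtained from the SVD and using the analgesic drug counts analysed later in this paper; the names `F`, `profiles`, `coord_dist2`, and `chi2_dist2` are illustrative and do not come from the original text.

```python
import numpy as np

# Analgesic drug counts: rows are Drugs A-D, columns are Poor ... Excellent.
N = np.array([
    [5, 1, 10, 8, 6],
    [5, 3, 3, 8, 12],
    [10, 6, 12, 3, 0],
    [7, 12, 8, 1, 1],
], dtype=float)

P = N / N.sum()                 # correspondence matrix
r = P.sum(axis=1)               # row masses
c = P.sum(axis=0)               # column masses

# Standardized residuals and their SVD (classical approach, assumed here).
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Row profile (principal) coordinates over all axes.
F = (U / np.sqrt(r)[:, None]) * sv

# Row profiles: each row of P divided by its mass.
profiles = P / r[:, None]

def coord_dist2(i, k):
    """Squared Euclidean distance between two rows in the full plot."""
    return ((F[i] - F[k]) ** 2).sum()

def chi2_dist2(i, k):
    """Chi-squared distance between two row profiles."""
    return (((profiles[i] - profiles[k]) ** 2) / c).sum()

labels = ["Drug A", "Drug B", "Drug C", "Drug D"]
for i in range(len(labels)):
    for k in range(i + 1, len(labels)):
        print(labels[i], labels[k],
              round(coord_dist2(i, k), 5), round(chi2_dist2(i, k), 5))
```

Each printed pair of values should agree, since the distance is computed over all axes; in a two-dimensional plot the agreement is only approximate.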

Similarly, the squared Euclidean distance between two
column profiles,

These results verify the property of distributional
equivalence as stated by Lebart et al. [

If two rows having identical profiles are aggregated, then the distance between them remains unchanged.

If two columns having identical distribution profiles are aggregated, then the distance between them remains unchanged.
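Distributional equivalence can be illustrated with a small numerical check. The sketch below uses a hypothetical table, not data from this paper, in which the second row is exactly twice the first, so the two rows share a profile. It merges those rows and compares the chi-squared distances between column profiles before and after; the function name `column_chi2_distances` is an assumption made for the example.

```python
import numpy as np

def column_chi2_distances(N):
    """Matrix of squared chi-squared distances between column profiles."""
    P = N / N.sum()
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    col_profiles = P / c                    # column j holds p_ij / c_j
    # diff[i, j, k] = profile of column j minus profile of column k, at row i
    diff = col_profiles[:, :, None] - col_profiles[:, None, :]
    return ((diff ** 2) / r[:, None, None]).sum(axis=0)

# Hypothetical counts: row 2 is twice row 1, so their profiles are identical.
N = np.array([
    [4, 2, 6],
    [8, 4, 12],
    [3, 9, 1],
    [5, 5, 7],
], dtype=float)

# Aggregate the two rows with identical profiles into a single row.
merged = np.vstack([N[0] + N[1], N[2:]])

before = column_chi2_distances(N)
after = column_chi2_distances(merged)
print(np.allclose(before, after))
```

The comparison should report that the two distance matrices agree, which is the sense in which aggregation leaves the geometry of the other variable unchanged.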

The interpretation of the distance between a
particular row profile coordinate and a column profile coordinate is a
contentious one and an issue that will not be described here, although a brief
account is given by Beh [

For the classical approach to correspondence analysis, transition formulae allow for the profile coordinates of one variable to be calculated when the profile coordinates of a second variable are known.
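For the classical case, the transition formulae can be verified directly. The sketch below is an illustration under stated assumptions: it uses the analgesic drug counts analysed later in this paper, computes both sets of profile coordinates from the SVD, and then recovers each set from the other. The variable names (`F`, `G`, `F_from_G`, `G_from_F`) are illustrative only.

```python
import numpy as np

# Analgesic drug counts: rows are Drugs A-D, columns are Poor ... Excellent.
N = np.array([
    [5, 1, 10, 8, 6],
    [5, 3, 3, 8, 12],
    [10, 6, 12, 3, 0],
    [7, 12, 8, 1, 1],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)               # row masses
c = P.sum(axis=0)               # column masses

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Keep only axes with a non-zero singular value so that division is safe.
keep = sv > 1e-10
U, sv, V = U[:, keep], sv[keep], Vt[keep].T

F = (U / np.sqrt(r)[:, None]) * sv      # row profile coordinates
G = (V / np.sqrt(c)[:, None]) * sv      # column profile coordinates

row_profiles = P / r[:, None]           # shape (rows, columns)
col_profiles = (P / c).T                # shape (columns, rows)

# Transition formulae: a weighted average of the other variable's
# coordinates, rescaled by the singular value of each axis.
F_from_G = (row_profiles @ G) / sv
G_from_F = (col_profiles @ F) / sv

print(np.allclose(F, F_from_G), np.allclose(G, G_from_F))
```

Dropping the axes with a zero singular value is a design choice for numerical safety; those axes carry no inertia and do not affect the plot.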

To derive the transition formulae for a contingency
table with ordered columns and nonordered rows, postmultiply the left- and
right-hand sides of (

In a similar manner, it can be shown that

Beh [

If the
positions of the row profile coordinates are dominated by the first principal
axis, then

If the
positions of the row profile coordinates are dominated by the second principal
axis, then

If the positions
of the column profile coordinates are dominated by the first principal axis,
then

If the
positions of the column profile coordinates are dominated by the second
principal axis, then

However, it is still possible that

For both classical and doubly ordered correspondence
analysis, when either the row or the column profile coordinates are situated close to
the origin of the correspondence plot, there is no association between the
rows and columns. This is also the case for singly ordered correspondence
analysis, as indicated by (

The results above show that the mathematics and characteristics of this approach to singly ordered correspondence analysis are very similar to doubly ordered correspondence analysis and classical simple correspondence analysis. However, there are properties of the singly ordered approach that distinguish it from the other two techniques. This section provides an account of these properties.

The row component associated with the

To show this,
recall that the total inertia may be written in terms of bivariate moments and
in terms of the eigenvalues such that

The row component values are arranged in descending order.

This property
follows directly from Property

A singly ordered correspondence analysis allows the inertia associated with a particular axis of a simple correspondence plot (called the principal inertia) to be partitioned into bivariate moments.

Again, this
property follows directly from Property

It is possible to identify which bivariate moment contributes the most to a particular squared singular value and hence its associated principal axis.

This is readily seen from Property
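The partition of each principal inertia into bivariate moments can be sketched numerically. The example below is an illustration under several assumptions: it uses the analgesic drug counts analysed later in this paper, takes natural scores 1, 2, ..., J for the ordered column categories, and builds the orthogonal polynomials by a QR factorisation of a weighted Vandermonde matrix (equivalent to weighted Gram-Schmidt) rather than by a recurrence formula. The names `Z` and `B_tilde` are illustrative.

```python
import numpy as np

# Analgesic drug counts: rows are Drugs A-D, columns are Poor ... Excellent.
N = np.array([
    [5, 1, 10, 8, 6],
    [5, 3, 3, 8, 12],
    [10, 6, 12, 3, 0],
    [7, 12, 8, 1, 1],
], dtype=float)

n = N.sum()
P = N / n
r = P.sum(axis=1)               # row masses
c = P.sum(axis=0)               # column masses
J = N.shape[1]

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
keep = sv > 1e-10               # drop axes with zero inertia
U, sv = U[:, keep], sv[keep]

# Polynomials in the column scores, orthonormal with respect to the
# column masses. Natural scores 1..J are an assumption of this sketch.
scores = np.arange(1, J + 1, dtype=float)
vander = np.vander(scores, J, increasing=True)      # 1, s, s^2, ...
Q, R = np.linalg.qr(np.sqrt(c)[:, None] * vander)
Q = Q * np.sign(np.diag(R))     # fix signs of the polynomials
B_tilde = Q[:, 1:]              # omit the trivial (constant) polynomial

# Z[m, v]: contribution linking principal axis m+1 to the column
# component of order v+1 (location, dispersion, and so on).
Z = np.sqrt(n) * U.T @ S @ B_tilde

chi2 = n * (S ** 2).sum()
print("X^2 from Z:", round((Z ** 2).sum(), 4), "direct:", round(chi2, 4))
print("principal inertias:", np.round(sv ** 2, 5))
print("row sums of Z^2 / n:", np.round((Z ** 2).sum(axis=1) / n, 5))
print("column components:", np.round((Z ** 2).sum(axis=0), 4))
```

Each row of `Z ** 2 / n` sums to the corresponding squared singular value, so the largest entry in a row indicates which column moment dominates that axis. The column sums give the location, dispersion, and higher-order components of the chi-squared statistic.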

For classical correspondence analysis, the axes are
constructed so that the first axis accounts for most of the
variation in the categories, the second axis accounts for the second
largest amount of variation, and so on. However, it is unclear what this variation
is, or whether it is easily identified as being statistically significant. By
considering the partition of the singular values, as described by (

Consider the contingency table given by Table

Cross-classification of 121 hospital patients according to analgesic drug and its effect.

| Analgesic drug | Poor | Fair | Good | Very good | Excellent | Total |
|---|---|---|---|---|---|---|
| Drug A | 5 | 1 | 10 | 8 | 6 | 30 |
| Drug B | 5 | 3 | 3 | 8 | 12 | 31 |
| Drug C | 10 | 6 | 12 | 3 | 0 | 31 |
| Drug D | 7 | 12 | 8 | 1 | 1 | 29 |
| Total | 27 | 22 | 33 | 20 | 19 | 121 |
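As a check on the analysis that follows, the Pearson chi-squared statistic and the squared singular values of the classical approach can be computed directly from the cell counts above. This is a minimal sketch using NumPy; the variable names are illustrative and are not taken from the original text.

```python
import numpy as np

# Cell counts from the table above: rows are Drugs A-D,
# columns are Poor, Fair, Good, Very good, Excellent.
N = np.array([
    [5, 1, 10, 8, 6],
    [5, 3, 3, 8, 12],
    [10, 6, 12, 3, 0],
    [7, 12, 8, 1, 1],
], dtype=float)

n = N.sum()
P = N / n                       # correspondence matrix
r = P.sum(axis=1)               # row masses
c = P.sum(axis=0)               # column masses

# Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

chi2 = n * (S ** 2).sum()
df = (N.shape[0] - 1) * (N.shape[1] - 1)

# Classical correspondence analysis: singular values of S.
singular_values = np.linalg.svd(S, compute_uv=False)
total_inertia = (singular_values ** 2).sum()

print(f"n = {n:.0f}, X^2 = {chi2:.4f} on {df} degrees of freedom")
print("column totals:", N.sum(axis=0))
print("squared singular values:", np.round(singular_values ** 2, 5))
print("total inertia:", round(total_inertia, 5), "X^2 / n:", round(chi2 / n, 5))
```

The total inertia equals the chi-squared statistic divided by the sample size, which is the identity underlying the partition of the statistic into squared singular values.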

If only a comparison of the drugs, in terms of the
mean value and spread across the different levels of effectiveness, was of
interest, attention would be focused on the quantities

The Pearson chi-squared statistic of Table

When a classical correspondence analysis is applied,
the squared singular values are

Classical correspondence plot of Table

Figure

The component values that are associated with
explaining the variation in the position of the drug coordinates in Figure

Singly ordered correspondence plot of Table

Applying singly ordered correspondence
analysis yields

Figure

An important feature of Figure

By observing the distance of each category from the
origin in Figure

Contribution of the drugs tested to each axis of Figure

| Drug tested | Axis 1 contr'n | Axis 1 % contr'n | Axis 2 contr'n | Axis 2 % contr'n |
|---|---|---|---|---|
| Drug A | 0.02705 | 12.86 | 0.00011 | 0.14 |
| Drug B | 0.08053 | 38.29 | 0.05524 | 67.79 |
| Drug C | 0.04884 | 23.22 | 0.01229 | 15.08 |
| Drug D | 0.05392 | 25.63 | 0.01384 | 16.99 |
| Total | 0.21034 | 100 | 0.08148 | 100 |

Contribution of the effectiveness of the drugs tested to each axis of Figure

| Rating | Axis 1 contr'n | Axis 1 % contr'n | Axis 2 contr'n | Axis 2 % contr'n |
|---|---|---|---|---|
| Poor | 0.01356 | 4.46 | 0.00124 | 1.60 |
| Fair | 0.07502 | 24.62 | 0.03576 | 46.23 |
| Good | 0.01953 | 6.41 | 0.02432 | 31.44 |
| Very good | 0.05615 | 18.43 | 0.00406 | 5.25 |
| Excellent | 0.14040 | 46.08 | 0.01197 | 15.48 |
| Total | 0.30466 | 100 | 0.07735 | 100 |

Recall that Drug C and Drug D are positioned close to
one another in Figure

Correspondence analysis has become a very popular
method for analyzing categorical data, and has been shown to be applicable in a
large number of disciplines. It has long been applied in
ecology, and more recently in health care and nursing studies [

The aim of this paper has been to discuss new
developments of correspondence analysis for the application to singly ordered
two-way contingency tables. The classical approach to
correspondence analysis can be applied to such tables, although the ordered structure of the
variables is often not reflected in the output. When a two-way table
consists of one ordered variable, such as in sociological or health studies
where responses are rated according to a Likert scale, the ordinal structure of
this variable needs to be considered. The singly ordered correspondence
analysis procedure developed by Beh [