CONFIDENCE CIRCLES FOR CORRESPONDENCE ANALYSIS USING ORTHOGONAL POLYNOMIALS

An alternative approach to classical correspondence analysis was developed in [3] and involves decomposing the matrix of Pearson contingencies of a contingency table using orthogonal polynomials rather than via singular value decomposition. It is especially useful in analysing contingency tables which are of an ordinal nature. This short paper demonstrates that the confidence circles of Lebart, Morineau and Warwick (1984) for the classical approach can be applied to ordinal correspondence analysis. The advantage of the circles in analysing a contingency table is that the researcher can graphically identify the row and column categories that contribute or not to the hypothesis of independence.


Introduction
The correspondence analysis technique of [3] was shown to be mathematically similar to the classical correspondence analysis approach discussed by several authors, including Lebart, Morineau and Warwick (1984), [8] and [9]. However, there is a major difference between the approaches, which concerns the method of decomposing the Pearson chi-squared statistic. The classical approach decomposes the statistic into singular values by partitioning the matrix of Pearson contingencies using singular value decomposition. The approach of [3] decomposes the Pearson chi-squared statistic into bivariate moments, such as linear-by-linear, linear-by-quadratic, etc., by partitioning the matrix of Pearson contingencies using the orthogonal polynomials defined in [4]. Therefore the interpretation of the correspondence plots is very different. The ordinal correspondence plots of [3] graphically show how categories within a variable are similar or not by their proximity to each other along the first (linear), second (dispersion) and higher axes. The interpretation of the correspondence plots from the classical correspondence analysis technique is unclear. Points significantly far from the origin indicate that they contribute to the dependency between the row and column variables, while points close to the origin indicate they do not make such a contribution. While the interpretation of the ordinal plots allows us to reach the same conclusions, the classical correspondence plot will not explain how two points far from each other are different; the classical approach will only conclude that they are different.
With ordinal correspondence plots, we can determine which row and column categories, if any, contribute to the dependency between the two variables using confidence circles. Lebart et al. (1984) defined such circles for classical correspondence analysis. This paper shows that similar confidence circles can be calculated for each row and column profile co-ordinate by using ordinal correspondence analysis. The derivations presented here are for the row categories, while those for the column categories can be made in a similar manner.
Section 2 defines the notation to be used in this presentation and the radii length of the confidence circle for the ith row profile co-ordinate in a plot using classical correspondence analysis. Section 3 shows that for the correspondence analysis approach of [3] the radius of the confidence circles can be derived in exactly the same way as those from classical correspondence analysis. Section 4 shows the relationship between the marginal frequencies of a set of categories and the radii length of the confidence circles. Section 5 consists of two examples which show the application of the confidence circle using doubly ordered correspondence analysis.

Confidence Circles for Classical Correspondence Analysis
Consider an $I \times J$ two-way contingency table, $N$, where the $(i, j)$th cell entry is denoted as $n_{ij}$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$. Let the grand total of $N$ be $n$ and the probability matrix be $P$ so that the $(i, j)$th cell entry is $p_{ij} = n_{ij}/n$, for which $\sum_{i=1}^{I} \sum_{j=1}^{J} p_{ij} = 1$. Define the $i$th row marginal probability as $p_{i\bullet} = \sum_{j=1}^{J} p_{ij}$ and the $j$th column marginal probability as $p_{\bullet j} = \sum_{i=1}^{I} p_{ij}$.
The confidence circle of Lebart et al. (1984) is a method of observing the importance of a profile's position in a correspondence plot. Generally, if the origin lies outside the confidence circle for a particular category, then that category contributes to the dependency between the row and column categories of the contingency table. If the origin lies within the circle for a particular category, then that category does not contribute to the dependency between the variables. Lebart et al. (1984) showed that for classical correspondence analysis the radii length of the confidence circle for the $i$th row profile co-ordinate can be calculated by
$$r_i = \sqrt{\frac{\chi^2_{(J-1)}}{n\, p_{i\bullet}}} \qquad (1)$$
where $\chi^2_{(J-1)}$ is the theoretical chi-squared value with $J - 1$ degrees of freedom at the $\alpha$ level of significance. Generally a correspondence plot consists of only two dimensions, but can include three or more. However, visually representing multiple dimensions is conceptually difficult; [2], [7], [11] and [12] presented some novel approaches to visualising multiple dimensions. If a correspondence plot consists of two dimensions, then with 2 degrees of freedom and at the 5% level of significance, $\chi^2_{(2)} = 5.99$. Therefore, the radius of the confidence circle for the $i$th row profile co-ordinate can be approximated by
$$r_i \approx \sqrt{\frac{5.99}{n\, p_{i\bullet}}} \qquad (2)$$
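As an illustration of the radius calculation for a two-dimensional plot, the following is a minimal sketch. It assumes the radius takes the form of the square root of the critical chi-squared value (5.99 at the 5% level with 2 degrees of freedom) divided by the product of the grand total and the row marginal probability. The function name `confidence_radii` and the table `counts` are hypothetical and are not data from this paper.

```python
import numpy as np


def confidence_radii(table, chi2_crit=5.99):
    """Radius sqrt(chi2_crit / (n * p_i.)) for every row category.

    chi2_crit defaults to 5.99, the 5% point of chi-squared with 2 d.f.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()                      # grand total
    row_mass = table.sum(axis=1) / n     # row marginal probabilities p_i.
    return np.sqrt(chi2_crit / (n * row_mass))


# Hypothetical 3 x 4 table (illustrative only, not data from the paper).
counts = np.array([[20, 10, 5, 5],
                   [10, 10, 10, 10],
                   [2, 3, 5, 10]])

radii = confidence_radii(counts)
print(radii)
```

Radii for the column categories can be obtained in the same way by passing the transposed table, `confidence_radii(counts.T)`. In this hypothetical table the third row holds the fewest observations and therefore receives the largest circle.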

Confidence Circles for Ordinal Correspondence Analysis
The radii length of the confidence circle for the $i$th row profile using the correspondence analysis of [3] is mathematically identical to the radii length using classical correspondence analysis. [6] calculated confidence circles for their analysis using the same orthogonal polynomial definitions as we do here, but the plotting system they considered is different. Suppose that a doubly ordered correspondence analysis is applied to a two-way contingency table. Then denote the row profile co-ordinate of the $i$th row category along the $k$th axis as $f^*_{ik}$ for $k = 1, 2, \ldots, J - 1$, which is defined by
$$f^*_{ik} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\bullet}}\, b_k(j) .$$
This row profile co-ordinate is the weighted sum of the column orthogonal polynomials of order $k$, $\{b_k(j)\}$, where the weights used are from the profile of the $i$th row category, $\{p_{ij}/p_{i\bullet}\}$; see [3] for a derivation of $f^*_{ik}$.
By using equations (3.1.10) and (3.1.11) of [3], the relationship between the chi-squared statistic and the row profile co-ordinates is
$$X^2 = n \sum_{i=1}^{I} \sum_{k=1}^{J-1} p_{i\bullet} \left(f^*_{ik}\right)^2 .$$
For the $i$th row profile co-ordinate, the contribution to $X^2$ is $X^2_i$ where
$$X^2_i = n\, p_{i\bullet} \sum_{k=1}^{J-1} \left(f^*_{ik}\right)^2 \qquad (4)$$
for all $i = 1, 2, \ldots, I$ and where $X^2_i$ has a Pearson chi-squared distribution with $J - 1$ degrees of freedom, $\chi^2_{(J-1)}$. From (4)
$$\sum_{k=1}^{J-1} \left(f^*_{ik}\right)^2 = \frac{X^2_i}{n\, p_{i\bullet}} . \qquad (5)$$
By comparison with (2), the radii length for the confidence circle of the $i$th row profile co-ordinate can be taken to be the square root of the right hand side of (5) with $X^2_i$ replaced by the $100(1 - \alpha)\%$ point of its approximate distribution, $\chi^2_{(J-1)}$. When the ordinal correspondence plot consists of two dimensions, the square root of (5) with this replacement is identical to (2).
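The link between the row profile co-ordinates and each row's contribution to the chi-squared statistic can be checked numerically. The sketch below is an illustration under stated assumptions rather than the construction used in [3] or [4]: the orthogonal polynomials are obtained here by a weighted QR factorisation of the powers of the natural scores 1, 2, ..., J, which is a swapped-in technique that yields polynomials orthonormal with respect to the column marginal probabilities. The function names and the table `counts` are hypothetical.

```python
import numpy as np


def orthogonal_polynomials(scores, weights):
    """Columns b_0, ..., b_{J-1} with sum_j w_j b_u(j) b_v(j) = delta_uv.

    Built by weighted QR on the powers of the scores (an assumption of
    this sketch, not the recurrence formulae referred to in the paper).
    """
    scores = np.asarray(scores, dtype=float)
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    powers = np.vander(scores, scores.size, increasing=True)  # 1, s, s^2, ...
    q, r = np.linalg.qr(root_w[:, None] * powers)
    q = q * np.sign(np.diag(r))      # make leading coefficients positive
    return q / root_w[:, None]


def row_profile_coordinates(table, scores=None):
    """f*_ik = sum_j (p_ij / p_i.) b_k(j) for k = 1, ..., J-1."""
    table = np.asarray(table, dtype=float)
    P = table / table.sum()
    row_mass, col_mass = P.sum(axis=1), P.sum(axis=0)
    if scores is None:
        scores = np.arange(1, P.shape[1] + 1)   # natural scores 1..J
    B = orthogonal_polynomials(scores, col_mass)[:, 1:]  # drop trivial b_0
    return (P / row_mass[:, None]) @ B


# Hypothetical 3 x 4 table (illustrative only, not data from the paper).
counts = np.array([[20, 10, 5, 5],
                   [10, 10, 10, 10],
                   [2, 3, 5, 10]], dtype=float)
n = counts.sum()
row_mass = counts.sum(axis=1) / n
F = row_profile_coordinates(counts)

# Row contributions to the Pearson statistic, computed directly.
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
X2_i = ((counts - expected) ** 2 / expected).sum(axis=1)

lhs = (F ** 2).sum(axis=1)       # squared distance of each row from origin
rhs = X2_i / (n * row_mass)      # row contribution scaled by n * p_i.
print(np.allclose(lhs, rhs))
```

The comparison of `lhs` with `rhs` is the numerical counterpart of expressing the squared distance of a row profile from the origin as its chi-squared contribution divided by the grand total times the row marginal probability.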
Confidence circles can also be calculated with the centre at the origin. Those points not contained within the circle all contribute to the dependency of the row and column variables that form the table. Those points lying within the circle do not make such a contribution. [6] considered confidence circles with the centre at the origin, as well as circles with the centre at the position of the profile co-ordinate. However, Lebart et al. (1984, p183) state that "In practice, instead of drawing concentric circles around the origin, it is clearer and easier to draw them around each point concerned, and look at the position of the origin."
The disadvantage of drawing a circle with the centre at the origin is that it assumes that points close to the origin will never significantly contribute to the dependency of the row and column variables, while those far from the origin will always make such a contribution.While this may occur in many situations, it will not always occur.
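The decision rule for circles centred on each point can be sketched as follows. This is a minimal illustration only: the function name `contributes_to_dependency`, the co-ordinates and the radii are all hypothetical values chosen to show that a point far from the origin need not contribute when its own circle is large.

```python
import numpy as np


def contributes_to_dependency(coords, radii):
    """True where the origin lies outside the circle centred on the point."""
    coords = np.asarray(coords, dtype=float)
    distance = np.hypot(coords[:, 0], coords[:, 1])  # distance from origin
    return distance > np.asarray(radii, dtype=float)


# Hypothetical two-dimensional profile co-ordinates and radii.
coords = np.array([[0.30, 0.10],
                   [0.05, -0.02],
                   [-0.40, 0.25]])
radii = np.array([0.25, 0.25, 0.60])

flags = contributes_to_dependency(coords, radii)
print(flags.tolist())  # [True, False, False]
```

In this made-up case the third point is the furthest from the origin, yet the origin still lies inside its circle because the category has a large radius. A single circle of radius 0.25 drawn around the origin would have classified it as contributing.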

Relationship Between a Marginal Frequency and its Radii Length
From (2), the radii length depends on the proportion of observations classified into a category of the contingency table.
A category containing a large proportion of the observations will have a relatively small radii length, while a category containing a small proportion of the observations will have a relatively large radii length. These observations can be seen in the application of confidence circles in Lebart et al. (1984, p51, Table 5).
The radii length defined by (2) shows that a variable with equi-probable responses will have equal length radii for each of the responses. Therefore, when conducting an ordinal correspondence analysis on ranked data, as has been done by [5], the radius of the confidence circle for each of the row and column profile co-ordinates will be identical. For such an application, the length of the radii for all of the categories can be taken to be the common value given by (6), where $n$ is the number of judges (or consumers) who rank $t$ products/treatments according to their preference. The value of $\chi^2_{(t-1)}$ is the theoretical Pearson chi-squared value with $t - 1$ degrees of freedom at the $\alpha$ level of significance. However, [1] noted that, as the rankings of a product/treatment are not independent, the Pearson chi-squared statistic does not have a chi-squared distribution, although $(t - 1) X^2 / t$ does. So in order to use (6), we use not the $X^2$ profile but the $(t - 1) X^2 / t$ profile, which is $\{p_{ij}/p_{i\bullet}\}$ multiplied by $\{\frac{t-1}{t}\, b_v(j)\}$.
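The equal-radii property can be made explicit with a short worked step. This is a sketch under stated assumptions, not a formula quoted from the source. It assumes the general radius form of a critical chi-squared value divided by the grand total times the row marginal probability, and it assumes that each of the n judges assigns every one of the t ranks exactly once, so that the t × t table of rankings has grand total nt and every marginal probability equals 1/t. The symbol N for the grand total is introduced here only for illustration, and the adjustment noted by [1] is not applied.

```latex
% Assumed general form of the radius for the i-th row category,
% with N denoting the grand total of the table of rankings:
r_i = \sqrt{\frac{\chi^2_{\alpha}}{N\, p_{i\bullet}}}

% Ranked data: N = nt and p_{i\bullet} = 1/t for every i, hence
N\, p_{i\bullet} = nt \cdot \frac{1}{t} = n
\quad\Longrightarrow\quad
r_i = \sqrt{\frac{\chi^2_{\alpha}}{n}} \quad \text{for all } i .
```

Under these assumptions the radius no longer depends on the category, which is why every circle in such a plot has the same size.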

Example 1 -Drug Data
Consider the contingency table given by Table 1 which was analysed in [4].
The study was aimed at testing four analgesic drugs (named A, B, C and D) and their effect on 121 hospital patients. The patients were given an ordered five point scale consisting of the categories Poor, Fair, Good, Very Good and Excellent on which to make their judgement.
The Pearson chi-squared statistic of the contingency table is 47.072, which at 12 degrees of freedom is highly significant. Therefore there is an association between the drug used and its effect on the patients. The ordinal correspondence plot for the row (drugs) profile co-ordinates is given by Figure 1. Similarly, the ordinal correspondence plot for the column (judgement) profile co-ordinates is given by Figure 2.
Figure 2. 95% Confidence Circles for the Judgements in Table 1
The row profile co-ordinates graphically depicted by Figure 1 are accompanied by the 95% confidence circle for each drug tested. Figure 2 also includes the 95% confidence circles for the judgement the patients gave for each drug. Both these figures consist of axes which reflect the variation in terms of the location (first principal axis) and dispersion (second principal axis) components. These two axes explain 75% of the variation in the drugs; location = 54.07%, dispersion = 20.93%. They also explain 80.21% of the variation in the judgements; location = 72.45%, dispersion = 7.76%.
There are higher order moments which explain more of the variation in the categories than the dispersion, but we wish to highlight the variation only in terms of the location and dispersion components.
Figure 1 shows that drug A is the only drug to contribute to the independence hypothesis, as the origin lies within its confidence circle. Therefore, drugs B, C and D have an effect on the results. We can see from Table 1 that drug B is rated as Excellent, drug C is rated Poor to Good, while drug D is considered to have a Fair effect on the patient.
Therefore, it would be advised that the drug associated with Drug B be used to treat patients suffering from analgesic illnesses.
Figure 2 shows that only Poor and Good contribute to the independence hypothesis. Therefore, Excellent, Very Good and Fair can all be used to characterise the drugs that were tested. In further studies we could possibly consider only a three-point scale rather than the five-point scale used in this experiment. By observing Figure 2, Good may be considered by some researchers as a descriptive response of the drugs, as the origin barely falls within the confidence circle for this category.

Example 2 -Bean Data
Consider the bean data of [1], also analysed in [5]. A consumer study was conducted to determine which variety of snap bean was the most preferred. A lot of each of the three bean varieties was displayed in retail stores and 123 consumers were asked to rank the beans according to first, second and third choice. Table 2 lists the preferences of each variety of bean. The Pearson chi-squared statistic for the Table 2 data is 79.561. Using the more appropriate Anderson chi-squared statistic, this value becomes 53.041, which at 4 degrees of freedom is highly significant. We use the Anderson chi-squared statistic rather than the Pearson value as Table 2 is a ranked data set where the rank assigned to a bean variety is not independent of the rank assigned to another bean. Therefore, there is a difference in ranking the three varieties of bean. However, by just observing Table 2 it is not evident which bean variety or rank contributes to this relationship. Confidence circles will determine which categories do so. As there are three treatments, and therefore three rankings, the two-dimensional correspondence plot will describe all of the variation that exists for each category.
The row profile co-ordinates of Table 2 are presented as Figure 3, which also includes the 95% confidence circles for each bean variety. The column profile co-ordinates from the correspondence plot of Table 2 are included as Figure 4, which is constructed in the same manner as described in [5]. It also contains the 95% confidence circles for each rank.
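The adjustment from the Pearson to the Anderson statistic quoted for the bean data can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the adjustment is the factor (t − 1)/t applied to the Pearson statistic, and that the degrees of freedom are those of a t × t table, (t − 1)². The variable names are illustrative.

```python
t = 3                 # number of bean varieties (and of ranks)
pearson = 79.561      # Pearson chi-squared statistic reported for Table 2

# Assumed adjustment: Anderson statistic = (t - 1) * X^2 / t
anderson = (t - 1) * pearson / t
degrees_of_freedom = (t - 1) ** 2

print(round(anderson, 3), degrees_of_freedom)  # 53.041 4
```

Both values agree with the figures reported in the text: 53.041 at 4 degrees of freedom.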
As the row and column marginal frequencies are identical, the radius of each confidence circle will be equal; the radius length is 0.18018, using (2).
This is verified by observing the radii length of each row and column profile co-ordinate in Figures 3 and 4 respectively.


Figure 3. 95% Confidence Circles for the Bean Varieties in Table 2

Table 1. Cross-classification of 121 Hospital Patients According to Analgesic Drug and its Effect
Column categories: Poor, Fair, Good, Very Good, Excellent
Table 1 consists of ordered column categories and non-ordered row categories. The natural scores 1, 2, 3, 4 and 5 are applied to the column categories.

Table 2. Consumer Rankings of Three Varieties of Bean

Table 2
it is not evident which bean variety or rank contributes to this relationship.Confidence circles will determine which categories do so.
