The Prevalence of Asthma and Declared Asthma in Poland on the Basis of ECAP Survey Using Correspondence Analysis

Results of epidemiological and public health surveys are often presented in the form of cross-classification tables. It is sometimes difficult to analyze data described in this way and to understand the relations between variables. Graphical methods such as correspondence analysis are more convenient and useful. Our paper describes an application of correspondence analysis to epidemiological research. We apply the basic concepts of correspondence analysis, such as profiles and the chi-square distance, to medical data concerning the prevalence of asthma. We aim to describe the relationship between asthma, region, and age. The data presented in this paper come from the Epidemiology of Allergy in Poland (ECAP) survey conducted in 2006–2008. Correspondence analysis shows that there is a fundamental difference in the structure of age groups for people with symptoms compared to those who have declared asthma (regardless of the level of symptoms of asthma and the level of declaration). The variable which best differentiates declared asthma in all regions is “wheezing and whistling.” Correspondence analysis also shows significant differences between locations. Our analyses are performed with the R package “ca”.


Purpose
The analysis is based on data from the ECAP survey [1]. ECAP is a questionnaire-based survey built on the International Study of Asthma and Allergies in Childhood (ISAAC [2]) and the European Community Respiratory Health Survey (ECRHS [3]). In our analysis we consider 18617 subjects (50.4% adults aged 20-44 years, 24.2% children aged 6-7 years, and 25.4% children aged 13-14 years; 53.8% female and 46.2% male). The structure of symptoms of asthma and the structure of declared asthma are studied. Both structures are related to three age groups: children aged 6 to 7 years (Ch1), children aged 13 to 14 years (Ch2), and adults aged 20 to 44 years (Ad). The study examines differences and similarities of these structures in eight major Polish cities (Warszawa, Lublin, Białystok, Gdańsk, Poznań, Wrocław, Katowice, and Kraków) and in one rural area near Zamość. The locations are presented in Figure 1.
We have taken into account the following two symptoms: "whistling and wheezing in breathing" and "difficulty in breathing." The first of these symptoms is known to be a good indicator of asthma [4]. The symptoms concern the years preceding the survey. "Declared asthma" is understood as a disease which the respondent reported in response to a question from the interviewer. We consider the problem of undetected asthma in different regions and different age groups of patients.

Statistical Methods
Correspondence analysis [5] is becoming an important tool in epidemiological research [6][7][8][9]. It is useful in analyzing multivariate data, most often given in a cross-tab form (cross-classification contingency tables). The traditional approach to such data is to use chi-square tests and, in the special case of 2 × 2 tables, the standard epidemiological measures odds ratio (OR) and relative risk (RR). However, this approach is not adequate if we are to discover and explain associations between many variables (features and symptoms). A chi-square test can only tell us that there are statistically significant dependencies. More sophisticated methods are needed to identify the form, direction, and strength of these dependencies. Correspondence analysis with its graphical output allows one to describe and easily interpret the structure of such data. A strong association between variables is clearly shown as closeness of the corresponding points in a graph. Our paper uses correspondence analysis applied to the relative frequency of cases (see Tables 1, 3, 5, 7, 9, and 11) instead of absolute counts. This method has been chosen because the sample sizes in individual cities and age groups differ significantly from one another. Thus in our paper we use correspondence analysis in a nonstandard way.
Let us explain the criterion we are going to use for comparisons. First, we try to determine how to compare the structure of declared asthma on the one hand and the structure of symptoms on the other hand, in different locations and different age groups. The method we use allows us to better understand these two structures and their mutual relation. In our paper, the emphasis is on the relative ratio of the frequency of the examined features in the three age groups. Thus, we are less interested in the levels of incidence of symptoms and declared asthma in each age group. These levels depend on many factors which we cannot fully identify. Factors affecting the frequency of the analyzed features may include the level and type of pollutants in the air, in water, and in food products. They may also include the respondents' awareness of which symptoms can be regarded as typical, and different levels of diagnosis of allergic diseases by physicians. For example, if the levels of declared asthma in two regions are different, this does not necessarily mean that the prevalence of asthma varies significantly between these regions. One of these regions may simply have less well-developed prevention.
Therefore, the structure we are trying to understand and describe is the distribution of the percentage of people with the properties of interest to us (declared asthma and symptoms of asthma), assuming that the three age groups are equinumerous. Let us explain this using Tables 1 and 2 as an example. Table 1 shows the percentages of respondents having symptoms (wheezing and whistling) in different locations (Katowice, Zamość, Kraków, Wrocław, Lublin, Gdańsk, Warszawa, Poznań, and Białystok) and age groups (Ch1, Ch2, Ad). For example, in Katowice the percentages of respondents were, respectively, 19%, 10%, and 12%. In contrast, Table 2 shows the proportion of respondents having symptoms in the three age groups. For example, in Katowice, 46%, 24%, and 29% of people with "wheezing and whistling" belong to the groups of younger children (Ch1), older children (Ch2), and adults (Ad), respectively (under the assumption that the groups are equinumerous). In other words, we adopt the Bayesian philosophy and try to estimate the posterior distribution of the age groups given the occurrence of symptoms, under the uniform prior distribution.
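The step from Table 1 to Table 2 can be written out explicitly. The following is a sketch under the assumption stated in the text (a uniform prior over the three age groups); the symbols G for the age group and S for the presence of the symptom are introduced here only for illustration and do not appear in the original tables.

```latex
\[
P(G = g \mid S)
  = \frac{P(S \mid G = g)\cdot\tfrac{1}{3}}{\sum_{h} P(S \mid G = h)\cdot\tfrac{1}{3}}
  = \frac{P(S \mid G = g)}{\sum_{h} P(S \mid G = h)},
\qquad g, h \in \{\mathrm{Ch1}, \mathrm{Ch2}, \mathrm{Ad}\}.
\]
% Katowice, using the rounded percentages quoted from Table 1 (19%, 10%, 12%):
\[
P(\mathrm{Ch1} \mid S) = \frac{0.19}{0.19 + 0.10 + 0.12} = \frac{0.19}{0.41} \approx 0.46,
\quad
P(\mathrm{Ch2} \mid S) = \frac{0.10}{0.41} \approx 0.24,
\quad
P(\mathrm{Ad} \mid S) = \frac{0.12}{0.41} \approx 0.29.
\]
```

With a uniform prior the factor 1/3 cancels, so the posterior is simply the row of Table 1 divided by its sum.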
Let us explain the advantages of the approach described above, using the following hypothetical example. Imagine that each surveyed age group has 1,000 people and that the number of people with symptoms of asthma in each age group equals, respectively, 3, 6, and 1 or, alternatively, 300, 600, and 100. Although the incidence in the two cases differs dramatically (3/1000, 6/1000, 1/1000 or 300/1000, 600/1000, 100/1000), the structure in both cases has the same form (30%, 60%, and 10%). The assumption that the groups are equinumerous is somewhat arbitrary, but it is needed because the age structure of the various Polish regions is not identical. In the language of correspondence analysis, the structure under examination will be called a profile. In the following five sections we will discuss five problems of medical relevance. The first problem will be described in a more detailed way to introduce some general ideas and notation.
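The notion of a profile can be sketched in a few lines of Python. This is only an illustration of the normalization described above, not the authors' code (the paper reports using the R package "ca"); the function name `profile` is an assumption made for this example.

```python
def profile(values):
    """Normalize non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


# Hypothetical example from the text: very different incidence, same structure.
low = profile([3, 6, 1])
high = profile([300, 600, 100])

# Katowice row quoted from Table 1 (percentages for Ch1, Ch2, Ad).
katowice = profile([19, 10, 12])

print([round(p, 2) for p in low])          # [0.3, 0.6, 0.1]
print([round(p, 2) for p in high])         # [0.3, 0.6, 0.1]
print([round(100 * p) for p in katowice])  # [46, 24, 29]
```

The last line reproduces the Katowice row of Table 2 as quoted in the text.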

Comparison of "Wheezing and Whistling" with Declared Asthma
We recall that our study concerns data from the ECAP survey. In this section we examine two variables: "wheezing and whistling," a symptom of asthma, and declared asthma. The three age groups (Ch1, Ch2, and Ad) and nine locations are the same as described in Section 1. The meaning of the symbols in Tables 1 and 2 is the following. Age groups: Ch1, children aged 6-7 years; Ch2, children aged 13-14 years; Ad, adults aged 20-44 years. Locations: Warszawa (Wa), Lublin (L), Białystok (B), Gdańsk (Gd), Poznań (Poz), Wrocław (Wr), Katowice (Kat), Kraków (Kr), and the rural region in the area of Zamość (Zam). The symbol "o" after the abbreviated name of a city or region stands for "symptom"; the symbol "a" analogously stands for "declared asthma." This notation will also be used in the rest of our paper.
To compare the relative frequencies in different cities we use correspondence analysis. The essence of this method is its graphic form. Figure 2 displays an output of correspondence analysis. Rows and columns of the cross-classification table are represented as points. In Table 2, rows correspond to cities and columns to age groups. Black dots in Figure 2 represent the structure of wheezing and whistling (symbol "o" after the abbreviated name of a city) and declared asthma (symbol "a," analogously). For example, "Poz.o" stands for "Poznań; respondents with wheezing and whistling," and "Poz.a" stands for "Poznań; respondents with declared asthma." Red triangles represent the three age groups. More precisely, in Figure 2 we present the relative frequency profiles (the rows in Table 2). Distances between points in the graph (black dots) are equal to the chi-squared distances between profiles. For example, the distance between the profile for "Kat.o" (Katowice; symptom "wheezing and whistling") and the profile "War.a" (Warszawa; declared asthma) is the chi-squared distance between the corresponding rows of Table 2, in which each squared difference is divided by the matching component of the average profile. Note that the reference point is the average profile (39%, 32%, and 29%). The position of the points representing the cities (dots) in relation to the points representing the age groups (red triangles) indicates the contribution of the age groups to the profile. The sizes (areas) of the dots are proportional to the sums of rows in Table 1.
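The chi-squared distance between two profiles can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' implementation. The average profile and the "Kat.o" profile are the rounded values quoted in the text; the second profile is HYPOTHETICAL, because the "War.a" row of Table 2 is not reproduced here.

```python
import math


def chi_square_distance(profile_a, profile_b, average_profile):
    """Chi-squared distance: squared differences weighted by 1 / average."""
    if not (len(profile_a) == len(profile_b) == len(average_profile)):
        raise ValueError("all profiles must have the same length")
    return math.sqrt(
        sum((a - b) ** 2 / c
            for a, b, c in zip(profile_a, profile_b, average_profile))
    )


average = [0.39, 0.32, 0.29]      # average profile quoted in the text
kat_o = [0.46, 0.24, 0.29]        # Katowice, "wheezing and whistling" (rounded)
hypothetical = [0.30, 0.45, 0.25]  # HYPOTHETICAL profile, not from the paper

d = chi_square_distance(kat_o, hypothetical, average)
print(f"chi-squared distance: {d:.3f}")
```

Dividing by the average profile gives more weight to differences in age groups that are rare on average, which is what distinguishes this distance from the ordinary Euclidean one.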
In correspondence analysis, the explanatory strength of variables is conveniently described by partitioning the so-called inertia (variance of the data). The percentage of total inertia explained by the two axes in Figure 2 is 100%. This is not surprising because the row profiles lie on a two-dimensional simplex. The horizontal axis captures 95.5% of inertia, and the vertical axis 4.5%.
Consider Figure 2 from the epidemiological point of view. The projection on the first (horizontal) axis clearly shows that in the group of declared asthma there is a far greater percentage of younger children (Ch1) than older children (Ch2), and this is regardless of the city, although the biggest disparity is visible for Gdańsk, Białystok, Warszawa, and Wrocław, and the smallest for Kraków, Lublin, Katowice, and Zamość. The projection on the second (vertical) axis separates the adult respondents well from children (both Ch1 and Ch2). In Zamość there is a relatively small proportion of adults in the group of declared asthma. Let us note that "small" or "great" is understood in relation to the average profile (i.e., 39%, 32%, and 29%, see Table 2) and, consequently, concerns the relative comparison of the cities. For example, in relation to asthma symptoms, the distributions in Zamość, Kraków, Lublin, and Białystok are similar to the average profile, and the distributions in Poznań and Wrocław deviate from it. Recall that the sizes of the dots in Figure 2 are meaningful: they are proportional to the sums of the corresponding rows in Table 1. It is clear that declared asthma is not as common as its symptoms. Diagnostics of asthma in the two groups of children is significantly different.
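The partition of inertia between two axes, and the reason why two axes always account for 100% when there are three age groups, can be illustrated numerically. The sketch below is a pure-Python illustration under stated assumptions, not the authors' computation (they report using the R package "ca"). Only the first row of the example table (Katowice: 19, 10, 12) is quoted from the text; all other rows are HYPOTHETICAL, so the printed shares are not the 95.5% and 4.5% reported for Figure 2.

```python
import math


def inertia_decomposition(table):
    """Total inertia and the two principal inertias of a 3-column table.

    Assumes all row and column sums are positive. With 3 columns the matrix
    S^T S has one zero eigenvalue, so the other two solve a quadratic.
    """
    if any(len(row) != 3 for row in table):
        raise ValueError("this sketch handles tables with exactly 3 columns")
    n = sum(sum(row) for row in table)
    p = [[x / n for x in row] for row in table]        # correspondence matrix
    r = [sum(row) for row in p]                        # row masses
    c = [sum(row[j] for row in p) for j in range(3)]   # column masses
    # Standardized residuals from the independence model r_i * c_j.
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(3)]
         for i in range(len(p))]
    # a = S^T S (3 x 3); its trace is the total inertia.
    a = [[sum(row[j] * row[k] for row in s) for k in range(3)]
         for j in range(3)]
    total = a[0][0] + a[1][1] + a[2][2]
    # Sum of principal 2 x 2 minors = product of the two remaining eigenvalues.
    minors = (a[0][0] * a[1][1] - a[0][1] ** 2
              + a[0][0] * a[2][2] - a[0][2] ** 2
              + a[1][1] * a[2][2] - a[1][2] ** 2)
    root = math.sqrt(max(total ** 2 - 4 * minors, 0.0))
    first = (total + root) / 2
    second = max((total - root) / 2, 0.0)
    return total, [first, second]


example = [
    [19, 10, 12],  # Katowice, quoted from Table 1
    [14, 11, 10],  # HYPOTHETICAL row
    [22, 9, 11],   # HYPOTHETICAL row
    [6, 8, 3],     # HYPOTHETICAL row
    [5, 7, 4],     # HYPOTHETICAL row
]
total_inertia, principal = inertia_decomposition(example)
shares = [lam / total_inertia for lam in principal]
print([f"{100 * s:.1f}%" for s in shares])
```

Because the residual matrix of a three-column table has rank at most two, the two shares always add up to 100%, in line with the remark that the row profiles lie on a two-dimensional simplex.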

Preliminary Conclusions.
Let us explain the interpretation of the results shown in Figure 2. The black points form two clearly visible clusters. The first cluster, on the left-hand side, corresponds to the "wheezing and whistling" variable in different locations, and it is clearly associated with two age groups: "6-7 years" and "Adults" (depicted as red triangles). The cluster on the right-hand side corresponds to "declared asthma" and is associated with the age group "13-14 years." In the group with symptoms of asthma, it appears that there is a higher percentage of younger children and adults than of older children. These proportions are reversed in the group with declared asthma. This phenomenon may be due to two reasons. First, asthma is harder to detect in younger children than in older children. Second, older children may not have symptoms, which disappear with age, partly because these children are diagnosed and treated. The largest percentage of asthma symptoms in the group of younger children is in Poznań and Wrocław, and then in Gdańsk, Warszawa, and Katowice. The smallest (around 42%) is in Zamość, Kraków, Lublin, and Białystok.
Correspondence analysis showed an essential difference in the structure of age groups for respondents with symptoms of asthma compared to those with declared asthma (regardless of the level of symptoms and the level of declaration). It has also demonstrated the difference between the cities. The following cities seem to be outliers from the rest: Poznań and Wrocław (in the group with symptoms) and Gdańsk, Białystok, and Zamość (in the group with declared asthma); see Figure 2.

Comparison of Breathing Difficulties and Declared Asthma
The purpose of this section is to compare the structure of declared asthma with that of breathing problems, a symptom of asthma. We want to show the relationship between a specific symptom (difficulty in breathing) and declared asthma. It is shown in Tables 3 and 4 and in Figure 3. As before, we use the following symbols: "o" for symptoms and "a" for declared asthma.
In Tables 3 and 4 and in Figure 3 we use the same symbols as in Tables 1 and 2 and Figure 2. The horizontal axis captures 61.2% of inertia, and the vertical axis 38.8%.

Preliminary Conclusions.
We see that breathing problems occur more frequently in adults (Ad) than in children (Ch1, Ch2) in Wrocław and Białystok (see Figure 3). In Zamość, respiratory problems occur more often in the group Ch1 and less often among adults. Another exception is Poznań, where respiratory problems are much more common in both groups of children (Ch1 and Ch2) than in adults (Ad). Surprisingly, in Gdańsk and Białystok, the occurrence of respiratory problems among adults is relatively high, while the occurrence of declared asthma is relatively low. We can offer two explanations of this fact. It is possible that in these cities there is a low detection rate of asthma in adults. Alternatively, it may be connected with the occurrence of other diseases associated with difficulties in breathing. It is clear that breathing difficulties are not strongly correlated with declared asthma; they may be related to different diseases.

The Prevalence of Wheezing and Whistling and Breathing Difficulties
Now we examine the prevalence of each of the two symptoms "wheezing and whistling" and "breathing difficulties" separately. The results concerning "wheezing and whistling" are presented in Tables 5 and 6 and Figure 4, and the results concerning "breathing difficulties" in Tables 7 and 8 and Figure 5. In Tables 5 and 6 and in Figure 4 we again use the same symbols as in Tables 1 and 2 and Figure 2. The horizontal axis in Figure 4 captures 91.5% of inertia, and the vertical axis 8.5%.

Preliminary Conclusions.
In the group of younger children (Ch1), "whistling and wheezing" occurs most frequently in Wrocław and Poznań; in the group of adults, in Gdańsk; and in the group of older children (Ch2), in Lublin, Kraków, and Zamość. A level similar to the average profile (46%, 24%, and 30%) is observed in Warszawa and Katowice for all age groups.
In Figure 5, the horizontal axis captures 61.0% of inertia, and the vertical axis 39.0%. Moreover, in Wrocław and Białystok far more adults (Ad) have breathing problems than children in either group (Ch1 and Ch2). In Poznań, Lublin, Kraków, and Zamość more people have breathing problems in groups Ch1 and Ch2 than among adults.

Declared Asthma
We examine the prevalence of declared asthma separately. The results are presented in Tables 9 and 10 and Figure 6.
The horizontal axis in Figure 6 captures 85.6% of inertia, and the vertical axis 14.4%.

Preliminary Conclusions.
For younger children in the group Ch1, most cases of asthma were recorded in Zamość, Poznań, Katowice, Kraków, and Lublin. In these cities, asthma occurs significantly more frequently in the Ch1 group than in both the Ch2 and Ad groups. Among older children

Problem of Undetected Asthma
To examine this problem we will consider the pair of variables declared asthma and "wheezing and whistling" in a different way than in our previous analysis. We regard "wheezing and whistling" as a good indicator of the occurrence of asthma. Therefore we are interested in the incidence of declared asthma only among respondents with "wheezing and whistling." The results are presented in Tables 11 and 12 and Figure 7. The horizontal axis in Figure 7 captures 67.7% of inertia, and the vertical axis 32.3%.
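The quantity studied in this section, the share of declared asthma among respondents who report "wheezing and whistling," can be sketched as a conditional proportion followed by the same profile normalization as before. The counts below are HYPOTHETICAL and the function name is an assumption made for this illustration; it is also an assumption that Tables 11 and 12 relate to each other in the same way as Tables 1 and 2.

```python
def declared_among_symptomatic(n_declared_and_symptom, n_symptom):
    """Share of respondents with declared asthma among those with the symptom."""
    if n_symptom <= 0:
        raise ValueError("number of symptomatic respondents must be positive")
    if not 0 <= n_declared_and_symptom <= n_symptom:
        raise ValueError("declared cases must be between 0 and n_symptom")
    return n_declared_and_symptom / n_symptom


# HYPOTHETICAL counts for one location:
# (declared asthma AND wheezing, wheezing) per age group.
counts = {"Ch1": (12, 80), "Ch2": (15, 50), "Ad": (20, 100)}

rates = {g: declared_among_symptomatic(a, s) for g, (a, s) in counts.items()}
total = sum(rates.values())
row_profile = {g: rate / total for g, rate in rates.items()}

for group in counts:
    print(f"{group}: rate {rates[group]:.2f}, profile {row_profile[group]:.2f}")
```

A low rate in a given age group and location is what the text reads as a sign of possibly undetected asthma.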

Preliminary Conclusions.
Among younger children (Ch1) the best diagnostics of asthma is in Katowice and the worst is in Białystok, because Katowice has the highest percentage (in the group Ch1) of declared asthma among respondents with "wheezing and whistling," while Białystok has the lowest. Analogously, among older children (Ch2) the highest percentage is in Gdańsk and Warszawa, and a relatively low percentage is in Lublin. Among adults (Ad) the highest percentage is in Kraków, and the lowest in Katowice, Lublin, and Białystok (the points corresponding to these cities are located far from the point "Ad" on the graph).

General Conclusions
It is common knowledge that asthma represents a serious public health problem. According to WHO, 235 million people suffer from asthma, among them 30 million in Europe. In some countries up to 20% of the population suffer from it. Over 255 thousand people worldwide die of asthma every year. In Europe asthma is one of the most common chronic noncommunicable diseases in children, with an average prevalence of 5-20%. The European Union spends nearly 17.7 billion EUR per year on asthma. The overall cost of treating respiratory diseases in Europe is 100 billion EUR annually and is still rising. A better understanding of the factors affecting the prevalence of asthma is of great importance for finding better strategies for its prevention and treatment. The research presented here is concerned with asthma in Poland [1], which is an important public health issue in our country. However, our conclusions are probably also relevant to other countries. Correspondence analysis shows an essential difference in the structure of age groups for respondents with symptoms of asthma compared to those with declared asthma (regardless of the level of symptoms and the level of declaration). "Wheezing and whistling" differentiates declared asthma better than "difficulties in breathing." Our analysis also shows significant differences between age groups and cities. We also consider the problem of underdiagnosed asthma. The map of correspondence analysis indicates locations and age groups where this problem may be serious. Declared asthma and its symptoms are more frequent in urban areas than in rural areas. The big difference between the prevalence of symptoms of asthma and declared asthma, revealed by our analysis, may suggest directing a prevention program at the improvement of asthma diagnostics in the group of younger children and in selected regions. The available funds can be better allocated in this way. Future research can use correspondence analysis to examine the relation between asthma and such factors as allergic rhinitis, positive skin prick tests, atopic dermatitis, and family history of allergy.
Our results confirm that the graphical output of correspondence analysis is a convenient and flexible tool for detecting interdependencies in big data sets. We can recommend wider use of this method for epidemiological applications.
We propose a simple tool for discovering nonuniform occurrence of symptoms of asthma as well as declared asthma in different age groups and different locations. Outlying locations for particular age groups may therefore be given more attention and more careful prevention programs.
The novelty of our approach is in applying correspondence analysis to the relative frequency instead of absolute counts. This approach has proved useful in the presented medical applications.