World perspectives: The OMGE database for inflammatory bowel disease

FT DE DOMBAL. World perspectives: The OMGE database for inflammatory bowel disease. Can J Gastroenterol 1993;7(7):550-556. This paper discusses several aspects of the creation, establishment and maintenance of a multinational database concerning patients with inflammatory bowel disease (IBD). The discussion is based on experience gained with the World Organisation of Gastroenterology (OMGE) Research Committee's multinational survey of patients with IBD, comprising (in 1992) 4612 cases entered from 50 centres in 27 countries. The purposes of such a database are described, as are a number of practical problems in its creation and maintenance, such as deciding upon a (minimum) dataset, measures to maximize objectivity, and other difficulties in implementation. Finally, the benefits of this type of research are discussed and illustrated by highlighting some of the recent findings from the OMGE Research Committee's survey.

World perspectives: The OMGE database for inflammatory bowel disease. RESUME: This article addresses various aspects of the creation, operation and continuous updating of an international database for patients with inflammatory bowel disease. The presentation is based on experience drawn from an international survey by the Research Committee of the World Organisation of Gastroenterology (OMGE) of patients with inflammatory bowel disease, which in 1992 comprised 4612 cases entered from 50 centres in 27 countries. The aims of this database are described, together with a number of practical problems in its development and maintenance, such as decisions about the minimum datasets used to raise the degree of objectivity, and other difficulties of implementation. Finally, the advantages of this type of research are presented and illustrated with some recent results from the survey conducted by the OMGE Research Committee.
in clinical medicine was concerned primarily with the annotation of experience. Since then, the pendulum has swung in a different direction, so that nowadays any research which does not include a randomized controlled trial or deal with some highly complex aspect of serology or immunology is greeted with the gravest suspicion.
Yet it is possible to argue that perhaps the pendulum has swung too far, and two developments have lent this argument some force. The first is the increasing ease of communication between centres, countries and continents, so that 'shared experience' in real time becomes a practical proposition. The second concerns the development of information technology to the point where data storage is limited solely by our ability to collect the data concerned. These reasons account for a revived interest in the multicentre 'survey', but there is a clear lesson to be learned from earlier single-centre studies. This concerns the objectivity of the data collected; it is not sensible (nor scientifically permissible) to correlate data between different centres when these data are collected under different circumstances and using different criteria.
The twin keys, therefore, to the establishment and maintenance of a useful database in any clinical area are data collection from a wide number of representative centres, and the strictest possible rigour and quality control over the collected data. It is in these two areas that the World Organisation of Gastroenterology (OMGE) inflammatory bowel disease (IBD) survey has perhaps broken new ground, and why this OMGE survey forms the template for discussion in this paper.

SURVEY DESCRIPTION
The multinational survey of IBD which has given rise to the database discussed in this presentation has been well documented elsewhere (1-5).
Briefly, the survey began in 1976 with fairly modest aims: namely to establish the initial presentation and management of patients with IBD; and to study the ulcerative colitis:Crohn's disease (UC:CD) ratio in various centres. The survey now encompasses 50 centres in 27 countries and, as of 1992, some 4612 patients had been entered into the survey. Of these patients, 2779 are categorized as having UC and 1712 as having CD. A further 75 patients (1.6% of the total group) are classified as having indeterminate colitis.
The remaining 46 patients have been rejected from the survey because of incomplete data collection or the establishment of an alternative diagnosis. The original aims of the survey have been much expanded. In recent years, attention has moved from initial presentation to follow-up of these patients.
A total of 9000 patient-years of observed follow-up had been recorded by 1990, possibly the largest multinational follow-up ever attempted in relation to IBD.
The data from this survey have thus given rise to an international database of considerable proportions. For example, the presentation data alone amount to 627,000 datapoints, with a missing data rate of just 1 or 2% from the patients entered into the survey. As a result, the survey data present a unique opportunity to: provide new and different insights into the presentation and natural history of IBD; and provide a basis for discussion of the purposes, difficulties and practical implementation of a large multicentre database in this clinical area. These latter aspects will be discussed.
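As a simple consistency check on the figures quoted above, the four patient categories can be summed and the share of indeterminate colitis recomputed. The following is a minimal sketch in Python; the variable and function names are illustrative assumptions and form no part of the OMGE survey software.

```python
# Patient counts as quoted in the survey description (1992 figures).
counts = {
    "ulcerative colitis": 2779,
    "Crohn's disease": 1712,
    "indeterminate colitis": 75,
    "rejected": 46,
}

# The four categories should account for every patient entered.
total = sum(counts.values())


def percent_of_total(category):
    """Share of all entered patients falling in one category, in percent."""
    return 100.0 * counts[category] / total


indeterminate_pct = round(percent_of_total("indeterminate colitis"), 1)
print(total, indeterminate_pct)
```

The sum reproduces the 4612 patients quoted for 1992, and the indeterminate colitis share rounds to the 1.6% given in the text.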

DEFINITION AND PURPOSE OF A 'DATABASE'
What is a 'database'? A database is defined in the dictionary as 'a collection of information suitable for storing on a computer'. This definition reflects a common process, understandable but erroneous. In this process, a survey is begun (often with vague aims) and, as the number of patients increases and the amount of data expands, analysis becomes difficult and it is decided 'to computerize the survey'.
This process loses sight of the essential purpose behind large-scale survey work and computerized databases. The purpose of a large-scale survey is primarily the acquisition of information from a representative sample. This process is different from (though complementary to) the small-scale controlled clinical trial, the strength of which is that the conditions of study can be carefully and scientifically controlled, but the defect of which is that the results may only apply to a small series of patients (since it is virtually impossible, especially in IBD, to guarantee that a handful of patients will be representative).
There are thus many possible reasons why a large-scale representative sample of patients should also be studied. These surveys supplement carefully controlled small-scale studies by allowing the correlation of results between many individual centres or even between countries and continents, establishment and definition of consensus practice, provision of guidelines for educational purposes and so on.
Moreover, once it has been decided to collect a large representative sample, it makes good sense nowadays to minimize the tedium of analysis by establishing a database of information on an appropriate computer system. It remains true, however, that the need for computerization should be driven by an a priori consideration of the purposes of the survey and not (as is customary) by a dataset which has grown beyond the capacity of those managing it to analyze it by hand.
CAN J GASTROENTEROL VOL 7 No 7 SEPTEMBER/OCTOBER 1993 OMGE database for IBD

PRACTICAL PROBLEMS CONCERNING DATABASES
Data conformity: Surprisingly, one of the major difficulties (even today) in making comparisons between centres and continents is that different centres collect different information. There is a clear need in establishing a database to maximize conformity by the construction and circulation of agreed pro formata which constitute a minimum dataset of information about the patients concerned. The content of the pro formata (and hence the database) is defined by the purpose of the study, recalling that the dataset collected constitutes a 'trade-off'. The more data collected, the more extensive the potential analysis, but the more data required, the more likely participants are to drop out of the survey because of the cost in time, effort or dollars.
The original OMGE dataset and pro formata have been described elsewhere (1), as has another version of the data collection form devised in collaboration with the National Foundation for Ileitis and Colitis and the International Organisation for the Study of Inflammatory Bowel Disease (6). A more recent version of this second form, trialled by OMGE centres and designed to maximize ease of data collection at individual patient visits, is shown in Figure 1.
Maximizing objectivity: Any database of information is only as valid as the data contained within it. If the data within it are not capable of reliable and reproducible elicitation, the whole database may be worthless. In the case of IBD, problems in eliciting and recording reproducible data are severe. This applies to clinical symptoms and signs (7), physicians' estimates of patient progress (8), and calculation of various indices of severity and activity (9).
The OMGE survey has attempted to address this problem in a number of ways. First, observer variation studies concerning the data to be recorded have been carried out prior to any widespread data collection. Second, preliminary versions of the data collection form have been piloted prior to widespread distribution. Third, and crucial to the successful prosecution of any multicentre survey, the OMGE survey team have instituted a system of computer-aided quality control during and throughout the data collection process.
Computer-aided quality control: Computer-aided quality control is the hallmark of the difference between early surveys (with all their inherent faults) and the more rigorous data collection currently possible. The process has been described in more detail elsewhere (1). Essentially, in this process, each participating centre's data are entered into a computer-based quality control system which constantly checks the individual centre's data against the remaining data pool from the survey as a whole. Differences which exceed limits of tolerance are highlighted, and are drawn to the attention of the survey team and the participating centre.
Data analysis: Another problem which overcomes many surveys and databases is what is commonly known as 'the data graveyard'. In this situation, the amount of data eventually overwhelms the investigator's ability to analyze it, so that increasing volumes of data lie unanalyzed.
The remedy for this situation is, in principle, simple and twofold: computerization and the creation of a dedicated analysis team. Of these, the second is the more important. In the case of the OMGE IBD survey, the problem was addressed by prior creation of a central dedicated analysis team to ensure that analyses agreed prior to the commencement of the survey were carried out, and that further queries and suggestions from survey participants might be responded to, so that the best possible value might be obtained from the data collected.
Figure 3) Cumulative mortality of inflammatory bowel disease patients over a 10-year period from disease onset; based on work by Softley (11). CD Crohn's disease; UC Ulcerative colitis
Computerization is by no means a sinecure. There are various disadvantages (including costs and resources). It is all too easy to select the wrong computer package for a database survey if the purposes of the survey are not agreed prior to data collection. Finally, it is worth remembering that a computer system which stores patient data must comply with legislation in force in the various participating countries concerning data processing and data protection. This legislation has, of course, to be respected in relation to OMGE or any other surveys.
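The principle of computer-aided quality control described in this section (checking each centre's data against the remaining data pool and highlighting differences beyond a limit of tolerance) might be sketched as follows. This is a hypothetical illustration only: the function name, the tolerance of 0.2, the use of a simple difference in proportions and the example counts are all assumptions, and the actual OMGE system is the one described in reference 1.

```python
def flag_outlying_centres(centre_counts, tolerance=0.2):
    """Flag centres whose rate for one data item differs from the rest of the pool.

    centre_counts maps a centre name to (positives, patients) for a single item.
    Returns a list of (centre, centre_rate, pool_rate) for each flagged centre.
    """
    flagged = []
    for centre, (positives, patients) in centre_counts.items():
        # Pool the data from every other centre in the survey.
        pool_pos = sum(p for c, (p, n) in centre_counts.items() if c != centre)
        pool_n = sum(n for c, (p, n) in centre_counts.items() if c != centre)
        if patients == 0 or pool_n == 0:
            continue  # nothing to compare
        centre_rate = positives / patients
        pool_rate = pool_pos / pool_n
        if abs(centre_rate - pool_rate) > tolerance:
            flagged.append((centre, round(centre_rate, 2), round(pool_rate, 2)))
    return flagged


# Invented counts for one hypothetical item (not survey data).
example = {
    "Centre A": (40, 100),
    "Centre B": (45, 100),
    "Centre C": (42, 100),
    "Centre D": (80, 100),
}
flagged = flag_outlying_centres(example)
print(flagged)
```

In this invented example only Centre D is highlighted, since its rate lies well outside the pooled rate of the other three centres; a flagged centre would then be referred to the survey team and the centre itself rather than corrected automatically.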

BENEFITS OF DATABASES
General benefits: In general terms, the benefits of database collection are several. They include practical benefits; it has been widely shown (10) that the mere presence of a database and its impact on information flow has beneficial effects upon clinical care, both in terms of physician performance and patient outcome. In the specific case under discussion, it is perhaps best to illustrate some of the benefits of the database by highlighting some of the more interesting results presented at the recent World Congress in Sydney, Australia in 1990 (11).
Diagnosis: By the late 1970s, computer-aided analysis had demonstrated a congruence of diagnostic thought around the world concerning the distinction between UC and CD (1). In the early 1980s, a simple scoring system was devised and tested for this purpose, and proved to be over 90% effective in a series of 510 cases (4). By 1990, repeat studies in 4000 patients showed the congruence of thought still to exist, the 'match' between computer-aided prediction and actual centre diagnosis being 92.5% in 4066 cases (Table 1). Moreover, the initial diagnosis proved highly stable. Once a diagnosis of CD or UC had been established, after nine years of follow-up, the diagnosis was altered in only 2% of patients (Table 2).
UC:CD ratio: Early work in the late 1970s revealed considerable variation in the ratio between UC and CD patients in individual centres (varying between 3.4:1 in favour of CD and 9:1 in favour of UC) (1). Computer-aided analysis revealed that the differences were unlikely to be due to differing diagnostic criteria or semantics in the centres concerned (1). More recently, expansion of the series has enabled centres to be grouped by continent and geographical region (5). Interesting differences have emerged; there is a relatively high CD ratio in North America and Western Europe compared with a relatively high UC ratio in Mediterranean countries, eastern Europe and central Asia, with a ratio approximating to 1:1 in the Middle East and South America (Figure 2).
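The form in which the UC:CD ratio is quoted (for example '3.4:1 in favour of CD') can be reproduced from raw case counts with a short calculation. The sketch below is illustrative only: the function name is an assumption, the overall counts are those quoted in the survey description, and the single-centre counts are invented to show the opposite direction.

```python
def ratio_in_favour(uc_cases, cd_cases):
    """Return (ratio, favoured), where ratio >= 1 and favoured is 'UC' or 'CD'."""
    if uc_cases <= 0 or cd_cases <= 0:
        raise ValueError("both counts must be positive")
    if uc_cases >= cd_cases:
        return round(uc_cases / cd_cases, 1), "UC"
    return round(cd_cases / uc_cases, 1), "CD"


# Whole-survey counts quoted in the survey description.
overall = ratio_in_favour(2779, 1712)

# Invented counts for a hypothetical centre seeing mostly CD.
example_centre = ratio_in_favour(10, 34)

print("%s:1 in favour of %s" % overall)
print("%s:1 in favour of %s" % example_centre)
```

On the whole-survey counts this gives a ratio of about 1.6:1 in favour of UC, which lies within the range of individual centre ratios reported above.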
Natural history: The OMGE survey clearly confirms the suspicion from a number of studies, namely that a considerable change has taken place in the natural history of IBD (particularly of UC) in the past two or three decades.
Studies in the 1960s showed a cumulative mortality for UC of approximately 20% over 10 years, with a lower mortality for CD. By 1990, actuarial analysis of over 9000 patient-years of observed follow-up became possible from the OMGE survey. This indicated a far lower cumulative mortality: approximately 3 to 4% over 10 years from onset of disease (irrespective of whether the patient had UC or CD) (Figure 3). In terms of survival following referral (Figure 4), similar findings emerged, with a cumulative mortality over 10 years of approximately 5% for CD but only 2% for UC (in 9310 observed years of follow-up). These same studies also confirmed, on a multinational basis, the single-centre data from St Mark's Hospital, London, United Kingdom (12) (and other studies) concerning reduction in cancer risk (the OMGE series showed a cumulative risk in patients with total colitis of 9% over 20 years, as opposed to 50% in earlier 1960s studies [13]).
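For readers unfamiliar with actuarial analysis, the standard life-table calculation of cumulative mortality can be sketched briefly. This is the textbook method, in which patients withdrawn during an interval are counted as at risk for half of it; it is offered as an assumption about the general approach, not as the exact procedure used in the OMGE analysis. The function name and the interval figures are invented for illustration.

```python
def cumulative_mortality(intervals):
    """Actuarial (life-table) cumulative mortality.

    intervals is a list of (at_risk_at_start, deaths, withdrawn) tuples,
    one per successive year of follow-up.
    """
    surviving = 1.0
    for at_risk, deaths, withdrawn in intervals:
        # Withdrawals are assumed to be at risk for half the interval.
        effective = at_risk - withdrawn / 2.0
        if effective <= 0:
            break
        surviving *= 1.0 - deaths / effective
    return 1.0 - surviving


# Invented three-year follow-up table (not survey data).
example = [(1000, 5, 100), (895, 4, 100), (791, 3, 100)]
print(round(100 * cumulative_mortality(example), 1), "% cumulative mortality")
```

The calculation makes use of every observed patient-year, including those contributed by patients who have not yet been followed for the full period, which is why it suits a survey in which patients are entered continuously.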
Attack rates in UC patients: By 1990, 1855 patient-years had become available for detailed study concerning attack rates, both overall and in terms of severity of attack (Table 3). A number of interesting features emerge from this analysis. First, the overall attack rate (between 50 and 60% per annum) was similar to that recorded in other series in the 1960s (14,15). However, in the 1960s series, up to a quarter of all attacks were classified as severe (according to the Truelove and Witts criteria) (16). This figure had fallen to around 10% in the OMGE series, in which data were largely recorded from the 1980s. Finally, the attack rate in UC patients in the OMGE series was significantly affected by maintenance sulphasalazine therapy (44% in those receiving maintenance therapy versus 69% in those who were not). The severity of attacks, however, was unaffected by maintenance therapy.
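The difference between the two attack rates quoted above can be expressed in absolute and relative terms by simple arithmetic. The sketch below uses only the two percentages given in the text; the variable names are illustrative assumptions, and the calculation describes the size of the observed difference without saying anything about its cause.

```python
# Annual attack rates in UC patients as quoted in the text (percent).
rate_with_maintenance = 44
rate_without_maintenance = 69

# Absolute difference in percentage points.
absolute_difference = rate_without_maintenance - rate_with_maintenance

# Difference relative to the rate in patients not on maintenance therapy.
relative_difference_pct = round(
    100.0 * absolute_difference / rate_without_maintenance, 1
)

print(absolute_difference, relative_difference_pct)
```

On these figures the attack rate is 25 percentage points lower in patients receiving maintenance therapy, a relative difference of roughly 36%.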

DISCUSSION AND CONCLUSIONS
In many ways, the data set out in the preceding tables and figures speak for themselves. For good or ill, what is beyond question is that these data could not have been collected without the benefit of a multicentre, worldwide study and the disciplined effort of the participants in the centres involved. The results from the study (particularly those concerning the congruence of diagnostic thought, the altered natural history and the accurate prediction of short term prognosis) seem to indicate that there is some practical value in this type of exercise.
It needs to be emphasized, however, that before such databases are created, their purpose should be defined and their use should be piloted. Observer variation studies are needed to maximize objectivity, and careful prior consultation is needed to minimize the amount of information which participants will be asked to collect.
All databases need quality. All databases involve different people collecting data and putting it into a central pool. Data collected without regard to quality are worse than useless. Moreover, quality cannot be built in or added on at a late stage. It must be considered prior to the start of data collection, and quality control needs to be instituted throughout the collection period. It is in this role (perhaps above all others) that the computer has a helpful part to play.
Thus, from the present survey some evidence may be adduced that, with attention to the points mentioned, the establishment of a multinational or multicentre database adds considerable value to single-centre studies.
ACKNOWLEDGEMENTS: It is apparent from consideration of these data that this paper could not have been written without the effective and enthusiastic participation of colleagues worldwide. This participation is acknowledged with warm gratitude, and if there is benefit in the resultant analysis, the credit should largely go to those who have given their time and experience to the survey.

TABLE 1
Stability of match between computer-aided prediction and actual centre diagnosis (at the patient's first referral) in 4066 cases over 12 years
*After mean of 9.0 years, adjustment levels over that period were ulcerative colitis (UC) to Crohn's disease (CD) 2.6%, CD to UC 2.1%

TABLE 4
Comparison of predicted response to therapy at admission (from the World Organisation of Gastroenterology computer prediction system) versus actual response to therapy (assessed one month later by clinicians) in 215 patients with ulcerative colitis and 154 patients with Crohn's disease (preliminary results; for more detailed results in expanded series, see reference 16)
de Dombal FT, Myren J, Bouchier IAD, Watkinson G, Softley A, eds. Inflammatory Bowel Disease, 2nd edn. Some International Data and Reflections. Oxford: Oxford University Press. (In press)
6. de Dombal FT. Measures of disease activity.