Alpha errors, beta errors and negative trials

Reports of negative trials are increasing in number as standard therapy 
for many gastrointestinal diseases is refined. The validity of a negative report depends 
on the number of patients in the trial, the alpha and beta errors and the difference in 
efficacy which the trial is able to detect. The relationship between these parameters 
is discussed and a formula given for the calculation of trial size. All reports of negative 
trials should include not only the number of patients involved and the level of 
significance of the results but also the beta error and the detectable difference in 
efficacy of the treatments.


NORMAL DISTRIBUTION
The normal distribution is really a probability distribution. It is symmetric and bell-shaped (Figure 1). This type of distribution is common for factors which show variability and are continuous. The distribution is described by its mean and the deviation of values from the mean, ie, the standard deviation. The area under the curve is 100% and corresponds to a probability of 1. Roughly 95% of values lie within the area defined by the mean ± 2 standard deviations from the mean. The area x in each tail of Figure 1 is 2.5% of the total area and corresponds, therefore, to a probability of 0.025. The probability of the next measured value falling into either of the shaded areas is then 0.05.

Populations and samples: It is necessary to grasp the difference between populations and samples before understanding the concept of error since error is a ... The aspartate aminotransferase (AST) of healthy Canadian males is a population of values. We will never, for economic and practical reasons, measure the AST of every Canadian male. What we can do is measure the AST of a sample of Canadian males and use this to make statements about the AST of the population of all Canadian males.

Alpha error: Assume that the population distribution of AST values is known for every healthy Canadian male and every healthy British male. In reality they are identical, as illustrated in Figure 2.

Figure 2) Alpha error. The distribution of AST values for the population of Canadian and British males, and a sample of British males

For practical reasons the population values will never be known and we are forced to resort to sampling if we wish to compare the two populations. It is quite possible that the mean of a sample of British males (which should accurately reflect the mean of the population) will fall sufficiently far from the mean of the population of Canadian males that we would conclude, on the basis of the sample, that the AST values of the Canadian population and the British population are likely to be different. In fact they are not, and this is an alpha error. There is no difference between the populations but our sample has misled us into believing that there is. Alpha error occurs when a difference between the populations studied is claimed but no actual difference exists.

Beta error: Again, assume that the populations are known. This time in reality there is a difference between the populations. The distributions of values of the two populations overlap somewhat, as in Figure 3.

Figure 3) Beta error. Population of Canadian males, population of British males, and sample of British males

We obtain a sample of British males as before. By chance, the mean of the sample now falls sufficiently close to the mean of the Canadian males for us to say that there is no difference between the AST values of Canadians and those of the British. The populations, however, really are different. We have extrapolated back from our sample of British males to conclude that there is no difference between the populations when there actually is a difference. This is a beta error. Beta error occurs when no difference between the populations being studied is claimed but a difference actually exists.

This type of error is especially important in 'negative' trials, ie, trials in which no difference is claimed. This is more easily seen by way of an example. If we obtained a sample of population B whose mean fell at point X we would conclude that population A and population B were identical. We would conclude this because the mean of the sample (which is being used to estimate the mean of population B) falls sufficiently close to the mean of population A for us to say that there is no statistical difference between populations A and B. Since these populations are actually different this is a beta error.
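The areas under the normal curve and the two kinds of error described above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. The AST mean of 25 U/L, standard deviation of 8 U/L and sample size of 30 are hypothetical values chosen only for illustration, and the sketch assumes a two-sided test with a known population standard deviation, which the article does not specify.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def central_area(k):
    """Probability that a normal value lies within mean +/- k standard deviations."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def beta_error(mu_a, mu_b, sigma, n, alpha=0.05):
    """Chance of claiming 'no difference' from a sample of n values of population B.

    The sample mean is compared with the known mean of population A using a
    two-sided test at the given alpha level (population SD assumed known).
    """
    se = sigma / sqrt(n)                    # standard error of the sample mean
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    sample_mean = NormalDist(mu_b, se)      # distribution of the mean of the sample
    lower, upper = mu_a - z * se, mu_a + z * se
    return sample_mean.cdf(upper) - sample_mean.cdf(lower)


if __name__ == "__main__":
    # Hypothetical AST values (U/L); not taken from the article.
    print("area within mean +/- 2 SD:", central_area(2))
    print("alpha error, identical populations:", 1 - beta_error(25, 25, 8, 30))
    print("beta error, true difference of 5:", beta_error(25, 30, 8, 30))
    print("beta error, true difference of 2:", beta_error(25, 27, 8, 30))
```

When the two population means are equal, the chance of wrongly claiming a difference is the alpha level by construction. When they differ, the chance of missing the difference grows as the means move closer together. Note that exactly ± 2 standard deviations covers slightly more than 95% of values; the 95% figure corresponds to about ± 1.96 standard deviations.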
As might be expected from Figure 4 there is a mathematical relationship between alpha and beta. This has been calculated for various levels of alpha and beta and tables are available (Geigy). The function, f(alpha,beta), is shown in Table I. If, for example, one wishes to set the alpha level at 0.05 and beta at 0.1 then the value of f(alpha,beta) is 10.5. This value is used in the calculation of trial size as will be seen.

Choosing alpha and beta levels: In practice, the risk of alpha error is usually taken as 5%. In other words, there is a probability of 0.05 that an alpha error may occur. There will be a 5% chance that we will detect a difference when no difference actually exists. Beta error is usually set at 0.1 or 0.2. There are good theoretical grounds for choosing this level. These relate to the excessive size of samples required at levels more stringent than 0.1. At beta levels greater than 0.2 the risk of a false negative result is generally considered to be unacceptable. A beta level of 0.2 means that there is a 20% chance that we will miss a difference even if a difference actually exists. The number of patients required will have to take alpha and beta levels into account. This is not difficult since there is a defined relationship between these variables which can be expressed mathematically as f(alpha,beta).
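The article takes f(alpha,beta) from published tables. As a rough sketch, and on the assumption that the tabulated function is the usual squared sum of standard normal deviates for a two-sided alpha and a one-sided beta (the article does not state the formula), the value can be computed directly. The function name `f_alpha_beta` is illustrative only.

```python
from statistics import NormalDist


def f_alpha_beta(alpha, beta):
    """Assumed form of f(alpha, beta): (z for two-sided alpha + z for beta) squared."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided alpha level
    z_beta = std_normal.inv_cdf(1 - beta)        # one-sided beta level
    return (z_alpha + z_beta) ** 2


if __name__ == "__main__":
    print(round(f_alpha_beta(0.05, 0.1), 1))  # the article quotes 10.5 for these levels
    print(round(f_alpha_beta(0.05, 0.2), 1))
```

Under this assumption the function reproduces the value of 10.5 quoted for alpha 0.05 and beta 0.1, and it falls as either error level is relaxed.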

CALCULATION OF SIZE FOR NEGATIVE TRIALS
Any formula for calculation of trial size must incorporate alpha error, beta error, the difference between treatments that is to be detected and the frequency of the studied event.

There are many different mathematical methods for calculating trial size. The formula to be used depends on whether the outcome is qualitative or quantitative and on the design of the trial. The following formula was designed for calculation of a qualitative outcome and specifically for negative trials (Makuch and Simon):

$$n = \frac{2p \times (100 - p) \times f(a,b)}{d^2}$$

where n = number of patients required on each treatment; p = percentage of successes that will occur on standard treatment; d = acceptable difference in efficacy between the old and new treatments; f(a,b) = function of alpha and beta.

For example, consider comparing the efficacy of sulfasalazine and 5-ASA in maintaining remission in ulcerative colitis. Ninety percent of patients with ulcerative colitis treated with 4 g of sulfasalazine daily remain in remission for six months. This is the percentage success rate, p. We wish to test whether 5-ASA is as effective as sulfasalazine in maintaining remission over a period of six months. Let us set alpha at 0.05 and beta at 0.1, ie, we will accept a 5% chance of a false positive or alpha error and a 10% chance of a false negative or beta error.

From Table I the function f(alpha,beta) equals 10.5. Let us also accept that 5-ASA will still be a useful treatment if it is 10% less effective than sulfasalazine; then d = 10. Putting these values into the equation:

$$n = \frac{2 \times 90 \times (100 - 90) \times 10.5}{10^2} = 189$$

that is, 189 patients on each treatment or 378 in the study. These figures show that a very large number of patients are required to show comparable efficacy. In fact, if we would accept only a 5% difference in efficacy then the numbers are 756 in each group or 1512 in all. Clearly, to show absolutely no difference in efficacy is impossible. On the other hand, if a 30% difference in efficacy was acceptable then the number required is 21 on each treatment or 42 in all.

Similarly, if we are prepared to accept a 20% chance of a false negative result, as opposed to a 10% risk, then the numbers become 142 in each treatment. It is still a trial of imposing size.

What does not significant mean? From this discussion it should be apparent that the frequently made statement 'there is no difference between the groups as P > 0.05' does not have much meaning without reference to the beta error and to the size of difference that was being sought. The P value refers to the risk of a false positive result. Since 'negative' trials do not have a positive result the P value per se is not helpful. It may indeed be the case that there is no difference between the groups, but the failure to detect a difference may also be because of a large beta error and a small (but clinically important) difference between groups. We would not accept a report of a difference between treatments if information was not given on the results of significance testing. Why should we accept reports of no difference between treatments unless the corresponding beta error is given?

TRIAL DESIGN
Trial design represents a balance between the statistical requirements (which tend to increase patient numbers) and clinical practicality (which tends to minimize patient numbers). When designing trials we cannot change the level of alpha error. The beta error may be varied somewhat but is not a critical determinant of size. The response to standard treatment is a biological fact. The only parameter which is variable and a major determinant of size is the difference between treatments. This difference has a major effect on trial size since the number of patients required is approximately inversely proportional to the square of the difference (the smaller the difference the greater the number of patients). ... misleading. In the example of sulfasalazine and 5-ASA, the question could be asked as

Figure 5) The effect of decreasing the difference between studied groups on the magnitude of beta error

In Figure 5a the difference between population A and B is huge. The means are widely separated and the overlap is small. The risk of beta error is correspondingly small. If we push the two populations closer, as in Figure 5b, it can be seen that the risk of beta error increases dramatically. Pushing the samples closer is, in effect, decreasing the difference between A and B and corresponds to the trial situation of trying to detect small differences between treatments. The smaller the difference between treatments that we wish to detect, the greater the risk of overlap and hence of beta error. This is a major problem in trials designed to show no difference and a frequent flaw in apparently negative trials. To show no difference whatsoever means that populations A and B are superimposed. In this situation, infinitely large numbers of patients would be required and in practice this is never achieved. Some limitation is imposed. One never shows that two groups are identical but simply that it is unlikely that they are different.

Effect of outcome: Finally, one could reason that the endpoint under study will have a place in the estimation of trial size. If, for example, relapses in ulcerative colitis were exceedingly rare then clearly the number of patients required in a trial would be affected.

Vol 2 No 4 November 1988
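The trial-size formula attributed to Makuch and Simon in this section, n = 2p(100 - p)f(a,b)/d², is simple enough to sketch in a few lines of Python. The function name `patients_per_treatment` is illustrative only, and the values 10.5 and 7.9 for f(a,b) are taken as given from the article's figures (7.9 is inferred from the quoted result of 142 per treatment rather than stated in the text).

```python
def patients_per_treatment(p, d, f_ab):
    """Patients required on each treatment for a negative trial.

    p    -- percentage of successes on standard treatment (0 to 100)
    d    -- acceptable difference in efficacy, in percentage points
    f_ab -- tabulated function of the chosen alpha and beta levels
    """
    return 2 * p * (100 - p) * f_ab / d ** 2


if __name__ == "__main__":
    # Sulfasalazine example: p = 90, alpha = 0.05, beta = 0.1, so f(a,b) = 10.5
    for d in (5, 10, 30):
        n = patients_per_treatment(90, d, 10.5)
        print(f"d = {d}%: {round(n)} per treatment, {2 * round(n)} in all")
    # Relaxing beta to 0.2 (f(a,b) taken as 7.9) with d = 10
    print(round(patients_per_treatment(90, 10, 7.9)), "per treatment")
```

Because d enters as a square, halving the acceptable difference quadruples the number of patients, which is why the numbers rise from 21 to 189 to 756 per treatment as d falls from 30 to 10 to 5. The success rate p also matters: the product p(100 - p) is largest at p = 50 and shrinks as the outcome becomes very common or very rare.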