Reference values for maximal inspiratory pressure: A systematic review

1Rehabilitation Sciences Graduation Program, School of Physical Education, Physiotherapy and Occupational Therapy; 2Department of Physical Therapy, Universidade Federal de Minas Gerais, Belo Horizonte; 3Department of Physical Therapy, Universidade Federal do Rio Grande do Norte, Natal, Brazil; 4School of Kinesiology, University of British Columbia, Vancouver; 5Department of Physical Therapy, Burnaby Hospital, Burnaby; 6Department of Physical Therapy, University of British Columbia; 7Vancouver Coastal Health Research Institute; 8Institute for Heart and Lung Health, University of British Columbia, Vancouver, British Columbia
Correspondence and reprints: Dr Darlene Reid, Muscle Biophysics Laboratory, 617 – 828 West 10th Avenue, Vancouver, British Columbia V5Z 1M9. Telephone 604-875-4111 ext 66056, e-mail darlene.reid@ubc.ca
Impairment of the respiratory muscles compromises ventilation, gas exchange and oxygen delivery to tissues (1). Respiratory muscle weakness has been reported in several conditions including those that are cardiovascular (2,3), pulmonary (4) and neuromuscular (5,6) in origin. Inspiratory muscle strength is reflected by the pressure developed within the thorax; these pressure measures are informative for clinical evaluation as well as physiological studies (1). Measurement of maximal inspiratory pressure (MIP) is a straightforward test in which individuals are asked to perform a forceful inspiration against an occluded mouthpiece (1,7). The advantages of this test are that it is noninvasive and performed quickly. On the other hand, its dependence on voluntary effort and the wide range of normative values limit the clinical utility of MIP. Consequently, a low value can reflect inspiratory muscle impairment but can also be due to poor test performance (8-10). Even a well-performed MIP test is difficult to interpret because of uncertainty about which normative values are most representative.
Numerous sets of reference values for MIP have been reported (5,7-30); however, the usefulness of normal MIP values is obscured by the large variability among studies. The variation among these reports likely indicates differences in participant demographics and technical aspects of test performance (1,7). Participant characteristics that have been considered to influence MIP include effort and understanding of test performance, age, sex, height, weight, fitness level and smoking status (31-33). Apparatus set-up and test performance issues that can affect MIP values include type of mouthpiece, presence of a small leak, pressure evaluated (ie, peak or plateau), number of trials and lung volume at the starting point of test performance (1,7,19,27,34,35). Due to the large interstudy variation of normative values for MIP, the data from a single study may not be appropriate to establish the lower limit indicating respiratory muscle weakness (25). The American Thoracic Society (ATS)/European Respiratory Society (ERS) (1) statement on respiratory muscle testing states that the normal ranges of MIP are wide, and the values in the lower quartile of the normal range can be consistent with either normal strength or mild to moderate weakness. This guideline states that an MIP of −80 cmH2O usually excludes clinically important inspiratory muscle weakness; however, this threshold does not consider age and sex, and is higher than the mean predicted values for older men, and middle-age and older women (8,10,13).
Unfortunately, universally applicable normal values for MIP resulting from prediction equations and, specifically, agreement on the lower limits of normal, are not available. Because the application and interpretation of MIPs for clinical evaluation is complicated by the extremely wide range of reported normative values, the purpose of the present meta-analysis was to synthesize and to evaluate the quality of study design and methodology to determine normative MIP values in healthy adults.

Search strategy and selection criteria
A search of the Medline, EMBASE, Cochrane, Cumulative Index to Nursing and Allied Health (CINAHL) and Sport Discus databases (from inception to May 2012) was conducted. Inclusion criteria were as follows: participants were healthy adults (>18 years of age); the purpose of the study was to determine reference values for MIP; and the article was published in English or Portuguese. Articles were excluded if they were a review article, thesis or dissertation, or if the measurement was assessed in a standing position rather than sitting. The search strategy used the following terms: "respiratory muscles" combined with "maximal inspiratory pressure" and "reference values". These terms were modified to meet the requirements of the different databases. Details of search strategies are available on request from the corresponding author. Additional studies were identified by examining the reference list of all included articles. Duplicates were removed, and two authors independently reviewed titles and abstracts of citations retrieved from the search. Disagreements were discussed until consensus for inclusion or exclusion was reached. Subsequently, two authors reviewed the full text of all selected articles to determine whether they met the inclusion criteria.

Data abstraction
Two of the five authors (AWS, WDR, FC, ISP, VFP) abstracted data from the included articles regarding subject characteristics and technical aspects of test performance including: age, sex, height, weight, lung function, presence of diseases, fitness, smoking status, race, mouthpiece type, small leak, pressure evaluated, gauge type, lung volume of test performance, noseclip, total time of the MIP manoeuvre, number of trials, criterion for stopping, instruction and demonstration, interval between manoeuvres, screen incentive and calibration of the instrument. MIP was also abstracted and, when possible, MIP and other data were abstracted according to age per decade (eg, 20 to 29 years) and sex. Due to the different definitions reported for peak, plateau and average pressures in the 22 studies, the following definitions were operationalized: plateau pressure was defined as the highest pressure that could be sustained for a defined minimum period (7,16,25,27); and peak pressure was defined as the highest value reached during MIP (1,18,19). Authors of two studies (9,23) were contacted to provide additional information for missing data.

Methodological quality of included studies
Methodological quality of each study was independently assessed by two authors using the relevant items of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) scale (36,37). QUADAS is an evidence-based, validated, quality assessment tool used specifically for systematic reviews to evaluate risk of bias and study accuracy (36,37). Given that MIP was not compared with another reference standard in the retrieved articles, seven of the 14 items were relevant for quality assessment. Additional items specific to the quality of MIP testing methodology were also abstracted.

Synthesis and meta-analysis
Subject characteristics and technical aspects of the test, including quality assessment, were synthesized in tabular format. Meta-analyses were performed on MIP by inputting mean and SD of MIP (cmH2O) values that were categorized according to age (divided into six ranges) and sex. Studies were included in the meta-analyses if data were reported in comparable age ranges and sex. Six studies reported data in comparable age ranges (12,15,24,26,29,30). The random-effects model (40) was selected for the meta-analyses to examine methodological variation in the included studies. The inverse variance and variance component (τ²) were used to calculate the weight applied in the random-effects model (39-41). The method of moments estimate was used to calculate individual τ² (39,40). The homogeneity statistic, Q, was calculated to provide a measure of the heterogeneity of MIP among studies. A sensitivity analysis was performed to determine whether one or more studies contributed more to heterogeneity.
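The pooling steps described above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard DerSimonian-Laird form of the method of moments estimator; the function name `random_effects_pool` and the input values are hypothetical and are not data from the included studies.

```python
import math

def random_effects_pool(means, sds, ns):
    """Pool study means with a method-of-moments random-effects model."""
    k = len(means)
    # Within-study variance of each mean: SD^2 / n
    variances = [sd ** 2 / n for sd, n in zip(sds, ns)]
    # Inverse-variance (fixed-effect) weights and pooled mean
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * m for wi, m in zip(w, means)) / sum_w
    # Homogeneity statistic Q
    q = sum(wi * (m - fixed_mean) ** 2 for wi, m in zip(w, means))
    # Method-of-moments estimate of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights combine within- and between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * m for wi, m in zip(w_star, means)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2}

# Hypothetical MIP means, SDs (cmH2O) and sample sizes for one age/sex stratum
example = random_effects_pool([120.0, 105.0, 130.0], [30.0, 25.0, 35.0], [20, 15, 18])
print(example)
```

When τ² is estimated as zero, the random-effects weights reduce to the inverse-variance weights, so the pooled mean equals the fixed-effect estimate.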

Search and selection
A flow diagram depicting the search and selection of studies is shown in Figure 1. The search of databases identified 4367 titles and abstracts, of which 19 full-text articles were reviewed. Eleven additional articles were identified from the reference lists of articles (manual search). Of the 30 full-text articles reviewed, eight were excluded because the study purpose was not to determine reference values of MIP (31,33,35,42,43); it was a review (43); the MIP manoeuvre was performed at functional residual capacity rather than residual volume (9) or in a standing rather than sitting position (20,44).

Methodological quality of studies
Methodological quality, as assessed according to seven items of the QUADAS, ranged between 0 and 7, with a mean score of 3 (Table 1). The most common items that were reported included: similar data available during test results and in practice (15 of 22 studies); study participants were representative of participants who would be tested in practice (12 of 22 studies); and the selection criteria were clearly described (12 of 22 studies). The quality criteria that were least often reported were whether the reference standard was likely to correctly classify the target condition (six of 22 studies); and whether uninterpretable/intermediate test results were reported (six of 22 studies). Other issues that affected methodological quality were patient characteristics and MIP technique not queried by QUADAS. These issues are described in more detail in the following two sections and in Tables 2 and 3.

Characteristics of MIP technique
The methodology for measuring MIP varied considerably (Table 3). A flanged mouthpiece was used in five studies (14,16,22,24,27). A small leak was reported in 18 studies but the size was not specified in four (14,22,24,25). Most studies reported using nose clips (16 of 22) and provided instructions (18 of 22). The number of repetitive manoeuvres to obtain MIP was at least five in two studies (12,18) and at least seven in two others (25,27). In six studies, the authors considered the learning effect for the MIP test: it was reported that the final manoeuvre was not the best manoeuvre in five studies (12,16,24,25,28), and in one study the test could be stopped when the subject considered him or herself unable to perform better (22). Several studies considered the highest value within two or three measures that were within 5% to 10% (Table 3). The lower limit of normal was reported in nine studies (8,19,21,24,25,27-30). Most used the fifth percentile of the negative residuals of MIP (8,19,21,25,27,30), which was calculated in each age group for both sexes in one report (27). In contrast, two studies provided an alternative definition (Table 3) and one did not define the lower limit of normal (28).

Meta-analyses
MIP values for men and women in different age groups derived from the random effects analysis are shown in Table 4. Each age range reflected data from 59 to 96 subjects from at least five of the six studies (12,15,24,26,29,30). The total number of participants was 840. These six studies had an average QUADAS quality score of 3.5 of 7 (range 2 to 6). MIP decreased with age for both men and women. For the same age group, men tended to have a higher MIP than women. Sensitivity analysis of withdrawing studies from the meta-analysis often lowered the Q statistic. One study (26) contributed more to the Q statistic than the other five. MIP values from this report were within the range of the other five studies. As mentioned in the methods section, the random-effects model (40) was selected for the meta-analyses to examine methodological variations of the included studies.
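The sensitivity analysis of withdrawing studies can be illustrated by recomputing Q with each study removed in turn. The sketch below assumes inverse-variance weights based on the variance of each study mean; the function names and the input values are hypothetical and do not reproduce the data in Table 4.

```python
def q_statistic(means, sds, ns):
    """Homogeneity statistic Q using inverse-variance weights (n / SD^2)."""
    weights = [n / sd ** 2 for sd, n in zip(sds, ns)]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - pooled) ** 2 for w, m in zip(weights, means))

def leave_one_out_q(means, sds, ns):
    """Return Q recomputed with each study withdrawn in turn."""
    results = []
    for i in range(len(means)):
        keep = [j for j in range(len(means)) if j != i]
        results.append(q_statistic([means[j] for j in keep],
                                   [sds[j] for j in keep],
                                   [ns[j] for j in keep]))
    return results

# Hypothetical stratum in which the fourth study has a lower mean
means = [118.0, 122.0, 120.0, 95.0]
sds = [30.0, 28.0, 32.0, 27.0]
ns = [20, 18, 22, 19]
full_q = q_statistic(means, sds, ns)
loo_q = leave_one_out_q(means, sds, ns)
# The study whose withdrawal lowers Q the most contributes most to heterogeneity
most_influential = min(range(len(loo_q)), key=loo_q.__getitem__)
print(full_q, loo_q, most_influential)
```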

DISCUSSION
Our meta-analysis provides a synthesis of normative values for MIPs, the most commonly performed measure to indicate inspiratory muscle strength. Our meta-analysis was based on 840 subjects, the largest and most representative reference group available for this measure. This synthesis demonstrates strong age-related trends and sex differences in MIP, similar to trends reported by the included studies.
Our meta-analysis considered data from 840 subjects recruited from North America, Brazil, Sweden and The Netherlands. Therefore, the reference values are reflective of women and men from different ethnic backgrounds, with associated differences in body stature that could be expressed as variations in MIPs. Mean MIP data from each of the six studies that contributed to the meta-analyses had mean values (in different age groups for each sex) that overlapped with at least one other study. MIP values from three reports (12,24,29) were generally higher for all age groups in both sexes, whereas the data from the two reports (26,30) were generally lower in both sexes for most age groups. The sensitivity analyses further verified that no single study contributed to the heterogeneity of data. Interestingly, the two studies that recruited American subjects and reported the highest quality scores (8,21) provided regression equations that produced MIPs that were higher than our mean values in one case (21) and lower in the other (8). In other words, the reports with high methodological quality provided mean MIP values that bracket the mean data derived from our synthesis, which lends further credence to the validity of the data derived by the meta-analyses.
Application of reference values requires consideration of participant and methodological differences. Participant characteristics varied considerably among studies, and many of these could not be considered by our subgroup analyses. In addition to age and sex, other factors are reported to influence MIP (eg, height, weight, fitness and smoking status) (31-33). Unfortunately, there is no consensus regarding which of these variables have a significant influence on MIP or the directional influence of some of these variables. For example, height has been shown to be positively predictive (14,22), negatively predictive (8,22) and not predictive (5,16,18,19,21,23-25,29,30) of MIP. Most studies demonstrated that MIP was significantly affected by sex and age in healthy subjects. This is in accordance with the study by Black and Hyatt (5), who reported a linear regression of MIP on age in both males and females; however, the regression was not significant in subjects <55 years of age. Our data are in accordance with this early investigation (5) and indicate modest decreases in MIP before 45 to 55 years of age (Table 4). Unfortunately, there is no consensus regarding the threshold when age-related decline in MIP occurs. It has been reported to occur after 30 years of age in one study (16) or after 60 years of age in another (25). It is plausible that MIP has not undergone study with sufficiently large sample sizes to determine the many factors that may contribute to valid reference values. A recent article on reference values for spirometry examined 55,136 healthy subjects (45), whereas the number of participants in the studies for the present systematic review ranged from 60 to 252. Considering the relatively small sample sizes and the ethnic diversity in studies performed on different continents, it is not surprising that MIP values varied among studies. Future studies should recruit larger numbers and factor in confounders such as ethnicity, smoking history, physical activity and parameters of body stature (ie, BMI, height and weight).
A primary methodological issue that likely contributed to the large interstudy variation of normative values for MIP is participant learning. Given that MIP is volitional, it requires full understanding of the task to be performed, maximal participant effort and enthusiastic coaching from the technologist. To attain a valid MIP, the selection of truly maximum efforts depends not only on setting a limit on variation between successive measurements, but also on carefully considering whether optimal learning has occurred. Several studies did not consider the need to continue the test if the final manoeuvre was the largest (5,8,10,11,13-15,17-19,21-23,27,28,30). Incomplete learning combined with too few repetitions to achieve a maximal value are likely primary factors that influence heterogeneity of MIP values among studies.
The large interstudy variation of normative values for MIP may also be explained by technical aspects of test performance including variation in the type of mouthpiece, size of leak and the point of MIP measurement (peak or plateau) (Table 3). Five of the 22 studies used flanged mouthpieces (14,16,22,24,27), which have been reported to provide lower values than those obtained with rubber tube mouthpieces (1). A larger leak can also result in lower values (ie, MIP measurements obtained with leak of 2 mm internal diameter were 17% and 22% lower than those with leak of 1 mm or no leak, respectively) (35). A plateau measure that is averaged over 1 s would be lower than a peak pressure defined as the highest value reached during a brief maximal effort (18,19). Notably, not all studies after 2002 used the 1 s average pressure, which was recommended by the ATS/ERS, even though this measure is more reproducible than the peak pressure (1,46).
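The difference between a peak pressure and a 1 s average pressure can be illustrated with a short sketch. It assumes a uniformly sampled trace of pressure magnitudes (positive cmH2O values) and takes the highest mean over any 1 s window; the function name, sampling rate and trace are hypothetical, and commercial devices may compute these measures differently.

```python
def peak_and_one_second_average(pressures, sample_rate_hz):
    """Return (peak, highest 1 s average) from a trace of pressure magnitudes."""
    window = int(sample_rate_hz)  # number of samples in 1 s
    if window < 1 or len(pressures) < window:
        raise ValueError("trace must cover at least 1 s")
    peak = max(pressures)
    # Sliding-window sum, updated in O(1) per sample
    running = sum(pressures[:window])
    best = running
    for i in range(window, len(pressures)):
        running += pressures[i] - pressures[i - window]
        best = max(best, running)
    return peak, best / window

# Hypothetical 1.8 s trace sampled at 10 Hz: a brief spike, then a plateau
trace = [0, 20, 50, 80, 100, 110, 95, 92, 90, 91, 90, 89, 90, 88, 85, 60, 30, 0]
peak, plateau = peak_and_one_second_average(trace, 10)
print(peak, plateau)
```

Because the 1 s average includes samples on either side of the brief spike, it cannot exceed the peak value from the same manoeuvre.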

Limitations
The present synthesis was limited by variability in study quality, participant characteristics and MIP technique. Of the 22 included studies, only six were included in the present meta-analysis because of the necessity for MIP data to be reported according to sex and stratified into one or more of six comparable age groups (Table 4). Nine other included studies stratified MIP data according to age groups; however, the categories included either a wider age range or straddled the above-mentioned groupings (5,8,11,13,14,17,21,23,25). The methodological quality of studies (ie, QUADAS assessment) examined in the meta-analysis ranged between 2 and 6. Unfortunately, it was not possible to include some of the studies that had a QUADAS score of 7 of 7 because the MIP data were not reported in similar age groups (8) or they were presented as regression equations (8,21).
In spite of an ATS/ERS statement on respiratory muscle testing, many subsequent reports included in the present review did not describe methods consistent with these guidelines. Several studies did not use the recommended flanged mouthpiece or did not report the type of mouthpiece (26-30). Description of a sustained pressure for 1.5 s to evaluate the maximum sustained pressure for 1 s was rarely detailed. It was also seldom noted whether the tester was experienced, and how much urging and encouragement was provided to the individual being tested. Regarding some of these technical issues, it was not possible to conclude whether some of these methods were not performed or simply not reported.
Attributing a bias toward larger or smaller MIP due to participant characteristics in a particular study was also difficult because of small sample sizes, the large number of potential confounders that could influence strength measures, and equivocal findings regarding their effect. For example, greater height and fitness were not consistently associated with higher MIPs. Because of these uncertainties and omissions in reporting, it was difficult to conclude whether MIP data were higher or lower in a particular study due to differences in MIP technique or participant characteristics. Thus, to date, no predictable correction factors can be applied to MIP data to normalize technical or participant differences.

Practical implications
Standardized techniques for measuring MIP will ensure more reliable and valid data. The ATS/ERS statement on respiratory muscle testing recommends that MIPs are measured using a flanged mouthpiece with a 2 mm leak at residual volume (1). The latter two criteria were consistently met by all six studies included in the present meta-analysis (12,15,24,26,29,30). A sustained pressure for 1.5 s to evaluate the maximum sustained pressure for 1 s is also recommended to obtain reliable measures (1). Finally, the importance of an experienced tester who can optimize volitional effort of the participant by clear instruction and strong encouragement is essential (1). These latter two issues, although not always reported in the included studies, are highly recommended for attaining MIP.
Regardless of the reasons for the differences in MIP among studies, appropriate normative values are required by pulmonary function laboratories to provide a valid reference of respiratory function for an individual patient. Ideally, appropriate reference values should be obtained in the same laboratory from a sample that represents the demographic characteristics of the patients to be tested. From a practical perspective, such reference values are usually not established in many centres due to limited resources. In these situations, the data provided by our meta-analysis likely are the broadest representation of an ethnically diverse sample and are recommended for general use when patient referrals have considerable ethnic variation. Whether using these data or reference values from a single study, selection should be determined by ensuring close alliance with ATS/ERS methodology of MIP measures and that the participants' characteristics were similar to those of the patient of interest. Future larger scale studies are required to investigate other subject-related factors that influence MIPs, aside from age and sex. Until larger scale studies of similar magnitude to the reported 55,136 healthy subjects for spirometric reference values (45) are performed, confirmation of additional independent correlates of MIP (such as height, body mass index, fitness, smoking) is unlikely. These normative reference values are especially important to establish in elderly patients because the commonly accepted threshold to indicate inspiratory weakness is 80 cmH2O; this value is lower than the predicted mean values reported in several studies including the data synthesized in the present meta-analysis.

TABLE 1 Ratings of Quality Assessment of Diagnostic Accuracy Studies (QUADAS)
Author (reference), year; QUADAS rating; Total (of 7)
QUADAS items (36): 1. Was the spectrum of persons representative of the demographics of the patients who will receive the test in practice?; 2. Were selection criteria clearly described?; 3. Is the reference standard likely to correctly classify the target condition?; 9. Was the execution of the reference standard described in sufficient detail to permit its replication?; 12. Were the same clinical data available when test results were interpreted as would be available when the test is used in practice?; 13. Were uninterpretable/intermediate test results reported?; 14. Were withdrawals from the study explained?

TABLE 2
*Range of mean data shown for different subgroups. BMI Body mass index; F Female; FEV1 Forced expiratory volume in 1 s; Gp Group; M Male; NR Not reported

TABLE 3 Technical aspects that influence maximal inspiratory pressure (MIP)
Author (reference), year
5th % Fifth percentile of the negative residuals of maximal inspiratory pressure; F Female; FRC Functional residual capacity; LLN Lower limit of the normal range; M Male; Max Maximum; Min Minimum; n/a Not applicable; NR Not reported; Peak The highest value reached during a brief maximal effort; Plateau The highest pressure sustained for a defined minimum period; RV Residual volume; SEE Standard error of the estimate of the model
Smoking exemplifies another characteristic that could influence normative values if the participant had undetected lung disease.