How Has Intervention Fidelity Been Assessed in Smoking Cessation Interventions? A Systematic Review

Introduction Intervention fidelity concerns the degree to which interventions are implemented as intended. Fidelity frameworks propose fidelity is a multidimensional concept relevant at intervention designer, provider, and recipient levels; yet the extent to which it is assessed multidimensionally is unclear. Smoking cessation interventions are complex, including multiple components, often delivered over multiple sessions and/or at scale in clinical practice; this increases susceptibility variation in the fidelity with which they are delivered. This review examined the extent to which five dimensions from the Behaviour Change Consortium fidelity framework (design, training, delivery, receipt, and enactment) were assessed in fidelity assessments of smoking cessation interventions (randomised control trials (RCTs)). Methods Five electronic databases were searched using terms “smoking cessation,” “interventions,” “fidelity,” and “randomised control trials.” Eligible studies included RCTs of smoking cessation behavioural interventions, published post 2006 after publication of the framework, reporting assessment of fidelity. The data extraction form was structured around the framework, which specifies a number of items regarding assessment and reporting of each dimension. Data extraction included study characteristics, dimensions assessed, data collection, and analysis strategies. A score per dimension was calculated, indicating its presence. Results 55 studies were reviewed. There was a wide variability in data collection approaches used to assess fidelity. Fidelity of delivery was the most commonly assessed and linked to the intervention outcomes (73% of the studies). Fidelity of enactment scored the highest according to the framework (average of 92.7%), and fidelity of training scored the lowest (average of 37.1%). Only a quarter of studies linked fidelity data to outcomes (27%). Conclusion There is wide variability in methodological and analytical approaches that precludes comparison and synthesis. In order to realise the potential of fidelity investigations to increase scientific confidence in the interpretation of observed trial outcomes, studies should include analyses of the association between fidelity data and outcomes. Findings have highlighted recommendations for improving fidelity evaluations and reporting practices.


Introduction
Intervention fidelity is defined as the "methodological strategies to monitor and enhance the reliability and validity of behavioural interventions" [1] and the extent to which interventions are implemented as intended. The term fidelity is often used interchangeably with terms such as "quality assurance," "adherence," and "treatment integrity" [2]. There has been considerable investment into designing and evaluating the effectiveness of health behaviour change interventions [3], yet comparatively less into investigating how and why these interventions work to achieve intended outcomes. A recent Medical Research Council (MRC) process evaluation guidance [4] has emphasised the importance of investigating fidelity.
Assessing fidelity can inform intervention replication and scalability and is key to promoting research transparency and increasing scientific confidence in interpretation of outcomes [5]. Assessing fidelity can also identify provider training needs and aspects of intervention implementation that could be targeted for improvement in future adaptations [6]. Behaviour change interventions are often complex, comprising multiple components [4]. Behavioural interventions may also involve tailoring to meet the needs of the individuals taking part [7]. Health behaviour change interventions are an example of such complex interventions. In addition to multiple interacting components and tailoring, these interventions are often delivered in busy and unpredictable settings, by diverse intervention providers, to a wide range of recipients [8]. Combined, these factors increase susceptibility to variable fidelity by increasing the opportunities and avenues where intervention variability could potentially be introduced at the designer, provider, and recipient levels.
Fidelity itself is an equally complex concept [9]. There are numerous models of fidelity from various disciplines, which vary in their conceptualisation and proposed measurement of fidelity [10,11]. In recognition of this, the National Institutes of Health (NIH) Behaviour Change Consortium (BCC) synthesised numerous fidelity frameworks into an integrated fidelity framework [1,6]. This framework proposes that fidelity is relevant at the intervention designer, provider, and recipient levels, representing a fidelity pathway to outcomes. It proposes five fidelity dimensions ( Figure 1).
This framework recommends strategies for assessing, enhancing, and reporting each dimension. It is important to assess fidelity across all five, as a lack of fidelity to just one could detrimentally impact intervention outcomes. The framework consists of a checklist outlining key fidelity concepts in each of the five dimensions, which (when present in a study) ensure good fidelity. The checklist contains items which are fidelity assessment strategies (e.g., there is a plan for the assessment of whether or not the active ingredients were delivered) as well as fidelity enhancement strategies (e.g., use of treatment manual).
Despite the recognised importance of fidelity and the availability of guidance for assessing and reporting it, fidelity is not frequently investigated for complex interventions. A review of evaluations of intervention programmes published between 1980 and 1994 identified only 24% assessed fidelity [12]. Borrelli et al. [6] reviewed 342 health behaviour change interventions and identified 22% reported strategies for maintaining provider skills (linked to training), 27% reported checking adherence against the protocol (delivery), and 35% reported using treatment manuals to guide the intervention (delivery). 12% of the total studies reported using all three strategies for fidelity whilst 54% reported none. Fidelity assessments do not appear to be improving over time. A more recent review of 28 adult physical activity interventions [5] identified that delivery was the most commonly assessed fidelity domain. Similarly, a review of fidelity assessments in 65 physiotherapist delivered physical activity/exercise interventions identified only 40% of studies reporting on two or more of the fidelity domains [13]. Both reviews noted wide variation in methods used to assess fidelity and that fidelity of design was the least investigated domain.
Other reviews have focused on the assessment of specific fidelity dimensions. Rixon et al. [14] focused on fidelity of "receipt" amongst studies citing the use of the NIH BCC framework. A total of 33 studies were identified, with 19.6% addressing receipt and 12.1% including strategies to enhance receipt. Similarly, Walton et al. [15] reviewed the assessment of fidelity of "delivery" and engagement in 66 health behaviour change interventions, identifying that 32% measured engagement, 30% measured fidelity of delivery, and 36% measured both. They found similar numbers of studies used observational and self-report measures, but noted self-report measures have limitations and observational measures are recommended as the gold-standard measure. They also noted that objective measures such as intervention records were used but these do not measure the participant's understanding of the intervention. Objective measures (e.g., participants demonstrating the skills) were not used by any studies.
Fidelity assessment studies often do not explore association between fidelity and outcomes [6]. As such, there is limited data available on whether extent of fidelity has a positive or detrimental impact on outcomes. Available evidence suggests that greater fidelity is associated with improved outcomes [1]. There is also contrasting views on this matter, as others argue that adaptation and tailoring, which may result in loss of fidelity/standardisation, are important to effecting change.

Fidelity in Smoking Cessation Interventions.
Smoking cessation behavioural support typically involves offering advice, practical tips, and coping techniques aimed at helping people to cope with cessation and the withdrawal of nicotine. It also explains how to use smoking cessation medications effectively [16]. It can be delivered through a range of modalities (face to face, digital, telephone, and group). It is widely implemented in practice, where it is delivered across care settings (primary care, stop smoking services, and secondary care), at varying levels of intensity, by a wide range of care providers (e.g., doctors, nurses, trained advisors, and pharmacists) [17] to a wide range of patient and public groups. It is a prime example of a complex intervention that is arguably susceptible to variable delivery in practice. Smoking is a priority target behaviour for behaviour change interventions and a public health priority due to its role in reinforcing health inequalities (Public Health [18]).
Behavioural support for smoking cessation has been implemented widely in clinical practice. In the UK, smoking cessation support is nationally available via the NHS stop smoking service, which offers free weekly support, nicotine replacement therapies, and other medicated aids [19]. Smoking cessation services have been shown to be highly effective [20]. However, outcomes across services are highly variable [21]. Recent studies looking at fidelity in NHS stop smoking services have shown that on average, approximately half of the intervention that is specified in the manual is delivered by stop smoking practitioners [22], representing low fidelity. This has also been found to be similar within telephone smoking cessation behavioural support [23]. Reasons underpinning this are unclear. There are national guidelines outlining how these interventions should be delivered, as well as intervention manuals for individual services [24]. There is even evidence to show that practitioners within the same service operating under the same treatment manual can have variable success rates [25], raising the possibility that the interventions are potentially delivered with variable degrees of fidelity.
All five stages of the BCC framework can be linked to smoking cessation interventions; the intervention may be designed and training offered at central NHS level but the delivery, receipt, and enactment take place within each smoking cessation service. Fidelity of design is important to ensure "that a study adequately tests its hypotheses in relation to its underlying theoretical and clinical processes" [26]. In the context of smoking cessation behavioural support, the logic model and/or treatment manual should contain intervention components or behaviour change techniques (BCTs) linked to the theory it claims to be based on.
Fidelity of training refers to "standardising training between providers, ensuring that providers are trained to criterion, and monitoring and maintaining provider skills over time" [26]. For smoking cessation interventions, this looks at whether stop smoking advisors were trained consistently, using standardised procedures. Was their training/acquired competence assessed prior to intervention delivery (i.e., by role playing delivering a session to a client/smoker and assessing whether delivered according to manual/as intended)? Treatment delivery is defined as "treatment differentiation (did the providers only deliver the target treatment and not other treatments), treatment competency (did providers maintain the skill set learned in training), and treatment adherence (delivery of the treatment components as intended)" [26]. Within smoking cessation interventions, this looks at whether the intervention was delivered as intended/specified in manuals (e.g., audio-recording sessions).
Receipt refers to "whether or not the participant understood the treatment (as well as the accuracy of understanding) and demonstrates knowledge of, and ability to use, the skills or recommendations learned in treatment" [26]. This involves trying to ascertain whether the smoker understood during the session what they need to do prior to the quit attempt, e.g., prepare themselves by obtaining nicotine replacement therapy (NRT), removing cues to smoking from their house (such as ash trays), amongst others.
Enactment "involves assessment, monitoring, and improving the ability of participants to perform treatmentrelated behavioural skills and cognitive strategies in relevant real-life settings" [26]. In smoking cessation, this refers to whether smokers take the necessary steps to aid their quit attempt, e.g., use the NRT as directed. NIH was chosen as it is intended to synthesise other existing frameworks and is thus arguably comprehensive and represents a fidelity pathway to outcomes.
Reviews of other specific behaviours have been conducted, but there has been no review to date investigating fidelity assessments of smoking cessation interventions and the methods used in cases where it has been assessed. This is crucial given that research shows smoking cessation interventions are often delivered with variable degrees of fidelity [22].

Research Aims and
Objectives. This review is aimed at investigating how fidelity has been assessed in studies that claim to investigate it. Specifically, the review looked at how the five dimensions of the NIH BCC fidelity framework (design, training, delivery, receipt, and enactment) have been assessed and reported in RCTs of smoking cessation behavioural support interventions. Fidelity is important in aiding interpretation of trial outcomes. There has been no review to date investigating fidelity assessment within smoking cessation interventions. This has been broken down further into the following: (1) Which dimensions of the framework have been reported and assessed?
(2) What methodological approaches have been used to collect and analyse fidelity data for each dimension?
(3) To what extent were associations between fidelity outcomes and intervention outcomes (e.g., smoking cessation) investigated?

Methods
The review was conducted in accordance with the PRISMA systematic review guidelines [27]. Intervention. Only smoking cessation behavioural interventions were included. This included smoking in terms of tobacco cigarettes, shisha or water pipe smoking, and secondhand smoke. Interventions focusing solely on cannabis smoking were excluded. Studies with a behavioural support element with a human interaction component (i.e., face-toface or telephone interactions) were included. Interventions solely featuring a distant modality of interaction (such as web/app-based materials) or pharmacological interventions (e.g., Champix) were excluded due to the absence of a human interaction component. Studies delivered with a provider and recipient interaction element (i.e., face to face, via telephone) have more scope for variability in fidelity across all five dimensions, particularly delivery, receipt, and enactment. In comparison, for digital interventions (where fidelity of delivery may be standardised), the variation lies in receipt and enactment. Intervention fidelity is still relevant to digital, but due to feasibility constraints, the review focused solely on those involving interaction between provider and recipient (face to face or telephone), as this is where there is the greatest scope for variation in fidelity and greater relevance of the NIH BCC dimensions.

Study Design.
The studies were required to report an assessment of intervention fidelity data, either mentioned in the abstract or assessed in the full text if it was unclear from the abstract. The BCC fidelity framework was designed for fidelity assessments of cluster/RCT designs. Therefore, only RCTs comparing the intervention against a control (i.e., no intervention, standard practice, and another intervention) were eligible. Studies that involved training staff to deliver smoking cessation interventions but reported no participant outcomes were also excluded as they did not detail the actual intervention delivery.
Studies published in English, in peer-reviewed journals, were included. Study protocols were included, as the primary interest of the review is methods. Research/conference abstracts were excluded. Only studies published post 2006 were included, following publication of the BCC framework in 2005.

Search Strategy.
In December 2018, five databases were searched electronically: MEDLINE, EMBASE, Ovid Nursing Full Text Plus, CINAHL, and PsycINFO.
The search strategy included terms related to fidelity, smoking, cessation, and intervention (Table 1). Search terms were informed by previous systematic reviews of intervention fidelity [28] and Cochrane reviews on smoking cessation [29][30][31]. Terms for RCTs were adapted from Coppo et al. [32]. Terms within each category were combined using "OR" (i.e., smoking terms were combined as "smok * " OR "tobacco"). These individual search strings were combined with "AND".
The search strategy was validated by conducting an initial search and checking whether it retrieved a criterion paper identified during the scoping search (study 8).

Study Selection.
Following deduplication, remaining entries were screened by the primary researcher (SB) at title and abstract level against the inclusion and exclusion criteria. The full text was screened for unclear studies and those fulfilling the inclusion criteria. For interrater reliability, the second researcher (FL) screened 10% of the studies at the abstract level and percentage agreement was calculated.

Data Extraction.
A data extraction form was developed featuring five main sections: study characteristics, fidelity definitions, dimensions assessed, data collection, and analysis strategies. Study characteristics included the research question/aim, design, participant details, and the results/conclusion summary. Data was also extracted on intervention mode of delivery, intervention providers, and theoretical basis (use of theory/models of fidelity and/or behaviour change).
The assessment of fidelity incorporated the BCC checklist from Bellg et al. [1] and looked at each of the five dimensions (design, training, delivery, receipt, and enactment) in further detail (see Supplementary Table 3). For example, fidelity of enactment includes two subcomponents looking at the assessment of participant performing intervention skills and strategies used to do this. The studies were coded for both enhancement and assessment strategies.
The BCC guidance recommends using this as a checklist for scoring fidelity assessments (e.g., [33]). Each component should be rated as present, absent but should be present, or not applicable. Absent but should be present was defined as occurrences where the "treatment fidelity information was inappropriately omitted, preventing the coder from being able to accurately assess the scientific validity of the article" [6]. Categories were considered not applicable when "the particular treatment fidelity strategy was not applicable to the study in question," e.g., studies that indicated abstinence would be assessed long term rather than during the intervention period (as indicated in the checklist) were marked as not applicable [6].
A percentage fidelity score was calculated by taking the number of fidelity strategies reported divided by the total number of applicable strategies for each dimension (e.g., a study with three of the four applicable delivery components present/reported would score 75% for delivery fidelity). Studies received 0 if the dimension was not assessed. Fidelity scores were classified as low (less than 50%), medium (51 to 79%), and high (80 to 100%) based on published criteria [26]. An overall fidelity score was also calculated (number of components present in all five dimensions divided by the total number of applicable components across all five dimensions).
Data extraction included the methods for collecting data fidelity (e.g., audio/video taping and provider checklists), the methods for analysing this data, and the methods for investigating the statistical association between fidelity and outcomes. For example, delivery studies may have reported the percentage of components delivered compared to the manual.
The data extraction form was piloted on the criterion paper (study 6) and amended as necessary. Full data extraction for all studies was completed by the primary researcher (SB). The interrater reliability was calculated for 4% of the total studies, and agreement was defined as both researchers agreeing on whether a particular aspect of the data extraction was present, e.g., whether the study provided information about treatment dose.

Data Synthesis.
Where appropriate, quantitative data were summarised using descriptive statistics. Fidelity data analysis methods were summarised and described using narrative synthesis. Five studies were removed (manual search for duplicates (n = 4) and one study was not original research (n = 1)). All 511 papers were screened at the abstract level. A second reviewer (FL) screened 7% of the 511 papers (n = 35), with 74% agreement between the two researchers. Any discrepancies were discussed until agreement was reached. 124 studies fit the eligibility criteria or were unclear and required further screening. Twenty-six systematic review papers were removed as they were not primary research.

Results
The remaining 98 papers were screened at the full-text level, and 50 studies were excluded as they were not primary research. One study (study 11) described the findings from seven individual studies conducted within hospitals, which were treated individually. A final sample of 55 studies was included.

Which Fidelity Dimensions Were Reported and Assessed (RQ1)?
The studies were assessed for fidelity in each of the five dimensions of the framework (design, training, delivery, receipt, and enactment). The percentages are the proportion of NIH BCC framework components listed for each of the five fidelity dimensions that are reported in individual papers. Table 3 shows each fidelity dimension in rank order for average fidelity score across studies, how many components it contains, and the most and least reported component. The reporting of the fidelity subcomponents within each dimension was collated in full (Supplementary Table 2). The table shows enactment had the highest average fidelity score (92.7%) and training had the lowest (37.1%), indicating that enactment had the highest number of framework components present in studies conducting fidelity assessments. Table 4 shows the overall level of observed fidelity across all five dimensions in each study, ranked from the lowest to highest scoring. The average fidelity score in terms of

How Has Fidelity Been Assessed?
3.4.1. Use of Theory and Frameworks. Almost half the studies (n = 24, 43%) did not cite the use of a theoretical framework (see Table 5 for summary). Motivational interviewing (MI) and Motivational Interviewing Treatment Integrity (MITI) scale were most commonly used in intervention design and to assess fidelity (MI n = 14, 25%; MITI n = 7, 12%). One study (5) used the RE AIM framework in addition to MITI, and another (18) used the fidelity protocol implementation index to assess fidelity. No studies cited the use of the BCC framework.

Data Collection
Methods. Fidelity data collection methods were not always reported, and where information was provided, methods were found to vary across and within fidelity dimensions. The most commonly used methods are shown in Table 6 (full data collection methods detailed in Supplementary Table 4). Many studies used multiple methods. Audiotaping and provider/participant self-reports were the most commonly used methods across all dimensions.

Groups
Where Fidelity Was Assessed. The majority of studies assessed fidelity in the intervention group only (n = 37, 67%). The remaining studies assessed fidelity in both the intervention and control groups (n = 16, 29%). Almost half of the studies did not specify the proportion of their sample that fidelity was assessed in (n = 27, 49%). The remaining studies varied from 10% (n = 4, 7%) to 100% of the sample (n = 12, 22%) (average 55%). The study references are shown in Table 7, and full fidelity assessment data is shown in Supplementary Table 4.

Measurement Time Points.
The majority of studies measured fidelity at multiple time points (n = 20, 36%). 16 studies measured it at the end of the intervention (29%), eight during the intervention (15%), and one before and during the intervention (2%, study 5). It was not specified in the studies which fidelity dimensions were assessed at which time points. The majority of studies were unclear about the number of times fidelity was measured (n = 46, 84%). Four studies assessed fidelity on an ongoing basis (10%), eight once during the study (15%), and one assessed fidelity five times (2%, study 27). The study references are shown in Table 8, and full fidelity assessment data is shown in Supplementary Table 4. 3.4.5. Fidelity Sampling Method. The majority of studies were unclear about their participant sampling method for the fidelity assessment (n = 32, 58%). The remaining studies either used purposive sampling (n = 2, 5%), random sampling (n = 8, 14%), or included the whole sample (n = 13, 24%). Two studies specified they were assessing fidelity amongst intervention providers in the sessions delivered, and two studies specified assessing fidelity in participant groups receiving the intervention. The study references are shown in Table 9, and full fidelity assessment data is shown in Supplementary Table 4.

How Was Fidelity Data Collected and Analysed (RQ2)?
Data was extracted on whether the studies assessed and reported findings of fidelity assessments for each dimension of the framework (Supplementary Table 5). Seven studies included were protocols, and so it was not possible to extract the reporting details (3, 5, 6, 8, 25, 30, and 43). These protocols may not have linked fidelity to outcomes because they did not have results as yet. For design, only one study (2%, 6) assessed design but did not report the findings. With regard to the approaches to fidelity analysis, the studies varied widely (Supplementary Tables 5 and 6). Methods for assessing and reporting the findings were described for each dimension as follows: design n = 1 study (1.8%), training n = 8 studies (14.5%), delivery n = 35 studies (63.6%), receipt n = 31 studies (56.4%), and enactment n = 27 studies (49.1%).
The majority of the studies did not analyse the fidelity data in any of the dimensions as they instead reported on strategies to enhance fidelity. For example, for training, two studies (16,38) reported that the providers were continually trained and practised until they were able to deliver a session with fidelity. However, the assessment methods for this were not specified. In another study (37), the nurses (providers) were required to engage in role play to gain certification in the intervention procedures (fidelity of training). Finally, one study (43) provided feedback on recorded consultations. However, the nature of the feedback was unclear and it was unclear what they did with the role play data.
The delivery of the intervention was most commonly assessed for fidelity and the findings reported. The data was typically analysed by a staff member observing delivery of the intervention and completing a checklist of the components delivered or a self-reported checklist by intervention practitioners. The checklists were compared against intervention protocol, and scores were calculated to show delivery as outlined in the intervention protocol, e.g., using MITI checklists (studies 2, 5, 6, 9, 12, 14, 17, 18, 26-28, 30, 32, 34-39, 43-48, and 55). For receipt, participants were asked to complete questionnaires assessing usage and utility of intervention components in relation to their views on quitting smoking, for example (26). Questionnaires were also a commonly used method to assess enactment, e.g., using questionnaires to assess smoking reduction/quit behaviour and compare this to the fidelity of the intervention received (32). Design was rarely assessed or reported. Additional detail on the precise ratings/scoring of questionnaires was not reported for receipt or enactment data. It was unclear how the questionnaire data was used.

To What Extent Were Associations between Fidelity Outcomes and Intervention Outcomes Investigated (RQ3)?
All the studies did not report whether they had assessed reliability or validity. The majority of studies did not report whether they examined if there was an association between fidelity and study outcomes (n = 40, 73%  19, 20, 23, 27, 36, 40, 41, 43, and 49). Some studies explored the relationship between fidelity of delivery and a range of outcomes; one study looked at practitioners use of intervention materials in relation to quitting (43), another looked at practitioners' adherence to MI and smoking status (47), another assessed the predictive value of clients' characteris-tics on the practitioner's MI adherence (46), and another compared intervention conditions to study outcomes (55). The statistical analysis methodology used was not clearly stated in any of these studies, except in one study where odds ratios were calculated for reducing/quitting smoking and compared to the intervention delivery fidelity participants received from the practitioner (32). The studies reported greater positive outcomes with greater fidelity outcomes.

Discussion
This review is aimed at investigating how fidelity studies of smoking cessation behavioural support trials have assessed fidelity according to the five dimensions and recommendations of the NIH BCC fidelity framework (design, training, delivery, receipt, and enactment). The review looked at the methodology and analysis approaches used by studies to assess fidelity data and draw associations with the intervention outcomes. 55 studies were reviewed, and all the studies had low or medium overall fidelity in terms of the proportion of components recommended by the NIH BCC framework that were reported and assessed. The researchers discussed discrepancies on studies outlining training staff to deliver smoking cessation interventions but reported no participant outcomes and decided to exclude them as they were not RCTs. Participant performance of the intervention skills will be assessed in settings in which the intervention might be applied (n = 53, 96%) A strategy will be used to assess performance of the intervention skills in settings in which the intervention might be applied) (n = 51, 93%) Design 56.1% (5-95%) 7 Information about the treatment dose in the intervention condition (n = 51, 93%) Plans to address possible setbacks in implementation (i.e., backup systems or providers) (n = 13, 24%) The participants' ability to perform the intervention skills being assessed during the intervention period (n = 50, 91%) Multicultural factors considered in the development and delivery of the intervention (e.g., provided in native language; protocol is consistent with the values of the target group) (n = 7, 13%) Delivery 44.5% (0-77%) 9 The method to ensure that the content of the intervention is delivered as specified (n = 47, 85%) Whether there was a plan for the assessment of whether or not proscribed components were delivered (e.g., components that are unnecessary or unhelpful) (n = 1, 2%) Description of how providers will be trained (n = 42, 76%) Presence of a training plan that takes into account trainees' different education and experience and learning styles (n = 1, 2%) Average fidelity score refers to the presence of framework components.

Journal of Smoking Cessation
Enactment had the highest average fidelity score, and delivery was the most assessed and reported of the dimensions, with study staff completing checklists of the intervention components delivered and comparing it to the intervention protocol. This indicates that enactment had the highest number of components present but delivery was most commonly assessed and reported in studies conducting fidelity assessments. However, this could be due to the fact that outcome measurements for smoking cessation trials confound enactment, as the act of smoking cessation is enactment. As such, it would not necessarily be appropriate to explore the association between enactment and outcomes, as they are the same thing in this respect.
The main limitations of how fidelity is currently assessed for smoking cessation behavioural support trials are not investigating fidelity in both intervention and control arm. The studies stated that fidelity was assessed in the "intervention group only" or "intervention and control group," so it was not possible to ascertain the exact nature of the fidelity assessment. This is a missed opportunity to explore contamination and treatment differentiation between trial arms. Another limitation is that none of the studies specified the sample used for conducting the fidelity assessment. The five dimensions of the framework each relate to a different part of the intervention; design, training, and delivery relate to the intervention providers, and receipt and enactment relate to the recipients. The vast majority of studies did not report whether there was an association between fidelity and study outcomes, indicating that when fidelity is measured it is not being interpreted with regard to the effect it has on study outcomes (i.e., helping participants to reduce/quit smoking). It is possible that the association was assessed but not reported.

Implications of Findings.
The fidelity of delivery analysis may have been the most commonly assessed because of the existing emphasis on this dimension in the broader fidelity literature [34] and the often highlighted importance of this dimension in particular. It has been argued that assessing fidelity within intervention delivery is key to furthering understanding of the relationship between the intervention, the process, and the outcomes [34]. If an intervention is   10 Journal of Smoking Cessation deemed to be ineffective at producing the results intended, the initial response may be to attribute that to a poorly designed intervention. However, the results may actually be due to a poor fit between the intervention design and delivery and the only way to deduce this is through assessing fidelity, known as a "type III error" [35], which refers to the concept of falsely dismissing a potentially effective intervention. However, Borrelli et al. [6] would argue that it is important to assess fidelity from all aspects of the intervention process in order to highlight how loss of fidelity at the designer, provider, and/or recipient level can each impact on outcomes in different ways. This is required to assess whether an intervention is effective and if not, which aspects of it may be contributing to its ineffectiveness. Fidelity assessments in smoking cessation are largely unidimensional and often focus on investigating a single dimension of delivery, echoing findings from other systematic reviews of fidelity assessments (i.e., O'Shea et al. [13] and Lambert et al. [5]). When considering smoking behavioural support in particular, the healthcare professional is required to deliver a host of techniques so fidelity of delivery is arguably very important. However, whether or not these behaviour change techniques lead to behaviour change relies on various factors, namely, the client/smoker understanding these techniques and what they need to do. They must understand when their quit date will be (goal setting) and what they need to do to prepare for it (action planning) and then implement this (i.e., enactment, such as removing all cues to smoking such as ash trays and lighters (environmental restructuring) and obtaining their medication and starting it (pharmacological support)). If recipient level fidelity dimensions are not explored, then it is challenging to fully understand how and whether the intervention worked as intended.
The current review resonates with findings from other fidelity reviews in that all the studies had low or medium observed levels of fidelity [6,12]. However, the present review differs in other respects. In Borrelli et al. [6] review, 27% of the studies reported checking study adherence against protocol. In this review, a higher percentage (62%) reported using a checklist to assess whether the intervention components were delivered as intended. This increase could be attributed to the nature of the studies. Borrelli et al. [6] looked at health behaviour change interventions overall (including multiple behaviours) whilst this review focused on a single behaviour of smoking cessation.
Other reviews focusing on specific aspects of the framework yielded similar results to this review. Rixon et al. [14] found fidelity of receipt was reported infrequently, whilst Walton et al. [15] looked at delivery and engagement with health behaviour change interventions and found it was most commonly measured. They also noted observational measures were the gold-standard methodology but that most studies used audiotaping and self-report questionnaires, as was the case in this review.
Many studies also reported fidelity enhancement strategies (such as regular supervision with practitioners) rather than fidelity assessment methods across all dimensions. The framework contains both fidelity enhancement and assessment items, and studies will vary in employing both. It is important to distinguish between the two strategies in order to understand whether the studies are assessing fidelity or assessing strategies that may lead to an increase in fidelity. Studies appear to be trying to enhance fidelity by considering fidelity at the study planning stages. However, this logically follows that fidelity should subsequently be assessed and reported to ascertain if the enhancement strategies increase fidelity but this review shows this is not routinely done.
None of the studies cited a fidelity theory or framework. This may in part help explain the limitations of current fidelity assessments for smoking cessation behavioural support, as they are not drawing on the best available guidance and conceptualisations of fidelity. In a study of 264 participants looking at barriers to researchers carrying out fidelity assessments, it was found that 89% indicated that fidelity is important. The majority of participants (68%) identified using strategies to assess fidelity (e.g., recording sessions) and enhancing fidelity (e.g., training manual), but only 30.9% indicated they reported these strategies within publications, noting the most common reason as being poor knowledge or understanding (77.4%). This indicates that researchers may have good awareness of the importance of intervention fidelity but poor knowledge and understanding is a barrier to addressing these in complex intervention trials [36].

Implications for Research.
The key implications emerging from this review for consideration by intervention developers and researchers are that they should assess and report the following: (1) Details of fidelity in all stages of the intervention, from initial design through to participant enactment. This review showed most studies solely focused on delivery fidelity. Whilst this is important, the other aspects are equally important for assessing fidelity and linked outcomes (2) Fidelity assessment processes, such as where the fidelity sample has been drawn from. This review shows that fidelity assessment and measurement is not routinely reported and synthesising the evidence is difficult. Others planning or conducting fidelity assessments for similar studies cannot learn from existing evidence and methods and replicate. Inclusion of this would provide a clearer picture of fidelity in different stages of the overall process. Fidelity is included as an item in reporting guidelines (i.e., TIDiER, [37]) (3) How fidelity measures and outcomes are linked. Only a small proportion of studies reported using checklists to score delivery, comparing this against the intervention protocol to check delivery was as intended and statistically assessed fidelity measures with outcome measures. The purpose of assessing fidelity is to aid the interpretation of the outcomes, and omitting this is a missed opportunity. It is also important to compare the intervention and control arms to look at treatment differentiation and contamination. This is particularly important in pragmatic trials evaluating interventions against usual/standard care which may have an active control arm (i.e., some behaviour change techniques) in order to maximise internal validity of trial and interpretation of outcomes. The majority of studies measured fidelity at multiple time points but did not specify which fidelity dimensions were assessed at which time points. This would have been an opportunity to explore fidelity drift/loss of sustainability within studies. Following similarly clear methodology from delivery across all dimensions and reporting the results demonstrate the correlation between fidelity levels and intervention outcomes (4) The difference between fidelity assessment and enhancement strategies. This review shows many studies are using the latter and thus trying to maximise fidelity during the trial. Whilst this is vital and should be planned for, it is important to also follow through and explore whether or not fidelity is maintained. A separate measure of fidelity is necessary and should logically follow to link fidelity to intervention outcomes 4.3. Strengths and Limitations. This study has a number of strengths and limitations. The use of the BCC framework has allowed for a comprehensive review of the literature using a framework that has unified previous fidelity models. However, one limitation is the use of only published articles. Smoking cessation interventions are designed and delivered in a wide variety of settings and may well be evaluated and assessed for fidelity. They may also be assessed for cost-effectiveness to ascertain feasibility in a local context. These may be published as evaluation or programme reports, which are excluded from reviews of this nature.
Furthermore, it is possible that the studies used strategies to enhance fidelity and/or assessed fidelity but did not report it. This study is aimed at looking at what authors report in fidelity assessments. The studies reviewed varied greatly in their description of the interventions. This could be overcome by contacting study authors for further information or to understand whether they explored fidelity but did not report. However, this was beyond the scope of feasibility in the present study.
One potential limitation of this study is the use of the BCC framework to guide data extraction and analysis. There are multiple frameworks of fidelity that differ in how fidelity is defined and/or guidance for measurement (e.g., [10,11]),   Specified assessing fidelity in participant groups receiving the intervention (n = 2, 5%) Studies 43 and 44 13 Journal of Smoking Cessation but the current review highlights the utility of the BCC framework for assessing fidelity. NIH was chosen as it is intended to synthesise other existing frameworks and is thus arguably comprehensive and represents a fidelity pathway to outcomes. Interrater reliability was also assessed in a very small percentage of studies (4%), decreasing possible reliability.
Furthermore, the five dimensions each have varying numbers of subcomponents within them. It is easier to score higher if there are fewer components, as the overall percentage increases quicker. Future research could look at weighting the components to allow for a more equal comparison.

Future
Research. Future research could benefit from focusing and addressing the implications above to ensure more accurate reporting of fidelity and subsequently a more accurate interpretation of the effectiveness of an intervention. Further research is needed to identify whether the dimensions within the framework differ in their importance and effect on intervention outcomes. For example, perhaps delivery fidelity was most commonly reported due to the ease of assessing and reporting this dimension compared to others. This could potentially be used to offer guidance on how to measure and report fidelity for interventions where a comprehensive fidelity assessment using the framework may not be feasible, such as in local authority public health settings.

4.5.
Conclusions. The review looked at which dimensions of the framework have been reported and assessed. It highlighted that fidelity evaluations in smoking cessation behavioural support interventions are not comprehensively exploring fidelity at the intervention designer, provider, and recipient levels. Providers focus predominantly on assessing fidelity of delivery and enactment. There is wide variability in methodological and analytical approaches that precludes comparison and synthesis across studies. Many studies reported assessing numerous components of fidelity; however, the findings/results were not reported in turn. This represents a waste of research effort and lack of transparency. Findings have highlighted recommendations for improving fidelity evaluations and reporting practices, such as ensuring studies are using fidelity assessments to aid interpretation of the outcomes.

Data Availability
Data is available as supplementary data tables.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Supplementary Materials
Supplementary