Qualitative Evaluation of the Project P.A.T.H.S.: An Integration of Findings Based on Program Implementers

An integration of the qualitative evaluation findings collected from program implementers conducting the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in different years (n = 177 participants in 36 focus groups) was carried out. General qualitative data analyses utilizing intra- and interrater reliability techniques were performed. Results showed that the descriptors used to describe the program and the metaphors named by the informants that could stand for the program were generally positive in nature. Program implementers also perceived the program to be beneficial to the development of the students in different psychosocial domains. The present study further supports the effectiveness of the Tier 1 Program of the Project P.A.T.H.S. in Hong Kong based on the perspective of the program implementers.


Introduction
In the process of program evaluation, understanding the client's perspective is usually the primary focus. One example is the use of the client satisfaction approach in capturing the views of the program participants. Comparatively speaking, the viewpoint of the program implementers about the program is not adequately explored in the evaluation literature [1]. There are several justifications for including the views of the program implementers. First, as pointed out by Peterson and Esbensen [2], "personnel, consciously or unconsciously, influence the effectiveness of prevention program, it is important to assess their perceptions when evaluating a specific program to provide insight into the context in which the program operates" (page 219). Second, according to utilization-focused evaluation [3], in order "to achieve more reliable and valid evaluations, a number of data sources and perspectives should be combined" ([4]; page 1225), and the program implementers are among the stakeholders who should be involved in the evaluation process. Third, with reference to the principle of triangulation, evaluation data based on various sources can help to cross-check program effectiveness across the data collected from different sources and help to paint a full picture of program effects. Fourth, because the program implementers have professional training and experience, they may give a good assessment of program effectiveness. Fifth, inclusion of the program implementers' views in judging the program and their own performance can give them a sense of respect and fairness and avoid biases that arise from evaluation data based on the clients only. Finally, evaluation that includes questions that ask the program implementers about program implementation and their own performance can facilitate their reflective practice, which enhances professional growth and development [5,6].
The proposal to evaluate the view of the program implementers is also highlighted in the existing evaluation frameworks. Although different evaluation emphases exist in the evaluation literature in the international context, there are common evaluation frameworks and standards that are maintained by researchers in the mainstream scientific community. For example, the Centers for Disease Control and Prevention [7] suggested a comprehensive framework for program evaluation in public health, in which the engagement of the stakeholders is an important step. Similar focus can be seen in other evaluation frameworks, such as the What Works Clearinghouse [8] in the context of education. Regarding evaluation standards, the Joint Committee on Standards for Educational Evaluation [9] proposed several areas of evaluation criteria in different domains. In the above evaluation frameworks, engagement of the program implementers in the evaluation process is an indispensable step.
Although the experimental/quantitative approach is the dominant approach in the field and it is commonly regarded as the gold standard, it is not the only option, and there are alternate approaches. For example, according to Patton [3], quantitative evaluation (thesis), qualitative evaluation (antithesis), and utilization-focused evaluation (synthesis) are different approaches to evaluation. There are growing efforts to carry out qualitative evaluation, in which subjective viewpoints, qualitative data, and nonartificiality in the data collection process are emphasized.
How can the views of program implementers be assessed? There are different ways to capture the views of the program implementers. For example, rating scales or single-item open-ended questions are used to understand the viewpoints of the program implementers in subjective outcome evaluation. Although qualitative subjective outcome evaluation is useful, its reliance on open-ended questions in paper form leads to a lack of contact between the implementers and researchers. Therefore, it would be desirable to use other means, such as in-depth interviews and/or focus groups, to collect qualitative data.
Reviews of the literature show that there is a remarkable surge of interest in using focus groups in program evaluation in western countries. For example, Nabors and colleagues [10,11] used focus groups for an assessment of program needs, strengths, and weaknesses, and to gain ideas for future program development. However, little has been documented about the use of focus groups in program evaluation in the Asian context. Twinn [12] criticized that "focus groups appear to have been used quite extensively with populations of black and Hispanic ethnic origins" (page 655) because this methodology was originally developed for Anglo-Celtic populations [12].
The focus group method has been used successfully to assess client satisfaction and quality assurance in a variety of fields. It has also become a popular method in program evaluation in many research contexts, such as health settings [13,14]. Focus groups offer many potential advantages, such as being cost and time effective in collecting information. Morgan [15] noted that a focus group of eight people may generate more ideas than eight individual interviews. Clearly, the strength of the focus group method is that it brings clients together to discuss their perceptions about the services that they have received. This allows for interaction between group members, which stimulates thoughts and recall of experiences.
Focus groups can be particularly helpful for the discovery of service problems and suggestions for fixing those problems [16]. Moreover, the data drawn from focus group interviews can be compared with data gathered from other research methods, that is, focus groups can be used for triangulation [17]. Along the same line, Conners and Franklin [18] provide a strong argument for the use of a qualitative methodology. They stressed that qualitative methodologies may address some concerns about surveys that result in inflated satisfaction scores, as clients are more critical when qualitative methodologies are used, and they have more freedom to express their concerns about all aspects of care in a way that is impossible with many studies. Therefore, qualitative methods are invaluable in providing depth to the exploration of people's satisfaction that is not possible with quantitative surveys. As Merriam [19] stressed, "the product of a qualitative study is richly descriptive" (page 8). As such, qualitative evaluation via focus groups is an important strategy to capture the views of the program implementers.
In the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes), the Tier 1 Program is a universal positive youth development program provided for Secondary 1 to 3 students in Hong Kong. There were 52 schools that joined the experimental implementation phase (2005-2008) and more than 200 schools that joined the full implementation phase (2006-2009). Several studies have already documented the positive program effects based on the students' objective and subjective outcomes collected from survey questionnaires [20-22]. Qualitative evaluation has also been conducted in order to understand the program effects of the Project P.A.T.H.S. in Hong Kong based on the perspective of the program participants [23,24]. The related findings were integrated and presented in another paper by Shek and Sun in this special issue. On the other hand, qualitative evaluation based on focus group methodology has been carried out in order to understand the views of the program implementers [25,26]. Again, it would be illuminating if an integration of the existing qualitative studies based on the program implementers could be carried out. Thus, the present study attempted to integrate the existing qualitative evaluation findings based on the perspective of the program implementers in the experimental and full implementation phases of the Project P.A.T.H.S. in Hong Kong.

Methods
... the core program. Thirty-six focus groups consisting of 138 teachers and 39 social workers in total were conducted. The average number of classes per school was 4.83 (range: 3-6), and the average number of respondents per school was 5.11 (range: 1-14). The characteristics of the schools joining this process evaluation study can be seen in Table 1.
As data collection and analyses in qualitative research are very labor intensive, it is the usual practice that small samples are used. In the present context, the number of focus groups and instructor participants can be regarded as respectable. In addition, the strategy of randomly selecting informants and schools that joined the Tier 1 Program can help to enhance the generalizability of the findings. An interview guide (Table 2) was used for conducting focus group interviews with instructors. The interview questions were designed with reference to the CIPP (context, input, process, product) model and previous research [25,26].
A total of 36 focus groups designed to elicit implementers' perceptions of the Project P.A.T.H.S. were conducted. All focus group interviews were jointly conducted by two trained colleagues. During the interviews, the respondents were encouraged to verbalize their views and perceptions of the program. In the interviews, the interviewers adopted the role of facilitators and were conscious of being open to accommodate both positive and negative experiences expressed by the informants. As the interviewers either had training in social group work and/or substantial group work experience, they were conscious of the importance of encouraging the informants to express views of a different nature, including both positive and negative views. The interviews were audio recorded, with the respondents' consent. The audio recordings were then fully transcribed and checked for accuracy.
The data were analyzed by two trained research assistants. After initial coding, the positivity of each code was determined, with four possibilities (positive code, negative code, neutral code, and undecided code). The coding and categorization were further cross-checked by another trained research assistant. To enhance the reliability of the coding on the positivity of the raw codes, both intra- and interrater reliability checks were carried out. For intrarater reliability, two research assistants who had been involved in the coding individually coded 20 randomly selected responses for each question. For interrater reliability, another two research assistants who had not been involved in the data collection and analyses coded 20 randomly selected responses for each question at the end of the scoring process, with reference to the finalized codes and without knowing the original codes given.
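The agreement percentages reported for these reliability checks can be illustrated with a short sketch. The paper does not state the exact formula, so this assumes simple percent agreement (the share of sampled responses given the same positivity code in both rounds); the function name and the 20 sample codes below are hypothetical and are not study data.

```python
def percent_agreement(first_codes, second_codes):
    """Percentage of items given the same code in two coding rounds.

    Assumption: simple percent agreement; the paper does not specify
    its formula, and no chance correction (e.g., kappa) is applied.
    """
    if len(first_codes) != len(second_codes):
        raise ValueError("Both rounds must code the same number of items")
    if not first_codes:
        raise ValueError("At least one coded item is required")
    matches = sum(a == b for a, b in zip(first_codes, second_codes))
    return 100.0 * matches / len(first_codes)


# Hypothetical positivity codes for 20 randomly selected responses.
original = ["positive"] * 12 + ["negative"] * 4 + ["neutral"] * 3 + ["undecided"]
recoded = list(original)
recoded[0] = "neutral"     # one disagreement on a positive code
recoded[19] = "negative"   # one disagreement on the undecided code

agreement = percent_agreement(original, recoded)
print(f"Agreement: {agreement:.1f}%")
```

With 20 sampled responses, each disagreement lowers the agreement figure by 5 percentage points, which is consistent with the granularity of the ranges reported in the Results.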
In qualitative research, it is important to consider ideological biases and preoccupations of the researchers. As program developers, the author might have the preoccupation that the implemented program was good and it was beneficial to the students. Additionally, the researchers might have the tendency to focus on positive evidence rather than negative evidence. Thus, several safeguards against the subtle influence of such ideological biases and preoccupations were included in the present study. To begin with, the researchers were conscious of the existence of ideological preoccupations (e.g., positive youth development programs are beneficial to adolescents) and conducted data collection and analyses in a disciplined manner. Second, both inter- and intrarater reliability checks on the coding were carried out. Third, multiple researchers and research assistants were involved in the data collection and analysis processes. Fourth, the author was conscious of the importance and development of audit trails. The audio files, transcriptions, and steps involved in the development of the coding system were properly documented and systematically organized.

Table 2: Interview guide (recoverable portion).
(i) Do you think that the program can promote students' self-confidence/ability to face the future?
(ii) Do you think that the program can enhance students' abilities in different areas?
Optional Questions
(iii) Do you think that the program can enhance students' spirituality aspect?
(iv) Do you think that the program can promote the students' bonding with family, teachers, and friends?
(v) Do you think that the program can establish students' compassion and care for others?
(vi) Do you think that the program can promote students' participation and care for society?
(vii) Do you think that the program can promote students' sense of responsibility to society, family, teachers, and peers?

Results
In this paper, qualitative findings on the following three areas are presented: (1) descriptors that were used by the informants to describe the program, (2) metaphors (i.e., incidents, objects, or feelings) that were used by the informants to depict the program, and (3) implementers' perceptions of the benefits of the program to students. For the descriptors used by the informants to describe the program, there were 270 raw descriptors that could be further categorized into 133 categories (Table 3). Among these descriptors, 169 (62.6%) were coded as positive and 7% were classified as neutral in nature. In order to examine the reliability of the coding, two research assistants who did the coding of raw data recoded 20 randomly selected raw descriptors at the end of the scoring process, and the average intrarater agreement percentage calculated on the positivity of the coding from these descriptors was 92% (range: 80-100%). Finally, these 20 randomly selected descriptors were coded by another two research staff members who did not know the original codes given, and the average interrater agreement percentage calculated on the positivity of the coding was 88.5% (range: 80-95%).
For the metaphors that were used by the informants that could stand for the program, there were 72 raw objects involving 128 related attributes (Table 4). Results showed that 40 metaphors (55.6%) and 65 related attributes (50.8%) were classified as positive in nature, while 26 metaphors (36.1%) and 47 related attributes (36.7%) were regarded as neutral responses. Reliability tests showed that the average intrarater agreement percentage calculated on the positivity of the coding from these metaphors was 89% (range: 80-100%), whereas the average interrater agreement percentage calculated on the positivity of the coding was 91% (range: 80-100%).
The perceived benefits of the program to the program participants are shown in Table 5. There were 518 meaningful responses decoded from the raw data that could be categorized into several levels, which are benefits at the societal level, familial level, interpersonal level, personal level, general benefits, and benefits to instructors. The findings showed that 404 responses (78%) were coded as positive responses and 64 responses (12.36%) were counted as neutral responses. In order to examine the reliability of coding, the research assistants recoded 20 randomly selected responses, with knowledge of the original codes given at the end of the scoring process. The average intrarater agreement percentage calculated from these responses was 91.5% (range: 85-97.5%). The raw benefit categories were coded again by another two research staff members who did not know the original codes given. The average interrater agreement percentage calculated from these responses was 89.5% (range: 85-92.5%).
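As a quick arithmetic check, the percentages reported in this section can be recomputed from the counts given in the text. The sketch below uses only count/total pairs stated above; the labels, the helper name, and the 0.05-point rounding tolerance are illustrative assumptions.

```python
def pct(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# (label, count, total, percentage reported in the text)
reported = [
    ("positive descriptors", 169, 270, 62.6),
    ("positive metaphors", 40, 72, 55.6),
    ("positive metaphor attributes", 65, 128, 50.8),
    ("neutral metaphors", 26, 72, 36.1),
    ("neutral metaphor attributes", 47, 128, 36.7),
    ("positive benefit responses", 404, 518, 78.0),
    ("neutral benefit responses", 64, 518, 12.36),
]

# Assumed tolerance: differences below 0.05 points are treated as rounding.
TOLERANCE = 0.05

for label, count, total, stated in reported:
    computed = pct(count, total)
    status = "consistent" if abs(computed - stated) < TOLERANCE else "check"
    print(f"{label}: {count}/{total} = {computed:.2f}% (reported {stated}%) {status}")
```

Under this tolerance, each reported percentage matches its count/total pair up to rounding.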

Discussion
As Donnermeyer and Wurschmidt [27] pointed out, implementers' "level of enthusiasm and support for a prevention curriculum influences their effectiveness because their attitudes are communicated both explicitly and subtly to students during the time it is taught and throughout the remainder of the school day" (pages 259-260). Therefore, understanding their views is very important. The purpose of this study was to evaluate the Tier 1 Program of the Project P.A.T.H.S. using findings based on focus groups involving program implementers in the experimental and full implementation phases (2005-2009) of the project. There are several characteristics of this study. First, a large sample of participants (n = 177 in 36 focus groups) participated in the study. Second, different datasets collected at different points in time were included in this integrative study. Third, implementers of the program in different grades were invited to participate in the study. Fourth, this is the first known scientific study of focus group evaluation of a positive youth development program based on program implementers in China. Finally, this is also the first focus group evaluation study based on such a large sample of program implementers in the global context.
Based on the integrative analyses, two salient observations can be highlighted from the findings collected from different cohorts of students. First, the program was perceived positively from the perspective of the program implementers (Tables 3 and 4). The program implementers generally used positive descriptors and metaphors to describe the program. Although some implementers perceived the program in a negative light, this is not the dominant view. Second, results in Table 5 show that the program had a beneficial effect on the participants, with 78% of the responses coded as positive. Generally speaking, benefits at both the personal and interpersonal levels were observed. The above observations are generally consistent with the qualitative evaluation findings based on the program participants reported by Shek and Sun in this special issue. In short, different stakeholders had ...
There is a growing trend for using focus group methodology in order to understand the views of stakeholders in the field of evaluation, and the number of qualitative evaluation studies is increasing in the field. For example, Chen et al. [28] employed different evaluation methods (including qualitative evaluation) and pointed out that there were several limitations in employing participatory evaluation with at-risk youth. Mahoney et al. [29] used qualitative methodology to evaluate a tobacco prevention program among 5th grade students using impressions from classroom teachers and program presenters. Pedersen et al. [30] examined relationship quality in a community mentoring program via qualitative methodology. O'Rourke and Key [31] evaluated a school-based youth development peer group with integrated medical care using focus groups. Scheer and Gavazzi [32] used focus groups to evaluate the program "Families and Systems Teams Initiative." In line with the above examples, the present study demonstrates the value of focus group methodology in evaluation contexts.
In qualitative studies, it is important to examine alternative explanations [33]. The first alternative explanation is that the positive findings are a result of demand characteristics. However, this explanation is not likely because the informants were encouraged to voice their views without restriction and negative voices were, in fact, heard. In addition, there is no reason to believe that the participants acted favorably to please the researchers. The second alternative explanation is that the findings are due to selection bias. However, this argument cannot stand as the schools and program implementers were randomly selected. The third alternative explanation is that the positive findings are due to ideological biases of the researchers. As several safeguards were used to reduce bias in the data collection and analysis process, including calculation of intra- and interrater reliability, this possibility is not high. Finally, it may be argued that the perceived benefits are due to other youth enhancement programs. However, this argument can be partially dismissed because none of the schools in this study participated in the major youth enhancement programs in Hong Kong, including the Adolescent Health Project and Understanding the Adolescent Project. In addition, participants in the focus group interviews were specifically asked only about the program effects of the Project P.A.T.H.S.
There are several limitations of the study. First, although the number of schools and workers participating in the study can be regarded as on the high side according to the common practice in mainstream qualitative evaluation studies, it would be helpful if more schools and workers could be recruited. Second, besides one-shot focus group interviews, regular and ongoing qualitative evaluation data could be collected. Third, although focus group interview data were collected, inclusion of other qualitative evaluation strategies, such as in-depth individual interviews, would be helpful in order to further understand the subjective experiences of the program implementers. Despite the above limitations, the present qualitative findings based on the experiences of program implementers showed that the respondents had positive perceptions of the program and implementers, and they perceived benefits of the programs throughout the years.