Analysis of Risk Factors in Global Software Development: A Cross-Continental Study Using Modified Firefly Algorithm

In today's competitive world, software organizations are moving towards global software development (GSD). This shift became even more significant during the COVID-19 pandemic, when team members residing in different geographical locations and coming from different cultures had to work from home to carry out their tasks and responsibilities because travel was restricted. These teams are distributed in nature and work towards the same set of goals and objectives. Some of the key challenges that software practitioners face in a GSD environment are cultural differences, communication issues, the use of different software models, temporal and spatial distance, and risk factors. Risks can be considered the biggest of these challenges, yet few researchers have addressed risks related to time, cost, and resources. In this research paper, a comprehensive analysis of software project risk factors in the GSD environment has been performed. Based on the literature review, 54 risk factors were identified in the context of software development. These were further classified by practitioners into three dimensions, i.e., time, cost, and resource. A Pareto analysis has been performed to discover the most important risk factors, which could have an adverse impact on software projects. Furthermore, a modified firefly algorithm has been designed and implemented to evaluate and prioritize the pertinent risk factors obtained after the Pareto analysis. All important risks have been prioritized according to the fitness values of individual risks. The top three risks are “failure to provide resources,” “cultural differences of participants,” and “inadequately trained development team members.”


Introduction
In the last two decades, the world has changed significantly [1]. The exponential advancement in technology has made the exchange of information among peers more efficient and effective [2]. In the past, we had to walk to meet or to have a conversation with someone, but now we can communicate easily using mobile devices. All of this is the result of technological advancement, not of high-rise structures. Moreover, the field of software development has also witnessed massive and rapid change around the world as firms embrace the needs of their clients. To gain further advantages, some software firms have moved from a co-located environment to a distributed environment [3]. In the last decade, software firms have increasingly moved towards distributed software development in order to find low-cost, skilled resources [4]. As a result, software development has become diverse, multisite, and globally distributed; this is called global software development (GSD). However, software professionals also face challenges, such as social and cultural diversity within globally distributed teams, while performing their tasks [5].
GSD is also known as offshore software development or outsourced software development. Over the past two decades, software outsourcing, a corporate-level strategy, has been adopted by numerous firms [6]. The software outsourcing model is used to produce high-quality software at low cost [7,8]. However, adopting GSD for software projects is easier said than done because of a number of barriers [9]. Globalization, driven by technological advancement, results in cultural heterogeneity and diversity [10,11]. People, businesses, and various organizations are investing considerable capital to understand and overcome the barriers that come with cross-cultural teams.
The barriers of GSD, if addressed properly, can ensure the timely and successful implementation of software projects [12].
GSD is a contemporary model in which the developers within a team are spread across borders. The team members exchange information and work together despite being in different time zones and across organizational boundaries [13]. Although working in a GSD setting is very difficult for team members, the model has gained acceptance in the industry because of restricted freedom to travel and increasing travel costs. Cheap skilled labour, better productivity, work efficiency, and economic benefit are some of the key benefits of GSD [3,14,15]. Despite these benefits, people working in a GSD environment face many difficulties, such as lack of communication, strategic issues, project management issues, and the cross-cultural backgrounds of the team [16][17][18]. Various issues related to GSD are depicted in Figure 1.
GSD projects can be divided into two categories: offshore and onshore. Offshore projects fail mainly because of physical time constraints and cultural differences. Besides time and culture, the communication gap is also a major issue faced by offshore and onshore teams [20,21]. This can result in lower productivity, poor project quality, and decreased efficiency [22,23]. Therefore, to harness the advantages of GSD, it is imperative to examine the risk factors that come with it and to mitigate those risks before starting any project that involves distributed teams [19,22,23,24].
There are various risks associated with GSD projects. If team members are located in different countries or regions around the world, they can face obstacles such as geographical risk, language barriers, and even weather conditions [25,26].
The majority of software organizations are at risk in the GSD environment. They tend to reduce these risks using standard risk management tools; however, they realize that these tools are not capable of addressing the crucial and critical characteristics of GSD. Therefore, this research aims to identify and prioritize the most pertinent risk factors for GSD. For this purpose, we have employed a modified firefly algorithm (MFA). The firefly algorithm (FA) is a nature-inspired machine learning (ML) technique that is gaining popularity due to its ability to deal with unstructured data [27]. The simple FA does not provide any way to validate its results, i.e., the fitness scores. Therefore, the MFA has been designed and implemented to calculate the variance of the fitness values of the risks with respect to time, cost, and resource, to make sure that the fitness values obtained are reliable.
This paper comprises seven sections. Section 1 introduces this research. The literature review is presented in Section 2. The research methodology is elaborated in Section 3. Section 4 provides the results and discussion. Research implications are discussed in Section 5. Section 6 highlights limitations and future research directions. The last section concludes this research study.

Literature Review
Risk can be defined as the possibility that an event has a negative or positive effect [28]. Management has many critical functions, one of which is risk management. Risk management examines the loopholes of a system through the internal control mechanism, which uses tested procedures and practices to manage those loopholes. It also helps to identify, analyze, evaluate, inspect, and handle risk [29,30]. Barriers [31]. To catch any undesired or unexpected errors in a project, small and medium-sized enterprises (SMEs) always need a well-planned risk management strategy [32]. There are five steps in the risk management process [29,30,33,34] (see Figure 2):
Step 1: Identify the risk. The team's task is to highlight risks that might affect the project. Various techniques are used for this, the first of which is maintaining a project risk register.
Step 2: Classify the risk. Risks are grouped according to their estimated cost or likely impact and their probability of occurrence. For example, credit risk is classified according to the likelihood of collecting repayments from the debtor.
Step 3: Analyze the risk. After a risk is identified, the next important step is to analyze its consequences, determining the nature of the risk and its capacity to affect the project outcome. This information is also fed into the project risk register.
Step 4: Control the risk. Risk control follows risk analysis. It is the method by which software firms evaluate risks and take action to mitigate or eliminate them. Control measures are ranked in what is known as the risk control hierarchy, in which eliminating the hazard is the most effective control and should always be the first aim.
Step 5: Review risk control. This step ensures that the implemented control measures are effective and efficient. The measures must be reviewed and revised to make sure that they work as planned and to determine whether any remedial action is needed immediately.
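The five steps above can be illustrated with a minimal risk-register sketch. It assumes, as one common convention that this paper does not prescribe, that risks are compared by exposure = probability × impact; the risk names, probabilities, impacts, and thresholds are hypothetical and do not come from the survey.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical project risk register (Step 1)."""
    name: str
    probability: float  # estimated likelihood of occurrence, between 0 and 1
    impact: float       # estimated cost if the risk occurs, e.g. in person-days

    @property
    def exposure(self) -> float:
        # Expected loss used to compare risks: probability x impact (assumed convention)
        return self.probability * self.impact


def classify(risk: Risk, high: float = 10.0, medium: float = 3.0) -> str:
    """Step 2: group a risk by its exposure; the thresholds are illustrative only."""
    if risk.exposure >= high:
        return "high"
    if risk.exposure >= medium:
        return "medium"
    return "low"


register = [
    Risk("key developer unavailable", probability=0.6, impact=20.0),
    Risk("currency fluctuation", probability=0.3, impact=15.0),
    Risk("time-zone handover delay", probability=0.2, impact=5.0),
]

# Step 3: analyze risks in descending order of exposure so that controls
# (Step 4) and reviews (Step 5) start with the most serious ones
for risk in sorted(register, key=lambda r: r.exposure, reverse=True):
    print(f"{risk.name}: exposure={risk.exposure:.1f}, class={classify(risk)}")
```

Steps 4 and 5 are organizational activities, so the sketch only orders the register; the control and review actions would be recorded against each entry in practice.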

Relevant Work.
The authors of [35] applied FA to optimize the parameters of various established estimation models and compared it with other metaheuristic algorithms, namely the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm. The estimation models considered are variants of the constructive cost model (COCOMO). The authors claim that their experimental results show that FA is more precise and yields lower error than GA and PSO.
In [36], researchers proposed a hybrid software fault-prediction (SFP) model built using FA and an artificial neural network (ANN), together with an empirical comparison against GA- and PSO-based evolutionary techniques. They used seven faulty data sets from the PROMISE repository in their studies. Based on these data, they claim that the FA-ANN model outperformed the GA-ANN and PSO-ANN fault-prediction models. The authors concluded that the FA-ANN-based model does not suffer from the hindrances shown by the other models and that its results are statistically significant. Moreover, the proposed model reduces software cost and enhances final product quality.
In [37], the authors modified the FA for the portfolio optimization problem, which gave them a satisfactory balance between exploitation and exploration. They call this an upgraded FA. The authors claim that the upgraded algorithm performed consistently better than the original on all portfolio problems. They drew this conclusion after comparing their upgraded FA with five previously published results of optimization metaheuristics, and they are confident that the upgraded FA is clearly better than these on the required performance indicators.
The authors of [38] argue that, with many effort-prediction models available, choosing one can be hard for project managers. Their paper investigates the possibility of improving the accuracy of software cost estimation. They accomplish this by combining FA with the ANN models used for cost prediction and comparing the results with PSO. They used functional link ANN models with a radial basis function network. Based on their results, they argue that ANN models process data better when combined with FA in addition to intuitionistic fuzzy C-means.

Project Time, Cost, and Resource Risk Dimensions.
Risks in software development projects can be categorized by source into time, cost, and resource risks, or a combination of these [39]. Project time, cost, and resources are the main concerns of project management, and risks to them may negatively influence one or more aspects of project performance in the GSD environment (see Figure 3).

Time-Related Risk Factors.
It is no exaggeration to say that "time is what defines the success of any project." Project managers face two kinds of challenges with time-related risk factors: (1) the number of adjustments needed during the project's execution and (2) the time spent on unessential activities [40]. A simple way to deal with these two challenges is to prepare a well-defined project plan and timeline beforehand and then to follow them. A well-defined project plan is needed for effective time management, and project managers should prepare such a plan and timeline to reduce the two risk factors mentioned above.

Cost-Related Risk Factors.
Cost is another constraint, in addition to time, that can be easily measured. Just like the project timeline, the project's cost structure has to be estimated beforehand as accurately as possible [41]. Cost is simply the amount of money that must be invested in a particular project to finish it; it is also known as the project budget. Knowing the cost structure beforehand provides a baseline against which the project's actual cost can be measured and monitored while the project is in progress. This allows the project manager to avoid surprise costs that could arise during the project.
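The baseline idea can be sketched as a simple comparison of planned against actual cost per work package. The package names, amounts, and the 5% tolerance are hypothetical values chosen only for illustration.

```python
def cost_variance(baseline: dict[str, float], actual: dict[str, float]) -> dict[str, float]:
    """Baseline minus actual cost for each work package that has recorded spending.
    A negative value means the package has exceeded its baseline."""
    return {pkg: baseline[pkg] - spent for pkg, spent in actual.items()}


def overruns(baseline: dict[str, float], actual: dict[str, float],
             tolerance: float = 0.05) -> list[str]:
    """Packages whose actual cost exceeds the baseline by more than `tolerance` (a fraction)."""
    return [pkg for pkg, spent in actual.items() if spent > baseline[pkg] * (1 + tolerance)]


baseline = {"requirements": 20_000.0, "development": 120_000.0, "testing": 30_000.0}
actual = {"requirements": 19_500.0, "development": 131_000.0}  # testing not started yet

print(cost_variance(baseline, actual))  # {'requirements': 500.0, 'development': -11000.0}
print(overruns(baseline, actual))       # ['development']
```

Comparing only packages that already have recorded spending keeps unstarted work (here, testing) from being reported as under budget.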

Resource-Related Risk Factors.
The resources in a project are of two kinds: human and material. Project managers should take the availability of both into consideration. This constraint depends greatly on the cost structure: the more money one has, the more material resources one can buy and the higher the quality of expertise one can hire. Of course, money cannot solve the problem of availability and accessibility in the market; hence, a project manager should keep such challenges in mind when planning the timeline and the cost structure.

Software Development Risks.
A thorough, structured literature review was conducted on the risks faced during software development and on the management and assessment of those risks. The review identified fifty-four probable risk factors related to the software development industry. The identified risk factors, along with their references, are shown in Table 1.

Research Methodology
To attain the objectives of the research and to analyze the pertinent risk factors related to the GSD environment, a systematic research methodology is followed (see Figure 4).

Research Design.
This research employs both experimental and simulation-based research designs. First, an exhaustive literature review was conducted to identify GSD risks related to project cost, time, and resources. Later, three hundred forty-two large- and medium-sized software houses from the US, Pakistan, and Australia were shortlisted through convenience sampling to collect the data. The steps are as follows:
Step 1: identification of software development risks. In the first step, fifty-four risk factors relevant to software projects in a software development (SD) environment were identified after an extensive literature review. Search keywords including, but not limited to, "risk management in SD," "software risk management using ML," "project management risks," and "risk assessment in distributed projects" were used to search databases such as Google Scholar, ScienceDirect, and Web of Science.
Step 2: shortlisting of risks by practitioners. In the next step, the list of fifty-four risk factors was given to industry experts working in the GSD environment to remove duplicate risk factors and to finalize the risk factors relevant to GSD. This resulted in a reduced list of twenty-six risk factors related to cost, time, and resources that can affect GSD projects negatively (see Table 2). Six industry experts gave their feedback at this stage; their short profiles are presented in Table 3.
Step 3: questionnaire development. Once the practitioners had shortlisted the risk factors, a survey questionnaire mapped to all twenty-six risk factors relevant to GSD was developed.
Step 4: data collection. In this step, data were collected by sending the questionnaire to seven hundred sixty large- and medium-sized software houses based in the US, Pakistan, and Australia. Project managers, team leaders, system analysts, and business analysts were the respondents of this research, and their active participation made this research possible.
Step 5: identification of the most important risks using Pareto analysis. In this step, a Pareto analysis was performed to summarize the experts' opinions and to recognize the important risk factors with respect to time, cost, and resource in the GSD environment. The Pareto chart is an industry-standard tool: it not only pinpoints the major areas of concern but also helps management and other decision makers reach a solution [48]. A Pareto chart combines a bar graph and a line graph. The bars represent the attributes under consideration; here, each bar represents a risk factor identified through the literature survey. The line represents the cumulative percentage of the attributes; in our scenario, it represents the cumulative frequency of expert opinion. The bars in a Pareto chart are always displayed in descending order, which makes the most common attributes easy to spot. In our scenario, the chart highlights the most important risk factors for the software industry.
Computational Intelligence and Neuroscience
Step 6: implementation of MFA to prioritize risks. FA does not provide any way to validate fitness values; therefore, MFA was used to calculate the variance of all fitness values of the risks with respect to time, cost, and resource to make sure that the fitness values obtained are reliable. Figure 5 shows the framework of MFA.
We initialize the fireflies' population by considering equation (1). Risk identification will be based on the initial population (data) generated through the questionnaires. Each firefly $i$ moves towards a brighter firefly $j$ according to

$$x_i^{t+1} = x_i^{t} + \beta_0 e^{-\gamma r_{ij}^{2}} \left( x_j^{t} - x_i^{t} \right) + \alpha \varepsilon_i^{t} \qquad (1)$$

where $x$ is the firefly position in iteration $t$, $r_{ij}$ is the distance between fireflies $i$ and $j$, $\beta_0 e^{-\gamma r_{ij}^{2}} (x_j - x_i)$ is the attraction between fireflies $j$ and $i$, and $\alpha \varepsilon$ is the randomization term, with $\varepsilon$ a vector of random numbers. The fitness values of risks related to project time, cost, and resource will be evaluated from the objective function, equation (2), in which FR is the fitness value of a risk, LVL denotes the likely and very likely responses, ULVUL the unlikely and very unlikely responses, TN the total neutral responses, and n the total number of responses. Risk classification will be done by calculating the variance of the time, cost, and resource risks; the combined variance will also be calculated. Risk analysis will be based on the fitness values of the risks computed by the MFA, and risk reduction will be performed by ranking individual risks to prioritize the most important ones.
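For readers unfamiliar with FA, the following sketch implements the standard movement rule of equation (1) using only the Python standard library. It is not the authors' MFA: the objective function of equation (2) and the encoding of survey data as firefly positions are not reproduced in this text, so a sphere test function (minimum 0 at the origin) stands in for them. The parameter values ($\beta_0 = 1$, $\gamma = 0.1$, $\alpha = 0.2$ shrinking by 3% per iteration), the uniform noise in $[-0.5, 0.5]$, and the function names are assumptions.

```python
import math
import random


def firefly_minimize(objective, dim, n_fireflies=20, n_iter=100, beta0=1.0,
                     gamma=0.1, alpha=0.2, bounds=(-5.0, 5.0), seed=0):
    """Standard firefly algorithm: each firefly i moves towards every brighter
    firefly j by x_i += beta0*exp(-gamma*r_ij^2)*(x_j - x_i) + alpha*eps."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of firefly positions inside the bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in xs]  # lower cost means a brighter firefly
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i is attracted to j
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # eps is uniform in [-0.5, 0.5]; positions are clipped to the bounds
                    xs[i] = [min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                             for a, b in zip(xs[i], xs[j])]
                    cost[i] = objective(xs[i])
        alpha *= 0.97  # shrink the random step as the search converges
    best = min(range(n_fireflies), key=cost.__getitem__)
    return xs[best], cost[best]


def sphere(x):
    """Stand-in objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2)
print(best_x, best_cost)
```

Because a firefly only moves towards strictly brighter ones, the current brightest firefly stays in place, so the best cost found never gets worse from one iteration to the next.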

Data Collection Procedure.
Various risks relevant to GSD were assessed by developing a survey questionnaire. The questionnaire had a total of thirty-three questions, of which 18 addressed the three categories of GSD risks, namely time, cost, and resource. The remaining 15 questions were general and open-ended. The risk questions were closed-ended and scored on a 5-point Likert scale ranging from very unlikely to very likely. The questionnaire was circulated to more than seven hundred fifty medium and large software companies based in Australia, Pakistan, and the USA. A total of four hundred sixty responses were received, of which one hundred eighteen were rejected due to missing information, leaving three hundred forty-two valid responses for analysis. For a sample data set, see Tables 4 and 5. A Pareto analysis was then carried out to find the most pertinent risk factors.
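The screening and Likert aggregation described above can be sketched as follows. The numeric coding of the scale (1 = very unlikely to 5 = very likely), the use of `None` for an unanswered item, and the function names are assumptions; the sketch stops at the counts LVL, ULVUL, TN, and n named in the text, because the objective function of equation (2) that combines them is not reproduced here. The questionnaires are hypothetical.

```python
from collections import Counter

# Assumed coding of the 5-point Likert scale
VERY_UNLIKELY, UNLIKELY, NEUTRAL, LIKELY, VERY_LIKELY = 1, 2, 3, 4, 5


def valid_responses(questionnaires: list[list]) -> list[list]:
    """Keep only complete questionnaires; an unanswered item is recorded as None."""
    return [q for q in questionnaires if None not in q]


def likert_counts(answers: list[int]) -> dict[str, int]:
    """Aggregate the answers to one risk question into the quantities named in the text:
    LVL (likely + very likely), ULVUL (unlikely + very unlikely), TN (neutral), n (total)."""
    c = Counter(answers)
    return {
        "LVL": c[LIKELY] + c[VERY_LIKELY],
        "ULVUL": c[UNLIKELY] + c[VERY_UNLIKELY],
        "TN": c[NEUTRAL],
        "n": len(answers),
    }


# Hypothetical questionnaires, each holding the answers to two risk questions
survey = [[5, 4], [4, None], [3, 2], [5, 1], [4, 4]]
complete = valid_responses(survey)  # the second questionnaire is rejected
first_question = [q[0] for q in complete]
print(likert_counts(first_question))  # {'LVL': 3, 'ULVUL': 0, 'TN': 1, 'n': 4}
```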

Results and Discussion
This section first provides a numerical illustration of the proposed methodology and then discusses the results.

Numerical Illustration.
Our model was applied to software houses from the US, Australia, and Pakistan. To achieve the objectives of this study and to analyze the pertinent risk factors associated with GSD, we use an integrated Pareto-MFA approach. Pareto analysis helps with data reduction, i.e., shortlisting the most pertinent risk factors, while MFA enables us to rank them. The application of the proposed method is divided into two phases as follows.
Phase 1: identification of critical risks using Pareto analysis. In this phase, the survey data described in section "Data Collection Procedure" for the twenty-six shortlisted risks were used to conduct the Pareto analysis. The Pareto analysis revealed that, among the twenty-six shortlisted risks, seven risk factors account for 80% of the project risk within GSD. These risk factors are "failure to provide resources," "cultural differences of participants," "inadequately trained development team members," "inappropriate task timings," "cost overruns," "inadequate technical resources," and "lack of balance on the project team." For the Pareto analysis result, see Figure 6.
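The Pareto selection used in this phase can be sketched as follows. Keeping factors until the cumulative percentage first reaches 80% is one common convention and an assumption on our part; the factor names and frequencies are hypothetical, not the survey results.

```python
def pareto_vital_few(frequencies: dict[str, int], cutoff: float = 80.0):
    """Sort factors by frequency (the bars), accumulate percentages (the line),
    and keep factors until the cumulative percentage first reaches `cutoff`."""
    total = sum(frequencies.values())
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    rows, vital_few = [], []
    cumulative = 0.0
    for factor, count in ranked:
        if cumulative < cutoff:  # the cutoff has not been reached yet
            vital_few.append(factor)
        cumulative += 100.0 * count / total
        rows.append((factor, count, round(cumulative, 1)))
    return rows, vital_few


# Hypothetical frequencies of expert opinion for five risk factors
freq = {"risk A": 50, "risk B": 30, "risk C": 10, "risk D": 6, "risk E": 4}
rows, vital_few = pareto_vital_few(freq)
for factor, count, cum in rows:
    print(f"{factor:8s} {count:3d} {cum:6.1f}%")
print(vital_few)  # ['risk A', 'risk B']
```

The `rows` list holds exactly what a Pareto chart draws: the descending bars (counts) and the cumulative line (percentages).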
Phase 2: ranking of critical risks using MFA. In this phase, MFA was applied to the survey data of the most pertinent risk factors using the objective function given as equation (2) in section "Implementation of Modified Firefly Algorithm to Prioritize Risk." The fitness scores of the risk factors are then used to calculate the variance in order to evaluate whether the resulting scores are consistent. In this study, the variance was less than 0.01, which indicates that the fitness values are consistent and contain no outliers. For the final fitness scores, variance, and ranking of the pertinent risk factors, see Table 7.
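A minimal sketch of the consistency check follows. It assumes (our interpretation, not stated explicitly in the text) that each risk has one fitness value per dimension (time, cost, resource), that "variance" means the population variance of those three values, and that risks are ranked by their mean fitness. The risk labels and values are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical fitness values of three risks in the (time, cost, resource) dimensions
fitness = {
    "R-a": (0.82, 0.80, 0.85),
    "R-b": (0.71, 0.74, 0.70),
    "R-c": (0.60, 0.66, 0.58),
}


def check_and_rank(fitness: dict[str, tuple[float, ...]], threshold: float = 0.01):
    """For each risk, compute the population variance of its fitness values and flag it as
    consistent when the variance is below `threshold`; then rank risks by mean fitness."""
    rows = []
    for risk, values in fitness.items():
        var = pvariance(values)
        rows.append((risk, round(mean(values), 3), round(var, 5), var < threshold))
    rows.sort(key=lambda row: row[1], reverse=True)  # highest mean fitness first
    return rows


for rank, (risk, avg, var, consistent) in enumerate(check_and_rank(fitness), start=1):
    print(rank, risk, avg, var, "consistent" if consistent else "check for outliers")
```

A risk whose three fitness values disagree strongly, such as (0.1, 0.5, 0.9) with a variance of about 0.107, would be flagged rather than ranked with confidence.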

Discussion.
In this research, we performed a comprehensive literature review to identify possible risk factors under three dimensions of GSD risk: time, cost, and resource. Then, professionals were asked to verify the relevance of the risk factors, map each risk factor to a dimension, and merge duplicate risks. Moreover, we conducted a Pareto analysis to identify the most pertinent risk factors for GSD projects. Finally, we employed MFA to rank the most pertinent risk factors. Our study revealed that "failure to provide resources (R3)" is the most critical risk factor for GSD projects, ranking first. This indicates that one of the biggest risks in any GSD project is the unavailability of the required resources. The next-ranked risk factor in our analysis is "cultural differences of participants can cause problems like rework, loss of data, confusions, etc. (R19)." Its second position signifies that this risk factor also needs serious attention and must be addressed in order to implement GSD projects successfully. When team members come from diverse cultures and backgrounds, achieving understanding and harmony among them becomes a great challenge [18]. Good collaboration among project team members is imperative for the smooth implementation of any project, and this result shows that cultural differences are a significant risk factor for GSD projects as well. The next most important risk factor for GSD projects, according to our study, is "inadequately trained development team members (R18)," which ranks third. As technology advances rapidly, continuous training of the development team is imperative, and it is even more important in GSD projects. Therefore, for the successful completion of GSD projects, all team members need to be trained periodically to mitigate this important risk factor.
The fourth most important GSD risk factor revealed by the study is "inappropriate task timings (R12)," which concerns assigning unrealistic deadlines to tasks. This risk also needs attention in order to complete the project successfully within the agreed timeline. "Cost overruns (R7)" comes fifth in our analysis. This risk factor is especially important because it can derail the whole project as well as the business viability of the software house(s) involved in GSD projects. To obtain true financial benefit and to complete any GSD project successfully, addressing this risk is highly desirable. The sixth and seventh most important risk factors are "lack of balance on the project team (R17)" and "inadequate technical resources (R23)," which also need the attention of GSD team leads and decision makers in order to implement GSD projects successfully.

Research Implications
Various theoretical and practical implications follow from this research study. From a theoretical perspective, this research makes a significant contribution by identifying and analyzing the most pertinent risk factors associated with GSD with respect to time, cost, and resources. From a methodological standpoint, this research is the first to integrate Pareto analysis and MFA for risk assessment in general and for GSD in particular. This integration enabled us to harness the advantages of both methods as follows:
(i) Using Pareto analysis, we were able to identify the risks that create the most impact on GSD projects.
(ii) MFA helped us evaluate the risk factors and obtain the most reliable and consistent results.
Regarding the research findings, to the best of our knowledge, this is the first study focusing on the risk assessment of GSD in a cross-continental environment using Pareto analysis and MFA; the seven most pertinent risk factors have been identified and ranked accordingly, so that they can be addressed one by one in a GSD environment. From a managerial point of view, this study is also a significant contribution.
The findings of this research study may help practitioners recognize the risk factors involved in GSD in advance and can guide top management and policy makers in setting up proactive, active, and reactive risk mitigation mechanisms to overcome these risks and complete their GSD projects successfully.

Limitations and Future Directions
The data in this research were gathered from Pakistan, Australia, and the USA. To widen the scope, other countries will be included in future work. This will help us understand which GSD-related trends are common across countries and which differ. The results of this study cannot be generalized because the data were collected using convenience sampling. Random sampling would perform better than convenience sampling; using it in future work would resolve this problem and allow the results and findings of the study to be generalized.
The GSD environment has risks associated with it, and we use ML techniques to distinguish those risks. In future, other nature-inspired algorithms, such as the genetic algorithm, particle swarm optimization, and the lion optimization algorithm, can be used to rank the risks. Moreover, multicriteria decision-making techniques can also be used to rank the identified risk factors of GSD.

Conclusion
Creating or maintaining a GSD environment in software engineering is not simple. In GSD, distributed software teams face many challenges that should be recognized early in the development process. Good risk management practices must be incorporated into distributed teams, because they involve practitioners from different cultures, time zones, geographical locations, backgrounds, and past project experiences. ML algorithms and techniques offer a more practical approach to risk management than conventional techniques. In this study, a comprehensive analysis of software project risk factors in the GSD environment has been accomplished. Fifty-four software development project risk factors were identified from the literature; these were shortlisted to twenty-six risks by software practitioners and classified into three dimensions: time, cost, and resource. The Pareto analysis revealed that seven of the twenty-six shortlisted risks are the most important risk factors that could have an adverse impact on software projects with respect to project time, cost, and resource in the GSD environment. Furthermore, the MFA has been designed and implemented to evaluate and prioritize the pertinent risk factors obtained after the Pareto analysis. All the important risks have been prioritized according to the fitness values of the individual risks.
Data Availability
The survey data set used to support the results and findings of this study has not been made accessible because the privacy of the data set must be maintained for ongoing PhD studies. However, a sample data set is included in the paper.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
Asim Iftikhar wrote the manuscript with support from the co-authors, performed the literature review, performed data collection, and implemented the algorithm to obtain results. Syed Mubashir Ali contributed to the literature review, methodology, algorithm implementation, and data analysis. Muhammad Alam carried out conceptualization, supervision, methodology, review, and editing. Shahrulniza Musa performed supervision and conceptualization. Mazliham Mohd Su'ud provided resources and supervision.