Understanding Contributor to Developer Turnover Patterns in OSS Projects: A Case Study of Apache Projects

the


Introduction and Motivation
Open source software (OSS) is a good example of global software development.It has gained a lot of attraction from the public and the software engineering community over the past decade.The success of an OSS project is highly dependent on the infrastructure provided by the community to the developers and users in order to collaborate with each other [1].It is important to understand how the OSS project and the community surrounding it evolve over time.During the project and community evolution, the roles of the members change significantly, depending on how much the member wants to get involved into the community.Unlike a project member in a software company whose role is determined by a project manager and remains unchanged for a long period of time until the member is promoted or leaves, the role in an OSS project is not preassigned and is assumed by a member as he/she interacts with other members.An active and determined member usually becomes a "core member" through the following path: a newcomer starts as a "reader", reading messages on the mailing lists, going through the wiki pages and other documentation, and so forth, in order to understand how the system works.Later, he starts to discover and report bugs, which does not require any technical knowledge, and becomes a "bug reporter".After gaining good understanding of the system and community, he may start fixing small and easy bugs which he identifies himself or are reported by other members of the system, hence playing the role of either a "bug fixer, " "peripheral developer, " or an "active developer." To this stage, his bug fixes are usually accepted through patches submitted on the mailing lists or bug tracking system.Finally, after some important contributions are accepted by the core developers, the member may obtain the right of committing source code directly to the source control repository, hence becoming the "core member" of the project.This process is also called "joining script" [2], also referred to as "immigration process" [3].The general layered structure of OSS communities as discussed above is further depicted in Figure 1, in which the role closer to the center has a larger radius of influence on the system.
The figure depicts an ideal model of role change in the OSS community.However, not all members want to be or become the "core member." Some remain "passive user" and some stop somewhere in the middle.The key point is that OSS makes it possible for an aspiring and determined developer to be part of the "core members" group of developers through continuous contributions.On the other hand, the sustainability of an OSS project is related to the growth of the developer community.The community surrounding an OSS must regenerate itself through the contributions of their members and continuous emergence of new "core members" otherwise the project is going to stop or fail.An example is the GIMP project (http://www.gimp.org/)[5], which started as an academic project.When the creators left the university and decided to work on something else, the project stopped for more than a year until someone else decided to take over the control and resume working on the project.Therefore, attracting or integrating new members is an important aspect to keep the system and the community evolve over time.
Given these precedents, the research goal of the study presented in this paper is to understand the pattern of contributions made by members who eventually attained the role of a developer in an OSS project, that is, joining the "core members" group of developers.We are interested to investigate the key factors which led members towards attaining the role of a developer.We studied the immigration process in OSS projects as done in the past by others but using a quantitative approach based on extensive data mining.The contribution of this paper is manifolds: we study the contributions made by the members in terms of bugs reported, comments on different bugs, attachments or source-code patch submissions to fix certain bugs, and social relation with other members on the mailing list in a particular OSS community/project.Further, we analyze the contributions made by members before and after attaining the role of a developer.Moreover, we compute the ratio of average contributions made by a developer (before attaining the role of a developer) and compare it with the average contributions made by other members of the project.
The rest of the paper is structured as follows: the related work comparable to our approach will be discussed in Section 2. Research questions are outlined in Section 3. In Section 4, the methodology we used to extract information from different software repositories is described.Section 5 presents the results based on the research questions, and finally, in Section 6, we conclude our work.

Related Work
The process of joining an OSS project has been studied by many researchers in the past.In this line, the best known model which describes the organizational structure of an OSS project is the "onion model" [10] (cf. Figure 1), a visual analogy which depicts how the members of an OSS project are positioned within a community.The onion-like structure represents only a static picture of the project, lacking the time dimension which is required to study the role transformation (i.e., promotion) from being a passive user to the core member of the project.Ye et al. complemented this shortcoming with a more theoretical identification and description of roles [5].According to this model, a core member is supposed to go through all the roles, starting as a passive user, until he/she attains the role of a core member.In this regard, Jensen and Scacchi also studied and modeled the process of role migration in OSS projects [11], focusing on end users who eventually become core members.They identified different paths for the joining process and concluded that the organizational structure of studied OSS projects is highly dynamic in nature.
Von Krogh et al. studied the joining and specialization process of FreeNet project [2].Based on the data gathered from publicly available documents, mailing list archives, and the source control repository, they discovered that offering bug fixes is much common among newcomers who eventually become core members of the project.They also found that a certain period of time, ranging from couple of weeks to several months, was required before a newcomer could contribute to a technical discussion.There also exist few research studies which have reported and quantified the onion-like structure of a community for many OSS projects.For example, Mockus et al. [12] studied the Apache httpd server and Mozilla web browser projects and Dinh-Trong and Bieman [13] studied the FreeBSD project.According to their findings, the "core members" group is composed of small number of members.Surrounding the "core members" group is a large group of contributors (i.e., active developers, peripheral developers, etc.) who submit bug reports, offer bug fixes and participate heavily in discussions on the mailing lists.
In an ethnographic study, Ducheneaut studied the Python project in order to investigate the contribution of the members during their role transition from being a newcomer towards attaining the role of a core member by taking into account data from mailing lists and source control repository [14].He found that prior technical commitment and good social standing in the community were strong factors in joining the core members group of developers having writeaccess to the source control repository.Bird et al. [3] used the mailing lists and source control repository to investigate the time required for members to be invited into the "core members" group of an OSS project.They applied hazard rate analysis or survival analysis [15] to model the timedependent phenomena such as employment duration.They used survival analysis to understand which factors influence the duration and occurrence of such events and to what degree.They modeled the duration between activities by considering the first appearance of a member on the mailing list to the first commit on the source control repository.One of their findings was that prior patch submission had a strong effect on becoming part of the "core members" group of a project.Herraiz et al. [16] studied the GNOME project and found two different patterns of joining the project: (1) volunteers/contributors who follow the "onion model", and (2) firm/organization sponsored developers who do not.They found that hired developers gain knowledge quickly enough to start writing code than the volunteers.
Although these research studies were carried out in detail on different OSS projects, they considered data only from mailing lists and source control repositories.However, we also take into account bug repositories and quantify the contributions made by members in terms of the following bugs reported, comments on bugs, social relation with other members based on comments, social relation with other members based on email exchanges on the mailing list, and patch submissions on bug repositories.In addition to that, there is no published work known to us which studies the contributions made by a member before and after attaining the role of a developer in an OSS project.Therefore, we have quantified and analyzed the average rate of contributions made by a member before and after attaining the role of a developer which makes this work unique in contrast to other related pieces of work which have been done so far in this area.

Research Questions
As mentioned earlier, the success of an OSS project is in its long-term survival which is potentially due to the existence of a community surrounding the project.We are particularly interested to identify the role of a community in the longterm survival of an OSS project as well as the key factors which promote a nondeveloper (In this paper, we will use the term "nondeveloper" to refer to all those members who do not have write-access to the source control repository.) to the role of a developer (In this paper, we will use the term "developer" to refer to all those members who have write-access to the source control repository.).Further, we are interested to know if the potential developers (In this paper, we will use the term "potential developer"' to refer to all those members who started as a passive user and later attained the role of a developer.)follow the onion model or if there is a sudden integration of developers into the "core members" group of an OSS project.In order to address these key points, we have outlined few research questions in the following which will be addressed using data from publicly available software repositories of few selected Apache projects: (1) RQ-1: What is the ratio of contributions made by the developers and nondevelopers to the project over the period of time?
Previous studies [12,17] on various OSS projects have shown that most part of the source-code development is carried out by the developers of those projects.
We will investigate what are the contributions of nondevelopers if the source-code development is mostly done by the developers of those projects?In particular, we will investigate the contributions of nondevelopers in terms of reporting bugs, commenting on bugs, and exchanging emails.Further, we are interested to investigate the role of nondevelopers in the long-term success and maturity of an OSS project.
(2) RQ-2: What is the ratio of contributions made by a potential developer before and after attaining the role of a developer?
Attaining a higher role comes with more responsibilities and commitments to the project.We will investigate if a potential developer after attaining the role of a developer contributes (except source-code modification or bug fixing) more in contrast to contributing as a nondeveloper.Does the contribution pattern change with the change in role of a potential developer?To be more precise, does his/her contribution to the project in terms of bugs reporting and interaction with the community increase or decrease?We hypothesize that after attaining the role of a developer, he/she will participate actively in technical discussions on the bug tracking systems or on the mailing lists and report bugs effectively.
(3) RQ-3: What is the average rate of contributions made by a potential developer comparing to other members of the project before attaining the role of a developer?
We will investigate if the average contributions made by a potential developer are more than the average contributions made by nondevelopers who were also active during his/her time period.It has been addressed in previous studies [3] that demonstration of technical commitment and social status with other members will positively influence in attaining the role of a developer.We will investigate if a potential developer was more active (i.e., technically skilled and higher social status) than nondevelopers before attaining the role of a developer.
(4) RQ-4: Does a potential developer follow onion model in order to attain the role of a developer?
We will investigate if a potential developer follows the onion model in order to attain the role of a developer, that is, joining the "core members" group of the project.Not every member who is contributing to an OSS project eventually becomes a developer.
It depends on the level of involvement of a member in an OSS project and also on the needs to promote 2 0 0 6 -2 0 1 0 a nondeveloper to the role of a developer.There is no static or standard timeline for a member to join the "core members" group of a project.The time period required to attain the role of a developer varies from project to project and also from member to member.Members often start contributing to a project by participating in the mailing list conversations to get themselves familiar with the project before contributing source-code patches to the project.We will study the appearance of a potential developer on different software repositories by comparing the timestamp value of their first activity on these software repositories in order to validate if he/she actually followed the onion model.

Data Extraction Process
In this section, we describe our data extraction methodology and the Apache projects selected for evaluation.We extracted data from 4 different Apache projects as shown in Table 1.The range of data selected for each project is different because of the difference in the starting date of each project.The reason of choosing these Apache projects is that the repositories of these projects are on the Web and available to be downloaded (i.e., mailing list archives, bugs, subversion logs, etc.).Most Apache projects have at least 3 different mailing lists: user, dev, and commits, but some have more than 3 mailing lists (e.g., announcements, notifications, etc.).For our study, we downloaded only the dev mailing list archives of each Apache project under consideration.The reason is that software developers communicate often with each other on the dev mailing list rather than on any other mailing lists.We developed our own scripts which were used to extract information from mailing list archives in a similar manner to previous research [12,18].For example, each email was processed to extract information like sender name, email address, subject, date, message-id, and reference.The reference field contains message-id(s), if the email is a reply to previous thread(s).We used the reference field information to build a social network correspondence and computed social network measures [19] of all the members of a project.
We retrieved all the bugs (related to the Apache projects we considered for our study) which are publicly available through the Bugzilla and JIRA Web interface (https://issues .apache.org/) and extracted the required information using our custom written scripts.For further details on the information extracted from each email and bug, we refer the readers to [20].We computed the social relation correspondence among members on the bug tracking system based on the bug comments exchanged among themselves.Bird et al. [21] findings indicated the detection and acceptance of sourcecode patches through the mailing list, but we discovered that source-code patches were always attached to the respective bugs on the bug tracking system rather than sending it on the mailing list.Prior research has indicated the importance of offering bug fixing and its acceptance as an influential factor in gaining the developer status [14]; therefore, we have also analyzed how many source-code patches were submitted by the members on bug repositories.
In order to get information from source control repository, we wrote our script (see [22] for details) and extracted necessary information (i.e., log number, date of commit, author id, and files committed).We only considered those subversion logs where a particular source-code file (i.e., " * .java"because Apache projects under consideration are Java-based) was committed.These subversion log files were further analyzed by our script in order to identify if it fixes any bug by looking for patterns, such as, "PR:xxx, " "MNGxxx, " "SOLR-xxx, " "LUCENE-xxx, " and patch acceptance acknowledgements such as "patch provided by xxx", "patch submitted by xxx." On the identification of such patterns, the bugs were queried to retrieve source-code patches associated with those bugs.This would help to identify source-code patches that are accepted by the "core members" group of the project.Further, it allows to identify members who possess strong technical skills required for attaining the role of a developer in the project.Table 2 gives an overview on the raw data sources we extracted from different software repositories of the selected Apache projects based on the methodology described.
The values for Apache Ant show that there were 1,345 source-code patches found for a total of 5,480 bugs reported on the bug tracking system.Further, 6,025 subversion logs were extracted from the source control repository where source-code files (i.e., * .java)were committed and 84,737 emails were extracted from the Apache Ant mailing list archives between 2000 and 2010.

Empirical Analysis
Before we address each of the research questions in detail, we present a high level overview on the development activity of each Apache project under consideration over the period of time in Figure 2.This would give an insight into how much contributions were made each year to a project and the peak development years of a project.For each Apache project under consideration, we computed the number of contributions with respect to the number of people who made those contributions.For example, we computed the number  of distinct bugs reported each year along with the number of distinct reporters who submitted those bug reports (cf. Figure 2).This would make it easier to answer simple questions like: how many bugs were reported and how many members were involved in the bug reporting process during the 2nd year of a project?
Preliminaries: let C be the total number of members (i.e., developers, nondevelopers, etc.) who worked on the project: Let Y be the total number of years of a project under consideration: where y 1 is considered to be the first year of the project, y 2 is considered to be the second year of the project, and so on.Let C be the set of members (i.e., developers, nondevelopers, etc.) who were active in a time period : and let Immig be the immigrants (i.e., potential developers) who started as contributors and later become the developers of the project.We classified only those members as immigrants/potential developers who had an activity in the project (i.e., number of bugs reported, number of bugs commented, number of patches submitted, or number of emails sent) at least 4 months prior to their first commit on the source control repository Let   be the set of developers who have made commits before and during time period y such that   ⊆ C. Let the total number of bugs reported and commented and emails sent by a set of members in a time period y be represented as follows: Contribution bugs (, ) , (8)

RQ-1: What is the ratio of contributions made by the developers and nondevelopers to the project over the period of time?
In order to compute the contributions, we need to distinguish between the developers and nondevelopers of the project.As each subversion log has a time stamp associated to it, we queried all subversion logs from the start date of the project till the last commit date of the year under consideration.Based on this, we get a list of all developers IDs who have contributed to the source control repository till that particular year.For each developer ID, we computed the contributions (i.e., bugs reported, comments on bugs, emails, etc.) made to the project on yearly basis and add up the contributions made by all the developers for each year.Similarly, we computed the contributions made by nondevelopers on yearly basis and add up all their contributions for each year.Later, we plotted the contributions made by the developers and nondevelopers for each year in the form of a chart which is shown in Figure 3. Figure 3 shows the comparison of contributions made by the developers and nondevelopers of each Apache project under consideration.Further, we computed the average rate of contributions made by the developers and nondevelopers as well as the average number of developers and nondevelopers who made those contributions per year, which is shown in Table 3.For example, the number of bugs reported by the nondevelopers in a given period of time y is computed as follows: and the average number of bugs reported by the developers and nondevelopers is computed as follows:  the average participation ratio of developers and nondevelopers is computed as follows: The results in Table 3 show that nondevelopers are highly involved (i.e., contributing more than the developers) in reporting bugs and participating in discussions on the mailing list.One potential reason for this is the existence of a huge community surrounding these Apache projects.Given that discussing/commenting on a bug report requires technical knowledge about the project which is why developers appear to be more active in commenting on the bug reports than nondevelopers, it is quite obvious from Table 3 (also see Figure 3) that nondevelopers play a significant role in the projects under consideration, and hence it is one of the major factors in the long-term survival, success and maturity of these projects over the period of time.
The high ratio of nondevelopers involvement in the project (cf. Figure 3 and Table 3) allows the core members to select or vote for the potential developers to be invited to the "core members" group of the project.
RQ-2: What is the ratio of contributions made by a potential developer before and after attaining the role of a developer?
We are only interested in those developers who did not start contributing directly to the project but instead follows the onion model (cf. Figure 1).In order to select those developers, we retrieved all developers from subversion logs.Later for each developer, we compared his first commit date on the project to his first appearance on any of the project repositories (i.e., first bug reporting date, bug comment date, attachment, or email date) in order to compute the number of days or months before he started to contribute as a developer.Although there is no fixed or standard timeline for attaining the role of a developer in the project, we considered only those developers who had an activity (bug report, bug comment, attachment or email) on the project at least 4 months prior to their first commit on the source control repository of the project.
For each of those selected developers, we queried the contributions made to the project before and after the first commit date of each developer.As the time period of attaining the role of a developer is different for each developer, we computed the average yearly rate of contributions made by a developer before and after attaining the role of a developer.We do not show each individual's contribution to the project due to the privacy issues, and hence we have summarized the aggregated results of each project as shown in Table 4.All the variables (except ) used in our study represent the contribution of potential developers on yearly basis.For each Apache project,  represents the number of potential developers who have attained the role of a developer.The average yearly rate of contributions by a potential developer before and after attaining the role of a developer is calculated as follows: Based on Table 4, we find that the bugs reporting pattern does not change much before and after attaining the role of a developer in Apache Maven and Apache Solr projects.However, in Apache Ant, it decreased tremendously after attaining the role of a developer.As shown in Figure 3, there are only few bugs reported by the developers in contrast to nondevelopers in the Apache Ant project which is also reflected by the value of bugs reported variable for the Apache Ant project.Members after joining the "core members" group participate more often in technical discussions on the bug tracking system which is reflected by the value of bug comment variable.However, an increase in the participation in technical discussions did not increase the social relation of the developers on the bug tracking system (i.e., bug social relation) in the case of Apache Ant and Apache Maven project.One reason could be that, after attaining the role of a developer, they focused only on certain modules of a project and hence involved in discussions on bugs relevant to those modules with other developers of the project.There is also a tremendous increase in the number of emails sent by the members after attaining the role of a developer which eventually increases the value of email social relation variable.
Based on the Apache projects under consideration, we found that members after attaining the role of a developer tend to participate actively in technical discussions either on the mailing list or bug tracking system which also increases their social relation networks except the case of Apache Ant project.The bugs reporting behavior of these members varies in our studied Apache projects, and hence it is difficult to say if they report more bugs after attaining the role of a developer.

RQ-3: What is the average rate of contributions made by a potential developer comparing to other members of the project before attaining the role of a developer?
For each potential developer, we took the first time-stamp value where he first appears on the project and the second time-stamp value when he actually made the first commit The results in Table 5 can be understand as follows: the average rate of reporting bugs by a potential developer of Apache Lucene project is 4.53 times the average rate of reporting bugs by all other members who were active during that time period.Although the average rate of contributions made by potential developers varies in all the projects under consideration, it is quite obvious from each variable value that the contributions made by potential developers are more than the average contributions of all other members.Hence, we can say that they were the most active contributors (i.e., technically skilled and higher social status) before attaining the developer status in the project.

RQ-4: Does a potential developer follow onion model in order to attain the role of a developer?
For each potential developer, we computed the timestamp value between his/her first commit date to his/her first activity on the different software repositories in terms of days.Table 6 presents the appearance of a potential developer in terms of average number of days on different software repositories prior to attaining the role of a developer.The result shows that all the potential developers started from the mailing list (cf.Table 6) because the email activity is the oldest for all Apache projects under consideration followed by the bugs reporting/commenting, and the latest activity before attaining the role of a developer was the source-code patch submissions (i.e., bugs fixing).The results shown in Table 6 closely match to the onion model (see Figure 1) where a member starts as a reader followed by reporting bugs and later fixing bugs before attaining the role of a developer.
Let ActivityDate(Immig,mL) return the number of days between the first commit date of an immigrant (i.e., potential developer) on the source control repository to his first activity date on the mailing list of a project.The average number of days for an immigrant to appear on a mailing list prior to his/her first commit date is calculated as follows: The results (Table 6) also show that it took almost 2 years for a potential developer of Apache Ant, Apache Lucene, and Apache Maven projects to attain the role of a developer.However, we cannot say that it is the standard time as the time varies dramatically from project to project as it can be seen in the results of Apache Solr project comparing to other Apache projects under consideration.

Conclusion
In this paper, we have investigated in detail the patterns of contributions made by those members who have attained 10 ISRN Software Engineering the role of a developer in the project.First, we investigated the significant role played by nondevelopers in the long-term survival of an OSS project and observed that nondevelopers who do not have write-access to the source control repository participate actively in reporting bugs and email discussions, thus contributing to the maturity of an OSS project.Our investigation based on the contribution of potential developers before and after attaining the role of a developer showed that after attaining a higher position in the community, developers tend to contribute more efficiently than nondevelopers of the project by actively participating in technical discussions along with fixing bugs.Moreover, we observed that the members who attained the role of a developer had more contributions in contrast to the average number of contributions made by other members of the project who were active during his/her time period.This makes it obvious that one of the important factors in order to attain the role of a developer is the demonstration of technical skills and commitment to the project in an efficient manner.

Figure 1 :
Figure 1: General structure of an OSS community based on the onion model described in [4].

1 Figure 3 :
Figure 3: Contributions made by developers and nondevelopers over the period of time.

Table 1 :
Apache projects data range.

Table 3 :
Average rate of contributions made by developers and non developers.

Table 4 :
Yearly average contribution ratio of a potential developer before and after attaining the role of a developer.

Table 5 :
Average contribution rate of a potential developer comparing to other members of the project before attaining the role of a developer.

Table 6 :
Appearance of a potential developer on different software repositories prior to attaining the role of a developer.