Open Source Immigration Modeling

Additional Information

Background

This project is a collaboration between software engineering researchers in the Computer Science Department (Chris Bird, Prem Devanbu) and social scientists (Swaminathan, Hsu) at the Business School, all at UC Davis. We gratefully acknowledge research funding from two NSF grants, from the Human & Social Dynamics Program and the Science of Design Program. The overall goal of this research is to conduct a longitudinal study of the interactions between design, social process, and product quality in Free/Libre open source software (FLOSS) systems. The specific goal of the project described on this page is to study the process by which people become accepted as developers in FLOSS projects.

Publications

This is a new project, so nothing has been published yet. However, a draft paper is available on request; just send email to <lastname of prem>@cs.ucdavis.edu. It is currently under review, and we will be glad to post it here as soon as we hear back. In the meantime, interested readers may find our earlier paper useful: the data extraction approaches used there are quite similar, although the goals were different.

Graphs and Analysis

Our goal was to quantitatively model the rate at which people become developers in FLOSS projects, using statistical hazard rate analysis. We measure the time "at risk" as the delay from when a newcomer joins the mailing list to when they first make a commit to the CVS repository. We studied Apache HTTPD, Postgres, and Python.
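As a minimal sketch, assuming each newcomer's record holds the date of their first mailing-list post and the date of their first CVS commit (or nothing, if they never committed), the time at risk and the event indicator could be computed as below. The `Participant` class, the `spell` function, and the `observation_end` cutoff are hypothetical illustrations, not the project's actual Stata pipeline. Newcomers who have not committed by the end of observation are treated as right-censored: they contribute time at risk but no event. Records whose first commit precedes their first post are assumed to have been filtered out beforehand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Participant:
    """Hypothetical newcomer record (not the project's actual data format)."""
    name: str
    first_post: date              # first appearance on the developer mailing list
    first_commit: Optional[date]  # first CVS commit, or None if never committed


def spell(p: Participant, observation_end: date) -> Tuple[float, bool]:
    """Return (years at risk, became_developer) for one newcomer.

    A newcomer with no commit on or before observation_end is right-censored:
    they contribute time at risk up to observation_end, but no event.
    Assumes first_commit (if any) is not earlier than first_post.
    """
    if p.first_commit is not None and p.first_commit <= observation_end:
        end, event = p.first_commit, True
    else:
        end, event = observation_end, False
    years = (end - p.first_post).days / 365.25  # approximate years, allowing for leap days
    return years, event


# Usage: one newcomer who became a developer, one still waiting (censored).
observation_end = date(2005, 1, 1)
joined_and_committed = Participant("a", date(2000, 1, 1), date(2001, 1, 1))
still_waiting = Participant("b", date(2003, 1, 1), None)
print(spell(joined_and_committed, observation_end))
print(spell(still_waiting, observation_end))
```

Treating non-committers as censored rather than dropping them matters here: most newcomers never become developers, and discarding them would badly overstate the rate.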


Here are some graphs showing the rate at which people become developers; the x axis is years since the person first appears on the developer mailing list. Note that very few people who have not yet become developers remain past the 4-year mark, so the data at the upper end of the x axis is based on very few samples. The curves shown are simply smoothed raw data, not fitted to any model. The rate is low because most people never become developers. Our theoretical explanation for the non-monotonic behaviour of the rate is complex and is discussed in the paper (see Publications above). Very briefly, the non-monotonicity arises from three conflicting effects: two that increase with time (project-specific skill and social status) and one that decreases (level of technical commitment).
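Purely as an illustration, a raw curve of this kind can be approximated by a discrete-time (life-table style) hazard: for each one-year interval, divide the number of people who make their first commit during that year by the number still at risk at its start, then apply a simple moving average. The function names, the one-year bins, and the three-point window are assumptions; this is not necessarily how the smoothing for Figure 1 was done. The sketch also shows why the right tail is noisy: once only a handful of people remain at risk, each single event moves the estimate a lot.

```python
import math
from typing import List, Sequence, Tuple


def yearly_hazard(spells: Sequence[Tuple[float, bool]], max_years: int) -> List[float]:
    """Raw discrete-time hazard for each one-year interval [k, k+1).

    spells holds (years at risk, became_developer) pairs.
    hazard[k] = (# who first commit during year k) / (# still at risk at start of year k)
    Intervals with nobody at risk yield NaN.
    """
    hazards = []
    for k in range(max_years):
        at_risk = [(t, e) for t, e in spells if t >= k]
        events = sum(1 for t, e in at_risk if e and t < k + 1)
        hazards.append(events / len(at_risk) if at_risk else math.nan)
    return hazards


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average that skips NaN (empty) intervals."""
    half = window // 2
    out = []
    for i in range(len(values)):
        neighbourhood = [v for v in values[max(0, i - half): i + half + 1] if not math.isnan(v)]
        out.append(sum(neighbourhood) / len(neighbourhood) if neighbourhood else math.nan)
    return out


# Usage with a tiny made-up sample (not project data).
spells = [(0.5, True), (1.5, True), (1.2, False), (3.0, False)]
raw = yearly_hazard(spells, max_years=3)
print(raw)
print(smooth(raw))
```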



[Figure 1 panels: Apache HTTPD hazard rate (raw, smoothed); Postgres hazard rate (smoothed); Python hazard rate (smoothed)]

Figure 1: Rate at which people become developers, as a function of their tenure on the mailing list. The leftmost plot is for Apache HTTPD, the middle is Postgres, and the rightmost is Python. Note the striking similarity.


We have also made available the trace of the analysis (using the Stata data analysis package) and the descriptive statistics (including several other predictive measures that were outside the scope of the current paper) for Apache HTTPD, Postgres, and Python. Throughout the analysis, note that we are considering the entire population at risk.

Our goal was to quantitatively evaluate these three hypotheses:
Hypothesis 1: The likelihood of attaining developer status will rise with tenure, peak at some point, and then decline.
Hypothesis 2: Demonstration of skill, such as patch submissions and/or acceptances, will increase the likelihood of becoming a developer.
Hypothesis 3: Social status will influence the rate at which a non-developer becomes a developer.
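One conventional way to make these hypotheses testable, offered only as a hedged sketch and not necessarily the specification used in the paper, is a parametric hazard model whose log-hazard is quadratic in tenure t and linear in covariates. The covariates x_i^skill (for example, counts of patch submissions or acceptances) and x_i^status (some social-status measure) and all coefficient names are illustrative placeholders.

```latex
h_i(t) = \exp\!\left( \alpha_0 + \alpha_1 t + \alpha_2 t^2
         + \beta_{\text{skill}}\, x^{\text{skill}}_i
         + \beta_{\text{status}}\, x^{\text{status}}_i \right)

\frac{\partial \log h_i(t)}{\partial t} = \alpha_1 + 2\alpha_2 t = 0
\quad\Longrightarrow\quad
t^{*} = -\frac{\alpha_1}{2\alpha_2}
```

Under this sketch, Hypothesis 1 corresponds to \alpha_1 > 0 and \alpha_2 < 0, so the hazard rises, peaks at a positive tenure t^{*}, and then declines; Hypothesis 2 corresponds to \beta_{\text{skill}} > 0; and Hypothesis 3, which does not state a direction, corresponds to \beta_{\text{status}} \neq 0.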



Additional data will be made available subsequent to the review process.