Summary of proposal

To study the structure and roles of the community of the Apache Software Foundation.

Ramón: These bullet points don't reflect the questions proposed in the next section.

This should be done over time, so that we can see how the role of each individual changes.

Ross: What is the "community"? The ASF is an umbrella organisation for a number of projects. Are we studying individual projects, or the ASF as a whole? Personally, I (RossGardler) would find it far more interesting to study the ASF as a whole: how has the Foundation's community grown over time in relation to the growth of the projects it houses?

Herraiz: The community is formed by the group of people interacting in the ASF through the mailing lists, the Subversion repository, etc. The study would take into account all the mailing lists of all the projects hosted by the ASF, as well as the lists of the ASF itself (at least all the public lists). The study would be done both at the project level and at the community level (i.e. the ASF as a whole).

Ramón: I think that going with the ASF as a whole is going to be too complicated. From a practical point of view, I'd say let's study one project. Can we get useful community metrics? If the answer is yes, then we apply the same methodology to all projects. Once we have that, we can think about how to tackle the whole Foundation. One problem, for example, will be how to deal with people who appear in different projects.

Ross: There is a mailalias.txt file in the ASF that (theoretically) defines all the mail addresses used by committers in the ASF. It doesn't help with non-committers who use multiple addresses, though. I'm not sure whether this is a public file; I'll check if necessary.
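As a minimal sketch of the identity problem raised above, assuming the committer aliases can be loaded into a plain dictionary from lower-cased address to a canonical id (the real format of mailalias.txt is not assumed here, and `resolve_identity` and `projects_per_person` are hypothetical helper names), merging addresses and finding people active in several projects could look like this. Addresses missing from the mapping stay as separate identities, which is exactly the non-committer limitation.

```python
from email.utils import parseaddr


def normalise_address(raw_from):
    """Extract the bare address from a From: header and lower-case it."""
    _name, addr = parseaddr(raw_from)
    return addr.strip().lower()


def resolve_identity(raw_from, aliases):
    """Map a From: header to a canonical person id.

    `aliases` is a hypothetical dict {lower-cased address: canonical id},
    e.g. built from an alias file. Unknown addresses fall back to the
    address itself, so people with unlisted extra addresses stay split.
    """
    addr = normalise_address(raw_from)
    return aliases.get(addr, addr)


def projects_per_person(posts, aliases):
    """posts: iterable of (project, raw_from) pairs.

    Returns {canonical id: set of projects that person posted to}.
    """
    result = {}
    for project, raw_from in posts:
        person = resolve_identity(raw_from, aliases)
        result.setdefault(person, set()).add(project)
    return result
```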

Using the developers and users mailing lists, it should be possible to identify the profiles described in CommunityMetrics.
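The profiles themselves are defined in CommunityMetrics and are not reproduced here, but most of them would need per-person activity over time as a starting point. A possible first step, assuming the list archives are available locally as mbox files and using only Python's standard `email` package, is to count messages per sender per month; `monthly_activity` is a hypothetical name.

```python
from collections import Counter, defaultdict
from email.utils import parseaddr, parsedate_to_datetime


def monthly_activity(messages):
    """Count messages per sender per calendar month.

    messages: iterable of email.message.Message objects.
    Messages with no sender, or a missing/unparseable Date: header, are
    skipped. The month is taken in the Date: header's own time zone.
    Returns {sender_address: Counter({"YYYY-MM": n, ...})}.
    """
    activity = defaultdict(Counter)
    for msg in messages:
        sender = parseaddr(str(msg.get("From", "")))[1].lower()
        date_header = msg.get("Date")
        if not sender or not date_header:
            continue
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            # Older Pythons raise TypeError, newer ones ValueError.
            continue
        activity[sender][f"{sent.year:04d}-{sent.month:02d}"] += 1
    return activity
```

An archive opened with the standard library's `mailbox.mbox(path)` can be passed in directly, since iterating over it yields message objects. Senders are not merged across addresses here; that would need an alias mapping like the one Ross mentions.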

Questions to be answered

Ramón: The proposal below has too many questions, and they are too generic, so the research problem is effectively unspecified. As it stands, this is enough for a one-year open research project, not a one-month study. I would hesitate to have more than two questions.

Other ideas to work on

Ramón: While the ideas below may be interesting, I think they stray from the study. In fact, they could constitute a new study. I'd rather have one study finished than two incomplete, so I'd forget about this completely for the moment. I'd like to delete this section if nobody disagrees.

I am interested in finding out how ideas spread in a community. The global trend of the community is determined by its common wisdom, that is, the ideas about the community that people accept. How is that common wisdom generated, and how does it spread?

The proposed methodology is to take the text of the messages in the mailing list and extract "keywords" or "ideas" from it. There are several approaches to doing this. The easiest is simply to filter out the most common words and keep, for each message, the 10 least common words that remain. Those words would be the "keywords". Another approach could be to reuse the text matching methods used by the FOSSology project.
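A sketch of the "easiest" approach, under assumptions the paragraph leaves open: word frequency is measured as the number of messages containing the word, the cut-off for "most common" is a parameter (`n_common`, an arbitrary default of 100), and keywords are picked per message. The function names are hypothetical, and the FOSSology-based alternative is not attempted.

```python
import re
from collections import Counter

# Word-like tokens of two or more characters (an assumed tokenisation).
WORD_RE = re.compile(r"[a-z][a-z0-9_'-]+")


def tokenize(text):
    """Lower-case the text and split it into word-like tokens."""
    return WORD_RE.findall(text.lower())


def corpus_frequencies(messages):
    """Count, for each word, how many messages contain it."""
    freq = Counter()
    for text in messages:
        freq.update(set(tokenize(text)))
    return freq


def common_words(freq, n_common=100):
    """The n_common most frequent words; ties broken alphabetically."""
    return set(sorted(freq, key=lambda w: (-freq[w], w))[:n_common])


def extract_keywords(text, freq, common, n_keywords=10):
    """Drop common words, then keep the rarest remaining words of one message."""
    candidates = set(tokenize(text)) - common
    ranked = sorted(candidates, key=lambda w: (freq[w], w))
    return ranked[:n_keywords]
```

One consequence of this choice: the rarest words in a message also include typos and one-off identifiers, so some extra filtering may be needed on real archives.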

The evolution of keywords over time could give an idea of the evolution of the community. We could try some Social Network Analysis, generating a network of people connected by keywords. This analysis could be done on a monthly basis, and we could perhaps also add information about the profile of each individual (identified using the method described at the top of this page).
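The monthly network could be built along these lines, assuming each message has already been reduced to a (person, month, keywords) triple, for instance with the keyword-filtering approach described earlier. The linking rule is an assumption: two people are connected in a month if they used at least one keyword in common, and the edge weight is the number of distinct shared keywords. `keyword_networks` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def keyword_networks(posts):
    """Build one weighted person-person network per month.

    posts: iterable of (person, month, keywords), month like "2005-03".
    Returns {month: {(person_a, person_b): weight}} with person_a < person_b.
    """
    # month -> keyword -> set of people who used that keyword in that month
    users = defaultdict(lambda: defaultdict(set))
    for person, month, keywords in posts:
        for kw in keywords:
            users[month][kw].add(person)

    networks = {}
    for month, by_keyword in users.items():
        edges = defaultdict(int)
        for people in by_keyword.values():
            # Every pair sharing this keyword gets +1, so the final weight
            # is the number of distinct keywords the pair has in common.
            for a, b in combinations(sorted(people), 2):
                edges[(a, b)] += 1
        networks[month] = dict(edges)
    return networks
```

The per-month edge dictionaries could then be loaded into a graph library such as NetworkX (`Graph.add_weighted_edges_from`) for the actual analysis, with profile information attached to the nodes at that point.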

This approach has other applications as well. For instance, we could identify who first introduced a keyword in the project, and how keywords propagate depending on who introduced them.
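Both questions reduce to ordering keyword uses by time. A minimal sketch, assuming each message has been reduced to a (person, timestamp, keywords) tuple: the earliest user of a keyword is taken as its introducer, and propagation is measured, as one simple choice among many, by the number of other people who later used it. `keyword_spread` is a hypothetical name.

```python
from collections import defaultdict


def keyword_spread(posts):
    """Find each keyword's introducer and how many others later adopted it.

    posts: iterable of (person, sent, keywords), where sent is a datetime
    (all naive or all time-zone aware, so they can be compared).
    Returns (introducer, spread):
      introducer = {keyword: (person, sent)} for the earliest use;
      spread     = {keyword: number of distinct other people who used it}.
    """
    # sorted() is stable, so equal timestamps keep their input order.
    ordered = sorted(posts, key=lambda p: p[1])
    introducer = {}
    adopters = defaultdict(set)
    for person, sent, keywords in ordered:
        for kw in keywords:
            if kw not in introducer:
                introducer[kw] = (person, sent)
            elif person != introducer[kw][0]:
                adopters[kw].add(person)
    spread = {kw: len(adopters[kw]) for kw in introducer}
    return introducer, spread
```

Grouping `spread` by introducer (or by the introducer's profile) would then give a first look at whether keywords introduced by some people travel further than others.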

Another application would be summarising messages, which could be useful for newcomers. When someone wants to ask something on a mailing list, she has to review previous messages carefully so as not to ask about something that has been asked before, because otherwise people get annoyed. This "archaeology" of mailing list archives is an entry barrier for newcomers. If messages could be summarised using keywords, the digging would be easier.
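With keywords attached to each message, the digging could be supported by a simple inverted index from keyword to message. The sketch below assumes keywords are already extracted and that messages are identified by their Message-ID header; the function names are hypothetical. A search returns the messages whose keywords include every query word.

```python
from collections import defaultdict


def build_index(message_keywords):
    """{message_id: [keyword, ...]} -> {keyword: set of message ids}."""
    index = defaultdict(set)
    for msg_id, keywords in message_keywords.items():
        for kw in keywords:
            index[kw.lower()].add(msg_id)
    return index


def search(index, query_words):
    """Ids of messages whose keywords contain all query words (case-insensitive)."""
    # .get() avoids inserting empty entries into the defaultdict.
    sets = [index.get(w.lower(), set()) for w in query_words]
    if not sets:
        return set()
    return set.intersection(*sets)
```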

From another point of view, developers have to deal with large amounts of mail. Having keywords for each message would help them identify which messages are important and which are not.
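In the same spirit, a developer could rank incoming mail by how many of their topics of interest a message's keywords cover. This is only a sketch: the list of interest keywords is a hypothetical input that the proposal does not specify, and overlap counting is just one possible notion of "important". `rank_by_interest` is a hypothetical name.

```python
def rank_by_interest(message_keywords, interests, threshold=1):
    """Order messages by how many of a developer's interest keywords they carry.

    message_keywords: {message_id: iterable of keywords}
    interests: iterable of keywords the developer cares about (hypothetical).
    Messages sharing fewer than `threshold` keywords are dropped; the rest
    are sorted by descending overlap, then by message id for stable output.
    Returns a list of (message_id, overlap) pairs.
    """
    wanted = {w.lower() for w in interests}
    scored = []
    for msg_id, keywords in message_keywords.items():
        overlap = len(wanted & {k.lower() for k in keywords})
        if overlap >= threshold:
            scored.append((overlap, msg_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(msg_id, overlap) for overlap, msg_id in scored]
```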

Data sources

Ramón: Pointers needed.

Work plan

List of tasks to be performed during the visit.

Profiles and migration processes

Ramón: Unless Israel has already done this before, knows the tools by heart, and knows exactly what results he needs to get at each step, and how to use them in the following steps, I don't think there's any chance of getting this done within the proposed schedule. I think optimistic estimates should start at twice the time.

Spreading of ideas

Ramón: As above, this is another project and doesn't belong in this page.


Kevin Crowston and James Howison. The social structure of free and open source software development. First Monday, 10(2), February 2005.

Chris Jensen and Walter Scacchi. Modeling recruitment and role migration processes in OSSD projects. In Proceedings of the 6th International Workshop on Software Process Simulation and Modeling, St. Louis, May 2005.


OSSWatchWiki: ASFCommunity (last edited 2013-04-15 13:56:18 by localhost)

The content of this wiki is licensed under the Creative Commons Attribution-ShareAlike 2.0 England & Wales Licence.

OSS Watch is funded by the Joint Information Systems Committee (JISC) and is situated within the Research Technologies Service (RTS) of the University of Oxford.