The heartbeat of open source projects can be heard with GitHub data

27.06.2016
GitHub released charts last week that tell a story about the heartbeat of a few open source projects, giving insight into the activity, productivity and collaboration of software development.

Why are these important? Enterprises increasingly define software development as a top priority to gain competitive advantage or defend against disruption. They often turn to open source software because it is fast and agile. Enterprise IT decision makers should understand GitHub because it is the backbone of most open source projects.

Open source lets enterprises collaborate with interested outside developers to create software that exceeds their internal capabilities. But it also means much of the collaboration takes place outside of the enterprise. That need not be unsafe or disturbing, because of the controls, transparency and analytics available.

Analytics are sprinkled throughout the GitHub website, and there is an application programming interface (API) that data-driven enterprises can use to create their own analytics to measure the progress and health of any public open source project important to them. A dashboard displaying the project’s heartbeat could be built with the API.
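As a rough sketch of how such a dashboard might start, the Python below tallies repository events by month and by type. It assumes the input looks like the JSON returned by GitHub's repository events endpoint (`GET /repos/{owner}/{repo}/events`), where each event carries a `type` and an ISO 8601 `created_at` field. The function names `fetch_events` and `monthly_heartbeat` and the sample data are illustrative, not part of any GitHub library. The events endpoint returns only recent activity, so a chart spanning years would need data collected over time or taken from an archive.

```python
import json
import urllib.request
from collections import Counter, defaultdict


def fetch_events(owner, repo):
    """Fetch one page of recent public events for a repository (needs network access)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/events"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def monthly_heartbeat(events):
    """Count events per month and per event type.

    Returns a dict shaped like {"2016-06": Counter({"PushEvent": 2, ...})}.
    """
    heartbeat = defaultdict(Counter)
    for event in events:
        month = event["created_at"][:7]  # "2016-06-20T10:00:00Z" -> "2016-06"
        heartbeat[month][event["type"]] += 1
    return dict(heartbeat)


# Hand-written sample data standing in for a real API response.
sample = [
    {"type": "PushEvent", "created_at": "2016-06-20T10:00:00Z"},
    {"type": "IssueCommentEvent", "created_at": "2016-06-21T09:30:00Z"},
    {"type": "PushEvent", "created_at": "2016-06-22T14:15:00Z"},
    {"type": "PullRequestEvent", "created_at": "2016-05-30T08:00:00Z"},
]

for month, counts in sorted(monthly_heartbeat(sample).items()):
    print(month, dict(counts))
```

Replacing `sample` with `fetch_events("some-owner", "some-repo")` would run the same tally against live data, subject to the API's rate limits.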

The majority of the project activity is commenting. It’s not too far removed from Facebook posts, except the comments are tersely written and often have code changes attached.

Think of GitHub as a crowdsourced public book editing system where an author submits a draft book to his or her editors and the public interested in the book. Editors and anyone interested can download a copy of the book. They can comment, rewrite or submit additional paragraphs, pages and chapters. All of the changes can be sent back to the public book editing system with what GitHub calls a Pull Request.

A Pull Request means the proposed changes have been sent, the author and editors notified, but the changes have not been merged. The editors review all of the submitted changes, check them for proper spelling, grammar, and accuracy, and curate the best. The curated changes are merged with the original work into a final bestseller.

A translation of a recent event on one of the projects that GitHub reported about, Microsoft’s Roslyn project, is a useful introduction before turning to the data. GitHub usernames are used in this example.

jasonmalinowski committed on GitHub Merge pull request #12057 from jasonmalinowski/generate-compiler-bind

This change to the project’s main body of code started last December when tmat suggested a new feature in the Issues list. He might have written the software to implement the new feature to accompany his Issue, as GitHub contributors often do, but in this case he didn’t.

After some clarifying revisions, user davkeen added tmat’s feature request to the priority backlog in February. The project’s owner had previously given davkeen permission to accept issues into the backlog list. A developer coded the feature in his local, up-to-date copy of the project. About a week later, jasonmalinowski opened a Pull Request from the developer’s version of the Roslyn project to move the changes to the main body of the project. He included comments to explain what changes he proposed and why.

Emails were sent automatically to the project members responsible for reviewing and testing the change, much like the book changes submitted to the author and editors in the earlier example. After about a week of review and testing, jasonmalinowski committed the change, merging it into the main body of source code, like a book ready to be published.

To maintain software quality, the owner gave jasonmalinowski the authority to make merge decisions. If someone else had made the Pull Request, jasonmalinowski would have been the one to approve the merge.
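The permission rule described here can be illustrated with a toy model. This is only a sketch of the idea, not how GitHub implements access control: the `Project` and `PullRequest` classes and the user names `owner`, `maintainer` and `contributor` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    author: str
    title: str
    state: str = "open"  # becomes "merged" once accepted


class Project:
    def __init__(self, owner, mergers):
        self.owner = owner
        # The owner can always merge, plus anyone the owner has authorized.
        self.mergers = set(mergers) | {owner}

    def merge(self, pull_request, user):
        """Merge an open pull request if the user holds merge permission."""
        if user not in self.mergers:
            raise PermissionError(f"{user} may not merge into this project")
        if pull_request.state != "open":
            raise ValueError("only open pull requests can be merged")
        pull_request.state = "merged"
        return pull_request


project = Project(owner="owner", mergers=["maintainer"])
request = PullRequest(author="contributor", title="example change")

try:
    project.merge(request, "contributor")  # the author lacks merge permission
except PermissionError as error:
    print("rejected:", error)

project.merge(request, "maintainer")  # an authorized user approves the merge
print(request.state)
```

Anyone can propose a change, but only the users the owner has authorized can move it into the main body of code.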

Now, let’s look at some of the color-coded data.

Pull Requests: These are the code submissions that propose a feature or bug fix, made by developers who do not have the permission that jasonmalinowski had to merge them into the main code body. The proposed changes will be reviewed, and if they are accepted, they will be tested and merged. Using the book example, the changes have been submitted to the editors.

Pushes: Pushes are code changes that are merged into an earlier Pull Request. In the book example, someone submitted a change to the editors. After the editors and public have commented, the writer makes some changes to his or her original submission and Pushes them, merging them into the original submission.

Pull Request Review Comments: These represent all of the comments made to lines or sections of code, usually made by developers reviewing the proposed change. In the book editing example, these would be notations in the margin.

Pull Request Comments: These are comments made to Pull Requests proposing code changes to the main body of code. In the book example, these would be an explanatory cover letter.  

Issues Comments: These measure the comments about feature requests and bugs.

Issues: This represents the outstanding feature requests and bug reports.
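For readers who want to reproduce these categories from API data, the sketch below maps GitHub event types to the six chart labels. The mapping is an assumption made for illustration; the article does not say how GitHub built its charts. In particular, `PushEvent` covers any push, not only pushes to an existing Pull Request, and the sketch assumes that comments on a Pull Request's conversation arrive as `IssueCommentEvent` with a `pull_request` key on the issue. The function name `chart_category` is hypothetical.

```python
def chart_category(event):
    """Map one GitHub event dict to a chart label, or None if it is not charted."""
    kind = event.get("type")
    if kind == "PullRequestEvent":
        return "Pull Requests"
    if kind == "PushEvent":
        return "Pushes"
    if kind == "PullRequestReviewCommentEvent":
        return "Pull Request Review Comments"
    if kind == "IssuesEvent":
        return "Issues"
    if kind == "IssueCommentEvent":
        issue = event.get("payload", {}).get("issue", {})
        # Assumption: a "pull_request" key marks the issue as a Pull Request.
        if "pull_request" in issue:
            return "Pull Request Comments"
        return "Issues Comments"
    return None


# Hand-written sample events, not real project data.
examples = [
    {"type": "PushEvent"},
    {"type": "IssueCommentEvent", "payload": {"issue": {"pull_request": {}}}},
    {"type": "IssueCommentEvent", "payload": {"issue": {}}},
    {"type": "WatchEvent"},
]

for event in examples:
    print(event["type"], "->", chart_category(event))
```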

The Roslyn chart above makes a couple of interesting points. Foremost is that when Microsoft made the project public, a lot more developers contributed code and each change was accompanied by a broader, more insightful discussion. It also shows a drop in productivity during the summer of 2015. The volume of comments represents the degree of collaboration. The chart below shows that as project size increases, comments grow to coordinate decisions as issues move from draft to a final merge and a release. The transparency of the development process improves the quality of review, as well as the decisions of those managing features and fixes, which would not be possible if the project were private.

(www.networkworld.com)

Steven Max Patterson
