GitHub for the rest of us

24.02.2015
There's a reason why software developers live at the leading edges of an unevenly distributed future: Their work products have always been digital artifacts, and since the dawn of networks, their work processes have been connected.

The tools that enable software developers to work and the cultures that surround the use of those tools tend to find their way into the mainstream. It seems obvious, in retrospect, that email and instant messaging -- both used by developers before anybody else -- would have reached the masses. Those modes of communication were relevant to everyone.

It's less obvious that Git, the tool invented to coordinate the development of the Linux kernel, and GitHub, the tool-based culture that surrounds it, will be as widely relevant. Most people don't sling code for a living. But as the work products and processes of every profession are increasingly digitized, many of us will gravitate to tools designed to coordinate our work on shared digital artifacts. That's why Git and GitHub are finding their way into workflows that produce artifacts other than, or in addition to, code.

As reported in Wired, ReadWrite, and elsewhere, GitHub is used to manage the collaborative development of recipes, musical scores, books, fonts, legal documents, lessons and tutorials, and data sets. Given the infamous complexity of Git, how is this possible?

One reason is that GitHub has gradually exposed more of the underlying Git capabilities in its Web interface. Another is the emergence of Web applications that use GitHub as a platform. Then there's the cultural factor: GitHub embodies a particular way of working together. Dave Winer describes it with the phrase "narrate your work." I've used "observable work." The Responsive Organization movement celebrates "transparency over privacy." For GitHub's government evangelist, Ben Balter, it's "open collaboration." 

The blog post in which Ben Balter proposes that term was unpublished when I read it. But since the blog is hosted in a public GitHub repository, I could not only read the post in draft form but also follow the discussion with invited reviewers and observe how that discussion influenced the draft. A repository, of course, need not be open to the public -- but every organization should want its internal processes to leverage this style of open collaboration. According to Brian Doll, vice president of strategy for GitHub, a growing number of companies are doing exactly that.

It's often said nowadays that every company is a software company. That's true in an abstract way, if you define intellectual property as software. But it's also literally true for many companies whose value is embodied in software they develop internally.

It was always desirable to expand participation in that development beyond the traditional disciplines of code, test, QA, and documentation. But if the contribution you could make was based on your understanding of the business or of the customer, you couldn't engage directly.

"That's insane," says Brian Doll. "If you're a bank, the wealth management tools your employees and your customers use are the product, how can those people not have a direct hand in improving it" With GitHub, every stakeholder can become a first-class participant. Rather than writing emails that orbit the system of record, they can send pull requests and discuss related issues directly in that system. 

Taming the Git beast

Git, the decentralized version control engine under GitHub's hood, works in ways that surprise not only nonprogrammers but also programmers who come to it from centralized systems.

In those systems it's a big deal to create a branch within a repository, in order to explore an alternative version of a set of artifacts. In Git a branch is a lightweight construct, an illusion created by moving pointers instead of data. In a conventional system it would be unthinkably costly to create a branch to change a single word in a document. Git makes that maneuver trivially cheap. GitHub can embed it in a workflow -- the pull request -- that encapsulates discussion of the change and ties it to the document's change history.
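
To make the "moving pointers instead of data" idea concrete, here is a toy model in Python. It is only a sketch of the concept, not Git's actual storage format; the commit IDs, branch names, and helper functions are all hypothetical.

```python
# Toy model: a branch is just a name that points at a commit.
# This illustrates the idea only; it is not how Git stores objects.
commits = {
    "c1": {"parent": None, "text": "The party of the first part shall pay."},
}
branches = {"main": "c1"}


def create_branch(name, start):
    """Creating a branch copies one commit ID; no document data is duplicated."""
    branches[name] = branches[start]


def commit(branch, commit_id, text):
    """Record a new snapshot and move only this branch's pointer to it."""
    commits[commit_id] = {"parent": branches[branch], "text": text}
    branches[branch] = commit_id


# Branch off to change a single word, then commit the change there.
create_branch("fix-one-word", "main")
commit("fix-one-word", "c2", "The party of the first part must pay.")

print(branches)  # {'main': 'c1', 'fix-one-word': 'c2'}
```

In this model, the one-word change costs one new dictionary entry and one updated pointer, and "main" is untouched until someone decides to merge.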

Git's protean capabilities have made it a laboratory for workflow innovation, and the many approaches that have emerged present another layer of complexity. The mechanics of branching and merging are tricky enough, but there are also various schools of thought about when and how to branch and merge. All this is challenging for programmers and way beyond most others. How can you tame this beast so that nontechnical stakeholders can participate?

GitHub's answer: Enhance the website for core activities. A lawyer who wants to change one word in a legal document needn't use the scary Git client; she can edit the file in the browser. That action will kick off a pull-request workflow that automates the creation of a branch dedicated to the proposed change. GitHubbers like to say that "there's only one way to change something." Nobody is required to adhere to that golden rule, but doing so follows a path of least resistance.

As a result, everyone in a GitHub-enabled company can easily adopt this best practice. "Instead of grousing at the water cooler because the software is terrible," says Brian Doll, "you have a way to change it." Everyone can use the same mechanism, whether a contribution to that change is code or documentation or legal advice or business perspective or customer feedback. 

The value of that shared convention, arguably GitHub's most important innovation, is enhanced by other conventions imported from social media. On Twitter, for example, you can draw the attention of another Twitter user by mentioning their username. This @mention technique works in GitHub for individuals and for teams.

There's also GitHub Pages, a service that hosts websites on top of GitHub repositories. It's favored by technical bloggers who are familiar with Git and willing to install (and use locally) a Ruby-based site generator called Jekyll. But as others have discovered, you don't have to install Jekyll. It's possible to manage a GitHub Pages site entirely in the browser and enjoy the benefits of version history and issue discussion.

Visualizing change

Version control and change visualization are deeply wired into the work of software development. Nowadays no competent programmer would even think of discussing a proposed new version of some code without a "diff" that shows exactly what will change.

That expectation is another part of the unevenly distributed future inaccessible to most other knowledge workers. It's a fundamental kind of digital literacy, relevant to everyone in an organization, but not yet pervasive. Obstacles to its spread are both cultural ("we've never done it that way before") and technical ("my work product is not a text file").

The digital artifacts of software development are still files containing lines of text that hark back to punch cards. And we still visualize changes to those files on a line-by-line basis. Compilers and IDEs understand code in terms of modules and methods, but version control systems don't share that understanding. Attributing a change to module X or method Y, and observing such change over time, is cognitive grunt work that could in theory be machine-supported but in practice isn't.
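
A small illustration of what "line-by-line" means in practice, using Python's standard difflib module rather than Git itself. The file name and the function being edited are made up for the example.

```python
import difflib

# Two versions of a hypothetical source file, as lists of lines.
old = [
    "def greet(name):",
    "    return 'Hello ' + name",
]
new = [
    "def greet(name):",
    "    return 'Hello, ' + name + '!'",
]

# unified_diff compares the two versions purely as sequences of text lines.
diff = list(
    difflib.unified_diff(
        old, new, fromfile="a/greet.py", tofile="b/greet.py", lineterm=""
    )
)
for line in diff:
    print(line)
```

The output marks one line as removed and one as added. Nothing in it says "the function greet changed"; that attribution is left to the reader.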

This impedance mismatch exists for deep historical reasons and won't be resolved anytime soon. Meanwhile there are two ways to address it, and GitHub is pursuing both.

One approach is to convert rich documents into text files. That's a common practice in government agencies that have adopted GitHub for collaboration, according to Ben Balter. He's created a tool that can convert the Word documents widely used in such agencies into Markdown, a plain-text format used on GitHub and in many other environments. That workaround is less than ideal for two reasons. Roundtripping documents through format converters is perilous -- and Markdown isn't a standard format. There are many variants; in fact, the one used on GitHub is known as GitHub Flavored Markdown.

Ideally, GitHub would understand rich formats, and there's been progress on that front. It's long been possible to compare changed images in a visual way. A year ago, "prose diffs" enabled inline color-coded highlighting of differences between HTML renderings of Markdown files. This approach also helped make differences in tabular formats like CSV data and HTML tables more legible, but didn't leverage any deep awareness of document structure.

Such awareness is now available for one format: GeoJSON. It encodes geospatial information in a JSON format that GitHub uses not only to display a map as one rendering of the data, but also to show the map's revision history visually, using a slider to scroll through versions. Extending that approach to Word documents, PDF files, and spreadsheets would make GitHub-style collaboration vastly more appealing to people whose work products are expressed in those formats.
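
For readers who haven't seen the format, here is a minimal sketch of a GeoJSON document built in Python. The coordinates and the property name are invented for illustration; only the overall shape (a FeatureCollection of Features, each with a geometry and properties) comes from the GeoJSON format itself.

```python
import json

# Hypothetical feature: a single point with one descriptive property.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON orders coordinates as [longitude, latitude].
        "coordinates": [-122.4194, 37.7749],
    },
    "properties": {"name": "Example office"},
}
collection = {"type": "FeatureCollection", "features": [feature]}

# The serialized text is what would be committed to a repository.
print(json.dumps(collection, indent=2))
```

Because the file is structured data rather than free text, a tool that understands it can render each committed version as a map instead of as a list of changed lines.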

GitHub as a platform

GitHub can't be all things to all of its millions of hosted projects, but it can enable others to build on top of it and integrate with it. Tools that use GitHub APIs to wrap project management features around existing repositories include Waffle.io, HuBoard, and ZenHub.

Continuous integration systems like Travis and Jenkins can use the status API to report the outcome of tests associated with a commit. Moreover, the CRUD API enables programmatic commits that create, update, or delete files in a repository. Ben Balter has used it, for example, in an application that takes input from HTML forms and appends it to a CSV file in a GitHub repository. 
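
As a rough sketch of what such a programmatic commit involves, the Python below builds the request body for updating a CSV file with one extra row. It makes no network calls and is not Balter's application, whose internals the article doesn't describe. It assumes GitHub's repository contents endpoint (PUT /repos/{owner}/{repo}/contents/{path}), which takes a commit message, the new file content in base64, and the sha of the version being replaced. The function name, the sample row, and the placeholder sha are hypothetical.

```python
import base64
import csv
import io
import json


def build_append_request(current_b64, current_sha, row, message):
    """Return the JSON body for an update that appends one CSV row.

    current_b64 and current_sha stand for the "content" and "sha" values
    that a prior GET of the same file would return.
    """
    text = base64.b64decode(current_b64).decode("utf-8")
    if text and not text.endswith("\n"):
        text += "\n"
    # Use the csv module so commas and quotes inside fields are escaped.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    text += buf.getvalue()
    return {
        "message": message,
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "sha": current_sha,  # identifies the file version being replaced
    }


# Stand-in for the file's current state; "abc123" is a placeholder sha.
existing = base64.b64encode(b"name,email\n").decode("ascii")
body = build_append_request(
    existing, "abc123", ["Ada", "ada@example.com"], "Add form submission"
)
print(json.dumps(body, indent=2))
```

Each form submission handled this way becomes an ordinary commit, so the CSV file gets the same history and attribution as any hand-edited file in the repository.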

Of course GitHub isn't the platform for everybody. O'Reilly Media's Atlas, a hosted system for publishing books in multiple formats, is built directly on top of Git. But for many nontraditional uses of Git, GitHub's interface -- and evolving extensions to it -- will be a powerful combination.

A culture of collaboration

Like Git, GitHub enables many styles of collaboration. It encourages key best practices, like issuing pull requests from disposable branches, but doesn't try to legislate others. Consider labels, which are keywords assigned to issues and pull requests. (Other social systems might call these tags, but in Git tags identify specific points in a repository's history.) Nothing requires you to use labels, but if you do -- and if your team adopts a thoughtful and consistent vocabulary -- you enable filtered views that help everybody make sense of a project.

Crafting useful commit messages is another way to be a helpful collaborator. Programmers joke about commit messages like "I changed some stuff" and have, over the years, learned how and why to narrate their work more effectively. Most other knowledge workers, though, aren't used to formalizing small units of work, contributing them to widely shared spaces, and describing them in ways meaningful to others.

These practices, layered on a common understanding that all shared artifacts ought to be versioned and all changes carefully controlled and documented, should never have been restricted to software development. We're all doing distributed work that must be coordinated with care, attention to detail, and team awareness. Git made that possible for programmers. GitHub is making it possible for the rest of us.

(www.infoworld.com)

Jon Udell