TL;DR: Almost everyone seems to be using Git like CVS, while it was created to be used the way it is in Linux kernel development.
Edit: After an initial round of feedback, I’d like to clarify at the top:
You can take the developer out of the CVS, but you can’t take the CVS out of the developer.
– me, in this post
Maybe I’m arrogant, but I think almost everyone is using Git incorrectly. Or, to say it in a less click-baity way: sub-optimally.
How do you know if you, your team, or your organization use Git wrong?
Answer the following questions:
If the answer to all of them is yes, I’m afraid this is you:
Do you remember CVS? It stands for Concurrent Versions System. Concurrent is the keyword here. If everything you do around your source control is centralized, then concurrency is as far as you can go.
Git on the other hand implements Distributed version control, also known as distributed revision control system (DRCS), or a distributed version control system (DVCS).
Somewhere during the popularization of Git, people forgot the original reason it was created, and now they use it as a weird CVS that wastes disk space by copying the whole code history everywhere.
My main point is: the difference between Git (and its kind) and the previous generation of source control tools is the difference between the concurrent model (creating changes to a shared codebase at the same time) and the distributed model (developing code independently and sharing changes).
It’s a fundamental difference. In a distributed model, each developer should be as autonomous and self-reliant as possible, and consensus should come from communication, not from central control.
This difference might initially seem rather insignificant, but it affects the whole software engineering culture. It changes what gets revisioned in the source control system in the first place, how you create your build system, how you communicate, how you review changes, and how you set up testing and infrastructure.
DVCS is a superset of CVS. You don’t have to apply every aspect of the DVCS model, but it’s good to know it exists.
Have you ever done a code review and given feedback that took longer to write than doing the fix yourself would have?
Have you ever had a fix that your colleagues were waiting for, but they couldn’t use it until you passed all the code review checks to actually get it “merged into master”, while passing it out-of-band was an extra hassle, with your tools and procedures actively getting in the way?
Have you encountered bugs in the master branch that affect only some developers, or fail intermittently, and slipped through automated tests?
Have you ever been unable to do something, because “master is broken”?
Your software development model should embrace the Linux kernel development model. That model is the whole reason Git was created, and why it got so popular. It enables efficient, asynchronous code development collaboration between many people. It works well for more than 15 million lines of code and thousands of developers: paid and unpaid, professionals and amateurs alike. So it can benefit your projects as well.
There should be no “one master branch” that everyone uses, since it’s a point of contention and a single point of failure. When stuff breaks in the “one master branch”, your whole organization is affected.
Changes to the code should flow freely between people. Developers and teams should exchange patches or whole branches as they find optimal for the situation at hand. Generally, changes should propagate organically, from the bottom up and across. This gives them time to mature and be battle-tested in many setups and situations before they land in anything you would consider “the master branch”.
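To make this concrete, here is a minimal, self-contained sketch of two developers passing a change between their own repositories with no central server involved. All names and paths are illustrative, and it assumes a reasonably recent git (2.28+ for `init -b`):

```shell
set -e
# Simulate two developers exchanging a fix peer-to-peer.
dir=$(mktemp -d); cd "$dir"

# Alice has a repository with some history.
git init -q -b main alice
git -C alice -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m "initial commit"

# Bob clones straight from Alice's working repository...
git clone -q alice bob
echo "the fix" > bob/fix.txt
git -C bob add fix.txt
git -C bob -c user.name=Bob -c user.email=bob@example.com \
    commit -q -m "urgent fix"

# ...and Alice pulls the fix straight back from Bob's repository,
# long before it lands in anything resembling "the master branch".
git -C alice pull -q ../bob main
```

Any transport works here: a local path, SSH to a colleague's machine, or a plain HTTP URL. Git doesn't care which repository is “central”; that is purely a social convention.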
It might seem like chaos, but it is not. It is just fluid synchronization, or to put it differently: an eventually consistent system.
Also, it is worth noting that every time changes are propagated, some code review and testing is involved. More on that below.
I am really at odds with how code reviews are typically organized.
I think the cost of any barrier to “getting the job done” in software development is greatly underestimated.
Obviously, you don’t want buggy code in production, and developers are only human: they make mistakes all the time. You need peer reviews for this and many other reasons. But if your developers (even subconsciously) feel that doing something will involve unpleasant emotions like frustration or criticism, their productivity will inevitably drop.
And code reviews are where such barriers are naturally placed, usually in the form of extra tools and processes.
First, developers already have git and their development environment. That’s all they need for changing the code, which includes reading the existing code. So why would they use anything else for reading code when doing a code review? Making them learn, and then deal with, yet another piece of software is a distraction at best.
And why would you force everyone to review code the same way? Do you really think everyone enjoys the same color scheme and web interface of the one tool that was picked for everybody? Do you force everyone to use the same text editor, with the same color scheme, too? In an organization that uses a truly distributed model, every developer should be able to use their preferred tools for everything, code review included.
The way code review is usually done (looking at the output of a diff command, pretty-printed in a web browser, and pointing out obvious mistakes) is very inefficient and fundamentally insufficient. For a developer to truly review a change, they should download it locally, do a git checkout on it, open it in the IDE or text editor of their choice with or alongside a diff viewer, jump around the new code, run the tests and even write some new ones, toy with it, execute it, and so on. Only then has the change been “reviewed”, as opposed to “glanced at”.
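The local-review workflow is just a fetch and a checkout. Below is a self-contained sketch; the “colleague’s” repository is fabricated first so the example runs on its own, and every name (`new-parser`, `parser.sh`, etc.) is illustrative:

```shell
set -e
# Fabricate a colleague's repository with a branch to review.
dir=$(mktemp -d); cd "$dir"
git init -q -b main colleague
git -C colleague -c user.name=C -c user.email=c@example.com \
    commit -q --allow-empty -m "initial"
git -C colleague checkout -q -b new-parser
echo 'parse() { :; }' > colleague/parser.sh
git -C colleague add parser.sh
git -C colleague -c user.name=C -c user.email=c@example.com \
    commit -q -m "add parser"
git -C colleague checkout -q main

# The reviewer's side: get the branch and check it out locally.
git clone -q colleague review
cd review
git checkout -q -b review/new-parser origin/new-parser

# See what changed, in your own terminal...
git diff --stat main...HEAD
# ...then open it in your editor, jump around, run the tests, experiment.
```

In an existing clone you would just `git fetch` the colleague’s remote and check the branch out; the point is that the full change, not a flattened diff, is now under your own tools.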
Review should be more about communication and collaboration than one-way critique. If problems are found, the reviewer can fix them immediately and send the fixes back. That is usually much faster anyway, compared to writing long comments through some web form explaining what changes are required, and then reviewing again after the original author has implemented them (often with multiple round-trips like that). Also, “here is a change with how I’d improve your code” is much nicer feedback than “your stuff is wrong, go fix it”, no matter how politely you go about it.
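Sending the fix back can be as simple as a patch file, which travels over email or any other channel. A self-contained sketch of that round-trip, with all names and the typo purely illustrative:

```shell
set -e
# The author's repository, containing a change with a typo.
dir=$(mktemp -d); cd "$dir"
git init -q -b main author
git -C author -c user.name=A -c user.email=a@example.com \
    commit -q --allow-empty -m "initial"
echo "retrn 1" > author/lib.sh
git -C author add lib.sh
git -C author -c user.name=A -c user.email=a@example.com \
    commit -q -m "add lib"

# The reviewer clones, fixes the problem directly, and turns the fix
# into a plain-text patch instead of writing a comment about it.
git clone -q author reviewer
echo "return 1" > reviewer/lib.sh
git -C reviewer add lib.sh
git -C reviewer -c user.name=R -c user.email=r@example.com \
    commit -q -m "fix typo in lib"
git -C reviewer format-patch -1 -o ../fixes

# The author applies it, with the reviewer's authorship and message intact.
git -C author -c user.name=A -c user.email=a@example.com \
    am -q ../fixes/0001-fix-typo-in-lib.patch
```

This is the Linux kernel’s own workflow in miniature: `git format-patch` and `git am` preserve authorship, so the reviewer’s fix lands as a proper commit rather than a comment to be re-implemented.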
You should treat your developers as a distributed testing network, and not rely on a central testing system to do everything and catch every issue. Sure, central, mandatory, and automatic testing is a great last line of defense, but before any change reaches that point, it should already have been tested many, many times, locally, by the developers propagating it.
This increases the chances of catching things that fail intermittently, or only in particular setups, before they hit everyone.
For this approach to be effective, it is vitally important that you make local testing as efficient, natural, and useful as possible. Developers should be intimate with testing and routinely run as many tests as possible locally. Your testing suite should be created for humans, not for a big, inhuman, black-box automation system. It should be natural and intuitive to run specific subsets of your tests, and being able to tell what failed, and where, is important.
Sure, not all of this is applicable in every situation. And you might just not be enthusiastic about the whole “distributed” idea at all. Maybe a more top-heavy, centralized system is actually what you need.
But then, maybe a CVS-like system (such as Subversion) could be a better choice for you.
Anyway, I think it’s worth knowing how things could be, even if you don’t plan to use any of this right now, or even disagree with it altogether.