Semver is good, and the crisis in open source tools

The opinions stated here are my own, not those of my company.


Last week I read an article by Hynek Schlawack about semantic versioning and disagreed with a great deal of it. I posted a reply on Twitter, but it’s a quiet evening and I’d like to elaborate, because I think this points to looming trouble in the software engineering space.

Semver is good and should be adopted as the industry standard. By this I mean that anyone who doesn’t adopt it is generally taking the wrong approach. There are exceptions, and we should be okay with exceptions, but each one should be scrutinized to ensure that it makes sense.

Semver comes out of the need to define API contracts between versions of a dependency, like a library. If I’m using v1.0.1 and there’s a v1.0.2, should I upgrade? Will that create any problems for my software?

No. Semver mandates that a patch release (x.y.N -> x.y.[N+1]) include only backwards-compatible bug fixes, not changes to the API. The contract remains the same, which gives you assurance that the upgrade is backwards compatible.

A minor release (x.Y.z -> x.[Y+1].0, with the patch number reset to zero) should only include backwards-compatible changes to the API, such as adding optional features. While the contract has changed, I still have assurance that I can upgrade without trouble.

Only a major release (X.y.z -> [X+1].0.0, with the minor and patch numbers reset to zero) is allowed to change this contract. So, when there’s a major release, I should take precautions to ensure my code continues to work with this dependency. Alternatively, I may refrain from upgrading for a time so that I can integrate the change when I am ready.
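To make these rules concrete, here is a minimal Python sketch that classifies an upgrade as major, minor, or patch. The names Version, classify_upgrade, and safe_to_auto_upgrade are hypothetical, and it assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build metadata.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain MAJOR.MINOR.PATCH version (no pre-release or build metadata)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def classify_upgrade(current: str, candidate: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for moving current -> candidate."""
    old, new = Version.parse(current), Version.parse(candidate)
    if new <= old:  # NamedTuples compare field by field, like version numbers
        return "none"
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    return "patch"


def safe_to_auto_upgrade(current: str, candidate: str) -> bool:
    """Only a major bump may change the contract, so minor and patch are safe."""
    if Version.parse(current).major == 0:
        # 0.y.z is initial development: "Anything MAY change at any time."
        return False
    return classify_upgrade(current, candidate) in ("minor", "patch")
```

With this sketch, classify_upgrade("1.0.1", "1.0.2") returns "patch", while safe_to_auto_upgrade("1.4.2", "2.0.0") returns False, since only the major release is allowed to break the contract.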

Bug fixes are not breaking changes

“With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”

Hyrum’s Law states that people will depend on the observed behavior of an API even when that contradicts what the API contract says. Yet someone who is using an API in an unintended way is writing bad code, and that code is highly fragile. If a library patch breaks your code, it’s possible that you’re at fault.

XKCD comic “Random Number”: a random number function that always returns the same number.

This XKCD is shared a lot, and it shows an example of a random number function you may see in a library. Let’s say this library function is used by another developer.

One day the function is rewritten so that it works as intended, generating a new random number each time it is called. If you were relying on this function to deterministically return 4 each time, and now it doesn’t, you made a mistake.

This may also happen in reverse, where a library function is rewritten so that it always returns 4. If so, you may need to refrain from upgrading until that bug is fixed. Either way, your build should never fail if the contracts remain the same, and your code should be resilient.
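Here is a hypothetical Python version of the comic’s function, to show the difference between depending on a contract and depending on observed behavior. The function name get_random_number and its documented range of 1 to 6 are assumptions for illustration, not taken from any real library.

```python
import random


def get_random_number() -> int:
    """Hypothetical library function.

    Documented contract: returns an integer from 1 to 6, inclusive.
    An earlier release happened to always return 4; this patch release
    actually rolls a die, without changing the contract.
    """
    return random.randint(1, 6)  # randint includes both endpoints


def fragile_check() -> bool:
    # Depends on observed behavior (always 4), not on the contract.
    return get_random_number() == 4


def resilient_check() -> bool:
    # Depends only on what the contract promises.
    n = get_random_number()
    return isinstance(n, int) and 1 <= n <= 6
```

Code written like resilient_check keeps working across the patch release, while code written like fragile_check breaks even though the contract never changed.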

Yes, this is idealistic. In practice there will always be plenty of gotchas. But we should use semver as a reliable signal of what to expect from each dependency update. Many projects have too many dependencies to follow every changelog, and trying to do so is not the best use of engineering time.

Semantic versioning means something

The article I originally cited discusses the cryptography Python library, which shipped a breaking change in a “minor update”, breaking builds and violating semantic versioning.

The maintainers have stated that the project does not follow semver. Should this be one of the exceptions? I would say no; they’re actively creating trouble. Python library developers, like many others, use semver as an industry standard. Maybe not everyone does, but enough do that you can assume everyone does. In fact, the package manager supports version ranges built on that assumption:

pip install "package>=1.1,<2"

This will install the newest v1.x.y release that is at least 1.1, relying on the semver assumption that nothing below 2.0 breaks the API. By not participating in this industry standard, a library developer is fomenting confusion. Being different is not the right attitude in this case.
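To illustrate what a range like >=1.1,<2 asks for, here is a simplified Python sketch that picks the newest matching version from a list of available releases. It is not how pip works internally (pip follows the PEP 440 version rules); the function names are hypothetical, and it only handles plain numeric versions.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.4' or '1.4.2' into a tuple of at least 3 ints, padding with 0."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def latest_matching(available: list[str], minimum: str, below: str) -> str | None:
    """Newest version v with minimum <= v < below, like the range '>=1.1,<2'."""
    low, high = parse(minimum), parse(below)
    matches = [v for v in available if low <= parse(v) < high]
    return max(matches, key=parse, default=None)
```

For example, latest_matching(["1.0.9", "1.1.0", "1.4.2", "2.0.0"], "1.1", "2") selects "1.4.2": new 1.x releases are picked up automatically, and 2.0.0 is excluded because under semver it may break the API.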

As the Semver specification itself puts it: “The problem is that ‘close’ isn’t good enough. Without compliance to some sort of formal specification, version numbers are essentially useless for dependency management. By giving a name and clear definition to the above ideas, it becomes easy to communicate your intentions to the users of your software. Once these intentions are clear, flexible (but not too flexible) dependency specifications can finally be made.”

It is no different from using an obscure code style that clashes dramatically with the standard approach for that language. Sure, you can, but by not adopting industry best practices you’re creating a higher engineering burden for everyone using your code.

ZeroVer libraries should not be used

Hynek discusses the advent of 0-based versioning. Part 4 of the Semver standard states:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

This, of course, gives developers an escape hatch. It’s a perfect solution: just never update your library to v1 or above and you’ll never have to worry about backwards compatibility or adhering to semver.

A v0.y.z library provides a very good signal under semver: don’t use it in production! They are explicitly signaling that this library is not ready to use, and it may break your application in later updates.

If your software is being used in production, it should probably already be 1.0.0. If you have a stable API on which users have come to depend, you should be 1.0.0.
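Tooling already treats 0.y.z versions as less stable. For example, the npm semver documentation describes the caret range ^1.2.3 as >=1.2.3 <2.0.0, but ^0.2.3 as only >=0.2.3 <0.3.0, because only the left-most non-zero component is held fixed. The Python sketch below mirrors that rule for plain x.y.z versions; the function names are hypothetical and pre-release tags are ignored.

```python
def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_upper_bound(version: str) -> tuple[int, int, int]:
    """Exclusive upper bound of ^version: hold the left-most non-zero part fixed."""
    major, minor, patch = parse(version)
    if major > 0:
        return (major + 1, 0, 0)  # ^1.2.3 -> <2.0.0
    if minor > 0:
        return (0, minor + 1, 0)  # ^0.2.3 -> <0.3.0
    return (0, 0, patch + 1)      # ^0.0.3 -> <0.0.4


def satisfies_caret(candidate: str, version: str) -> bool:
    """True if candidate falls inside the caret range ^version."""
    return parse(version) <= parse(candidate) < caret_upper_bound(version)
```

So a consumer of a 1.x library automatically receives every 1.x release, while a consumer of a 0.2.x library only receives 0.2.x patches: the tooling assumes, as the spec says, that any 0.y bump may break things.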

The problem is that these libraries are in fact being used too often even if they shouldn’t be. In fact I will use a v0 library in applications if that’s what’s available to me. I recognize that this is a problem, and yet I have to do it to get my job done.

Yet pinning your dependency to a specific version is also a problem. As Hynek mentions, you’ll miss bug fixes and potentially critical patches in security-sensitive libraries. I’m not going to follow along with the dozen or so libraries I’m quickly importing to get a job done. At some point I have to assume they’re doing the right thing, and they need to do the right thing.

Dependency hell

The massive size of node_modules directories is notorious, a consequence of the NPM ecosystem’s heavy use of libraries. Using libraries is good, actually, and rewriting the same code is not a good use of time.

Of course, this makes the dependency management problem even harder. If each of the dozen dependencies I need requires its own dozen dependencies, and so on, it is highly impractical to manage individual updates myself. I really don’t want to spend time verifying each child and grandchild dependency. I also don’t want to go through a dozen library updates where each one just changes its dependencies.

In modern projects, semver is what allows dependencies to declare compatible version ranges throughout the dependency tree, so that each dependency can use the most recent compatible versions all the way down. Otherwise you’ll be stuck with security problems in some obscure dependency that you can’t fix.
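As a rough illustration, suppose two of your dependencies share a transitive dependency, and each declares a minimum version within the same major line. A resolver can then pick the newest version that satisfies both, so a patch release reaches you without either intermediate package publishing an update. This Python sketch is hypothetical: it assumes all versions are 1.0.0 or later and is far simpler than any real resolver.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def in_major_line(candidate: str, minimum: str) -> bool:
    """True if candidate >= minimum and shares its major version (semver-compatible)."""
    c, m = parse(candidate), parse(minimum)
    return c[0] == m[0] and c >= m


def resolve_shared(available: list[str], minimums: list[str]) -> str | None:
    """Newest available version compatible with every dependent's requirement."""
    ok = [v for v in available if all(in_major_line(v, m) for m in minimums)]
    return max(ok, key=parse, default=None)
```

Given releases 1.2.0, 1.4.1, 1.4.3, and 2.0.0, with one dependent requiring at least 1.2.0 and another at least 1.4.1, the sketch selects 1.4.3. If one dependent required 2.x instead, no single version fits, and a real package manager would have to install two copies or report a conflict.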

Now, I recognize that Hynek is writing in the context of Python and pip, while I’m looking through the lens of NPM. I admittedly have less experience with Python libraries, so maybe that ecosystem is less equipped to deal with these kinds of problems. If so, that should be an indication that the package system needs an overhaul.

Open source burnout

In a perfect system once your library bumps a major version, v1 -> v2, you will continue to maintain v1 for some time with patches and bug fixes.

In reality this doesn’t happen. In fact, open source projects are often the hobby of a few engineers in their spare time. They do it because they love open source so much they’ll volunteer their time to create it.

I love open source, and I do contribute to open source projects. However, from a professional standpoint this is a serious problem that has been plaguing open source and is leading to scenarios where nobody is happy.

Open source volunteers get angry reports about their library breaking things, which leads to burnout. Why should anyone get yelled at over their hobby? So they end up leaving the project, abandoning work, dropping support for things, or making other non-ideal choices.

As someone using this library, I’m now put in a bad position. If it is no longer maintained, do I need to fork the project? Do I need to copy the code manually into my project and remove the dependency?

Nobody wins here, yet we still go through these cycles of pain for no real reason other than inertia.

Fundamentally, open source libraries are in the midst of a crisis: they’re popular even though their maintainers have no monetary incentive to continue. This means that the projects tend to focus on new things rather than maintenance, and projects get abandoned when the developer doesn’t want to do it anymore. And they’re entirely right to do so, as that’s what we’re incentivizing them to do.

Sustainable open source

The right answer is to add a monetization element. Professionalizing these projects means that maintainers can devote their full time not only to new features but also to Long-Term Support (LTS) versions that preserve backwards compatibility. They can spend their free time doing truly enjoyable things while spending their work time focused on what we want.

The problem is that these maintainers don’t think about monetization. If anything, they’ll just stick a donation button in the README and hope that they can buy a cup of coffee. But this is not monetization. Asking people for voluntary donations is as irresponsible as large products using libraries built by volunteers.

License structures that require a regular fee to use the library would do much more to create a sustainable open source ecosystem. You could require a monthly fee for any commercial product while keeping non-commercial use free. This wouldn’t stop every cheater, but larger companies would have a reason to actually pay you.

Keep in mind that software engineers are paid a lot. Replacing a full-time engineer with a $50/month charge is simple math for them. Having to deal with taxes and accounting is a larger burden for you. But it’s definitely worthwhile to explore, and necessary to prevent burnout and instability.

People in open source complain that their work is being used by large companies for free to generate money for them, but that’s exactly what you wanted! Your licenses state they can use your code to make them money, so you are not owed anything. No, a large company is not going to send $10 to your PayPal. Felker Software Corp does not have a Venmo account. Send them an invoice or a purchase order.

Open source has a lot of smart people, but they should be spending their time trying to come up with a better economy rather than burning out with bug fixes and user issues.

Stick to Semver

Software engineering is more complex than ever. At the same time, we’ve made improvements in the fundamentals. Random C-lang build tools have been replaced with LLVM. Standards bodies exist to formalize HTML and programming languages so that we can guarantee they work across platforms.

Semver is one of these fundamental improvements. Importing libraries allows us to take advantage of previous engineering work, and of course everything we do sits at the top of a giant tower of software and hardware layers.

A single engineer cannot, and should not be expected to, know the entirety of their dependencies. Consider especially that what I’m importing is not my core competency, so I cannot audit those changes for reliability or accuracy.

Malicious actors know this, and supply chain attacks on popular packages are a real threat. Vigilance is necessary, but it requires the ecosystem working as a community to maintain a healthy system.

One such supply chain attack occurred when ownership of a library was transferred from its original maintainer to a bad actor. That the original maintainer had no incentive to continue maintenance is a strong signal that there’s a problem here.

However, the answer to all of this is greater professionalism. Adhering to Semver is the right thing to do for your package, and signals that you’re taking your development seriously. Projects stuck in ZeroVer should finalize their APIs, release a new major version, and maintain that major version.

If you’re not willing to do these things, why are you releasing a package? This should be a serious question that you think about. If you don’t plan on maintaining your work, you should not release it at all. If you don’t plan on making your APIs stable, don’t allow people to use them. At the very least, signal explicitly your intentions so that others know not to use your work.

Yes, this is an idealistic approach, and in reality we all have to make pragmatic choices. But we, as an industry, need to strive to do better. Given the wide reach of many software packages, the harm caused can be wider than we anticipate. We are still responsible for what we put out into the world.

Social Media Expert -- Rowan University 2017 -- IoT & Assistant @ Google