+++
title = "colors and faker: a case study on the npm ecosystem"
tags = ["open-source", "opinion"]
date = "2022-01-10"
author = "Braydon Kains"
+++
# Foreword
For years I've listened to software engineers more experienced than myself poke fun at the [left-pad incident](https://www.theregister.com/2016/03/23/npm_left_pad_chaos/). It usually comes up as a joking throwaway comment about keeping package-lock files in sync, or alongside the [related xkcd comic](https://xkcd.com/2347/) (which seems to get more relevant the older it gets). It was technically just before my time as a professional developer (my less-than-stellar jQuery experimentation was safe from this at the time), so I would take it as a cautionary tale that taught us an important lesson about the software supply chain.
It also informed a lot of the learning I have done over the years about what it means for software to be open source, the nuances of open source software licensing, and the [difference between freedom and beer](https://www.wired.com/2006/09/free-as-in-beer/). I've always been passionate about software that is at the very least source-available; the collaboration between so many talented and passionate people has always felt like something of a panacea to me (depending how rosy my glasses are that day).
This is all to say that reading about what happened with the npm packages `colors` and `faker` left me with a lot to say. I would have gone the lazy route and tweeted my thoughts to the void as usual, but I haven't posted to this blog in *checks notes* 10 months! Some content creator I am. The shareholders will have my head!
So without further ado, I'd like to take as nuanced a look as I can at all the moving pieces of this fascinating case study.
# What happened with `colors` and `faker`?
The headline is not clickbait enough to attract anyone who does not already know about this situation (other than my proofreading partner, hi dear!). However, for the purposes of this post, I'm going to pretend you have no idea what's going on and summarize quickly so we can build some context.
[`colors`](https://www.npmjs.com/package/colors) is an npm package that enables the user to colour their console text in their command line applications. Command line applications may not be the first thing that comes to mind when you think of Node.js, but a vast majority of JavaScript dev tools have a command line interface and leverage this package to improve the appearance of their output.
`faker` (no link for this one; will explain shortly) is an npm package that randomly generates believable data; the data follows common patterns like names, street addresses, movie quotes, etc. I am not exactly sure which came first, but this library was heavily inspired by counterparts in other languages such as Perl, Ruby, PHP, and Python.
These packages are authored and maintained by the same developer: Marak Squires ([see his GitHub](https://github.com/Marak)). These packages were used by thousands of Node.js applications, all published to and subsequently downloaded from the node package manager's central repository. This large repository of packages is owned by npm, Inc. and GitHub, and is the source from which virtually every node application pulls at least some open source dependencies. `colors` and `faker` were both open source and published with the [`MIT` License](https://opensource.org/licenses/MIT).
Last year, the author of these two packages decided that they were no longer interested in developing and maintaining the packages. They opened [this issue](http://web.archive.org/web/20210704022108/https://github.com/Marak/faker.js/issues/1046), declaring that they would no longer be working on the package. Last week, they took this a step further: they intentionally [introduced an infinite loop with spooky text in `colors`](https://github.com/Marak/colors.js/commit/074a0f8ed0c31c35d13d28632bd8a049ff136fb6) and, as we zoomers might say, [yeeted `faker` from existence](https://www.npmjs.com/package/faker) (not really, but I will expand on that later on). This affected thousands of Node.js applications, which means it affected a ton of developers and companies of all sizes. And I mean "all sizes"; one of the affected packages I am personally familiar with is Amazon's [`aws-cdk`](https://github.com/aws/aws-cdk/commit/b851bc340ce0aeb0f6b99c6f54bceda892bfad0e), and this is just one of many widely used packages that were essentially bricked until the issue was resolved.
Now that we have a general idea of what happened, I'd like to add my interpretation of what it means to work with npm.
# What does it mean to download an npm package?
One of the earliest lessons I learned when I first started using [Linux](https://www.reddit.com/r/copypasta/comments/czef0u/id_just_like_to_interject_for_a_moment/) is to not download and execute random scripts without reading them first and understanding the risk. They require that you make a conscious decision to trust the source of the script. This makes sense when you think about it; a bash script (especially with sudo permissions) has the power to do an incredible amount of damage to your system (or maybe just [fork bomb](https://en.wikipedia.org/wiki/Fork_bomb) you as an epic prank, anything goes). Usually, where possible, you were encouraged to install your software through your distribution's central package repository. This large repository of packages, all built specifically for the distribution, is owned and maintained by a dedicated group of volunteers or employees who vet and approve each one. There are ways for independent users or organizations to host their own repositories of packages, and integrate with the distribution's respective package managers. These require that you trust their source, similarly to downloading and executing bash scripts.
The reason for this tangent is to relate it back to `npm install`ing a package. Installing a package through npm is a combination of these two flavours of installing software; it is a package manager similar to the ones commonly included in Linux distributions, however each package in its central repository is not vetted and managed by a group of volunteers or employees. When you download and execute a package from npm's central repository, you are trusting the author of that package.
Now it's obviously a bit extreme to directly equate installing an npm package to `sudo` executing a bash script. It's a lot more nuanced than this since the most popular packages in the repository also have the most security experts' eyes on them at all times. They may not be constantly approved by a central group of people, but in a perfect world issues are swiftly reported and dealt with by package maintainers and consumers of the package. npm also has a number of mechanisms to keep dependencies at a certain version until you trust that an upgrade is up to your standards.
# What does it mean to publish an npm package?
Every JavaScript package on npm is open source by nature. JavaScript is an interpreted language, and no amount of obfuscation will truly hide the JavaScript being shipped when a package is published to the central repository. This code can be licensed under any open source license that suits the project's needs. This license legally defines the way that the copyright holder approves the code to be used. Once the license (or lack of one) is defined, and the [minimum setup requirements](https://docs.npmjs.com/cli/v8/commands/npm-publish) are present, you are free to publish whatever you would like. You could publish the next big JavaScript framework, a useful new CLI tool, [nothing](https://docs.npmjs.com/cli/v8/commands/npm-publish), whatever you'd like provided it is legal.
# How we've forgotten this
Node.js and npm are tools that have come about as close to ubiquity as a very short list of technologies ever have. The number of new developers whose first step of their journey was/will be to run `npm install` is staggering. The largest companies in the world continue to rely on npm in varying capacities, and have contributed a large number of popular packages to its ecosystem. When such an apparent consensus of people are doing something, it's easy to interpret some guarantee of safety. You wouldn't jump off a bridge just because someone else did, but if 10 000 people jump off a particular bridge every day it's gotta be safe, right?
# One-way Trust
Let's have a quick look at that [`MIT` license](https://opensource.org/licenses/MIT) again. The first line of this license states: "Permission is hereby granted, free of charge, to any person obtaining a copy of this software[...] to deal in the Software without restriction[.]". Further down, the license states: "[sic, all caps] THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED[.]".
I am not a lawyer, but my interpretation of this in plainer terms is that anyone is allowed to use this software as they see fit, but that the copyright holder provides no guarantee of anything that may come with that freedom. If it doesn't work for some reason, doesn't do what you want, or has some kind of major flaw, the copyright holder does not bear the responsibility. When some of the biggest packages on npm are maintained by such large, efficient, and dedicated teams, it is easy to forget that warranty of any kind is almost never included by default; that's one of the exchanges that is made when the software does not cost anything.
# How this relates to `colors` and `faker`
All of this is to contextualize my general takeaway from this situation: the author of `colors` and `faker` was within their right to self-sabotage their packages. A quick disclaimer: I personally dislike what they decided to do, and I'll shortly expand on that, but I want to explain why I think what they did is within their right.
I think this goes right to the root of what open source is, distilled to its core: some person writes code, they put it somewhere for the public to see, the public can do with it whatever the license permits. To be reductionist for the sake of brevity (let's laugh and pretend I have that), I think every single piece of open source code reduces down to this. Even if code is published by a Fortune 500 company who have great motivation to maintain that software until the end of time, even if software is so popular that volunteers will diligently shepherd it along the moral high ground to the heat death of the universe, open source software is still this at its core. When you use this published code, you have made the implicit decision to trust the person who published it. The only binding promise that the copyright holder has made in return is that the code exists and you can use it.
Marak wrote these packages and published them under the MIT license. They can update this code however they want. If they want to intentionally corrupt the package making it print `LIBERTY` a bunch of times to scare the bejeezus out of me, it appears to me that they are within their right to do so. I saw a number of people on Twitter decrying a "breach of trust", and how it ruins the image of the npm ecosystem for a package author to do this. In my opinion, something is only a "breach" of trust when that trust is established two way and both parties are responsible for it. **Package authors are not responsible for the one-way trust you placed in them, nor are they responsible for the effect their actions have on the reputation of the npm ecosystem.**
The way Marak went about this was pretty "chaotic evil" for my tastes, and I don't personally appreciate that they broke the trust so many people placed in them. However, I think a large number of us have forgotten that the people we download all these gigantic dependencies from are not in any way obligated to maintain a relationship of trust, and on paper could go nuclear at any time. The modern software supply chain has led to us unknowingly permitting this at an unprecedented scale for decades.
# You are being ridiculous
Yeah, sort of. I am being pretty "doom and gloom" on purpose to frame this crucial portion of the software supply chain in a particular way. Let's come back down to reality for a bit to talk about what all of this means for well-meaning software developers just trying to get their job done.
# How to improve npm safety (while still using npm)
I should preface this section by clarifying that I'm relatively new to this problem at scale, and there are experts far smarter than myself working to solve and educate on these problems. I'd still like to close this article off with tips that I have for developers who are worried about how to protect themselves further in the future.
I imagine there are a number of you reading this who are upset that I am appearing to suggest every line of open source code you pull down be audited. We all know that's really not feasible at any scale larger than "demo". This is why there are so many tools, such as [sonarqube](https://www.sonarqube.org/), [snyk](https://snyk.io/), and every project's most diligent contributor [dependabot](https://github.com/features/security) (I am not affiliated with any, just a few I'm familiar with) built with features to track and audit dependencies you've brought in that may contain vulnerabilities. However, these tools don't necessarily help if you've accidentally pulled in a bad dependency during development.
When a package is published on npm, save for select circumstances, the version published is there in perpetuity unless npm decides to take it down. Even though `faker@6.6.6` which essentially deletes all of its code is published on npm, it does not remove the history of `faker` releases. Code can only be [unpublished from the registry](https://docs.npmjs.com/unpublishing-packages-from-the-registry) if the package has no dependents, which `faker` had a number of. In this case, and the case of `colors@1.4.44-liberty-2`, npm provides the tools to protect against these releases if you are a direct dependent.
If you are a newer developer, I recommend understanding [semantic versioning](https://semver.org/) as fully as you can; it is one of the greatest defenses to much of what I've mentioned in this article. The most common practice when using semantic versioning is to use the `^` caret prefix on most of your dependencies, because this is what npm does by default when installing a new dependency. It means that updates to the major version will not be installed, but the latest release within that major version will be used. Similarly, the `~` tilde prefix will not allow any updates to the minor version, so only new patch releases are picked up. Providing no prefix will pin a dependency at a particular version. If you aren't already, it is highly recommended to use more discretion when deciding which prefix to use on new and existing dependencies you choose to bring in.
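To make the prefixes concrete, here is a minimal sketch of how caret, tilde, and exact ranges decide which versions are acceptable. It is a simplification and not npm's actual implementation: npm relies on the `semver` package, while this sketch assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release tags or build metadata. The `parse`, `compare`, and `satisfies` helpers are written here purely for illustration.

```javascript
// Simplified range matching for plain MAJOR.MINOR.PATCH versions only.
// Assumption: no pre-release tags (e.g. "-liberty-2") or build metadata.
function parse(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`unsupported version: ${version}`);
  return match.slice(1).map(Number);
}

// Returns -1, 0, or 1 depending on how version a orders against version b.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

function satisfies(version, range) {
  const prefix = range[0] === '^' || range[0] === '~' ? range[0] : '';
  const base = parse(range.slice(prefix.length));
  const candidate = parse(version);

  // No prefix: only the exact version is accepted.
  if (prefix === '') return compare(candidate, base) === 0;

  // Both prefixes reject anything older than the base version.
  if (compare(candidate, base) < 0) return false;

  const [major, minor, patch] = base;
  let upper; // exclusive upper bound
  if (prefix === '~') upper = [major, minor + 1, 0];  // patch updates only
  else if (major > 0) upper = [major + 1, 0, 0];      // minor and patch updates
  else if (minor > 0) upper = [0, minor + 1, 0];      // caret is stricter below 1.0.0
  else upper = [0, 0, patch + 1];
  return compare(candidate, upper) < 0;
}

console.log(satisfies('1.5.0', '^1.4.0')); // true
console.log(satisfies('2.0.0', '^1.4.0')); // false
console.log(satisfies('1.4.1', '~1.4.0')); // true
console.log(satisfies('1.5.0', '~1.4.0')); // false
console.log(satisfies('1.4.1', '1.4.0'));  // false
```

Note that both `^1.4.0` and `~1.4.0` accept `1.4.1`, which is why a range prefix alone cannot protect against a bad patch release.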
An important caveat here is that even people who were more reserved by only allowing patch releases of `colors`, which should suggest only bringing in bug/vulnerability fixes, still got screwed here by unexpectedly allowing a *very* breaking change. However, this defense is still good against typical, benign cases.
The issue with increased discretion is it usually means you have to do more manual work when it's time to update. Working in the Node.js ecosystem is implicitly accepting that everything moves incredibly fast, and it's a danger to your application's continued health to let things fall too far out of date. While far from a perfect solution, one of my favourite ways to combat this is [`npm-check-updates`](https://www.npmjs.com/package/npm-check-updates). It provides an optional interactive environment to select updates to packages that you feel confident are safe. It is a nice convenience in a process that haunts Node.js developers everywhere.
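Before reaching for an update tool, it can help to see which direct dependencies are allowed to float at all. The following is a rough sketch under a few assumptions: it takes an already-parsed `package.json` object, only looks at `dependencies` and `devDependencies`, and treats anything that is not a caret, tilde, or exact `MAJOR.MINOR.PATCH` range as "other". The `classifyRanges` name and the sample manifest are illustrative, not part of any npm tooling.

```javascript
// Sketch: group direct dependencies by how much their version range may float.
function classifyRanges(manifest) {
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  const report = { caret: [], tilde: [], exact: [], other: [] };
  for (const [name, range] of Object.entries(all)) {
    if (range.startsWith('^')) report.caret.push(name);
    else if (range.startsWith('~')) report.tilde.push(name);
    else if (/^\d+\.\d+\.\d+$/.test(range)) report.exact.push(name);
    else report.other.push(name); // e.g. "*", ">=1.0.0", git URLs, dist-tags
  }
  return report;
}

// Illustrative manifest; the entries are examples, not recommendations.
const sampleManifest = {
  dependencies: { colors: '1.4.0', faker: '^5.5.3' },
  devDependencies: {
    'npm-check-updates': '~12.0.0',
    'some-fork': 'github:example/some-fork', // hypothetical git dependency
  },
};

console.log(classifyRanges(sampleManifest));
```

Anything listed under `caret`, `tilde`, or `other` can change underneath you on a fresh install without a lockfile, so those are the entries worth a second look.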
The sad truth is that there probably is no true way to stop this from affecting you. Semantic versioning is the biggest help when you are a direct dependent of the code you are trying to control. Unfortunately, npm package dependency graphs can go many layers deeper than we bargain for. If you pulled in even one odd dependency that doesn't pin some sub-dependency nicely, and the sub-dependency becomes problematic, you could have an issue that you often can't directly do anything about. This can lead to a frustrating amount of work, and what feels like a lack of control over your codebase if it happens often. For this one, I don't have a great solution. I wish I did, because it's a problem I have had for most of my time in the industry. I wouldn't want to say that a great solution doesn't exist somewhere, but it's probably going to be a burden we have to bear in the Node.js ecosystem to remain safe and secure. The biggest piece of advice is to make sure you are controlling your direct dependencies as tightly as possible the more strict your security requirements are; even when a dependency pulls in a bad sub-dependency, you can protect against the ripple effect if you keep tight control over when you bring the direct dependency in.
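This does not solve the sub-dependency problem, but knowing where a package sits in your tree is a reasonable first step. Below is a small sketch that lists every installed copy of a package recorded in a lockfile. It assumes a `package-lock.json` with `lockfileVersion` 2 or 3, which keeps a top-level `packages` map keyed by each copy's `node_modules` path. The `findInstalledCopies` function and the sample lockfile excerpt are made up for illustration.

```javascript
// Sketch: find every copy of a package in a parsed package-lock.json.
// Assumption: lockfileVersion 2 or 3 (has a top-level "packages" map).
function findInstalledCopies(lockfile, packageName) {
  const suffix = `node_modules/${packageName}`;
  return Object.entries(lockfile.packages ?? {})
    // Match the top-level copy as well as copies nested under other packages.
    .filter(([path]) => path === suffix || path.endsWith(`/${suffix}`))
    .map(([path, entry]) => ({ path, version: entry.version }));
}

// Hypothetical lockfile excerpt, for illustration only.
const sampleLock = {
  lockfileVersion: 2,
  packages: {
    '': { name: 'my-app', version: '1.0.0' },
    'node_modules/some-cli': { version: '3.2.0' },
    'node_modules/some-cli/node_modules/colors': { version: '1.4.0' },
    'node_modules/colors': { version: '1.3.3' },
  },
};

for (const copy of findInstalledCopies(sampleLock, 'colors')) {
  console.log(`${copy.version}  ${copy.path}`);
}
```

A nested path such as `node_modules/some-cli/node_modules/colors` indicates a copy brought in by another package rather than by your own `package.json`. The built-in `npm ls colors` command gives a similar view of an installed tree.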
# Conclusion
I think what happened with `colors` and `faker` is a fascinating case study into how many of us have become complacent with npm's hidden safety concerns. I love open source software, and I believe we can all do our part to ensure we use it safely. I hope this article provided a new perspective to the situation, and whether you agree or disagree feel free to reach out and discuss! I am interested to hear about your experiences.