Useful open source tools: the little things add up

A review of handy open source developer tools for everyone, created at ApostropheCMS.

We've been building open source for a long time. Mostly ApostropheCMS itself and its many supporting modules. But also other useful tools. Developer tools that don't require ApostropheCMS in order to be useful. Little things we built to get through the day and released as open source, without much drama. Recently I reviewed our GitHub organization and realized it's time we talked about the supporting cast. Perhaps you'll give these tools a try:

sanitize-html: when your markup needs a bath

Outside of ApostropheCMS, sanitize-html is by far our best known module. sanitize-html provides a simple API to... well, sanitize HTML. Markup that comes from rich text editors, markup that comes from external pages you're importing, anything messy you'd like to clean up so you can offer a guarantee that it won't break the page. For more complicated cases, it's often a good idea to pair sanitize-html with cheerio.

absolution: give your URLs a home

While we're on the subject of HTML: sometimes your HTML is just fine, but you need to replace relative URLs with absolute URLs. absolution is a simple tool for that one job. It just needs two things from you: the URL of your website, and the markup to be... absolved.
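
To make the idea concrete, here is a minimal sketch of the technique, not absolution's own code or API. The function name absolutize is hypothetical, and the sketch assumes double-quoted href and src attributes; a real tool would use an HTML parser rather than a regular expression.

```typescript
// Hypothetical sketch: rewrite relative href/src values against a base URL.
// Assumes double-quoted attributes; this is not absolution's implementation.
function absolutize(html: string, base: string): string {
  return html.replace(
    /(\s)(href|src)="([^"]*)"/g,
    (match: string, space: string, attr: string, value: string) => {
      try {
        // The URL constructor resolves a relative reference against the base.
        return `${space}${attr}="${new URL(value, base).href}"`;
      } catch {
        // Leave anything that cannot be parsed exactly as it was.
        return match;
      }
    }
  );
}
```

URLs that are already absolute pass through unchanged, because resolving an absolute URL against a base simply returns that URL.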

split-html: slice that HTML down the middle

One more HTML module: split-html. Ever since we introduced ApostropheCMS, we've known that users sometimes want to split a rich text widget in two and insert a different widget in the middle. The task seems simple, but gets complicated when you consider how HTML tags can be nested. Powered by cheerio, split-html divides your HTML fragment into two pieces at the dividing line of your choice, closing and re-opening tags as necessary to make sure the result makes sense.

oembetter: the turducken of web development

Everybody embeds stuff. Want video on your website? You could self-host, but you're often better off embedding YouTube or Vimeo. SoundCloud and Spotify are often candidates for embedding too. What they all have in common is a very old, very useful standard called oembed in which the provider tells you exactly how to embed their stuff. But to use oembed safely and effectively in 2024, you need to decide exactly which sites to trust, and you often need to substitute your own tweaked markup for certain sites. oembetter brings you those features as standard equipment. We rely on it for video embedding in ApostropheCMS.
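
The "decide which sites to trust" part can be sketched in a few lines. This is an illustration of the general allowlist idea, not oembetter's API; the function name isTrustedHost and the list of domains are assumptions made up for the example.

```typescript
// Hypothetical sketch of a domain allowlist check; not oembetter's API.
function isTrustedHost(url: string, allowlist: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    // Anything that is not a parseable URL is never trusted.
    return false;
  }
  // Accept the domain itself or any subdomain of it, and nothing else.
  return allowlist.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
}
```

Matching on the parsed hostname rather than on the raw string is the important design choice: a URL like https://youtube.com.evil.example/ contains "youtube.com" but is not a YouTube URL.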

scale: let the browser resize big images

Lots of apps accept uploaded images. In the bad old days, we took for granted that servers would have to do the grunt work of resizing them to make sense for a particular application. And in 2024... we still need to validate those files on the server side. You can't trust a browser! But you can ask the browser to do the resizing up front, which saves time and frustration for users who no longer hit upload size limits. scale solves this problem neatly, accepting a File object and giving you back... a new File object. Which you can upload exactly the same way. We really, really like simple APIs. Speaking of which...

emulate-mongo-*-driver: the adapter pattern saves your bacon

Got legacy code? In no hurry to review every MongoDB driver call for compatibility with the latest version? We've got you covered. emulate-mongo-2-driver and emulate-mongo-3-driver let you keep that code, to a large extent, and still have the safety and compatibility of the latest mongodb driver underneath.

random-words: the name says it all

random-words is remarkably popular, considering that I whipped up the original version just to try my hand at test-driven development (TDD) while attending a conference talk on the subject. But the open source community has steadily contributed features and fixes over the years (thank you!), and now this module is a common choice for the job.

max-mem: how much is too much?

You know your command runs out of memory on that RAM-starved production server. I'm looking at you, npm run build *cough*

But: how much memory does it actually require?

For a quick and dirty estimate, you can run max-mem my-command-here on your beefy development laptop to find out. The max-mem utility checks the memory usage of your program frequently while it runs and reports the maximum figure when it finishes. That's it... that's plenty.

github-change-ownership-in-bulk: oops, we're a company now

We spun ApostropheCMS out as its own company... and then realized we had a lot of repos that needed to change ownership. Like... a lot.

github-change-ownership-in-bulk automates that transition. Is it our newest, most modern utility? No. Does it do the job? Yup.

github-stars-by-month: bragging rights with round numbers

We're all familiar with the GitHub star history site, but for those who want the same data with nice round months on the X axis, github-stars-by-month is a handy way to get it.
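
The bucketing step itself is simple. Here is a rough sketch, assuming you already have one ISO 8601 UTC timestamp per star; the function name countByMonth is hypothetical and this is not the tool's own code.

```typescript
// Hypothetical sketch: count timestamps per calendar month.
// Assumes ISO 8601 UTC strings such as "2024-01-03T10:00:00Z".
function countByMonth(starredAt: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const iso of starredAt) {
    const month = iso.slice(0, 7); // the "YYYY-MM" prefix
    counts[month] = (counts[month] ?? 0) + 1;
  }
  return counts;
}
```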

count-outside-pull-requests: is the call coming from inside the house?

Your open source project has outside contributors... but how many? If you need to prove community engagement, count-outside-pull-requests can give you those figures for all of your repos. Or all of your organizations' repos, for that matter.

alpha-beta-scanner: find those loose ends

You ship stuff as alpha, or beta, because you're cautious like that.

[Years pass, no one complains, you use it in production, etc.]

Oops! Still officially "alpha." No wonder no one outside your organization has tried it. Bonus points for a GitHub repo description that still says "HERE THERE BE DRAGONS." 🐉 🙈

alpha-beta-scanner is a handy way to identify all the modules you've published that are "still alpha" or "still beta." Simple and useful.

changelog-scanner

changelog-scanner is a simple tool to list all commit messages since a given date in the default branch of all repos in a GitHub organization or organizations. Great for catching omissions in release announcements and changelogs. Not that this ever happens to us... oh gosh no.

express-cache-on-demand: YOU get a car, and YOU get (the same) car...

It's a very simple idea: if your Node.js application is busy generating a response, and the exact same request arrives from another client, why not serve them that same response when you finish generating the first one?

There are more sophisticated ways to cache, with an expiration date and storage in Redis, et cetera. These things are good. But "cache on demand" is often attractive because the result is never stale. In fact, everyone involved gets a fresher response than they would have without it because the system is not struggling to generate six responses at once.

express-cache-on-demand implements this idea, while allowing you to adjust how the "cache key" is generated and reject the possibility of caching entirely for certain requests. Our default cache key generator is already pretty smart: it won't try to cache if req.session is interesting, or req.user exists, or the method is not GET or HEAD. So this middleware is often a drop-in solution. ApostropheCMS uses it to greatly optimize page responses under heavy load.
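
The core trick can be sketched without Express at all. What follows is a simplified illustration of sharing one in-flight result among concurrent callers, not the middleware's actual code; the names coalesce and inFlight are hypothetical, and the cache key is assumed to be a plain string.

```typescript
// Hypothetical sketch: callers that arrive while a response is being
// generated share that response; nothing is kept once it is delivered.
const inFlight = new Map<string, Promise<string>>();

function coalesce(key: string, generate: () => Promise<string>): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) {
    // Someone is already generating this response: wait for theirs.
    return pending;
  }
  const promise = generate().then(
    (body) => {
      inFlight.delete(key); // finished, so the next request starts fresh
      return body;
    },
    (err) => {
      inFlight.delete(key); // failures are shared too, but never cached
      throw err;
    }
  );
  inFlight.set(key, promise);
  return promise;
}
```

Because the entry is removed the moment the work completes, there is no expiration policy to tune: a request that arrives afterward always triggers a fresh response.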

time-limited-regular-expressions: safety and power

Allowing users to enter regular expressions is tempting, but regular expressions can require a great deal of time to run. This can happen innocently. It can also happen maliciously. And in Node.js, a long-running regular expression blocks an entire process that may be serving many other requests.

time-limited-regular-expressions is a simple wrapper that allows you to execute a regular expression more safely, in a separate process, with a fixed time limit. And the only change for you, the developer, is using await.

csv-to-zone-file: DNS entries in a hurry

You need to create lots of DNS records quickly. You have them in a spreadsheet, but you need a DNS zone file, suitable for import to Amazon Route 53 among other places. csv-to-zone-file can knock that out for you.
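
For a sense of what the conversion involves, here is a small sketch. The column layout (name, type, ttl, value, with a header row) is an assumption made for this example and may not match what csv-to-zone-file expects; the function name csvToZoneLines is hypothetical, and quoted CSV fields are not handled.

```typescript
// Hypothetical sketch: turn simple CSV rows into zone file record lines.
// Assumes a header row and the columns name,type,ttl,value with no quoting.
function csvToZoneLines(csv: string): string[] {
  return csv
    .trim()
    .split(/\r?\n/)
    .slice(1) // skip the header row
    .filter((row) => row.trim() !== "")
    .map((row) => {
      const [name, type, ttl, value] = row.split(",").map((cell) => cell.trim());
      // Standard record layout: owner name, TTL, class (IN), type, data.
      return `${name}\t${ttl}\tIN\t${type}\t${value}`;
    });
}
```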

launder: never trust a browser!

That input you're accepting from the browser could be anything... anything at all. Especially when using a body parser to populate req.body, or populating req.query in fancy ways.

We wrote the launder module to handle the most basic sanitization tasks, like making sure a parameter expected to be a string comes through as a string. Under the hood, ApostropheCMS uses it to help implement our rich field types.
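
Here is a minimal sketch of that kind of coercion, to show why it matters. It illustrates the idea only and is not launder's API; the function name toCleanString is hypothetical.

```typescript
// Hypothetical sketch: guarantee a string no matter what the browser sent.
function toCleanString(value: unknown, fallback = ""): string {
  if (typeof value === "string") {
    return value.trim();
  }
  if (typeof value === "number" || typeof value === "boolean") {
    return String(value);
  }
  // Objects, arrays, null and undefined are never what a text field meant.
  return fallback;
}
```

A body parser will happily turn a crafted request into an object like { $ne: null } where you expected a name, so falling back to a default for anything that is not a simple scalar is the safer choice.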

sluggo: pretty URLs for everyone

Look up at the address bar. /blog/useful-open-source-tools is the "slug" of this page: the part of the URL that identifies it within the site as a whole.

Slugs shouldn't contain most punctuation, and they should only contain slashes under certain circumstances. But what is punctuation, exactly, in a modern internationalized world?

Unfortunately, many tools still act like URLs can't be localized. Fun fact: they can! We just need to eliminate inappropriate punctuation in a Unicode-aware way.

sluggo converts strings, like user-entered titles, to safe slugs while honoring the fact that there are many character sets on this planet. ApostropheCMS relies on sluggo to do a first-class job with all URLs, not just those in a Latin alphabet.
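
"Unicode-aware" is easier to see in code. This sketch uses Unicode property escapes to keep letters, combining marks and digits from any script; it illustrates the approach and is not sluggo's implementation or API. The function name slugify is hypothetical.

```typescript
// Hypothetical sketch of Unicode-aware slug generation; not sluggo's code.
// Anything that is not a letter (\p{L}), combining mark (\p{M}) or
// number (\p{N}) in some script is treated as a separator.
const separators = new RegExp("[^\\p{L}\\p{M}\\p{N}]+", "gu");

function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(separators, "-") // collapse each run of separators to one dash
    .replace(/^-+|-+$/g, ""); // no leading or trailing dashes
}
```

An ASCII-only pattern such as [^a-z0-9]+ would reduce a Cyrillic or Japanese title to an empty string, which is exactly the failure the paragraph above describes.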

prettiest: REALLY simple storage and locking for the command line

When we write command-line utilities, we often need to store some data. And we often need to make sure that data isn't overwritten in some crazy slap-fight with another instance of the same utility. Which hardly ever happens. Except when it does, and you lose all of your data. [twitch] [twitch]

At the same time, it's helpful to make sure only one instance of the utility is running at a time. Especially if it runs other commands that were not carefully designed to run together. Not that this has ever happened to us. [twitch] [twitch] [twitch] [faceplant]

prettiest is a simple, radical solution:

  • prettiest returns a data object to your code.
  • You store whatever you want in properties of data.
  • When your program exits, data is stored to disk automatically.
  • The next time your code runs, data is automatically loaded from disk.
  • If another instance of your code starts before the first one exits, then it will just have to wait its turn... because right now, that first instance is the prettiest one, and it deserves the spotlight.

Persistence and locking were never this simple.

minuscule: build microservices quickly and safely

Still in the oven, but hey, the tests are passing:

minuscule is our newest offering, and it is a work in progress, which is why you'll need to check out the wip branch to learn more.

We build a lot of little API microservices, and as much as we love ApostropheCMS, not every tiny microservice needs a CMS. We could just reach directly for Express, and we often have. But Express development can be bug-prone because developers are solely responsible for catching errors and sending responses manually. We created minuscule as a tiny wrapper for Express that solves these problems by allowing developers to "just return" values directly from async route functions, validate API input safely, catch and log errors automatically, and simplify the most common kinds of middleware. Definitely a case of keeping it simple. minuscule doesn't do everything, but that's the point.
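
The "just return" pattern is worth a quick sketch. Since minuscule is still a work in progress, this is not its API: the route function, the 500 response body, and the pared-down Req and Res types (stand-ins for the Express request and response) are all assumptions for illustration.

```typescript
// Hypothetical stand-ins for the parts of Express used in this sketch.
interface Req {
  query: Record<string, unknown>;
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}

// Hypothetical wrapper: the handler returns a value or throws, and the
// wrapper takes care of sending the response either way.
function route(handler: (req: Req) => Promise<unknown>) {
  return async (req: Req, res: Res): Promise<void> => {
    try {
      res.json(await handler(req));
    } catch (err) {
      console.error(err); // log the failure instead of leaving the request hanging
      res.status(500).json({ error: "internal error" });
    }
  };
}
```

With a wrapper like this, a route body shrinks to something like route(async (req) => ({ hello: "world" })), and a forgotten try/catch can no longer leave a client waiting forever.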

Conclusion

You were expecting some neat, tidy summary? Hey, it's not a neat, tidy collection. But pulling back to 30,000 feet, we can see some themes: command line developer experience. HTML manipulation. GitHub API automation. Safety. Performance. And just plain usability.

While open-sourcing little utilities like this doesn't make us money, it does have benefits: it holds us to a higher standard of quality, it "gives back" in return for all the open source projects we depend on, and of course it opens the door to community contributions.

Speaking of which: pull requests welcome!