Why we open-sourced our uptime monitoring system

About a month ago, Sourcegraph released Checkup, an open-source, self-hosted uptime monitoring system written by Matt Holt.

Following its release, a lot of people asked us how we were using Checkup at Sourcegraph. Today, we're sharing our public status page, powered by Checkup, and laying out some of the big advantages we've found in using an open-source health check tool.

We sponsored the creation of Checkup because none of the existing paid or free uptime monitors met our needs. You can read the Checkup launch post for details, but here's a quick overview of some of the things it enables:

How Sourcegraph uses Checkup

Using an open-source health check system makes it possible to incorporate health checks into a few different stages in the software development cycle.

1. Uptime monitoring

The first and most obvious use case is production health checking. This is what you see on checkup.sourcegraph.com. Checkup is easy to deploy, and all the monitoring data is stored in S3, so it's easy to audit. You can also geographically distribute checks by deploying to servers in different parts of the world, anywhere you can spin up a VPS.

2. Continuous integration

Though typically viewed as a last line of defense in production, health checks are really no different from any other test you'd like to run against your app. And as experienced software engineers know, it's far better to catch a bug in test than in prod.

Because Checkup can be used as a simple CLI, you can roll it into your continuous integration scripts. Below is a snippet taken from Sourcegraph's CI build that demonstrates this. We version a checkup.json file directly in our codebase that describes the critical URLs we must not break.

#!/bin/bash
# Quick end-to-end uptime tests
checkup_success=false
src serve &                    # run an instance of our server
for i in {1..5}; do
    sleep 2                    # give the server time to come up
    echo "Checkup health checks (attempt $i / 5)"
    if checkup -c ./dev/ci/checkup.json; then
        checkup_success=true
        break
    fi
done
kill %1                        # kill the instance of our server
if ! "$checkup_success"; then
    echo "Checkup health checks failed after 5 attempts"
    exit 1                     # fail the CI build
fi
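
If you want the same retry behavior in more than one script, the loop above can be factored into a small helper. This is a minimal sketch rather than anything that ships with Checkup: the function name retry and its argument order are our own assumptions for illustration. Unlike the CI snippet, it runs the command before the first sleep, so add an initial delay yourself if your server needs time to start.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper (not part of Checkup): runs COMMAND up to ATTEMPTS
# times, sleeping DELAY seconds between failed attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i / $attempts failed" >&2
        # No point sleeping after the final attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

With this in place, the body of the CI loop reduces to something like: retry 5 2 checkup -c ./dev/ci/checkup.json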

3. In development

What's better than a test you can run in CI? Why, a test you can run in your dev environment, of course. Before we push a new version of Sourcegraph and kick off a CI build, we can run Checkup against a dev server to verify that all critical endpoints are live. All you need to do is run checkup in the terminal (it picks up the endpoints from the configuration file versioned with the code).
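
One way to make that habit stick is to wrap the command in a function you can call by hand or from a git pre-push hook. The sketch below is an illustration under stated assumptions, not our actual tooling: the function name dev_check and the CHECKUP_BIN and CHECKUP_CONFIG environment variables are hypothetical, and the only Checkup invocation it relies on is the checkup -c <file> form from the CI snippet above, including its non-zero exit status on failure.

```shell
#!/bin/bash
# dev_check: run health checks against a local dev server before pushing.
# CHECKUP_BIN and CHECKUP_CONFIG are hypothetical overrides added for this
# sketch; the default config path matches the one used in the CI snippet.
# Returns 0 if checks pass, 1 if they fail, 2 if the setup is incomplete.
dev_check() {
    local bin="${CHECKUP_BIN:-checkup}"
    local config="${CHECKUP_CONFIG:-./dev/ci/checkup.json}"

    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "dev_check: '$bin' not found on PATH" >&2
        return 2
    fi
    if [ ! -f "$config" ]; then
        echo "dev_check: config file '$config' not found" >&2
        return 2
    fi

    if "$bin" -c "$config"; then
        echo "dev_check: all endpoints healthy"
    else
        echo "dev_check: health checks failed; not safe to push" >&2
        return 1
    fi
}
```

Separating the "setup is incomplete" exit code from the "checks failed" one lets a hook distinguish a missing tool from a genuinely broken endpoint.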

Why source accessibility is important

We think Checkup has a lot going for it as a health check tool. It's not for everyone, but its simplicity and developer-driven design suit our purposes well. Besides the feature set and simple interface, we think Checkup has another strong advantage: its source code is publicly available.

Having the source available means you can dive into the inner workings of Checkup if unexpected behavior crops up. And you can extend the tool to fit your needs (and push those changes upstream to share those capabilities with other Checkup users). Already, community contributors have added support for new underlying data stores and new types of checks (TCP and DNS).

But availability alone is not enough. There are plenty of open-source projects where the code is available but inscrutable. Documentation and API design are key, but so is making the code itself as easy to navigate as possible. And in that spirit, here are 5 different places where you can dive into the Checkup source and understand how it all works:

We hope you'll find Checkup useful as a tool and informative as a codebase. Please send us feedback and let us know how you're using it!
