A Basic Pattern for Continuous Delivery in a Cloud Database

marcussorealheis
5 min read · Aug 25, 2022

You need to git up, git out and git something
How will you make it if you never even try
You need to git up, git out and git something
’Cause you and I got to do for you and I — Git Up, Git Out, Outkast

Please note: many of the hyperlinks in here target beginners because I want them to know what’s out there so they can go solve new problems with their knowledge. Many readers will already be familiar with the concepts and resources I link to because most of them are not new.

When I started working on Atlas Search in 2020, I couldn’t reconcile the idea of a database technology that was only deployable in the cloud. How can I do continuous delivery with a CI pattern so obtuse? I also could not see myself beholden to a single cloud provider. A few months later, MongoDB released multi-cloud database clusters. Even though I thought it was mostly hype, I had to drop that gripe. Yet my continuous delivery suspicion remained, particularly around schemas, as did our customers’ pains on the same topic.

More or less a real photo

This year, during our internal hackathon, our (winning) team pictured above stumbled on a simple solution to the problem of continuous schema delivery under pressure. I proposed continuous index management in Atlas Search with GitHub Actions on a whim. This blog is about index management in Atlas Search because it’s a good example. I’m not suggesting everybody run over to Atlas Search and try to tame some text for the next six days. I don’t care if you are on Planet Scale, KeyValueDB, or an over-priced knock off of MongoDB. GitHub Actions could be a great first step toward a “remocal” (remote-local) development workflow.

Led by Benjamin Perlmutter, second of his name, my skunkworks team built a very simple NPM library for adding a search bar to a React website. To use Atlas-Static-Site-Search-Box — clearly we didn’t win for the name — you only need a sitemap for your site and Atlas credentials.

[Demo GIF: an engineer adding the npm module to his React application]
How to add the static site search box to your application

With one command here, copy pasta there, voilà! You know the vibes. Once we had the first iteration of everything packaged up, we realized we were too lazy to manually deploy the search index definition as we iterated.

“Index definition” in Atlas Search refers to a domain-specific language (DSL) that defines the write-side analysis chain in an Apache Lucene search index. If that doesn’t mean anything to you, think “field mappings” or “schema” in a data store. When a user sets up a search index, it impacts how the context of their corpus is stored. Getting it right or wrong can be the difference between Best Buy and Circuit City 💀, Target and Sears Roebuck 💀. It’s almost always a bit of trial and error, and a hackathon is about speed of execution. Manually updating index definitions every time we changed something was messing up the rotation.
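If “index definition” is still abstract, here is a rough sketch of what one might look like, written to disk with a Bash heredoc. This is not our actual site_index.json. The index name, database, collection, and field names are made up for illustration, and the layout assumes the JSON body that the Atlas Search index API accepts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output path; the real file in the post lives at atlas_search/site_index.json.
out="${1:-${TMPDIR:-/tmp}/site_index.example.json}"

# Write an illustrative index definition. "site", "pages", "title", and "body"
# are assumed names, not the ones from the hackathon project.
cat > "$out" <<'JSON'
{
  "name": "default",
  "database": "site",
  "collectionName": "pages",
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": { "type": "string", "analyzer": "lucene.standard" },
      "body":  { "type": "string", "analyzer": "lucene.english" }
    }
  }
}
JSON
```

Swapping an analyzer or adding a field here is exactly the kind of tweak we kept making, which is why redeploying by hand got old fast.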

As the worst coder on the team, third coolest, I pulled GitHub Actions out of gas when Ben asked what we should do. “It’s simple,” I started to type in Slack, but the message was lost to the “Drafts” dungeon. I remember thinking: (1) The index definitions are in the repo. (2) The repo is on GitHub. (3) GitHub is out here leasing Linux servers to the software development world to fuel Copilot’s flight training. I whipped up some Bash with Ben batting cleanup. Once that was merged, every commit that touched an index definition in the project repo would deploy an updated search index.

The github-action-search-index.yaml action is stored in the .github/workflows/ directory and looked something like:

name: Update the Search Index Definition
on:
  push:
    paths:
      - "**/site_index.json"

env:
  PUBLIC_KEY: ohtani
  GROUP_ID: 627586aa915a0f72783d661c
  CLUSTER_NAME: WeShallWin
  PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}
jobs:
  Publish-Atlas-Search-Index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event."
      - run: echo "🐧 This job is now running on a ${{ runner.os }} server hosted by GitHub!"
      - run: bash ${GITHUB_WORKSPACE}/scripts/create-atlas-search-index.sh ${GITHUB_WORKSPACE}/atlas_search/site_index.json
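The workflow hands off to scripts/create-atlas-search-index.sh, which isn't reproduced in this post. A minimal sketch of what such a script could look like follows. It assumes the Atlas Admin API v1.0 `fts/indexes` endpoint and HTTP digest auth with the API key pair. The function names and the `ATLAS_API_BASE` variable are hypothetical; `PUBLIC_KEY`, `PRIVATE_KEY`, `GROUP_ID`, and `CLUSTER_NAME` come from the workflow's `env` block.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of scripts/create-atlas-search-index.sh; not the original script.
set -euo pipefail

# Assumed base URL of the Atlas Admin API (v1.0); overridable for testing.
ATLAS_API_BASE="${ATLAS_API_BASE:-https://cloud.mongodb.com/api/atlas/v1.0}"

# Build the search-index endpoint from the env vars the workflow exports.
index_endpoint() {
  printf '%s/groups/%s/clusters/%s/fts/indexes' \
    "$ATLAS_API_BASE" "$GROUP_ID" "$CLUSTER_NAME"
}

# POST the index definition file, authenticating with the API key pair via digest auth.
create_index() {
  local definition_file="$1"
  if [[ ! -f "$definition_file" ]]; then
    echo "missing index definition: $definition_file" >&2
    return 1
  fi
  curl --fail --silent --show-error \
    --user "${PUBLIC_KEY}:${PRIVATE_KEY}" --digest \
    --header "Content-Type: application/json" \
    --request POST \
    --data "@${definition_file}" \
    "$(index_endpoint)"
}

# Only deploy when a definition path is passed, as the workflow step does.
if [[ $# -gt 0 ]]; then
  create_index "$1"
fi
```

The private key never appears in the repo: the workflow reads it from a GitHub secret and exposes it to the script as an environment variable. One caveat with this sketch is that a plain POST creates an index, so a push that changes an existing definition would likely need an update call against that index's ID instead. That branch is left out here.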

Tweak, smash approve, commit, deploy. Tweak, smash approve, commit, deploy. Several devs cooking up on the latest at the same time! You might be thinking that Jenkins or Travis work already, or that CircleCI + Bazel + BuildBuddy let you cruise. Those are obviously designed for more robust systems. The hackathon project is one step up from a Vercel landing page in complexity, and several steps lower than those options. The more I write this blog, the clearer it becomes that this solution is probably for an MVP. I’m asking myself, where the hell are my credentials? How is GitHub authz’d to my cluster?

Security is for ̶s̶o̶c̶i̶a̶l̶ ̶m̶e̶d̶i̶a̶ ̶c̶o̶m̶p̶a̶n̶i̶e̶s̶ ̶h̶o̶l̶d̶i̶n̶g̶ ̶s̶i̶g̶n̶i̶f̶i̶c̶a̶n̶t̶ ̶p̶o̶w̶e̶r̶ ̶o̶v̶e̶r̶ ̶t̶h̶e̶ ̶h̶e̶a̶l̶t̶h̶ ̶a̶n̶d̶ ̶s̶t̶a̶b̶i̶l̶i̶t̶y̶ ̶o̶f̶ ̶h̶u̶m̶a̶n̶i̶t̶y̶ my guy Mudge! The threat modeling for this project was limited to about ten minutes because we relied largely on open source software and a tiny scope. If you are starting a project with a managed db, maybe this pattern can help you with continuous delivery of your config files. I will never waste time manually changing configurations and trying to migrate schemas again. Nor will I leave that work out of version control like I did with MVPs before GitHub Actions.

At one point, I was an unrelenting hacker due to lack of skill. In other words, I never read or comprehended any docs fully. StackOverflow checkmarks were my CliffsNotes. For technical content, only Douglas Crockford, the homie Erik Hatcher, and “the Gang of Four” made sense. We didn’t win because we had the most complex project with the greatest technical achievement, believe me. We won because we had the simplest. And we finished.

Purists or experts will write this post off. They will be bikeshedding about my use of terms. That’s fine. I’ll be shipping. In the criteria for why we won, this strategy wasn’t even mentioned. Productivity is not sexy, nor are the subtleties of how you can get there. I try not to overlook the value of developer experience, and the compounding value of cognitive continuity.

PS.

I’m training an AI to write in my voice. Let me know how that’s going in the comments. Some helped me with this post. In 6–9 months (PM timelines), I expect to write complete posts just with a title, hand-picked rap bars, a code snippet, and a conclusion.

marcussorealheis

Apache Solr Committer, MongoDB and Weaviate Advisor, Co-Founder at a Futuristic Tools Company