Basic rules for software deployment · Já não vou salvar o mundo

Now and then someone asks me what the best way to deploy code to a server is. There’s a lot to be said about this subject and I rarely have a simple answer, so I usually end up disappointing the inquirer. Like so many other things in life, the best way to do it depends on the exact situation: the scale you’re operating at, the type of servers you use, the type of application you’re deploying, the level of security required, the infrastructure you’re using, etc.

What I usually do is convey a few basic rules that I always follow in every scenario.

Automated testing

The first rule is the one that most people overlook or outright ignore, but it’s probably the most important: there has to be some sort of automated testing that runs before anything else and stops the deployment process if anything goes wrong.

Unit testing, integration testing, functional testing - do whatever you want but never, ever deploy code that isn’t tested in an automated way.
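
A minimal sketch of that gate in Python, assuming the tests and the deployment are each started by a single command. The names run_command and deploy_if_tests_pass are made up for illustration; the only point is that the deploy command never starts unless the test command exits with 0.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code (0 means success)."""
    return subprocess.run(list(command)).returncode


def deploy_if_tests_pass(
    test_command: Sequence[str],
    deploy_command: Sequence[str],
    run: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Run the test suite first; start the deployment only if it passed."""
    if run(test_command) != 0:
        # A failing test run stops everything: the deploy command never starts.
        print("Tests failed, deployment aborted.", file=sys.stderr)
        return False
    return run(deploy_command) == 0
```

It could be called as deploy_if_tests_pass(["python", "-m", "pytest"], ["./deploy.sh"]), where pytest as the test runner and a deploy.sh script are both assumptions. Most CI tools give you the same behaviour out of the box by failing the pipeline when the test stage fails.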

I can’t stress enough how important this is. It gives you peace of mind because you know that if a bug is introduced somewhere, there’s a very good chance it will get caught before reaching production (assuming you’re doing your tests correctly).

Most people think this is the biggest benefit of testing and it is a huge benefit - but there’s something else I consider an even greater benefit, even though it’s a much more subtle one.

Not being burdened with the worry of breaking something on the live servers frees developers up. With that weight off their shoulders, they can be more productive because they can manipulate their code freely without (as much) fear of breaking everything.

We don’t write tests (only) to catch bugs before they hit production - we write tests so we can refactor and iterate faster.

There’s a lot more to be said about automated testing but the bottom line is: do it, no excuses.

Automate all the things

Another very common mistake is updating production code by logging into a server and running git pull. That’s a bit like playing Russian roulette with more than one bullet in the cylinder.

Automating everything means that at most you push a button to trigger a deployment. From that point on, everything has to happen automatically. A very common setup is to have a hook of some sort on the code repository and when code is pushed, a deployment is triggered and a script is executed.
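
As a rough illustration of the hook idea, here is a sketch of the decision a server-side Git post-receive hook could make. Git feeds that hook one line per updated ref in the form "old-sha new-sha ref-name". The function name commits_to_deploy and the default branch master are assumptions; what happens after the decision (calling your deployment script) is left out.

```python
from typing import List


def commits_to_deploy(hook_input: str, deploy_branch: str = "master") -> List[str]:
    """Return the new commit ids pushed to the deployment branch.

    hook_input is the text a post-receive hook reads from stdin:
    one "<old-sha> <new-sha> <ref-name>" line per updated ref.
    """
    target_ref = f"refs/heads/{deploy_branch}"
    commits = []
    for line in hook_input.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        _old_sha, new_sha, ref_name = parts
        # An all-zero new sha means the branch was deleted, not updated.
        if ref_name == target_ref and set(new_sha) != {"0"}:
            commits.append(new_sha)
    return commits
```

In a real hook you would pass sys.stdin.read() to this function and trigger the deployment only when the returned list is not empty. Hosted CI tools do the equivalent filtering through their own branch settings.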

Why is this important? For a very simple reason: we are humans. Humans make mistakes whereas a deployment script is deterministic: it will run the same way every single time. If your deployment consists only of getting a few changed files from your code repository, it’s not a big deal. But as you add more steps to this, the probability of making a mistake, even on something simple, gets higher.

For example, a long time ago I worked on a project where the project lead insisted that we deploy our code manually. He said that way we would feel more responsible and would be more careful. I didn’t want to take any chances, so I wrote a script to do it for me, but most of my colleagues were doing it manually. They did it dozens of times without a problem, but one time one of them forgot to run the database migrations and several hours passed before anyone noticed there was a problem. By then there was a ton of corrupted data and we all had a super fun time recovering from the event.

So do yourself a favour and resist the temptation of doing things manually. It’s far too easy to mess it up even in simple scenarios. Write a script to automate your deployment from the start of your project, even if it’s just a simple git pull. It will eventually grow, and if you automate from the start you will never find out how thankful you should be that you did.
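
Such a script can start very small. The sketch below, in Python, keeps the steps in one ordered list so that none of them can be forgotten, and stops at the first step that fails. The step names and the two ./scripts/... paths are hypothetical placeholders for whatever your project actually needs.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]

# Hypothetical steps: replace the commands with the ones your project uses.
DEPLOY_STEPS: List[Step] = [
    ("fetch code", ["git", "pull"]),
    ("run migrations", ["./scripts/migrate.sh"]),
    ("restart server", ["./scripts/restart.sh"]),
]


class DeployError(RuntimeError):
    """Raised when a deployment step exits with a non-zero status."""


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code (0 means success)."""
    return subprocess.run(list(command)).returncode


def run_deployment(
    steps: Sequence[Step],
    run: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run every step in order and return the names of the completed ones."""
    completed: List[str] = []
    for name, command in steps:
        if run(command) != 0:
            # Stop immediately so later steps never run on a broken state.
            raise DeployError(f"step {name!r} failed; completed so far: {completed}")
        completed.append(name)
    return completed
```

Calling run_deployment(DEPLOY_STEPS) runs the same steps in the same order every time. Adding a step later means adding one line to the list, which is how the script grows with the project.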

Deploying the code

This is where things get murky because it depends on what you are developing. For example, if you’re deploying a Python or Ruby application, you can have a simple script that runs git pull or something similar and restarts your web server or long-lived processes. But if you’re building a Java application, it’s a bit more involved, because a new .jar file has to be built first. If you’re using a database you may need to run some database migrations. You probably need to compile and publish some static files. There are a lot of variables.

I used to have a Fabric or Capistrano script that contained all the steps I would do if I were manually deploying the code. That script was triggered by TravisCI, CircleCI, Drone.io, Jenkins or whatever Continuous Integration tool I was using when I pushed code to my repository (usually to a specific branch, like master). It connected to the servers and executed a series of pre-programmed steps to fetch the new code from the repository, publish static assets, run database migrations, restart servers, rotate DNS for multiple server redundancy, etc.

A lot of people do a simple git pull but the way I used to do it was running git fetch origin first, followed by git reset --hard origin/master, which points the working copy at exactly what was fetched. This way I guarantee that if anyone committed the capital sin of manually going into the server and changing live code, those changes are gone.
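
Here is a small Python sketch of the fetch-and-reset approach. It resets to the remote-tracking branch (origin/master by default, an assumption based on the branch named earlier), because that is the ref git fetch updates. The function names are illustrative only.

```python
import subprocess
from typing import List


def update_commands(remote: str = "origin", branch: str = "master") -> List[List[str]]:
    """Build the git commands that force a checkout to match the remote branch."""
    return [
        # Download the new commits without touching the working copy.
        ["git", "fetch", remote],
        # Move the branch and working copy to the fetched commit, discarding
        # any local edits to tracked files. Untracked files are left alone.
        ["git", "reset", "--hard", f"{remote}/{branch}"],
    ]


def update_checkout(repo_dir: str, remote: str = "origin", branch: str = "master") -> None:
    """Run the commands inside repo_dir, raising if any of them fails."""
    for command in update_commands(remote, branch):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Note that git reset --hard does not delete untracked files. If stray files on the server are a concern, git clean is the command that removes them, but it is destructive enough that I would add it deliberately rather than by default.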

Nowadays I prefer to use Docker instead of scripts that change things on servers. The CI tool does the job of building a container image and pushing it to a container registry. Then it takes care of updating the running services, either on the servers or on a hosted service like Google Cloud Run, Google Kubernetes Engine, AWS ECS, etc.
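
The build-and-push half of that flow can be sketched in Python as follows. Tagging the image with the commit id is my assumption (it is a common convention, not something the CI tools require), and registry.example.com and myapp are placeholder names. The final step of updating the running service is left out because it differs for every platform.

```python
from typing import List


def image_reference(registry: str, name: str, commit_sha: str) -> str:
    """Build an image reference tagged with a shortened commit id."""
    return f"{registry}/{name}:{commit_sha[:12]}"


def build_and_push_commands(image: str, context: str = ".") -> List[List[str]]:
    """Return the docker commands a CI job would run for one image."""
    return [
        # Build the image from the Dockerfile found in the build context.
        ["docker", "build", "--tag", image, context],
        # Upload the tagged image to the registry named in the reference.
        ["docker", "push", image],
    ]
```

For example, image_reference("registry.example.com", "myapp", sha) gives every deployment its own immutable tag, which makes it easy to see what is running and to roll back to a previous image.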

Single source of truth

There should be a single “source of truth” for your code. This means that anything coming from outside the repository should be ignored, discarded and not permitted on the production servers.

If someone decides to patch a bug directly in the live code, they should get 5 lashes for each line of changed code. If they do that and then don’t apply the same changes on the repository, then that’s another 20 lashes :-)

That’s because the next time the code is deployed, the patch will be overwritten and the bug will be back. Or if you use the plain git pull method, the deployment may fail because there are uncommitted changes in the server’s copy of the repository.
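
One way to catch that situation before a deployment is to ask Git whether the server’s copy has drifted. The Python sketch below relies on git status --porcelain, which prints one line per changed or untracked file and nothing at all for a clean checkout. The function names are illustrative, and the parsing is deliberately simple: renamed files and paths with unusual characters would need more care.

```python
import subprocess
from typing import List


def changed_paths(porcelain_output: str) -> List[str]:
    """Extract file paths from `git status --porcelain` output.

    Each line is two status characters, a space, then the path,
    so the path starts at the fourth character.
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def local_changes(repo_dir: str) -> List[str]:
    """Return the files that differ from the last commit in repo_dir."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return changed_paths(result.stdout)
```

A deployment script could call local_changes on the server’s checkout and refuse to continue, or at least shout loudly, when the list is not empty.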

Final thoughts

I hope this gives you a few ideas. Again, there’s a lot to be said about this topic and it entirely depends on your specific situation, so it’s impossible to come up with “the best way” to deploy software.

If this is a topic that interests you, I strongly recommend you read Zach Holman’s post titled “How to deploy software”. It’s one of the best articles I have ever read about it. It’s long but worth every minute.

What about you, how do you deploy software? What are your basic rules?
