Basic rules for software deployment

Every now and then someone asks me what the best way to deploy code to a server is. There’s a lot to be said about this subject, so I usually end up disappointing the inquirer, because I rarely have a simple answer. Like so many other things in life, the best approach really depends on the exact situation: the scale you’re operating at, the type of servers you use, the type of application you’re deploying, the level of security required, the infrastructure being used, and so on.

What I usually do instead is give them my opinion on a few basic rules that I think should be followed in every scenario. This post is a simplified version of that advice.

Automated testing

The first rule is the one that most people overlook or outright ignore, but it’s probably also the most important one: there has to be some sort of automated testing that runs before anything else and stops the deployment process if anything goes wrong.

Unit testing, integration testing, functional testing: do whatever you want, but never, ever deploy code that isn’t tested in an automated way.

I can’t stress enough how important this is. It gives you peace of mind, because if a bug is introduced somewhere, there’s a very good chance it will be caught before it reaches production (assuming your tests are written correctly).

Most people think this is the biggest benefit of testing, and it is in fact a huge one. But there’s another benefit that I consider even greater, even though it’s much more subtle.

Freed from the worry of breaking something on the live servers, developers can be more productive, because they can manipulate their code freely without (as much) fear of breaking everything.

We don’t write tests (only) to catch bugs before they hit production - we write tests so we can refactor and iterate faster.

There’s a lot more to be said about automated testing, but the bottom line is: do it, no excuses.

Automate all the things

This is another very common mistake: someone logs into a production server and runs git pull to update the live code. That’s a bit like playing Russian roulette with more than one bullet in the cylinder.

Automating everything means that, at most, you push a button to trigger a deployment, and from there on every single step is done for you automatically. A very common setup is to have a hook of some sort on the code repository: when code is pushed, a deployment is triggered and a script is executed.
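As a rough illustration of such a hook, here is a sketch of a git post-receive hook written in Python. Git feeds this hook one line per updated ref on standard input, in the form "old-value new-value ref-name"; the sketch picks out pushes to the deploy branch. The branch name master and the trigger_deploy function are assumptions: in a real setup that function would start your deployment script or notify your CI tool.

```python
#!/usr/bin/env python3
import sys
from typing import Iterable, List


def revisions_to_deploy(lines: Iterable[str], branch: str = "master") -> List[str]:
    """Parse post-receive input ("<old> <new> <ref>" per line) and return the
    new revisions pushed to the deploy branch."""
    target = f"refs/heads/{branch}"
    revisions = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore anything that isn't a well-formed ref update
        _old, new, ref = parts
        # An all-zero object name means the ref was deleted: nothing to deploy.
        if ref == target and set(new) != {"0"}:
            revisions.append(new)
    return revisions


def trigger_deploy(revision: str) -> None:
    """Hypothetical placeholder: start your deployment script or CI job here."""
    print(f"Triggering deployment of {revision}", file=sys.stderr)


def main(stream: Iterable[str]) -> None:
    for revision in revisions_to_deploy(stream):
        trigger_deploy(revision)
```

Saved as an executable post-receive file in the hooks directory of the repository you push to, the script would end with a call to main(sys.stdin).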

Why is this important? For a very simple reason: we’re human and humans make mistakes, whereas a deployment script runs exactly the same way every single time. If your deployment consists only of getting a couple of changed files from your code repository, it’s not a big deal, but as you add more steps, the probability of making a mistake, even on something simple, gets higher.
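To put a rough number on that, assume (purely for illustration) that each manual step has an independent 1% chance of being done wrong. The chance that at least one of n steps goes wrong is then 1 - (1 - p)^n, which grows with every step you add:

```python
def chance_of_a_mistake(steps: int, p_mistake: float) -> float:
    """Probability that at least one of `steps` manual steps goes wrong,
    assuming each goes wrong independently with probability p_mistake."""
    return 1 - (1 - p_mistake) ** steps


# Hypothetical 1% chance of botching any single manual step.
for steps in (1, 5, 10, 50):
    print(f"{steps:>3} steps: {chance_of_a_mistake(steps, 0.01):.1%}")
```

With these made-up numbers, ten steps already give roughly a 10% chance of at least one slip, and repeating a manual deployment many times compounds the risk in exactly the same way.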

For example, a long time ago I worked on a project where the project lead insisted that we deploy our code manually. He said that this way we would feel more responsible and be more careful. I didn’t want to take any chances, so I wrote a script to do it for me, but most of my colleagues deployed by hand. They did it dozens of times without a problem, but one time one of them forgot to run the database migrations, and several hours passed before anyone noticed. By then there was a ton of corrupted data, and we all had a super fun time recovering from the event.

So do yourself a favour and resist the temptation to do things manually. It’s far too easy to mess up, even in simple scenarios. Write a script to automate your deployment from the start of your project, even if it’s just a simple git pull. It will eventually grow, and because the mistakes it prevents never happen, you may never realise how thankful you should be that you wrote it.
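A starting point can be as small as the sketch below: a list of commands run in order, with the script stopping at the first failure. The checkout path /srv/myapp is hypothetical, and the single git pull step is just the day-one version that later steps get appended to.

```python
#!/usr/bin/env python3
import subprocess
from pathlib import Path
from typing import List, Sequence

# Hypothetical location of the application's checkout on the server.
APP_DIR = Path("/srv/myapp")

# Day one: the only step is updating the code. As the deployment grows,
# append steps here (migrations, static files, restarts, ...).
STEPS: List[List[str]] = [
    ["git", "pull"],
]


def deploy(app_dir: Path = APP_DIR, steps: Sequence[Sequence[str]] = STEPS) -> None:
    """Run each step in app_dir, in order, stopping at the first failure."""
    for step in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed step aborts the remaining ones instead of being ignored.
        subprocess.run(list(step), cwd=app_dir, check=True)
```

Because every step lives in one list, forgetting the migrations becomes a matter of editing the script once rather than remembering on every deployment.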

Deploying the code

This is where things get murky, because it really depends on what you are developing. For example, if you’re deploying a Python or Ruby application, you can have a simple script that runs git pull or something similar and probably restarts your web server or long-lived process(es). If you’re building a Java application, it’s a bit more involved, because a new .jar file has to be built and whatnot. If you’re using a database, you may need to run migrations. You probably need to compile and publish some static files. There are a lot of variables.
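One way to picture those variables is as a checklist assembled from a few facts about the project. The sketch below is only illustrative: the Project fields and the step descriptions are made up, and a real script would run commands rather than return text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    language: str           # e.g. "python", "ruby" or "java"
    uses_database: bool
    has_static_files: bool


def plan_steps(project: Project) -> List[str]:
    """List the deployment steps this kind of project needs, in order."""
    steps = ["fetch new code"]
    if project.language == "java":
        steps.append("build a new .jar")          # Java needs a build step
    if project.has_static_files:
        steps.append("compile and publish static files")
    if project.uses_database:
        steps.append("run database migrations")  # before the new code goes live
    steps.append("restart web server / long-lived processes")
    return steps
```

A Python app with no database and no static files ends up with just two steps, while a Java app with both gets five, which is exactly the kind of variation that makes a single recipe impossible.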

What I like to do is use a Fabric or Capistrano script that contains all the steps I would perform if I were deploying the code manually. That script is triggered by Codeship (or whatever Continuous Integration tool I’m using) when I push code to my repository, usually to a specific branch such as master. It then connects to the servers and executes a series of pre-programmed steps: fetching the new code from the repository, publishing static assets, running database migrations, restarting servers, rotating DNS for multiple-server redundancy, etc.
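Fabric and Capistrano provide helpers for running commands on remote hosts. To avoid guessing at either library’s API, this sketch does the same job with plain ssh through Python’s subprocess module. The host names, paths, systemd unit and the two bin/ scripts are all hypothetical, DNS rotation is left out, and the run parameter exists so the sequence can be inspected without touching real servers.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

# Everything below is a hypothetical example configuration.
HOSTS = ["app1.example.com", "app2.example.com"]
APP_DIR = "/srv/myapp"
UPDATE_CODE = ["git fetch origin", "git reset --hard origin/master"]
PUBLISH_STATIC = ["./bin/publish-static"]   # hypothetical project script
MIGRATE = ["./bin/migrate"]                 # hypothetical project script
RESTART = ["sudo systemctl restart myapp"]  # assumes a systemd unit called myapp

Runner = Callable[[List[str]], None]


def ssh_runner(argv: List[str]) -> None:
    """Run a local command (here: ssh), raising if it exits non-zero."""
    subprocess.run(argv, check=True)


def remote(host: str, command: str, run: Runner) -> None:
    """Run one shell command inside APP_DIR on `host` over ssh."""
    run(["ssh", host, f"cd {shlex.quote(APP_DIR)} && {command}"])


def deploy(hosts: Sequence[str] = HOSTS, run: Runner = ssh_runner) -> None:
    # 1. Every server gets the new code and the static assets.
    for host in hosts:
        for command in UPDATE_CODE + PUBLISH_STATIC:
            remote(host, command, run)
    # 2. Migrations change the shared database, so run them once only.
    for command in MIGRATE:
        remote(hosts[0], command, run)
    # 3. Restart the application on every server.
    for host in hosts:
        for command in RESTART:
            remote(host, command, run)
```

Calling deploy(run=print) prints the ssh commands instead of executing them, which is a cheap way to review the sequence before pointing it at production.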

A lot of people do a simple git pull, but I prefer to run git fetch origin followed by git reset --hard origin/master (or whichever branch you deploy from). This moves the working copy to exactly the code that was just fetched, and it guarantees that if anyone committed the capital sin of manually going into the server and changing live code, any edits they made to tracked files are gone.
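The reset target is the detail that matters here: resetting to HEAD after a fetch only throws away local edits and leaves the old code in place, while resetting to the fetched remote branch also updates it. This sketch builds the command sequence, assuming the remote is named origin. The optional git clean -fd step goes slightly further than the rule above: reset --hard leaves untracked files alone, whereas clean -fd deletes untracked files and directories that are not ignored, so only enable it if nothing you need lives untracked in the checkout.

```python
import subprocess
from typing import List


def sync_commands(branch: str = "master", remote: str = "origin",
                  clean: bool = False) -> List[List[str]]:
    """Commands that make the working copy match remote/branch exactly."""
    commands = [
        ["git", "fetch", remote],
        # Reset to the freshly fetched branch tip. Resetting to HEAD instead
        # would only discard local edits and keep the old code in place.
        ["git", "reset", "--hard", f"{remote}/{branch}"],
    ]
    if clean:
        # reset --hard ignores untracked files; this removes them (and
        # untracked directories), but leaves files matched by .gitignore.
        commands.append(["git", "clean", "-fd"])
    return commands


def sync(repo_dir: str, branch: str = "master", clean: bool = False) -> None:
    """Run the commands in repo_dir, stopping at the first failure."""
    for command in sync_commands(branch, clean=clean):
        subprocess.run(command, cwd=repo_dir, check=True)
```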

Single source of truth

That last bit is another important point: there should be a single “source of truth” for your code, which means that anything that comes from outside the repository should be ignored, discarded and not permitted on the production servers.

If someone decides to patch a bug directly in the live code, they should get 5 lashes for each line of changed code, and if they then don’t apply the same changes to the repository, that’s another 20 lashes. :-) That’s because the next time the code is deployed, the patch will be silently overwritten or, even worse, if you use the plain git pull method, the deploy may fail because there are uncommitted local changes in the server’s copy of the repository.
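Since a reset throws manual patches away, a deploy script could list them first so the fix isn’t lost before someone commits it properly. This sketch reads git status --porcelain, whose version-1 format prints two status characters, a space and the path for each changed or untracked file. It does not unquote the quoted form git uses for unusual file names, and rename entries come back as "old -> new".

```python
import subprocess
from typing import List


def changed_paths(porcelain: str) -> List[str]:
    """Paths listed by `git status --porcelain` (format v1: two status
    characters, a space, then the path)."""
    return [line[3:] for line in porcelain.splitlines() if line]


def find_live_edits(repo_dir: str) -> List[str]:
    """Return files in the server's checkout that differ from the last commit,
    including untracked ones."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return changed_paths(result.stdout)
```

A deploy script might call find_live_edits on the checkout (for example the hypothetical /srv/myapp) before resetting, and refuse to continue, or at least alert someone, if the list isn’t empty.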

Final thoughts

I hope this gives you a few ideas. Again, there’s a lot to be said about this topic and it entirely depends on your specific situation, so it’s impossible to come up with “the best way” to deploy software.

To help you with software deployments and with following the basic rules I mentioned here, there are some very helpful tools that I recommend you check out:

If this is a topic that interests you, I strongly recommend you read Zach Holman’s post titled “How to deploy software”. It’s one of the best articles I have ever read on this topic. It’s long, but worth every minute.

What about you, how do you deploy software? What are your basic rules?