Today someone asked for my help to undo a git push --force that messed up his production server by overwriting the master branch with something that wasn’t meant to be there.
My first advice would be to not deploy to production servers by manually running git pull or git push, but that’s another story and I briefly mention it in my post on basic rules for software deployment.
So how would you solve the situation? It’s actually quite simple but if you don’t know some details about how git works, it can seem a bit daunting.
If you’re in a hurry and know what you’re doing:
- Look at the reflog of either your local repository (git reflog) or the remote branch as your local repository last saw it (git reflog show remotes/origin/master) and find the commit that you want to revert to;
- Check out that commit by its hash;
- Create a temporary branch off from the commit;
- Push it back to the remote repository.
Done.
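If you'd rather try those steps somewhere safe first, here is a minimal sketch that reproduces the accident in a throwaway sandbox and then undoes it. Everything about the setup is an assumption for illustration: the temporary directory, the bare repository standing in for the remote, the empty commits and their messages. It also picks the hash with origin/master@{1}, which is shorthand for "the second entry in that reflog"; in a real repository you should read the reflog and choose the entry yourself.

```shell
set -e

# Sandbox: a bare repository plays the role of the remote (hypothetical setup).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/master        # make sure the branch is called master
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$sandbox/remote.git"

# Reproduce the accident: two good pushes, then a forced push of a diverged commit.
git commit -q --allow-empty -m "good work"
git push -q origin master
git commit -q --allow-empty -m "more good work"
git push -q origin master
wanted=$(git rev-parse master)                 # the state we will want back
git reset -q --hard HEAD~1
git commit -q --allow-empty -m "mistake"
git push -q --force origin master

# 1. Look at the reflog of the remote branch and pick the commit to return to.
git --no-pager reflog show remotes/origin/master
hash=$(git rev-parse 'origin/master@{1}')      # the entry just before the forced push

# 2. Check out that commit by its hash.
git checkout -q "$hash"

# 3. Create a temporary branch off from it.
git checkout -q -b temp_branch

# 4. Push it back over the remote master.
git push -q --force origin temp_branch:master

restored=$(git --git-dir="$sandbox/remote.git" rev-parse master)
echo "remote master is now at $restored"
```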
If you want some more details to understand how this works, stick around.
To delete or not to delete
The first lesson is that git rarely deletes data. No, really, even when you think you’re deleting a branch, you’re not. You’re just deleting a reference to that branch, but all the commits and their files are still in your repository, untouched (until you do something like git gc, but that’s a story for another day).
How is that possible? Glad you asked!
Git’s commit graph
Explained in a simple way, a git repository is nothing more than a directed acyclic graph, with each vertex, or node, being assigned a unique commit hash.
Let’s say you start off with a simple repository that has nothing more than a master branch (green in the images below) with a couple of commits and you currently have the last commit (HEAD) checked out.
The first concept to grasp is that this master branch we talk about is almost virtual: it’s not a container for our data, it’s more like a reference, or a tag that git uses to always point to the latest commit coming from a specific ancestor commit. What really matters is each commit’s hash, since those actually belong to each commit and are unique to them. In other words, they are the unique identifiers of each commit.
Imagine a group of people marching in line, the first one carrying a flag identifying the group’s name and symbolizing that every person behind the flag in that line belongs to that group. Now imagine several of these lines, each with its own flag. Even if the person who carries the flag drops it, the line still exists with all its members, each carrying their own unique identity card. The same applies to git branches and commits: branch names are just “flags” that let you know that a particular chain of nodes, or commits, is referred to by that name. The branch name (the flag) is merely a reference to the newest commit (the flag carrier) in a chain of commits (the whole line of people), and this chain of commits, the whole set of nodes in it, is what we actually call a “branch”. You can make this reference point to a commit other than the newest in the chain but for the sake of simplicity I’ll skip that.
Now let’s say you make some changes to a file in your master branch and commit those changes. What git does is create a new node in the repository graph and set this new node’s parent to the previous node/commit. It also moves the master “flag”, or reference, to point to the new commit.
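You can watch this happen in a scratch repository. The sketch below is only an illustration under a few assumptions of mine: a temporary directory, empty commits so no files are needed, and made-up commit messages. It records where master points before and after a commit and asks git for the new commit's parent.

```shell
set -e

# Scratch repository in a temporary directory (hypothetical setup).
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master        # make sure the branch is called master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "first commit"
before=$(git rev-parse master)                 # where the master "flag" is now

git commit -q --allow-empty -m "second commit"
after=$(git rev-parse master)                  # the flag has moved to the new node
parent=$(git rev-parse master^)                # the new node's parent

echo "master moved from $before to $after"
echo "parent of the new commit: $parent"
```

The parent printed on the last line is the hash master pointed to before the second commit, which is exactly the "new node, parent set to the previous node" step described above.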
Continuing the line of people analogy from above, this would be the same as having a new person join the line at the front and pick up the flag from the person behind her. This is pretty much like attaching a tag to that commit and in fact, the difference between a git tag and a git branch is that a tag is attached to a specific commit and doesn’t change when you add new commits, whereas a branch will always be updated by git to point to the newest commit. This way you can mark specific commits and get back to them without having to know their commit hash, because what you’re doing when you do git checkout <tag_name> is the same as git checkout <commit_hash>, where <commit_hash> is the hash to which the tag points. Same thing for checking out a branch, except that when you commit some new changes in a branch, git automatically updates the commit hash the branch reference points to.
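The tag versus branch difference is easy to check for yourself. This is a small sketch, assuming a scratch repository, empty commits and a lightweight tag that I arbitrarily called v1; none of these names come from the scenario in this post.

```shell
set -e

# Scratch repository in a temporary directory (hypothetical setup).
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "first commit"
git tag v1                                     # lightweight tag on the current commit
tag_before=$(git rev-parse v1)

git commit -q --allow-empty -m "second commit"
tag_after=$(git rev-parse v1)                  # the tag stays where it was put
branch_after=$(git rev-parse master)           # the branch followed the new commit

echo "v1:     $tag_after"
echo "master: $branch_after"
```

After the second commit the two names resolve to different hashes: v1 still names the first commit, while master has moved on.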
Branches
A similar thing happens when you create a branch from master, make some changes and commit them. First, when you create the new branch, a new reference is created, pointing to the commit, or node, you were at when you created the new branch. At this point there are two references pointing to that same commit, or graph node: the master branch and your new branch. Then, when you commit the changes, this new commit will have as its parent the commit/node you were on when you created the branch, and any subsequent commits will build on that chain, not touching the nodes in the master branch.
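Here is the same idea as a sketch you can run. The branch name feature, the temporary directory and the empty commits are all assumptions for the sake of the example.

```shell
set -e

# Scratch repository in a temporary directory (hypothetical setup).
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "first commit"
master_tip=$(git rev-parse master)

git checkout -q -b feature                     # new reference, same commit
feature_start=$(git rev-parse feature)

git commit -q --allow-empty -m "work on feature"
feature_tip=$(git rev-parse feature)           # only the feature reference moved
feature_parent=$(git rev-parse feature^)
master_now=$(git rev-parse master)             # master was not touched

echo "master:  $master_now"
echo "feature: $feature_tip (parent $feature_parent)"
```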
If later you decide to delete this branch, you will only delete the reference to the last commit in the branch but the actual commits, the actual files and changes you made to them, will still be there.
It’s just a reference!
OK, so it should be clear by now that branch names and tags are simply references to commits and the commit hashes are what really matters.
This explains why when you delete a branch or a tag, git doesn’t actually delete the commits - it just deletes the reference that points to the commits! And thus our data is still there.
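If you don't quite believe it, this sketch deletes a branch and then asks git whether the commit still exists. As before, the setup is assumed: a scratch repository, empty commits, and the branch names feature and rescued are mine. It relies on git gc not having pruned the commit yet, which holds here because nothing has run it.

```shell
set -e

# Scratch repository in a temporary directory (hypothetical setup).
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "first commit"
git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
lost=$(git rev-parse feature)                  # remember the hash before deleting

git checkout -q master
git branch -q -D feature                       # removes the reference only

kind=$(git cat-file -t "$lost")                # ask git what object this hash names
echo "object $lost is still a $kind"

git branch rescued "$lost"                     # put a new "flag" on the same commit
```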
A useful comparison (if you know HTML and CSS) is having an HTML element with an id, which you can use in CSS to select the element and apply some styling. If you remove the id attribute from the HTML element, your element is still there, you didn’t delete it; you merely deleted a reference to it, which makes it harder - but not impossible - for you to find it.
Now that you know that even if you “delete” stuff, it’s actually still there in your repository, and given how powerful git is, it’s no surprise that it gives you all the tools you need to recover from accidents.
Undoing the mess
This finally brings us to how we would solve the situation described at the beginning of this post: you forced a push to your remote repository (hopefully not your production server!), overwriting the master branch there, and then you realised you screwed up and want to undo this.
When you push changes to a remote repository and git tells you it can’t do it, it’s usually because the chain of commits, or nodes in the graph that lead up to the commit the branch name currently points to, is different in the remote repository and your local repository.
In reality, git could perfectly well push the commits to the remote repository, as the problem is not with the actual commits; what it is telling you that it can’t/shouldn’t do is change the branch name pointer to point to your new latest commit because its chain diverged from the one in the remote repository. In other words, the two “paths” to the latest commits in the local and remote repositories are different. Or to put it another way, the commit the remote branch points to is no longer an ancestor of the commit your local branch points to.
This can happen if you do some commits locally, push them to the remote, then realise you want to do things differently, so you do something like git reset HEAD~1 to move the branch name pointer back one commit, then you make whatever changes you need to make, commit those changes and try to push to the remote again.
You can tell git to ignore this and still change the branch name reference in the remote repository to make it point to your new commit. That’s what the --force option in git push does.
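To see the refusal and the override side by side, here is a sketch of that exact sequence against a throwaway remote. The bare repository standing in for the remote, the directory names and the commit messages are assumptions made for the example.

```shell
set -e

# Sandbox: a bare repository plays the role of the remote (hypothetical setup).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$sandbox/remote.git"

git commit -q --allow-empty -m "first commit"
git commit -q --allow-empty -m "second commit"
git push -q origin master

# Change your mind: step back one commit and commit something else instead.
git reset -q HEAD~1
git commit -q --allow-empty -m "replacement commit"

# A plain push is refused because the remote tip is not an ancestor of ours.
if git push -q origin master 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# --force moves the remote reference anyway.
git push -q --force origin master
remote_tip=$(git --git-dir="$sandbox/remote.git" rev-parse master)
echo "remote master now points to $remote_tip"
```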
And again, this will give you the impression that the commits of the old branch have disappeared but as we’ve seen, they are still there, so we can get them back.
In order to do that you just need to find the right commit hash, because you no longer have a descriptive name to reference it. To do that you can use git reflog. More specifically, you’ll want to look at the reflog your local repository keeps for the remote branch, to see what you erased there: git reflog show remotes/origin/master
This will give you something like this list:
$ git reflog show remotes/origin/master
89b85b3 remotes/origin/master@{0}: fetch --append --prune origin: forced-update
32c0dbf remotes/origin/master@{1}: update by push
15c8e79 remotes/origin/master@{2}: update by push
b4ff13f remotes/origin/master@{3}: update by push
git reflog gives you a log of every commit a reference has pointed to, including commits that no branch points to any more but that Git still has in its storage. It’s like the history of the reference itself. Each line shows the first few characters of the hash the reference pointed to, followed by a short description of what moved it there (a commit, a push, a fetch and so on). If you try it, you’ll see that it shows more than just commits but let’s not worry about that for this exercise.
What you want is to search these entries to find the last commit of your old branch, the one you want to recover. In our example, since we haven’t done anything else after the forced push, it’s easy enough to find the commit we want: the first line is the forced update, so the one right after it is the hash referring to the state the remote repository was in right before you forced the push.
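Each reflog line also has a name you can use directly: remotes/origin/master@{1} means "where this reference pointed one update ago". The sketch below, which assumes a sandbox with a bare repository as the remote and made-up commit messages, prints the reflog and then resolves entry @{1} to a full hash. It only works this neatly because nothing else touched the remote branch after the forced push; in a busier repository you need to read the list and pick the right entry yourself.

```shell
set -e

# Sandbox: a bare repository plays the role of the remote (hypothetical setup).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$sandbox/remote.git"

# Two ordinary pushes, then a forced one that overwrites the second.
git commit -q --allow-empty -m "good commit 1"
git push -q origin master
git commit -q --allow-empty -m "good commit 2"
git push -q origin master
good=$(git rev-parse master)

git reset -q --hard HEAD~1
git commit -q --allow-empty -m "mistake"
mistake=$(git rev-parse master)
git push -q --force origin master

# The list the post talks about: newest entry first.
git --no-pager reflog show remotes/origin/master

current=$(git rev-parse 'origin/master@{0}')   # what the forced push left behind
previous=$(git rev-parse 'origin/master@{1}')  # the state right before it
echo "commit to recover: $previous"
```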
When you know which hash you want, copy it and do a git checkout <hash>. In our example you want the commit 32c0dbf, so you would do git checkout 32c0dbf.
You can verify that your files are how you want them and if everything is correct, you should now create a new reference to make it easy to find this commit. In other words, you want to create a new branch off from this commit, so just do a git checkout -b temp_branch.
As of now you have recovered your original branch, even though you gave it another name, and you can push it to your remote repository to replace the mess you made: git push --force origin temp_branch:master. There’s one detail: since the commit that master now points to on the origin remote is not an ancestor of your temp_branch, you have to force the push once again.
This will have fixed your remote master, but note that your local master is still the one you messed up, so you can either delete it with git branch -D master if you are sure you don’t need the changes in it, or you can rename it if you want to keep it around to copy stuff from it, or maybe cherry-pick a few of its commits: git branch -m master messed_up_master.
Finally, rename your temporary branch so that it replaces your master: git branch -m temp_branch master and your repository will be back to its former glory.
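The local cleanup can be rehearsed without any remote at all. This sketch assumes a scratch repository in which master carries a commit you no longer want and temp_branch sits on the good one; the empty commits and their messages are made up, and it takes the "rename and keep" route rather than deleting the old master.

```shell
set -e

# Scratch repository in a temporary directory (hypothetical setup).
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "good work"
good=$(git rev-parse HEAD)
git commit -q --allow-empty -m "mistake"       # local master now carries the mistake
bad=$(git rev-parse HEAD)

# Where the walkthrough leaves you: on temp_branch, at the recovered commit.
git checkout -q "$good"
git checkout -q -b temp_branch

# Keep the messed-up branch around under another name...
git branch -m master messed_up_master
# ...and let the temporary branch take over as master.
git branch -m temp_branch master

git --no-pager branch -v
```

Since temp_branch was the checked-out branch, the second rename also leaves you sitting on the new master.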
You’re welcome.