Git Reminders
After using Git for two years, I’ve finally finished reading these two books, twice: not just skimming, but taking excerpts and running experiments along the way. Both books have helped me a lot. These are my final notes; they form my Git knowledge base and combine excerpts from both books with experiments on some specific topics.
Book | Git Community Book |
Author | people in the Git community |
Link | alx.github.io/gitbook |
Book | Pro Git |
Author | Scott Chacon and Ben Straub |
Link | git-scm.com/book |
- Basics
- Configuration
- Porcelain
- Git Internals
Basics
Meta
Snapshots, Not Differences
Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
Git Generally Only Adds Data
It is very difficult to get the system to do anything that is not undoable or to make it erase data in any way.
Git Object Model
The SHA-1
checksum, object ID
- Serves as the object name.
- 40 hexadecimal digits long.
- Generated by applying the SHA-1 hash to the object content.
- Preserves identity: identical content always gets the same name.
The Objects
Every object consists of three things: type, size, content.
There are four different types of objects: blob, tree, commit, tag.
blob is a chunk of binary data, used to store file data.
The blob is entirely defined by its data, totally independent of its location.
tree is basically like a directory - it references a bunch of other trees and/or blobs.
Since trees and blobs, like all other objects, are named by the SHA1 hash of their contents, two trees have the same SHA1 name if and only if their contents (including, recursively, the contents of all subdirectories) are identical.
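The "same name if and only if same contents" property can be sketched by hashing a tree body by hand. This is a simplified model: the helper names are mine, entries are `(mode, object id)` pairs keyed by file name, and the sort rule (sub-trees compared as if their name ended in `/`) is my reading of how Git orders entries.

```python
import hashlib


def object_id(obj_type, body):
    header = "{} {}".format(obj_type, len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries):
    """entries maps a name to (mode, 40-hex object id)."""
    def sort_key(item):
        name, (mode, _) = item
        # Sub-trees (mode 40000) sort as if their name ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for name, (mode, hex_id) in sorted(entries.items(), key=sort_key):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += mode.encode("ascii") + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return object_id("tree", body)


blob = object_id("blob", b"test content\n")
one = tree_id({"a.txt": ("100644", blob), "b.txt": ("100644", blob)})
two = tree_id({"b.txt": ("100644", blob), "a.txt": ("100644", blob)})
print(one == two)
```

Because each entry embeds the hash of the blob or sub-tree it points to, a change anywhere below a directory changes that directory's name too.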
commit points to a single tree, marking it as what the project looked like at a certain point in time. It contains meta-information about that point in time, such as a timestamp, the author of the changes since the last commit, a pointer to the previous commit(s), etc.
tag is a way to mark a specific commit as special in some way. It is normally used to tag certain commits as specific releases or something along those lines.
A tag object contains an object name (called simply ‘object’), object type, tag name, the name of the person (“tagger”) who created the tag, and a message, which may contain a signature.
Staged
Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit.
Branching
A branch in Git is simply a lightweight movable pointer to one of these commits.
How does Git know what branch you’re currently on? It keeps a special pointer called HEAD.
Remote Branches
Remote branches act as bookmarks to remind you where the branches on your remote repositories were the last time you connected to them.
Tags
The tag object is very much like a commit object, but a tag object points to a commit rather than a tree. It’s like a branch reference, but it never moves — it always points to the same commit but gives it a friendlier name.
You can tag any Git object. For example, a maintainer can add a GPG public key as a blob object and then tag it.
Lightweight
A lightweight tag is very much like a branch that doesn’t change — it’s just a pointer to a specific commit.
Annotated
Annotated tags, however, are stored as full objects in the Git database. They’re checksummed; contain the tagger name, e-mail, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG).
Configuration
Ignoring Files
Glob patterns are like simplified regular expressions that shells use.
You can negate a pattern by starting it with an exclamation point (!).
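A rough way to picture negation is "the last matching pattern wins". The sketch below is a deliberately simplified model in Python: it only matches bare file names with `fnmatchcase`, and ignores directory rules, anchoring and `**`, so it is not a reimplementation of Git's matcher. The function name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(filename, patterns):
    """Apply patterns in order; a later match overrides an earlier one."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(filename, glob):
            # A "!" pattern re-includes a file that an earlier pattern ignored.
            ignored = not negated
    return ignored


rules = ["*.a", "!lib.a"]
for name in ["foo.a", "lib.a", "main.c"]:
    print(name, is_ignored(name, rules))
```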
Commit Template
I’ve defined some experimental rules based on rangzen’s recommendation on Stack Exchange. Here is my .gitmessage.
Here is another post on a specific use: Readable Git Log by Using Custom Commit Template.
Git Attributes
The path-specific settings are called Git attributes and are set either in a .gitattributes file in one of your directories (normally the root of your project) or in the .git/info/attributes file if you don’t want the attributes file committed with your project.
- Identifying Binary Files
- Diffing Binary Files (word, image EXIF)
- Filters (clean and smudge)
- Exporting
- export-ignore
- export-subst
- Merge Strategies
Porcelain
Branch
Inspecting a Remote
git remote show [remote-name]
Checkout and Track a Remote Branch
Two ways:
git checkout -b [branch] [remotename]/[branch]
git checkout --track [remotename]/[branch]
Tag
Share tags
By default, the git push command doesn’t transfer tags to remote servers. You will have to explicitly push tags to a shared server after you have created them.
git push origin [tagname]
If you have a lot of tags that you want to push up at once, you can also use the --tags option to the git push command.
git push origin --tags
Sign tags
- PGP, Pretty Good Privacy, the standard.
- GPG, GNU Privacy Guard, the implementation.
Distributing the Public PGP Key
Import the key into the Git database by exporting it and piping that through git hash-object, which writes a new blob with those contents into Git and gives you back the SHA-1 of the blob.
gpg -a --export <key-id> | git hash-object -w --stdin
Now that you have the contents of your key in Git, you can create a tag that points directly to it by specifying the new SHA-1 value that the hash-object command gave you:
git tag -a maintainer-pgp-pub <blob-sha-1>
If you run git push --tags, the maintainer-pgp-pub tag will be shared with everyone. If anyone wants to verify a tag, they can directly import your PGP key by pulling the blob directly out of the database and importing it into GPG:
git show maintainer-pgp-pub | gpg --import
They can use that key to verify all your signed tags. Also, if you include instructions in the tag message, running git show <tag> will let you give the end user more specific instructions about tag verification.
Generate a Build Number
Git gives you the name of the nearest tag with the number of commits on top of that tag and a partial SHA-1 value of the commit you’re describing.
The git describe command favors annotated tags.
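When I use the describe output as a build number, I usually want its three parts separately. Here is a small parsing sketch in Python. It assumes the `<tag>-<count>-g<sha>` shape described above; the sample string and the function name `parse_describe` are illustrative only.

```python
import re

# <nearest tag>-<commits on top of it>-g<abbreviated SHA-1>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split describe-style output into tag, commit count and short hash."""
    text = text.strip()
    match = DESCRIBE_RE.match(text)
    if match is None:
        # Assumption: a bare tag name means the commit is exactly the tagged one.
        return {"tag": text, "count": 0, "sha": None}
    return {"tag": match.group("tag"), "count": int(match.group("count")), "sha": match.group("sha")}


print(parse_describe("v1.6.2-rc1-20-g8c5b85c"))
```

The tag part is matched greedily, so tag names that themselves contain dashes (such as `v1.6.2-rc1`) stay intact.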
Prepare a Release
Create tar.gz with git archive, piping the output through gzip.
Create zip with git archive --format=zip.
Rebasing
An advanced example
Now git show-branch master server client shows the three branches before the rebase.
What does git rebase --onto master server client do?
- Checks out the client branch.
- Figures out the patches from the common ancestor of server and client (the commits listed by git log server..client).
- Replays the patches onto master.
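The middle step is the interesting one: `server..client` means "reachable from client but not from server". Below is a toy model of that set difference in Python. The commit names and the shape of the history are made up for illustration, and the dictionary of parents stands in for the real object database.

```python
def reachable(parents, tip):
    """Every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commit_range(parents, exclude, include):
    """Model of `exclude..include`: on include's history but not on exclude's."""
    return reachable(parents, include) - reachable(parents, exclude)


# Hypothetical history: client forks from server, which forks from master.
parents = {
    "C1": [], "C2": ["C1"],
    "C5": ["C2"], "C6": ["C5"],                  # master tip: C6
    "C3": ["C2"], "C4": ["C3"], "C10": ["C4"],   # server tip: C10
    "C8": ["C3"], "C9": ["C8"],                  # client tip: C9
}
print(sorted(commit_range(parents, "C10", "C9")))
```

Only the commits unique to client are selected, so the shared work that belongs to server is not replayed onto master.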
Run git show-branch master server client again to see the result.
Merge Base
Merge Stage
When merging, one parent will be HEAD, and the other will be the tip of the other branch, which is stored temporarily in MERGE_HEAD.
During the merge, the index holds three versions of each file. Each of these three “file stages” represents a different version of the file:
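- Stage 1: the file in the common ancestor of both branches.
- Stage 2: the version from HEAD.
- Stage 3: the version from MERGE_HEAD.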
Some special diff options allow diffing the working directory against any of these stages:
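- git diff --base (or -1) diffs against stage 1.
- git diff --ours (or -2) diffs against stage 2.
- git diff --theirs (or -3) diffs against stage 3.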
Ancestry References
- ^<n> selects the nth parent of the commit (relevant in merges).
- ~<n> selects the nth ancestor commit, always following the first parent.
Documented in git-rev-parse(1).
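To keep the two operators apart in my head, I wrote a toy resolver in Python. It is only a model: `parents` is a hand-made mapping from a commit to its ordered parent list, the commit names are invented, and `resolve` handles nothing beyond chains of `^` and `~`.

```python
import re


def resolve(parents, start, suffix):
    """Walk a suffix such as '~2' or '^2~' starting from the commit `start`."""
    commit = start
    for operator, digits in re.findall(r"([\^~])(\d*)", suffix):
        n = int(digits) if digits else 1  # a bare ^ or ~ means 1
        if operator == "^":
            # ^n picks the nth parent (1-based); ^0 stays on the commit itself.
            if n:
                commit = parents[commit][n - 1]
        else:
            # ~n follows the first parent n times.
            for _ in range(n):
                commit = parents[commit][0]
    return commit


# Hypothetical history: M merges A2 (first parent) and B1 (second parent).
parents = {"M": ["A2", "B1"], "A2": ["A1"], "A1": ["R"], "B1": ["R"], "R": []}
for suffix in ["^", "^2", "~2", "^^", "^2~"]:
    print("M" + suffix, "->", resolve(parents, "M", suffix))
```

Note that `^2` and `~2` differ: the first moves sideways to the merged-in branch, the second goes two generations back along the first parent.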
Commit Ranges
Log
Summarize or Get a Quick Changelog
Use git shortlog.
Stash
Reapply the Staged Changes
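Use git stash apply --index stash@{n}.
Suppose you have stashed some changes, part of which were staged. After checking out another branch and coming back, you want to apply the stashed changes. A plain git stash apply restores the changes to your files but does not restage what was staged.
So, how do you reapply the staged changes? Pass --index, which tells the command to try to reapply the staged changes as well.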
Create a Branch from Stash
Use git stash branch {branch_name}, which creates a new branch, checks out the commit you were on when you stashed your work, reapplies your work there, and then drops the stash if it applies successfully.
Create New Empty Branches
Use symbolic-ref. A symbolic ref is a regular file that stores a string that begins with ref: refs/. For example, your .git/HEAD is a regular file whose content is ref: refs/heads/master.
In the past, .git/HEAD was a symbolic link pointing at refs/heads/master. When we wanted to switch to another branch, we did ln -sf refs/heads/newbranch .git/HEAD, and when we wanted to find out which branch we are on, we did readlink .git/HEAD. But symbolic links are not entirely portable, so they are now deprecated and symbolic refs (as described above) are used by default.
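Since a symbolic ref is just text, telling it apart from a detached HEAD takes a few lines. This Python sketch works on the string content of the file; the function name `read_head` and the sample values are mine.

```python
def read_head(text):
    """Classify the content of a HEAD file as a symbolic ref or a raw commit."""
    text = text.strip()
    if text.startswith("ref: "):
        # Symbolic ref: HEAD names another ref, normally a branch.
        return ("symbolic", text[len("ref: "):])
    # Otherwise HEAD holds an object name directly (a detached head).
    return ("detached", text)


print(read_head("ref: refs/heads/master\n"))
print(read_head("d670460b4b4aae783e53a90e3f1a5c50b0ae0e5d\n"))
```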
Filter Branch
Removing a File from Every Commit
The --tree-filter option runs the specified command after each checkout of the project and then recommits the results.
Making a Subdirectory the New Root
Use the --subdirectory-filter option.
Changing E-Mail Addresses Globally
Use the --commit-filter option.
Real-World Example
Check this post: Git Filter Branch in Practice.
Blame
If you pass -C to git blame, Git analyzes the file you’re annotating and tries to figure out where snippets of code within it originally came from if they were copied from elsewhere.
Bisect
First you run git bisect start to get things going, and then you use git bisect bad to tell the system that the current commit you’re on is broken. Then, you must tell bisect when the last known good state was, using git bisect good [good commit].
Auto Check By Script
git bisect run test-error.sh
Doing so automatically runs test-error.sh on each checked-out commit until Git finds the first broken commit. You can also run something like make or make tests or whatever you have that runs automated tests for you.
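The idea behind bisecting is an ordinary binary search between a known good and a known bad commit. Here is a minimal sketch in Python, assuming a strictly linear history (real histories are graphs) and a hypothetical `is_bad` callback standing in for the test script.

```python
def first_bad_commit(commits, is_bad):
    """commits runs oldest to newest; commits[0] is known good, commits[-1] known bad."""
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        # Test the midpoint and keep the half that still contains the breakage.
        if is_bad(commits[middle]):
            bad = middle
        else:
            good = middle
    return commits[bad]


history = ["c{}".format(i) for i in range(10)]
tested = []


def is_bad(commit):
    tested.append(commit)
    return int(commit[1:]) >= 6  # pretend the bug arrived in c6


print(first_bad_commit(history, is_bad), "after testing", tested)
```

Each test halves the candidate range, so even a long history needs only a handful of checkouts.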
Submodules
Maintain a repo which contains a submodule
Add a submodule to your existing Git repo with git submodule add.
Because the submodule is now registered in .git/config and .gitmodules, you can update it by entering the submodule directory and running ordinary Git operations.
When you make changes and commit in that subdirectory, the superproject notices that the HEAD there has changed and records the exact commit you’re currently working off of. So, update your superproject from time to time with a pointer to the latest commit in that subproject.
Maintain a cloned repo which contains a submodule
Set up with two commands:
- git submodule init. Initialize your local configuration (.gitmodules to .git/config).
- git submodule update. Fetch all the data from that project and check out the appropriate commit listed in your superproject.
A Demo Workflow
Create the submodules:
Create the superproject and add all the submodules:
See what files git-submodule created:
The git-submodule add command does a couple of things:
- It clones the submodule under the current directory and by default checks out the master branch.
- It adds the submodule’s clone path to the .gitmodules file and adds this file to the index, ready to be committed.
- It adds the submodule’s current commit ID to the index, ready to be committed.
Commit the superproject:
Clone the superproject:
Check submodule status:
Register the submodule into .git/config by running git submodule init.
Clone the submodules and check out the commits specified in the superproject by running git submodule update.
One major difference between git-submodule update and git-submodule add is that git-submodule update checks out a specific commit, rather than the tip of a branch. It’s like checking out a tag: the head is detached, so you’re not working on a branch.
A detached head means that the HEAD file points directly to a commit, not to a symbolic reference.
Check out or create a new branch:
Do work and commit:
Cautions
Always publish the submodule change before publishing the change to the superproject that references it. If you forget to publish the submodule change, others won’t be able to clone the repository:
It’s not safe to run git submodule update if you’ve made and committed changes within a submodule without checking out a branch first. They will be silently overwritten:
Subtree Merging (A Submodule Substitution)
Use git-merge Subtree Strategy
Add a subtree
First, add the Rack project as a remote reference in your own project and then check it out into its own branch.
Now you have the root of the Rack project in your rack_branch branch and your own project in the master branch.
Use git read-tree to read the root tree of one branch into your current staging area and working directory. You just switched back to your master branch, and you pull the rack_branch into the rack subdirectory of your master branch of your main project.
When you commit, it looks like you have all the Rack files under that subdirectory, as though you copied them in from a tarball.
Update and merge subtree
If the Rack project updates, you can pull in upstream changes by switching to that branch and pulling:
Then, you can merge those changes back into your master branch. You can use git merge -s subtree and it will work fine; but Git will also merge the histories together, which you probably don’t want. To pull in the changes and prepopulate the commit message, use the --squash and --no-commit options as well as the -s subtree strategy option.
All the changes from your Rack project are merged in and ready to be committed locally. You can also do the opposite: make changes in the rack subdirectory of your master branch and then merge them into your rack_branch branch later to submit them to the maintainers or push them upstream.
Diff a subtree
To get a diff between what you have in your rack subdirectory and the code in your rack_branch branch, to see if you need to merge them, you can’t use the normal diff command. Instead, you must run git diff-tree with the branch you want to compare to.
Use git-subtree
- alternatives-to-git-submodule-git-subtree by Atlassian Blog
- Understanding Git Subtree by HPC @ Uni.lu
Git Internals
Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.
- Plumbing, verbs that do low-level work and were designed to be chained together UNIX style or called from scripts.
- Porcelain, the more user-friendly commands.
Plumbing Objects
Blob Object
Git is a content-addressable filesystem: at its core is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
Create
The -w option tells hash-object to store the object; otherwise, the command simply tells you what the key would be.
Check
View
Tree Objects
Basically, tree objects are used to specify snapshots.
A tree object solves the problem of storing the filename and also allows you to store a group of files together.
View
Create
Git normally creates a tree by taking the state of your staging area or index and writing a tree object from it.
- Use update-index to create an index. Pass --add, because the file doesn’t yet exist in your staging area; --cacheinfo, because the file you’re adding isn’t in your directory but is in your database; and the mode 100644, which means it’s a normal file. Other options are 100755, which means it’s an executable file; and 120000, which specifies a symbolic link.
- Use read-tree to read an existing tree into your staging area as a subtree by using the --prefix option.
Use the write-tree command to write the staging area out to a tree object. No -w option is needed: calling write-tree automatically creates a tree object from the state of the index if that tree doesn’t yet exist.
Commit Objects
You have three trees that specify the different snapshots of your project that you want to track, but the earlier problem remains: you must remember all three SHA-1 values in order to recall the snapshots. You also don’t have any information about who saved the snapshots, when they were saved, or why they were saved. This is the basic information that the commit object stores for you.
Create
Use commit-tree.
View
Object Storage
What does Git do when you run the git add and git commit commands?
- stores blobs for the files that have changed
- updates the index
- writes out trees
- writes commit objects that reference the top-level trees and the commits that came immediately before them.
Index
The index is a binary file (generally kept in .git/index) containing a sorted list of path names.
- The index contains all the information necessary to generate a single (uniquely determined) tree object.
- The index enables fast comparisons between the tree object it defines and the working tree.
- It can efficiently represent information about merge conflicts between different tree objects.
Packfile
- Loose objects are the simpler format: the compressed data (snapshots) of each object stored in a single file on disk.
- Packed objects. To save space, Git uses the packfile, a format in which Git saves only the part that has changed in the second file, with a pointer to the file it is similar to. Packing is triggered when you run the git gc command manually or push to a remote server.
When Git packs objects, it looks for files that are named and sized similarly, and stores just the deltas from one version of the file to the next.
What is also interesting is that the second version of the file is the one that is stored intact, whereas the original version is stored as a delta. This is because you’re most likely to need faster access to the most recent version of the file.
The Refspec
Recorded in .git/config.
The format of the refspec is an optional +, followed by <src>:<dst>.
- + tells Git to update the reference even if it isn’t a fast-forward.
- <src> is the pattern for references on the remote side.
- <dst> is where those references will be written locally.
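To check my reading of the format, here is a small Python sketch that maps a remote ref through a refspec. It assumes at most one `*` on each side and ignores everything else Git supports; `apply_refspec` is a hypothetical helper, not a Git function.

```python
def apply_refspec(refspec, remote_ref):
    """Return (local ref, force flag), or None when the refspec does not match."""
    force = refspec.startswith("+")
    src, dst = refspec.lstrip("+").split(":", 1)
    if "*" not in src:
        # Exact refspec: only the named ref is mapped.
        return (dst, force) if remote_ref == src else None
    prefix, suffix = src.split("*", 1)
    long_enough = len(remote_ref) >= len(prefix) + len(suffix)
    if not (long_enough and remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
        return None
    # Whatever the * matched on the remote side is substituted on the local side.
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return (dst.replace("*", matched, 1), force)


print(apply_refspec("+refs/heads/*:refs/remotes/origin/*", "refs/heads/master"))
print(apply_refspec("+refs/heads/master:refs/remotes/origin/master", "refs/heads/topic"))
```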
fetching
If you want Git instead to pull down only the master branch each time, and not every other branch on the remote server, you can change the fetch line to
fetch = +refs/heads/master:refs/remotes/origin/master
You can also specify multiple refspecs for fetching in your configuration file.
You can use namespacing to accomplish something like that. If you have a QA team that pushes a series of branches, and you want to get the master branch and any of the QA team’s branches but nothing else, you can use a config section like this:
If you want to do something one time, you can specify the refspec on the command line, too. Multiple refspecs are accepted.
Data Recovery
reflog
As you’re working, Git silently records what your HEAD is every time you change it. Each time you commit or change branches, the reflog is updated. The reflog is also updated by the git update-ref command, which is another reason to use it instead of just writing the SHA value to your ref files.
To see the same information in a much more useful way, we can run git log -g or git log --walk-reflogs, which will give you a normal log output for your reflog.
fsck
File System Consistency Check
If your loss is for some reason not in the reflog, you can use the git fsck utility, which checks your database for integrity. If you run it with the --full option, it shows you all objects that aren’t pointed to by another object.