Git Filter Branch in Practice
For various reasons, our team is migrating its codebase from GitHub Enterprise to GitLab. One of the more annoying chores is fixing the invalid author names and emails in our Git commits. Specifically, we need to find the commits whose author email does not end with umeng.com, rewrite the meta info of those commits according to a self-defined rule, and fix the commits whose author and committer info are inconsistent.
I’ve used git-filter-branch once before for a similar but simpler job: updating my own name and email, which took only a few lines with the env-filter option.
Things are a little more complicated this time. Our repo has several branches, a number of collaborators, and almost 18,000 commits. I must be careful and patient, and find a safe way through before reaching the ultimate, horrible “force update”.
Major Idea
Use git filter-branch --commit-filter to update each commit’s author info.
Pseudo-code of updating logic
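As a rough sketch of how such logic might be organized, the function below sorts a commit into one of three cases. The function name classify_commit and the three labels are hypothetical, and the rule "valid means the email ends with @umeng.com" is an assumption drawn from the goals stated above.

```shell
# Classify one commit by its author and committer emails.
#   $1 = author email, $2 = committer email
# Assumption: an email is valid when it ends with @umeng.com.
classify_commit() {
  case "$1" in
    *@umeng.com)
      if [ "$1" = "$2" ]; then
        echo keep    # author and committer already agree
      else
        echo sync    # valid author, committer needs to be aligned
      fi
      ;;
    *)
      echo remap     # invalid author email, look up the Umeng name
      ;;
  esac
}

classify_commit wendi@umeng.com wendi@umeng.com   # keep
classify_commit wendi@umeng.com noreply@xx.com    # sync
classify_commit wendy@xx.com wendy@xx.com         # remap
```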
Step by Step
1. Checkout a test branch
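This step amounts to creating a throwaway branch, so that the rewrite never touches a shared branch before the result has been checked. A minimal sketch; the function name make_test_branch and the default branch name filter-branch-test are placeholders, not names from the original migration.

```shell
# Create and switch to a scratch branch for the rewrite.
#   $1 = branch name (optional, hypothetical default below)
make_test_branch() {
  git checkout -b "${1:-filter-branch-test}"
}

# Usage, inside the repository:
#   make_test_branch
```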
2. Filter author emails
Use generate_stats.rb to
- Gather the author_name, author_email, and committer_email of every commit.
- Verify the result by running it again after the whole job is finished.
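generate_stats.rb is a Ruby script. A comparable summary can be sketched in shell on top of git log, as below. The function names are hypothetical, and the tab-separated layout is an assumption chosen for this example.

```shell
# Count distinct (author name, author email, committer email) rows.
# Reads tab-separated rows on stdin, prints the most frequent first.
summarize_identities() {
  sort | uniq -c | sort -rn
}

# Keep only the rows whose author email (field 2) is outside @umeng.com.
invalid_authors() {
  awk -F'\t' '$2 !~ /@umeng\.com$/'
}

# Usage, inside the repository:
#   git log --all --format='%an%x09%ae%x09%ce' | invalid_authors | summarize_identities
```

Running the same pipeline after the rewrite should print nothing if every author email has been fixed.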
3. Prepare a mapping file
For authors whose email domain is not umeng, write the mapping file under this rule:
- Fields are separated by whitespace (\s)
- The first field is the valid Umeng name
- The second field through the last are the names used in the invalid emails
Sample: change wendy@xx.com and ifyouseewendy@xx.com to wendi@umeng.com.
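Following the rule above, the sample would be the single line fed to the helper below. The helper expand_mapping is hypothetical and only spells out how one mapping line is read: each later field is an old name, and the first field is what it becomes.

```shell
# Expand each mapping line into "old name -> Umeng name" pairs.
# awk splits on runs of whitespace by default, which matches the \s rule.
expand_mapping() {
  awk 'NF >= 2 { for (i = 2; i <= NF; i++) print $i " -> " $1 }'
}

expand_mapping <<'EOF'
wendi wendy ifyouseewendy
EOF
# wendy -> wendi
# ifyouseewendy -> wendi
```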
4. Leverage mapping file
Write a Ruby script that maps names, for use in the final script.
update_name.rb reads a name to change and outputs the corresponding Umeng author name.
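update_name.rb itself is Ruby. The lookup it performs can be sketched in shell as follows. Falling back to the unchanged name when no line matches is an assumption of this sketch, not documented behavior of the original script.

```shell
# Print the Umeng name for an old name.
#   $1 = old name, $2 = path to the mapping file
# Assumption: an unmapped name is printed back unchanged.
update_name() {
  awk -v old="$1" '
    { for (i = 2; i <= NF; i++) if ($i == old) { print $1; found = 1; exit } }
    END { if (!found) print old }
  ' "$2"
}

# Usage:
#   update_name ifyouseewendy mapping.txt
```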
5. Git filter-branch bash script
Here is the final working script, git_filter_branch.sh. The bash email pattern matching part was tweaked based on glenn jackman’s answer on Stack Overflow.
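What follows is not git_filter_branch.sh, only a minimal sketch of how a commit filter of this kind can be wired together. It assumes the mapping file format from step 3, assumes that the new email is the Umeng name plus @umeng.com, and assumes that the committer should simply mirror the author. The function name rewrite_authors and the variable MAPPING_FILE are hypothetical. It uses a POSIX case pattern rather than the bash pattern matching mentioned above, since the filter may be run by a plain sh.

```shell
# Rewrite author and committer info on one branch.
#   $1 = absolute path to the mapping file
#   $2 = branch to rewrite (the test branch, not a shared one)
rewrite_authors() {
  # The filter runs from a temporary directory, so use an absolute path.
  export MAPPING_FILE="$1"
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --commit-filter '
    case "$GIT_AUTHOR_EMAIL" in
      *@umeng.com) ;;   # already a valid address
      *)
        old=${GIT_AUTHOR_EMAIL%%@*}   # name part of the invalid email
        # This echo is captured into $new, it never reaches the filter stdout.
        new=$(while read -r canonical others; do
                for alt in $others; do
                  if [ "$alt" = "$old" ]; then echo "$canonical"; break 2; fi
                done
              done < "$MAPPING_FILE")
        if [ -n "$new" ]; then
          GIT_AUTHOR_NAME=$new
          GIT_AUTHOR_EMAIL=$new@umeng.com
        fi
        ;;
    esac
    # Assumption: the committer mirrors the (possibly rewritten) author.
    GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME
    GIT_COMMITTER_EMAIL=$GIT_AUTHOR_EMAIL
    git commit-tree "$@"
  ' -- "$2"
}

# Usage, inside the repository:
#   rewrite_authors /absolute/path/to/mapping.txt filter-branch-test
```

filter-branch keeps the pre-rewrite refs under refs/original/, which gives a way back while the result is still being verified.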
Things to Watch Out For
When running git filter-branch --commit-filter <command>, the logic in <command> is the core of the job. Remember: DO NOT write echo in the command part, for debugging or any other reason, as echo will interrupt the filter-branch workflow.
Better to use a separate script when debugging. I used update_email.rb to develop the email pattern matching, then copied and pasted the result into the final git_filter_branch.sh.
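The likely reason echo is harmful is that filter-branch expects the commit filter to print the new commit id on standard output, so any extra line there gets mixed into that result. If a trace is really needed, sending it to standard error is one option. The toy function below only imitates a filter to show the difference; fake_commit_filter and the id 0123abc are made up for the illustration.

```shell
# Stand-in for a commit filter: stdout must carry only the new commit id.
fake_commit_filter() {
  echo "debug: rewriting author" >&2   # stderr, not part of the result
  echo "0123abc"                       # stands in for: git commit-tree "$@"
}

new_id=$(fake_commit_filter 2>/dev/null)
echo "$new_id"   # 0123abc
```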