For various reasons, our team is migrating our codebase from GitHub Enterprise to GitLab. One of the annoying chores is fixing the invalid author names and emails in our git commits. Specifically, we need to
find the commits whose author email does not end in
umeng.com, rewrite the meta info of those commits according to a self-defined rule, and fix the commits whose author and committer info are inconsistent.
I had used git-filter-branch once for a similar but simpler job: updating my own name and email, which took only a few lines with the
env-filter option.
Things are a little more complicated this time. Our repo has several branches, a number of collaborators, and almost 18,000 commits. I have to be careful and patient, and find a safe approach before reaching the ultimate, horrible “force update”.
This time I use git filter-branch --commit-filter to update each commit’s author info.
Pseudo-code of the updating logic
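A minimal Ruby sketch of this logic, built only from the rules described above. The `Identity` struct, the `rewrite_identities` helper, and the sample names are hypothetical. Two rules in it are assumptions made for illustration: the valid email is derived as `<Umeng name>@umeng.com`, and a committer with an invalid email takes over the author's identity.

```ruby
# Hypothetical sketch of the per-commit update logic.
VALID_DOMAIN = "umeng.com"

# A commit carries one Identity for the author and one for the committer.
Identity = Struct.new(:name, :email)

def valid_email?(email)
  email.to_s.downcase.end_with?("@#{VALID_DOMAIN}")
end

# name_map: name used with an invalid email => valid Umeng name.
def rewrite_identities(author, committer, name_map)
  new_author = author
  unless valid_email?(author.email)
    mapped = name_map[author.name]
    # Assumed rule: the valid email is "<Umeng name>@umeng.com".
    new_author = Identity.new(mapped, "#{mapped}@#{VALID_DOMAIN}") if mapped
  end
  # Assumed rule: a committer with an invalid email takes the author's identity.
  new_committer = valid_email?(committer.email) ? committer : new_author
  [new_author, new_committer]
end

# Illustrative data only.
author, committer = rewrite_identities(
  Identity.new("jdoe-home", "jdoe@example.org"),
  Identity.new("jdoe-home", "jdoe@example.org"),
  { "jdoe-home" => "john.doe" }
)
puts "#{author.name} <#{author.email}>"
puts "#{committer.name} <#{committer.email}>"
```

Commits whose author email is already valid pass through untouched, and so do authors missing from the map, which makes leftovers easy to spot in a later check.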
Step by Step
1. Check out a test branch
2. Filter author emails
Use generate_stats.rb to:
- Gather each commit's author_name, author_email, and committer_email.
- Run it again after finishing the whole job to verify the result.
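A rough sketch of what such a stats script could look like. It is not the original generate_stats.rb: the function names are hypothetical, and the tab-separated `git log` format string is an assumption chosen to make parsing easy.

```ruby
require "open3"

# Count commits per (author_name, author_email, committer_email) whose
# author email is outside the valid domain.
def invalid_author_stats(log_lines, domain = "umeng.com")
  stats = Hash.new(0)
  log_lines.each do |line|
    next if line.strip.empty?
    author_name, author_email, committer_email = line.chomp.split("\t", 3)
    next if author_email.to_s.downcase.end_with?("@#{domain}")
    stats[[author_name, author_email, committer_email]] += 1
  end
  stats
end

# Read one tab-separated line per commit across all branches.
# %an = author name, %ae = author email, %ce = committer email, %x09 = tab.
def git_log_lines
  out, status = Open3.capture2("git", "log", "--all", "--format=%an%x09%ae%x09%ce")
  raise "git log failed" unless status.success?
  out.lines
end
```

Inside a repository, `invalid_author_stats(git_log_lines)` returns the counts. Running it again after the rewrite should give an empty hash if nothing was missed.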
3. Prepare a mapping file
For authors whose email domain is not
umeng, write a mapping file under these rules:
- Separated by
- The first field is the valid Umeng name
- The second through last fields are the names used with the invalid emails
4. Leverage mapping file
Write a Ruby script that maps names; the final script calls it.
update_name.rb reads a name to change and outputs the corresponding Umeng author name.
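A small sketch of how the mapping file could be loaded and queried. It is an illustration, not the original update_name.rb: `load_name_map` and `umeng_name` are hypothetical helpers, and the comma separator is an assumption, so adjust `SEPARATOR` to whatever the real file uses.

```ruby
# Assumption: fields are comma-separated; change this to match the real file.
SEPARATOR = ","

# Build a lookup table: name used with an invalid email => valid Umeng name.
# Each line holds the valid name first, followed by one or more old names.
def load_name_map(text)
  map = {}
  text.each_line do |line|
    valid, *old_names = line.strip.split(SEPARATOR).map(&:strip)
    next if valid.nil? || valid.empty?
    old_names.each { |old| map[old] = valid }
  end
  map
end

# Return the Umeng name for a given name, or the name itself if unmapped.
def umeng_name(map, name)
  map.fetch(name, name)
end

# Illustrative mapping content only.
sample = "john.doe,jdoe,John D\nwei.li,liwei\n"
puts umeng_name(load_name_map(sample), "John D")
```

Inverting the file into an old-name-to-valid-name hash keeps each lookup constant-time, which matters when the script is called once per commit over roughly 18,000 commits.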
5. Git filter-branch bash script
Things to Be Careful About
In git filter-branch --commit-filter <command>, the logic in
<command> was the core of the job. Remember, DO NOT write
echo in the command part, for debugging or anything else: the commit filter is expected to print the new commit ID on stdout, so stray
echo output breaks the filter-branch workflow.
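The same caution applies to any helper whose output the filter consumes. The sketch below assumes a Ruby helper (such as a name-mapping script) whose stdout is captured by the filter. `print_resolved_name` is a hypothetical function that keeps stdout clean and sends debug output to stderr instead.

```ruby
# Print only the resolved name on stdout; diagnostics go to stderr so they
# never mix with data that the calling filter reads.
def print_resolved_name(name, name_map, out: $stdout, err: $stderr)
  resolved = name_map.fetch(name, name)
  err.puts "[debug] #{name.inspect} -> #{resolved.inspect}"
  out.puts resolved
end

# Illustrative data only.
print_resolved_name("jdoe", { "jdoe" => "john.doe" })
```

When debugging is really needed inside the filter command itself, redirecting to stderr or to a log file follows the same idea.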