JS monorepos in prod 5: merging Git repositories and preserving commit history
May 21, 2021
- Categories
- DevOps & SRE
- Node.js
- Tags
- Bash
- DevOps
- Packaging
- Git
- GitHub
- GitOps
- JavaScript
- Monorepo
At Adaltas, we maintain several open-source Node.js projects organized as Git monorepos and published on NPM. We share our experience of working with Lerna monorepos in a series of articles:
- Part 1: project initialization
- Part 2: versioning and publishing strategies
- Part 3: commit enforcement and changelog generation
- Part 4: unit testing with Mocha and Should.js
- Part 5: merging Git repositories and preserving commit history
- Part 6: CI/CD, continuous integration and deployment with Travis CI
- Part 7: CI/CD, continuous integration and deployment with GitHub Actions
Now it is the turn of our popular open-source Node CSV project to be migrated to a monorepo. This article walks you through the available approaches, techniques, and tools for migrating multiple Node.js projects hosted on GitHub into a Lerna monorepo. At the end, we provide the bash script we used to migrate the Node CSV project. This script can be applied to a different project with only minor modifications.
Requirements for migration
The Node CSV project combines 4 NPM packages for working with CSV files in Node.js, wrapped by the umbrella csv package. Each NPM package has its own rich commit history, and we wanted to preserve as much information as possible from the old repositories. These are our requirements for the migration:
- preserve commit history with maximum information (such as tags, their messages, and merge commits)
- improve commit messages to follow the Conventional Commits specification
- preserve GitHub issues
Monorepo structure
We have 5 NPM packages to migrate to the Lerna monorepo: csv, csv-generate, csv-parse, csv-stringify, and stream-transform.
We want to achieve a directory structure that looks like this:
packages/
  csv/
  csv-generate/
  csv-parse/
  csv-stringify/
  stream-transform/
lerna.json
package.json
Choosing Git log strategy
When migrating repositories into a monorepo, you merge their commit logs. The image below presents 3 suggested strategies.
- Single branch
  It provides a straightforward log containing only commits on the default (master) branches of all packages. Different logs are joined sequentially by adding the latest commit of the previous package as a parent commit to the first commit of the next package. This strategy breaks the sorting of the log by the date of commits.
- Multiple branches with a common parent
  This improves the visual perception of the log by splitting branches of different repositories. A new parent commit is added to all the first commits of the branches. In the end, all the branches are merged into the default branch.
- Multiple branches with different parents
  This strategy doesn’t rewrite the first commits of old repositories. It requires minimal intervention into commit history and seems logically more correct because initially, the repositories didn’t have a common parent.
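One practical way to tell these strategies apart in an existing log is to count its root commits, meaning commits without a parent. The sketch below assumes a hypothetical helper name, count_root_commits, and relies only on standard git rev-list options. A single root is what strategies 1 and 2 produce, while strategy 3 leaves one root per merged repository.

```bash
#!/bin/bash
# Hypothetical helper: count the root commits (commits without a parent)
# reachable from a ref. The ref defaults to HEAD.
count_root_commits() {
  git rev-list --count --max-parents=0 "${1:-HEAD}"
}

# Usage, from inside a repository:
#   count_root_commits          # roots reachable from HEAD
#   count_root_commits master   # roots reachable from a given branch
```

Running git log --oneline --graph next to this count gives a visual confirmation of how the branches were joined.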
Merging commit logs
Lerna has a built-in mechanism for gathering existing standalone NPM packages into a monorepo while preserving commit history. The lerna import command imports a package from an external repository into packages/. The sequence of commands is pretty simple: you initialize the Git and Lerna repositories, make the first commit, and then import packages from locally cloned Git repositories. Basic usage instructions are available in the Lerna documentation.
Using lerna import, you can only follow the 1st or the 2nd Git log strategy described above. For the 2nd one, you need to create a separate branch for each imported repository, like this:
# Import 1st package
git checkout -b package-1
lerna import /path/to/package-1
# Switch back to the default branch
git checkout master
# Import 2nd package
git checkout -b package-2
lerna import /path/to/package-2
# Then merge branches into the default branch...
lerna import provides an easy-to-use way to migrate repositories to a Lerna monorepo. However, it flattens the commit history, losing merge commits, and it doesn’t migrate tags and their messages. These limitations conflict with our requirement to preserve as much information as possible from the existing repositories, so we had to use a different tool.
The native git merge command can merge unrelated histories with the --allow-unrelated-histories option. It preserves the full commit history of the targeted branch, along with its tags. In this case, you obtain the 3rd Git log strategy.
Merging the commit history of an external repository into the current one using --allow-unrelated-histories is as simple as running 2 commands:
# Add an external repository as a remote
git remote add -f <external-repo-name> <external-repo-path>
# Merge commit history of a required branch
git merge --allow-unrelated-histories <external-repo-name>/<branch-name>
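To see these two commands at work without touching a real project, the following sketch builds two throwaway repositories in a temporary directory and merges one into the other. The names pkg-a and pkg-b and the demo identity are placeholders chosen for illustration. The branch name is read from the source repository instead of being hard-coded, since the default branch name depends on the local Git configuration.

```bash
#!/bin/bash
set -e

# Identity for the throwaway commits (placeholder values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)

# Create two unrelated repositories, each with a single commit
for name in pkg-a pkg-b; do
  git init -q "$WORK/$name"
  echo "$name" > "$WORK/$name/$name.txt"
  git -C "$WORK/$name" add .
  git -C "$WORK/$name" commit -q -m "initial commit of $name"
done

# Name of the branch to merge, as created by "git init" in pkg-b
branch=$(git -C "$WORK/pkg-b" symbolic-ref --short HEAD)

# Add pkg-b as a remote of pkg-a, fetch it, and merge its history
cd "$WORK/pkg-a"
git remote add -f pkg-b "$WORK/pkg-b"
git merge --allow-unrelated-histories -m "merge pkg-b" "pkg-b/$branch"

# Show the combined history with its two independent roots
git --no-pager log --oneline --graph
```

The resulting log contains the commits of both repositories plus one merge commit, and the files of both packages sit side by side in the working directory.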
Rewriting commit messages
To put more order and transparency into the combined commit log, we prefix all commit messages with their package names. Additionally, we make them compatible with the Conventional Commits specification which we follow in our latest projects. This specification standardizes the commit messages making them more readable and easy to automate.
To implement this, we rewrite all commit messages by prefixing them with a string like chore(<package-name>): followed by the original message.
We chose the chore type simply to stay compatible with the specification, without writing complex regular expressions to fully support it.
There are 2 tools to rewrite commit messages:
- git filter-branch
  A native Git CLI command. It is officially discouraged because “it has a glut of gotchas generating mangled history rewrites”.
- git filter-repo
  A versatile third-party tool for rewriting Git history, officially recommended by Git.
Following the Git recommendation, we chose git filter-repo. After installing the tool by following its installation instructions, the command to rewrite the commit messages of the current repository is:
git filter-repo --message-callback 'return b"chore(<package-name>): " + message'
For more examples of rewriting repository history with git filter-repo, refer to its documentation.
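After a rewrite, it can be reassuring to confirm that every commit subject really starts with the expected prefix. The sketch below is one possible check and not part of git filter-repo itself. The helper name count_unprefixed is hypothetical, and it only relies on git log with a custom format.

```bash
#!/bin/bash
# Hypothetical helper: count the commits whose subject does not start
# with "chore(<package-name>): ".
# Usage: count_unprefixed <package-name> [<ref>]
count_unprefixed() {
  local prefix="chore($1): " subject count=0
  while IFS= read -r subject; do
    case $subject in
      "$prefix"*) ;;                 # subject starts with the prefix
      *) count=$((count + 1)) ;;     # anything else is counted
    esac
  done < <(git log --format=%s "${2:-HEAD}")
  echo "$count"
}

# Usage, from inside the rewritten repository:
#   count_unprefixed csv-parse
```

A result of 0 means every commit reachable from the given ref carries the prefix.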
Transferring GitHub issues
After migrating the repositories and publishing the new monorepo to GitHub, we want to transfer the existing GitHub issues from the old repositories. Issues can be transferred from one repository to another using the GitHub interface. GitHub’s guide on transferring issues explains how.
Unfortunately, at the time of this writing, there is no way to transfer issues in bulk. Issues must be transferred one by one. But this can give you an excuse to “forget” to transfer annoying pending issues created by the project community ;)
What about GitHub pull requests? They will be lost, and we have to live with it. The good news is that links to issues and pull requests written in comments keep working thanks to redirection.
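The one-by-one transfer can still be scripted. The sketch below assumes the GitHub CLI (gh) is installed and authenticated, and that its gh issue transfer and gh issue list commands are available in your version. The helper name transfer_issues and the DRY_RUN variable are hypothetical. By default it only prints the commands it would run, so the list can be reviewed first.

```bash
#!/bin/bash
# Hypothetical helper: transfer_issues <source-repo> <target-repo>
# Reads issue numbers on stdin and transfers them one at a time.
# With DRY_RUN=1 (the default) it only prints the gh commands.
transfer_issues() {
  local source=$1 target=$2 number
  while read -r number; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "gh issue transfer $number $target --repo $source"
    else
      gh issue transfer "$number" "$target" --repo "$source"
    fi
  done
}

# Usage (requires network access and write permission on both repositories):
#   gh issue list --repo adaltas/node-csv-parse --state open --limit 500 \
#     --json number --jq '.[].number' \
#     | DRY_RUN=0 transfer_issues adaltas/node-csv-parse adaltas/node-csv
```

Issues are still moved individually, so this only automates the repetition.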
Migration script
The migration bash script applies the approaches and tools chosen above. It generates the ./node-csv directory containing the Node CSV project files reorganized as a Lerna monorepo.
#!/bin/bash
# Bash is required: the script relies on arrays, which plain sh does not provide
set -e
# 1. Configure
REPOS=(
  https://github.com/adaltas/node-csv
  https://github.com/adaltas/node-csv-generate
  https://github.com/adaltas/node-csv-parse
  https://github.com/adaltas/node-csv-stringify
  https://github.com/adaltas/node-stream-transform
)
OUTPUT_DIR=node-csv
PACKAGES_DIR=packages
# Fall back to /tmp when TMPDIR is not defined by the system
TMPDIR=${TMPDIR:-/tmp}
# 2. Initialize a new repository
rm -rf "$OUTPUT_DIR" && mkdir "$OUTPUT_DIR" && cd "$OUTPUT_DIR"
git init .
git remote add origin "${REPOS[0]}"
# 3. Migrate repositories
for repo in "${REPOS[@]}"; do
  # 3.1. Get the package name
  parts=(${repo//// })
  package=${parts[${#parts[@]}-1]/node-/}
  # 3.2. Rewrite commit messages via a temporary repository
  rm -rf "$TMPDIR/$package" && mkdir "$TMPDIR/$package" && git clone "$repo" "$TMPDIR/$package"
  git filter-repo \
    --source "$TMPDIR/$package" \
    --target "$TMPDIR/$package" \
    --message-callback "return b'chore(${package}): ' + message"
  # 3.3. Merge the repository into monorepo
  git remote add -f "$package" "$TMPDIR/$package"
  git merge --allow-unrelated-histories "$package/master" -m "chore(${package}): merge branch 'master' of ${repo}"
  # 3.4. Move repository files to the packages folder
  mkdir -p "$PACKAGES_DIR/$package"
  # Move every top-level entry except .git and the packages folder itself
  find . -mindepth 1 -maxdepth 1 ! -name .git ! -name "$PACKAGES_DIR" \
    -exec mv {} "$PACKAGES_DIR/$package" \;
  git add .
  git commit -m "chore(${package}): move all package files to ${PACKAGES_DIR}/${package}"
  # 3.5. Create a new branch, eg "init/my_package"
  git branch "init/$package" "$package/master"
done
# 4. Cleanup and remove outdated files
# -f avoids aborting the script when a package lacks one of these files
rm -f "$PACKAGES_DIR"/*/CONTRIBUTING.md
rm -f "$PACKAGES_DIR"/*/CODE_OF_CONDUCT.md
rm -rf "$PACKAGES_DIR"/*/.github
git add .
git commit -m "chore: remove outdated packages files"
To run this script, create a file, for example named migrate.sh, paste the script’s content into it, then make it executable and run it with the commands:
chmod u+x ./migrate.sh
./migrate.sh
Note! Don’t forget to install git-filter-repo before running the script.
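A small guard can report a missing tool before any repository is touched. This is a sketch, not part of the original script. The helper name require_command is hypothetical, and it assumes git-filter-repo is installed as an executable of that name on the PATH, which is the usual layout but may differ on your system.

```bash
#!/bin/bash
# Hypothetical helper: succeed when a command is available on the PATH,
# otherwise print a message on stderr and return a non-zero status.
require_command() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "Missing required command: $1" >&2
    return 1
  }
}

# Report the status of each tool the migration script depends on
for tool in git git-filter-repo; do
  if require_command "$tool"; then
    echo "found: $tool"
  fi
done
```

Placed at the top of migrate.sh as require_command git-filter-repo, with set -e active, it stops the script with a readable message.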
Notes for each step of the script:
1. Configure
   Configuration variables define the list of repositories to migrate, the destination directory of the new Lerna monorepo, and the folder for packages inside it. You can modify these variables to reuse this script for your project.
2. Initialize a new repository
   We initialize a new repository. The first repository is also registered as the remote origin repository.
3. Migrate repositories
   3.1. Get the package name
   It extracts the package name from the repository URL. In our case, the repositories are prefixed with node-, which we don’t want to keep.
   3.2. Rewrite commit messages via a temporary repository
   To prefix the commits of each package using the pattern chore(<package-name>): we need to rewrite each repository separately. This is done in a copy of the repository cloned locally to a temporary folder.
   3.3. Merge the repository into monorepo
   First, we add the locally cloned repository as a remote of the monorepo. Then, we merge its commit history, specifying a merge commit message.
   3.4. Move repository files to the packages folder
   After merging, the files of the merged repository appear under the monorepo root directory. Following the structure we want to achieve, we move those files to the packages directory and commit the change.
   3.5. Create a new branch
   The commit history is now associated with our monorepo through a remote repository. The history will be lost if the original repository is erased. To store the history in the monorepo, we create a branch that tracks the remote repository and prefix its name with init/.
4. Cleanup and remove outdated files
   For the sake of illustration, we remove some package files made obsolete by the migration. Some of those files should be moved to the repository root directory.
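The package-name extraction of step 3.1 can be tried in isolation. The sketch below uses Bash parameter expansion, a slightly different technique from the array split in the script. The function name package_name is hypothetical. It assumes URLs without a trailing slash or .git suffix, like the ones in the configuration.

```bash
#!/bin/bash
# Hypothetical helper: derive a package name from a repository URL.
# Keep the last path segment, then drop a leading "node-" if present.
package_name() {
  local name=${1##*/}
  echo "${name#node-}"
}

for repo in \
  https://github.com/adaltas/node-csv \
  https://github.com/adaltas/node-stream-transform; do
  echo "$repo -> $(package_name "$repo")"
done
```

Unlike the substitution used in the script, which removes the first occurrence of node- anywhere in the name, this version only strips it at the start. The result is the same for the repositories listed here.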
Further steps
The Git repository is now ready and, as such, qualifies as a monorepo. To make it usable, additional files must be created, such as a root package.json file, the lerna.json configuration file if using Lerna, and a README file. Refer to the first article of our series to apply the necessary changes and initialize your monorepo with Lerna.
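As a starting point, the two configuration files can be created from the shell. This is a minimal sketch under stated assumptions and not the configuration used by the Node CSV project: the package name node-csv-monorepo is a placeholder, the independent versioning mode is only one of the options Lerna supports, and MONOREPO_DIR is a hypothetical variable that defaults to a temporary directory so the sketch can be run safely.

```bash
#!/bin/bash
set -e

# Hypothetical location; in practice, the migrated ./node-csv directory
MONOREPO_DIR=${MONOREPO_DIR:-$(mktemp -d)}
cd "$MONOREPO_DIR"

# Root package.json: marked private so the root itself is never published
cat > package.json <<'EOF'
{
  "name": "node-csv-monorepo",
  "private": true
}
EOF

# lerna.json: where the packages live and how they are versioned
cat > lerna.json <<'EOF'
{
  "packages": ["packages/*"],
  "version": "independent"
}
EOF

echo "Created package.json and lerna.json in $MONOREPO_DIR"
```

Lerna itself still needs to be installed as a development dependency, and the versioning mode should follow the strategy chosen for the project.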
Conclusion
Migrating existing open-source projects requires you to be tidy and meticulous, because a small mistake can disrupt the work of your users. All the steps must be carefully analyzed and well tested. In this article, we covered the work required to migrate multiple Node.js projects to a Lerna monorepo. We considered different approaches, techniques, and available tools to automate the migration, using our Node CSV open-source project as an example.