A Maven-Git-Monorepo

21 September 2017 • development & monorepo • 10 min read

I’m a software engineer and want to tell you how my team at LOGIBALL moved from one Svn repository and multiple Git repositories into a single Git-based monorepo that follows the expanding/contracting style that Google has. The build tool, before and after, is Maven.

First, a big thanks to Paul Hammant for his article Maven In A Google Style Monorepo and for his help at various points in this migration.

Before the monorepo

We used Subversion (Svn henceforth) for ten years for multiple products and projects. Another team had been using Git for a year at that point. Because of that team’s positive experience, Git was selected as our next company-wide VCS.

My team develops a bunch of ‘geo’ related services (mapping, geocoding, routing) and, for the last few months, some Vaadin-based clients. Vaadin is a framework for creating rich Internet applications. The clients we build depend heavily on customer requests, which vary widely in how and what to show on maps or in routing results. Each of our Vaadin clients was based on a number of shared libraries that we had made. Each client and each library had its own Git repository, in the classic style for Git organizations.

The geo services developed in Svn used the classic trunk, branches, tags (TBT) layout. All services and libraries were within the same trunk, each in a separate directory. In both cases, clients and services, libraries were shared at a binary level and for released libraries only, which were distributed via an on-prem Archiva installation.

Research and alternates

Before recommending a monorepo layout I thought about the alternative repo design popular in the Microservices¹ scene: one deployable unit (library or service) in each repository. But my team didn’t like the idea of handling a bunch of small repositories. We tried it anyway in the client development. The result: we still didn’t like it.

In order to make the migration smooth, I wrote a detailed proposal for my team, with the aim of getting everyone’s agreement before pushing ahead with it. This proposal described the pros and cons of moving to a monorepo or to a multiple-repository setup. During the discussion we also thought about two repositories instead of a monorepo: one for the service development and one for the client development. But in that case we would still have to share binaries between them, because some libraries are used by both the services and the clients. Then we would have the cons of both worlds.

When we decided to go with the monorepo layout, I created a second proposal. This one was about the directory layout within the monorepo. This was an important topic because of the warning about the risk of a chaotic directory layout (see inline callout with that title). I tried to create a generic layout that makes it easy to add new components, including ones in different languages. In my opinion the resulting paths must also be readable and help people navigate through the repository. That proposal was also accepted.

My experience is that a collective decision about a change like this is very important. Without that participation in the decision, the risk of the team rejecting the change would have been very large.

Goals

The following points are our main goals:

Get rid of repetitive Maven configuration
Delete old Git repositories for the client development
Get rid of Svn and switch to Git
Share code at source level and get rid of Maven Snapshots
Implement Trunk-Based Development
Atomic commits

Constraints

Every service and client must be independently releasable
It must be possible to safely work on a subset of the monorepo (git sparse checkout) at any moment in time
Every service/client should be able to be built independently (from scratch) with all of its dependencies without depending on a binary repository
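For the last constraint, stock Maven can already build one module together with the reactor modules it depends on. The following is a minimal sketch of such a call, not our actual tooling. It assumes it is run from the repository root with all needed modules checked out; the function name is made up for illustration, and the MVN variable exists only so the command can be dry-run with MVN=echo.

```shell
# Build a single component plus every reactor module it depends on.
#   --projects   selects the module to build
#   --also-make  adds the modules that the selected module depends on
# Any further arguments are passed through to Maven unchanged.
build_component() {
  module=$1
  shift
  "${MVN:-mvn}" --projects "$module" --also-make "$@" package
}

# Example (hypothetical module path):
#   build_component component/webapp/webapp-one
```

Because the dependencies are built in the same run, no binary repository is needed for the modules that live inside the monorepo.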

For the migration itself there were no hard constraints. We did not have to be “release ready” during the migration, and it was OK if no repository was accessible for one or two days. The only real constraint was that I had to carry out the migration myself, which meant I did a modest amount of overtime.

Branching Model

We use the Short-Lived Feature Branches variant of Trunk-Based Development in Bitbucket Server. Our workflow is this:

Create a branch for the user story/feature, with the intention of the branch living for a day
One developer (or a pair) works on the feature
Merge from master just before creating the pull request
Create a pull request for review
After the reviewers accept and the CI build succeeds, the branch is auto-merged into master
If the merge is successful, the branch is deleted automatically
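On the command line, the developer's part of this workflow could look roughly like the sketch below. The function names, the feature/ prefix, the story id and the remote name origin are assumptions for illustration; the pull request, the auto-merge and the branch deletion happen in Bitbucket Server and are not shown. The GIT variable exists only so the commands can be dry-run with GIT=echo.

```shell
# Start a short-lived feature branch from an up-to-date master.
start_feature() {
  story=$1
  "${GIT:-git}" checkout master &&
  "${GIT:-git}" pull --ff-only origin master &&
  "${GIT:-git}" checkout -b "feature/$story"
}

# Just before creating the pull request: merge the current master
# into the feature branch, then publish the branch for review.
prepare_pull_request() {
  story=$1
  "${GIT:-git}" fetch origin &&
  "${GIT:-git}" merge origin/master &&
  "${GIT:-git}" push --set-upstream origin "feature/$story"
}

# Example (hypothetical story id):
#   start_feature GEO-42
#   ...commit work...
#   prepare_pull_request GEO-42
```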

Unexpected Challenges

The biggest challenge was to determine how to release every component separately from a Maven-based monorepo. Every component has its own version number and release time. Because of this problem I contacted Paul and asked for help, and he helped me work it out. The solution is to create a release branch which contains only one service or webapp and the libraries it needs. The branch is created just in time from master; the majority of developers are then locked out of it, and bug fixes are only allowed in via a cherry-pick mechanism.
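The Git side of that idea can be sketched as follows. This is not the mr/release script: the release/ branch naming and the function names are assumptions, and the step that trims the branch down to one component and its libraries is deliberately left out. The GIT variable exists only so the commands can be dry-run with GIT=echo.

```shell
# Create a release branch for one component from the current master.
create_release_branch() {
  component=$1
  "${GIT:-git}" fetch origin &&
  "${GIT:-git}" checkout -b "release/$component" origin/master &&
  "${GIT:-git}" push --set-upstream origin "release/$component"
}

# Bring a bug fix that already exists on master into the release branch.
# -x records the original commit id in the new commit message.
backport_fix() {
  component=$1
  commit=$2
  "${GIT:-git}" checkout "release/$component" &&
  "${GIT:-git}" cherry-pick -x "$commit"
}
```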

There were also some smaller challenges. They all popped up after the migration:

The team works mainly on Windows, so we had to get the mr/checkout.sh tooling working with Git for Windows.
After moving the webapps into the monorepo, the build times exploded (from 2 to 25 minutes). There were two reasons:
1. I’m not that deeply involved in the client development, so I didn’t know that my teammates had created a webapp-playground for every client library, and these were also built in CI. Moving them into a non-default Maven profile keeps them out of the CI build.
2. In the past we had a CI job for every component, so build times were always short (< 2 minutes), whereas now everything is built in one job. With the Maven parameter --threads I got the build time back down. After we resolved these two issues we were at ~5 minutes per build.
After we skipped the installation of artifacts into the local Maven repository, the goal jetty:run no longer worked. Because of that we currently don’t skip the installation, but it’s an open issue that we will work to resolve.
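The two build-time fixes can be sketched as Maven calls. The profile name playground, the function names and the CI_THREADS variable are assumptions for illustration; only the flags themselves are standard Maven options. The MVN variable exists only so the commands can be dry-run with MVN=echo.

```shell
# CI build: runs in parallel and leaves the playground profile inactive,
# so the playground webapps are not built.
# "1C" means one build thread per CPU core.
ci_build() {
  "${MVN:-mvn}" --batch-mode --threads "${CI_THREADS:-1C}" "$@" install
}

# Developer build that also builds the playground webapps by
# activating the (hypothetically named) non-default profile.
dev_build_with_playgrounds() {
  "${MVN:-mvn}" --activate-profiles playground "$@" install
}
```

The sketch uses install rather than package because the installation into the local repository is currently not skipped.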

The Migration Itself

The first step of the migration was to implement the defined directory layout. We did this in Svn a few weeks before the migration to Git. Moving from Svn to Git was not that hard thanks to the git svn command. The hardest part was creating the authors.txt file, which maps the Svn users of all developers to Git users, for this call:

git svn clone --stdlayout --authors-file=authors.txt http://svn.example.com/repository monorepo
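A starting point for authors.txt can be generated from the Svn log. The sketch below is one possible way to do that, not necessarily how our file was created; the function name and the e-mail domain are placeholders, and the real names and addresses still have to be filled in by hand.

```shell
# Read `svn log --quiet` output on stdin and print one authors-file line
# ("svnuser = Name <email>") per distinct Svn user.
# Log lines look like: r1234 | jdoe | 2017-07-17 10:00:00 +0200 (...)
authors_skeleton() {
  domain=${1:-example.com}
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
    sort -u |
    while IFS= read -r user; do
      printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
    done
}

# Example:
#   svn log --quiet http://svn.example.com/repository | authors_skeleton > authors.txt
```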

The clone took about three hours in our case. The duration depends heavily on the repository size and the number of commits, branches and tags. After git svn clone finished, some cleanup was needed:

$ cp -Rf .git/refs/remotes/origin/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/origin/tags
$ cp -Rf .git/refs/remotes/origin/* .git/refs/heads/
$ rm -Rf .git/refs/remotes/origin
$ git branch -d trunk
# Delete all tags which do not end with a valid version like '-1.12.3'
$ git tag -d $(git tag | grep -v '\-[0-9]*\.[0-9]*\.[0-9]*$')
# Set remote origin
$ git remote add origin git@my-git-server:myrepository.git
$ git push origin --all
$ git push origin --tags
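Before deleting anything, it can help to preview which tags the version filter in the cleanup would remove. The sketch below is a slightly stricter variant of that filter (it requires at least one digit in each version part) and works on any list of tag names; the function name and the tag names in the example are made up.

```shell
# Print the tag names (read from stdin) that do NOT end in a version
# such as '-1.12.3'. These are the candidates for deletion.
non_version_tags() {
  grep -v -e '-[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$'
}

# Example: preview the candidates in the current repository
#   git tag | non_version_tags
```

For example, geocoder-1.12.3 would be kept, while geocoder-1.12.3@1801 and before-refactoring would be listed for deletion.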

As I said, we did this in one three-hour go, but we always had the option of doing it in a number of phases, say three one-hour goes (by using ranges of revision numbers).

The next step was to share code at source level. With Maven this is not literally the case: code is still shared via jars, but they are created during a single build run. So installation into the local Maven repository and deployment to the remote Maven repository were deactivated, and jars are used from their target/ folders. All modules are now defined in one tree and have the version HEAD-SNAPSHOT. To get rid of the repetitive Maven configuration, our root and master pom.xml had to be merged. The master pom.xml contained global configuration like the company name and the Maven repositories to deploy snapshots and releases to. The root pom.xml existed to collect all of our service and library projects so they could be built in one call. My team had asked for this in the past; I think this file shows that my team already wanted to work in a monorepo.

The fifth step was the integration of the existing Git repositories, including their history, into the monorepo. For that I checked out every Git repository and created a local branch named monorepo-integration. In that branch the sources were moved to the new directory layout. Then I added the repository as a remote to the monorepo, which was also checked out locally.

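# In old Git repository (webapp-one in this example)
$ git checkout -b monorepo-integration
$ mkdir -p component/webapp/webapp-one
# Move every tracked top-level entry into the new directory
# (a plain 'mv .' cannot move a directory into itself)
$ git ls-tree --name-only -z HEAD | xargs -0 -I{} git mv {} component/webapp/webapp-one/
$ git commit -m "Move sources into the monorepo directory layout"

# In monorepo directory
$ git remote add webapp-one ../old-git-repositories/webapp-one/
$ git fetch webapp-one
$ git merge --allow-unrelated-histories webapp-one/monorepo-integration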

With the sixth step the initial monorepo tooling was introduced. This consists of the two scripts mr/checkout.sh and mr/release. The mr/checkout.sh script was developed by Paul and is used to perform sparse checkouts. It is described in the blog entry Maven In A Google Style Monorepo. The mr/release script creates and updates release branches.
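To give an idea of the underlying Git mechanism, here is a minimal sketch of a sparse checkout using Git's core.sparseCheckout setting. It is not the mr/checkout.sh script: the function names and the example paths are assumptions, as is the choice to always keep the root pom.xml, and it does not adjust the Maven module list to match the contracted working tree.

```shell
# Print one sparse-checkout pattern per wanted directory,
# plus the root pom.xml (an assumption of this sketch).
sparse_patterns() {
  printf '/pom.xml\n'
  for dir in "$@"; do
    printf '/%s/\n' "${dir%/}"
  done
}

# Contract the working tree to the given directories.
# Must be run inside the monorepo working copy.
contract_checkout() {
  git config core.sparseCheckout true &&
  sparse_patterns "$@" > "$(git rev-parse --git-dir)/info/sparse-checkout" &&
  git read-tree -mu HEAD
}

# Example (hypothetical paths):
#   contract_checkout component/webapp/webapp-one component/library/geo-common
```

Calling contract_checkout again with more directories expands the working tree again.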

The last step was to activate CI. We use Jenkins and configured a Multibranch-Pipeline Job to observe the status of the master and feature branches. Additionally, for every release branch a Pipeline Job is configured to create releases. For more details see the chapter Status.

Our migration steps, again:

1. Define the directory structure for the monorepo
2. Migrate the service repository from Svn to Git
3. Share code on source level
4. Merge root and master pom.xml
- The root pom.xml was only used to collect all modules together so that the team could check out the service trunk and build all services directly
- The master pom.xml contained global configuration like the default JDK version or the version of widely used Maven plugins (like the Maven compiler plugin)
5. Move the other Git repositories into this monorepo
- There were twelve Git repositories for the webapps
- Note: we checked in the HEAD revision here, and the history for those remains in the old (now read-only) repos.
6. Introduce initial monorepo tooling
- This is mainly the mr/checkout.sh script Paul developed, modified some more to manage the expanding/contracting checkouts (sparse checkouts in Git)
- The mr/release script is for creating release branches and performing releases
7. Activate CI (Jenkins with agents, Multibranch-Pipeline Job for master and feature branches, Pipeline Jobs for release branches)

Future Plans

Use Feature Flags/Toggles and Branch by Abstraction for changes that take longer to implement
- Training required for the development teams
Change from Maven to a directed graph build system (Buck or Bazel)

Timeline

February: We thought we should learn more about microservices and bought the book Building Microservices by Sam Newman
17th March: First thought about monorepos after reading Sam’s Twitter conversation with Paul
March - May: Discussed this topic with teammates and colleagues from another team. Also read articles.
22nd May: Team decides to go with a monorepo
24th May: Team decides on the directory layout and implements it in Svn
June - July: Worked out a migration path to the monorepo and tested it (all in addition to our normal business deliverables commitments)
17th to 31st July: Methodical migration from Svn to Git
1st to 3rd August: Migration of the Vaadin clients’ Git repositories into the monorepo too

Status

Since the change to the monorepo, the complete four-person team has never been working in it at the same time, because of vacations and work on other projects. So the following numbers will change.

~12 commits/day
~1 pull request/day (feature branch to master)
~10 builds/day

Jenkins is our CI daemon. We use one Jenkins instance with multiple virtual and physical agents company-wide. My team is currently using only one agent to observe master and between one and four feature branches. There is no need to expand it now.

Doing this for your company

I’ve uploaded a skeleton version of our monorepo to GitHub for people to use as they see fit. None of our production sources, of course, but the tech to do the expansion contraction is there, and ready to use. If you have any questions about the example monorepo please open an issue on GitHub.

¹ Via Sam Newman’s Twitter feed I read this and then researched more on monorepos, which brought me back to Paul’s proof of concept work.