Monorepo vs Polyrepo codebases

Discover reasons for and against monorepos and polyrepos. Understand that each approach comes with tradeoffs that you need to consider.

A codebase is where the code of your app resides. When you inspect larger websites, applications and systems, you usually see swaths of files. Within those files lie the product or products that the business or organisation depends on. There are different ideologies on how to structure that code, and this is where the monorepo vs polyrepo debate lies. This article navigates the ideological difference and gives my opinion. I expect you to have some programming knowledge to relate to, and ideally to have worked in a team environment. I will provide reasons for and against each side of the debate; you will need to make up your own mind.

First of all, some definitions. A monorepo is a single house holding all the code used by an organisation. A polyrepo setup, on the other hand, is one where multiple houses together host the organisation’s code. The house is most likely a source control repository such as a Git repo. Big companies like Google host the vast majority of their business code in one huge monorepo.

So what’s the problem exactly? Why don’t we just use whatever methodology we want? Of course, if you are the sole person responsible for the codebase, then decide freely. In an organisation, however, there will be many people with different viewpoints to consider, so you need to be able to articulate why you would go with each approach.

A monorepo literally means all the code is in one place. Whether you have 1, 10 or 100 apps, they can all potentially share the same code. This is huge when you have a lot of utility code: you just import the file you want to use and you’re good. If, on the other hand, you have a utility library repo in a polyrepo setup and multiple repos reference it, then introducing new code means pushing the change, releasing the binary or package, and updating the repo you’re working on to consume the change. That is a lot of steps and time. You might object and say: so what? It works wonderfully for us.
Well, what if the change you introduced worked fine for the repo you updated, but you or someone else later upgraded another repo’s dependency on the common utility and it broke something? The cause may not be apparent right away, because the dependencies are rarely updated at the same time in all repos. Fixing it may introduce another bug, because it requires a code change in either the utility repo or the consumer repo. That’s the kind of risk you deal with in a multi-repo setup. Conversely, if all the code is in one place and an analogous change occurs, it risks breaking all dependants at once. That can be a big menace. This is where automated testing can catch such changes right away and potentially block them from being pushed in the first place.

Expanding on the dependency aspect, versioning is another issue engineers deal with. Different repos might have different versions of different packages installed. If a critical security vulnerability is publicised in any of those versions, the security team starts making requests to upgrade the dependency to a version without the vulnerability, or to substitute it, and you have a big task ahead of you to update all repos. It might be easy with just a few repos, but a typical organisation with over 20 engineers might have 50 repos. That’s hours to potentially weeks of work, because each repo needs its own change, which then has to be tested and deployed. With a monorepo, on the other hand, you can update once and run the tests once. You still need to test all affected apps and libraries, but you save some time.

I have implied that a monorepo has the same dependencies across the board, but there is actually a branch here. What do I mean? Some monorepos operate like a polyrepo to get the best of both worlds: some or all projects have their own dependencies, each at its own version. This is another exciting difference.
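
To make the versioning point above concrete, here is a minimal Python sketch of how you might spot version drift across repos and list the ones that need a security patch. Everything in it is an assumption for illustration: the repo names, package names and versions are made up, and the dependency data is hard-coded rather than read from real manifests or lockfiles.

```python
from collections import defaultdict

# Hypothetical data: repo name -> {package name: pinned version}.
# In practice this would be read from each repo's manifest or lockfile.
POLYREPO_DEPENDENCIES = {
    "web-app": {"http-client": "2.1.0", "date-utils": "1.4.2"},
    "admin-app": {"http-client": "1.9.3", "date-utils": "1.4.2"},
    "billing-service": {"http-client": "2.1.0"},
}


def find_version_drift(repos):
    """Return {package: {version: [repos]}} for packages pinned at more than one version."""
    usage = defaultdict(lambda: defaultdict(list))
    for repo, dependencies in repos.items():
        for package, version in dependencies.items():
            usage[package][version].append(repo)
    return {
        package: {version: sorted(names) for version, names in versions.items()}
        for package, versions in usage.items()
        if len(versions) > 1
    }


def repos_to_patch(repos, package, vulnerable_versions):
    """List the repos that pin `package` at one of the vulnerable versions."""
    return sorted(
        repo
        for repo, dependencies in repos.items()
        if dependencies.get(package) in vulnerable_versions
    )


if __name__ == "__main__":
    print(find_version_drift(POLYREPO_DEPENDENCIES))
    print(repos_to_patch(POLYREPO_DEPENDENCIES, "http-client", {"1.9.3"}))
```

In a monorepo with a single shared set of dependencies, the same check would report no drift, and a vulnerable version would be fixed in one place. A monorepo that lets each project pin its own versions would need a check like this just as much as a polyrepo does.
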
You see, the comparison is not really black and white. Amazing. What else?

One risk with monorepos is isolation of changes. Even if all automated checks pass and the code changes look fine, there is still a chance of side effects on other projects. Usually this happens when a config file, dependency or utility is tweaked. That can take effect immediately for other apps within the same repo once the code is merged and deployed; for example, something breaks all of a sudden after a recent deploy. That means we need to invest in how we bundle and build code. I am an advocate of bundling only the code that’s actually used, on both the server and client sides. But what if the build code itself breaks and you can’t build anything?

An upside of monorepos is consistency of code styling. It is more or less forced by design: everyone operating in the same place means following the same rules. That’s a big payoff with monorepos, because you can easily regulate all styling, formatting and naming. Everyone in the organisation can confidently contribute to other teams’ work without much friction over conventions, because they are the same. Bigger organisations will find this hugely helpful, as they often have many teams dedicated to different parts of the codebase. Note that this doesn’t mean there are no local conventions; those are typically specialised to suit the scope of the code concerned.

Building on the upsides, there is transparency of code. Most of your organisation’s code is probably not confidential, so engineers can read and easily understand the code they interface with. That makes inter-app communication easy: imagine a frontend and a backend sharing a library that defines the contract between them. Any change to the contract will alert you to potential breakages, assuming you have the detection infrastructure in place.

That’s all good, but what benefits would I gain if we had a polyglot setup, i.e. one with multiple programming languages?
I think monorepos shine when you have only one programming language, because of the consistency and reusability perks. In a polyglot setup, transparency of code probably becomes the biggest benefit. If you have to package different apps together and ship them all at once, a monorepo might be good, but you can also do that in a polyrepo setup. A polyrepo is actually a good isolator of languages, since each is kept separate.

Note that when we refer to polyrepos, we don’t have to assume one repo per app or per library. It can be just a handful of repos, even for large organisations, and they usually follow the organisational structure. If the org has independently operating teams, each running its own products written in different programming languages, then a polyrepo may be more appropriate, though if there are dominant languages you might want to consider a monorepo to reap its benefits.

A big scare around monorepos is security. How can we limit an attacker who gains access to viewing only a certain part of the monorepo? It is a difficult question to answer, and it comes down to a compromise. In a polyrepo setup we can lock repos down to certain individuals. In a monorepo there are ways to conditionally allow people access to particular files or directories, but the isolation is not as strong.

Code ownership is often blurred in a monorepo, because technically anyone can, with little friction, update a piece of code that a different team owns and push it to production, risking breakage. This is where organisational agreements need to be made on who can do what, and how.

In a polyrepo it might not be as easy to find out where a piece of code resides, because you have to search for it. Search has become really fast, but when you get multiple results you still need to narrow things down before you can debug, understand or implement changes. Multiple repos usually lack organisation; different repos will have their own conventions for everything.
You might not even find the code because you don’t have permission to see it, so you ask around, which might not yield much right away. That can be a hidden business cost.

On a rarer note, the bigger the monorepo, the more resources you consume to load it in your development environment. This is where optimising the codebase needs to become a priority, because if you’re mostly working in 3 directories out of 124, why load the rest into memory and consume plenty of resources? There are definitely workarounds, and this really only becomes a problem at big tech scale.

Quite a lot has been covered above, and I think it’s a solid starting point for deciding what to go with. Keep in mind that the decision, if you act on it, can have severe consequences for your organisation, because it is not the kind of change you can revert overnight if it doesn’t work out. I strongly believe that what you decide on must reflect the values of the organisation. Companies have been operating successfully with different codebase configurations, using various kinds of monorepos and polyrepos; it doesn’t have to be fully monorepo or fully polyrepo.