My team has a product repo and it’s fantastic. Instead of a monolithic repo across all teams, it’s limited to just the work my team does. About 5 applications and their related libraries and tools all live in the same repository, making it super simple to share code among the applications we maintain in it.
We have a couple of other separate repos that have nothing to do with our applications (an ELK stack and a Docusaurus site).
Having worked in both a very separated multirepo codebase and a monorepo, I’ve got to say, at the moment, the monorepo (using pnpm) wins hands down for me.
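A minimal sketch of that kind of pnpm layout (the app and library names here are made up):

```
# repo layout (hypothetical):
#   apps/storefront   apps/admin   apps/api   ... the five applications
#   libs/ui-kit       libs/api-client         ... the shared code
#
# pnpm-workspace.yaml at the root ties them all into one workspace:
packages:
  - "apps/*"
  - "libs/*"
```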
i’d say they’re pretty equivalent
a monorepo is far easier to develop a single-language, fairly monolithic (ie you need the whole application to develop any part) codebase in
(though as soon as you start adding multiple languages, or it gets big enough that you need to work on parts without starting other parts of the application, it starts to break down rather significantly)
but as soon as your app becomes less of a cohesive thing and more separated it becomes problematic… especially when it comes to deployments: a push to a repo doesn’t mean “deploy changes to everything” or “build everything” any more
i think the best solution (as with most things) is somewhere in the middle: perhaps several different repos, and a “monorepo” that’s mostly a bunch of subtrees or submodules… you can coordinate changes by committing to the monorepo (and changes are automatically duplicated), or just work on individual parts (tricky with pnpm since the workspace file would be in the monorepo)… but i’ve never really tried this: just had the thought for a while
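a rough sketch of the subtree flavour of this, with made-up repo names:

```
# add an existing standalone repo as a subtree of the coordination "monorepo"
git subtree add  --prefix=services/billing https://github.com/acme/billing.git main --squash

# pull upstream changes from the standalone repo into the monorepo copy
git subtree pull --prefix=services/billing https://github.com/acme/billing.git main --squash

# push monorepo commits that touched services/billing back out to its own repo
git subtree push --prefix=services/billing https://github.com/acme/billing.git main
```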
a push to a repo doesn’t mean “deploy changes to everything” or “build everything” any more
What do you mean? I have yet to work for a company that’s organised and sophisticated enough to actually use a monorepo but my understanding is you’d set up something like Bazel so it only builds & tests (and I guess deploys) things that depend on your change.
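For example, Bazel can compute exactly the set of targets affected by a change and test only those (the target names here are hypothetical):

```
# everything that transitively depends on the library I just changed
bazel query "rdeps(//..., //libs/payments:payments)"

# run only the tests among those reverse dependencies
bazel test $(bazel query 'kind(".*_test", rdeps(//..., //libs/payments:payments))')
```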
It’s really all about using Conway’s Law to your own benefit.
If adding features or fixing bugs consistently requires one person from a fairly small team to make PRs across multiple repos and changes can only really be tested in a staging environment where everything can be tested together, then it’s an anti-pattern.
However, if 100 developers or more are working in a single repo, it’s past time to split it up into appropriate bounded contexts and allow smaller teams to take ownership.
I worked at a place where hundreds of developers worked on a single Rails monolith / monorepo, and enterprise architects insisted that 100,000+ RSpec tests that required PostgreSQL had to run in CI for every PR merge. Every build took 45 minutes and used ungodly amounts of cloud compute. The company ended up building their own custom CI system to reduce their seven-figure CI spend so they could ignore the problem.
-
You should really not need to do a PR across multiple repos. If you do, you are splitting your code up wrong. Some functionality may require multiple PRs, but you should always be able to do those at different moments and test them separately.
-
The monorepo tools are exactly software that emulate the features of a multi-repo so that you can have thousands of people on the same repository. We also have multi-repo tools that emulate the features of a monorepo, but people don’t hype those online because they are simple and free.
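One example of the multi-repo kind is Android’s repo tool, which checks a bunch of independent repos out into a single working tree from an XML manifest (the project names below are made up):

```
<!-- default.xml: one checkout composed of several independent repos;
     "repo init -u <manifest-repo> && repo sync" materialises the tree -->
<manifest>
  <remote name="origin" fetch="https://github.com/acme/" />
  <default remote="origin" revision="main" />
  <project name="frontend"   path="frontend" />
  <project name="backend"    path="backend" />
  <project name="shared-lib" path="libs/shared" />
</manifest>
```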
You should really not need to do a PR across multiple repos.
different ways of treating PRs… it’s a perfectly valid strategy to say “a PR implements a specific feature”, in which case you might work in a backend, a frontend, and a library… of course, those PRs aren’t intrinsically linked (though they do have dependencies between them… heck i wouldn’t even say it’d be uncommon or wrong for the library to have schemas that do require changes in both the frontend and backend)
if you implement something in eg the backend, and then get retasked with something else, or the feature gets dropped, then sure, it’s “working” still, but to leave unused code like that would be pretty bad… backend and frontend PRs tend to be fairly closely tied to each other
a monorepo does far more than i think you think it does… it’s a relatively low-infrastructure way of adding internal libraries shared across different parts of your codebase, external libraries without duplication (and ensuring versions are consistent, where required), and coordinating changes, and plenty more
can these things be achieved with build systems and deployment tooling? absolutely… but if you’re just a small team, a monorepo could be the right call
of course, once the team grows in size it’s no longer the correct option… real tooling is probably going to be faster and better in every way… but a monorepo allows you to choose when to replace different parts of the process… it emulates an environment with everything very separated
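a sketch of what i mean by low-infrastructure, in pnpm terms (names made up): external versions get pinned once at the root, and internal libraries are shared via the workspace protocol without ever being published…

```
# root package.json (excerpt): one version of an external dep, workspace-wide
{ "pnpm": { "overrides": { "react": "18.3.1" } } }

# apps/storefront/package.json (excerpt): internal lib, no registry involved
{ "dependencies": { "@acme/ui-kit": "workspace:*" } }
```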
-
The only reason monorepos are used is because tooling for multi-repos is inadequate, or people don’t know how to use it.
Version control tooling is still at its “blackberry” stage anyway.
i’d say it’s less that it’s inadequate, and more that it’s complex
for a small team, build a monolith and don’t worry
for a medium team, you’ll want to split your code into discrete parts (libraries shared across different parts of your codebase, services with discrete test boundaries, etc)… but you still need coordination of changes across all those things, and team members will probably be touching every part of the codebase at some point
for large teams, you want to take those discrete parts and make them fairly independent, and able to be managed separately: different languages, different deployment patterns, different test frameworks, heck even different infrastructure
a monorepo is a shit version of real, robust tooling in many categories… it’s quick to set up, and gives you a path to easily move to better tooling when it’s needed
They are simpler, but they do not scale. Eventually it’s better to create an internal package repo to share common code; this makes rolling updates a lot easier than a monorepo does.
Smaller repos are also less stressful for monitoring and deployment tooling, and they make granular reporting easier, which you will eventually have to do in large projects.
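For instance, with an internal npm registry the shared code becomes a versioned package rather than a sibling directory, so consumers can upgrade on their own schedule (the scope and registry URL here are made up):

```
# .npmrc in each consuming repo: resolve the internal scope from a private registry
@acme:registry=https://npm.internal.example.com/

# in the shared library's own repo: cut releases independently of any consumer
npm version minor   # bump the version and tag it
npm publish         # push to the internal registry
```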
Simple for small code bases, a pain and a big code smell for large ones.
I mean, with large swaths of big tech companies running monorepos, does this statement really stand up to scrutiny?
For one data point, Google has over 2 billion lines of code in their monorepo.
google does a lot of things that just aren’t realistic for the large majority of cases
before kubernetes, you couldn’t just reference borg and say “well google does it” and call it a day
Agree with this explanation. Also, in a monorepo it’s much easier to reference code between modules, and I think this leads to overly coupled code. It takes more discipline to limit the scope of modules.
that’s a good and bad thing though…
it’s easy to reference code, so it leads to tight coupling
it’s easy to reference code, so let’s pull this out into a separately testable, well-documented, reusable library
my main reason for ever using a monorepo is to separate out a bunch of shared libraries into real libraries, and still be able to have eg HMR
The thing to me is always that, yeah, you need a huge commit for a breaking change in an internal library inside a monorepo, but you will still need to do the same work in a polyrepo eventually, too.
Especially since “eventually” really means “ASAP” here. Without going through the breaking change, you can’t benefit from non-breaking changes either, and the complexity of your codebase increases the longer you defer the upgrade, because different parts of your application behave differently in the meantime. So, even in a polyrepo, you ideally upgrade all library consumers right away, like you’re forced to in a monorepo.
This is true, but there is a matter of being able to split the work up into multiple pieces easily and prioritise between services. E.g. the piece of legacy service that nobody likes to touch, has no tests, and is used for 2% of traffic can take its time getting sorted out without blocking all the other services moving on.
You still have to do it and it should be ASAP, but there are more options on how to manage it.
We have a gigantic monorepo at work.
To manage the complexity we have entire teams dedicated to aspects of it, and an insanely complex build system that makes use of remote builders and caches. A change to a single Python file requires about fifteen seconds of the build system determining it needs to do no work, with all of this caching, and the cache can be invalidated unexpectedly and take twenty minutes instead. Ordinary language features in IDEs are routinely broken, I assume because of the difficulty of maintaining an index on such a huge codebase. Ordinary tools like grep -R or find can only be used with care.
On the other hand, using ordinary tools like find and grep are exactly what I like about monorepos! Yes, they may take a while, but at least I know I’ll find a file or code that I’m looking for!
With multi-repos I’m constantly searching, but not finding where a particular piece of code comes from. Yes, it’s from library X, but where the heck does that live? Now I really can’t use ordinary tools. I have to rely on coworkers, docs, or GitLab to search for where a piece of code is actually defined.
there is a size of monorepo where that becomes infeasible ;)
Yeah, I’m sure. It’s not something I would do frequently. My work had us on beefy desktops. But I was totally fine with letting find+parallel+grep run for 30 minutes in the background while I searched docs or messaged people on Slack. Depending on your team, getting a response on Slack could easily take 24 hours anyway. Eh.
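Something like this, if you’re curious (the symbol and file glob are made up):

```
# brute force, but tractable on a beefy desktop: fan the grep out over all cores
find . -type f -name '*.py' -print0 \
  | parallel -0 -X grep -Hn 'some_symbol' \
  | less
```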
The other thing I liked to do is directly edit the libraries in the monorepo! No need to figure out how to hack some random dependency manager. You have the code! Just edit and build!
grep
consider looking at ripgrep and semantic search tools.
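e.g., with made-up search terms:

```
# ripgrep: parallel, .gitignore-aware, and much faster than plain grep on big trees
rg -n 'createOrder' --type ts      # search only TypeScript files
rg -l 'deprecated_api' services/   # just list matching files under one subtree
```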
i maintain a gigantic monorepo using policies, CI/CD gating, pre-commit, and lots of scripting. it’s not unworkable, it just takes process. i don’t really agree with the creator’s pov. but i guess i may not be a new engineer entering a team with a monolith and a bone to pick.
I don’t see how ripgrep would help with the monorepo situation. We have tooling for an equivalent of grep, but it’s based on an index, not what’s on your filesystem.
IIRC, ripgrep used a faster algorithm than grep, but more recent versions of grep are now shipping with a faster (the same?) algorithm. So the above suggestion shouldn’t help much unless you use a very old grep.
The problem is PRs / CI tooling. They treat a repo as an independent project, but…:
- A change needs to be able to span multiple repos
- …or you need a way of expressing dependencies between changes in multiple repos
- …and you need to be able to block changes to dependencies that should be non-breaking but aren’t.
Zuul CI solved a lot of these kinds of problems for the OpenStack project, but I’ve personally found it a bitch to set up. Lots of good ideas in it though.
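For example, Zuul lets a commit in one repo declare a dependency on an open change in another repo via a commit-message footer, and then tests and gates the two changes together (the URL and change number below are made up):

```
Fix order validation against the new schema

This only works once the schema change in the shared library lands,
so tell Zuul to test and merge the two changes together.

Depends-On: https://review.example.org/c/acme/schema-lib/+/12345
```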




