I’m trying to make my first API with an accompanying website, and I’m using tutorials. I decided the MERN stack was the way to go, since it seemed popular and easy.
I have some experience with SQL itself, but have never connected a database to an actual application.
So I got to the point of making my database in MongoDB when I saw some stuff about how MySQL is more secure than NoSQL, and how NoSQL has some disadvantages.
The API/website I’m making is just a pet project, but if it ever does become popular, would I have to move my databases? Or is this a “cross that bridge if/when I get there” situation?
Alternatively, can I have the same database in multiple places at once? As in, both MySQL and NoSQL?
It is a “cross that bridge if/when I get there” situation.
Second the “Jump off that bridge when we get to it” sentiment.
Also, IMHO: don’t worry about it a whole lot. If it’s ever required, you can “fix” a lot just by brute-forcing better hardware and making smaller performance tweaks. And if it really does require a major change, those short-term fixes can buy you the time to develop the larger ones.
But as a pet project, just to get more experience and learn stuff? Sure, go ahead, connect it to two databases with wildly different architectures. Why not do three? For learning, doing something really weird often teaches a lot, even if the end result isn’t very useful.
Well, then I guess I’ll worry about it if it becomes an issue.
Sure, you can. There are even some very legitimate use cases.
But for your goals as you’ve defined them, you’re kind of just creating extra layers of complication for yourself. Sure, you could store all your data in MySQL AND Mongo and write to both with every write operation. If you do this, only read from ONE. You could write methods to toggle back and forth for which one you’re reading from, but WHY?
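If you did go down that road, the shape of it is roughly this (a minimal sketch; mysqlPool, mongoUsers, and the user fields are placeholder names, not anything from a real project):

// Dual-write, single-read: every write goes to both stores,
// but reads only ever come from one of them (MySQL here).
async function createUser(user) {
  await mysqlPool.query("INSERT INTO users (name, email) VALUES (?, ?)", [user.name, user.email]);
  await mongoUsers.insertOne(user); // mirror the write into Mongo
}

async function getUser(email) {
  const [rows] = await mysqlPool.query("SELECT * FROM users WHERE email = ?", [email]);
  return rows[0]; // never read from the mirror
}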
A common use case for this kind of thing is using Mongo as an object cache. You pull some common class of thing out of SQL with a query like:
SELECT
g.group_id,
g.name,
g.blahblahblah,
COALESCE(json_agg(u.*) FILTER (WHERE u.user_id IS NOT NULL), '[]') AS moderators
FROM groups g
LEFT JOIN group_moderators gm ON g.group_id = gm.group_id
LEFT JOIN users u ON gm.user_id = u.user_id
GROUP BY g.group_id, g.name, g.blahblahblah;
Now you have this complex query returning your group object, with a JSON blob in it that holds all the moderators. What you can do is cache that in Mongo. When the API is asked for a group by group_id, it can check whether that group exists in Mongo; if it does, it can just grab it and return it to the user. If it doesn’t, it can ask SQL for it, then cache the result in Mongo. If a new moderator gets added, that operation can refresh the Mongo cache.
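In code, that cache-aside flow looks roughly like this (a sketch only; groupCache is a hypothetical Mongo collection, sql is whatever SQL client you use, and GROUP_SQL stands for the query above with a WHERE on group_id added):

async function getGroup(groupId) {
  // 1. Check the Mongo cache first.
  const cached = await groupCache.findOne({ group_id: groupId });
  if (cached) return cached;

  // 2. Cache miss: run the expensive SQL query.
  const { rows } = await sql.query(GROUP_SQL, [groupId]);
  const group = rows[0];

  // 3. Store the result so the next read skips SQL entirely.
  if (group) await groupCache.insertOne(group);
  return group;
}

// Adding a moderator writes to SQL, then invalidates the cached copy.
async function addModerator(groupId, userId) {
  await sql.query("INSERT INTO group_moderators (group_id, user_id) VALUES ($1, $2)", [groupId, userId]);
  await groupCache.deleteOne({ group_id: groupId }); // next read repopulates it
}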
Now WHY would you do this?
- Maybe Mongo has better performance.
- Maybe you’re trying to make fewer requests to your SQL server.
- Maybe Mongo runs locally on the same host as your endpoint, but the SQL server is remote and you want to use less bandwidth (this can cause some silly problems when you have multiple endpoints, just sayin’, but those problems have known, simple solutions).
- You think it’s cool and you’re down to do the extra work.
- Any combination of these.
Now, IMO Redis is a superior tool for this, but I HAVE heard of people using Mongo.
But this isn’t really your use case, is it?
It could be a good learning exercise to set up your database reads and writes to use both MySQL and MongoDB. That said, I’d actually recommend either Postgres or MariaDB over MySQL… You’re going to encounter Postgres a LOT more often in the wild, so getting comfortable with it is a good use of your time. And if you REALLY want MySQL without the taint of Oracle, Maria has you covered.
Either way, writing methods to read/write from both databases will certainly help you learn about both kinds. Just make sure that when your site is actually live, you’re treating one database as the source of truth and the other as a mirror.
Given the scope of your project, I bet you’ll find both databases are perfectly adequate for your needs. But there are reasons many people have strong preferences for truly relational databases, and it sounds like you’re kind of discovering that on your own.
But you should probably just not complicate things for yourself at this point.
You’re at a point where you can say, “I’m gonna just use Mongo, but I want to build things in such a way that it’s possible to change my mind later.” Just write your data handler class in a way that saves you pain down the road.
For example, you might have a data::getObject() method or a data::createUser() method. If you write these intelligently now, you can even preemptively add something like:
if (database === "mongo") {
  // do stuff to make a new user in mongo
} else {
  // throw an unknown database exception
}
Now, when you come back to switch databases in the future, all you have to do is
if (database === "mongo") {
  // do stuff to make a new user in mongo
} else if (database === "postgres") {
  // do stuff to make a new user in postgres
} else {
  // throw an unknown database exception
}
But an even better and more secure way to do this is to make your data class ignorant of what kind of database you’re using (at the method level) and just define your schema. If you’re using FastAPI (I personally loathe working in Python, eff FastAPI, I roll my own APIs in Rust, Lua, and PHP), it’s kind of set up to get you started on this quickly. You probably should use it, given where you’re at. There are reasons it’s one of the most popular tools out there for this kind of thing, and it will teach you a lot about how to structure an API.
https://www.fastapitutorial.com/blog/pydantic-schemas-in-fastapi/
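The same idea translates to the JS side of a MERN app: hide the store behind one interface and let each backend implement it. Here’s a bare-bones sketch (MongoUserStore, PgUserStore, and the user fields are invented names for illustration, not a real library):

// Every store exposes the same methods; nothing else in the app knows which DB it is.
class MongoUserStore {
  constructor(collection) { this.users = collection; }
  async createUser(user) { await this.users.insertOne(user); }
  async getUser(email) { return this.users.findOne({ email }); }
}

class PgUserStore {
  constructor(pool) { this.pool = pool; }
  async createUser(user) {
    await this.pool.query("INSERT INTO users (name, email) VALUES ($1, $2)", [user.name, user.email]);
  }
  async getUser(email) {
    const { rows } = await this.pool.query("SELECT * FROM users WHERE email = $1", [email]);
    return rows[0];
  }
}

// Switching databases later becomes a one-line change at startup.
const store = process.env.DB === "postgres" ? new PgUserStore(pgPool) : new MongoUserStore(mongoUsers);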
I don’t fully understand the technical aspects, but I get the gist of what you’re saying. Overcomplicating code is something I’m sometimes guilty of. I’ll try to make the code as “flexible” as possible to allow for changes later down the line if needed. Also, I’ve never done anything in Lua besides making stuff in Roblox Studio 😅. You really know your stuff, huh?
I’ve been doing this since 2001 and I’m a big fan of OpenResty.
If you put your code behind clean interfaces and structure it decently, then migrating between stores later is much more feasible.
Using both Postgres and ElasticSearch is a pattern I use, depending on the scale and the situation. But I don’t usually start a project off using both.
Storing the data in both places would severely limit the features you can use in each data store, and it would overcomplicate things. Just use one thing for now; you’ll be able to migrate later if you need to. Don’t over-optimize at this point, and just use whatever you most want to learn more about.