So what’s this all about? This is my first article (hopefully the first of many), in which I’ll describe my journey of learning how to build a graph database.
I’m not sure how far I’ll go yet. I’m doing this in my spare time, so I make no promises…
My background: I’ve been a software developer for about 10 years now. I’ve written many kinds of software and used several databases: MySQL, Microsoft SQL Server, SQLite, MongoDB, RethinkDB, LevelDB; but I’m no expert, just your regular user.
I’ve also used IPFS and Ethereum, not the regular kind of databases, but fun to learn nonetheless.
And this is how I want to do it: I’ll imagine that what I’m trying to build is already finished, then I’ll explain to you how I did it.
This mental hack keeps me focused and confident, even though it’s a HUGE task…
: So why am I doing this?
: Because it’s fun. Because it’s a huge challenge, way more than I can handle. Because I need this kind of library. Because I’ll use this knowledge, even if it doesn’t become a stable product.
: What are the challenges?
: I don’t know anything about how databases work internally (well, I have some ideas, but I haven’t studied the topic much). I don’t really know how indexing works. I don’t know much about graphs.
: So why the heck am I doing this again?!
: In the end, I want to create a better alternative to table-based and document-based databases, so that I can connect data infinitely and navigate the resulting spider web. And of course, I want to learn in the process.
: What’s my use-case?
: I want to save all the countries in the world and all their relations; all the animals and plants in the world and all the data I can find about them; all my favorite movies, with all the actors and the most relevant relations between them; and quantity conversions (length, weight, temperature). I want to use this data in a normal application, where I would otherwise use a NoSQL database. Maybe in the future, I’ll implement a nice GUI to visualize and edit the nodes and edges.
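To make the "nodes and edges" idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the use-case, not the design of the actual database: the `Graph` class, its method names, and the sample ids (`france`, `borders`, `acted_in`, and so on) are all assumptions made up for this example.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes carry properties, edges are labeled and directed."""

    def __init__(self):
        self.nodes = {}                    # node id -> dict of properties
        self.outgoing = defaultdict(list)  # node id -> list of (label, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target):
        # Refuse edges that point at nodes we have never seen.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.outgoing[source].append((label, target))

    def neighbors(self, source, label=None):
        """Return the ids one hop away from source, optionally filtered by edge label."""
        return [
            target
            for edge_label, target in self.outgoing.get(source, [])
            if label is None or edge_label == label
        ]


# Hypothetical sample data covering two of the use-cases: countries and movies.
g = Graph()
g.add_node("france", kind="country", name="France")
g.add_node("spain", kind="country", name="Spain")
g.add_node("keanu_reeves", kind="actor", name="Keanu Reeves")
g.add_node("the_matrix", kind="movie", title="The Matrix")

g.add_edge("france", "borders", "spain")
g.add_edge("keanu_reeves", "acted_in", "the_matrix")

print(g.neighbors("france", "borders"))
print(g.neighbors("keanu_reeves"))
```

Unlike a table or a document, any node here can be linked to any other node just by adding an edge, which is the "connect data infinitely" part. Everything lives in plain dictionaries, so there is no persistence and no indexing yet.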
: What’s the maximum data size?
: I’m thinking 10 million edges. It’s a pretty big number for a toy database (see Visualizing 1 million and multiply by 10). And if this becomes serious, I’ll target more.
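To get a feel for what 10 million edges means in storage terms, here is a back-of-the-envelope calculation. The layout is purely an assumption for the sake of the arithmetic (two 64-bit node ids plus one 32-bit label id per edge); it ignores node properties, indexes, and any per-record overhead, so treat the result as a lower bound rather than a prediction.

```python
EDGE_COUNT = 10_000_000

# Assumed fixed-size edge record (hypothetical layout, not a real file format):
NODE_ID_BYTES = 8   # 64-bit id, used twice: source and target
LABEL_ID_BYTES = 4  # 32-bit id of the edge label, e.g. "borders"


def raw_edge_bytes(edge_count, node_id_bytes=NODE_ID_BYTES, label_id_bytes=LABEL_ID_BYTES):
    """Bytes needed to store the bare edge records, with no indexes or properties."""
    return edge_count * (2 * node_id_bytes + label_id_bytes)


total = raw_edge_bytes(EDGE_COUNT)
print(f"{total:,} bytes, about {total / 1_000_000:.0f} MB")
```

Under these assumptions the bare edge list fits comfortably in the memory of an ordinary laptop, which suggests the target is reasonable for a toy database.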
: What programming languages will I use?
: Any or all of Elixir, Python, and Node, in that order of preference. I like Elixir the most, because it’s really fun to play with and I want to learn it better. But I might end up using Python, because I’m the most productive with it. Node is also a great option, because there are hundreds of thousands of libs to help me.
: Can I really do this?
: Time will tell. If I can’t do it, I’ll resume watching Game of Thrones; I haven’t finished the last season.
: Will this be open-source?
: Yes. Most of it, anyway.
The tasks:
The first things that I found:
Until the next time!
This blog is open source. You can check the history of this post.
If you have any thoughts, suggestions, criticism, or whatever, please drop me a line in the comments section.
If I have some audience, I’ll be more focused on details and I’ll write more often, obviously.