Making A Graph Db Ep6

Oh man, long time, no updates. I’m sorry about that. I’ve been writing a few words every few days, but I never seem to finish.

In the meantime, I created a repo on GitHub.

In my previous article about my Graph Database, I studied some methods for representing graphs in memory and in persistent storage.

In this episode, I want to look at the actual libraries that I might use for persistence.

Embedded databases for Node.js or Python

This is the internal persistence layer, required to be as fast as possible:

Compatibility formats

These formats are not particularly efficient in speed or storage, so they can’t be used for instant snapshots. But to be compatible with other graph implementations, my graph should be exportable to at least one of these:

Graph query languages

Also, see “An overview of Graph database query languages” from IBM developers.

Desired technical requirements:

I want to hash the nodes using a standard hashing algorithm. The nodes must be hashed and stored in a special table, to avoid duplicating data. Hashing is less space-efficient than, for example, assigning an arbitrary number to each node in the de-duplication table, but it has the huge advantage that when you merge your nodes with the nodes from another database, the hashes of identical nodes will match perfectly.
In the same way, the edges should be hashed, so that when you create a connection between two nodes, it’s exactly the same connection as one created between the same nodes in another database.
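
Here is a minimal sketch of that idea in Python, just to make it concrete. It assumes SHA-1 from the standard hashlib module and derives an edge hash from the hashes of its two endpoints. That edge scheme, the names node_id, edge_id, add_node and add_edge, and the in-memory dicts are all hypothetical placeholders, not the final design.

```python
import hashlib


def node_id(value):
    """Hash a node's content; equal content always yields the same id."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def edge_id(source_id, target_id):
    """Hash a directed edge from the ids of its endpoints (assumed scheme)."""
    # Order matters, so horse -> grass and grass -> horse get different ids.
    return hashlib.sha1(f"{source_id}>{target_id}".encode("ascii")).hexdigest()


# De-duplication tables, keyed by hash. Hypothetical in-memory stand-ins
# for whatever the persistence layer ends up being.
nodes = {}  # node id -> node value
edges = {}  # edge id -> (source node id, target node id)


def add_node(value):
    nid = node_id(value)
    nodes[nid] = value  # storing the same value twice changes nothing
    return nid


def add_edge(source, target):
    src, dst = add_node(source), add_node(target)
    eid = edge_id(src, dst)
    edges[eid] = (src, dst)
    return eid
```

Calling add_edge("horse", "grass") twice returns the same id and leaves two nodes and one edge in the tables, which is the de-duplication property described above.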

For example, my database:
node: horse = 091b5035885c00170fec9ecf24224933e3de3fcc
node: grass = bbc29b76c59b382aca12201c44929002ef1778ae
edge: horse -> grass = da19c89bcc24bdc54041df2cbb2e7e5bc639809a

Another database:
node: horse = 091b5035885c00170fec9ecf24224933e3de3fcc
node: animal = 3199ea056253916c41d65c6fd39b52e5f239873c
edge: animal -> horse = e2fe5ce1a2d0ff363cda9862ed72e1b30a777b62

When joining the two databases, you’ll gain more knowledge about the horse for free, without any unnecessary duplication: horse is an animal and eats grass.
This is just an example using SHA1 hashing. I’ll probably use something different in the real implementation.
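
To illustrate the join, here is a rough Python sketch that builds the two tiny databases from the example and merges them. It assumes SHA-1 via hashlib and an edge hash computed from the two node hashes. The helpers h, make_db and merge and the dict layout are made up for this illustration, so the hashes it produces are not necessarily the ones listed above.

```python
import hashlib


def h(text):
    """SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def make_db(pairs):
    """Build a tiny graph: {"nodes": {id: value}, "edges": {id: (src, dst)}}."""
    db = {"nodes": {}, "edges": {}}
    for source, target in pairs:
        src, dst = h(source), h(target)
        db["nodes"][src] = source
        db["nodes"][dst] = target
        # Assumed edge scheme: hash of "source id > target id".
        db["edges"][h(src + ">" + dst)] = (src, dst)
    return db


def merge(a, b):
    """Union two graphs; entries with matching hashes collapse into one."""
    return {
        "nodes": {**a["nodes"], **b["nodes"]},
        "edges": {**a["edges"], **b["edges"]},
    }


mine = make_db([("horse", "grass")])
other = make_db([("animal", "horse")])
merged = merge(mine, other)

# horse appears in both databases but is stored only once after the merge.
print(len(merged["nodes"]), len(merged["edges"]))  # 3 nodes, 2 edges
```

Because the keys are content hashes, the merge is a plain dictionary union: no id remapping is needed, and merging the same database twice changes nothing.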


This blog is open source. You can check the history of this post.

If you have any thoughts, suggestions, criticism, or whatever, please drop me a line in the comments section.
If I have some audience, I’ll be sharing details and I’ll write more often, obviously.

Tags: software programming graph db