To Cache or not to Cache

..that is the question—
Whether ’tis Nobler in the mind to suffer
The Slings and Arrows of outrageous Fortune, Or to take Arms against a Sea of troubles,
Selecting caching & no sql solutions is no joke & I may be basing my pick based on this graphic from Perfect Market and the comments of Jun Xu. (I am comparing Redis vs MongoDB)Redis is an excellent caching solution and we almost adopted it in our system. Redis stores the whole hash table in memory and has a background thread that saves a snapshot of the hash table onto the disk based on a preset time interval. If the system is rebooted, it can load the snapshot from disk into memory and have the cache warmed at startup. It takes a couple of minutes to restore 20GB of data depending on your disk speed. This is a great idea and Redis was a decent implementation.
But for our use-cases it did not fit well. The background saving process still bothered me, especially when the hash table got bigger. I had a fear that it may negatively impact read speed. Using logging style persistence instead of saving the whole snapshot could mitigate the impact of these dig dumps, but the data size will be bloated if frequently, which eventually may negatively affect restore time. The single-threaded model does not sound that scalable either, although, in my testing, it scaled pretty well horizontally with a few hundred concurrent reads.
Another thing that bothered me with Redis was that the whole data set must fit into physical memory. It would not be easy to manage this in our diversified environment in different phases of the product lifecycle. Redis’ recent release on VM might mitigate this problem though.

MongoDB is by far the solution I love the most, among all the solutions I have evaluated, and was the winner out of the evaluation process and is currently used in our platform.
MongoDB provides distinct and superior insertion speed probably due to deferred writes and fast file extension with multiple files per collection structure. As long as you give enough memory to your box, hundred of millions of rows can be inserted in hours, not days. I would post exact numbers here but it would be too specific to be useful. But trust me — MongoDB offers very fast bulk inserts.
MongoDB uses memory mapped files and usually it takes only nanoseconds to resolve minor page faults to get file system cached pages mapped into MongoDB’s memory space. Compared to other solutions, MongoDB will not compete with page cache since they are same memory for read-only blocks. With other solutions, if you allocate too much memory for the tool itself, then the box may fall short on page cache, and usually it’s not easy or there may not be an efficient way to have the tool’s cache fully pre-warmed (you definitely don’t want to read every row beforehand!).
For MongoDB, it’s very easy to do some simple tricks (copy, cat or whatever) to have all data loaded in page cache. Once in that state, MongoDB is just like Redis, which performs super well on random reads.
In one of the tests I did, MongoDB showed overall 400,000 QPS with 200 concurrent clients doing constant random reads on a large data set (hundred millions of rows). In the test, data was pre-warmed in page cache. In later tests, MongoDB also showed great random read speed under moderate write load. For a relatively big payload, we compress it and then save it in MongoDB to further reduce data size so more stuff can fit into memory.
MongoDB provides a handy client (similar to MySQL’s) which is very easy to use. It also provides advanced query features, and features for handling big documents, but we don’t use any of them. MongoDB is very stable and almost zero maintenance, except you may need to monitor memory usage when data grows. MongoDB has rich client support in different languages, which makes it very easy to use. I will not go through the laundry list here but I think you get the point.


Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.