The Future of Relational Database Management Systems
There’s been a lot of buzz around cloud computing with Google’s new App Engine environment. I’m not going to discuss the merits or issues with their new product and instead focus on the fact that they’re giving developers access to the BigTable via its DataStore.
This is the second major player to release access to an internal storage system that’s document based, Amazon’s SimpleDB being the other. In the open-source space we have CouchDB which is a distributed database that’s completely document based. I see a definite trend from creating databases with relationships to store everything in one big blob.
And this makes sense. There are three motivators for normalized databases: reducing storage space, reducing processing required to find multiple instances of data, reducing number of places to update when something changes. The first two are hardware related. Computers and disks are ridiculously fast and cheap right now. For less than $10k you can have a machine that can outpace a half million dollar machine from less than a decade ago.
That leaves having to update multiple places when you change data. This is a technology problem and one that a lot of developers will recognize. It looks eerily similar to what we’ve been doing with caches for a long time. We grab the data, cache it the way we want, then retrieve that and ignore the shiny RDBMS that we have.
The problem with this is the same one you face with updating data in a de-normalized database. The default way is to just timeout and grab data after its old enough that we consider it stale. It’s rudimentary, but works. The next step is having your data layer smart enough to expire the cache itself when it changes. Taking that further you end up with code that expires and primes the cache whenever there’s an update.
Any route you take with caching, you end up with two data layers, the raw, relational database and the cached views of it. So it seems that document databases are the next logical step. Instead of creating this layer on top of you data to cache it, just store it as blob of data.
The one thing I haven’t seen yet with these document-based systems is some sort of trigger mechanism in place, though, to make changes in one place ripple through the system. That can exist at the database level or the ORM level. It just has to be transparent to the developer.
Another interesting approach is what MySQL is planning for MySQL 6. They’ll have communication with memcached from within MySQL via a user-defined function (UDF). There’s already a project on forge.mysql.com that brings this functionality, but its still in alpha. Something like this coupled with the use of triggers could address these issues completely by making the MySQL server simply a data storage and backup mechanism that you only touch during development.