Michael Stonebraker came to talk to our three-day lab course on SQL programming. He’s a great example of why the research university should not be shut down precipitously. Here’s what I learned from him about the current state of database management systems.
For transaction processing (“Small Data”), the standard RDBMS variants work pretty well as long as the database fits into RAM (1 TB is a reasonable size) and as long as you’re not trying to do more than about 1,000 updates per second.
For high-volume transaction processing you need a new architecture where everything is in RAM, which means that transactions happen fast enough that the system doesn’t need to juggle 100 of them simultaneously, in various stages of completion. How is the Durability part of the ACID test met if everything is in RAM? Stonebraker points out that you need to have real-time failover anyway, so why not let the failover system give you the D in ACID? If that’s not good enough, put an uninterruptible power supply behind both servers. It turns out that customers don’t actually trust this, so everyone takes a 5 percent performance hit and logs transaction requests to an SSD. (The log records the commands that the DBMS was told to execute, not the old and new states of the data blocks that those commands changed.) Stonebraker has drawers full of companies that he has started for every possible database management challenge, and his personal solution in this area is VoltDB.
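To make the "log the requests, not the blocks" idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how VoltDB or any real DBMS is implemented: the `ToyStore` class, the two command types, and the JSON-lines log format are all invented for illustration. The point is only that if commands are applied one at a time and deterministically, replaying the request log rebuilds the same in-RAM state.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical single-threaded in-memory store with a command log."""

    KNOWN_OPS = ("deposit", "transfer")

    def __init__(self, log_path):
        self.log_path = log_path
        self.balances = {}  # all data lives in RAM

    def _apply(self, cmd):
        """Apply one command deterministically; return True if it took effect."""
        if cmd["op"] == "deposit":
            acct = cmd["account"]
            self.balances[acct] = self.balances.get(acct, 0) + cmd["amount"]
            return True
        # Otherwise it is a transfer (execute() validated the op name).
        if self.balances.get(cmd["src"], 0) < cmd["amount"]:
            return False  # rejected the same way on every replay
        self.balances[cmd["src"]] -= cmd["amount"]
        self.balances[cmd["dst"]] = self.balances.get(cmd["dst"], 0) + cmd["amount"]
        return True

    def execute(self, cmd):
        """Log the request durably, then run it to completion before the next one."""
        if cmd["op"] not in self.KNOWN_OPS:
            raise ValueError("unknown op: " + str(cmd["op"]))
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(cmd) + "\n")  # the request itself, not data blocks
            log.flush()
            os.fsync(log.fileno())  # force the bytes to stable storage
        return self._apply(cmd)

    @classmethod
    def recover(cls, log_path):
        """Rebuild the in-RAM state by replaying every logged request in order."""
        store = cls(log_path)
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                store._apply(json.loads(line))
        return store


def demo():
    """Run three requests, then recover a fresh store from the log alone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commands.log")
        store = ToyStore(path)
        store.execute({"op": "deposit", "account": "alice", "amount": 100})
        store.execute({"op": "transfer", "src": "alice", "dst": "bob", "amount": 30})
        # Bob cannot cover this one, so it is rejected, live and on replay alike.
        store.execute({"op": "transfer", "src": "bob", "dst": "alice", "amount": 500})
        recovered = ToyStore.recover(path)
        return store.balances, recovered.balances
```

Calling `demo()` returns the live balances and the balances rebuilt from the log, and the two should match. Opening and syncing the file on every request is the simplest thing that works here; a real system would presumably batch those writes.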
For traditional “business intelligence” or “data warehousing” queries, the column-oriented shared-nothing DBMSes such as Vertica (yet another Stonebraker-founded company, sold to HP) end up being 50X faster than row-oriented DBMSes (e.g., Oracle, MySQL). Why? The database is too big to fit into RAM and you’re usually interested in only about 1/50th of the columns in any one query. Thus the system needs to fetch and scan only about 1/50th as much data from disk as a row-oriented DBMS would.
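The 1/50th arithmetic can be sketched in a few lines of Python. This is a toy model, assuming a table with exactly 50 equal-width columns and a query that aggregates just one of them; the function names and the "values touched" counter are invented for illustration and stand in for bytes read from disk. Real systems also gain from compression and lose from other overheads that this ignores.

```python
NUM_COLUMNS = 50
NUM_ROWS = 1000

# Row-oriented layout: each record is stored contiguously.
rows = [tuple(r * NUM_COLUMNS + c for c in range(NUM_COLUMNS))
        for r in range(NUM_ROWS)]

# Column-oriented layout: each column is stored contiguously.
columns = [[row[c] for row in rows] for c in range(NUM_COLUMNS)]


def sum_from_row_store(rows, c):
    """Sum column c; every whole row has to be fetched to reach one field."""
    total = 0
    touched = 0
    for row in rows:
        touched += len(row)  # the entire record comes off disk
        total += row[c]
    return total, touched


def sum_from_column_store(columns, c):
    """Sum column c; only that column's values are fetched."""
    col = columns[c]
    return sum(col), len(col)


row_total, row_touched = sum_from_row_store(rows, 7)
col_total, col_touched = sum_from_column_store(columns, 7)
ratio = row_touched // col_touched  # equals NUM_COLUMNS in this toy model
```

Both layouts produce the same sum, but the row layout touches 50 times as many values to get it, which is where the rough 50X figure comes from in this simplified picture.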
What customers want, and what doesn’t exist right now, is a DBMS to handle Big Data and Big Analytics. This will be an “Array DBMS” and it will be good for machine learning, clustering, trend detection, etc. The Array DBMS will be good at handling the common “inner loops” of “Big Analytics” such as matrix multiply, QR decomposition, singular value decomposition (SVD), and linear regression. Somehow I have a feeling that this might be Stonebraker’s next company!
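For readers who haven’t seen these “inner loops” spelled out, here is a small pure-Python sketch of two of them: matrix multiply, and a straight-line linear regression built on top of it. It is illustrative only. The helper names are mine, the regression uses the textbook normal equations for a two-parameter fit (not QR or SVD, which a serious implementation would likely prefer for numerical stability), and nothing here reflects how an actual Array DBMS would do the work.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m) with the classic triple loop."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x via the normal equations.

    Solves (X^T X) beta = X^T y, where each row of X is [1, x].
    The 2 x 2 system is solved directly with Cramer's rule.
    """
    x_mat = [[1, x] for x in xs]
    y_mat = [[y] for y in ys]
    xt = transpose(x_mat)
    xtx = matmul(xt, x_mat)  # 2 x 2
    xty = matmul(xt, y_mat)  # 2 x 1
    (a, b), (c, d) = xtx
    e, f = xty[0][0], xty[1][0]
    det = a * d - b * c
    if det == 0:
        raise ValueError("x values must not all be identical")
    return (d * e - b * f) / det, (a * f - c * e) / det


# Points lying exactly on y = 1 + 2x, so the fit should recover those values.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Even this tiny regression spends nearly all its time inside `matmul`, which is the sense in which these operations are the inner loops that an analytics-oriented DBMS would want to run fast and close to the data.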