Nerds: What do we tell MITers about cloud hosting of RDBMS?

If you don’t wear your phone on your belt you can probably stop reading right here…

We’re trying to give our students at MIT (and anyone else who wants to join, Jan 26-28) a little real-world talk about hosting an RDBMS in our modern virtualized, cloud-based, containerized world. Here’s an outline so far and comments/real-world experience would be appreciated. Thanks in advance.

High-performance Hosting of RDBMS in a virtualized and containerized world

Traditional hosting: One computer, enough RAM to hold all data regularly accessed, lots of separate physical disk drives, e.g., one for the table, a different one for each index on the table, one for the logs, etc. Thus an update that involves multiple disk writes can be done in the time consumed by single disk seek and write. Downside: Moving to bigger computer means exporting data, importing (could take a weekend or longer) or using more DBMS admin skills.

Virtual hosting: This is what you’re doing right now. The virtual machine (VM) has one “disk”, which on my desktop computer right now is C:\cygwin\home\Philip\VirtualBox VMs\three-day-rdbms_default_1421369255990_87357.vmdk (4.4 GB file). Downside: slow; Upside: easy to move to bigger computer. (How slow? VMware did a study with, presumably, the best-tuned virtual machine ever to run Oracle, and found that Oracle RDBMS throughput dropped about 20 percent compared to native hardware. More typical is a friend’s ecommerce site where the VM was configured to use a maximum of 16 GB of RAM and installed by itself on a computer with 32 GB of RAM.)

Virtual hosting, real-world server: in a high-load installation the VM could have multiple files assigned, each of which was on a separate physical hard drive, would then look like a separate disk to the VM. Could even give the VM actual physical disks, but then moving to a new computer isn’t so straightforward (what if the new machine has a different number of disks?)

Idiot-proof yet modern and hip: Amazon RDS! Pick a size, a type of storage (disk or flash memory (SSD)), and an RDBMS, then let Amazon keep it all running, upgraded, backed up, etc. Amazon can also set up the system for automatic failover to a hot standby system that has been reading the transaction logs of the production server. Amazon currently supports MySQL, Oracle, Microsoft SQL Server, Postgres, and … Aurora (November 2014 innovation from Amazon, a “drop-in replacement for MySQL”). Note that Google has a competitive service, though limited to MySQL.

Modern, hip, and like an Asian martial arts weapon (i.e., you are more likely to hurt yourself than an opponent): Containerization (see docker.com). Instead of running MySQL as an operating system process (“Traditional hosting”) or in a virtual machine that is larded up with a full copy of Unix you run MySQL in a “container” that is easy to move from one computer to another but smaller and faster to start up than a VM. What about storage? If you’re a true containerization believer then MySQL can use a separate “storage container” when it wants to write to a hard drive. How does this work out when people try it? We did a quick Google search and found “When I run the benchmark on the native machine, I get 779.5 transactions per second. When I run the benchmark in a mysql container, I get 336 transactions/second.”

Philip Greenspun’s Weblog

Leave a Reply

Your email address will not be published. Required fields are marked *