On ext3, MySQL and the impact it has on Second Life
All the big databases behind Second Life run on MySQL. Linden Lab operates them on the premise that databases are ordinary: better to run 50 small ones than a single big one. Choosing a database engine is one thing; how you install and run it is another.
A big factor for the whole data system is of course the operating system (Linden Lab runs Linux) and the underlying file system. According to the SL history wiki, all of Linden Lab’s database servers use ext3 as their default filesystem, after using ReiserFS 3 for a while and evaluating XFS. Ext3 is a really bad choice if you need the best performance your hardware can give.
Well, why is that? There are several reasons. There’s an interesting IRC log of MySQL employee Kristian Köhntopp, who is quite well known for his articles on computing topics. The log is about which file system to choose for a database server in general, but his views apply just as well to the databases powering Second Life.
So what’s wrong with ext3 as a filesystem for a database server according to Mr. Köhntopp, and what’s fine about it? Several things:
- The number of files in a directory no longer really matters on ext3, compared with filesystems like XFS, provided the ext3 filesystem was created with the dir_index option.
- A big disadvantage is that ext3 flushes its log quite irregularly. Meaning: the execution times of certain MySQL queries can vary quite a lot from one run to the next.
- Another disadvantage is that ext3 does not perform very well when many concurrent clients, somewhere between 10 and 50, are reading and writing. With only a single thread, ext3 is mostly expected to be faster than XFS. But with many concurrent clients, and that’s certainly what we have in Second Life, XFS beats ext3 hands down.
- In contrast to ext3, XFS has much better flush times: they are more regular, and it is much better at preventing file fragmentation.
- Ext3 produces "block marmalade", meaning files whose blocks end up interleaved with each other, when several files in the same directory grow at the same time; XFS is good at preventing that.
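If you want to see for yourself how regular or irregular flush times are on a given filesystem, a rough sketch like the following may help. It is not from Köhntopp's log and it is not a real benchmark: the function names, the 4 KiB block size and the client counts are my own assumptions. It simply lets several threads grow files in the same directory, calls fsync after every write, and reports how far the slowest write is from the median.

```python
import os
import statistics
import tempfile
import threading
import time


def fsync_latencies(directory, writes=20, block=b"x" * 4096):
    """Append blocks to a fresh file, fsync after each one, return seconds per write."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the data (and journal) out to disk
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def run_clients(directory, clients=10, writes=20):
    """Run several writer threads at once; all files grow in the same directory."""
    results = []
    lock = threading.Lock()

    def worker():
        measured = fsync_latencies(directory, writes)
        with lock:
            results.extend(measured)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def summarize(latencies):
    """Median and worst-case latency in milliseconds."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered) * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Assumption: point this at a directory on the filesystem you want to test.
    with tempfile.TemporaryDirectory() as scratch:
        print(summarize(run_clients(scratch, clients=10, writes=20)))
```

A large gap between median and maximum hints at irregular flushing; to compare filesystems you would run the same script against directories on each of them, on otherwise idle and comparable hardware.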
In conclusion, Köhntopp states that ext2 (which is the base of ext3) reflects the state of the art around 1984. XFS, by contrast, was built on papers from around 1994, meaning it is younger and has a bigger code base. That means XFS might still contain more bugs than ext3, but in features that ext3 doesn’t have at all.
Oh, and by the way: judging by this 2005 blog entry by Mark Linden about the switch back to ext3, he hasn’t really understood what a journaling file system is for. If you take a look at the second mail behind this link, you see what Theodore Ts’o means. Keeping your data intact is not what a journaling file system was made for. It was made to keep the filesystem itself intact.
If you want an intact database after a crash, use an ACID-compliant one, like MySQL’s InnoDB engine.
So what’s to say in conclusion? If Linden Lab is still using only ext3 as the filesystem on all of its database servers, and those servers normally have 10-50 or more concurrent read/write clients, they are denying themselves the speed a decent filesystem could give them and really, really should consider moving to another filesystem like XFS. This would also be one good explanation for why, for example, the asset server is always so damn slow: because the filesystem is slow.
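For anyone who wants to check which filesystem their own MySQL data actually sits on, here is a small sketch. It assumes the Linux /proc/mounts format (device, mount point, filesystem type, options) and picks the longest mount point that contains the given path. The sample mount table and the /var/lib/mysql path are made up for illustration; on a real machine you would pass in the contents of /proc/mounts and your actual data directory.

```python
import posixpath


def filesystem_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`, or None."""
    path = posixpath.normpath(path)
    best = None  # (mount_point, fs_type) of the longest match so far
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Compare with a trailing slash so /var/lib/mysqlx does not match /var/lib/mysql.
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


# Hypothetical mount table, only for demonstration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sdb1 /var/lib/mysql xfs rw,noatime 0 0
"""

if __name__ == "__main__":
    print(filesystem_for("/var/lib/mysql/ibdata1", SAMPLE_MOUNTS))
    print(filesystem_for("/etc/my.cnf", SAMPLE_MOUNTS))
```

On a live system, replacing SAMPLE_MOUNTS with open("/proc/mounts").read() should give the real answer, assuming a Linux host.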