Speeding up tests which talk to the Cassandra database
In 2011 I wrote a blog post on speeding up Django and Twisted tests titled Making Django and Twisted tests faster.
Today I’m going to show how to speed up tests which talk to the Cassandra database. Speedups will be achieved simply by tweaking the Cassandra configuration file.
Speeding up tests
There are many ways to speed up your application tests. The most common ones include parallelizing the tests and updating them so they don’t touch the disk (hey, disk is slow!).
How hard it is to implement those things depends on your application architecture, the programming language and algorithms used, and so on. In an ideal world you would use Erlang and all your problems would be embarrassingly parallel. Sadly, in many (most?) cases this is not true.
If your application and tests weren’t built with parallelization in mind, making your tests run in parallel will be very hard, and in many cases it’s not even worth the effort.
Today I’m going to ignore parallelization for a moment and focus on how to speed up tests which talk to the Cassandra database. I’ll focus on how to do this by simply tweaking the Cassandra configuration file.
The reason I’m focusing on this approach is that it takes very little effort and has the potential to offer substantial speedups (aka it offers the most bang for the buck).
Keep in mind that the same general approach also applies to other databases. If you Google around you can find many articles which show how to do that for MySQL, PostgreSQL and so on.
Speeding up tests which talk to Cassandra
This is a very generic guide for speeding up the tests. The actual speedup depends on many factors, and in some cases it will be very small or nonexistent (YMMV).
Some of the factors which affect the speedup are:
- how many writes and reads your tests perform
- amount of memory available to Cassandra
- memtable flush setting
- storage device used for sstable files
- whether your Cassandra process is long running or you spin up a new instance for every test run
1. Disable commit log
Cassandra provides write durability by appending writes to a commit log.
Depending on the commitlog_sync option, the commit log is then synced to disk either periodically (every 10 seconds by default) or in batch mode. In batch mode, Cassandra waits to acknowledge writes until the commit log has been fully flushed (fsynced) to disk.
The first and simplest way to speed things up is to disable the commit log. This can be done on a per-keyspace basis by setting the durable_writes option to false on the keyspace your tests use.
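As a minimal sketch of the durable_writes option, the Python helper below builds a CQL statement that creates a keyspace with the commit log disabled. The helper name, the test_ks keyspace name and the SimpleStrategy replication settings are assumptions made for illustration; the resulting statement would be executed with whatever Cassandra client your tests already use.

```python
def create_test_keyspace_cql(keyspace, durable_writes=False):
    """Build a CREATE KEYSPACE statement for a single-node test cluster."""
    # Hypothetical helper: SimpleStrategy with replication_factor 1 is an
    # assumption that suits a single local test node.
    replication = "{'class': 'SimpleStrategy', 'replication_factor': 1}"
    # CQL expects a lowercase boolean literal (true / false).
    flag = "true" if durable_writes else "false"
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {flag};"
    )


# Print the statement so it can be inspected or pasted into a CQL shell.
print(create_test_keyspace_cql("test_ks"))
```

Writes to a keyspace created this way skip the commit log, so this only makes sense for throwaway test data.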
2. Disable periodic saving of cache to disk
The second way to speed things up is to disable periodic saving of the key and row caches to disk.
This can be achieved by setting the key_cache_save_period and row_cache_save_period options to 0.
As noted above, your mileage may vary. If you are spinning up a new instance of Cassandra for every test run and your tests don’t run for a very long time (by default key cache is written to disk every 4 hours), this setting won’t bring you any noticeable speedups.
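The two cache options from this step could be set with a small script that is run before the test instance starts. The sketch below is one way to do it using only the standard library: it rewrites simple top-level "key: value" lines in the config text and appends any option that is missing. The function name and the sample config text are made up for illustration, and the approach assumes the options are uncommented, top-level scalars.

```python
import re


def override_top_level_options(config_text, overrides):
    """Return config_text with the given top-level scalar options replaced."""
    remaining = dict(overrides)
    out = []
    for line in config_text.splitlines():
        # Only match uncommented keys that start at column 0.
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            out.append(f"{key}: {remaining.pop(key)}")
        else:
            out.append(line)
    # Options that were not present in the file are appended at the end.
    for key, value in remaining.items():
        out.append(f"{key}: {value}")
    return "\n".join(out) + "\n"


# Illustrative sample only, not a complete Cassandra config file.
sample = (
    "cluster_name: 'Test Cluster'\n"
    "key_cache_save_period: 14400\n"
)
print(override_top_level_options(
    sample,
    {"key_cache_save_period": 0, "row_cache_save_period": 0},
), end="")
```

A line-based rewrite like this keeps the comments in the file intact, which a load-and-dump cycle through a YAML library typically would not.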
3. Use a ram disk for the data directory
The last and probably the most well known option is telling your database to write data to RAM instead of a hard drive. Cassandra doesn’t allow you to fully turn off memtable flushing to sstables on disk, so the way to achieve this is to put the data directory on a ram drive.
On a Linux distribution, you can create a ram drive by creating a mount point and mounting a tmpfs filesystem on it.
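As a rough sketch, the commands for a tmpfs-backed ram drive can be assembled like this. The choice of tmpfs, the 512m size and the helper name are assumptions for illustration; the mount command needs root privileges, so the example only prints the commands instead of running them.

```python
import shlex


def ramdisk_commands(mount_point="/tmp/ramdisk", size="512m"):
    """Return the commands that create and mount a tmpfs ram drive."""
    return [
        # Create the mount point if it does not exist yet.
        ["mkdir", "-p", mount_point],
        # Mount a RAM-backed tmpfs filesystem of the given size on it.
        ["mount", "-t", "tmpfs", "-o", f"size={size}", "tmpfs", mount_point],
    ]


# Print the commands; to execute them, each list could be passed to
# subprocess.run(cmd, check=True) from a process running as root.
for cmd in ramdisk_commands():
    print(shlex.join(cmd))
```

Everything stored on the ram drive is lost on unmount or reboot, which is usually fine for test data.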
After the ram drive has been created, update your Cassandra config and redirect all writes to a directory in /tmp/ramdisk/. This can be achieved by updating the following options:
- data_file_directories
- commitlog_directory
- saved_caches_directory
If you have followed the first two steps, updating the last two options is not necessary. They are included here for completeness.
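To make the three directory options concrete, the sketch below renders a config fragment that points them at the ram drive. The subdirectory names (data, commitlog, saved_caches) and both function names are assumptions for illustration; note that data_file_directories takes a list of paths, while the other two options take a single path.

```python
from pathlib import PurePosixPath


def ramdisk_directory_settings(ramdisk_root="/tmp/ramdisk"):
    """Map the directory options to paths under the ram drive."""
    root = PurePosixPath(ramdisk_root)
    return {
        # data_file_directories is a list, the other two are single paths.
        "data_file_directories": [str(root / "data")],
        "commitlog_directory": str(root / "commitlog"),
        "saved_caches_directory": str(root / "saved_caches"),
    }


def to_yaml_fragment(settings):
    """Render the settings as a YAML fragment (lists as indented items)."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"    - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(to_yaml_fragment(ramdisk_directory_settings()), end="")
```

The printed fragment can be merged into the config file by hand or by the script that prepares the test instance.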