fbpx
Choosing SQLite as a database (2/3)

In the first post, we have shortly described how SQLite differs from other databases. We have also introduced a real scenario in which we had to operate by choosing proper database engine and defined, for this particular scenario, a number of problems with using SQLite. YOU CAN FIND THE FIRST POST HERE.

One of the main problems was weak support for concurrency in SQLite. However, we have mentioned that it can be sometimes overcame by using Write-Ahead Logging (WAL) mechanism. This post is written with a goal to help you understand what WAL is and how to set up it properly.

Write speed and “Write-Ahead Logging”

When one queues multiple rows of data to be inserted sequentially into a single table, the main weakness of SQLite will occur: inserting speed in normal mode. As our tests showed, a single insert operation for our case lasted little above 100ms, regardless of table size. In our scenario, the data was arriving from worker processes in chunks, every process sending its statistics every 30 seconds. When data arrive from several processes at the same time, the combined time of inserting all of those data exceeds database ability to write, creating a bottleneck that impacts other functions of the application.

Fortunately, SQLite has a built-in solution for this, and it is called Write-Ahead Logging (“WAL” for short). There are three WAL modes, each of them pushing the time of writing data to hard disk little further:

– normal mode creates a copy of a table, then it writes data directly to a file as soon as the data is inserted to the table (potential rollback restores a copy of a table);

– WAL mode with synchronous=FULL writes data to a separate journal and waits until it synchronize write operation;

– WAL mode with synchronous=NORMAL writes data to a journal, but synchronize less often than FULL mode;

– WAL mode with synchronous=OFF writes data to a journal and never synchronize writes.

Inserting data in WAL modes can be two magnitudes faster than working in normal mode:

On the chart above we show median times needed to insert a single row to a table related to a table size. In standard mode, a single insert time oscillates around 110ms, regardless of table size. Other modes are visibly faster – there is a 60% drop in time between normal mode and WAL synchronous=FULL and a 99% drop in time between normal mode and WAL synchronous=OFF. For comparison, MariaDB is almost as fast as SQLite in WAL synchronous=OFF mode. Of course, using WAL comes with other costs – there are two main trade-offs of using WAL modes.

WAL mode and data integrity

The first trade-off of using WAL mode is that inserted data is not immediately written to a table, and it is delayed for as long as there is another type of operation called, namely checkpointing. Checkpointing happens as soon as the WAL journal grows, by default, to 1000 pages (about 4 MB). Then, all content of the journal is applied to an original table, and the journal is cleared. For a table with a low volume of operations, it could take a long time before data is synchronized with original table!

Fortunately, there are tools to tweak this:
– auto checkpoint limit can be changed or even turned off,
– checkpointing operation can be called directly, for example in idle time.

It should be noted that when using WAL mode we have to make good use of a checkpointing mechanism, and tweak it to our needs.

WAL mode and database integrity

The second trade-off of using WAL mode is the reduced integrity of the database. In the normal mode of using SQLite, there is just a negligible chance of database corruption. All operations are fully synchronized with their files and in a case of power loss during writing to a file, we only lose the last transaction, at most. But in WAL mode there is a much higher risk for losing data, or even for whole database corruption. As expected, different modes guarantee different levels of integrity:
– WAL mode with synchronous=FULL – can lose last transaction only,
– WAL mode with synchronous=NORMAL – can lose all transactions since last synchronization,
– WAL mode with synchronous=OFF – can lose all data since the last checkpoint, there is also a possibility of corrupting the whole database and rendering it unusable (losing all data).

This integrity trade-off is directly at odds with our requirement of reliability for configuration data. On the other hand, statistical data need speed more than reliability. Moreover, we cannot set different modes for different tables, modes are set for the whole database.

Our solution for this conundrum was simple: as a database in SQLite is a simple file, we can have as many databases as the number of files that we can open. Therefore, we are using one database in durable normal mode for configuration data and another database for statistical data, operating in WAL mode synchronous=FULL – as it is fast enough for our use case and potentially losing a single last transaction is not such a big problem.

Summary

As we can see, using WAL mode can greatly improve the performance of writing operations to the SQLite database, but choosing proper WAL mode is not trivial and comes with trade-offs that have to be carefully analyzed.

If you are interested in how SQLite read operations perform comparing to other databases and how indexing of a table influences this performance, check our next post!

Let us know, what do you think or share with us your knowledge: