SQLite
SQLite is one of the most versatile data tools to start a project with. It has been used forever by many projects, but with the plethora of available SQL databases, it is often overshadowed by larger SQL server products. In fact, these tools are not very comparable other than that they share a common dialect and basic functions. They are not competitors, but they can accomplish some of the same things, and the administrative and mental overhead for SQLite is so tiny, it allows me to focus energy and brainstorming on the immediate project goals instead of secondary ones.
SQLite's in-memory feature is a convenient and performant way to do unit testing. Using a sterilized test database allows you to check both expected behavior and database state without needing to create and establish a separate local service. If test are run asynchronously, such as with Rust and cargo
, it can even be efficient to create a new, untouched database for each unit test, completely isolating the data from being influenced by any other factors.
Using SQLite instead of large JSON, CSV, or TSV files has many advantages, and if your source data is in a flat file, it can be probably be easily imported into SQLite with its native tools. If you need to migrate data to a different service or even a different language, using a structured database that is supported in almost all languages will be easier and less error prone than dealing with data files. Sometimes it's also necessary to dig deeper into the results of your test data, in which case I'd much rather work with SQL than trying to parse fields or deal with different language-specific JSON or CSV libraries.
The sqlite3
command can give immediate insight independent of the project code. If you have an in-memory test database that you want to inspect after testing, just change the open call from in-memory to a file of your choosing, and the data can be analyzed or easily shared with others.
Query on Wikipedia Data:
bash-5.2$ sqlite3 wikipedia_data.sqlite3
SQLite version 3.43.0 2023-08-24 12:36:59
Enter ".help" for usage hints.
sqlite> .schema --indent
CREATE TABLE IF NOT EXISTS "snip"("uuid" TEXT, "timestamp" TEXT, "name" TEXT, "data" TEXT);
CREATE TABLE snip_index_rs(
term TEXT,
uuid TEXT,
count INTEGER,
positions TEXT
);
sqlite> .timer on
sqlite> select count() from snip_index_rs;
827890
Run Time: real 0.057 user 0.004142 sys 0.024443
sqlite> .mode table
sqlite> SELECT snip_index_rs.uuid, snip.name, snip_index_rs.count FROM snip
...> INNER JOIN snip_index_rs ON snip_index_rs.uuid = snip.uuid
...> WHERE snip_index_rs.term = 'plato' ORDER BY count DESC
...> LIMIT 10;
+--------------------------------------+--------------------------+-------+
| uuid | name | count |
+--------------------------------------+--------------------------+-------+
| 1d90add1-f8eb-424a-91f5-bcd6e25c97a2 | Aristotle | 49 |
| d387a5cd-7eff-4074-99f9-01dadd58e9d0 | Agathon | 17 |
| 49a1cd85-58e8-4d7d-813f-fc327f8c5eca | Aristophanes | 13 |
| 6d2c2d64-4899-4429-8e8a-3612a5e9c251 | Allegory | 9 |
| c1c62783-0ff5-4b06-b405-e5e934cec471 | Ammonius Hermiae | 8 |
| 04ac2bf7-15f0-4b67-adbc-1e623d1a3efb | Anaxagoras | 7 |
| 716a0cae-b40f-4ea1-916b-fbfdb0f21229 | Anaximenes of Miletus | 7 |
| 24395df7-014e-4ccc-9fa3-fd557b460545 | Aldine Press | 6 |
| 71a6c060-a941-4b0d-86e1-ea076f6894ee | Ammonius Saccas | 5 |
| f966a11f-839f-4fd2-a7bf-ca742e709e12 | Art | 4 |
+--------------------------------------+--------------------------+-------+
Run Time: real 0.114 user 0.054661 sys 0.020001
sqlite>
There are also many novel and curious use cases that people have discovered for SQLite, including serving small static files on a web server faster than from the filesystem. I even had a colleague populate a database of around 40GB and still had acceptable performance.
Richard Hipp, the original author, is an enthusiastic advocate for the software, and effectively argues for its increased use. He does a very energetic presentation on the history of SQLite. His energy and passion are rather contagious. It's also a reminder, similar to Linux, that if a simple personal project is useful enough, it can become one of the most popular pieces of software ever deployed.