Improving database performance
I am a bit of a DB n00b but for a project I had to setup and query an rather larger sqlite
database from an existing dump and schema. Here are a couple if things I leaned after asking for help.
/dev/shm/
I had no idea this magical place existed in Unix systems! Basically it’s like /tmp
but in RAM so everything happens much faster. This meant that inserting tables with several GB went down from over a day to about an hour - the final size of the DB was 50GB.
Another trick was to use TRANSACTION
before INSERT
:
BEGIN TRANSACTION;
"\t"
.separator
.import protein.tsv protein
.import genome.tsv genome
.import protein_info.tsv protein_infoCOMMIT;
Whilst the goal of transactions is to keep data consistent, it also seems to have an effect on the speed of transactions. I am not an expert, quite far from it, all I know is that it definitely helped.
Index tables
If the tricks about were helpful in speeding up db creation, I then had to contend with fairly complex queries in a large db - plenty of merges going on. INDEX
came to the rescue:
CREATE INDEX seqid_index ON protein_info(seqid); /*speed up queries*/
I didn’t index all the tables, just a few key ones, but I guess it wouldn’t harm to have done it for all.
Final words
I would still be stuck in the first step if I had not asked for help from a couple of colleagues that are more experienced in databases. One doesn’t have to know everything, but it is good to know persons that know.
Reuse
Citation
@online{domingues2020,
author = {Domingues, António},
title = {Improving Database Performance},
date = {2020-12-20},
url = {https://amjdomingues.com/posts/2020-12-20-db-hacks/},
langid = {en}
}