Improving database performance

Author

Published

December 20, 2020

I am a bit of a DB n00b but for a project I had to setup and query an rather larger sqlite database from an existing dump and schema. Here are a couple if things I leaned after asking for help.

/dev/shm/

I had no idea this magical place existed in Unix systems! Basically it’s like /tmp but in RAM so everything happens much faster. This meant that inserting tables with several GB went down from over a day to about an hour - the final size of the DB was 50GB.

Another trick was to use TRANSACTION before INSERT:

BEGIN TRANSACTION;
.separator "\t"
.import protein.tsv protein
.import genome.tsv genome
.import protein_info.tsv protein_info
COMMIT;

Whilst the goal of transactions is to keep data consistent, it also seems to have an effect on the speed of transactions. I am not an expert, quite far from it, all I know is that it definitely helped.

Index tables

If the tricks about were helpful in speeding up db creation, I then had to contend with fairly complex queries in a large db - plenty of merges going on. INDEX came to the rescue:

CREATE INDEX seqid_index ON protein_info(seqid); /*speed up queries*/

I didn’t index all the tables, just a few key ones, but I guess it wouldn’t harm to have done it for all.

Final words

I would still be stuck in the first step if I had not asked for help from a couple of colleagues that are more experienced in databases. One doesn’t have to know everything, but it is good to know persons that know.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{domingues2020,
  author = {Domingues, António},
  title = {Improving Database Performance},
  date = {2020-12-20},
  url = {https://amjdomingues.com/posts/2020-12-20-db-hacks/},
  langid = {en}
}

For attribution, please cite this work as:

Domingues, António. 2020. “Improving Database Performance.” December 20, 2020. https://amjdomingues.com/posts/2020-12-20-db-hacks/.