BAM Query Index (qidx)
Warning: this is a work in progress
qidx is a tool for indexing BAM alignments by query name. While
samtools can sort data by query name (also called
the read name), htslib does not provide built-in utilities
to retrieve alignments by query name. Such retrieval is useful
for examining multi-mapped alignments.
An earlier utility, bri, also
indexes BAM files by query name. However, it reads all alignments into memory,
which is impractical for most human genome data.
Notes:
- Currently, qidx is very inefficient in terms of disk space. When indexing a 33GiB BAM file (Illumina 35x), the index takes up 22GiB on disk when using ZSTD compression. It initially maps 1.2TiB into memory. This is reduced to ~120GiB due to file holes. ZSTD block compression again reduces this to 22GiB. When I get a chance, I hope to look into this further.
- qidx creates a disk-backed hashset using a sparse memory-mapped file. The underlying operating system must support mmap and file holes.
- qidx doesn't currently support compression itself. For now, it is recommended to use block-level compression (such as zfs zstd compression).
- The BAM file must be sorted by query name (samtools sort -n) before the index is built.
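
To illustrate the "disk-backed hashset using a sparse memory-mapped file" note above, here is a minimal Python sketch. It is not qidx's actual on-disk layout or code. The slot size, table size, hash function, and all function names are hypothetical, chosen only for illustration. The sketch stores an 8-byte fingerprint of each query name rather than the name itself, so it can in principle report false positives. Empty slots are all-zero bytes, which is what an unwritten region (a file hole) reads back as.

```python
import hashlib
import mmap
import os
import tempfile

SLOT_SIZE = 8            # hypothetical: one 8-byte fingerprint per slot
NUM_SLOTS = 1 << 20      # hypothetical: 8 MiB apparent file size
EMPTY = bytes(SLOT_SIZE) # a file hole reads back as zeros, so zeros mean "empty"


def open_sparse_table(path):
    """Create a file of the full table size and map it into memory.

    ftruncate only sets the length; on filesystems that support holes,
    no blocks are allocated until a page is actually written.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, NUM_SLOTS * SLOT_SIZE)
        return mmap.mmap(fd, NUM_SLOTS * SLOT_SIZE)
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


def _fingerprint(name):
    fp = hashlib.blake2b(name, digest_size=SLOT_SIZE).digest()
    # Reserve the all-zero value for empty slots.
    return fp if fp != EMPTY else b"\x01" * SLOT_SIZE


def _probe(table, name):
    """Linear probing: return (offset, fingerprint, found)."""
    fp = _fingerprint(name)
    slot = int.from_bytes(fp, "little") % NUM_SLOTS
    for _ in range(NUM_SLOTS):
        off = slot * SLOT_SIZE
        cur = table[off:off + SLOT_SIZE]
        if cur == fp or cur == EMPTY:
            return off, fp, cur == fp
        slot = (slot + 1) % NUM_SLOTS
    raise RuntimeError("hash table is full")


def add(table, name):
    off, fp, found = _probe(table, name)
    if not found:
        table[off:off + SLOT_SIZE] = fp


def contains(table, name):
    return _probe(table, name)[2]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "names.idx")
        table = open_sparse_table(path)
        add(table, b"read/1")
        print(contains(table, b"read/1"), contains(table, b"read/2"))
        table.flush()
        st = os.stat(path)
        # Apparent size vs. allocated blocks (st_blocks is not available on all platforms).
        print(st.st_size, getattr(st, "st_blocks", None))
        table.close()
```

Because hashed names land on scattered pages, a large table ends up touching many pages even when few slots are used. This is one plausible reason a sparse file can still occupy far more disk than the data it holds, and why block-level compression of the mostly-zero pages helps.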
