Reorder chromosomes
TLDR
# Important! --bin-size must match the resolution of matrix.cool
user@dev:/tmp hictk load <(hictk dump --join matrix.cool) \
output.cool \
--chrom-sizes=<(hictk dump --table=chroms matrix.cool | sort -k2,2nr) \
--format=bg2 \
--bin-size=1kbp \
--transpose-lower-triangular-pixels
Why is this needed?
Sometimes we want to compare files that use the same reference genome assembly but list chromosomes in a different order (e.g. sorted by size in one file and by name in the other). This is a problem especially when comparing such files visually. This tutorial shows how to convert a file with chromosomes sorted by name into a file with chromosomes sorted by size. The TLDR above uses a .cool file and the walkthrough below uses a .hic file; the procedure is the same for both formats.
Walkthrough
For this tutorial, we will use file 4DNFIOTPSS3L.hic as an example, which can be downloaded from here.
First, we extract the list of chromosomes from the input file:
user@dev:/tmp hictk dump 4DNFIOTPSS3L.hic --table=chroms | tee chrom.sizes
2L 23513712
2R 25286936
3L 28110227
3R 32079331
4 1348131
X 23542271
Y 3667352
Second, we re-order chromosomes:
user@dev:/tmp sort -k2,2nr chrom.sizes | tee chrom.sizes.sorted
3R 32079331
3L 28110227
2R 25286936
X 23542271
2L 23513712
Y 3667352
4 1348131
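The sort key used above can be read as "column 2 only, numeric, descending". A minimal sketch of this step on the table printed above follows; the secondary key on the chromosome name is an assumption added here to make ties deterministic and is not part of the original command:

```shell
# Recreate the chromosome table from the walkthrough (tab-separated name/size).
workdir="$(mktemp -d)"
printf '%s\t%s\n' \
  2L 23513712 \
  2R 25286936 \
  3L 28110227 \
  3R 32079331 \
  4 1348131 \
  X 23542271 \
  Y 3667352 > "$workdir/chrom.sizes"

# -k2,2nr: sort on column 2 only, numerically, largest first.
# -k1,1:   assumption (not in the original command), breaks ties by name.
sort -k2,2nr -k1,1 "$workdir/chrom.sizes" > "$workdir/chrom.sizes.sorted"

cat "$workdir/chrom.sizes.sorted"
```

Any other order can be obtained the same way: the only requirement is that the resulting file lists every chromosome exactly once, in the order the output file should use.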
Next, we dump pixels in bedGraph2 format (see Tips and tricks below for a way to avoid writing this intermediate file):
user@dev:/tmp hictk dump 4DNFIOTPSS3L.hic --join --resolution 1kbp > pixels.bg2
user@dev:/tmp head pixels.bg2
2L 5000 6000 2L 5000 6000 41
2L 5000 6000 2L 6000 7000 126
2L 5000 6000 2L 7000 8000 60
2L 5000 6000 2L 8000 9000 77
2L 5000 6000 2L 9000 10000 97
2L 5000 6000 2L 10000 11000 3
2L 5000 6000 2L 11000 12000 1
2L 5000 6000 2L 12000 13000 66
2L 5000 6000 2L 13000 14000 116
2L 5000 6000 2L 14000 15000 64
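Before loading, it can be worth checking that the dump looks like bedGraph2 at the expected resolution. The sketch below is one possible sanity check, not something hictk requires; it assumes tab-separated columns and uses three pixels copied from the output above as stand-in data:

```shell
# A few pixels copied from the head output above (tab-separated bedGraph2).
workdir="$(mktemp -d)"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  2L 5000 6000 2L 5000 6000 41 \
  2L 5000 6000 2L 6000 7000 126 \
  2L 5000 6000 2L 7000 8000 60 > "$workdir/pixels.bg2"

# Count records that do not look like bg2 pixels at the expected resolution:
# exactly 7 columns, start < end, and bins no wider than bin_size.
# The last bin of a chromosome may be shorter, hence ">" rather than "!=".
bad=$(awk -F '\t' -v bin_size=1000 '
  NF != 7 || $3 <= $2 || $6 <= $5 ||
  $3 - $2 > bin_size || $6 - $5 > bin_size { n++ }
  END { print n + 0 }
' "$workdir/pixels.bg2")

echo "malformed records: $bad"
```

On a real dump, point awk at pixels.bg2 and set bin_size to the resolution passed to hictk dump.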
Then, we load pixels into a new .hic file using the re-ordered chromosome list:
user@dev:/tmp hictk load pixels.bg2 \
output.hic \
--chrom-sizes=chrom.sizes.sorted \
--transpose-lower-triangular-pixels \
--format=bg2 \
--bin-size=1kbp
[2024-09-27 19:00:40.344] [info]: Running hictk v1.0.0-fbdcb591
[2024-09-27 19:00:40.353] [info]: begin loading pixels into a .hic file...
[2024-09-27 19:00:42.504] [info]: preprocessing chunk #1 at 4847310 pixels/s...
[2024-09-27 19:00:45.244] [info]: preprocessing chunk #2 at 3649635 pixels/s...
[2024-09-27 19:00:48.180] [info]: preprocessing chunk #3 at 3407155 pixels/s...
[2024-09-27 19:00:50.616] [info]: preprocessing chunk #4 at 4105090 pixels/s...
[2024-09-27 19:00:53.251] [info]: preprocessing chunk #5 at 3203434 pixels/s...
[2024-09-27 19:00:54.358] [info]: writing header at offset 0
[2024-09-27 19:00:54.358] [info]: begin writing interaction blocks to file "output.hic"...
[2024-09-27 19:00:54.358] [info]: [1000 bp] writing pixels for 3R:3R matrix at offset 171...
[2024-09-27 19:01:01.039] [info]: [1000 bp] written 9571521 pixels for 3R:3R matrix
...
[2024-09-27 19:01:26.831] [info]: [1000 bp] initializing expected value vector
[2024-09-27 19:01:32.649] [info]: [1000 bp] computing expected vector density
[2024-09-27 19:01:32.649] [info]: writing 1 expected value vectors at offset 93720080...
[2024-09-27 19:01:32.649] [info]: writing 0 normalized expected value vectors at offset 93848475...
[2024-09-27 19:01:32.682] [info]: ingested 114355295 interactions (48437845 nnz) in 52.337885908s!
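The --transpose-lower-triangular-pixels flag is needed because changing the chromosome order moves some inter-chromosomal pixels below the diagonal: a pixel between 2L and 3R is upper-triangular when 2L comes first, but lower-triangular once 3R comes first. The sketch below illustrates the idea with awk; the pixels are hypothetical and the script only counts affected records, it does not reproduce what hictk does internally:

```shell
workdir="$(mktemp -d)"

# New chromosome order (sorted by size), as in chrom.sizes.sorted.
printf '%s\t%s\n' 3R 32079331 3L 28110227 2R 25286936 X 23542271 \
  2L 23513712 Y 3667352 4 1348131 > "$workdir/chrom.sizes.sorted"

# Hypothetical pixels, upper-triangular under the ORIGINAL (name-sorted) order.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  2L 5000 6000 2L 6000 7000 126 \
  2L 5000 6000 3R 1000 2000 3 \
  3L 0 1000 X 0 1000 5 \
  4 0 1000 X 0 1000 2 > "$workdir/pixels.bg2"

# First file: remember the rank of each chromosome in the new order.
# Second file: a pixel falls in the lower triangle if chrom1 ranks after chrom2.
flipped=$(awk -F '\t' '
  NR == FNR { rank[$1] = NR; next }
  rank[$1] > rank[$4] { n++ }
  END { print n + 0 }
' "$workdir/chrom.sizes.sorted" "$workdir/pixels.bg2")

echo "pixels that need transposing: $flipped"
```

Here the 2L/3R and 4/X pixels end up below the diagonal under the new order, while intra-chromosomal pixels are unaffected.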
Lastly, we check that chromosomes are properly sorted:
user@dev:/tmp hictk dump output.hic --table=chroms
3R 32079331
3L 28110227
2R 25286936
X 23542271
2L 23513712
Y 3667352
4 1348131
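The visual comparison above can also be automated. The helper below is a hypothetical function (same_chrom_order is not part of hictk) that succeeds when two chromosome tables list the same names in the same order; the two small files stand in for chrom.sizes.sorted and for the table dumped from the new file:

```shell
# same_chrom_order EXPECTED ACTUAL: succeeds if both tables list the same
# chromosome names in the same order (sizes are ignored).
same_chrom_order() {
  expected_names="$(cut -f1 "$1")"
  actual_names="$(cut -f1 "$2")"
  [ "$expected_names" = "$actual_names" ]
}

# Stand-in tables for the demonstration.
workdir="$(mktemp -d)"
printf '%s\t%s\n' 3R 32079331 3L 28110227 4 1348131 > "$workdir/expected.chroms"
printf '%s\t%s\n' 3R 32079331 3L 28110227 4 1348131 > "$workdir/actual.chroms"

if same_chrom_order "$workdir/expected.chroms" "$workdir/actual.chroms"; then
  result="same order"
else
  result="different order"
fi
echo "$result"
```

In the walkthrough, the second argument would be a file holding the output of hictk dump output.hic --table=chroms.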
Tips and tricks
There is one potential problem with the above solution: the size of the intermediate file pixels.bg2, which stores every pixel as uncompressed text and can become very large.
Luckily, we can avoid generating this file altogether by using process substitution (a feature of shells such as bash and zsh):
user@dev:/tmp hictk load <(hictk dump 4DNFIOTPSS3L.hic --join --resolution 1kbp) \
output.hic \
--chrom-sizes=chrom.sizes.sorted \
--transpose-lower-triangular-pixels \
--format=bg2 \
--bin-size=1kbp
Note that hictk still needs to create temporary files in order to load interactions into a new .cool or .hic file.
When processing large files, it is a good idea to specify a custom folder for these temporary files through the --tmpdir flag:
user@dev:/tmp hictk load <(hictk dump 4DNFIOTPSS3L.hic --join --resolution 1kbp) \
output.hic \
--chrom-sizes=chrom.sizes.sorted \
--transpose-lower-triangular-pixels \
--format=bg2 \
--bin-size=1kbp \
--tmpdir=/var/tmp/
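When choosing a folder for --tmpdir, the free space on the underlying file system is usually what matters. The sketch below is one way to check it before starting a long load; the fallback to TMPDIR or /tmp is an assumption made for this example, not hictk behavior:

```shell
# Candidate folder for temporary files (same path as in the example above).
tmpdir="/var/tmp"

# Fall back to the system default when the candidate does not exist or is
# not writable (assumption: TMPDIR or /tmp is an acceptable alternative).
if [ ! -d "$tmpdir" ] || [ ! -w "$tmpdir" ]; then
  tmpdir="${TMPDIR:-/tmp}"
fi

# Free space in KiB on the file system hosting $tmpdir (POSIX df output).
free_kib=$(df -Pk "$tmpdir" | awk 'NR == 2 { print $4 }')

echo "temporary folder: $tmpdir (${free_kib} KiB free)"
```

The selected folder can then be passed to hictk load as --tmpdir="$tmpdir".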
Another option worth considering when working with .hic files is --threads, which can significantly reduce the time required to load interactions into .hic files.