Pixel transformers¶
The transformer library provides a set of common algorithms used to manipulate streams of pixels.
Classes defined in this library take a pair of pixel iterators or PixelSelectors directly and transform and/or aggregate them in different ways.
Common types¶
Coarsening pixels¶
-
template<typename PixelIt>
class CoarsenPixels¶ Class used to coarsen pixels read from a pair of pixel iterators. The underlying sequence of pixels is expected to be sorted by genomic coordinates. Coarsening is performed in a streaming fashion, minimizing the number of pixels kept in memory at any given time.
-
CoarsenPixels(PixelIt first_pixel, PixelIt last_pixel, std::shared_ptr<const BinTable> source_bins, std::size_t factor);¶
Constructor for CoarsenPixels class. first_pixel and last_pixel should be a pair of iterators pointing to the stream of pixels to be coarsened. source_bins is a shared pointer to the bin table to which first_pixel and last_pixel refer. factor should be an integer value greater than 1, and is used to determine the properties of the target_bins BinTable used for coarsening.
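The streaming behaviour described above can be illustrated with a small standalone sketch. It does not use hictk: the `ThinPixel` struct and the `coarsen_sketch` function are hypothetical stand-ins, and the code assumes fixed-size bins on a single chromosome, so that a coarse bin ID is simply the fine bin ID divided by `factor`. Because the input is sorted, only one coarse row needs to be buffered at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a COO pixel (not hictk's ThinPixel type).
struct ThinPixel {
  std::uint64_t bin1_id;
  std::uint64_t bin2_id;
  std::int32_t count;
};

// Coarsen pixels sorted by (bin1_id, bin2_id).
// Assumption: fixed-size bins on a single chromosome, so that
// coarse_id = fine_id / factor.
template <typename PixelIt>
std::vector<ThinPixel> coarsen_sketch(PixelIt first, PixelIt last, std::uint64_t factor) {
  assert(factor > 1);
  std::vector<ThinPixel> coarsened;
  std::map<std::uint64_t, std::int32_t> row;  // coarse bin2_id -> summed count
  std::uint64_t current_row = 0;

  // Emit the buffered coarse row in order of increasing bin2_id.
  const auto flush = [&]() {
    for (const auto& entry : row) {
      coarsened.push_back(ThinPixel{current_row, entry.first, entry.second});
    }
    row.clear();
  };

  for (; first != last; ++first) {
    const std::uint64_t coarse_row = first->bin1_id / factor;
    if (coarse_row != current_row) {
      // Sorted input guarantees the previous coarse row is complete.
      flush();
      current_row = coarse_row;
    }
    row[first->bin2_id / factor] += first->count;
  }
  flush();
  return coarsened;
}
```

With `factor = 2`, pixels `(0, 0)`, `(0, 1)` and `(1, 1)` all fall into coarse pixel `(0, 0)` and their counts are summed. A real bin table also has to account for chromosome boundaries, which this sketch ignores.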
Accessors
BinTable accessors.
Iteration
-
[[nodiscard]] begin() const -> iterator;¶
-
[[nodiscard]] end() const -> iterator;¶
-
[[nodiscard]] cbegin() const -> iterator;¶
-
[[nodiscard]] cend() const -> iterator;¶
Return an InputIterator to traverse the coarsened pixels.
Others
Selecting pixels overlapping with a band around the matrix diagonal¶
-
template<typename PixelIt>
class DiagonalBand¶ Class used to select pixels overlapping with a band around the matrix diagonal.
-
DiagonalBand(PixelIt first_pixel, PixelIt last_pixel, std::uint64_t num_bins);¶
Constructor for DiagonalBand class. first_pixel and last_pixel should be a pair of iterators pointing to the stream of pixels to be processed. num_bins should correspond to the width of the band around the matrix diagonal. As all filtering operations are performed based on bin IDs, this transformer is unsuitable for processing pixels originating from files with bin tables of non-uniform size.
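To make the bin ID based filtering concrete, here is a minimal standalone sketch. `ThinPixel` and `diagonal_band_sketch` are hypothetical names, and the exact boundary condition is an assumption: a pixel is kept when its distance from the diagonal, `bin2_id - bin1_id`, is strictly less than `num_bins`. The sketch also assumes upper-triangular pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a COO pixel (not hictk's ThinPixel type).
struct ThinPixel {
  std::uint64_t bin1_id;
  std::uint64_t bin2_id;
  std::int32_t count;
};

// Keep only the pixels whose distance from the diagonal is below num_bins.
// Assumption: the band boundary is exclusive (distance < num_bins).
template <typename PixelIt>
std::vector<ThinPixel> diagonal_band_sketch(PixelIt first, PixelIt last,
                                            std::uint64_t num_bins) {
  std::vector<ThinPixel> selected;
  std::copy_if(first, last, std::back_inserter(selected),
               [num_bins](const ThinPixel& p) {
                 assert(p.bin1_id <= p.bin2_id);  // upper-triangular input
                 return p.bin2_id - p.bin1_id < num_bins;
               });
  return selected;
}
```

Since the distance is measured in bins rather than base pairs, the result only corresponds to a fixed genomic distance when all bins have the same size, which is why variable-size bin tables are unsuitable.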
Iteration
-
[[nodiscard]] begin() const -> iterator;¶
-
[[nodiscard]] end() const -> iterator;¶
-
[[nodiscard]] cbegin() const -> iterator;¶
-
[[nodiscard]] cend() const -> iterator;¶
Return an InputIterator to traverse the pixels after filtering.
Others
Transforming COO pixels to BG2 pixels¶
-
template<typename PixelIt>
class JoinGenomicCoords¶ Class used to join genomic coordinates onto COO pixels, effectively transforming ThinPixels into Pixels.
-
JoinGenomicCoords(PixelIt first_pixel, PixelIt last_pixel, std::shared_ptr<const BinTable> bins);¶
Constructor for JoinGenomicCoords class. first_pixel and last_pixel should be a pair of iterators pointing to the stream of pixels to be processed. bins is a shared pointer to the bin table to which first_pixel and last_pixel refer.
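The join itself amounts to a bin table lookup for each of the two bin IDs of a pixel. The standalone sketch below illustrates the idea under simplifying assumptions: `UniformBins`, `Bin`, `Pixel`, `ThinPixel` and `join_genomic_coords_sketch` are all hypothetical and unrelated to hictk's own types, and bins are assumed to have a fixed size within each chromosome.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct ThinPixel { std::uint64_t bin1_id; std::uint64_t bin2_id; std::int32_t count; };
struct Bin { std::string chrom; std::uint32_t start; std::uint32_t end; };
struct Pixel { Bin bin1; Bin bin2; std::int32_t count; };

// Hypothetical, simplified bin table with fixed-size bins (not hictk's BinTable).
class UniformBins {
  std::vector<std::string> names_;
  std::vector<std::uint32_t> sizes_;
  std::vector<std::uint64_t> offsets_;  // first bin ID of each chromosome, then the total
  std::uint32_t resolution_;

 public:
  UniformBins(std::vector<std::string> names, std::vector<std::uint32_t> sizes,
              std::uint32_t resolution)
      : names_(std::move(names)), sizes_(std::move(sizes)), resolution_(resolution) {
    assert(resolution_ > 0 && names_.size() == sizes_.size());
    std::uint64_t offset = 0;
    for (const auto size : sizes_) {
      offsets_.push_back(offset);
      offset += (size + resolution_ - 1) / resolution_;  // bins in this chromosome
    }
    offsets_.push_back(offset);
  }

  // Map a genome-wide bin ID to its genomic interval.
  Bin at(std::uint64_t bin_id) const {
    assert(bin_id < offsets_.back());
    // Locate the last chromosome whose first bin ID is <= bin_id.
    const auto it = std::upper_bound(offsets_.begin(), offsets_.end(), bin_id);
    const auto i = static_cast<std::size_t>(it - offsets_.begin()) - 1;
    const auto start = static_cast<std::uint32_t>(bin_id - offsets_[i]) * resolution_;
    const auto end = std::min<std::uint32_t>(start + resolution_, sizes_[i]);
    return {names_[i], start, end};
  }
};

template <typename PixelIt>
std::vector<Pixel> join_genomic_coords_sketch(PixelIt first, PixelIt last,
                                              const UniformBins& bins) {
  std::vector<Pixel> joined;
  for (; first != last; ++first) {
    joined.push_back(Pixel{bins.at(first->bin1_id), bins.at(first->bin2_id), first->count});
  }
  return joined;
}
```

For two chromosomes of 250 bp and 100 bp binned at 100 bp, bin ID 2 maps to the last (truncated) bin of the first chromosome and bin ID 3 to the first bin of the second one.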
Iteration
-
[[nodiscard]] begin() const -> iterator;¶
-
[[nodiscard]] end() const -> iterator;¶
-
[[nodiscard]] cbegin() const -> iterator;¶
-
[[nodiscard]] cend() const -> iterator;¶
Return an InputIterator to traverse the Pixels.
Others
Merging streams of pre-sorted pixels¶
-
template<typename PixelIt>
class PixelMerger¶ Class used to merge streams of pre-sorted pixels, yielding a sequence of unique pixels sorted by their genomic coordinates. Merging is performed in a streaming fashion, minimizing the number of pixels kept in memory at any given time.
Duplicate pixels are aggregated by summing their corresponding interactions. Pixel merging also affects duplicate pixels coming from the same stream.
-
template<typename ItOfPixelIt>
PixelMerger(ItOfPixelIt first_head, ItOfPixelIt last_head, ItOfPixelIt first_tail);¶
Constructors taking either two vectors of InputIterators or pairs of iterators to InputIterators.
The head and tail vectors should contain the iterators pointing to the beginning and end of ThinPixel streams, respectively.
Iteration
-
[[nodiscard]] auto begin() const -> iterator;¶
-
[[nodiscard]] auto end() const noexcept -> iterator;¶
Return an InputIterator to traverse the stream of ThinPixels after merging.
Others
-
[[nodiscard]] auto read_all() const -> std::vector<PixelT>;¶
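The merging behaviour described for PixelMerger (sorted output, with duplicates summed even when they come from the same stream) can be sketched as a k-way merge driven by a min-heap. This is a standalone illustration, not hictk's implementation: `ThinPixel` and `merge_sketch` are hypothetical names, and the function takes two vectors of iterators, analogous to the head and tail vectors mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a COO pixel (not hictk's ThinPixel type).
struct ThinPixel {
  std::uint64_t bin1_id;
  std::uint64_t bin2_id;
  std::int32_t count;
};

// K-way merge of pre-sorted pixel streams; duplicate coordinates are summed.
template <typename PixelIt>
std::vector<ThinPixel> merge_sketch(std::vector<PixelIt> heads,
                                    const std::vector<PixelIt>& tails) {
  assert(heads.size() == tails.size());
  struct Node {
    ThinPixel pixel;
    std::size_t stream;  // index of the stream the pixel was read from
  };
  // "Greater than" comparison turns std::priority_queue into a min-heap.
  const auto greater = [](const Node& a, const Node& b) {
    return std::tie(a.pixel.bin1_id, a.pixel.bin2_id) >
           std::tie(b.pixel.bin1_id, b.pixel.bin2_id);
  };
  std::priority_queue<Node, std::vector<Node>, decltype(greater)> pq(greater);

  // Seed the heap with the first pixel of every non-empty stream.
  for (std::size_t i = 0; i < heads.size(); ++i) {
    if (heads[i] != tails[i]) pq.push(Node{*heads[i]++, i});
  }

  std::vector<ThinPixel> merged;
  while (!pq.empty()) {
    const Node node = pq.top();
    pq.pop();
    if (!merged.empty() && merged.back().bin1_id == node.pixel.bin1_id &&
        merged.back().bin2_id == node.pixel.bin2_id) {
      merged.back().count += node.pixel.count;  // aggregate duplicates
    } else {
      merged.push_back(node.pixel);
    }
    // Refill the heap from the stream that was just consumed.
    PixelIt& it = heads[node.stream];
    if (it != tails[node.stream]) pq.push(Node{*it++, node.stream});
  }
  return merged;
}
```

At any time the heap holds at most one pixel per stream, which is what keeps the memory footprint small. The sketch collects the output in a vector for simplicity, whereas an iterator-based design would yield pixels one at a time.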
Computing common statistics¶
Converting streams of pixels to Arrow Tables¶
-
template<typename PixelIt>
class ToDataFrame¶
-
ToDataFrame(
    PixelIt first_pixel,
    PixelIt last_pixel,
    DataFrameFormat format = DataFrameFormat::COO,
    std::shared_ptr<const BinTable> bins = nullptr,
    QuerySpan span = QuerySpan::upper_triangle,
    bool include_bin_ids = false,
    bool mirror_pixels = true,
    std::size_t chunk_size = 256'000,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
-
ToDataFrame(
    PixelIt first_pixel,
    PixelIt last_pixel,
    std::optional<PixelCoordinates> coord1_,
    std::optional<PixelCoordinates> coord2_ = {},
    DataFrameFormat format = DataFrameFormat::COO,
    std::shared_ptr<const BinTable> bins = nullptr,
    QuerySpan span = QuerySpan::upper_triangle,
    bool include_bin_ids = false,
    bool mirror_pixels = true,
    std::size_t chunk_size = 256'000,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
-
ToDataFrame(
    const PixelSelector &sel,
    PixelIt it,
    DataFrameFormat format = DataFrameFormat::COO,
    std::shared_ptr<const BinTable> bins = nullptr,
    QuerySpan span = QuerySpan::upper_triangle,
    bool include_bin_ids = false,
    std::size_t chunk_size = 256'000,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
Construct an instance of a ToDataFrame converter given a stream of pixels delimited by first_pixel and last_pixel, a DataFrame format and a BinTable. The underlying sequence of pixels is expected to be sorted by genomic coordinates.
The optional argument span determines whether the resulting arrow::Table should contain interactions spanning the upper/lower-triangle or all interactions (regardless of whether they are located above or below the genome-wide matrix diagonal). Note that queries spanning the full matrix or the lower triangle are always more expensive because they involve an additional step where pixels are sorted by their genomic coordinates.
When provided, the diagonal_band_width argument has the same semantics as the num_bins argument from the DiagonalBand constructor.
When fetching interactions with span=full from the cis portion of interaction maps using one of the overloads taking a pair of pixel iterators, users should provide a pair of genomic coordinates to ensure that, if necessary, interactions are correctly mirrored.
-
[[nodiscard]] std::shared_ptr<arrow::Table> operator()();¶
Convert the stream of pixels into an arrow::Table.
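The extra cost of full-matrix queries comes from mirroring followed by sorting, which the following standalone sketch illustrates. It does not use Arrow or hictk: `ThinPixel` and `mirror_and_sort_sketch` are hypothetical, and the input is assumed to hold the upper-triangular pixels of a symmetric cis query. Each off-diagonal pixel is copied to its transposed position, after which the pixels are no longer in order and have to be sorted again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a COO pixel (not hictk's ThinPixel type).
struct ThinPixel {
  std::uint64_t bin1_id;
  std::uint64_t bin2_id;
  std::int32_t count;
};

// Expand upper-triangular pixels of a symmetric (cis) query to the full matrix.
std::vector<ThinPixel> mirror_and_sort_sketch(std::vector<ThinPixel> pixels) {
  const std::size_t num_upper = pixels.size();
  pixels.reserve(2 * num_upper);
  for (std::size_t i = 0; i < num_upper; ++i) {
    const ThinPixel p = pixels[i];  // copy: push_back may invalidate references
    assert(p.bin1_id <= p.bin2_id);
    if (p.bin1_id != p.bin2_id) {
      pixels.push_back(ThinPixel{p.bin2_id, p.bin1_id, p.count});
    }
  }
  // Mirrored pixels were appended out of order, so a sort is required.
  std::sort(pixels.begin(), pixels.end(), [](const ThinPixel& a, const ThinPixel& b) {
    return std::tie(a.bin1_id, a.bin2_id) < std::tie(b.bin1_id, b.bin2_id);
  });
  return pixels;
}
```

Pixels on the diagonal are not duplicated, so three input pixels with two off-diagonal entries produce five output pixels.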
Converting streams of pixels to Eigen Dense Matrices¶
-
template<typename N, typename PixelSelector>
class ToDenseMatrix¶
-
ToDenseMatrix(
    PixelSelector sel,
    N n,
    QuerySpan span = QuerySpan::full,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
-
ToDenseMatrix(
    std::shared_ptr<const PixelSelector> sel,
    N n,
    QuerySpan span = QuerySpan::full,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
Construct an instance of a ToDenseMatrix converter given a PixelSelector object and a count type n.
The optional argument span determines whether the resulting matrix should contain interactions spanning the upper/lower-triangle or all interactions (regardless of whether they are located above or below the genome-wide matrix diagonal). Note that attempting to fetch trans-interactions with span=QuerySpan::lower_triangle will result in an exception being thrown. If you need to fetch trans-interactions from the lower-triangle, consider exchanging the range arguments used to fetch interactions, then transposing the resulting matrix.
When provided, the diagonal_band_width argument has the same semantics as the num_bins argument from the DiagonalBand constructor.
Convert the stream of pixels into a MatrixT.
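The effect of the span argument on a symmetric cis matrix can be illustrated without Eigen. In the standalone sketch below, `ThinPixel`, `Span` and `to_dense_sketch` are hypothetical; the `Span` enumerators merely mirror the QuerySpan values named above, and the matrix is a plain row-major `std::vector`. Input pixels are assumed to be upper-triangular with bin IDs relative to the query.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a COO pixel (not hictk's ThinPixel type).
struct ThinPixel {
  std::uint64_t bin1_id;
  std::uint64_t bin2_id;
  std::int32_t count;
};

// Hypothetical enum mirroring the QuerySpan values named in the text.
enum class Span { upper_triangle, lower_triangle, full };

// Fill a num_bins x num_bins row-major matrix from upper-triangular pixels.
std::vector<std::int32_t> to_dense_sketch(const std::vector<ThinPixel>& pixels,
                                          std::size_t num_bins, Span span) {
  std::vector<std::int32_t> matrix(num_bins * num_bins, 0);
  for (const auto& p : pixels) {
    assert(p.bin1_id <= p.bin2_id && p.bin2_id < num_bins);
    const auto i = static_cast<std::size_t>(p.bin1_id);
    const auto j = static_cast<std::size_t>(p.bin2_id);
    // Entry above (or on) the diagonal.
    if (span != Span::lower_triangle) matrix[i * num_bins + j] = p.count;
    // Mirrored entry below (or on) the diagonal.
    if (span != Span::upper_triangle) matrix[j * num_bins + i] = p.count;
  }
  return matrix;
}
```

The lower-triangle result is the transpose of the upper-triangle one, which is the same relationship exploited by the suggestion to exchange the range arguments and transpose when trans-interactions are needed.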
Converting streams of pixels to Eigen Sparse Matrices¶
-
template<typename N, typename PixelSelector>
class ToSparseMatrix¶
-
ToSparseMatrix(
    PixelSelector sel,
    N n,
    QuerySpan span = QuerySpan::upper_triangle,
    bool minimize_memory_usage = false,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
-
ToSparseMatrix(
    std::shared_ptr<const PixelSelector> sel,
    N n,
    QuerySpan span = QuerySpan::full,
    bool minimize_memory_usage = false,
    std::optional<std::uint64_t> diagonal_band_width = {}
);¶
Construct an instance of a ToSparseMatrix converter given a PixelSelector object and a count type n.
The optional argument span determines whether the resulting matrix should contain interactions spanning the upper/lower-triangle or all interactions (regardless of whether they are located above or below the genome-wide matrix diagonal). Note that attempting to fetch trans-interactions with span=QuerySpan::lower_triangle will result in an exception being thrown. If you need to fetch trans-interactions from the lower-triangle, consider exchanging the range arguments used to fetch interactions, then transposing the resulting matrix.
When minimize_memory_usage=true, hictk will minimize memory usage by doing two passes over the queried pixels: one to calculate the exact number of entries to allocate for each row in the matrix, and a second to fill values in the matrix. This is usually slower than the default strategy, which traverses the data only once (but may overall require more memory than is strictly needed). Note that matrices are always compressed before being returned. Thus, the memory footprint of the matrices returned by ToSparseMatrix::operator()() is the same regardless of the fill strategy.
When provided, the diagonal_band_width argument has the same semantics as the num_bins argument from the DiagonalBand constructor.
Convert the stream of pixels into a MatrixT.
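The two-pass strategy can be sketched with a hand-rolled compressed sparse row (CSR) layout. This is a standalone illustration rather than hictk's or Eigen's code: `ThinPixel`, `CsrMatrix` and `two_pass_fill_sketch` are hypothetical, the iterators must allow the range to be traversed twice, and pixels are assumed to be unique and sorted by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a COO pixel (not hictk's ThinPixel type).
struct ThinPixel {
  std::uint64_t bin1_id;
  std::uint64_t bin2_id;
  std::int32_t count;
};

// Minimal CSR container: row i owns entries [row_ptr[i], row_ptr[i + 1]).
struct CsrMatrix {
  std::vector<std::size_t> row_ptr;
  std::vector<std::uint64_t> col_idx;
  std::vector<std::int32_t> values;
};

template <typename PixelIt>
CsrMatrix two_pass_fill_sketch(PixelIt first, PixelIt last, std::size_t num_rows) {
  CsrMatrix m;
  m.row_ptr.assign(num_rows + 1, 0);

  // Pass 1: count the number of entries in each row.
  for (PixelIt it = first; it != last; ++it) {
    assert(it->bin1_id < num_rows);
    ++m.row_ptr[it->bin1_id + 1];
  }
  // Turn per-row counts into row offsets.
  std::partial_sum(m.row_ptr.begin(), m.row_ptr.end(), m.row_ptr.begin());

  // Allocate exactly as many entries as are needed.
  m.col_idx.resize(m.row_ptr.back());
  m.values.resize(m.row_ptr.back());

  // Pass 2: write each pixel into the next free slot of its row.
  std::vector<std::size_t> next(m.row_ptr.begin(), m.row_ptr.end() - 1);
  for (PixelIt it = first; it != last; ++it) {
    const std::size_t pos = next[it->bin1_id]++;
    m.col_idx[pos] = it->bin2_id;
    m.values[pos] = it->count;
  }
  return m;
}
```

The first pass trades an additional traversal for exact allocation, which matches the trade-off described above: peak memory is lower while the matrix is being built, and the final compressed matrix is the same size either way.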