DOI: 10.1093/bioinformatics/btag339 ISSN: 1367-4811

ReadChop: a high-performance demultiplexer for long-read sequencing data

Chen Jiang, Yuanyan Xiong

Abstract

Summary

Long-read sequencing (LRS) platforms offer extended read lengths but present computational challenges due to high error rates and frequent insertion-deletion (indel) artifacts. Sample multiplexing is essential for cost-efficiency, yet existing demultiplexing solutions force a trade-off: vendor-provided tools (e.g., Dorado) often lack the structural flexibility required for highly non-canonical designs, whereas open-source tools (e.g., Cutadapt) often lack the speed or algorithmic robustness to handle custom, high-complexity barcode designs. Here, we present ReadChop, a high-performance demultiplexer implemented in Rust. ReadChop leverages Myers’ bit-parallel algorithm to efficiently model indel-rich error profiles and employs a streaming architecture to keep its memory footprint low. Benchmarking shows that ReadChop achieves classification precision exceeding 99.99% on both simulated datasets, including ultra-high multiplexing conditions (e.g., 13,824-plex), and empirical SARS-CoV-2 amplicons. It also filters in silico chimeras effectively (0.1% miss rate) and exhibits linear computational scalability on ultra-long templates (up to 100 kb). Crucially, it is more than 6 times faster than Dorado, more than 2 times faster than Nanoplexer, and more than 30 times faster than Cutadapt, with memory usage consistently below 200 MB. ReadChop provides a flexible, robust solution for processing massive LRS datasets with non-canonical experimental designs.
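To illustrate why a bit-parallel formulation suits indel-rich reads, the following is a minimal sketch of Myers' bit-vector approximate matching in Rust. It is not ReadChop's source code. The function name `best_barcode_hit`, the restriction to barcodes of at most 64 bytes, and the return shape are assumptions made for this example only. The sketch finds the lowest edit distance (substitutions, insertions, and deletions) between a barcode and any substring of a read, updating a whole column of the dynamic-programming matrix with a handful of word-level operations per read base.

```rust
/// Hypothetical helper (not part of ReadChop): returns `(distance, end)` where
/// `distance` is the minimum edit distance between `barcode` and any substring
/// of `read`, and `end` is the exclusive end offset of the first substring
/// that reaches it. Returns `None` if the barcode is empty or longer than 64.
fn best_barcode_hit(barcode: &[u8], read: &[u8]) -> Option<(usize, usize)> {
    let m = barcode.len();
    if m == 0 || m > 64 {
        return None;
    }

    // peq[c] has bit i set when barcode[i] == c.
    let mut peq = [0u64; 256];
    for (i, &c) in barcode.iter().enumerate() {
        peq[c as usize] |= 1u64 << i;
    }

    let top = 1u64 << (m - 1); // bit for the last barcode row
    let mut pv = !0u64; // rows whose vertical delta is +1
    let mut mv = 0u64; // rows whose vertical delta is -1
    let mut score = m; // distance in the last row of the current column
    let mut best = (m, 0usize); // matching the empty substring costs m

    for (j, &c) in read.iter().enumerate() {
        let eq = peq[c as usize];
        let xv = eq | mv;
        let xh = ((eq & pv).wrapping_add(pv) ^ pv) | eq;
        let mut ph = mv | !(xh | pv); // horizontal delta +1
        let mut mh = pv & xh; // horizontal delta -1

        if (ph & top) != 0 {
            score += 1;
        } else if (mh & top) != 0 {
            score -= 1;
        }

        // Shifting in 0 (rather than 1) lets a match start anywhere in the read.
        ph <<= 1;
        mh <<= 1;
        pv = mh | !(xv | ph);
        mv = ph & xv;

        if score < best.0 {
            best = (score, j + 1);
        }
    }
    Some(best)
}
```

As a usage example, `best_barcode_hit(b"ACGT", b"TTACGTTT")` reports an exact hit, while a read carrying a one-base insertion inside the barcode, such as `b"GGACTGTGG"`, still yields a distance of 1. A demultiplexer would typically accept a hit only when the distance falls below a user-chosen threshold; the choice of threshold is outside the scope of this sketch.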

Availability and Implementation

Source code and documentation are freely available under the MIT license at https://github.com/cherryamme/ReadChop.

Supplementary information

Supplementary data are available at Bioinformatics online.
