CNV-ECOD: A copy number variation detection method based on ECOD algorithm using next-generation sequencing data
Ranran Sun, Jinxin Dong, Hua Jiang, Ruchao Du, Yuxi Zhang

Copy number variation (CNV), a major type of DNA structural variation (SV), plays a key role in human disease and contributes to genetic diversity. Accurate identification of CNVs is important for disease mechanism analysis, personalized diagnosis and treatment, and drug development. Although next-generation sequencing (NGS) technology has greatly advanced CNV detection, existing methods generally suffer from high false-positive rates and inaccurate boundaries. We therefore propose CNV-ECOD, a new method for detecting CNVs in a single sample of NGS data. The method first employs the empirical-cumulative-distribution-based outlier detection (ECOD) algorithm to identify abnormal read depth (RD) signals for preliminary detection of CNVs. To correct false positives and further refine CNV boundaries, it then integrates paired-end mapping (PEM) and split read (SR) strategies. Combining the hierarchical, progressive RD-PEM-SR framework with the ECOD-based anomaly scoring mechanism effectively improves the accuracy of CNV detection. In simulation experiments against four peer methods, CNV-ECOD achieves the best balance between precision and sensitivity. In real-sample experiments, it also attains the best F1-scores and the highest overlap density scores (ODSs). CNV-ECOD is therefore expected to develop into an efficient and robust CNV detection tool.
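
The RD stage described above can be illustrated with a minimal sketch of ECOD-style scoring on binned read depth. This is not the authors' implementation: it assumes a single feature (one RD value per genomic bin), synthetic data, and an illustrative 95th-percentile cutoff, and the names `ecod_scores`, `rd`, and `candidates` are hypothetical. The full ECOD algorithm additionally aggregates tail probabilities across several features and uses skewness to choose a tail; a reference implementation is available in PyOD as `pyod.models.ecod.ECOD`.

```python
import numpy as np


def ecod_scores(x):
    """Univariate ECOD-style outlier score; larger means more anomalous."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sorted_x = np.sort(x)
    # Left-tail empirical CDF: fraction of samples <= x_i (never below 1/n).
    left = np.searchsorted(sorted_x, x, side="right") / n
    # Right-tail empirical CDF: fraction of samples >= x_i (never below 1/n).
    right = (n - np.searchsorted(sorted_x, x, side="left")) / n
    # Negative log tail probability, keeping the more extreme of the two tails.
    return np.maximum(-np.log(left), -np.log(right))


# Synthetic read depth for 200 bins (assumed values, for illustration only).
rng = np.random.default_rng(0)
rd = rng.normal(100, 5, size=200)
rd[50:55] = 160    # simulated copy number gain
rd[120:125] = 40   # simulated copy number loss

scores = ecod_scores(rd)
threshold = np.quantile(scores, 0.95)        # illustrative cutoff, not from the paper
candidates = np.flatnonzero(scores > threshold)

# Label each candidate bin by comparing its depth with the sample median.
calls = np.where(rd[candidates] > np.median(rd), "gain", "loss")
for idx, call in zip(candidates, calls):
    print(f"bin {idx}: RD={rd[idx]:.0f}, score={scores[idx]:.2f}, {call}")
```

Because the score depends only on ranks within the sample, it needs no distributional assumption about RD and no tuning beyond the cutoff. In this sketch the flagged bins are only preliminary candidates, which in the method described above would then be checked and refined using PEM and SR evidence.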