@@ -1,21 +1,23 @@
-//! # Long-read Somatic Variant Calling and Analysis Framework
+//! # 🧬 Long-read Somatic Variant Calling and Analysis Framework
 //!
 //! This Rust library provides a modular, parallelizable framework for somatic variant calling, annotation, and interpretation from long-read sequencing data. It is designed to support full pipelines for research and clinical workflows across multiple variant callers and analysis stages.
 //!
-//! ## Key Features
+//! The library also serves as an extensible platform that developers can leverage to add custom features, integrate new tools, and tailor workflows to specific use cases.
+//!
+//! ## 🧩 Key Features
 //!
-//! - **Pipeline Management**: Full orchestration of Dockerized execution pipelines for tools such as ClairS, Nanomonsv, DeepVariant, Savana, Modkit, and Severus.
 //! - **POD5 Demultiplexing and Alignment**: End-to-end support for processing ONT POD5 files:
-//! - Demux using barcode metadata and custom CSV input
-//! - POD5 subsetting and organization by flowcell case
+//! - Barcode-aware demultiplexing using metadata CSVs
+//! - POD5 subsetting and organization by case
 //! - Integration with basecallers (e.g., Dorado) for read alignment
+//! - **Pipeline Management**: Full orchestration of Dockerized execution pipelines for tools such as ClairS, Nanomonsv, DeepVariant, Savana, Modkit, and Severus.
 //! - **Flexible Configuration**: Centralized configuration system (`Config`, `CollectionsConfig`) for all modules and pipelines.
 //! - **Input Abstraction**: Unified handling of BAM, POD5, and VCF file collections across cohorts and directories.
 //! - **Variant Processing**: Modular loading, filtering, statistical analysis, and annotation of somatic and germline variants.
 //! - **Haplotype Phasing and Methylation**: Support for LongPhase-based phasing and Modkit methylation pileups with support for multi-threaded pileup and aggregation.
 //! - **Parallel Execution**: Uses `rayon` for efficient multicore parallelization over large cohorts and tasks.
 //!
-//! ## Module Highlights
+//! ## 📚 Module Highlights
 //!
 //! - `callers`: Interfaces to variant calling tools (ClairS, DeepVariant, Nanomonsv, Savana, etc...)
 //! - `runners`: Pipeline runners (e.g. `Somatic`, `SeverusSolo`, `LongphasePhase`) that manage end-to-end execution.
@@ -25,26 +27,22 @@
 //! - `functions`: Custom logic for genome assembly, entropy estimation, and internal tooling.
 //! - `positions`, `variant`, `helpers`: Utilities for SV modeling, variant filtering, position overlap logic, and helper methods.
 //!
-//! ---
-//!
-//! ## 🧬 Workflow Overview
+//! ## ⚡ Workflow Overview
 //!
 //! ### 1. 📦 From POD5 to BAM Alignment
 //!
 //! - **Demultiplexing**: POD5 files are subset and demuxed using barcodes (via CSV metadata).
-//! - **Flowcell Case Management**: Each sample is identified by a `FlowCellCase` containing its ID, time point, and POD5 directory.
-//! - **Alignment**: The `Dorado` module handles alignment of POD5 reads to reference genome, producing BAMs.
+//! - **Flowcell Case Management**: Each sample is identified by a [`collection::pod5::FlowCellCase`] containing its ID, time point, and POD5 directory.
+//! - **Alignment**: The [`commands::dorado::Dorado`] module handles alignment of POD5 reads to reference genome, producing BAMs.
 //!
 //! ```rust
 //! let case = FlowCellCase { id: "PATIENT1", time_point: "diag", barcode: "01", pod_dir: "...".into() };
 //! Dorado::init(case, Config::default())?.run_pipe()?;
 //! ```
 //!
-//! ---
-//!
 //! ### 2. 🧬 Variant Calling (BAM ➝ VCF)
 //!
-//! Using the aligned BAMs, multiple variant callers can be run in parallel. The `callers` and `runners` modules support:
+//! Using the aligned BAMs, multiple variant callers can be run in parallel. The [`callers`] and [`runners`] modules support:
 //!
 //! - **ClairS** – somatic small variant calling with LongPhase haplotagging
 //! - **Nanomonsv** – structural variants (SV)
@@ -60,14 +58,12 @@
 //! NanomonSV::initialize("PATIENT1", Config::default())?.run()?;
 //! ```
 //!
-//! ---
-//!
 //! ### 3. 📈 Aggregation & Statistics (VCF ➝ JSON / Stats)
 //!
 //! After variant calling:
 //!
-//! - Annotate with VEP (`annotation` module)
-//! - Load and filter with `variant_collection`
+//! - Annotate with VEP ([`annotation`] module)
+//! - Load and filter with [`variant::variant_collection`]
 //! - Compute variant and region-level stats (e.g., mutation rates, alteration categories, coding overlaps)
 //!
 //! ```rust
@@ -76,8 +72,6 @@
 //! stats.save_to_json("/output/path/stats.json.gz")?;
 //! ```
 //!
-//! ---
-//!
 //! ### 4. 🧠 Intelligent Task Management (`collection` module)
 //!
 //! - Auto-discovers available samples, POD5s, BAMs, and VCFs
@@ -90,20 +84,6 @@
 //! collections.run()?; // Run them automatically
 //! ```
 //!
-//! ---
-//!
-//! ## 📁 Module Highlights
-//!
-//! - `callers`: Interfaces to ClairS, DeepVariant, Savana, Nanomonsv, etc.
-//! - `runners`: Pipeline runners like `Somatic` and `LongphasePhase`
-//! - `collection`: Auto-discovery of BAM/VCF/POD5s, task orchestration
-//! - `annotation`: VEP line parsing and transcript-level annotations
-//! - `pipes`: High-level pipelines (e.g., `run_somatic`, `todo_deepvariants`)
-//! - `variant`: Variant structs, filtering, alteration categories
-//! - `positions`, `helpers`, `functions`, `math`: Utility layers
-//!
-//! ---
-//!
 //! ## 🔬 Testing
 //!
 //! Integration tests demonstrate the entire pipeline. Run with logging enabled:
@@ -113,8 +93,7 @@
 //! cargo test -- --nocapture
 //! ```
 //!
-//! ---
-//! ## Example Use Cases
+//! ## 🧪 Example Use Cases
 //!
 //! - Full somatic variant calling pipeline on matched tumor/normal samples
 //! - POD5-based pipeline from raw signal to variants
@@ -122,30 +101,22 @@
 //! - Methylation analysis using nanopore-specific tools
 //! - Variant calling and analysis in large-scale longitudinal studies
 //!
-//! ## Getting Started
+//! ## 🚀 Getting Started
 //!
 //! All workflows are initialized from `Config` and driven by the `Collections` structure:
 //!
 //! ```rust
-//! let config = Config::default();
 //! let collections = Collections::new(CollectionsConfig::default())?;
 //! collections.todo()?;
 //! collections.run()?;
 //! ```
 //!
-//! ## Running Tests
-//!
-//! Run the full suite with logging enabled:
-//!
-//! ```bash
-//! export RUST_LOG=debug
-//! cargo test -- --nocapture
-//! ```
-//!
 //! ## 🔗 References
-//! ### Basecalling and alignment
+//!
+//! **Basecalling and alignment**
 //! - Dorado: <https://github.com/nanoporetech/dorado>
-//! ### Variants Callers
+//!
+//! **Variants Callers**
 //! - ClairS: <https://github.com/HKU-BAL/ClairS>
 //! - Nanomonsv: <https://github.com/friend1ws/nanomonsv>
 //! - Savana: <https://github.com/cortes-ciriano-lab/savana>
@@ -153,7 +124,8 @@
 //! - DeepSomatic: <https://github.com/google/deepsomatic>
 //! - LongPhase: <https://github.com/PorubskyResearch/LongPhase>
 //! - Modkit: <https://github.com/nanoporetech/modkit>
-//! ### Variants annotation
+//!
+//! **Variants annotation**
 //! - VEP: <https://www.ensembl.org/info/docs/tools/vep/index.html>
 //!
 //! ---
@@ -693,11 +665,10 @@ mod tests {
     #[test]
     fn pipe_somatic() -> anyhow::Result<()> {
         init();
-        let id = "ACHITE";
+        let id = "AOUF";
         SomaticPipe::initialize(id, Config::default())?.run()
     }
 
-
     #[test]
     fn overlaps() {
         init();