Thomas committed 4 days ago · commit c0060e3f5a
7 changed files with 675 additions and 87 deletions
  1. CLAUDE.md (+143 −0)
  2. src/callers/clairs.rs (+45 −15)
  3. src/callers/deep_somatic.rs (+79 −10)
  4. src/callers/deep_variant.rs (+89 −28)
  5. src/callers/nanomonsv.rs (+127 −15)
  6. src/callers/savana.rs (+87 −8)
  7. src/callers/severus.rs (+105 −11)

+ 143 - 0
CLAUDE.md

@@ -0,0 +1,143 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is a Rust library for somatic variant calling and analysis from Oxford Nanopore long-read sequencing data. The library provides a complete pipeline from POD5 files through basecalling, alignment, variant calling, annotation, and statistical analysis. It supports execution both locally and via Slurm HPC environments.
+
+## Build and Test Commands
+
+```bash
+# Build the library
+cargo build
+
+# Run tests with full output
+cargo test -- --nocapture
+
+# Run tests with debug logging
+RUST_LOG=debug cargo test -- --nocapture
+
+# Format code
+cargo fmt
+
+# Lint with warnings as errors
+cargo clippy -- -D warnings
+
+# Generate documentation
+cargo doc --open
+```
+
+## Configuration
+
+The library requires a configuration file at `~/.local/share/pandora/pandora-config.toml`. Use `pandora-config.example.toml` as a template. The configuration system uses path templates with placeholders like `{result_dir}`, `{id}`, `{time}`, `{reference_name}`, and `{haplotagged_bam_tag_name}`.
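+
+As a rough illustration (the helper below is hypothetical, not the crate's actual API), such a template can be resolved by substituting each placeholder with the corresponding config value:
+
+```rust
+/// Hypothetical sketch: resolve "{result_dir}/{id}/..."-style templates by plain
+/// string substitution. The crate itself builds these paths with `format!()`.
+fn resolve_template(template: &str, pairs: &[(&str, &str)]) -> String {
+    let mut out = template.to_string();
+    for (key, value) in pairs {
+        out = out.replace(&format!("{{{key}}}"), value);
+    }
+    out
+}
+
+fn main() {
+    let path = resolve_template(
+        "{result_dir}/{id}/clairs/{id}_{time}.vcf.gz",
+        &[("result_dir", "/data/results"), ("id", "sample_001"), ("time", "diag")],
+    );
+    assert_eq!(path, "/data/results/sample_001/clairs/sample_001_diag.vcf.gz");
+}
+```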
+
+Key configuration sections:
+- Filesystem layout (result directories, temp paths, database location)
+- Reference genome and annotations (FASTA, GFF3, BED files for regions/panels)
+- Tool-specific settings (DeepVariant, ClairS, Savana, Nanomonsv, Severus, Longphase, Modkit)
+- Alignment configuration (Dorado basecalling, samtools parameters)
+- Slurm vs local execution toggle
+
+## Architecture Overview
+
+### Command Execution Pattern
+
+The library uses a trait-based command execution system defined in `src/commands/mod.rs`:
+
+- **`Command` trait**: Provides `init()`, `cmd()`, and `clean_up()` lifecycle methods
+- **`LocalRunner` trait**: Executes commands directly via bash
+- **`SlurmRunner` trait**: Wraps commands with `srun` or `sbatch` for HPC execution
+- **`run!` macro** (line 639): Dispatches to LocalRunner or SlurmRunner based on `config.slurm_runner`
+- **`run_many!` macro** (line 987): Parallelizes multiple commands using Rayon
+
+All external tools (dorado, samtools, bcftools, longphase, modkit) implement these traits, allowing seamless switching between local and Slurm execution.
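+
+A minimal sketch of the pattern (trait and method names follow the list above; the exact signatures are simplified assumptions, not the definitive definitions in `src/commands/mod.rs`):
+
+```rust
+use anyhow::Result;
+use std::process::Command as ProcessCommand;
+
+// Lifecycle: prepare inputs, build the shell command string, clean up afterwards.
+trait Command {
+    fn init(&mut self) -> Result<()> { Ok(()) }
+    fn cmd(&self) -> String;
+    fn clean_up(&mut self) -> Result<()> { Ok(()) }
+}
+
+// Local execution: run the command string directly through bash.
+trait LocalRunner: Command {
+    fn run_local(&mut self) -> Result<()> {
+        self.init()?;
+        let status = ProcessCommand::new("bash").arg("-c").arg(self.cmd()).status()?;
+        anyhow::ensure!(status.success(), "command failed: {}", self.cmd());
+        self.clean_up()
+    }
+}
+
+// Slurm execution: wrap the same command string with srun for HPC scheduling.
+trait SlurmRunner: Command {
+    fn run_srun(&mut self) -> Result<()> {
+        self.init()?;
+        let wrapped = format!("srun {}", self.cmd());
+        let status = ProcessCommand::new("bash").arg("-c").arg(&wrapped).status()?;
+        anyhow::ensure!(status.success(), "srun failed: {}", wrapped);
+        self.clean_up()
+    }
+}
+
+// A tool wrapper only needs to provide `cmd()`; both runners come from the default methods.
+struct SamtoolsIndex { bam: String }
+
+impl Command for SamtoolsIndex {
+    fn cmd(&self) -> String { format!("samtools index {}", self.bam) }
+}
+impl LocalRunner for SamtoolsIndex {}
+impl SlurmRunner for SamtoolsIndex {}
+
+fn main() -> Result<()> {
+    let mut job = SamtoolsIndex { bam: "sample_001_diag.bam".into() };
+    // The real `run!` macro dispatches on `config.slurm_runner`; here we simply run locally.
+    job.run_local()
+}
+```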
+
+### Module Organization
+
+- **`callers/`**: Variant calling tool interfaces
+  - `clairs.rs`: ClairS somatic small variant caller with LongPhase haplotagging
+  - `deep_variant.rs`, `deep_somatic.rs`: Google DeepVariant/DeepSomatic wrappers
+  - `nanomonsv.rs`: Structural variant calling (paired tumor/normal)
+  - `savana.rs`: SV and CNV analysis with haplotagged BAM support
+  - `severus.rs`: VNTR and repeat-based variant calling
+
+- **`commands/`**: External command wrappers implementing `Command`, `LocalRunner`, and `SlurmRunner`
+  - `dorado.rs`: Basecalling and alignment from POD5 files
+  - `samtools.rs`, `bcftools.rs`: SAM/BAM/VCF manipulation
+  - `longphase.rs`: Phasing and modcall for methylation
+  - `modkit.rs`: Methylation pileup and summary
+
+- **`collection/`**: Input data discovery and organization
+  - `run.rs`, `prom_run.rs`: PromethION run metadata and POD5 file discovery
+  - `bam.rs`: BAM file collection across cases and time points
+  - `vcf.rs`: VCF file organization
+  - `flowcells.rs`: Flowcell metadata management
+  - `minknow.rs`: MinKNOW sample sheet and telemetry parsing
+
+- **`runners.rs`**: Defines `Run`, `Wait`, `RunWait` traits and `run_wait()` function for command execution lifecycle with timestamped `RunReport` generation
+
+- **`pipes/`**: Multi-caller pipeline composition
+  - `somatic.rs`: Orchestrates full somatic pipeline across ClairS, Nanomonsv, Savana, etc.
+  - `somatic_slurm.rs`: Slurm-optimized batch submission variants
+
+- **`annotation/`**: VEP (Variant Effect Predictor) line parsing and consequence filtering
+
+- **`variant/`**: Variant data structures, loading, filtering, and statistics
+  - `variant.rs`: Core variant types, BND graph construction, alteration categorization
+  - `variant_collection.rs`: Bulk variant loading and grouping operations
+  - `variants_stats.rs`: Mutation rates, depth quality ranges, panel-based stats
+
+- **`io/`**: File readers/writers (BED, GFF, VCF, gzip handling)
+
+- **`positions.rs`**: Genome coordinate representations (`GenomePosition`, `GenomeRange`) with parallel overlap operations
+
+- **`config.rs`**: Global `Config` struct loaded from TOML (line 14 defines the struct)
+
+- **`helpers.rs`**: Path utilities, Shannon entropy, Singularity bind flag generation
+
+- **`scan/`**: Somatic variant scanning algorithms
+
+- **`functions/`**: Genome assembly and custom analysis logic
+
+## Typical Workflow Pattern
+
+1. **POD5 → BAM**: `commands::dorado::Dorado` basecalls and aligns POD5 to reference
+2. **BAM → VCF (Variants)**: Use caller modules (e.g., `callers::clairs::ClairS::initialize(...)?.run()?`; see the sketch after this list)
+3. **VCF → Annotated JSON**: Load with `variant::variant_collection::Variants`, filter, annotate with `annotation::vep`
+4. **Stats Generation**: Create `variant::variants_stats::VariantsStats` for mutation rates and quality metrics
+5. **Multi-case orchestration**: Use `pipes::somatic::Somatic` runner or `collection::run::Collections` for batch processing
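+
+A minimal sketch of step 2 using the caller interface (the surrounding steps follow the same initialize/run pattern; the trait paths are taken from the module-level examples in this repository, and `should_run()` is assumed to behave as in those examples):
+
+```rust
+use pandora_lib_promethion::callers::clairs::ClairS;
+use pandora_lib_promethion::config::Config;
+use pandora_lib_promethion::pipes::Initialize;
+use pandora_lib_promethion::runners::Run;
+
+fn call_somatic_snvs(id: &str, config: &Config) -> anyhow::Result<()> {
+    // Step 2: BAM -> VCF with ClairS (step 1 produces the BAMs via commands::dorado::Dorado).
+    let mut caller = ClairS::initialize(id, config)?;
+    if caller.should_run() {
+        caller.run()?;
+    }
+    // Steps 3-4: load the PASS VCF with variant::variant_collection::Variants,
+    // then build variant::variants_stats::VariantsStats for QC metrics.
+    Ok(())
+}
+```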
+
+## Testing Notes
+
+- Integration tests expect test data at the path in `TEST_DIR` constant (`src/lib.rs:158`): `/mnt/beegfs02/scratch/t_steimle/test_data`
+- If this path is unavailable, tests may fail or need to be skipped (see the skip-guard sketch after this list)
+- Tests are co-located with modules using `#[cfg(test)]`
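+
+A minimal sketch of the co-located test pattern with a skip guard (the guard is a suggestion, not an existing helper in the crate; `TEST_DIR` mirrors the constant in `src/lib.rs`):
+
+```rust
+#[cfg(test)]
+mod tests {
+    use std::path::Path;
+
+    const TEST_DIR: &str = "/mnt/beegfs02/scratch/t_steimle/test_data";
+
+    #[test]
+    fn loads_test_bam() {
+        // Skip gracefully when the shared test data is not mounted.
+        if !Path::new(TEST_DIR).exists() {
+            eprintln!("skipping: {TEST_DIR} not available");
+            return;
+        }
+        // ... test body using files under TEST_DIR ...
+    }
+}
+```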
+
+## Key Dependencies
+
+External tools required at runtime (ensure they are on `PATH` or configured in the config file):
+- minimap2, samtools, bcftools (alignment and BAM/VCF handling)
+- dorado (ONT basecalling)
+- modkit (methylation analysis)
+- VEP (variant annotation; see `pandora_lib_variants` for VEP install)
+- ClairS, DeepVariant/DeepSomatic, Nanomonsv, Savana, Severus, LongPhase (variant callers, via Docker/Singularity)
+
+Rust dependencies of note:
+- `rust-htslib`: HTSlib bindings for BAM/VCF reading (requires `cmake`, `libclang-dev` for build)
+- `rayon`: Parallel iteration across samples and tasks
+- `dashmap`: Concurrent hashmaps for thread-safe collections
+- `arrow`: Efficient columnar data handling (from Apache Arrow)
+- `noodles-*`: Pure-Rust bioinformatics file parsers (FASTA, GFF, CSI)
+
+## Dockerized Tool Execution
+
+Tools like ClairS, DeepVariant, and DeepSomatic run via Singularity containers. The `config.singularity_bin` setting defaults to `module load singularity-ce && singularity`. Image paths are specified per tool in the config (e.g., `deepvariant_image`, `clairs_image`).
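+
+For illustration, the command strings built by the tool wrappers follow this general shape (a sketch modeled on the DeepSomatic wrapper's command; the helper and the image path below are simplified placeholders, not the crate's exact API):
+
+```rust
+/// Hypothetical sketch: assemble a `singularity exec` invocation from config-style fields.
+fn singularity_cmd(singularity_bin: &str, binds: &str, image: &str, tool_cmd: &str) -> String {
+    format!("{singularity_bin} exec {binds} {image} {tool_cmd}")
+}
+
+fn main() {
+    let cmd = singularity_cmd(
+        "module load singularity-ce && singularity",
+        "--bind /data/results/sample_001:/output",
+        "/images/deepsomatic.sif",
+        "/opt/deepvariant/bin/deepsomatic/run_deepsomatic --version",
+    );
+    println!("{cmd}");
+}
+```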
+
+## Important Conventions
+
+- Use `anyhow::Result` with `?` operator; avoid `unwrap()` in production code paths
+- Propagate errors with `.context()` for debugging clarity (see the sketch after this list)
+- All paths in config use templates; resolve with `format!()` and config field substitution
+- Tumor sample is labeled `tumoral_name` (default "diag"), normal is `normal_name` (default "norm")
+- Haplotagged BAMs use tag name from `haplotagged_bam_tag_name` config field (default "HP")
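+
+A short sketch of the error-handling and path conventions above (illustrative only; the function and arguments are placeholders, not the crate's API):
+
+```rust
+use anyhow::{Context, Result};
+use std::fs;
+
+/// Create a per-sample log directory, propagating errors with context instead of unwrapping.
+fn ensure_log_dir(result_dir: &str, id: &str) -> Result<String> {
+    // Resolve the "{result_dir}/{id}/log" template with format!(), as the config paths do.
+    let log_dir = format!("{result_dir}/{id}/log");
+    fs::create_dir_all(&log_dir)
+        .with_context(|| format!("Failed to create {log_dir} directory"))?;
+    Ok(log_dir)
+}
+
+fn main() -> Result<()> {
+    let log_dir = ensure_log_dir("/data/results", "sample_001")?;
+    println!("logs in {log_dir}");
+    Ok(())
+}
+```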

+ 45 - 15
src/callers/clairs.rs

@@ -12,6 +12,22 @@
 //! - Separate SNV and indel calling pipelines
 //! - Clair3-based germline variant detection on the normal sample
 //!
+//! ## Key Features
+//!
+//! - **Deep learning-based** - CNN models trained on long-read data
+//! - **Haplotype-aware** - Uses phased germline variants for improved accuracy
+//! - **Dual output** - Somatic and germline variants in a single run
+//! - **Platform flexibility** - Supports ONT (R9/R10) and PacBio HiFi
+//! - **Parallel execution** - Genome chunking for HPC scalability
+//!
+//! ## Requirements
+//!
+//! Before running ClairS, ensure:
+//! - Tumor and normal BAMs are indexed (`.bai` files present)
+//! - Reference genome is accessible
+//! - Singularity/Docker image is available
+//! - Platform is correctly specified (`config.clairs_platform`)
+//!
 //! ## Execution Modes
 //!
 //! The module supports three execution strategies:
@@ -174,7 +190,7 @@ use crate::{
 };
 
 use anyhow::Context;
-use log::{debug, info};
+use log::{debug, info, warn};
 use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
 use regex::Regex;
 use rust_htslib::bam::{self, Read};
@@ -184,14 +200,24 @@ use std::{
     process::{Command as ProcessCommand, Stdio},
 };
 
-/// A pipeline runner for executing ClairS on paired tumor and normal samples.
+/// ClairS haplotype-aware somatic variant caller runner.
+///
+/// Executes ClairS for paired tumor-normal variant calling with automatic
+/// post-processing (SNV+indel concatenation, PASS filtering, germline extraction).
 ///
-/// ClairS is a somatic variant caller that uses haplotype tagging from LongPhase.
-/// This struct supports:
-/// - Local execution via `run_local`
-/// - Slurm execution via `run_sbatch`
-/// - Chunked parallel execution via `run_clairs_chunked_sbatch_with_merge`
-/// - bcftools post-processing (germline + somatic PASS)
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `config` - Global pipeline configuration
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/clairs")
+/// - `region` - Optional genomic region for targeted calling (e.g., `Some("chr1:1000-2000")`)
+/// - `part_index` - Optional chunk index for parallel execution (e.g., `Some(3)` for part 3 of N)
+///
+/// # Execution Modes
+///
+/// - **Local** - Direct execution via `run_local()`
+/// - **Slurm** - Single job submission via `run_sbatch()`
+/// - **Chunked** - Parallel genome-wide execution via [`run_clairs_chunked_sbatch_with_merge`]
 ///
 /// # Output Files
 ///
@@ -200,20 +226,20 @@ use std::{
 ///
 /// # Chunked Execution
 ///
-/// For large genomes, use [`run_clairs_chunked_sbatch_with_merge`] which:
-/// 1. Splits genome into N regions
-/// 2. Runs ClairS in parallel Slurm jobs
+/// For whole-genome sequencing, use [`run_clairs_chunked_sbatch_with_merge`] which:
+/// 1. Splits genome into N equal-sized regions
+/// 2. Runs ClairS in parallel Slurm jobs (one per region)
 /// 3. Post-processes each part (concat SNV+indel, filter PASS)
 /// 4. Merges all part PASS VCFs into final output
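+///
+/// # Example
+///
+/// A minimal usage sketch, assuming the same `Initialize`/`Run` traits used by the other
+/// callers in this crate:
+///
+/// ```ignore
+/// use pandora_lib_promethion::callers::clairs::ClairS;
+/// use pandora_lib_promethion::config::Config;
+/// use pandora_lib_promethion::pipes::Initialize;
+/// use pandora_lib_promethion::runners::Run;
+///
+/// let config = Config::default();
+/// let mut caller = ClairS::initialize("sample_001", &config)?;
+/// if caller.should_run() {
+///     caller.run()?;
+/// }
+/// # Ok::<(), anyhow::Error>(())
+/// ```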
 #[derive(Debug, Clone)]
 pub struct ClairS {
     /// Sample identifier
     pub id: String,
-    /// Pipeline configuration
+    /// Global pipeline configuration
     pub config: Config,
-    /// Log directory for this run
+    /// Directory for log file storage
     pub log_dir: String,
-    /// Optional region for restricted runs (format: `ctg:start-end`)
+    /// Optional genomic region restriction (format: "chr:start-end")
     pub region: Option<String>,
     /// Optional part index for chunked parallel runs (1-indexed)
     pub part_index: Option<usize>,
@@ -510,7 +536,11 @@ impl ClairS {
             .save_to_file(format!("{}/bcftools_pass_", self.log_dir))
             .context("Failed to save PASS filter logs")?;
 
-        fs::remove_file(&tmp_path).ok();
+        // Clean up temporary concatenated VCF
+        debug!("Removing temporary file: {}", tmp_path);
+        if let Err(e) = fs::remove_file(&tmp_path) {
+            warn!("Failed to remove temporary file {}: {}", tmp_path, e);
+        }
 
         Ok(())
     }

+ 79 - 10
src/callers/deep_somatic.rs

@@ -11,6 +11,21 @@
 //! - Paired tumor-normal analysis
 //! - Containerized execution via Singularity
 //!
+//! ## Key Features
+//!
+//! - **Deep learning-based** - CNN models specifically trained for somatic variant detection
+//! - **Tumor-normal paired** - Leverages matched control for improved specificity
+//! - **Platform flexibility** - Supports ONT, PacBio, and Illumina
+//! - **Parallel execution** - Genome chunking for HPC scalability
+//!
+//! ## Requirements
+//!
+//! Before running DeepSomatic, ensure:
+//! - Tumor and normal BAMs are indexed (`.bai` files present)
+//! - Reference genome is accessible
+//! - Singularity/Docker image is available
+//! - Model type matches sequencing platform (`config.deepsomatic_model_type`)
+//!
 //! ## Execution Modes
 //!
 //! Execution mode is automatically selected via `config.slurm_runner`:
@@ -29,15 +44,39 @@
 //!
 //! ## Usage
 //!
+//! ### Chunked Parallel Execution (Recommended for WGS)
+//!
 //! ```ignore
-//! use crate::config::Config;
-//! use crate::pipes::deepsomatic::run_deepsomatic_chunked_with_merge;
+//! use pandora_lib_promethion::callers::deep_somatic::run_deepsomatic_chunked_with_merge;
+//! use pandora_lib_promethion::config::Config;
 //!
 //! let config = Config::default();
+//! // Run DeepSomatic in 30 parallel chunks
 //! let outputs = run_deepsomatic_chunked_with_merge("sample_001", &config, 30)?;
 //! # Ok::<(), anyhow::Error>(())
 //! ```
 //!
+//! ### Single-Run Execution
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::deep_somatic::DeepSomatic;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::Initialize;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = DeepSomatic::initialize("sample_001", &config)?;
+//!
+//! if caller.should_run() {
+//!     caller.run()?;
+//! }
+//!
+//! // Load somatic variants
+//! let variants = caller.variants(&annotations)?;
+//! println!("Found {} somatic variants", variants.variants.len());
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
 //! ## References
 //!
 //! - [DeepSomatic GitHub repository](https://github.com/google/deepsomatic)
@@ -75,12 +114,35 @@ use crate::{
     },
 };
 
+/// DeepSomatic paired tumor-normal somatic variant caller.
+///
+/// Executes DeepSomatic for somatic SNV and indel detection using matched
+/// tumor and normal BAM files with automatic post-processing (PASS filtering).
+///
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `regions` - Space-separated list of genomic regions to process (e.g., "chr1 chr2 chr3")
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/deepsomatic")
+/// - `config` - Global pipeline configuration
+/// - `part_index` - Optional chunk index for parallel execution (e.g., `Some(3)` for part 3 of N)
+///
+/// # Execution Modes
+///
+/// - **Local** - Direct execution
+/// - **Slurm** - Single job submission
+/// - **Chunked** - Parallel genome-wide execution via [`run_deepsomatic_chunked_with_merge`]
 #[derive(Debug, Clone)]
 pub struct DeepSomatic {
+    /// Sample identifier
     pub id: String,
+    /// Space-separated list of genomic regions to process
     pub regions: String,
+    /// Directory for log file storage
     pub log_dir: String,
+    /// Global pipeline configuration
     pub config: Config,
+    /// Optional part index for chunked parallel runs (1-indexed)
     pub part_index: Option<usize>,
 }
 
@@ -186,7 +248,7 @@ impl JobCommand for DeepSomatic {
             {binds} \
             --bind {output_dir}:/output \
             {image} \
-            /opt/deepvariant/bin/deepsomtic/run_deepsomatic \
+            /opt/deepvariant/bin/deepsomatic/run_deepsomatic \
             --model_type={model_type} \
             --ref={reference} \
             --reads_normal={normal_bam} \
@@ -339,7 +401,7 @@ impl Version for DeepSomatic {
         let out = ProcessCommand::new("bash")
             .arg("-c")
             .arg(format!(
-                "{} exec {} /opt/deepvariant/bin/deepsomtic/run_deepsomatic --version",
+                "{} exec {} /opt/deepvariant/bin/deepsomatic/run_deepsomatic --version",
                 config.singularity_bin, config.deepsomatic_image
             ))
             .stdout(Stdio::piped())
@@ -377,7 +439,7 @@ impl Version for DeepSomatic {
         impl JobCommand for DeepSomaticVersionJob<'_> {
             fn cmd(&self) -> String {
                 format!(
-                    "{} exec {} /opt/deepvariant/bin/deepsomtic/run_deepsomatic --version",
+                    "{} exec {} /opt/deepvariant/bin/deepsomatic/run_deepsomatic --version",
                     self.config.singularity_bin, self.config.deepsomatic_image
                 )
             }
@@ -515,11 +577,18 @@ pub fn run_deepsomatic_chunked_with_merge(
     // Run DeepSomatic jobs
     let outputs = run_many!(config, jobs.clone())?;
 
-    // Filter PASS variants for each part
-    info!("Filtering PASS variants for {} parts", actual_n_parts);
-    for job in &jobs {
-        job.filter_pass()?;
-    }
+    // Filter PASS variants for each part in parallel
+    info!(
+        "Filtering PASS variants for all {} parts in parallel",
+        actual_n_parts
+    );
+    let filter_jobs: Vec<_> = jobs
+        .iter()
+        .map(|job| {
+            BcftoolsKeepPass::from_config(&job.config, job.output_vcf_path(), job.passed_vcf_path())
+        })
+        .collect();
+    run_many!(config, filter_jobs)?;
 
     // Merge PASS VCFs
     merge_deepsomatic_parts(&base, actual_n_parts)?;

+ 89 - 28
src/callers/deep_variant.rs

@@ -11,6 +11,22 @@
 //! - Haploid calling for sex chromosomes (configurable by karyotype)
 //! - Containerized execution via Singularity/Apptainer
 //!
+//! ## Key Features
+//!
+//! - **Deep learning-based** - CNN models trained on diverse sequencing platforms
+//! - **Karyotype-aware** - Automatically calls X/Y as haploid based on sample sex
+//! - **Solo mode** - Single-sample germline variant calling
+//! - **Platform flexibility** - Supports ONT, PacBio, and Illumina
+//! - **Parallel execution** - Genome chunking for HPC scalability
+//!
+//! ## Requirements
+//!
+//! Before running DeepVariant, ensure:
+//! - BAM file is indexed (`.bai` file present)
+//! - Reference genome is accessible
+//! - Singularity/Docker image is available
+//! - Model type matches sequencing platform (`config.deepvariant_model_type`)
+//!
 //! ## Execution Modes
 //!
 //! Execution mode is automatically selected via `config.slurm_runner`:
@@ -29,15 +45,39 @@
 //!
 //! ## Usage
 //!
+//! ### Chunked Parallel Execution (Recommended for WGS)
+//!
 //! ```ignore
-//! use crate::config::Config;
-//! use crate::pipes::deepvariant::run_deepvariant_chunked_with_merge;
+//! use pandora_lib_promethion::callers::deep_variant::run_deepvariant_chunked_with_merge;
+//! use pandora_lib_promethion::config::Config;
 //!
 //! let config = Config::default();
+//! // Run DeepVariant in 30 parallel chunks for "norm" time point
 //! let outputs = run_deepvariant_chunked_with_merge("sample_001", "norm", &config, 30)?;
 //! # Ok::<(), anyhow::Error>(())
 //! ```
 //!
+//! ### Single-Run Execution
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::deep_variant::DeepVariant;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::InitializeSolo;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = DeepVariant::initialize("sample_001", "norm", &config)?;
+//!
+//! if caller.should_run() {
+//!     caller.run()?;
+//! }
+//!
+//! // Load variants
+//! let variants = caller.variants(&annotations)?;
+//! println!("Found {} germline variants", variants.variants.len());
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
 //! ## References
 //!
 //! - [DeepVariant GitHub repository](https://github.com/google/deepvariant)
@@ -86,46 +126,53 @@ use crate::commands::{
     SlurmParams,
 };
 
-/// Pipeline runner for executing [DeepVariant](https://github.com/google/deepvariant)
-/// on a single sample at a specific time point (e.g., normal or tumor).
+/// DeepVariant solo (single-sample) variant caller.
+///
+/// Executes DeepVariant for germline variant calling on a single BAM file.
+/// Supports karyotype-aware calling for accurate sex chromosome genotyping.
+///
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `time_point` - Time point label: typically `config.normal_name` ("norm") or `config.tumoral_name` ("diag")
+/// - `regions` - Genomic regions to process (e.g., "chr1 chr2 chr3" or "chr1:1-1000000")
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/deepvariant")
+/// - `config` - Global pipeline configuration
+/// - `part_index` - Optional chunk index for parallel execution (e.g., `Some(3)` for part 3 of N)
+/// - `karyotype` - Sex karyotype for haploid calling on X/Y chromosomes (XX or XY)
 ///
-/// This struct orchestrates variant calling on a BAM file using DeepVariant
-/// in a Singularity/Apptainer containerized environment. It supports:
+/// # Execution Modes
 ///
-/// - **Conditional execution**: Skips re-running if outputs are up-to-date
-/// - **Local or Slurm execution**: Via [`run_local`] or [`run_sbatch`]
-/// - **Quartered parallel execution**: Via [`run_deepvariant_quartered_sbatch_with_merge`]
-/// - **Annotation integration**: Implements [`CallerCat`], [`Variants`], and [`Label`]
+/// - **Local** - Direct execution via `run_local()`
+/// - **Slurm** - Single job submission via `run_sbatch()`
+/// - **Chunked** - Parallel genome-wide execution via [`run_deepvariant_chunked_with_merge`]
 ///
-/// # Parallelization Strategy
+/// # Karyotype-Aware Calling
 ///
-/// For large genomes, use [`run_deepvariant_quartered_sbatch_with_merge`] which:
-/// 1. Splits chromosomes into N chunks
-/// 2. Submits parallel Slurm jobs
-/// 3. Merges PASS-filtered VCFs via `bcftools concat`
+/// DeepVariant automatically adjusts ploidy based on `karyotype`:
+/// - **XY karyotype**: chrX and chrY called as haploid
+/// - **XX karyotype**: chrX called as diploid, chrY skipped
 #[derive(Debug, Clone)]
 pub struct DeepVariant {
-    /// Sample identifier (e.g., "PATIENT_001")
+    /// Sample identifier
     pub id: String,
 
-    /// Time point label, typically matching [`Config::normal_name`] or [`Config::tumoral_name`]
+    /// Time point identifier (e.g., "norm" or "diag")
     pub time_point: String,
 
-    /// Comma-separated list of genomic regions to process (e.g., "chr1,chr2,chr3")
+    /// Space-separated list of genomic regions to process
     pub regions: String,
 
-    /// Directory for DeepVariant and bcftools log files
+    /// Directory for log file storage
     pub log_dir: String,
 
-    /// Shared pipeline configuration
+    /// Global pipeline configuration
     pub config: Config,
 
-    /// Optional part index for quartered parallel runs (1-indexed)
-    ///
-    /// When `Some(n)`, output files are suffixed with `.partN` to avoid collisions.
+    /// Optional part index for chunked parallel runs (1-indexed)
     pub part_index: Option<usize>,
 
-    // Karyotype for haploid contig handling (default: XY)
+    /// Sex karyotype for haploid contig handling (XX or XY)
     pub karyotype: Karyotype,
 }
 
@@ -169,6 +216,15 @@ impl InitializeSolo for DeepVariant {
         let id = id.to_string();
         let time_point = time_point.to_string();
 
+        // Validate time_point matches configured names
+        anyhow::ensure!(
+            time_point == config.normal_name || time_point == config.tumoral_name,
+            "Invalid time_point '{}': must be either '{}' (normal) or '{}' (tumor)",
+            time_point,
+            config.normal_name,
+            config.tumoral_name
+        );
+
         let karyotype = WGSBamStats::open(&id, &time_point, config)?.karyotype()?;
 
         info!("Initializing DeepVariant for {id} {time_point}.");
@@ -438,10 +494,10 @@ impl CallerCat for DeepVariant {
     /// Maps the time point to either [`Sample::SoloConstit`] (normal) or
     /// [`Sample::SoloTumor`] (tumoral).
     ///
     /// # Panics
     ///
-    /// Panics if `time_point` doesn't match either configured name.
-    /// Consider returning `Result` for robustness.
+    /// Panics (via `unreachable!`) if `time_point` matches neither configured name.
+    /// Since `time_point` is validated during initialization, reaching this branch
+    /// indicates a logic error rather than a user error.
     fn caller_cat(&self) -> Annotation {
         let Config {
             normal_name,
@@ -453,7 +509,12 @@ impl CallerCat for DeepVariant {
         } else if *tumoral_name == self.time_point {
             Annotation::Callers(Caller::DeepVariant, Sample::SoloTumor)
         } else {
-            panic!("Error in time_point name: {}", self.time_point);
+            // SAFETY: time_point is validated in initialize() to be either normal_name or tumoral_name.
+            // If we reach here, it's a logic error in the code, not a user error.
+            unreachable!(
+                "Invalid time_point '{}': expected '{}' or '{}'. This should have been caught during initialization.",
+                self.time_point, normal_name, tumoral_name
+            )
         }
     }
 }

+ 127 - 15
src/callers/nanomonsv.rs

@@ -1,6 +1,73 @@
-//! NanomonSV structural variant caller orchestration (paired and solo).
+//! # NanomonSV Structural Variant Caller Orchestration
 //!
-//! Runs parse/get and PASS filtering through the shared runner interfaces (local/Slurm) using the global `Config`.
+//! This module provides wrappers for [NanomonSV](https://github.com/friend1ws/nanomonsv),
+//! a structural variant (SV) caller optimized for long-read sequencing data.
+//!
+//! ## Overview
+//!
+//! NanomonSV detects structural variants including:
+//! - Deletions, insertions, duplications
+//! - Inversions and translocations
+//! - Complex rearrangements
+//!
+//! ## Execution Modes
+//!
+//! - **Paired (somatic)** - Compares tumor vs normal BAMs to identify somatic SVs
+//! - **Solo** - Single-sample SV calling without matched control
+//!
+//! Both modes support local and Slurm execution via the `Config.slurm_runner` flag.
+//!
+//! ## Output Files
+//!
+//! Paired mode PASS-filtered VCF:
+//! ```text
+//! {result_dir}/{id}/nanomonsv/{id}_diag_nanomonsv_PASSED.vcf.gz
+//! ```
+//!
+//! Solo mode PASS-filtered VCF:
+//! ```text
+//! {result_dir}/{id}/nanomonsv_solo/{id}_{time_point}_nanomonsv_PASSED.vcf.gz
+//! ```
+//!
+//! ## Usage
+//!
+//! ### Paired (Tumor-Normal) Mode
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::nanomonsv::NanomonSV;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::Initialize;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = NanomonSV::initialize("sample_001", &config)?;
+//!
+//! if caller.should_run() {
+//!     caller.run()?;
+//! }
+//!
+//! // Load variants
+//! let variants = caller.variants(&annotations)?;
+//! println!("Found {} somatic SVs", variants.variants.len());
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
+//! ### Solo Mode
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::nanomonsv::NanomonSVSolo;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::InitializeSolo;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = NanomonSVSolo::initialize("sample_001", "norm", &config)?;
+//! caller.run()?;
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
+//! ## References
+//!
+//! - [NanomonSV GitHub](https://github.com/friend1ws/nanomonsv)
+//! - [NanomonSV Paper](https://doi.org/10.1186/s13059-020-02175-y)
 use rayon::prelude::*;
 use std::{
     fs::{self},
@@ -30,17 +97,30 @@ use crate::{
     },
 };
 
-/// Represents the NanomonSV runner, responsible for structural variant calling
-/// from diagnostic and normal BAMs using the NanomonSV tool. This runner initialize,
-/// run, classify, and extract variants from VCF.
+/// NanomonSV paired (tumor-normal) structural variant caller.
+///
+/// Executes the NanomonSV pipeline for somatic SV detection by comparing
+/// tumor and normal BAM files.
+///
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/nanomonsv")
+/// - `config` - Global pipeline configuration
+/// - `job_args` - Internal command-line arguments passed to nanomonsv binary
+/// - `threads` - Number of CPU threads for parallel processing (from `config.nanomonsv_threads`)
 #[derive(Debug)]
 pub struct NanomonSV {
+    /// Sample identifier
     pub id: String,
+    /// Directory for log file storage
     pub log_dir: String,
+    /// Global pipeline configuration
     pub config: Config,
 
-    // Command args and threads used by the shared runner.
+    /// Command-line arguments for nanomonsv executable
     job_args: Vec<String>,
+    /// Number of threads for parallel execution
     threads: u8,
 }
 
@@ -293,16 +373,27 @@ impl Label for NanomonSV {
     }
 }
 
-/// NanomonSV caller in solo (single-sample) mode.
+/// NanomonSV solo (single-sample) structural variant caller.
 ///
 /// Processes a single BAM file to detect structural variants without a matched control.
+/// Useful for germline SV detection or when no matched normal is available.
+///
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `bam` - Path to input BAM file (e.g., "{bam_dir}/{id}_{time_point}.bam")
+/// - `time_point` - Time point label: typically `config.normal_name` ("norm") or `config.tumoral_name` ("diag")
+/// - `out_dir` - Output directory (e.g., "{result_dir}/{id}/nanomonsv_solo/{time_point}")
+/// - `log_dir` - Log directory (e.g., "{result_dir}/{id}/log/nanomonsv_solo")
+/// - `vcf_passed` - PASS-filtered output VCF path
+/// - `config` - Global pipeline configuration
 #[derive(Debug)]
 pub struct NanomonSVSolo {
     /// Sample identifier
     pub id: String,
     /// Path to input BAM file
     pub bam: String,
-    /// Time point identifier (e.g., "normal" or "tumor")
+    /// Time point identifier (e.g., "norm" or "diag")
     pub time_point: String,
     /// Output directory for NanomonSV results
     pub out_dir: String,
@@ -323,7 +414,18 @@ impl InitializeSolo for NanomonSVSolo {
     /// Returns an error if directory creation fails.
     fn initialize(id: &str, time: &str, config: &Config) -> anyhow::Result<Self> {
         let id = id.to_string();
-        info!("Initialize Nanomonsv solo for {id} {time}.");
+        let time_point = time.to_string();
+
+        // Validate time_point matches configured names
+        anyhow::ensure!(
+            time_point == config.normal_name || time_point == config.tumoral_name,
+            "Invalid time_point '{}': must be either '{}' (normal) or '{}' (tumor)",
+            time_point,
+            config.normal_name,
+            config.tumoral_name
+        );
+
+        info!("Initialize Nanomonsv solo for {id} {time_point}.");
         let log_dir = format!("{}/{}/log/nanomonsv_solo", config.result_dir, &id);
 
         if !Path::new(&log_dir).exists() {
@@ -331,17 +433,17 @@ impl InitializeSolo for NanomonSVSolo {
                 .context(format!("Failed  to create {log_dir} directory"))?;
         }
 
-        let out_dir = config.nanomonsv_solo_output_dir(&id, time);
+        let out_dir = config.nanomonsv_solo_output_dir(&id, &time_point);
         fs::create_dir_all(&out_dir)?;
 
-        let bam = config.solo_bam(&id, time);
+        let bam = config.solo_bam(&id, &time_point);
 
-        let vcf_passed = config.nanomonsv_solo_passed_vcf(&id, time);
+        let vcf_passed = config.nanomonsv_solo_passed_vcf(&id, &time_point);
 
         Ok(Self {
             id,
             bam,
-            time_point: time.to_string(),
+            time_point,
             out_dir,
             log_dir,
             vcf_passed,
@@ -407,6 +509,11 @@ impl Run for NanomonSVSolo {
 
 impl CallerCat for NanomonSVSolo {
     /// Returns the caller annotation based on whether this is a normal or tumor sample.
+    ///
+    /// # Panics
+    ///
+    /// Panics (via `unreachable!`) if `time_point` matches neither configured name.
+    /// Since `time_point` is validated during initialization, reaching this branch
+    /// indicates a logic error rather than a user error.
     fn caller_cat(&self) -> Annotation {
         let Config {
             normal_name,
@@ -418,7 +525,12 @@ impl CallerCat for NanomonSVSolo {
         } else if *tumoral_name == self.time_point {
             Annotation::Callers(Caller::NanomonSVSolo, Sample::SoloTumor)
         } else {
-            panic!("Error in time_point name: {}", self.time_point);
+            // SAFETY: time_point is validated in initialize() to be either normal_name or tumoral_name.
+            // If we reach here, it's a logic error in the code, not a user error.
+            unreachable!(
+                "Invalid time_point '{}': expected '{}' or '{}'. This should have been caught during initialization.",
+                self.time_point, normal_name, tumoral_name
+            )
         }
     }
 }
@@ -657,7 +769,7 @@ pub fn nanomonsv_create_pon(config: &Config, pon_path: &str) -> anyhow::Result<(
                     passed_mrd.push(output);
                 }
             }
-            (Some(_), Some(p), None) => warn!("Prossing csi for {}", p.display()),
+            (Some(_), Some(p), None) => warn!("Processing csi for {}", p.display()),
             (Some(_), Some(p), Some(_)) => passed_mrd.push(p),
             _ => {} // All files found
         }

+ 87 - 8
src/callers/savana.rs

@@ -1,7 +1,75 @@
-//! Savana somatic variant caller orchestration.
+//! # Savana Haplotype-Aware Somatic Variant Caller
 //!
-//! This module wires Savana execution (haplotagging prerequisites, run, PASS filtering)
-//! through the shared runner interfaces (local/Slurm) using the global `Config`.
+//! This module provides wrappers for [Savana](https://github.com/cortes-ciriano-lab/savana),
+//! a haplotype-aware somatic variant caller for structural variants and copy number alterations
+//! from long-read sequencing data.
+//!
+//! ## Overview
+//!
+//! Savana detects:
+//! - Structural variants (SVs): deletions, duplications, inversions, translocations
+//! - Copy number variations (CNVs) with allele-specific information
+//! - Complex genomic rearrangements
+//!
+//! ## Key Features
+//!
+//! - **Haplotype-aware calling** - Uses phased germline variants and haplotagged BAMs
+//! - **Integrated phasing** - Automatically runs LongPhase if needed
+//! - **CNV segmentation** - Provides detailed copy number profiles
+//!
+//! ## Requirements
+//!
+//! Before running Savana, ensure:
+//! - Tumor and normal BAMs are indexed
+//! - Reference genome is accessible
+//! - Conda environment with Savana is configured (`config.conda_sh`)
+//!
+//! ## Output Files
+//!
+//! PASS-filtered somatic variants:
+//! ```text
+//! {result_dir}/{id}/savana/somatic_sv_PASS.vcf.gz
+//! ```
+//!
+//! Copy number segmentation:
+//! ```text
+//! {result_dir}/{id}/savana/copy_number_segmentation.txt.gz
+//! ```
+//!
+//! Read count bins:
+//! ```text
+//! {result_dir}/{id}/savana/read_counts.txt.gz
+//! ```
+//!
+//! ## Usage
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::savana::Savana;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::Initialize;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = Savana::initialize("sample_001", &config)?;
+//!
+//! if caller.should_run() {
+//!     caller.run()?;  // Automatically handles phasing and haplotagging
+//! }
+//!
+//! // Load SV/CNV variants
+//! let variants = caller.variants(&annotations)?;
+//! println!("Found {} somatic SVs/CNVs", variants.variants.len());
+//!
+//! // Load copy number data
+//! let cn_data = SavanaCN::parse_file("sample_001", &config)?;
+//! println!("Copy number segments: {}", cn_data.segments.len());
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
+//! ## References
+//!
+//! - [Savana GitHub](https://github.com/cortes-ciriano-lab/savana)
+//! - [Savana Paper](https://doi.org/10.1038/s41467-022-34590-2)
 use crate::{
     annotation::{Annotation, Annotations, Caller, CallerCat, Sample},
     collection::vcf::Vcf,
@@ -34,16 +102,27 @@ use std::{
     str::FromStr,
 };
 
-/// The `Savana` struct orchestrates the haplotype-aware somatic variant calling
-/// pipeline using the Savana tool. It manages initialization, conditional execution,
-/// phasing dependencies, haplotagging, and output filtering.
+/// Savana haplotype-aware somatic SV and CNV caller.
+///
+/// Orchestrates the Savana pipeline including prerequisite phasing and haplotagging steps.
+/// Automatically invokes LongPhase if required phased VCFs or haplotagged BAMs are missing.
+///
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `config` - Global pipeline configuration
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/savana")
+/// - `job_args` - Internal command-line arguments passed to Savana (populated in `run()`)
 #[derive(Debug)]
 pub struct Savana {
+    /// Sample identifier
     pub id: String,
+    /// Global pipeline configuration
     pub config: Config,
+    /// Directory for log file storage
     pub log_dir: String,
 
-    // Arguments for the Savana command (populated before execution).
+    /// Command-line arguments for Savana executable (populated during run)
     job_args: Vec<String>,
 }
 
@@ -149,7 +228,7 @@ impl Run for Savana {
             })?;
 
             // Check for phased germline vcf
-            // no required anymore since >= 1.3.0
+            // not required anymore since >= 1.3.0
             let phased_germline_vcf = self.config.constit_phased_vcf(&self.id);
             if !Path::new(&phased_germline_vcf).exists() {
                 let mut phase = LongphasePhase::initialize(&self.id, &self.config.clone())?;

+ 105 - 11
src/callers/severus.rs

@@ -1,6 +1,82 @@
-//! Severus structural variant caller orchestration.
+//! # Severus Structural Variant Caller
 //!
-//! Uses shared runner traits (local/Slurm) with global `Config` and handles PASS filtering via bcftools.
+//! This module provides wrappers for [Severus](https://github.com/KolmogorovLab/Severus),
+//! a structural variant caller specialized in VNTR (Variable Number Tandem Repeat) detection
+//! and complex SV resolution from long-read sequencing data.
+//!
+//! ## Overview
+//!
+//! Severus detects:
+//! - Structural variants (deletions, insertions, inversions, translocations)
+//! - VNTR expansions and contractions
+//! - Complex nested rearrangements
+//! - Junction-level SV breakpoints with high precision
+//!
+//! ## Key Features
+//!
+//! - **VNTR-aware calling** - Uses VNTR annotations from BED file
+//! - **Phasing integration** - Leverages phased VCFs for haplotype resolution
+//! - **High precision** - Resolves overlapping SVs and ambiguous junctions
+//! - **Alignment writing** - Optional detailed alignment output for validation
+//!
+//! ## Requirements
+//!
+//! - Tumor and normal BAMs (paired mode) or single BAM (solo mode)
+//! - VNTR annotation BED file (`config.vntrs_bed`)
+//! - Phased germline VCF (automatically generated via LongPhase if missing)
+//! - Conda environment with Severus configured
+//!
+//! ## Output Files
+//!
+//! Paired mode PASS-filtered VCF:
+//! ```text
+//! {result_dir}/{id}/severus/severus_PASSED.vcf.gz
+//! ```
+//!
+//! Solo mode PASS-filtered VCF:
+//! ```text
+//! {result_dir}/{id}/severus_solo/{time_point}/severus_PASSED.vcf.gz
+//! ```
+//!
+//! ## Usage
+//!
+//! ### Paired (Tumor-Normal) Mode
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::severus::Severus;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::Initialize;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = Severus::initialize("sample_001", &config)?;
+//!
+//! if caller.should_run() {
+//!     caller.run()?;  // Automatically handles phasing if needed
+//! }
+//!
+//! // Load variants including VNTRs
+//! let variants = caller.variants(&annotations)?;
+//! println!("Found {} SVs (including VNTRs)", variants.variants.len());
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
+//! ### Solo Mode
+//!
+//! ```ignore
+//! use pandora_lib_promethion::callers::severus::SeverusSolo;
+//! use pandora_lib_promethion::config::Config;
+//! use pandora_lib_promethion::pipes::InitializeSolo;
+//! use pandora_lib_promethion::runners::Run;
+//!
+//! let config = Config::default();
+//! let mut caller = SeverusSolo::initialize("sample_001", "norm", &config)?;
+//! caller.run()?;
+//! # Ok::<(), anyhow::Error>(())
+//! ```
+//!
+//! ## References
+//!
+//! - [Severus GitHub](https://github.com/KolmogorovLab/Severus)
+//! - [Severus Paper](https://doi.org/10.1038/s41587-024-02340-1)
 use crate::{
     annotation::{Annotation, Annotations, Caller, CallerCat, Sample},
     collection::vcf::Vcf,
@@ -24,13 +100,23 @@ use log::{debug, info};
 use rayon::prelude::*;
 use std::{fs, path::Path};
 
-/// Represents a wrapper around the Severus pipeline, responsible for calling structural variants
-/// using phased VCFs and tumor/control BAMs. It handles initialization, conditional execution,
-/// logging, and cleanup.
+/// Severus paired (tumor-normal) structural variant caller.
+///
+/// Executes Severus for somatic SV detection including VNTR analysis.
+/// Automatically handles prerequisite phasing via LongPhase if needed.
+///
+/// # Fields
+///
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `config` - Global pipeline configuration
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/severus")
 #[derive(Debug)]
 pub struct Severus {
+    /// Sample identifier
     pub id: String,
+    /// Global pipeline configuration
     pub config: Config,
+    /// Directory for log file storage
     pub log_dir: String,
 }
 
@@ -269,7 +355,7 @@ impl Label for Severus {
 }
 
 impl Version for Severus {
-    /// Retrieves the Severus version by running `severus --version` in its coda environment.
+    /// Retrieves the Severus version by running `severus --version` in its conda environment.
     ///
     /// # Errors
     /// Returns an error if command execution fails or "Version " not found in output.
@@ -289,18 +375,26 @@ impl Version for Severus {
     }
 }
 
-/// Severus SV caller in solo (single-sample) mode.
+/// Severus solo (single-sample) structural variant caller.
+///
+/// Detects structural variants including VNTRs from a single BAM file without a matched control.
+/// Useful for germline SV detection or when no matched normal is available.
+///
+/// # Fields
 ///
-/// Detects structural variants from long-read sequencing data without a matched control.
+/// - `id` - Sample identifier (e.g., "34528")
+/// - `time` - Time point label: typically `config.normal_name` ("norm") or `config.tumoral_name` ("diag")
+/// - `config` - Global pipeline configuration
+/// - `log_dir` - Directory for execution logs (e.g., "{result_dir}/{id}/log/severus_solo")
 #[derive(Debug)]
 pub struct SeverusSolo {
     /// Sample identifier
     pub id: String,
-    /// Time point identifier (e.g., "normal" or "tumor")
+    /// Time point identifier (e.g., "norm" or "diag")
     pub time: String,
-    /// Pipeline configuration
+    /// Global pipeline configuration
     pub config: Config,
-    /// Directory for log files
+    /// Directory for log file storage
     pub log_dir: String,
 }
 impl InitializeSolo for SeverusSolo {