## Install

### Dependencies

To build the required HTSlib, install:

```
sudo apt install cmake libclang-dev
```

External tools:

* minimap2
* [samtools](https://www.htslib.org/download/)
* [dorado](https://github.com/nanoporetech/dorado)
* [bcftools](https://www.htslib.org/download/)
* [modkit](https://github.com/nanoporetech/modkit)
* VEP: see pandora_lib_variants for installation instructions
* nanomonsv (see its dependencies on GitHub; TODO: use racon)

### Windows build

Install the native dependencies with Rtools/MSYS2:

```powershell
pacman -S --needed mingw-w64-x86_64-curl mingw-w64-x86_64-sqlite3 mingw-w64-x86_64-openssl mingw-w64-x86_64-tre mingw-w64-x86_64-libiconv mingw-w64-x86_64-gettext
```

Make sure these user environment variables are set:

```powershell
[Environment]::SetEnvironmentVariable("LIBCLANG_PATH", "$env:USERPROFILE\tools\libclang-win", "User")
[Environment]::SetEnvironmentVariable("PKG_CONFIG", "C:\rtools45\usr\bin\pkg-config.exe", "User")
[Environment]::SetEnvironmentVariable("PKG_CONFIG_PATH", "C:\rtools45\mingw64\lib\pkgconfig", "User")
```

Also ensure the user `Path` contains:

```text
%USERPROFILE%\tools\libclang-win
C:\rtools45\mingw64\bin
C:\rtools45\usr\bin
```

Then open a fresh PowerShell in the repository and run:

```powershell
.\setup-patches.ps1
cargo build
```

`setup-patches.ps1` generates the Windows-only local Cargo patch config in `.cargo/config.toml` and the patched crate sources in `patches/`. Both are ignored by git. This keeps Linux builds on the registry crates while Windows builds use the patched `hts-sys` / `rust-htslib` sources.

## Usage

### SomaticPipe output container

A proposed single-file container for SomaticPipe results is documented in [`docs/somaticpipe-output-format.md`](docs/somaticpipe-output-format.md).
### Use jq for selecting variants

* Somatic Variants of chrM (25)

```
zcat /data/longreads_basic_pipe/*/diag/somatic_variants.json.gz | \
  jq -L ./jq_filters -C 'include "jq_variants"; [.data[] | select(contig("chrM") and n_in_constit <= 1) | format]'
```

### Using jq and find to look for chrM norm coverage

```
# Print "<ID><TAB><chrM coverage>" for every info file; the path is passed
# to sh as $1 instead of being spliced into the script text.
find /data/longreads_basic_pipe/ -name "*_diag_hs1_info.json" -type f \
  -exec sh -c 'basename "$(dirname "$(dirname "$1")")" | tr -d "\n"' sh {} \; \
  -printf "\t" \
  -exec jq -L ./jq_filters -r 'include "jq_bam"; contig_coverage("chrM")' {} \;
```

### Using jq to look up VEP consequences

Consequence terms: cf. https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html

```
zcat /data/longreads_basic_pipe/ADJAGBA/diag/somatic_variants.json.gz | \
  jq -L ./jq_filters -C 'include "jq_variants"; consequence("SynonymousVariant")' | bat
```

### Using jq and find to count VEP consequences

```
# One line per case: ID, true_count, total_count, proportion (tab-separated).
# printf is used because echo does not expand "\t" in every shell.
find /data/longreads_basic_pipe/ -name "somatic_variants.json.gz" -type f -exec sh -c '
  id=$(basename "$(dirname "$(dirname "$1")")")
  count=$(zcat "$1" | jq -L ./jq_filters -r '\''include "jq_variants"; count_consequence("SynonymousVariant") | [.true_count, .total_count, .proportion] | @tsv'\'')
  printf "%s\t%s\n" "$id" "$count"
' sh {} \;
```

### Find recurrence by VEP consequence

```
# List variants with the given consequence that occur in more than one case:
# chr, position, ref, alt, number of cases (tab-separated).
find /data/longreads_basic_pipe/ -name "somatic_variants.json.gz" -type f -exec sh -c '
  zcat "$1" | jq -L ./jq_filters -r '\''include "jq_variants"; consequence("StopGained") | .[] | select(.has_consequence == true) | [.chr, .position, .ref, .alt] | @tsv'\''
' sh {} \; | sort -k1,1V -k2,2n | uniq -c | awk '$1 > 1 {print $2"\t"$3"\t"$4"\t"$5"\t"$1}'
```

### Reading log files

```
zcat /data/longreads_basic_pipe/ID/log/deepsomatic/deepvariant_e7ed1.log.gz | jq -r '.log'
```
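To read several logs at once, the command above can be wrapped in a small shell function. This is a minimal sketch, not part of the repository: `print_logs` is a hypothetical helper name, and it assumes every `*.log.gz` under the given directory is gzipped JSON with a `.log` field, like the deepsomatic example.

```shell
# Sketch: print the `.log` field of every gzipped log under a directory.
# Assumptions: print_logs is a hypothetical helper, and each *.log.gz
# file holds JSON with a `.log` field (as in the deepsomatic example).
print_logs() {
  log_dir=$1
  # Sort so the logs are printed in a stable order.
  find "$log_dir" -name "*.log.gz" -type f | sort | while IFS= read -r f; do
    printf '== %s ==\n' "$f"   # header line naming the log file
    zcat "$f" | jq -r '.log'   # raw log text, without JSON quoting
  done
}
```

For example, `print_logs /data/longreads_basic_pipe/ID/log` would print every log of case `ID`, each preceded by its file path.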