PoreC_analysis
This repository contains the PoreC data analysis pipeline, encompassing both the pre-processing of PoreC data and the scaffolding of genomic sequences using PoreC data. This pipline is inspired by wf-pore-c and HapHiC.
Installation
Make a new conda enviornment and install dependencies:
conda env create -f environment.yaml
conda activate PoreC-env
conda install bioconda::minimap2
conda install bioconda::samtools
conda install bioconda::yahs
Additionally, this pipeline requires two scripts, rename_reads_for_bam.py and filter_bam.py which are borrowed from HapHiC. These scripts are availble in the scripts directory.
Getting started
1- Make index of contig files
samtools faidx genome.contig.fasta
2- Digest chimeric reads
ENZYME="DpnII"
THREADS=56
pore-c-py digest PoreC.reads.fq.gz ${ENZYME} --threads ${THREADS} --max_monomers 250 > Digest.bam
Here, you can adjust the enzyme and threads settings based on the enzyme used in the PoreC library preparation and the available computational resources.
3- Getting digest reads
samtools fastq -@ 8 -T '*' Digest.bam > Digest.tag.fastq
4- Map to the genome
Now map these monomers to the genome.
minimap2 -ay -x map-ont -t ${THREADS} genome.contig.fasta Digest.tag.fastq >Digest.map.sam
5- Annotate links
SAMPLE="genome"
args="--chromunity --chromunity_merge_distance -1 --paired_end --filter_pairs --paired_end_minimum_distance 100 --paired_end_maximum_distance 200"
pore-c-py annotate Digest.map.sam ${SAMPLE}_annotated --monomers --threads ${THREADS} --stdout --summary ${args} |\
tee ${SAMPLE}_out.ns.bam |\
samtools sort -@ ${THREADS} -m 2g -u --write-index -o ${SAMPLE}.cs.bam -
⚠️ If you get any buffer size error then use following args:
args="--paired_end --filter_pairs --paired_end_minimum_distance 100 --paired_end_maximum_distance 200"
6- Sort by name
YaHS and SALSA require name sorted bam file.
samtools sort -@ ${THREADS} -n ${SAMPLE}_annotated.pe.bam -o ${SAMPLE}_pe_ns.bam
7- More pre-processing
rename_reads_for_bam.py ${SAMPLE}_pe_ns.bam | samtools view - -@ ${THREADS} -S -h -b -F 3340 -o ${SAMPLE}_poreC.bam
filter_bam.py ${SAMPLE}_poreC.bam 1 --threads ${THREADS} | samtools view -b -@ ${THREADS} -o ${SAMPLE}_poreC.filtered.bam
8- Scaffolding
Now run YaHS to make scaffold.
yahs -o ${SAMPLE}_yahs genome.contig.fasta ${SAMPLE}_poreC.filtered.bam