Skip to content
Snippets Groups Projects
Forked from an inaccessible project.
Arslan Tariq's avatar
Arslan Tariq authored
70367249
History
Name Last commit Last update
scripts
README.md
environment.yaml

PoreC_analysis

This repository contains the PoreC data analysis pipeline, encompassing both the pre-processing of PoreC data and the scaffolding of genomic sequences using PoreC data. This pipline is inspired by wf-pore-c and HapHiC.

Installation

Make a new conda enviornment and install dependencies:

conda env create -f environment.yaml
conda activate PoreC-env
conda install bioconda::minimap2
conda install bioconda::samtools
conda install bioconda::yahs

Additionally, this pipeline requires two scripts, rename_reads_for_bam.py and filter_bam.py which are borrowed from HapHiC. These scripts are availble in the scripts directory.

Getting started

1- Make index of contig files

samtools faidx genome.contig.fasta

2- Digest chimeric reads

ENZYME="DpnII"
THREADS=56
pore-c-py digest PoreC.reads.fq.gz  ${ENZYME} --threads ${THREADS} --max_monomers 250 > Digest.bam

Here, you can adjust the enzyme and threads settings based on the enzyme used in the PoreC library preparation and the available computational resources.

3- Getting digest reads

samtools fastq -@ 8 -T '*' Digest.bam > Digest.tag.fastq

4- Map to the genome

Now map these monomers to the genome.

minimap2 -ay -x map-ont -t ${THREADS} genome.contig.fasta Digest.tag.fastq >Digest.map.sam

5- Annotate links

SAMPLE="genome"
args="--chromunity --chromunity_merge_distance -1 --paired_end --filter_pairs --paired_end_minimum_distance 100 --paired_end_maximum_distance 200"

pore-c-py annotate Digest.map.sam ${SAMPLE}_annotated --monomers --threads ${THREADS} --stdout --summary ${args} |\
        tee ${SAMPLE}_out.ns.bam |\
        samtools sort -@ ${THREADS} -m 2g -u --write-index -o ${SAMPLE}.cs.bam -

⚠️ If you get any buffer size error then use following args:

args="--paired_end --filter_pairs --paired_end_minimum_distance 100 --paired_end_maximum_distance 200"

6- Sort by name

YaHS and SALSA require name sorted bam file.

samtools sort -@ ${THREADS} -n ${SAMPLE}_annotated.pe.bam -o ${SAMPLE}_pe_ns.bam

7- More pre-processing

rename_reads_for_bam.py ${SAMPLE}_pe_ns.bam | samtools view - -@ ${THREADS} -S -h -b -F 3340 -o ${SAMPLE}_poreC.bam
filter_bam.py ${SAMPLE}_poreC.bam 1 --threads ${THREADS} | samtools view -b -@ ${THREADS} -o ${SAMPLE}_poreC.filtered.bam

8- Scaffolding

Now run YaHS to make scaffold.

yahs -o ${SAMPLE}_yahs genome.contig.fasta ${SAMPLE}_poreC.filtered.bam