diff --git a/assays/BacterialGenomeAssemblyAndAnnotation/README.md b/assays/BacterialGenomeAssemblyAndAnnotation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assays/BacterialGenomeAssemblyAndAnnotation/dataset/.gitkeep b/assays/BacterialGenomeAssemblyAndAnnotation/dataset/.gitkeep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assays/BacterialGenomeAssemblyAndAnnotation/isa.assay.xlsx b/assays/BacterialGenomeAssemblyAndAnnotation/isa.assay.xlsx
new file mode 100644
index 0000000000000000000000000000000000000000..32b9da3d095f128a27995a6d20d3f525ff96bc09
Binary files /dev/null and b/assays/BacterialGenomeAssemblyAndAnnotation/isa.assay.xlsx differ
diff --git a/assays/BacterialGenomeAssemblyAndAnnotation/protocols/.gitkeep b/assays/BacterialGenomeAssemblyAndAnnotation/protocols/.gitkeep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assays/BacterialGenomeAssemblyAndAnnotation/protocols/BacterialGenomeAssemblyAndAnnotationProtocol.md b/assays/BacterialGenomeAssemblyAndAnnotation/protocols/BacterialGenomeAssemblyAndAnnotationProtocol.md
new file mode 100644
index 0000000000000000000000000000000000000000..7a7880509a2a67b3e2577c1c419fc03faeef8422
--- /dev/null
+++ b/assays/BacterialGenomeAssemblyAndAnnotation/protocols/BacterialGenomeAssemblyAndAnnotationProtocol.md
@@ -0,0 +1,3 @@
+## Bacterial genome assembly and annotation
+
+Paired-end Illumina reads were first length-trimmed and quality-filtered using Trimmomatic (Bolger, A. M. et al., 2014). Reads were assembled using the A5 assembly pipeline (Tritt, A. et al., 2012), which uses the IDBA algorithm (Peng, Y. et al., 2012) to assemble error-corrected reads. Detailed assembly statistics and corresponding metadata can be found in Supplementary Data 2.
Genomes with multi-modal *k*-mer and GC content distributions, or with multiple instances of marker genes from diverse taxonomic groups, were flagged as not originating from clonal cultures. These samples were processed using a metagenome binning approach (Pasolli, E. et al., 2019). Briefly, contigs from each metagenome sample were clustered using MetaBAT 2 (Kang, D. D. et al., 2019), and the completeness and contamination of each metagenome-assembled genome were then assessed using CheckM (Parks, D. H. et al., 2015). Only bins with completeness scores greater than 75% and contamination rates lower than 5% were retained and added to the collection (Supplementary Data 2, designated metagenome-assembled genome (MAG) in the column ‘type’). Functional annotation of genes was conducted using Prokka and a custom database based on Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologue groups (Kanehisa, M. et al., 2014) downloaded from the KEGG FTP server in November 2019. Hits to sequences in the database were filtered using an E value threshold of 10 × 10⁻⁹ and a minimum coverage of 80% of the length of the query sequence.
\ No newline at end of file
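
The two filtering rules stated in the protocol (bin quality and annotation hits) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Bin` and `Hit` records, their field names, and the `keep_bin` / `keep_hit` helpers are hypothetical. The thresholds come from the text. Treating the E value cutoff as inclusive, and computing coverage as aligned length divided by query length, are assumptions.

```python
from dataclasses import dataclass

# Thresholds taken from the protocol text.
MIN_COMPLETENESS = 75.0    # percent; bins must score strictly above this
MAX_CONTAMINATION = 5.0    # percent; bins must score strictly below this
MAX_EVALUE = 10e-9         # "10 x 10^-9" as written in the protocol
MIN_QUERY_COVERAGE = 0.80  # fraction of the query sequence length


@dataclass(frozen=True)
class Bin:
    """Hypothetical record for one metagenome bin and its quality estimates."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one hit against the custom KEGG database."""
    query: str
    target: str
    evalue: float
    aligned_length: int  # query residues covered by the alignment
    query_length: int


def keep_bin(b: Bin) -> bool:
    """Retain bins with completeness > 75% and contamination < 5%."""
    return b.completeness > MIN_COMPLETENESS and b.contamination < MAX_CONTAMINATION


def keep_hit(h: Hit) -> bool:
    """Retain hits within the E value threshold that cover >= 80% of the query.

    Assumption: the E value cutoff is inclusive; the text does not say.
    """
    coverage = h.aligned_length / h.query_length
    return h.evalue <= MAX_EVALUE and coverage >= MIN_QUERY_COVERAGE


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    bins = [Bin("bin_1", 92.3, 1.2), Bin("bin_2", 75.0, 0.5), Bin("bin_3", 88.0, 6.1)]
    print([b.name for b in bins if keep_bin(b)])
```

Keeping the thresholds as named constants makes it easy to check them against the values reported in the protocol.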