add processing of rRNA data assay and protocol

89e23df4 · Viktoria Petrova · 52f89db2
Commit 89e23df4 authored 8 months ago by Viktoria Petrova
--- a/assays/ProcessingOf16SrRNAGeneAmpliconData/README.md
+++ b/assays/ProcessingOf16SrRNAGeneAmpliconData/README.md
--- a/assays/ProcessingOf16SrRNAGeneAmpliconData/dataset/.gitkeep
+++ b/assays/ProcessingOf16SrRNAGeneAmpliconData/dataset/.gitkeep
--- a/assays/ProcessingOf16SrRNAGeneAmpliconData/isa.assay.xlsx
+++ b/assays/ProcessingOf16SrRNAGeneAmpliconData/isa.assay.xlsx
--- a/assays/ProcessingOf16SrRNAGeneAmpliconData/protocols/.gitkeep
+++ b/assays/ProcessingOf16SrRNAGeneAmpliconData/protocols/.gitkeep
--- a/assays/ProcessingOf16SrRNAGeneAmpliconData/protocols/ProcessingOf16SrRNAGeneAmpliconDataProtocol.md
+++ b/assays/ProcessingOf16SrRNAGeneAmpliconData/protocols/ProcessingOf16SrRNAGeneAmpliconDataProtocol.md
+## Processing of 16S rRNA gene amplicon data
+Amplicon sequencing data from roots of *Lj* (Thiergart, T. et al., 2019) and *At* (Duran, P. et al., 2018) plants grown in CAS soil in the greenhouse, along with unplanted controls, were demultiplexed according to their barcode sequences using the QIIME pipeline (Caporaso, J. G. et al., 2010). The raw sequencing reads of each sample were processed with DADA2 (Callahan, B. J. et al., 2016). Amplicon sequence variants (ASVs) were inferred from error-corrected reads and then filtered for chimeras, also using the DADA2 pipeline. Taxonomy was assigned to the ASVs against the SILVA database (Quast, C. et al., 2013) using the naïve Bayesian classifier implemented in DADA2. Raw reads were mapped to the inferred ASVs to generate a relative abundance table, which was subsequently used for analyses of diversity and differential abundance with the R package vegan (Oksanen, J. et al., 2007).
+Amplicon sequencing reads from the *Lotus* and *Arabidopsis* (Bai, Y. et al., 2015) IRLs and from their corresponding culture-independent root community profiling were quality-filtered and demultiplexed according to their two-barcode (well and plate) identifiers using custom scripts and a combination of tools included in the QIIME (Caporaso, J. G. et al., 2010) and USEARCH (Edgar, R. C., 2010) pipelines. Sequences were clustered into OTUs at 97% sequence identity using the UPARSE algorithm, followed by identification of chimeras using UCHIME (Edgar, R. C. et al., 2011). Samples (wells) with fewer than 100 good-quality reads were removed from the data set, as were OTUs not found in any well with at least ten reads. A purity threshold of 90% was chosen for identification of recoverable OTUs. We identified *Lj*-IRL samples matching OTUs found in the culture-independent root samples and selected a set of 294 representative strains, maximizing taxonomic coverage, for subsequent validation and whole-genome sequencing (WGS). These strains form the basis of the core *Lj*-SPHERE collection.
+Sequencing data from SynCom experiments (including FlowPot and millifluidics experiments) were preprocessed in the same way as the natural community 16S rRNA data. Quality-filtered, merged paired-end reads were then aligned to a reference set of sequences extracted from the whole-genome assemblies of every strain included in a given gnotobiotic experiment, using USEARCH (uparse_ref command) (Edgar, R. C., 2013). Only sequences with a perfect match to the reference database were retained. We checked that the fraction of unmapped reads did not differ significantly between compartments, experiments or host species. We generated a count table that was used for downstream diversity analyses with the R package vegan (Oksanen, J. et al., 2007). We visualized amplicon data from all experimental systems using the ggplot2 R package (Wickham, H., 2016).
\ No newline at end of file
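
The IRL filtering thresholds in the protocol (wells with fewer than 100 good-quality reads removed, OTUs required to reach ten reads in at least one well, 90% purity for recoverable OTUs) can be illustrated with a minimal Python sketch. This is not the custom scripts used in the study. The table layout, the function names, the order of the two filters, and the definition of purity as the share of a well's reads held by its most abundant OTU are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: counts[well][otu] = number of good-quality reads.
CountTable = Dict[str, Dict[str, int]]

MIN_READS_PER_WELL = 100  # wells below this depth are removed
MIN_READS_PER_OTU = 10    # an OTU must reach this count in at least one well
PURITY_THRESHOLD = 0.90   # assumed: top OTU's share of the reads in a well


def filter_table(counts: CountTable) -> CountTable:
    """Drop shallow wells, then drop OTUs never reaching ten reads in a kept well."""
    kept_wells = {
        well: otus
        for well, otus in counts.items()
        if sum(otus.values()) >= MIN_READS_PER_WELL
    }
    supported = {
        otu
        for otus in kept_wells.values()
        for otu, n in otus.items()
        if n >= MIN_READS_PER_OTU
    }
    return {
        well: {otu: n for otu, n in otus.items() if otu in supported}
        for well, otus in kept_wells.items()
    }


def recoverable_otus(counts: CountTable) -> Dict[str, List[str]]:
    """Map each OTU to the wells where it holds at least 90% of the reads."""
    result: Dict[str, List[str]] = {}
    for well, otus in counts.items():
        total = sum(otus.values())
        if total == 0:
            continue  # nothing left in this well after filtering
        top_otu, top_n = max(otus.items(), key=lambda kv: kv[1])
        if top_n / total >= PURITY_THRESHOLD:
            result.setdefault(top_otu, []).append(well)
    return result


if __name__ == "__main__":
    example = {
        "P1_A1": {"OTU_1": 950, "OTU_2": 50},
        "P1_A2": {"OTU_1": 60, "OTU_3": 60},
        "P1_A3": {"OTU_2": 40},
        "P1_A4": {"OTU_3": 195, "OTU_4": 5},
    }
    filtered = filter_table(example)
    print(recoverable_otus(filtered))
```

In this sketch purity is computed on the filtered table, so low-count OTUs that were removed no longer contribute to a well's total. Computing it on the unfiltered counts would be an equally plausible reading of the protocol.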