diff --git a/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md b/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md
index 4633751d3644f169e2b8c446071e3cfc3f7cda2c..8027fc180c17477416762e37d942cb038a02a045 100644
--- a/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md
+++ b/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md
@@ -1,4 +1,5 @@
-# The directory contains scripts used to:
+# Directory Desccription
 
-- convert genotype hapmap file to other file formats (h5, hapmap, vcf and plink)
-- impute missing snps with heterozygous snps at the respective positions
+- The directory contains scripts used to:
+  - convert genotype hapmap file to other file formats (h5, hapmap, vcf and plink).
+  - impute missing snps with heterozygous snps at the respective positions.
diff --git a/workflows/preprocessing_data/scripts/workflow_order.md b/workflows/preprocessing_data/scripts/workflow_order.md
index 77acdce9c38327e28b1aad9f9d694687d6218a4f..cc7e428bb5ae4dc074d0972ab90f44f780a4c9f3 100644
--- a/workflows/preprocessing_data/scripts/workflow_order.md
+++ b/workflows/preprocessing_data/scripts/workflow_order.md
@@ -2,11 +2,18 @@
 
 This document describes the order in which the scripts were executed
 
-1. `remove_spp_tags_in_hapmap_files.sh`
+1. Covertion of initial 'raw' genetic data from VCF format to hapmap format was done by loading them directly to TASSEL software [doi:10.1093/bioinformatics/btm308] and saving them as diploid hapmap files
+
+   The respective raw data files can be located in the directories:
+
+   - /initial_data/data/genetic/filtered/\*.vcf
+   - /initial_data/data/meta/ADN_pasap_3604.txt
+
+2. `remove_spp_tags_in_hapmap_files.sh`
    - Removes ssp tags on the overall teosinte hapmap files ids
-2. `geno_pheno_accession_selection_3455_accesions_matched.ipynb`
+3. `geno_pheno_accession_selection_3455_accesions_matched.ipynb`
    - Code used to filter out genotypes not in the phenotype accessions and vice verser for GWAS analysis using GAPIT
    - Individual taxa subsets are also generated within the notebook.
-3. `extract_indiv_spp_pheno_data.sh`
+4. `extract_indiv_spp_pheno_data.sh`
    - Extaracts and creates individual species phenotype files
-4. The subdirectory `hapmap_convertion_scripts` contains scripts used to convert the hapmap genotype file type to other formats (plink, numeric, H5 formats)
+5. The subdirectory `hapmap_convertion_scripts` contains scripts used to convert the hapmap genotype file type to other formats (plink, numeric, H5 formats)