From f08b099ef0f181eceda0b70dc857db8b1fcbe034 Mon Sep 17 00:00:00 2001
From: Joseph Atemia <j.atemia@fz-juelich.de>
Date: Thu, 26 Sep 2024 20:01:01 +0200
Subject: [PATCH] update: documentation

---
 .../scripts/hapmap_convertion_scripts/README.md   |  7 ++++---
 .../preprocessing_data/scripts/workflow_order.md  | 15 +++++++++++----
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md b/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md
index 4633751d36..8027fc180c 100644
--- a/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md
+++ b/workflows/preprocessing_data/scripts/hapmap_convertion_scripts/README.md
@@ -1,4 +1,5 @@
-# The directory contains scripts used to:
+# Directory Description
 
-- convert genotype hapmap file to other file formats (h5, hapmap, vcf and plink)
-- impute missing snps with heterozygous snps at the respective positions
+- The directory contains scripts used to:
+  - convert a genotype hapmap file to other file formats (H5, hapmap, VCF and PLINK).
+  - impute missing SNPs with heterozygous SNPs at the respective positions.
diff --git a/workflows/preprocessing_data/scripts/workflow_order.md b/workflows/preprocessing_data/scripts/workflow_order.md
index 77acdce9c3..cc7e428bb5 100644
--- a/workflows/preprocessing_data/scripts/workflow_order.md
+++ b/workflows/preprocessing_data/scripts/workflow_order.md
@@ -2,11 +2,18 @@
 
 This document describes the order in which the scripts were executed
 
-1. `remove_spp_tags_in_hapmap_files.sh`
+1. The initial 'raw' genetic data were converted from VCF format to hapmap format by loading them directly into the TASSEL software [doi:10.1093/bioinformatics/btm308] and saving them as diploid hapmap files.
+
+   The respective raw data files are located at:
+
+   - /initial_data/data/genetic/filtered/\*.vcf
+   - /initial_data/data/meta/ADN_pasap_3604.txt
+
+2. `remove_spp_tags_in_hapmap_files.sh`
    - Removes ssp tags on the overall teosinte hapmap files ids
-2. `geno_pheno_accession_selection_3455_accesions_matched.ipynb`
+3. `geno_pheno_accession_selection_3455_accesions_matched.ipynb`
    - Code used to filter out genotypes not in the phenotype accessions and vice verser for GWAS analysis using GAPIT
    - Individual taxa subsets are also generated within the notebook.
-3. `extract_indiv_spp_pheno_data.sh`
+4. `extract_indiv_spp_pheno_data.sh`
    - Extaracts and creates individual species phenotype files
-4. The subdirectory `hapmap_convertion_scripts` contains scripts used to convert the hapmap genotype file type to other formats (plink, numeric, H5 formats)
+5. The subdirectory `hapmap_convertion_scripts` contains scripts used to convert the hapmap genotype file to other formats (PLINK, numeric, H5).
-- 
GitLab