diff --git a/README.md b/README.md
index 28de16e3b13ced0767e7ce3df0badd373bb50a93..429731f1aacb01ace93a5c5ef28af2158f9abfb3 100644
--- a/README.md
+++ b/README.md
@@ -141,32 +141,6 @@ An overall GWAS summary plot was generated using the script:
 
 ---
 
-<!-- ## GWAS Analysis
-
-### Data Preprocessing for GWAS analysis
-
-The aforementioned analysis section's scripts and files are located in the directory [workflows/preprocessing_data](workflows/preprocessing_data).
-
-1. **Conversion of Genetic Data**:
-   The initial 'raw' genetic data, provided in VCF format, was converted to hapmap format using TASSEL software [doi:10.1093/bioinformatics/btm308]. The files were loaded directly into TASSEL and saved as diploid hapmap files. The respective raw data files used can be found in the following directories:
-
-   - `/initial_data/data/genetic/filtered/*.vcf`
-   - `/initial_data/data/meta/ADN_pasap_3604.txt`
-
-2. **Hapmap File Cleaning**:
-   After conversion, species-specific tags were removed from the overall teosinte hapmap file IDs using the script `remove_spp_tags_in_hapmap_files.sh` to ensure standardized identifiers across all files.
-
-3. **Genotype-Phenotype Accession Matching and taxon subset data files**:
-   The Jupyter notebook `geno_pheno_accession_selection_3455_accesions_matched.ipynb` was used to filter out genotypes that did not have corresponding phenotype data, and vice versa, in preparation for GWAS analysis using GAPIT. This notebook also generated individual taxa subsets for further analysis.
-
-4. **Genotype File Conversion**:
-   The hapmap genotype files were further converted to various formats (Plink, numeric, H5) using scripts located in the `hapmap_convertion_scripts` subdirectory. These conversions facilitated compatibility with other analysis tools used in downstream analyses.
-
-5. **Phenotype Data Extraction**:
-   The script `extract_indiv_spp_pheno_data.sh` was used to extract and create individual species phenotype files. This step was essential to ensure that each species had its own specific phenotype dataset for analysis.
-
-The cleaned and filtered genotype and phenotype files were saved in the `/mnt/data/joseph/TEOSINTE/analyses/GWAS/data/` or `studies/processed_genotype_phenotype_teosinte_data` directory, ready for use in the GWAS pipeline. -->
-
 ## GWAS Analysis
 
 ### Data Preprocessing for GWAS Analysis
@@ -455,7 +429,7 @@ Rscript mercator_enrichment_ura.R ../../results_mercator_heatmaps_enrichment/enr
 
 #### Jaccard Similarity index
 
-This was calculated and plotted following the steps outlined in the Jupyter notebook located at: `workflows/snp_gene_neighborhood_pipeline/scripts/snps_and_proteins_jaccard_index`
+This was calculated and plotted following the steps outlined in the Jupyter notebook located at: `workflows/snp_gene_neighborhood_pipeline/scripts/snps_and_proteins_jaccard_index.ipynb`
 
 ---
 