- For FRAGMENTED TEs, use the `.fasta.mod.EDTA.TEanno.gff3` file.
- For INTACT TEs, use the `.fasta.mod.EDTA.intact.gff3` file.

Make sure to delete the header lines starting with `###` in the respective files; `numpy` cannot read them.
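
The header removal can also be scripted instead of done by hand. The following is a minimal sketch, not part of the original workflow; the function name `strip_gff3_headers` and the example record are hypothetical.

```python
def strip_gff3_headers(lines, prefix="###"):
    """Return only the lines that do not start with the given prefix."""
    return [line for line in lines if not line.startswith(prefix)]


# Hypothetical example content, made up for illustration only.
example = [
    "###\n",
    "chr1\tEDTA\tGypsy_LTR_retrotransposon\t100\t500\t.\t+\t.\tID=TE_1\n",
    "###\n",
]
cleaned = strip_gff3_headers(example)
```

For a real file, the lines can be read with `open(path).readlines()` and the cleaned list written to a new file, so that the original `EDTA` output stays untouched.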
### Fig. 2: EDTA results analysis:
The [code for the results presented in Fig.2](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Fig2_genome_size.ipynb) contains a basic analysis of the `EDTA` results. The genome sizes were hard-coded from the number of bases in the respective genome .fasta files. The analysis distinguishes between the "FRAGMENTED" and "INTACT" outputs of `EDTA`.
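
The hard-coded genome sizes could be reproduced by counting the bases in a .fasta file. The sketch below is an illustration under assumptions, not the code used in the notebook: the function name `genome_size` is hypothetical, and it counts every sequence character, including `N`.

```python
def genome_size(fasta_lines):
    """Count sequence characters in FASTA-formatted lines, skipping headers."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip empty lines and sequence headers (lines starting with '>').
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


# Hypothetical toy genome with two contigs.
toy_fasta = [">chr1", "ACGT", "NNAC", ">chr2", "GG"]
toy_size = genome_size(toy_fasta)
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines.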
### Fig.3: TE classes
The [code for Fig.3](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Fig3_TE_types.ipynb) contains a breakdown of the TE classes as annotated by `EDTA`. The lengths of the TEs (in base pairs) were summed per class and compared.
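
Summing TE lengths per class can be sketched as follows. This is a minimal example under assumptions, not the notebook's implementation: it assumes the TE class is stored in column 3 (the GFF3 `type` column), that coordinates are 1-based and inclusive as in the GFF3 format, and the function name `te_length_by_type` is hypothetical.

```python
from collections import defaultdict


def te_length_by_type(gff3_lines):
    """Sum feature lengths (bp) per value of the GFF3 type column."""
    totals = defaultdict(int)
    for line in gff3_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        te_type, start, end = fields[2], int(fields[3]), int(fields[4])
        # GFF3 coordinates are 1-based and inclusive, hence the +1.
        totals[te_type] += end - start + 1
    return dict(totals)


# Hypothetical records, made up for illustration only.
toy_gff3 = [
    "###",
    "chr1\tEDTA\tGypsy_LTR_retrotransposon\t100\t500\t.\t+\t.\tID=TE_1",
    "chr1\tEDTA\tGypsy_LTR_retrotransposon\t1000\t1099\t.\t-\t.\tID=TE_2",
    "chr2\tEDTA\thelitron\t1\t50\t.\t+\t.\tID=TE_3",
]
toy_totals = te_length_by_type(toy_gff3)
```

Note that overlapping or nested annotations are counted once per record in this sketch, so the totals can exceed the number of distinct bases covered.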
### Fig.4: LTR age calculation
For the [LTR age calculation as presented in Fig.4](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Fig4_LTR_age.ipynb) LTR-TEs were extracted from the `EDTA` results, sorted by photosynthesis phenotype and visualized. Furthermore, statistical parameters were calculated.
### Fig.5 f: TE-gene association:
The [TE-gene association analysis](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Fig5_TE_gene_association.py) was conducted using the .gff3 annotation files from `Helixer` and `EDTA`. Only the INTACT TEs predicted by `EDTA` were used; the FRAGMENTED TEs were ignored to reduce the number of false-positive hits. For each contig, it was checked whether a TE was starting or ending in a gene, residing inside a gene, spanning a gene, or residing up- or downstream of a gene. Strand specificity was considered. Results were written to a `.tsv` file and visualized using [this code](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Fig5_stackedbar.ipynb). <br>
Single genes were visualized using [this code snippet](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Fig6_single_gene_visualization.ipynb). <br>
Statistics were performed using [this code](https://git.nfdi4plants.org/hhu-plant-biochemistry/triesch2023_brassicaceae_transposons/-/blob/main/workflows/Tab1_statistics.ipynb).
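
The positional categories used in the TE-gene association analysis can be illustrated with a small classifier for a single TE-gene pair. This is a sketch under assumptions and not the logic of the linked script: the function name `classify_te`, the category labels, and the `window` of 2000 bp for up- and downstream hits are all hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def classify_te(te_start, te_end, gene_start, gene_end,
                gene_strand="+", window=2000):
    """Classify the position of one TE relative to one gene."""
    # TE lies completely within the gene boundaries.
    if te_start >= gene_start and te_end <= gene_end:
        return "inside"
    # Gene lies completely within the TE boundaries.
    if te_start <= gene_start and te_end >= gene_end:
        return "spanning"
    # Partial overlap: exactly one TE end lies within the gene.
    if te_start <= gene_end and te_end >= gene_start:
        if gene_start <= te_start <= gene_end:
            return "starts_in_gene"
        return "ends_in_gene"
    # No overlap: TE is left of the gene on the reference.
    if te_end < gene_start:
        if gene_start - te_end <= window:
            return "upstream" if gene_strand == "+" else "downstream"
        return "none"
    # No overlap: TE is right of the gene on the reference.
    if te_start - gene_end <= window:
        return "downstream" if gene_strand == "+" else "upstream"
    return "none"


# Hypothetical gene at positions 100-200.
labels = [
    classify_te(150, 180, 100, 200),
    classify_te(50, 300, 100, 200),
    classify_te(150, 250, 100, 200),
    classify_te(50, 150, 100, 200),
    classify_te(10, 90, 100, 200, "+"),
    classify_te(10, 90, 100, 200, "-"),
    classify_te(5000, 5100, 100, 200),
]
```

Strand is taken into account only for the up- and downstream categories here: a TE to the left of a gene on the reference is upstream of a `+` strand gene but downstream of a `-` strand gene.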