From 82c0d3fdae2d1da4753ceb18c445d540fe195fc8 Mon Sep 17 00:00:00 2001
From: Dominik <dominik.brilhaus@hhu.de>
Date: Wed, 24 Aug 2022 16:09:33 +0200
Subject: [PATCH] add notepad

---
 _slides/notepad.md | 121 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 _slides/notepad.md

diff --git a/_slides/notepad.md b/_slides/notepad.md
new file mode 100644
index 0000000..27cb14c
--- /dev/null
+++ b/_slides/notepad.md
@@ -0,0 +1,121 @@
+# Notepad around the workshop
+
+> this is part of the notes collected during the workshop
+
+- [RStudio Keyboard Shortcuts](#rstudio-keyboard-shortcuts)
+- [Notes on codes](#notes-on-codes)
+- [Challenge exercise Day 3](#challenge-exercise-day-3)
+  - [A. Find data](#a-find-data)
+  - [B. Design your *in silico* experiment](#b-design-your-in-silico-experiment)
+  - [C. Run your analysis](#c-run-your-analysis)
+
+## RStudio Keyboard Shortcuts
+
+- Execute line / highlighted block of code: `strg (ctrl) + Enter`
+- Duplicate line / highlighted block: `strg (ctrl) + shift + d`
+- Delete line / highlighted block: `strg (ctrl) + d`
+- Interrupt R: `ESC`
+- (Un)Comment line / highlighted block: `ctrl (strg) + shift + c`
+
+## Notes on codes
+
+1. You can start an interpreter from the terminal / command line
+
+- just type `python`, try some Python commands, run `quit()` to exit back to the terminal.
+- just type `R`, try some R commands, run `q()` to exit back to the terminal.
+
+2. interpreter ~ environment ~ programming language
+
+- Most (not all) are pre-installed on your machine
+- a gazillion other languages and interpreters (perl, julia, fsharp, ...)
+
+3. You can write a script (in a simple text editor or IDE), store it and execute it from the command line
+
+- `bash <nameOfScript>.sh`
+- `Rscript <nameOfScript>*.R`
+- `python <nameOfScript>*.py`
+
+4. File extensions
+
+- more for humans than machines
+- machine ~ default software to handle specific file types
+- File <-> Software association is not "fixed"
+- can add any extension, still works (try `bash <nameOfScript>.randomExtension`)
+
+5. IDEs (Integrated Development Environments)
+
+- Multi-purpose: Visual Studio Code (+ extensions)
+- Good for R: RStudio
+- Good for Python: PyCharm
+
+## Challenge exercise Day 3
+
+The pipeline we've shown you in the class was designed to work (mostly smoothly) with the data, structure and parameters exactly as we've provided them.
+Now, let's try to take this to the next level - i.e. transfer it to a real life challenge - by reproducing some RNA-Seq data from a published paper.
+
+> Tips:
+>
+> - Along this adventure, you'll probably run into other important topics concerning *good scientific practice* (or bad examples of those). So don't be discouraged if it's harder than it should be.
+> - Consider this a challenge somewhere between peer-review, data reproducibility, positive controls (also for yourself ≈> is your pipeline correct?) and FAIR data management
+> - To make life easier, don't take the first paper you come across, but rather search for one where you can somewhat easily answer the questions in (A)
+
+#### A. Find data
+
+1. Find a paper from your research area of interest that used **mRNA-Sequencing**.
+2. Within that paper, find a figure that **plots gene expression** / transcript abundance of any kind, e.g.
+    - bar / dot plots of gene expression
+    - heat maps
+    - ...
+
+3. Identify the **experimental design**, e.g.
+    - What species was/were sequenced?
+    - How many replicates?
+    - Controls?
+    - Different genotypes, ecotypes, treatments, other conditions, ...
+
+4. What **RNA-Seq data** was produced or re-used for analysis?
+    - What reference was used?
+        - Transcriptome? Genome?
+        - Version?
+        - Can you find and access (i.e. download) it?
+    - What reads were produced (i.e. `*.fastq` files)?
+        - Sequencer?
+        - Read length?
+        - Paired or single end?
+        - Are the reads trimmed / filtered?
+        - Can you find and access (i.e. download) them?
+
+#### B. Design your *in silico* experiment
+
+- From step 3: Pick a small sample subset
+    - e.g. 3 replicates wildtype and 3 replicates mutant
+    - Write this down into a simple spreadsheet, e.g.
+
+    | file_name         | sample   | group  |
+    |:----------------- |:-------- |:------ |
+    | wt_rep1.fastq.gz  | WT_rep1  | WT     |
+    | wt_rep2.fastq.gz  | WT_rep2  | WT     |
+    | wt_rep3.fastq.gz  | WT_rep3  | WT     |
+    | mut_rep1.fastq.gz | mut_rep1 | mutant |
+    | mut_rep2.fastq.gz | mut_rep2 | mutant |
+    | mut_rep3.fastq.gz | mut_rep3 | mutant |
+
+- From step 4: Download the relevant data, i.e.:
+    - reference transcriptome or genome
+    - raw fastq files for your sample subset
+- From step 2: pick a simple plot that you feel is manageable to reproduce
+
+#### C. Run your analysis
+
+1. Put all data together into one directory (make it write-accessible by the docker container)
+2. Start the *rnaseq* docker container
+3. Build the index for the reference with `kallisto index` (section 6.4.1)
+4. Write a for loop that... (section 6.4.1)
+    - ...trims the fastq files (`trimmomatic`)
+    - ...runs `fastqc` before and after trimming
+    - ...maps the fastq reads against the reference (`kallisto quant`)
+5. Start your RStudio docker
+6. Write an R script to
+    - import the kallisto results (requires `library(sleuth)`)
+    - analyse the differential gene expression via `sleuth`
+    - plot the results using `ggplot`
--
GitLab