add notepad

Commit 82c0d3fd authored 2 years ago by Dominik Brilhaus
--- a/_slides/notepad.md
+++ b/_slides/notepad.md
+# Notepad around the workshop
+
+> This is part of the notes collected during the workshop.
+
+- [RStudio Keyboard Shortcuts](#rstudio-keyboard-shortcuts)
+- [Notes on codes](#notes-on-codes)
+- [Challenge exercise Day 3](#challenge-exercise-day-3)
+    - [A. Find data](#a-find-data)
+    - [B. Design your *in silico* experiment](#b-design-your-in-silico-experiment)
+    - [C. Run your analysis](#c-run-your-analysis)
+
+## RStudio Keyboard Shortcuts
+
+- Execute line / highlighted block of code: `Ctrl + Enter`
+- Duplicate line / highlighted block: `Ctrl + Shift + D`
+- Delete line / highlighted block: `Ctrl + D`
+- Interrupt R: `Esc`
+- (Un)comment line / highlighted block: `Ctrl + Shift + C`
+
+## Notes on codes
+
+1. You can start an interpreter from the terminal / command line
+
+- Just type `python`, try some Python commands, then run `quit()` to exit back to the terminal.
+- Just type `R`, try some R commands, then run `q()` to exit back to the terminal.
+
+2. Interpreter ~ environment ~ programming language (roughly interchangeable terms here)
+
+- Most (not all) come pre-installed on your machine
+- There are a gazillion other languages and interpreters (Perl, Julia, F#, ...)
+
+3. You can write a script (in a simple text editor or an IDE), save it and execute it from the command line
+
+- `bash <nameOfScript>.sh`
+- `Rscript <nameOfScript>.R`
+- `python <nameOfScript>.py`
+
+4. File extensions
+
+- Extensions are more for humans than for the machine
+- For the machine, they mainly select the default software that handles a specific file type
+- The file <-> software association is not "fixed"
+- You can give a script any extension and it still works (try `bash <nameOfScript>.randomExtension`)
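The point about extensions can be tried out with a minimal sketch like the one below. It assumes a Unix-like system with `bash` available; the file name `hello.randomExtension` and the printed message are made up for illustration. Because the script is handed to `bash` explicitly, the extension plays no role in how it is executed.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so nothing is left behind afterwards.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Write a one-line script to a file with a nonsense extension (hypothetical name).
script="$workdir/hello.randomExtension"
printf '%s\n' 'echo "Hello from a script without a .sh extension"' > "$script"

# The interpreter is named explicitly, so the extension is irrelevant.
bash "$script"
```

The same holds for `Rscript` and `python`: the interpreter reads the file contents, not its name.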
+
+5. IDEs (Integrated Development Environments)
+
+- Multi-purpose: Visual Studio Code (+ extensions)
+- Good for R: RStudio
+- Good for Python: PyCharm
+
+## Challenge exercise Day 3
+
+The pipeline we've shown you in class was designed to work (mostly smoothly) with the data, structure and parameters exactly as we provided them.
+Now, let's take this to the next level, i.e. transfer it to a real-life challenge, by reproducing some RNA-Seq results from a published paper.
+
+> Tips:
+>
+> - Along this adventure, you'll probably run into other important topics concerning *good scientific practice* (or bad examples of it). So don't be discouraged if it's harder than it should be.
+> - Consider this a challenge somewhere between peer review, data reproducibility, positive controls (also for yourself: is your pipeline correct?) and FAIR data management.
+> - To make life easier, don't just take the first paper you come across, but rather search for one where you can fairly easily answer the questions in (A).
+
+#### A. Find data
+
+1. Find a paper from your research area of interest that used **mRNA-Sequencing**.
+2. Within that paper, find a figure that **plots gene expression** / transcript abundance of any kind, e.g.
+    - bar / dot plots of gene expression
+    - heat maps
+    - ...
+
+3. Identify the **experimental design**, e.g.
+    - What species was/were sequenced?
+    - How many replicates?
+    - Controls?
+    - Different genotypes, ecotypes, treatments, other conditions, ...
+
+4. What **RNA-Seq data** were produced or re-used for the analysis?
+    - What reference was used?
+        - Transcriptome? Genome?
+        - Version?
+        - Can you find and access (i.e. download) it?
+    - What reads were produced (i.e. `*.fastq` files)?
+        - Sequencer?
+        - Read length?
+        - Paired or single end?
+        - Are the reads trimmed / filtered?
+        - Can you find and access (i.e. download) them?
+
+#### B. Design your *in silico* experiment
+
+- From step 3: Pick a small sample subset
+  - e.g. 3 replicates wildtype and 3 replicates mutant
+  - Write this down into a simple spreadsheet, e.g.
+
+    | file_name         | sample   | group  |
+    |:----------------- |:-------- |:------ |
+    | wt_rep1.fastq.gz  | WT_rep1  | WT     |
+    | wt_rep2.fastq.gz  | WT_rep2  | WT     |
+    | wt_rep3.fastq.gz  | WT_rep3  | WT     |
+    | mut_rep1.fastq.gz | mut_rep1 | mutant |
+    | mut_rep2.fastq.gz | mut_rep2 | mutant |
+    | mut_rep3.fastq.gz | mut_rep3 | mutant |
+
+- From step 4: Download the relevant data, i.e.:
+  - reference transcriptome or genome
+  - raw fastq files for your sample subset
+- From step 2: Pick a simple plot that you feel is manageable to reproduce
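Instead of a spreadsheet program, the sample sheet can also be kept as a plain tab-separated file, which is easy to read back in both the shell and R. The following is a minimal sketch modeled on the example table above; the output file name `samples.tsv` is a hypothetical choice, and the file is written to the current directory.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical name for the sample sheet.
sheet="samples.tsv"

# printf reuses the format for every group of three arguments,
# so each group becomes one tab-separated row (header first).
printf '%s\t%s\t%s\n' \
  file_name         sample   group \
  wt_rep1.fastq.gz  WT_rep1  WT \
  wt_rep2.fastq.gz  WT_rep2  WT \
  wt_rep3.fastq.gz  WT_rep3  WT \
  mut_rep1.fastq.gz mut_rep1 mutant \
  mut_rep2.fastq.gz mut_rep2 mutant \
  mut_rep3.fastq.gz mut_rep3 mutant \
  > "$sheet"

# Quick sanity check: number of replicates per group (header skipped).
tail -n +2 "$sheet" | cut -f3 | sort | uniq -c
```

Counting replicates per group right away catches copy-and-paste slips, such as a duplicated sample name, before any analysis is run.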
+
+#### C. Run your analysis
+
+1. Put all data together into one directory (make sure it is writable by the Docker container)
+2. Start the *rnaseq* Docker container
+3. Build the index for the reference with `kallisto index` (section 6.4.1)
+4. Write a for loop that... (section 6.4.1)
+    - ...trims the fastq files (`trimmomatic`)
+    - ...runs `fastqc` before and after trimming
+    - ...maps the fastq reads against the reference (`kallisto quant`)
+5. Start your RStudio Docker container
+6. Write an R script to
+    - import the kallisto results (requires `library(sleuth)`)
+    - analyse the differential gene expression via `sleuth`
+    - plot the results using `ggplot2`
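Steps 3 and 4 could look roughly like the shell sketch below. This is not the script from section 6.4.1, only an illustration under several assumptions: single-end reads, a `trimmomatic` wrapper command on the PATH (as provided by conda installs), and placeholder values for the file names (`reference.fa`, `reference.idx`), the output directories, the trimming settings and the fragment length parameters (`-l 200 -s 20`, which `kallisto quant` requires for single-end reads). By default the script only prints the commands it would run; set `DRY_RUN=0` to actually execute them inside the container.

```shell
#!/usr/bin/env bash
set -eu

# All names and settings below are placeholders, not the workshop's values.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
reference="reference.fa"  # hypothetical reference transcriptome
index="reference.idx"     # hypothetical kallisto index file
samples="wt_rep1 wt_rep2 wt_rep3 mut_rep1 mut_rep2 mut_rep3"

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# Step 3: build the kallisto index once for the reference.
run kallisto index -i "$index" "$reference"

# Step 4: quality control, trimming and quantification for every sample.
run mkdir -p qc_raw qc_trimmed trimmed quant
for sample in $samples; do
  raw="${sample}.fastq.gz"
  trimmed="trimmed/${sample}.trimmed.fastq.gz"

  run fastqc -o qc_raw "$raw"                       # QC before trimming
  run trimmomatic SE "$raw" "$trimmed" SLIDINGWINDOW:4:15 MINLEN:36
  run fastqc -o qc_trimmed "$trimmed"               # QC after trimming
  # Single-end reads need an assumed fragment length (-l) and sd (-s).
  run kallisto quant -i "$index" -o "quant/${sample}" \
    --single -l 200 -s 20 "$trimmed"
done
```

Reviewing the printed commands before setting `DRY_RUN=0` is a cheap way to spot wrong paths or sample names. For paired-end data, the `SE` mode and the `--single -l -s` options would have to be replaced accordingly.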