


Notepad around the workshop

This is part of the notes collected during the workshop.


RStudio Keyboard Shortcuts
Notes on codes

Challenge exercise Day 3

A. Find data
B. Design your in silico experiment
C. Run your analysis


RStudio Keyboard Shortcuts

Execute line / highlighted block of code: Ctrl (Strg on German keyboards) + Enter

Duplicate line / highlighted block: Ctrl + Shift + D

Delete line / highlighted block: Ctrl + D

Interrupt R: Esc

(Un)Comment line / highlighted block: Ctrl + Shift + C


Notes on codes

You can start an interpreter from terminal / command line


Type python, try some Python commands, then run quit() to return to the terminal.
Type R, try some R commands, then run q() to return to the terminal.


interpreter ~ environment ~ programming language


Most (not all) of these interpreters are pre-installed on your machine
There are countless other languages and interpreters (Perl, Julia, F#, ...)


You can write a script (in a simple text editor or IDE), store it and execute it from the command line


bash <nameOfScript>.sh
Rscript <nameOfScript>.R
python <nameOfScript>.py


File extensions


more for human than machine
machine ~ default software to handle specific file types
File <-> Software association is not "fixed"
can add any extension, still works (try bash <nameOfScript>.randomExtension)
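The last point is easy to check for yourself. A minimal sketch, assuming a throwaway script called hello.sh (the name and its content are made up for this illustration): bash reads whatever file it is given, regardless of the extension.

```shell
# Create a throwaway directory and a one-line bash script in it.
workdir="$(mktemp -d)"
printf 'echo "hello from bash"\n' > "$workdir/hello.sh"

# Same content, different (meaningless) extension.
cp "$workdir/hello.sh" "$workdir/hello.randomExtension"

# bash runs both files the same way; the extension plays no role.
out_sh="$(bash "$workdir/hello.sh")"
out_random="$(bash "$workdir/hello.randomExtension")"
echo "$out_sh"
echo "$out_random"
```

Both calls print the same line, because the interpreter is chosen by the command you type (bash), not by the file name.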


IDEs (Integrated Development Environments)


Multi-purpose: Visual Studio Code (+ extensions)
Good for R: RStudio
Good for Python: PyCharm


Specify the interpreter in the first line of the script (the "shebang" line) so that it can be executed directly from the terminal, e.g.


bash: #!/bin/bash

python: #!/usr/bin/env python3

R: #!/usr/bin/env Rscript


Once you make the script executable (chmod +x <script>.sh), you can execute it directly (i.e. ./<script>.sh instead of bash ./<script>.sh).
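Put together, the interpreter line and the chmod step look roughly like this. It is only a sketch: hello.sh and its content are hypothetical, and it runs in a temporary directory so nothing else is touched.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# hello.sh is a hypothetical script; its first line names the interpreter.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "hello from an executable script"
EOF

# Without the executable bit, ./hello.sh fails with "Permission denied".
chmod +x hello.sh

# Now the system reads the first line and hands the file to /bin/bash.
result="$(./hello.sh)"
echo "$result"
```

The same pattern should work for Python or R scripts when the first line points to the respective interpreter instead.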


Challenge exercise Day 3
The pipeline we've shown you in class was designed to work (mostly smoothly) with the data, structure and parameters exactly as we provided them.
Now, let's take this to the next level, i.e. transfer it to a real-life challenge, by reproducing some RNA-Seq data from a published paper.

Tips:

Along the way, you'll probably run into other important topics concerning good scientific practice (or bad examples of it). So don't be discouraged if it's harder than it should be.
Consider this a challenge somewhere between peer review, data reproducibility, positive controls (also for yourself: is your pipeline correct?) and FAIR data management.
To make life easier, don't just take the first paper you find; rather, search for one where you can fairly easily answer the questions in (A).


A. Find data


Find a paper from your research area of interest that used mRNA-Sequencing.


Within that paper, find a figure that plots gene expression / transcript abundance of any kind, e.g.

bar / dot plots of gene expression
heat maps
...


Identify the experimental design, e.g.

What species was/were sequenced?
How many replicates?
Controls?
Different genotypes, ecotypes, treatments, other conditions, ...


What RNA-Seq data was produced or re-used for the analysis?

What reference was used?

Transcriptome? Genome?
Version?
Can you find and access (i.e. download) it?


What reads were produced (i.e. *.fastq files)?

Sequencer?
Read length?
Paired or single end?
Are the reads trimmed / filtered?
Can you find and access (i.e. download) them?


B. Design your in silico experiment


From step 3 of (A), the experimental design: pick a small sample subset


e.g. 3 replicates wildtype and 3 replicates mutant


Write this down in a simple spreadsheet, e.g.


| file_name | sample | group |
|---|---|---|
| wt_rep1.fastq.gz | WT_rep1 | WT |
| wt_rep2.fastq.gz | WT_rep2 | WT |
| wt_rep3.fastq.gz | WT_rep3 | WT |
| mut_rep1.fastq.gz | mut_rep1 | mutant |
| mut_rep2.fastq.gz | mut_rep2 | mutant |
| mut_rep3.fastq.gz | mut_rep3 | mutant |
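If you prefer to keep the sample sheet next to your data, it can also be stored as a plain CSV file and sanity-checked from the command line. The following is a sketch only: the file name samples.csv and the two checks are suggestions, and the file and sample names are the placeholders from the example, not real data.

```shell
# samples.csv mirrors the example sheet; all names are placeholders.
cat > samples.csv <<'EOF'
file_name,sample,group
wt_rep1.fastq.gz,WT_rep1,WT
wt_rep2.fastq.gz,WT_rep2,WT
wt_rep3.fastq.gz,WT_rep3,WT
mut_rep1.fastq.gz,mut_rep1,mutant
mut_rep2.fastq.gz,mut_rep2,mutant
mut_rep3.fastq.gz,mut_rep3,mutant
EOF

# Replicates per group (column 3), skipping the header line.
awk -F, 'NR > 1 { n[$3]++ } END { for (g in n) print g, n[g] }' samples.csv

# Sample names (column 2) that occur more than once; this should be empty.
duplicates="$(awk -F, 'NR > 1 { print $2 }' samples.csv | sort | uniq -d)"
if [ -n "$duplicates" ]; then
  echo "duplicated sample names: $duplicates" >&2
fi
```

A copy-paste slip such as the same sample name appearing twice is easy to miss by eye, and it would break the sample-to-file mapping later in the analysis.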


From step 4 of (A), the RNA-Seq data: download the relevant data, i.e.:

reference transcriptome or genome
raw fastq files for your sample subset


From step 2 of (A), the figure: pick a simple plot that you feel you can manage to reproduce


C. Run your analysis

Put all data into one directory (make it writable by the Docker container)
Start the rnaseq Docker container

Build the kallisto index for the reference (kallisto index, section 6.4.1)
Write a for loop that... (section 6.4.1)

...trims the fastq files (trimmomatic)
...runs fastqc before and after trimming
...maps the fastq reads against the reference (kallisto quant)


Start your RStudio docker
Write an R script to

import the kallisto results (requires library(sleuth))
analyse the differential gene expression via sleuth

plot the results using ggplot2
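The shell part of (C), i.e. building the index and looping over the samples, could be sketched as follows. This is not the code from section 6.4.1. Everything specific is an assumption made for illustration: the file names (taken from the example sample sheet), single-end reads, the trimming settings, the fragment length and standard deviation passed to kallisto, the number of bootstraps, and a trimmomatic wrapper command being on the PATH (some installations need java -jar with the path to the jar file instead). By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them
REF="reference.fasta"      # hypothetical reference transcriptome
INDEX="reference.idx"      # hypothetical name for the kallisto index

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run mkdir -p fastqc_raw fastqc_trimmed trimmed quant
run kallisto index -i "$INDEX" "$REF"

# Sample names are the placeholders from the example sample sheet.
for sample in wt_rep1 wt_rep2 wt_rep3 mut_rep1 mut_rep2 mut_rep3; do
  raw="${sample}.fastq.gz"
  trimmed="trimmed/${sample}.trimmed.fastq.gz"

  # Quality report before trimming.
  run fastqc -o fastqc_raw "$raw"
  # Single-end trimming; the settings are illustrative, not a recommendation.
  run trimmomatic SE -phred33 "$raw" "$trimmed" SLIDINGWINDOW:4:20 MINLEN:36
  # Quality report after trimming.
  run fastqc -o fastqc_trimmed "$trimmed"
  # Single-end mode needs an estimated fragment length (-l) and sd (-s);
  # -b requests bootstrap samples.
  run kallisto quant -i "$INDEX" -o "quant/${sample}" -b 100 \
    --single -l 200 -s 20 "$trimmed"
done
```

For paired-end data, the trimmomatic and kallisto quant calls take two read files per sample and the --single, -l and -s options are dropped. Printing the commands first is a cheap way to check file names and paths before committing to a long run.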