Authored by Dominik Brilhaus
Notepad for the workshop
This is part of the notes collected during the workshop.
RStudio Keyboard Shortcuts
- Execute line / highlighted block of code: Ctrl (Strg) + Enter
- Duplicate line / highlighted block: Ctrl (Strg) + Shift + D
- Delete line / highlighted block: Ctrl (Strg) + D
- Interrupt R: Esc
- (Un)comment line / highlighted block: Ctrl (Strg) + Shift + C
Notes on code
- You can start an interpreter from the terminal / command line:
- just type python, try some Python commands, then run quit() to exit back to the terminal.
- just type R, try some R commands, then run q() to exit back to the terminal.
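- interpreter ~ environment ~ programming language (loosely: the interpreter is the environment in which code written in that language is executed)
- Most (not all) of them come pre-installed on your machine
- There are a gazillion other languages and interpreters (Perl, Julia, F#, ...)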
- You can write a script (in a simple text editor or an IDE), save it and execute it from the command line:
bash <nameOfScript>.sh
Rscript <nameOfScript>.R
python <nameOfScript>.py
- File extensions
- are more for humans than for the machine
- for the machine, an extension mainly determines which default software opens a file type
- the file <-> software association is not "fixed"
- you can use any extension and the script still works when you call its interpreter explicitly (try bash <nameOfScript>.randomExtension)
- IDEs (Integrated Development Environments)
- Multi-purpose: Visual Studio Code (+ extensions)
- Good for R: RStudio
- Good for Python: PyCharm
- Specify the interpreter in the first line (the "shebang" line) so the script can be executed directly from the terminal, e.g.
- bash:
#!/bin/bash
- python:
#!/usr/bin/env python3
- R:
#!/usr/bin/env Rscript
- Once you make the script executable (chmod +x <script>.sh), you can run it directly, i.e. ./<script>.sh instead of bash ./<script>.sh
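The notes above on interpreters, file extensions and shebang lines can be tried out directly. The following is a minimal Python sketch, not part of the workshop material: parse_shebang, make_executable and run_with_interpreter are hypothetical helper names, and the shebang split only approximates what Linux does. It writes a tiny script with a made-up .randomExtension, runs it by handing it explicitly to the current Python interpreter (so the extension is irrelevant), reads its first line, and sets the execute bits much like chmod +x.

```python
import os
import stat
import subprocess
import sys
import tempfile

# A tiny Python script with a shebang line; saved below under an odd extension.
SCRIPT = '#!/usr/bin/env python3\nprint("hello from a .randomExtension file")\n'


def parse_shebang(first_line):
    """Return (interpreter, optional_argument) for a '#!' line, or None.

    Approximates Linux: after '#!' (leading blanks allowed) the first word is
    the interpreter path; any remainder is passed on as ONE single argument.
    """
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split(maxsplit=1)
    if not parts:
        return None
    return parts[0], (parts[1] if len(parts) == 2 else None)


def make_executable(path):
    """Add execute bits for user, group and others (like `chmod a+x`;
    a plain `chmod +x` additionally respects your umask)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def run_with_interpreter(path):
    """Run a script by passing it explicitly to the current Python interpreter,
    the equivalent of `python <nameOfScript>` on the command line."""
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, check=True
    )
    return result.stdout


def main():
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.randomExtension")
        with open(script, "w") as handle:
            handle.write(SCRIPT)
        # The extension is ignored because the interpreter is named explicitly.
        print(run_with_interpreter(script), end="")
        # The shebang is only consulted when the file is run directly (./file).
        with open(script) as handle:
            print(parse_shebang(handle.readline()))
        make_executable(script)
        print(bool(os.stat(script).st_mode & stat.S_IXUSR))


if __name__ == "__main__":
    main()
```

The same experiment works with bash or Rscript in place of Python: the interpreter you name on the command line decides how the file is read, not its extension.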
Challenge exercise Day 3
The pipeline we showed you in class was designed to work (mostly smoothly) with exactly the data, structure and parameters we provided. Now let's take this to the next level, i.e. transfer it to a real-life challenge, by reproducing results from the RNA-Seq data of a published paper.
Tips:
- Along this adventure, you'll probably run into other important topics concerning good scientific practice (or bad examples of it). So don't be discouraged if it's harder than it should be.
- Consider this a challenge somewhere between peer review, data reproducibility, positive controls (also for yourself: is your pipeline correct?) and FAIR data management.
- To make life easier, don't just pick the first paper you come across; rather, search for one where you can answer the questions in (A) fairly easily.
A. Find data
1. Find a paper from your research area of interest that used mRNA sequencing.
2. Within that paper, find a figure that plots gene expression / transcript abundance of any kind, e.g.
- bar / dot plots of gene expression
- heat maps
- ...
3. Identify the experimental design, e.g.
- What species was/were sequenced?
- How many replicates?
- Controls?
- Different genotypes, ecotypes, treatments, other conditions, ...
4. What RNA-Seq data was produced or re-used for the analysis?
- What reference was used?
- Transcriptome? Genome?
- Version?
- Can you find and access (i.e. download) it?
- What reads were produced (i.e. *.fastq files)?
- Sequencer?
- Read length?
- Paired or single end?
- Are the reads trimmed / filtered?
- Can you find and access (i.e. download) them?
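Some of the read-related questions above (read length, trimmed or not) can be checked on the downloaded FASTQ files themselves. A minimal Python sketch, assuming standard 4-line FASTQ records (optionally gzip-compressed); read_length_summary is a hypothetical helper name. fastqc, used later in the pipeline, reports this and much more.

```python
import gzip
from collections import Counter


def read_length_summary(path):
    """Count reads and tally read lengths in a FASTQ file (.gz or plain).

    Assumes the standard 4-line record: header, sequence, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # 2nd line of every record is the sequence
                lengths[len(line.rstrip("\r\n"))] += 1
    return {
        "reads": sum(lengths.values()),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "length_counts": dict(sorted(lengths.items())),
    }
```

For example, read_length_summary("wt_rep1.fastq.gz") (hypothetical file name) returns the read count and the length range. A single length across all reads suggests untrimmed reads, while a spread of lengths hints at (but does not prove) trimming; for paired-end data, the two files of a pair should contain the same number of reads.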
B. Design your in silico experiment
- From step 3: pick a small sample subset
- e.g. 3 replicates wildtype and 3 replicates mutant
- Write this down into a simple spreadsheet, e.g.
file_name           sample     group
wt_rep1.fastq.gz    WT_rep1    WT
wt_rep2.fastq.gz    WT_rep2    WT
wt_rep3.fastq.gz    WT_rep3    WT
mut_rep1.fastq.gz   mut_rep1   mutant
mut_rep2.fastq.gz   mut_rep2   mutant
mut_rep3.fastq.gz   mut_rep3   mutant
- From step 4: download the relevant data, i.e.:
- reference transcriptome or genome
- raw fastq files for your sample subset
- From step 2: pick a simple plot that you feel you can manage to reproduce
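The sample spreadsheet can also be written and sanity-checked with a short script, which catches copy-paste slips such as a duplicated sample name. The sketch below is a minimal Python example; the helper names, the tab-separated output and the results/kallisto/<sample> directory layout are assumptions. The extra path column follows what sleuth's sleuth_prep() expects in its sample table: a sample column plus the path to each sample's kallisto output directory.

```python
import csv
import os
import sys

# Example subset (hypothetical file names): 3 wild-type and 3 mutant replicates.
SAMPLES = [
    ("wt_rep1.fastq.gz", "WT_rep1", "WT"),
    ("wt_rep2.fastq.gz", "WT_rep2", "WT"),
    ("wt_rep3.fastq.gz", "WT_rep3", "WT"),
    ("mut_rep1.fastq.gz", "mut_rep1", "mutant"),
    ("mut_rep2.fastq.gz", "mut_rep2", "mutant"),
    ("mut_rep3.fastq.gz", "mut_rep3", "mutant"),
]
FIELDS = ["file_name", "sample", "group", "path"]


def build_sample_sheet(rows, kallisto_dir=os.path.join("results", "kallisto")):
    """Turn (file_name, sample, group) tuples into sample-sheet rows.

    Raises ValueError on duplicated file or sample names and when fewer than
    two groups are present, since a comparison needs at least two.
    """
    seen_files, seen_samples, sheet = set(), set(), []
    for file_name, sample, group in rows:
        if file_name in seen_files:
            raise ValueError(f"duplicated file name: {file_name}")
        if sample in seen_samples:
            raise ValueError(f"duplicated sample name: {sample}")
        seen_files.add(file_name)
        seen_samples.add(sample)
        sheet.append({
            "file_name": file_name,
            "sample": sample,
            "group": group,
            # assumed layout: one kallisto output directory per sample
            "path": os.path.join(kallisto_dir, sample),
        })
    if len({row["group"] for row in sheet}) < 2:
        raise ValueError("need at least two groups to compare")
    return sheet


def write_sample_sheet(sheet, handle):
    """Write the sheet as tab-separated text to an open file handle."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(sheet)


if __name__ == "__main__":
    write_sample_sheet(build_sample_sheet(SAMPLES), sys.stdout)
```

In R, a file written this way could be read with read.table(..., header = TRUE) and passed to sleuth_prep(), with the group column serving as the covariate in the model (e.g. ~group).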
C. Run your analysis
- Put all data together into one directory (and make sure the Docker container has write access to it)
- Start the rnaseq Docker container
- Build the index for the reference with kallisto index (section 6.4.1)
- Write a for loop that... (section 6.4.1)
- ...trims the fastq files (trimmomatic)
- ...runs fastqc before and after trimming
- ...maps the fastq reads against the reference (kallisto quant)
- Start your RStudio Docker container
- Write an R script to
- import the kallisto results (requires library(sleuth))
- analyse differential gene expression via sleuth
- plot the results using ggplot (ggplot2 package)
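The for loop in part C runs the same tools on every sample. As a hedged sketch, written in Python rather than the bash loop from the course, the hypothetical helpers below (index_command, sample_commands, run_pipeline) assemble the command lines for single-end reads and, by default, only print them so you can inspect them first. Assumptions: trimmomatic is callable as trimmomatic (as with the Bioconda wrapper; otherwise java -jar with the jar file), the adapter file and trimming steps are modelled on the single-end example in the Trimmomatic manual, the output layout and file names are made up, and the fragment length (200) and standard deviation (20) that kallisto quant --single requires are placeholders to replace with values suited to your library. The R part (sleuth, ggplot2) stays in R.

```python
import os
import shlex
import subprocess

# Placeholder settings (assumptions): adapt them to your data and library prep.
FRAGMENT_LENGTH = 200  # kallisto quant --single needs an estimated mean fragment length
FRAGMENT_SD = 20       # ...and its standard deviation
BOOTSTRAPS = 100       # bootstrap rounds, which sleuth uses to model technical variance
TRIM_STEPS = [         # modelled on the single-end example in the Trimmomatic manual
    "ILLUMINACLIP:TruSeq3-SE.fa:2:30:10",
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
]
SUBDIRS = ["fastqc_raw", "trimmed", "fastqc_trimmed", "kallisto"]


def index_command(transcriptome_fasta, index_path):
    """kallisto index: build the index for the reference transcriptome."""
    return ["kallisto", "index", "-i", index_path, transcriptome_fasta]


def sample_commands(fastq, sample, index_path, out_dir):
    """Commands for one single-end sample: QC, trim, QC again, quantify."""
    trimmed = os.path.join(out_dir, "trimmed", f"{sample}.trimmed.fastq.gz")
    return [
        ["fastqc", "-o", os.path.join(out_dir, "fastqc_raw"), fastq],
        ["trimmomatic", "SE", "-phred33", fastq, trimmed, *TRIM_STEPS],
        ["fastqc", "-o", os.path.join(out_dir, "fastqc_trimmed"), trimmed],
        ["kallisto", "quant", "-i", index_path,
         "-o", os.path.join(out_dir, "kallisto", sample),
         "-b", str(BOOTSTRAPS), "--single",
         "-l", str(FRAGMENT_LENGTH), "-s", str(FRAGMENT_SD), trimmed],
    ]


def run_pipeline(samples, transcriptome_fasta, index_path,
                 out_dir="results", dry_run=True):
    """Print (and, unless dry_run, execute) all commands in order."""
    commands = [index_command(transcriptome_fasta, index_path)]
    for fastq, sample in samples:  # the "for loop" over all samples
        commands.extend(sample_commands(fastq, sample, index_path, out_dir))
    if not dry_run:
        for sub in SUBDIRS:  # fastqc and trimmomatic expect existing folders
            os.makedirs(os.path.join(out_dir, sub), exist_ok=True)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    run_pipeline([("wt_rep1.fastq.gz", "WT_rep1")], "transcripts.fa.gz", "transcripts.idx")
```

Calling run_pipeline(..., dry_run=False) inside the rnaseq container would execute the commands. For paired-end reads you would instead use trimmomatic PE (two inputs, four outputs) and pass both FASTQ files of a pair to kallisto quant without --single.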