From 789d48deb0caa9f63c369fc97ba4eb9cba05a110 Mon Sep 17 00:00:00 2001
From: alisandra <alisandra.denton@hhu.de>
Date: Wed, 31 Aug 2022 15:29:02 +0200
Subject: [PATCH] adding draft take home instructions

---
 .gitignore                            |   3 +
 README.md                             |   4 +
 TakeHome.md                           |  92 +++++++++++++++++
 workflows/singularity/Singularity.def | 143 ++++++++++++++++++++++++++
 workflows/singularity/notes.md        |  38 +++++++
 5 files changed, 280 insertions(+)
 create mode 100644 TakeHome.md
 create mode 100644 workflows/singularity/Singularity.def
 create mode 100644 workflows/singularity/notes.md

diff --git a/.gitignore b/.gitignore
index 916af75..0a3830b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -389,3 +389,6 @@ runs/isoseq/polished/
 *.aux
 RNAseqWorkshop.out
 RNAseqWorkshop.toc
+
+# singularity
+*.sif
diff --git a/README.md b/README.md
index 35a2866..900fa64 100644
--- a/README.md
+++ b/README.md
@@ -45,3 +45,7 @@ If you want to build the docker images from scratch (rather than downloading fro
 - [workflows/maindocker](workflows/maindocker)
 - [workflows/rstudiodocker](workflows/rstudiodocker)
 - [workflows/userdocker](workflows/userdocker)
+
+### Take home
+
+Find three options for using the images from the course on your own computer [here](TakeHome.md).
diff --git a/TakeHome.md b/TakeHome.md
new file mode 100644
index 0000000..d4ba188
--- /dev/null
+++ b/TakeHome.md
@@ -0,0 +1,92 @@
+# Take Home
+
+You can access the images used in the course via
+Sciebo: https://uni-duesseldorf.sciebo.de/s/53pA9W9TbOKTgGQ
+
+The password is provided via e-mail.
+
+All image files referred to below can be found on Sciebo.
+
+We provide three options in order of recommendation,
+but take what works best for you.
+
+### Where's RStudio?
+The RStudio Docker image is not mentioned below, as
+R & RStudio are probably easier for you to install directly.
+See:
+
+https://www.r-project.org/
+
+https://www.rstudio.com/products/rstudio/download/
+
+### Disclaimer
+> These instructions have not actually been tested on a large
+> variety of machines. The good thing is that most issues will
+> probably turn out to be generic Docker and/or Singularity issues that
+> are very google-able. So the first thing to do is always to search
+> for the error message; this will be the fastest help.
+> If it is not _enough_ help, you are welcome to contact us
+> either via e-mail or, better yet, by opening an issue in this
+> repository :-)
+
+## Docker - with the user matching inside and outside the container
+i.e. a better solution for permissions, which didn't work during
+the course for reasons that are hard to summarize here and probably
+not relevant on your machine.
+
+
+1. Install Docker as appropriate for your machine: https://www.docker.com/
+2. Download the pre-built course image `rnaseq_docker.tar.gz` from Sciebo
+3. Load into Docker: `docker image load -i rnaseq_docker.tar.gz`
+4. Build a slight modification of the image where the users match
+   - navigate to the directory `workflows/userdocker`, e.g. by using `cd` on Linux
+   - run `docker build --build-arg USERID=$(id -u) -t rnaseqme --rm .`; this should take only a few seconds and ~3 GB of hard drive space.
+5. Run for the first time!
+   - for instance, from the parent directory of your downloaded `rnaseq-workshop`, run
+     `docker run -it --name rnalive --mount type=bind,source="$(pwd)"/rnaseq-workshop,target=/home/zim-gast/rnaseq-workshop rnaseqme:latest`.
+   - for your own data / files
+     - make sure they can all be found (directly or, better yet, nested) within one directory
+     - change the `"$(pwd)"/rnaseq-workshop` of the command above to point to the directory with _your_ files
+   - if something goes wrong and you have to, e.g., try mounting again, you can remove the container with
+     `docker container rm rnalive` so that the `docker run` command can be repeated.
+   - if you want to have _multiple_ containers, e.g. to point to separate ARCs for
+     separate projects, you can also replace `rnalive` in the commands above with a descriptive name for _your_ project.
+6. Resume with `docker start -i rnalive`
+
+> Note: if you did not configure a docker group during install, you may need
+> to preface all commands above with `sudo`.
+
+## Singularity
+A container option targeted more at convenience and less at security
+(it was not available on the host machines during the course).
+
+1. Install Singularity as appropriate for your machine (via a VM on Windows or Mac): https://docs.sylabs.io/guides/3.0/user-guide/installation.html
+2. Download the pre-built course image `rnaseq-workshop.sif` from Sciebo
+3. Run!
+   - `singularity run rnaseq-workshop.sif`; all files within your home directory should be accessible automatically
+   - do you need files from some other directory? You can add it with `--bind`: `singularity run --bind <your_directory>:/mnt/ rnaseq-workshop.sif` will
+     make the files available under `/mnt/` in the container. For instance, if I were working on the HHU HPC, I might want to run
+     `singularity run --bind /gpfs/project/alden101/projectA:/mnt/ rnaseq-workshop.sif` to mount my folder 'projectA', which sits in my large storage folder on the HPC.
+4. Resuming is exactly the same as running.
+
+
+## Docker - with permissive permissions
+This is what we used during the course, and no, it is still not good
+practice. But we're including it for completeness; after all, it's familiar.
+And realistically, if you're on your own machine where no one else (or
+only trusted colleagues) has an account, and considering that we're hardly using
+Docker to run a public-facing web server, the risks could be worse.
+
+> Still, do it this way at your own risk. It doesn't take a malicious actor
+> to accidentally delete important files. Have backups (actually, always have backups
+> anyway, completely regardless of Docker).
+
+1. Install Docker as appropriate for your machine: https://www.docker.com/
+2. Download the pre-built course image `rnaseq_docker.tar.gz` from Sciebo
+3. Load into Docker: `docker image load -i rnaseq_docker.tar.gz`
+4. Set permissions on your data directory (here `rnaseq-workshop`) to be
+   and stay permissive: `setfacl TODO # can just use chmod 777 if you want to test right now`
+5. Run `docker run -it --name rnalive --mount type=bind,source="$(pwd)"/rnaseq-workshop,target=/home/zim-gast/rnaseq-workshop rnaseq:latest`
+6. All info on adjusting directories, retrying, resuming, and possibly needing `sudo` is the same as for the other Docker option above.
+
+
diff --git a/workflows/singularity/Singularity.def b/workflows/singularity/Singularity.def
new file mode 100644
index 0000000..50e9ab0
--- /dev/null
+++ b/workflows/singularity/Singularity.def
@@ -0,0 +1,143 @@
+Bootstrap: docker
+From: ubuntu:latest
+Stage: spython-base
+
+%files
+python_installs.sh ./
+./first.sh /opt/
+%post
+#FROM nvidia/cuda:11.2.0-cudnn8-runtime-ubuntu20.04
+
+# Override user name at build; if the build-arg is not passed, a user named `default` will be created
+export DOCKER_USER=zim-gast
+
+
+# Create a group and user
+adduser $DOCKER_USER --no-create-home
+mkdir /opt/$DOCKER_USER
+# mv because $DOCKER_USER did not exist yet at file copy
+mv /opt/first.sh /opt/$DOCKER_USER/
+
+#RUN useradd --create-home --shell /bin/bash zim-gast
+apt-get update -y
+apt install python3-dev \
+python3-pip \
+git \
+libhdf5-dev \
+curl \
+wget \
+nano vim emacs -y
+apt-get autoremove -y
+
+export DEBIAN_FRONTEND=noninteractive
+export TZ=Europe/Berlin
+
+apt install tzdata libncurses5-dev zlib1g-dev libbz2-dev liblzma-dev cmake jellyfish python-tk libcurl4-openssl-dev libgit2-dev libssl-dev -y
+
+mkdir /opt/$DOCKER_USER/repos && \
+cd /opt/$DOCKER_USER/repos && \
+git clone https://github.com/alisandra/RNAseq_workshop_helpers.git && \
+mkdir /opt/$DOCKER_USER/bin && \
+find RNAseq_workshop_helpers . -maxdepth 2 -type f -executable|xargs -I% cp % /opt/$DOCKER_USER/bin/
+
+
+cd /opt/$DOCKER_USER/bin
+wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faToTwoBit && chmod +x faToTwoBit && \
+wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/blat && chmod +x blat
+
+
+# --- classic bioinf --- #
+cd /opt/$DOCKER_USER/
+apt install hisat2 \
+bowtie2 \
+augustus \
+gffread \
+fastqc \
+salmon \
+samtools \
+minimap2 \
+mash \
+cd-hit tar bzip2 \
+libhdf5-dev m4 -y
+# last ones are for kallisto
+
+# --- used to be conda, now binaries... --- #
+
+# for virtualenv intro
+pip install HTSeq virtualenv
+wget https://anaconda.org/bioconda/isoseq3/3.7.0/download/linux-64/isoseq3-3.7.0-h9ee0642_0.tar.bz2 && \
+tar xvf isoseq3-3.7.0-h9ee0642_0.tar.bz2 && \
+wget https://anaconda.org/bioconda/lima/2.6.0/download/linux-64/lima-2.6.0-h9ee0642_0.tar.bz2 && \
+tar xvf lima-2.6.0-h9ee0642_0.tar.bz2 && \
+wget https://anaconda.org/bioconda/pbccs/6.4.0/download/linux-64/pbccs-6.4.0-h9ee0642_0.tar.bz2 && \
+tar xvf pbccs-6.4.0-h9ee0642_0.tar.bz2 && \
+wget https://anaconda.org/bioconda/bax2bam/0.0.11/download/linux-64/bax2bam-0.0.11-0.tar.bz2 && \
+tar xvf bax2bam-0.0.11-0.tar.bz2
+
+# kallisto
+cd /opt/$DOCKER_USER/repos && \
+curl -O -L http://ftpmirror.gnu.org/autoconf/autoconf-2.69.tar.gz && \
+tar -xzf autoconf-2.69.tar.gz && cd /opt/$DOCKER_USER/repos/autoconf-2.69 && \
+./configure && make && make install && cd /opt/$DOCKER_USER/repos && \
+git clone https://github.com/pachterlab/kallisto.git && \
+mkdir kallisto/build && \
+cd /opt/$DOCKER_USER/repos/kallisto/build && \
+cmake -DCMAKE_INSTALL_PREFIX=/opt/$DOCKER_USER/ -DUSE_HDF5=ON .. && make && make install
+# python
+./python_installs.sh && rm python_installs.sh
+
+
+# jars
+mkdir /opt/$DOCKER_USER/sw && \
+cd /opt/$DOCKER_USER/sw && \
+wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip && \
+apt install unzip -y && \
+unzip Trimmomatic-0.39.zip && \
+rm Trimmomatic-0.39.zip
+
+# cleanup
+cd /opt/$DOCKER_USER/
+rm *.bz2 && rm -r info
+
+
+# shared folder
+# rnaseq-workshop folder
+wget https://github.com/git-lfs/git-lfs/releases/download/v3.2.0/git-lfs-linux-amd64-v3.2.0.tar.gz && \
+mv git-lfs-linux-amd64-v3.2.0.tar.gz sw/ && \
+cd /opt/$DOCKER_USER/sw/ && \
+tar xvf git-lfs-linux-amd64-v3.2.0.tar.gz && \
+cd /opt/$DOCKER_USER/sw/git-lfs-3.2.0/ && \
+./install.sh && \
+rm ../git-lfs-linux-amd64-v3.2.0.tar.gz
+cd /opt/$DOCKER_USER/
+
+#RUN git clone https://git.nfdi4plants.org/brilator/rnaseq-workshop.git
+
+mkdir /opt/$DOCKER_USER/rnaseq-workshop
+
+apt install gmap -y
+rm -rf /var/lib/apt/lists/*
+
+# EXPOSE 8889
+
+chown $DOCKER_USER:$DOCKER_USER /opt/$DOCKER_USER/first.sh
+
+cd /opt/$DOCKER_USER/repos/alisandra/cDNA_Cupcake && \
+pip install .
+
+su - $DOCKER_USER # USER $DOCKER_USER
+
+git lfs install
+echo "alias gmap='/usr/bin/gmap'" >> .bashrc
+
+
+%environment
+DOCKER_USER=zim-gast
+export TZ=Europe/Berlin
+export PATH=/opt/$DOCKER_USER/.local/bin:${PATH}
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/hdf5/serial/lib
+export PATH="/opt/$DOCKER_USER/bin:${PATH}"
+%runscript
+exec /bin/bash "$@"
+%startscript
+exec /bin/bash "$@"
diff --git a/workflows/singularity/notes.md b/workflows/singularity/notes.md
new file mode 100644
index 0000000..402b30e
--- /dev/null
+++ b/workflows/singularity/notes.md
@@ -0,0 +1,38 @@
+# Singularity from Docker
+
+While there are probably better ways, a 1:1 generation of the
+Singularity image from Docker would not have entirely made sense.
+So instead, a Singularity build file was autogenerated from the Dockerfile,
+changes were applied, and a Singularity image was subsequently built.
+
+## autogen
+The file Singularity.def was initially created automatically from 'maindocker/Dockerfile' via
+spython, according to info found here: https://stackoverflow.com/questions/60314664/how-to-build-singularity-container-from-dockerfile
+in the answer from Serge.
+
+Briefly, in a virtual environment:
+
+```bash
+pip install spython
+cd </path/to/maindocker>
+spython recipe Dockerfile > ../singularity/Singularity.def
+```
+## tailor for singularity (and to actually build successfully)
+Substantial modifications were necessary, e.g.
+making the variables TZ, DOCKER\_USER, and DEBIAN\_FRONTEND available during
+and after the build as necessary (moving them into the right `%` block and
+adding `export`).
+
+Changing '/home' to '/opt', as Singularity mounts the home directory from the
+host, and having container content there can lead to trouble.
+
+Changing the entry command to simply `/bin/bash`.
+
+Removing recipe `cd` changes so that the default working directory is $HOME.
+
+Rather than an exact listing, please simply see the file Singularity.def,
+included here with the modifications. Run a diff against the automatically
+generated one, if useful.
+
+## build
+`sudo singularity build rnaseq-workshop.sif Singularity.def`
-- 
GitLab
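For the first Docker option in TakeHome.md, steps 5 and 6 (create the container once, then resume it) can be wrapped in a small helper. The following is a minimal bash sketch and is not part of this patch. The file name `rnalive.sh` and the functions `mount_spec` and `start_or_run` are hypothetical. It assumes the user-matched `rnaseqme:latest` image and the mount target `/home/zim-gast/rnaseq-workshop` used in those instructions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the course repository) wrapping steps 5
# and 6 of the user-matched Docker option in TakeHome.md.
# Assumes the image name and mount target used there.

# Print a --mount spec binding DATA_DIR to the course path in the container.
# Usage: mount_spec <data_dir>
mount_spec() {
    local data_dir
    # Resolve to an absolute path (bind mounts need one); fail if missing.
    data_dir="$(cd -- "$1" >/dev/null && pwd)" || return 1
    printf 'type=bind,source=%s,target=/home/zim-gast/rnaseq-workshop\n' "$data_dir"
}

# Resume the named container if it exists; otherwise create and start it.
# Usage: start_or_run <container_name> <data_dir> [image]
start_or_run() {
    local name="$1" data_dir="$2" image="${3:-rnaseqme:latest}"
    if docker container inspect "$name" >/dev/null 2>&1; then
        docker start -i "$name"
    else
        local spec
        spec="$(mount_spec "$data_dir")" || return 1
        docker run -it --name "$name" --mount "$spec" "$image"
    fi
}
```

Usage, e.g. from the parent directory of `rnaseq-workshop`: `source rnalive.sh && start_or_run rnalive ./rnaseq-workshop`. Checking `docker container inspect` first avoids the name-conflict error that repeating `docker run --name rnalive` would otherwise produce once the container exists.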