diff --git a/_reader/README.md b/_reader/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..bb75081700801e1dc46338be303721ffe2b7b3a9
--- /dev/null
+++ b/_reader/README.md
@@ -0,0 +1,33 @@
+# RNAseq_workshop
+
+Protocol and other material for RNAseq workshop (Illumina + some long read).
+
+## why latex
+we now have both a) syntax highlighting and b) the ability to copy
+paste from code blocks and get valid commands.
+
+I'd give them a printed version only until they are through the linux section, 
+and after that they can have the pdf as well. 
+
+## to make the pdf (needs texlive, and probably texlive-extra-utils)
+pdflatex RNAseqWorkshop.tex
+
+### support files
+
+RNAseqWorkshop.tex imports everything in sections. So for instance,
+all of the content of the biological data extraction section is 
+organized in "sections/03_biological_data_extraction.tex". 
+
+As this section had
+a lot of code, it currently imbeds scripts from "R/". But that
+can definitely be changed if desired.
+
+## support scripts for the course
+The course uses a few scripts from us that should be put in the 
+$PATH for the .py scripts under "python/".
+
+The exception to this is the join_r_scripts.py in the main directory,
+which just creates "musings.R", which should be provided in the RNAdata
+directory the way the script is written (although since we can now
+copy paste from the pdf when latex/listings is used, 
+we have the theoretical option to drop this).
diff --git a/_reader/RNAseqWorkshop.pdf b/_reader/RNAseqWorkshop.pdf
index 544750320fdcf56222147c804cce8f585612ee67..fc61e85050fa86d9901987a8072d30b921d723df 100644
Binary files a/_reader/RNAseqWorkshop.pdf and b/_reader/RNAseqWorkshop.pdf differ
diff --git a/_reader/RNAseqWorkshop.tex b/_reader/RNAseqWorkshop.tex
new file mode 100644
index 0000000000000000000000000000000000000000..7540d79bc0d35501a7d034d02b6d9908f2035011
--- /dev/null
+++ b/_reader/RNAseqWorkshop.tex
@@ -0,0 +1,173 @@
+\documentclass[]{article}
+\usepackage{amssymb,amsmath}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}%required
+\usepackage{textcomp}
+\usepackage{listings} % syntax highlighting
+% \usepackage{markdown}
+\usepackage{color}
+\usepackage{pxfonts}
+\usepackage{graphicx}
+\usepackage{wrapfig} % wrapping text around figures
+\graphicspath{{figures/}}
+\usepackage{booktabs} % tables
+\usepackage{hyperref} % for links
+% title page formatting
+%\usepackage{fancyhdr}
+\usepackage[bottom]{footmisc} % foot note
+%\usepackage{color} %Colorbox?
+% xcolor-hypersetup for making links not ugly
+\usepackage{xcolor}
+\hypersetup{ 
+    colorlinks,
+    linkcolor={red!50!black},
+    citecolor={blue!50!black},
+    urlcolor={blue!80!black}
+}
+\usepackage[parfill]{parskip} % spaces, not indentation
+\usepackage{geometry} % normal sized margins
+ \geometry{
+ a4paper,
+ total={160mm,237mm},
+ left=25mm,
+ top=35mm,
+ }
+
+
+% most from: http://latexcolor.com/
+\definecolor{mygray}{gray}{0.4}
+\definecolor{lightgray}{gray}{0.9}
+\definecolor{aliceblue}{rgb}{0.94, 0.97, 1.0}
+\definecolor{bananamania}{rgb}{0.98, 0.91, 0.71}
+\definecolor{cerulean}{rgb}{0.0, 0.48, 0.65}
+\definecolor{seashell}{rgb}{1.0, 0.96, 0.93}
+
+\lstdefinestyle{basestyle}{
+                showstringspaces=false,
+                columns=fullflexible,
+                literate={*}{\char42}1,% {-}{\char45}1,
+                upquote=true,
+                %literate={`}{\char18}1 {'}{\char13}1,  % \char18 = ` and \char13 = '
+                frame=
+}
+
+% setting up style sets
+\lstdefinestyle{bashstyle}{language=bash,
+                style=basestyle,
+                commentstyle=\color{mygray},
+                basicstyle=\ttfamily,
+                backgroundcolor=\color{lightgray},
+                keywordstyle=\color{cerulean}\bfseries,
+                stringstyle=\color{magenta}
+}
+ 
+
+\lstdefinestyle{Rstyle}{language=R,
+                style=basestyle,
+                commentstyle=\color{mygray},
+                basicstyle=\ttfamily,
+                backgroundcolor=\color{aliceblue},
+                keywordstyle=\bfseries,
+                stringstyle=\color{magenta},
+}
+\newcommand{\il}[1]{\colorbox{seashell}{\lstinline[columns=fixed]{#1}}}
+
+% title page prep
+\newcommand{\HRule}[1]{\rule{\linewidth}{#1}}
+
+\setcounter{tocdepth}{2} % table of content depth to just subsection
+%-------------------------------------------------------------------------------
+% HEADER & FOOTER
+%-------------------------------------------------------------------------------
+%\pagestyle{fancy}
+%\fancyhf{}
+%\setlength\headheight{15pt}
+%\fancyhead[L]{RNAseq Workshop}
+%\fancyfoot[R]{Page \thepage\ of \pageref{LastPage}}
+
+%\title{RNAseq Workshop 2022}
+
+\title{ \normalsize \textsc{Heinrich Heine University, Düsseldorf}
+                \\ [1.0cm]
+                \textsc{Cluster of Excellence on Plant Sciences}
+		\\ [4.0cm]
+		\HRule{0.5pt} \\
+		\LARGE \textbf{RNAseq Workshop 2022}
+		\HRule{2pt} \\ [0.5cm]
+		\normalsize \today \vspace*{5\baselineskip}
+                \\ [1.0cm]
+}
+
+\author{Alisandra Denton \\ Dominik Brilhaus}
+\date{}
+
+\begin{document}
+\maketitle
+
+\newpage
+
+\tableofcontents
+\section{License}
+
+This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. To view a
+copy of this license, visit 
+
+\url{https://creativecommons.org/licenses/by-nc-sa/4.0/}
+
+In non-legal terminology: You are welcome to use, distribute,
+and modify this material for non- commercial purposes.
+
+\section{Acknowledgements}
+The authors would like to thank Juliane Schmid and the CEPLAS team
+for organizing.  
+
+This course builds directly or indirectly on a series of workshops that
+we've been involved with for over a decade. Particularly the 2018 iteration
+under the same name. Even though we do not have a list of contributors
+to the various workshops, we'd like to thank all of you all the same, 
+and acknowledge this wasn't put together in a vacuum.
+
+\section{Preface}
+
+Here we want to jump start your big data and RNAseq analyses. We dive
+head first into doing this on the command line in linux (via Docker).
+
+While there maybe more "user-friendly" options out there to complete
+the same tasks (google "galaxy rnaseq" if you're interested!), and there
+are certainly more precise and detailed deeper levels (e.g. algorithms
+and source code), we choose the command line for several reasons:
+
+\begin{itemize}
+\item it is customizable and flexible
+\item we work at a good level to gain a conceptual understanding 
+\item you might as well try the command line in a setting where you can
+   ask for help, you can always come back to galaxy \& co. later.
+\end{itemize}
+
+At this point, we will not strive for the most elegant, nor the most
+efficient code; but rather for simplicity and understandability for
+beginners. We may write ugly, redundant, copy-paste code; but it will
+get the job done and allow us to analyze our own data following a more
+or less standardized pipeline.
+
+There are many good resources out there, in so far as you find it 
+useful we recommend using some of them, such as \url{https://linuxsurvival.com/},
+to help cement what you have learned in the course. 
+We also provide a set of handouts that may be useful to you during
+or after the workshop in the ARC under \il{_handouts/}
+
+The Docker setup should also allow you to take home and practice
+the material anywhere you can get docker installed: \url{https://www.docker.com/}
+
+\include{sections/00_section_arc}
+\include{sections/01_section_linux}
+\include{sections/02_section_illumina}
+\include{sections/03_biological_data_extraction}
+\include{sections/04_section_longreads}
+
+\section{Questions and answers}
+
+\paragraph{Congratulations, you've made it through the workshop material!}
+\paragraph{Do you have questions on what we've covered?}
+\paragraph{Do you have questions on transferring the information here to your own data?}
+\end{document}
diff --git a/_reader/figures/align_n_quant.pdf b/_reader/figures/align_n_quant.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..56ddf660d370ae600e1dc0e601f7c30a22d8a777
Binary files /dev/null and b/_reader/figures/align_n_quant.pdf differ
diff --git a/_reader/figures/align_n_quant.svg b/_reader/figures/align_n_quant.svg
new file mode 100644
index 0000000000000000000000000000000000000000..fa35e0bc90cafaf04adc519170effc5fc4bbf1fa
--- /dev/null
+++ b/_reader/figures/align_n_quant.svg
@@ -0,0 +1,692 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/"
+   xmlns:cc="http://creativecommons.org/ns#"
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+   width="187.95236mm"
+   height="132.43103mm"
+   viewBox="0 0 665.97293 469.2438"
+   id="svg2"
+   version="1.1"
+   inkscape:version="0.91 r13725"
+   sodipodi:docname="align_n_quant.svg">
+  <defs
+     id="defs4">
+    <marker
+       inkscape:isstock="true"
+       style="overflow:visible"
+       id="marker5238"
+       refX="0"
+       refY="0"
+       orient="auto"
+       inkscape:stockid="Arrow1Mend">
+      <path
+         transform="matrix(-0.4,0,0,-0.4,-4,0)"
+         style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff0000;stroke-width:1pt;stroke-opacity:1"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         id="path5240"
+         inkscape:connector-curvature="0" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mend"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mend"
+       style="overflow:visible"
+       inkscape:isstock="true"
+       inkscape:collect="always">
+      <path
+         id="path4563"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.4,0,0,-0.4,-4,0)"
+         inkscape:connector-curvature="0" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Mstart"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Mstart"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         id="path4560"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(0.4,0,0,0.4,4,0)"
+         inkscape:connector-curvature="0" />
+    </marker>
+  </defs>
+  <sodipodi:namedview
+     id="base"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="1.0"
+     inkscape:pageopacity="0.0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="0.98994949"
+     inkscape:cx="323.02001"
+     inkscape:cy="160.97131"
+     inkscape:document-units="px"
+     inkscape:current-layer="layer1"
+     showgrid="false"
+     showguides="true"
+     inkscape:guide-bbox="true"
+     fit-margin-top="1"
+     fit-margin-left="1"
+     fit-margin-right="1"
+     fit-margin-bottom="1"
+     inkscape:window-width="1855"
+     inkscape:window-height="1176"
+     inkscape:window-x="1985"
+     inkscape:window-y="24"
+     inkscape:window-maximized="1">
+    <sodipodi:guide
+       position="116.46858,220.82034"
+       orientation="1,0"
+       id="guide4158" />
+    <sodipodi:guide
+       position="3.0001897,572.49187"
+       orientation="1,0"
+       id="guide4166" />
+    <sodipodi:guide
+       position="216.47368,262.2366"
+       orientation="1,0"
+       id="guide4176" />
+    <sodipodi:guide
+       position="269.68046,206.81666"
+       orientation="1,0"
+       id="guide4182" />
+    <sodipodi:guide
+       position="584.84803,464.40556"
+       orientation="0,1"
+       id="guide5030" />
+    <sodipodi:guide
+       position="431.30484,3.7759992"
+       orientation="0,1"
+       id="guide5032" />
+  </sodipodi:namedview>
+  <metadata
+     id="metadata7">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+        <dc:title></dc:title>
+      </cc:Work>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="Layer 1"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(173.03712,-119.0553)">
+    <ellipse
+       style="fill:#4d4d4d;fill-opacity:0.2871287;stroke:none;stroke-width:5;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="path4142"
+       cx="-122.03661"
+       cy="382.77521"
+       rx="47.457199"
+       ry="43.43401" />
+    <rect
+       style="fill:#b3b3b3;fill-opacity:1;stroke:none;stroke-width:5;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4136"
+       width="153.50325"
+       height="425.14413"
+       x="-44.451809"
+       y="159.37892" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="35.304344"
+       y="134.7782"
+       id="text4152"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4154"
+         x="37.692039"
+         y="134.7782"
+         style="text-align:center;text-anchor:middle">Reference sequence </tspan><tspan
+         sodipodi:role="line"
+         x="35.304344"
+         y="153.5282"
+         id="tspan4156"
+         style="text-align:center;text-anchor:middle">used</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-120.38283"
+       y="371.60571"
+       id="text4144"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4146"
+         x="-120.38283"
+         y="371.60571"
+         style="text-align:center;text-anchor:middle">Illuimina</tspan><tspan
+         sodipodi:role="line"
+         x="-120.38283"
+         y="390.35571"
+         id="tspan4150"
+         style="text-align:center;text-anchor:middle">RNAseq</tspan><tspan
+         sodipodi:role="line"
+         x="-120.38283"
+         y="409.10571"
+         id="tspan4148"
+         style="text-align:center;text-anchor:middle">reads</tspan></text>
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m -6.39223,278.58534 0,47.47717 49.82879,0 0,0 52.527932,1.01015"
+       id="path4180"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="ccccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m 96.64334,191.47225 -53.20678,0 0,77.11309 53.20678,0"
+       id="path4186"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m -56.56854,372.71737 0,102.00879 50.17631,0 0,40.52794 103.03557,0"
+       id="path4194"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="ccccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m -6.39223,474.72616 0,-35.47717 103.03557,0"
+       id="path4196"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="ccc" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-35.355335"
+       y="180.38087"
+       id="text4198"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4200"
+         x="-35.355335"
+         y="180.38087">Transcriptome</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-27.605433"
+       y="345.27939"
+       id="text4202"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4204"
+         x="-27.605433"
+         y="345.27939">Genome</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-29.625746"
+       y="411.97485"
+       id="text4206"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4208"
+         x="-29.625746"
+         y="411.97485">Transcriptome</tspan><tspan
+         sodipodi:role="line"
+         x="-29.625746"
+         y="430.72485"
+         id="tspan4210">close relative</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="6.4079494"
+       y="543.47748"
+       id="text4212"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         x="6.4079494"
+         y="543.47748"
+         id="tspan4216"><tspan
+   id="tspan4538"
+   style="font-style:italic;text-align:center;text-anchor:middle">de novo</tspan></tspan><tspan
+         sodipodi:role="line"
+         x="6.4079494"
+         y="562.22748"
+         id="tspan4540">Assembly</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="86.541801"
+       y="182.40117"
+       id="text4218"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4220"
+         x="86.541801"
+         y="182.40117">1.</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="83.511345"
+       y="345.11694"
+       id="text4222"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4224"
+         x="83.511345"
+         y="345.11694">2.</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="84.521507"
+       y="433.18805"
+       id="text4226"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4228"
+         x="84.521507"
+         y="433.18805">3.</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="84.521515"
+       y="544.4267"
+       id="text4230"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4232"
+         x="84.521515"
+         y="544.4267">4.</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:100%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="167.81126"
+       y="182.38087"
+       id="text4234"
+       sodipodi:linespacing="100%"><tspan
+         sodipodi:role="line"
+         id="tspan4236"
+         x="170.19896"
+         y="182.38087"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">&quot;nice&quot; </tspan><tspan
+         sodipodi:role="line"
+         x="167.81126"
+         y="197.38087"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle"
+         id="tspan4317">transcriptome</tspan><tspan
+         sodipodi:role="line"
+         x="167.81126"
+         y="212.38087"
+         id="tspan4238"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">count k-mers</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:100%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="162.21339"
+       y="326.10309"
+       id="text4240"
+       sodipodi:linespacing="100%"><tspan
+         sodipodi:role="line"
+         id="tspan4242"
+         x="162.21339"
+         y="326.10309"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">splice-aware</tspan><tspan
+         sodipodi:role="line"
+         x="164.60109"
+         y="341.10309"
+         id="tspan4244"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">fast read </tspan><tspan
+         sodipodi:role="line"
+         x="162.21339"
+         y="356.10309"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle"
+         id="tspan4319">alignment</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:100%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="162.0696"
+       y="434.27942"
+       id="text4246"
+       sodipodi:linespacing="100%"><tspan
+         sodipodi:role="line"
+         id="tspan4248"
+         x="164.45729"
+         y="434.27942"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">slower read </tspan><tspan
+         sodipodi:role="line"
+         x="162.0696"
+         y="449.27942"
+         id="tspan4321"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">alignment</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:100%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="160.27658"
+       y="486.06116"
+       id="text4250"
+       sodipodi:linespacing="100%"><tspan
+         sodipodi:role="line"
+         id="tspan4252"
+         x="162.66428"
+         y="486.06116"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">Assembly </tspan><tspan
+         sodipodi:role="line"
+         x="160.27658"
+         y="501.06116"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle"
+         id="tspan4323">followed</tspan><tspan
+         sodipodi:role="line"
+         x="162.66428"
+         y="516.06116"
+         id="tspan4254"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">by fast read </tspan><tspan
+         sodipodi:role="line"
+         x="162.66428"
+         y="531.06116"
+         id="tspan4256"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">alignment </tspan><tspan
+         sodipodi:role="line"
+         x="160.27658"
+         y="546.06116"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle"
+         id="tspan4331">(as above)</tspan></text>
+    <rect
+       style="fill:#999999;fill-opacity:1;stroke:none;stroke-width:5;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4136-3"
+       width="194.91951"
+       height="285.74313"
+       x="226.69382"
+       y="171.43575" />
+    <rect
+       style="fill:#aaccff;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4273"
+       width="103.03556"
+       height="24.243666"
+       x="225.66365"
+       y="147.64987" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="247.92761"
+       y="165.83263"
+       id="text4275"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4277"
+         x="247.92761"
+         y="165.83263">Position</tspan></text>
+    <rect
+       style="fill:#e9afaf;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4273-6"
+       width="96.974648"
+       height="24.243666"
+       x="323.46573"
+       y="147.64989" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="347.79062"
+       y="165.7514"
+       id="text4294"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4296"
+         x="347.79062"
+         y="165.7514">Quant.</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:100%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="153.49268"
+       y="268.58533"
+       id="text4325"
+       sodipodi:linespacing="100%"><tspan
+         sodipodi:role="line"
+         id="tspan4327"
+         x="155.88037"
+         y="268.58533"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">fast read </tspan><tspan
+         sodipodi:role="line"
+         x="153.49268"
+         y="283.58533"
+         id="tspan4329"
+         style="line-height:100%;text-align:center;writing-mode:lr-tb;text-anchor:middle">alignment</tspan></text>
+    <rect
+       style="fill:#aaccff;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4371-3"
+       width="82.832504"
+       height="227.28433"
+       x="236.06468"
+       y="223.39357" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="257.27789"
+       y="438.11191"
+       id="text4355"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4357"
+         x="257.27789"
+         y="438.11191">BLAT</tspan></text>
+    <rect
+       style="fill:#ffdd55;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4273-6-6"
+       width="96.974648"
+       height="24.243666"
+       x="274.43018"
+       y="472.37585" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="289.58249"
+       y="491.56873"
+       id="text4359"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4361"
+         x="289.58249"
+         y="491.56873">Assembly</tspan></text>
+    <rect
+       style="fill:#d6d0ff;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4273-6-3"
+       width="170.71579"
+       height="24.243666"
+       x="237.57991"
+       y="182.48239" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="296.65353"
+       y="199.65498"
+       id="text4367"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4369"
+         x="296.65353"
+         y="199.65498">Kallisto</tspan></text>
+    <rect
+       style="fill:#e9afaf;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4371"
+       width="82.832504"
+       height="141.42136"
+       x="327.88705"
+       y="223.89865" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="370.61859"
+       y="329.96463"
+       id="text4349"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4351"
+         x="370.61859"
+         y="329.96463"
+         style="text-align:center;text-anchor:middle">HTSeq</tspan><tspan
+         sodipodi:role="line"
+         x="370.61859"
+         y="348.71463"
+         id="tspan4353"
+         style="text-align:center;text-anchor:middle">StringTie</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="278.64844"
+       y="267.33524"
+       id="text4333"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4335"
+         x="278.64844"
+         y="267.33524"
+         style="text-align:center;text-anchor:middle">bowtie2</tspan><tspan
+         sodipodi:role="line"
+         x="278.64844"
+         y="286.08524"
+         id="tspan4337"
+         style="text-align:center;text-anchor:middle">BWA</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="277.08066"
+       y="328.9545"
+       id="text4343"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4345"
+         x="277.08066"
+         y="328.9545"
+         style="text-align:center;text-anchor:middle">Hisat2</tspan><tspan
+         sodipodi:role="line"
+         x="277.08066"
+         y="347.7045"
+         id="tspan4347"
+         style="text-align:center;text-anchor:middle">STAR</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="342.09012"
+       y="271.37582"
+       id="text4339"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4341"
+         x="342.09012"
+         y="271.37582">eXpress</tspan></text>
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m -74.072444,384.71737 17.503904,0 0,-106.13203 50.17631,0 0,-49.63592 49.82879,0"
+       id="path4178"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cccccc" />
+    <rect
+       style="fill:#999999;fill-opacity:1;stroke:none;stroke-width:5;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4136-3-8"
+       width="193.90936"
+       height="53.668118"
+       x="226.69382"
+       y="-550.36475"
+       transform="scale(1,-1)" />
+    <rect
+       style="fill:#ffdd55;fill-opacity:1;stroke:none;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4273-6-6-0"
+       width="174.75639"
+       height="25.253807"
+       x="236.00375"
+       y="509.89368" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="297.62314"
+       y="527.08655"
+       id="text4363"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4365"
+         x="297.62314"
+         y="527.08655">Trinity</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="253.21698"
+       y="133.99507"
+       id="text4448"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4450"
+         x="253.21698"
+         y="133.99507">Example software</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="435.97336"
+       y="135.99507"
+       id="text4454"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4456"
+         x="435.97336"
+         y="135.99507"
+         style="font-weight:bold">Easier</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="431.00381"
+       y="584.5434"
+       id="text4458"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4460"
+         x="431.00381"
+         y="584.5434"
+         style="font-weight:bold">Harder</tspan></text>
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:5.97104597;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart);marker-end:url(#Arrow1Mend)"
+       d="m 459.88338,159.14944 0,395.16854"
+       id="path4462"
+       inkscape:connector-curvature="0" />
+    <g
+       id="g4546"
+       transform="translate(299.00515,16.162441)">
+      <ellipse
+         ry="17.172594"
+         rx="16.667517"
+         cy="500.68045"
+         cx="-257.41513"
+         id="path4542"
+         style="fill:none;fill-opacity:1;stroke:#ff0000;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <path
+         inkscape:connector-curvature="0"
+         id="path4544"
+         d="m -269.03188,512.80228 22.22335,-22.22335"
+         style="fill:none;fill-rule:evenodd;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
+    </g>
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#ff0000;stroke-width:2;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#marker5238)"
+       d="m 52.182746,543.10686 c 12.365497,31.31473 18.408531,31.31473 43.450442,29.29443"
+       id="path5226"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="117.81595"
+       y="578.50281"
+       id="text5228"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan5230"
+         x="117.81595"
+         y="578.50281"
+         style="fill:#ff0000">Get long reads.</tspan></text>
+  </g>
+</svg>
diff --git a/_reader/figures/dun_merge.pdf b/_reader/figures/dun_merge.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..be634624679c5eb41f5f1a501cdf06722b5041d0
Binary files /dev/null and b/_reader/figures/dun_merge.pdf differ
diff --git a/_reader/figures/dun_merge.svg b/_reader/figures/dun_merge.svg
new file mode 100644
index 0000000000000000000000000000000000000000..aee12711ab0d8cd7b51b9f9b93b729b871de4859
--- /dev/null
+++ b/_reader/figures/dun_merge.svg
@@ -0,0 +1,642 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/"
+   xmlns:cc="http://creativecommons.org/ns#"
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+   width="144.60693mm"
+   height="92.354301mm"
+   viewBox="0 0 512.38677 327.23965"
+   id="svg2"
+   version="1.1"
+   inkscape:version="0.91 r13725"
+   sodipodi:docname="dun_merge.svg">
+  <defs
+     id="defs4" />
+  <sodipodi:namedview
+     id="base"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="1.0"
+     inkscape:pageopacity="0.0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="1.4"
+     inkscape:cx="285.01794"
+     inkscape:cy="141.65781"
+     inkscape:document-units="px"
+     inkscape:current-layer="g4396-8"
+     showgrid="false"
+     showguides="true"
+     inkscape:guide-bbox="true"
+     inkscape:window-width="1855"
+     inkscape:window-height="1176"
+     inkscape:window-x="1985"
+     inkscape:window-y="24"
+     inkscape:window-maximized="1"
+     fit-margin-top="0"
+     fit-margin-left="0"
+     fit-margin-right="0"
+     fit-margin-bottom="0" />
+  <metadata
+     id="metadata7">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+        <dc:title></dc:title>
+      </cc:Work>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="Layer 1"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(-92.954338,-150.47261)">
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="98.994949"
+       y="171.00412"
+       id="text4252"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4254"
+         x="98.994949"
+         y="171.00412">5'</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="494.46964"
+       y="171.97366"
+       id="text4256"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4258"
+         x="494.46964"
+         y="171.97366">3'</tspan></text>
+    <g
+       id="g4396">
+      <rect
+         y="185.61624"
+         x="107.54617"
+         height="0.57523108"
+         width="373.3215"
+         id="rect4349"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.91969275;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="209.54118"
+         x="214.54729"
+         height="0.72535276"
+         width="266.39545"
+         id="rect4349-4"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.76957107;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="231.40164"
+         x="325.52454"
+         height="1.0044436"
+         width="155.55775"
+         id="rect4349-4-8"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.49048018;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="253.57932"
+         x="166.60318"
+         height="0.64910936"
+         width="314.30145"
+         id="rect4349-4-8-8"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.84581447;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <g
+         id="g4260">
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136"
+           width="147.48227"
+           height="10.101525"
+           x="94.954338"
+           y="181.61072" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4"
+           width="50.979172"
+           height="11.583224"
+           x="298.2135"
+           y="180.86987" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="181.1387" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-7"
+           width="12.777371"
+           height="12.777372"
+           x="381.61642"
+           y="180.2728" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.10028768;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-5"
+           width="34.22459"
+           height="12.001238"
+           x="209.16187"
+           y="204.66086" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-9"
+           width="50.979172"
+           height="11.583224"
+           x="298.2135"
+           y="204.86987" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-6"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="205.1387" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-7-2"
+           width="12.777371"
+           height="12.777372"
+           x="381.61642"
+           y="204.2728" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.20774055;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-9-7"
+           width="38.157749"
+           height="11.893785"
+           x="311.19022"
+           y="226.7146" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-6-8"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="227.1387" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-7-2-5"
+           width="12.777371"
+           height="12.777372"
+           x="381.61642"
+           y="226.2728" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.51366901;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-7"
+           width="108.57265"
+           height="10.587856"
+           x="134.10712"
+           y="249.36755" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-4"
+           width="50.979172"
+           height="11.583224"
+           x="298.2135"
+           y="248.86987" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-1"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="249.1387" />
+      </g>
+      <rect
+         y="181.61072"
+         x="94.954338"
+         height="10.101525"
+         width="147.48227"
+         id="rect4136-59"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="180.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-7"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="181.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-5"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="180.2728"
+         x="381.61642"
+         height="12.777372"
+         width="12.777371"
+         id="rect4136-4-3-7-3"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="204.66086"
+         x="209.16187"
+         height="12.001238"
+         width="34.22459"
+         id="rect4136-5-8"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.10028768;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="204.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-9-8"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="205.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-6-3"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="204.2728"
+         x="381.61642"
+         height="12.777372"
+         width="12.777371"
+         id="rect4136-4-3-7-2-1"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="226.7146"
+         x="311.19022"
+         height="11.893785"
+         width="38.157749"
+         id="rect4136-4-9-7-8"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.20774055;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="227.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-6-8-9"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="226.2728"
+         x="381.61642"
+         height="12.777372"
+         width="12.777371"
+         id="rect4136-4-3-7-2-5-6"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="249.36755"
+         x="134.10712"
+         height="10.587856"
+         width="108.57265"
+         id="rect4136-7-4"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.51366901;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="248.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-4-3"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="249.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-1-3"
+         style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+    </g>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="297.99503"
+       y="161.40767"
+       id="text4431"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4433"
+         x="297.99503"
+         y="161.40767">Input/sam</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:36.84271622px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="737.42059"
+       y="156.09688"
+       id="text4441"
+       sodipodi:linespacing="125%"
+       transform="scale(0.69109067,1.4469881)"><tspan
+         sodipodi:role="line"
+         id="tspan4443"
+         x="737.42059"
+         y="156.09688">}</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="561.64484"
+       y="209.89497"
+       id="text4445"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4447"
+         x="564.03253"
+         y="209.89497">Different </tspan><tspan
+         sodipodi:role="line"
+         x="561.64484"
+         y="228.64497"
+         id="tspan4449">Transcripts?</tspan></text>
+    <g
+       transform="translate(-7.3730462e-8,144)"
+       id="g4396-6">
+      <rect
+         y="185.61624"
+         x="107.54617"
+         height="0.57523108"
+         width="373.3215"
+         id="rect4349-43"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.91969275;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="207.57932"
+         x="166.60318"
+         height="0.64910936"
+         width="314.30145"
+         id="rect4349-4-8-8-0"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.84581447;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <g
+         id="g4260-9">
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-2"
+           width="147.48227"
+           height="10.101525"
+           x="94.954338"
+           y="181.61072" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-5"
+           width="50.979172"
+           height="11.583224"
+           x="298.2135"
+           y="180.86987" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-4"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="181.1387" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-7-0"
+           width="12.777371"
+           height="12.777372"
+           x="381.61642"
+           y="180.2728" />
+      </g>
+      <rect
+         y="181.61072"
+         x="94.954338"
+         height="10.101525"
+         width="147.48227"
+         id="rect4136-59-4"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="180.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-7-8"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="181.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-5-1"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="180.2728"
+         x="381.61642"
+         height="12.777372"
+         width="12.777371"
+         id="rect4136-4-3-7-3-2"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="203.36755"
+         x="134.10712"
+         height="10.587856"
+         width="108.57265"
+         id="rect4136-7-4-1"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:3.51366901;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="202.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-4-3-0"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="203.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-1-3-5"
+         style="fill:#008080;fill-opacity:1;stroke:#008080;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+    </g>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="297.99503"
+       y="305.40765"
+       id="text4431-1"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4433-1"
+         x="297.99503"
+         y="305.40765">Default collapse</tspan></text>
+    <g
+       transform="translate(-7.3730462e-8,236)"
+       id="g4396-8">
+      <rect
+         y="189.61624"
+         x="107.54617"
+         height="0.57523108"
+         width="373.3215"
+         id="rect4349-6"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.91969275;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="211.40164"
+         x="325.52454"
+         height="1.0044436"
+         width="155.55775"
+         id="rect4349-4-8-84"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.49048018;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="233.57932"
+         x="166.60318"
+         height="0.64910936"
+         width="314.30145"
+         id="rect4349-4-8-8-7"
+         style="fill:#000000;fill-opacity:1;stroke:#000000;stroke-width:2.84581447;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <g
+         id="g4260-2"
+         transform="translate(0,4)">
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-40"
+           width="147.48227"
+           height="10.101525"
+           x="94.954338"
+           y="181.61072" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-6"
+           width="50.979172"
+           height="11.583224"
+           x="298.2135"
+           y="180.86987" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-2"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="181.1387" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-7-9"
+           width="12.777371"
+           height="12.777372"
+           x="381.61642"
+           y="180.2728" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.20774055;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-9-7-1"
+           width="38.157749"
+           height="11.893785"
+           x="311.19022"
+           y="202.7146" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-6-8-1"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="203.1387" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-7-2-5-0"
+           width="12.777371"
+           height="12.777372"
+           x="381.61642"
+           y="202.2728" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.51366901;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-7-3"
+           width="108.57265"
+           height="10.587856"
+           x="134.10712"
+           y="225.36755" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-4-4"
+           width="50.979172"
+           height="11.583224"
+           x="298.2135"
+           y="224.86987" />
+        <rect
+           style="fill:#aca793;fill-opacity:1;stroke:#aca793;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4136-4-3-1-0"
+           width="78.725777"
+           height="11.045557"
+           x="420.48233"
+           y="225.1387" />
+      </g>
+      <rect
+         y="185.61072"
+         x="94.954338"
+         height="10.101525"
+         width="147.48227"
+         id="rect4136-59-3"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="184.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-7-9"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="185.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-5-19"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="184.2728"
+         x="381.61642"
+         height="12.777372"
+         width="12.777371"
+         id="rect4136-4-3-7-3-6"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="206.7146"
+         x="311.19022"
+         height="11.893785"
+         width="38.157749"
+         id="rect4136-4-9-7-8-0"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:2.20774055;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="207.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-6-8-9-5"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="206.2728"
+         x="381.61642"
+         height="12.777372"
+         width="12.777371"
+         id="rect4136-4-3-7-2-5-6-6"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:1.32415295;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="229.36755"
+         x="134.10712"
+         height="10.587856"
+         width="108.57265"
+         id="rect4136-7-4-6"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:3.51366901;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="228.86987"
+         x="298.2135"
+         height="11.583224"
+         width="50.979172"
+         id="rect4136-4-4-3-4"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:2.51830149;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+      <rect
+         y="229.1387"
+         x="420.48233"
+         height="11.045557"
+         width="78.725777"
+         id="rect4136-4-3-1-3-0"
+         style="fill:#67536c;fill-opacity:1;stroke:#67536c;stroke-width:3.05596852;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+    </g>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="297.99503"
+       y="401.40765"
+       id="text4431-0"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4433-4"
+         x="297.99503"
+         y="401.40765">--dun-merge-5-shorter collapse</tspan></text>
+  </g>
+</svg>
diff --git a/_reader/figures/longread_workflow.pdf b/_reader/figures/longread_workflow.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..7ad50902a9de876e6b95de4be83584b4cdbc18a3
Binary files /dev/null and b/_reader/figures/longread_workflow.pdf differ
diff --git a/_reader/figures/longread_workflow.svg b/_reader/figures/longread_workflow.svg
new file mode 100644
index 0000000000000000000000000000000000000000..bea7de1c9f01d77b9192bab180f9da3166124d14
--- /dev/null
+++ b/_reader/figures/longread_workflow.svg
@@ -0,0 +1,847 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/"
+   xmlns:cc="http://creativecommons.org/ns#"
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+   width="130.59871mm"
+   height="165.21391mm"
+   viewBox="0 0 462.75134 585.40362"
+   id="svg2"
+   version="1.1"
+   inkscape:version="0.91 r13725"
+   sodipodi:docname="longread_workflow.svg">
+  <defs
+     id="defs4">
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         id="path4409"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)"
+         inkscape:connector-curvature="0" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-7"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-5"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-7-8"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-5-8"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-7-8-3"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-5-8-3"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-8"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-6"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-4"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-8"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-89"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-7"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-6"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-4"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-3"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-0"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-7-8-2"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-5-8-5"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-7-8-0"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-5-8-59"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+    <marker
+       inkscape:stockid="Arrow1Send"
+       orient="auto"
+       refY="0"
+       refX="0"
+       id="Arrow1Send-8-7-8-0-9"
+       style="overflow:visible"
+       inkscape:isstock="true">
+      <path
+         inkscape:connector-curvature="0"
+         id="path4409-5-5-8-59-2"
+         d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
+         style="fill:#aca793;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:1pt;stroke-opacity:1"
+         transform="matrix(-0.2,0,0,-0.2,-1.2,0)" />
+    </marker>
+  </defs>
+  <sodipodi:namedview
+     id="base"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="1.0"
+     inkscape:pageopacity="0.0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="0.98994949"
+     inkscape:cx="241.19547"
+     inkscape:cy="109.05528"
+     inkscape:document-units="px"
+     inkscape:current-layer="layer1"
+     showgrid="false"
+     inkscape:window-width="1855"
+     inkscape:window-height="1176"
+     inkscape:window-x="1985"
+     inkscape:window-y="24"
+     inkscape:window-maximized="1"
+     showguides="true"
+     inkscape:guide-bbox="true"
+     fit-margin-top="0"
+     fit-margin-left="0"
+     fit-margin-right="0"
+     fit-margin-bottom="0" />
+  <metadata
+     id="metadata7">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+        <dc:title></dc:title>
+      </cc:Work>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="Layer 1"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(65.761436,-15.905114)">
+    <path
+       style="fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:4.00000048;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m 82.62947,599.30874 -0.556924,-53.3915 78.482974,-20.34963 75.59684,20.34963 -2,53.3915 z"
+       id="path10920"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cccccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-7-8-2)"
+       d="m 192.03051,301.84438 79.28422,53.02025 43.95439,70.21836"
+       id="path4388-9-3-3-4"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="ccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-3)"
+       d="m 317.33003,457.38757 0,23.23351"
+       id="path4388-9-9"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-6)"
+       d="m 28.08122,458.39772 0,23.23351"
+       id="path4388-9-30"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-89)"
+       d="m 128.55333,457.40787 0,23.23351"
+       id="path4388-9-7"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-8)"
+       d="m -45.91878,395.78857 0,23.23351"
+       id="path4388-9-0"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-4)"
+       d="m 195.56349,394.77842 0,23.23351"
+       id="path4388-9-8"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-7-8-3)"
+       d="m 13.91878,247.29614 0,112.12693"
+       id="path4388-9-3-3-3"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-7-8)"
+       d="m 140.67517,296.83423 0,63.63961"
+       id="path4388-9-3-3"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-7)"
+       d="m 162.65486,242.18447 0,23.23351"
+       id="path4388-9-3"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <rect
+       style="fill:#cdde87;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-2-5-7"
+       width="119.198"
+       height="27.274122"
+       x="-56.690384"
+       y="491.22244" />
+    <rect
+       style="fill:#cdde87;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-2-5-7-4"
+       width="104.04572"
+       height="27.274122"
+       x="105.15738"
+       y="490.21231" />
+    <rect
+       style="fill:#cdde87;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-2-5-7-1"
+       width="55.558392"
+       height="27.274122"
+       x="289.99503"
+       y="490.2326" />
+    <rect
+       style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-2"
+       width="117.17769"
+       height="27.274122"
+       x="-57.700523"
+       y="431.62344" />
+    <rect
+       style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-2-8"
+       width="108.08631"
+       height="27.274122"
+       x="106.16754"
+       y="431.62347" />
+    <rect
+       style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-2-5"
+       width="141.42136"
+       height="27.274122"
+       x="244.55844"
+       y="431.62347" />
+    <rect
+       style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-9"
+       width="101.01526"
+       height="27.274122"
+       x="104.14725"
+       y="367.98386" />
+    <rect
+       style="fill:#ffccaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-6"
+       width="117.17769"
+       height="27.274122"
+       x="-56.670067"
+       y="368.68942" />
+    <rect
+       style="fill:#ffccaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-1"
+       width="131.31982"
+       height="27.274122"
+       x="-65.761436"
+       y="221.51173" />
+    <rect
+       style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-7"
+       width="96.97464"
+       height="27.274122"
+       x="104.14725"
+       y="221.51173" />
+    <rect
+       style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-5"
+       width="87.88327"
+       height="27.274122"
+       x="110.20816"
+       y="275.99905" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="111.85931"
+       y="239.56741"
+       id="text4148"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4150"
+         x="111.85931"
+         y="239.56741">unpolished</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-58.977531"
+       y="240.8024"
+       id="text4152"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4154"
+         x="-58.977531"
+         y="240.8024">unpolished flnc</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="122.05293"
+       y="294.3136"
+       id="text4156"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4158"
+         x="122.05293"
+         y="294.3136">polished</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="112.49309"
+       y="449.25998"
+       id="text4180"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4182"
+         x="112.49309"
+         y="449.25998">collapsed hq</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="252.07471"
+       y="450.51523"
+       id="text4184"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4186"
+         x="252.07471"
+         y="450.51523">collapsed cogent</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-43.914452"
+       y="388.01703"
+       id="text4188"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4190"
+         x="-43.914452"
+         y="388.01703">mapped flnc</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="114.29804"
+       y="386.30127"
+       id="text4192"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4194"
+         x="114.29804"
+         y="386.30127">mapped hq</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-50.148678"
+       y="449.6066"
+       id="text4196"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4198"
+         x="-50.148678"
+         y="449.6066">collapsed flnc</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-47.598999"
+       y="508.90012"
+       id="text4314"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4316"
+         x="-47.598999"
+         y="508.90012">augustus flnc</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="112.22845"
+       y="508.85953"
+       id="text4318"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4320"
+         x="112.22845"
+         y="508.85953">augustus hq</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="297.0864"
+       y="507.88995"
+       id="text4322"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4324"
+         x="297.0864"
+         y="507.88995">angel</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="30.263966"
+       y="197.83408"
+       id="text4372"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4374"
+         x="30.263966"
+         y="197.83408">isoseq3 cluster</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="173.82745"
+       y="266.46353"
+       id="text4376"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4378"
+         x="173.82745"
+         y="266.46353">isoseq3 polish</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="78.55864"
+       y="325.59549"
+       id="text4384"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4386"
+         x="78.55864"
+         y="325.59549">gmap</tspan><tspan
+         sodipodi:role="line"
+         x="78.55864"
+         y="344.34549"
+         id="tspan4796">(-&gt; genome)</tspan></text>
+    <g
+       id="g5825"
+       transform="translate(0,4)">
+      <g
+         transform="translate(468.95444,20.162441)"
+         id="g5807">
+        <rect
+           style="fill:#cdde87;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4200-1-8-6-4"
+           width="176.77669"
+           height="27.274122"
+           x="-288.90363"
+           y="75.542107" />
+        <rect
+           style="fill:#ffe680;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4200-1-8-6"
+           width="213.14218"
+           height="27.274122"
+           x="-288.90363"
+           y="49.542107" />
+        <rect
+           style="fill:#ffccaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4200-1-8"
+           width="175.76654"
+           height="27.274122"
+           x="-288.90363"
+           y="23.542141" />
+        <rect
+           style="fill:#ffaaaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+           id="rect4200-1"
+           width="96.97464"
+           height="27.274122"
+           x="-288.90399"
+           y="-2.742131" />
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+           x="-273.27554"
+           y="16.541292"
+           id="text4160"
+           sodipodi:linespacing="125%"><tspan
+             sodipodi:role="line"
+             id="tspan4162"
+             x="-273.27554"
+             y="16.541292">subreads</tspan></text>
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+           x="-274.94925"
+           y="43.148224"
+           id="text4164"
+           sodipodi:linespacing="125%"><tspan
+             sodipodi:role="line"
+             id="tspan4166"
+             x="-274.94925"
+             y="43.148224">consensus full reads</tspan></text>
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+           x="-275.36768"
+           y="67.611588"
+           id="text4168"
+           sodipodi:linespacing="125%"><tspan
+             sodipodi:role="line"
+             id="tspan4170"
+             x="-275.36768"
+             y="67.611588">reconstructed transcripts</tspan></text>
+        <text
+           xml:space="preserve"
+           style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+           x="-276.40753"
+           y="94.475937"
+           id="text4176"
+           sodipodi:linespacing="125%"><tspan
+             sodipodi:role="line"
+             id="tspan4178"
+             x="-276.40753"
+             y="94.475937">coding gene models</tspan></text>
+      </g>
+      <rect
+         y="13.905127"
+         x="176.797"
+         height="112.12689"
+         width="218.19295"
+         id="rect5823"
+         style="fill:none;fill-opacity:1;stroke:#aca793;stroke-width:4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
+    </g>
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-7-8-0)"
+       d="M 59.538084,158.40272 28.17026,177.52818 14.08122,215.98141"
+       id="path4388-9-3-3-46"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="ccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8-7-8-0-9)"
+       d="m 115.03642,157.4911 31.36782,19.12546 14.08904,38.45323"
+       id="path4388-9-3-3-46-2"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="ccc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send)"
+       d="m 89.781746,40.148744 0,23.233508"
+       id="path4388"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <path
+       style="fill:none;fill-rule:evenodd;stroke:#aca793;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Send-8)"
+       d="m 89.802051,102.84433 0,23.23351"
+       id="path4388-9"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cc" />
+    <rect
+       style="fill:#ffccaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6-3"
+       width="78.791893"
+       height="27.274122"
+       x="48.365494"
+       y="133.58786" />
+    <rect
+       style="fill:#ffaaaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200"
+       width="96.97464"
+       height="27.274122"
+       x="40.284271"
+       y="15.905084" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="55.550404"
+       y="35.753052"
+       id="text4136"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4138"
+         x="55.550404"
+         y="35.753052">subreads</tspan></text>
+    <rect
+       style="fill:#ffccaa;fill-opacity:1;stroke:none;stroke-width:2;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
+       id="rect4200-6"
+       width="46.467014"
+       height="27.274122"
+       x="65.538086"
+       y="76.009155" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="76.416977"
+       y="92.895905"
+       id="text4140"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4142"
+         x="76.416977"
+         y="92.895905">ccs</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="56.213924"
+       y="152.4057"
+       id="text4144"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4146"
+         x="56.213924"
+         y="152.4057">trimmed</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="96.852814"
+       y="61.36195"
+       id="text4364"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4366"
+         x="96.852814"
+         y="61.36195">ccs</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="96.852814"
+       y="122.98125"
+       id="text4368"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan4370"
+         x="96.852814"
+         y="122.98125">lima</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="238.49754"
+       y="326.0625"
+       id="text10272"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan10274"
+         x="238.49754"
+         y="326.0625">cogent</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="329.49249"
+       y="477.52448"
+       id="text10276"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan10278"
+         x="329.49249"
+         y="477.52448">angel</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="-37.335033"
+       y="416.97626"
+       id="text10334"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan10336"
+         x="-37.335033"
+         y="416.97626">collapse_isoforms_by_sam.py</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="43.436562"
+       y="477.58539"
+       id="text10800"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan10802"
+         x="43.436562"
+         y="477.58539">augustus</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="285.79193"
+       y="570.80109"
+       id="text10894"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         x="285.79193"
+         y="570.80109"
+         id="tspan10898"
+         style="font-style:italic">de novo</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="157.87939"
+       y="551.79095"
+       id="text10902"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan10904"
+         x="157.87939"
+         y="551.79095">ref based,</tspan><tspan
+         sodipodi:role="line"
+         x="160.26709"
+         y="570.54095"
+         id="tspan10906">mapped high </tspan><tspan
+         sodipodi:role="line"
+         x="157.87939"
+         y="589.29095"
+         id="tspan10908">quality transcripts</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;font-size:15px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       x="0.37531292"
+       y="552.80115"
+       id="text10910"
+       sodipodi:linespacing="125%"><tspan
+         sodipodi:role="line"
+         id="tspan10912"
+         x="0.37531292"
+         y="552.80115">ref based,</tspan><tspan
+         sodipodi:role="line"
+         x="2.7630081"
+         y="571.55115"
+         id="tspan10914">mapped </tspan><tspan
+         sodipodi:role="line"
+         x="0.37531292"
+         y="590.30115"
+         id="tspan10922">reads</tspan></text>
+    <path
+       style="fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:4.00000048;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m 256.51983,599.30874 -0.55692,-53.3915 62.48298,-20.34963 59.59684,20.34963 -2,53.3915 z"
+       id="path10920-4"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cccccc" />
+    <path
+       style="fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#aca793;stroke-width:4.00000048;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       d="m -59.37053,599.30874 -0.556924,-53.3915 62.4829736,-20.34963 59.5968494,20.34963 -2,53.3915 z"
+       id="path10920-7"
+       inkscape:connector-curvature="0"
+       sodipodi:nodetypes="cccccc" />
+  </g>
+</svg>
diff --git a/_reader/intro_scripts_data/R/ggplot.R b/_reader/intro_scripts_data/R/ggplot.R
new file mode 100644
index 0000000000000000000000000000000000000000..c53fa0a17697ca15f3e92093f6a2a0b25f6083ad
--- /dev/null
+++ b/_reader/intro_scripts_data/R/ggplot.R
@@ -0,0 +1,72 @@
+head(df0)
+# now we will learn about how ggplot makes figures
+# the minimal information is where is the data, what is the x variable
+# and what is the y variable and what we want to plot
+# let's try
+
+# this will make and save plot object
+baseplot <- ggplot(data = df0, aes(x = hp, y = gas_mileage)) +
+  geom_point()
+# the first line indicates the data, what x is and what y is
+# the second line says we want points
+
+# this will display plot object
+baseplot
+
+# now we can customize that plot and make it prettier
+# first off, different style, we do that by using theme
+baseplot + theme_minimal() # display plot object + modification
+# or alternatively, if you prefer
+baseplot + theme_bw()
+# now we'll save theme_bw with the object for later use
+baseplot <- baseplot + theme_bw()
+
+# now we want colored dots, simply red ones
+# google ``r colors" for an idea of the named colors available in R
+# you can choose them by name, number or hexcode; we like names
+baseplot + geom_point(color = "firebrick")
+
+# and a different size and shape
+baseplot + geom_point(color = "firebrick", shape = 18, size = 4)
+
+# or choose colors based on something else, like the number of cylinders
+
+# note, that while many things can just be 'added' to a ggplot,
+# a second ggplot object is not one of them, so we'll start over
+# as it's normally best to specify dynamic color as an
+# argument to aesthetics (called aes).
+ggplot(data = df0, aes(x = hp, y = gas_mileage, color = cyl)) +
+  geom_point(shape = 18, size = 4) +
+  theme_bw()
+
+# you can also make your color changes abrupt, not continuous,
+# by defining cyl as a factor rather than a number
+# first check what they are
+class(df0$cyl)
+# now make them a factor
+baseplot <- ggplot(
+  data = df0,
+  aes(x = hp, y = gas_mileage, color = as.factor(cyl))
+) +
+  geom_point(shape = 18, size = 4) +
+  theme_bw()
+baseplot
+# not the best colors, but you can modify them using scale_color_manual
+# to do that you need to know which levels you have as your factors
+levels(as.factor(df0$cyl))
+# and now you can look at colors and decide what is what
+baseplot <- baseplot +
+  scale_color_manual(
+    values = c(
+      "4" = "black",
+      "6" = "orange",
+      "8" = "red3"
+    ),
+    name = "cylinders"
+  )
+baseplot
+# finally, a little more polishing
+baseplot +
+  xlab("gross horsepower") +
+  ylab("miles per gallon") +
+  theme(text = element_text(size = 16))
diff --git a/_reader/intro_scripts_data/example_table.txt b/_reader/intro_scripts_data/example_table.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ec459db60809864c530b55983342f9c3b811dc0c
--- /dev/null
+++ b/_reader/intro_scripts_data/example_table.txt
@@ -0,0 +1,33 @@
+""	"mpg"	"cyl"	"disp"	"hp"	"drat"	"wt"	"qsec"	"vs"	"am"	"gear"	"carb"
+"Mazda RX4"	21	6	160	110	3.9	2.62	16.46	0	1	4	4
+"Mazda RX4 Wag"	21	6	160	110	3.9	2.875	17.02	0	1	4	4
+"Datsun 710"	22.8	4	108	93	3.85	2.32	18.61	1	1	4	1
+"Hornet 4 Drive"	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
+"Hornet Sportabout"	18.7	8	360	175	3.15	3.44	17.02	0	0	3	2
+"Valiant"	18.1	6	225	105	2.76	3.46	20.22	1	0	3	1
+"Duster 360"	14.3	8	360	245	3.21	3.57	15.84	0	0	3	4
+"Merc 240D"	24.4	4	146.7	62	3.69	3.19	20	1	0	4	2
+"Merc 230"	22.8	4	140.8	95	3.92	3.15	22.9	1	0	4	2
+"Merc 280"	19.2	6	167.6	123	3.92	3.44	18.3	1	0	4	4
+"Merc 280C"	17.8	6	167.6	123	3.92	3.44	18.9	1	0	4	4
+"Merc 450SE"	16.4	8	275.8	180	3.07	4.07	17.4	0	0	3	3
+"Merc 450SL"	17.3	8	275.8	180	3.07	3.73	17.6	0	0	3	3
+"Merc 450SLC"	15.2	8	275.8	180	3.07	3.78	18	0	0	3	3
+"Cadillac Fleetwood"	10.4	8	472	205	2.93	5.25	17.98	0	0	3	4
+"Lincoln Continental"	10.4	8	460	215	3	5.424	17.82	0	0	3	4
+"Chrysler Imperial"	14.7	8	440	230	3.23	5.345	17.42	0	0	3	4
+"Fiat 128"	32.4	4	78.7	66	4.08	2.2	19.47	1	1	4	1
+"Honda Civic"	30.4	4	75.7	52	4.93	1.615	18.52	1	1	4	2
+"Toyota Corolla"	33.9	4	71.1	65	4.22	1.835	19.9	1	1	4	1
+"Toyota Corona"	21.5	4	120.1	97	3.7	2.465	20.01	1	0	3	1
+"Dodge Challenger"	15.5	8	318	150	2.76	3.52	16.87	0	0	3	2
+"AMC Javelin"	15.2	8	304	150	3.15	3.435	17.3	0	0	3	2
+"Camaro Z28"	13.3	8	350	245	3.73	3.84	15.41	0	0	3	4
+"Pontiac Firebird"	19.2	8	400	175	3.08	3.845	17.05	0	0	3	2
+"Fiat X1-9"	27.3	4	79	66	4.08	1.935	18.9	1	1	4	1
+"Porsche 914-2"	26	4	120.3	91	4.43	2.14	16.7	0	1	5	2
+"Lotus Europa"	30.4	4	95.1	113	3.77	1.513	16.9	1	1	5	2
+"Ford Pantera L"	15.8	8	351	264	4.22	3.17	14.5	0	1	5	4
+"Ferrari Dino"	19.7	6	145	175	3.62	2.77	15.5	0	1	5	6
+"Maserati Bora"	15	8	301	335	3.54	3.57	14.6	0	1	5	8
+"Volvo 142E"	21.4	4	121	109	4.11	2.78	18.6	1	1	4	2
diff --git a/_reader/sections/00_section_arc.tex b/_reader/sections/00_section_arc.tex
new file mode 100644
index 0000000000000000000000000000000000000000..b78a4c9d9f474acb87281a8fbe26d77e93bebbf3
--- /dev/null
+++ b/_reader/sections/00_section_arc.tex
@@ -0,0 +1,164 @@
+\section{Course material}
+
+This workshop is organized based on an ARC (Annotated Research Context). 
+
+\subsection{Annotated Research Context}
+
+The ARC stores all required input data, backup data, scripts, presentations as well as this reader in one research data package.
+
+\subsubsection{The basic ARC structure summarized} 
+
+\begin{itemize}
+  \item \il{studies} \verb| -> |  external data and sample metadata
+  \item \il{assays}  \verb| -> |  measurement data (i.e. "raw" sequence data) and metadata
+  \item \il{workflows}  \verb| -> |  computational analyses (i.e. "scripts")
+  \item \il{runs}  \verb| -> |  outputs of workflow  (i.e. "results")
+\end{itemize}
+
+
+\subsubsection{Material added to the ARC for this course}
+
+\begin{itemize}
+  \item \il{_reader} \verb| -> |  The basis to this reader (written in \LaTeX)
+  \item \il{_slides} \verb| -> |  Slides presented during the workshop
+  \item \il{_handouts} \verb| -> |  Cheat sheets and additional materials
+  \item \il{runs/_backups} \verb| -> |   Backups for runs, in case a workflow unexpectedly did not work or takes too long.
+  \begin{itemize}
+    \item If a workflow would output \il{blat_results}, but failed, copy \il{runs/_backup/blat_results} to \il{runs/blat_results}.
+  \end{itemize}  
+\end{itemize}
+
+
+\subsubsection{Disclaimer}
+
+For this workshop we do not take full advantage of all ARC features. This is in part due to the fact that some developments are work in progress. 
+More so, we could easily spend two more days exploring those features, which are simply out of scope for a three-day workshop focusing on RNA-Seq data analysis. And besides, the ARC will in the future circumvent some annoyances that we run into for training purposes during this course. 
+
+If you want to learn more, check out the \href{https://nfdi4plants.org}{DataPLANT website}.
+
+\subsection{During the workshop}
+
+For the in-person workshop, we have already stored the ARC to the "Raumlaufwerke" 
+folder under our room number (25.41.00.41). Please copy the folder "rnaseq-workshop" 
+into your own home folder, so that everyone has their own copy. 
+We have written all code and commands so that they can be "executed from" the root of the ARC.
+
+As a check now, please open a terminal (applications \verb| -> | terminal) and run:
+
+\lstset{language=bash, style=bashstyle}
+\begin{lstlisting}
+cd $HOME/rnaseq-workshop
+\end{lstlisting}
+
+This is also a very good thing to try throughout the workshop if you're getting
+any sort of 'file not found' error, make sure you're in the right place.
+
+%% TODO: Software dependencies. 
+\subsection{Docker}
+We rely heavily on Docker for this workshop, which allows us to provide an 'image'
+with all the software you will need for the workshop installed and ready to use.
+There are two major reasons we chose Docker. 1) Convenience for development, as it
+works the same on our normal work machines, and on the ZIM teaching machines we 
+use for the workshop, and more importantly 2) Portability, you can all take the 
+Docker file home, work once through standardized instructions for installing 
+Docker: \url{https://docs.docker.com/} (or ask your admin to), and then you can
+readily reproduce the work from the course.
+
+As a brief disclaimer however, none of this is intended to demonstrate
+good practice with Docker, particularly not for other purposes.
+
+The Docker image ``rnaseq\_docker.tar" is available on Sciebo at:  %% TODO!
+\url{https://uni-duesseldorf.sciebo.de/s/53pA9W9TbOKTgGQ}.
+the password is written on the board, or you have received it by e-mail.
+
+The image can also be found in the ``Raumlaufwerke" folder under our room (25.41.00.41).
+
+\subsubsection{load image}
+To load the image, run:
+
+\begin{lstlisting}
+docker image load -i </path/to/>rnaseq_docker.tar
+\end{lstlisting}
+
+Where \il{</path/to/>} is replaced by e.g. ``Downloads"
+or the path to the ``Raumlaufwerke" folder as appropriate, e.g.
+
+\begin{lstlisting}
+docker image load -i Downloads/rnaseq_docker.tar.gz
+\end{lstlisting}
+
+\subsubsection{run image as container}
+To start a writable `container' from the image where you can work
+interactively, \emph{from your home directory} (\il{cd $HOME}) 
+run:
+
+\begin{lstlisting}
+docker run -it --name rnalive -p 8889:8889 --mount \
+  type=bind,source="$(pwd)"/rnaseq-workshop,target=/home/ZIM-gast/rnaseq-workshop \
+  rnaseq:latest
+\end{lstlisting}
+
+\subsubsection{restart container}
+\begin{lstlisting}
+docker start -i rnalive
+\end{lstlisting}
+
+\subsubsection{exiting}
+You can exit a container with Ctr+D or by typing \il{exit}.
+
+You won't need to do this much in the workshop, however.
+
+\subsubsection{Then what?}
+You should now have a terminal, that looks a lot like the one before,
+except now know that you're in the 'container', where all the necessary
+software is installed.
+
+\begin{lstlisting}
+# the workshop directory is shared
+cd $HOME/rnaseq-workshop
+# you should be able to see a list of all the files, that you can, 
+# e.g. by opening the folder in a file manager
+ls
+\end{lstlisting}
+
+\subsubsection{What to do when}
+At home:
+\begin{itemize}
+\item{load: once per machine or update to the image}
+\item{run: once after each load, on first use}
+\item{start: whenever you want to use it}
+\end{itemize}
+
+For the workshop however, the computers are wiped clean each night.
+So you will have to run :
+
+\begin{itemize}
+\item{load: each morning}
+\item{run: after running load each morning}
+\item{start: as needed, should you exit the container}
+\end{itemize}
+
+\textbf{Important:} keep any data you wish to save within the folder
+rnaseq-workshop, and make sure the contents of this folder are copied
+to either the ``Raumlaufwerke" folder or to e.g. your USB flash drive or
+your sciebo account before you logout of the computer!!!
+
+\fbox{\begin{minipage}{45em}
+If you have any trouble whatsoever with Docker, please ask!!
+
+Properly understanding docker is beyond the scope of this workshop,
+so don't worry if the above ``doesn't make sense",
+but you will absolutely need it to be running as expected for 
+the other parts to work, so just ask :-)
+\end{minipage}}
+
+
+\subsection{After the workshop}
+
+At least for the next half year, the ARC to this workshop is shared publicly under a CC-BY 4.0 license at \url{https://git.nfdi4plants.org/brilator/rnaseq-workshop.git}. 
+Feel free to download and unarchive the whole ARC as a zip / tar archive or \il{git clone} or \il{git fork} the ARC. Just like docker, explaining the full usage of `git` is beyond the scope of this course.
+
+To ensure, that all code works as used in this reader, make sure to store the ARC as \il{rnaseq-workshop} in your \$HOME directory.
+
+Similarly, the built docker image will remain available on sciebo. The built image is not included
+in the ARC as we did not have the time to check licenses for included software, regarding distribution.
diff --git a/_reader/sections/01_section_linux.tex b/_reader/sections/01_section_linux.tex
new file mode 100644
index 0000000000000000000000000000000000000000..be7e146824582c1ad2b575e683d0e8ac529abc69
--- /dev/null
+++ b/_reader/sections/01_section_linux.tex
@@ -0,0 +1,699 @@
+\section{Basics}
+\subsection{Linux \& Bash}
+\subsubsection{Command-line basics}
+We type our instruction for the computer into the terminal, open one by
+typing (Ctrl + Alt + T) or by navigating to one using the GUI. In all
+languages used in the workshop, the \texttt{\#} character indicates
+comments, and what follows will not be executed by the computer, but is
+there for the user.
+\lstset{language=bash, style=bashstyle}
+
+\begin{lstlisting}
+# list the contents of the current directory
+ls
+# move to the home directory
+cd
+# list the contents
+ls
+# change to the directory with the data to be used
+cd $HOME/rnaseq-workshop
+# change to one directory up
+cd ..
+# make a directory and change into it
+mkdir a_dummy_directory
+cd a_dummy_directory
+\end{lstlisting}
+
+Alright, we can change between folders, look at their contents, etc... but
+we hardly needed to learn to use a command line to accomplish this. Let's 
+very briefly look at some things that are nice on the command line, like
+variables, loops, and wild cards.
+
+\begin{lstlisting}
+## variables
+# set one variable
+species="Arabidopsis_thaliana_Col0"
+
+# use (here simply display) the variable by adding a '$' before the name
+echo $species
+
+# that might already save some typing, or make it easy to perform
+# a similar analysis on multiple different species. But variables are
+# even more useful when it comes to loops
+
+## loops
+# this will loop through the numbers 1 - 20
+for i in {01..20};
+
+# the keyword 'do' designates the start of each iteration
+do 
+   # 'touch' will create an empty text file with the given name
+   touch my_sample_${i}.txt
+
+   # the keyword 'done' is used to designate the end of each iteration
+done
+
+# look at the result
+ls
+
+## wild cards
+# we need one more file for the next example
+touch keep_me_for_now.md
+
+# note any differences in the results of the following commands
+
+ls
+ls *
+ls *.md
+ls my_sample_*
+
+# now we can selectively clean up all those files we made
+rm my_sample_*
+ls  # check result
+# be careful with wild cards though, and when possible restrict
+# their scope. Think about what would happen if you ran 
+# rm *
+# in the wrong directory
+
+## more on variables
+# if you were wondering why we used ${i} and not $i in the loop
+# the answer is it's best practice when the variable is used
+# within words. Compare the results
+
+echo $i
+echo $isample.txt  # '$isample' is not defined
+echo ${i}sample.txt
+
+# we'll also run into (and use) built-in variables
+# for instance 
+echo $HOME  # home directory
+echo $PWD  # working (current) directory
+
+# we can return to our main directory this way
+cd $HOME/rnaseq-workshop
+\end{lstlisting}
+
+Now we will write our first shell script. We will try everything out before we actually make the
+script. There are five steps to making a script: (i) define what you want to happen, in our case, we want the
+computer to print the current time and date, (ii) test out the code by writing it out piece by piece, (iii)
+actually write the script using a text editor, (iv) make the script executable and (v) run the script.
+
+First, test the code:
+
+\begin{lstlisting}
+# make the computer repeat what you wrote
+echo "My 1st script prints the date and time."
+echo $(date +%F_%T)
+\end{lstlisting}
+
+And now we will produce the script (or program if you will). Open the text editor, gedit, 
+on the host machine either
+by double clicking a text file from the GUI file manager or e.g. by opening applications 
+( \vdots \vdots \vdots~, lower left) and searching for 'gedit'. 
+
+Then write the following in the text editor: 
+
+
+\begin{lstlisting}
+#!/bin/bash
+echo "My 1st script prints the date and time."
+echo $(date +%F_%T)
+exit
+\end{lstlisting}
+Save the file as \il{myfirstprogram.sh} in the \il{rnaseq-workshop} directory.
+
+This program cannot yet be executed (ran) as the computer has not been told that it is an executable
+program. But we can set the permissions. We'll start just by checking the current
+permissions.
+
+\begin{lstlisting}
+# from the rnaseq-workshop directory where you saved myfirstprogram.sh
+ls
+# note the colors
+ls -l
+# note the information you see on the screen
+chmod u+x myfirstprogram.sh
+# this command tells the computer to add (+) the 
+# execute permission (x) to the current user (u)
+# if you omit the u, you will give permission to everyone
+ls
+ls -l
+# compare what changed to output from above
+\end{lstlisting}
+
+\fbox{\begin{minipage}{45em}
+As a side note, there's a lot more to permissions control, both in \emph{what} can be done
+and also in \emph{how} it can be done. It's more than we have time or need to get into here
+but there's a lot of nice resources out there to explain it if you're curious, e.g.
+\url{https://www.pluralsight.com/blog/it-ops/linux-file-permissions}.
+\end{minipage}}
+
+You are ready to run the script.
+
+\begin{lstlisting}
+# execute the script in the current directory (./)
+./myfirstprogram.sh
+# we can also store the output
+./myfirstprogram.sh > text.txt
+# look in the current directory in the GUI to find 
+# the file text.txt and open it by double clicking.
+
+# further, view the output in the terminal
+less text.txt
+# enter 'q' to quit
+
+# extra exercise: 
+# try appending output to the existing text.txt file 
+# by using '>>' instead of '>'
+\end{lstlisting}
+
+Writing shell scripts is
+easy, you just put in the program that you would write in the terminal and the computer will execute it
+line by line. The first line tells the computer what to use to read the program (in our case 
+\il{bash}) and the last line tells it to exit from executing the 
+program. Writing these small shell
+scripts becomes important when you want to run your read mapping overnight or over a weekend. We will
+revisit them when we do the read mapping.
+
+You can also use commands to make analyses in a file. We will use a transcriptome (you
+will assemble your own later) which is in .fasta format
+
+As necessary {\Large\textbf c}hange into the
+{\large\textbf d}irectory \\
+\il{$HOME/rnaseq-workshop/studies/AthalianaReferences/resources/}.
+
+\begin{lstlisting}
+# look at the file using the command line
+less Athaliana_primaryTranscripts.fa
+# you can move through the document by pressing Enter
+# you can leave the file by pressing q
+# if you only want to look at the first few lines
+head Athaliana_primaryTranscripts.fa
+# if you want to look at the last few lines
+tail Athaliana_primaryTranscripts.fa
+# view the whole file
+cat Athaliana_primaryTranscripts.fa
+# now you know why the 'head' and 'tail' commands 
+# are so important. To interrupt this, you'll 
+# want the shortcut Ctrl + C (or Strg + C)
+\end{lstlisting}
+
+With these large files, we often get our first look and first stats in bash.
+
+\begin{lstlisting}
+# count the number of sequences, more specifically:
+# count the headers, which start with '>'
+# the grep command, is a type of search, we'll look for '>'
+# the '|' tells the computer to pass the output of one command
+# to another, and wc is the command for "word count"
+grep ">" Athaliana_primaryTranscripts.fa | wc --lines
+
+# the number you see, is the number of transcripts
+# you can also estimate* how many bases there are in total
+# by counting all characters in the file with the 
+# exception of the fasta headers.
+# you do that by inverting grep (-v), producing all but 
+# lines starting with ">", and then counting characters
+grep ">" Athaliana_primaryTranscripts.fa -v | wc --chars
+# *estimate, because the end-of-line character is counted.
+\end{lstlisting}
+
+Linux has many more such small useful commands. It is frequently helpful to google or to look at
+Linux tutorials and see what you can scavenge for your purpose.
+
+\subsubsection{Installing software}
+\fbox{\begin{minipage}{45em}
+If you're running behind, this is a good-to-know sub-section that we won't reference or
+need later in the course, so feel free to skip. You can always come back to it later
+on your own time.
+\end{minipage}}
+
+We've pre-installed everything for this course, and we'll be sending you home with a 
+copy of the Linux image. That said, we would be remiss if we didn't briefly cover how to install additional 
+software.
+
+In general, a lot of software can be installed through the package manager, which is best
+when the program is available there, up-to-date, and we have administrative privileges.
+
+Let's try and install a program, "TopHat", which is a now-retired deprecated short-read aligner
+that we don't need, but which has a fairly typical installation process for a bioinformatics tool.
+
+\begin{lstlisting}
+# search for a program
+apt search tophat
+# you can see (in green) exact program names matching the search
+# now we know exactly what to install 
+sudo apt install tophat
+# Hmmmm.... now we run into the problem that we don't have 
+# administrative rights on these computers
+# That's OK though, you'll normally either 
+# a) have administrative rights yourself, or
+# b) have a systems administrator around that you can ask for help
+
+\end{lstlisting}
+
+Note that the package-manager is distribution specific
+so the above 'apt' commands work for Ubuntu. While other systems
+have their own package-managers (e.g. Fedora uses 'dnf').
+There are much more extensive resources on these elsewhere (e.g.
+\url{https://www.digitalocean.com/community/tutorials/package-management-basics-apt-yum-dnf-pkg}).
+
+But now let's look at one fall-back alternative when installation with the package
+manager isn't working, basically how almost any compiled program can be locally 
+"installed" for one user. This occurs a lot for bioinformatic programs, most often when packages
+aren't available via the package manager or they are only available in old versions (e.g. 
+"FastQC", "samtools", "Trimmomatic"). That said, it can also work to install software on e.g.
+a shared cluster, for instance when it's just a specific program that only you will use 
+and you don't want to bother the administrators to install it.
+
+Google "cufflinks rnaseq github" and your first hit is presumably 
+\url{https://github.com/cole-trapnell-lab/cufflinks}.
+
+Scroll down to ``Installing a pre-compiled binary release", and click ``here"
+
+If you then right click the link for 2.2.1, Linux
+and then click ``Copy Link" you get the path shown below. Or you could
+just left click it and download it normally of course, it 
+would just slightly change the instructions from the demo.
+
+You can further unpack the downloaded file by double clicking in the 
+file manager GUI, or as shown below.
+
+\begin{lstlisting}
+# change to a directory we can store the files
+cd $HOME/rnaseq-workshop/workflows
+
+# download and unpack the files
+wget http://cole-trapnell-lab.github.io/cufflinks/assets/\
+downloads/cufflinks-2.2.1.Linux_x86_64.tar.gz
+
+tar -xvf cufflinks-2.2.1.Linux_x86_64.tar.gz
+# look
+ls
+\end{lstlisting}
+At this point we have downloaded and unpacked the file, but we can't just run 
+it yet. The next thing one should almost always do is look for some sort of "Read me" file, 
+(e.g. README.txt, Readme.txt, Readme.md) or installation instructions file (e.g. "INSTALL.txt").
+
+Unfortunately Tophat's README doesn't do much but point you towards
+the website. But if you search there for 'install' or 'quick start' you should
+find some information. Even then, it's not exceptionally beginner friendly,
+so see below, if and when you want further help.
+
+
+\begin{lstlisting}
+
+# change into the new cufflinks directory
+cd cufflinks-2.2.1.Linux_x86_64 
+# look at what's there
+ls -l 
+# you should see a lot of executable files
+# check that tophat works
+./cufflinks -h
+\end{lstlisting}
+
+Hopefully you got the help function and it's working. That's great, but this only works
+from this directory or if you supply the full path. e.g. 
+
+\il{$HOME/rnaseq-workshop/workflows/cufflinks-2.2.1.Linux_x86_64/cufflinks}.
+
+That'd be a pain to remember. So how do we tell Linux to look for this executable
+no matter which directory we are currently in? We can do this through the \il{$PATH} variable.
+
+This \il{$PATH} variable, stores all the directories where Linux will look
+for executables, and we can append directories to it.
+\begin{lstlisting}
+
+# the syntax of the 'mv' AKA "move" command is 'mv source destination'
+# now add the file to our path
+export PATH=$PATH:$HOME/rnaseq-workshop/workflows/cufflinks-2.2.1.Linux_x86_64/
+# to break this down
+# PATH= is just setting the variable
+# $PATH:$HOME/rnaseq-workshop/workflows/cufflinks-2.2.1.Linux_x86_64/ is just the 
+# old path with our custom bit appended (old):(custom)
+# and 'export' makes it available to subprocesses of this shell
+
+# test that it works
+cufflinks -h
+# test that something else still works
+ls
+# at this point, if you want this line to always be available, add
+# the line "export PATH=$PATH:$HOME/Documents/tophat-2.1.1.Linux_x86_64"
+# to the file $HOME/.bashrc and it will run every time you open a terminal
+# e.g. using a text editor like 'gedit' as above
+
+\end{lstlisting}
+
+\subsubsection{Getting help}
+
+Most programs and often even simple scripts will come with usage instructions.
+
+\begin{lstlisting}
+# for instance, here are several ways to get documentation for 'grep' 
+man grep
+grep -h
+grep --help
+# while the above commands are standard, you'll occasionally see things
+# like -help or -?, and simply typing the name of the program without
+# any parameters can also be a good bet
+
+### challenge assignment ###
+# grep can actually do a lot more than we showed here. From it's usage
+# function, can you figure out how to accomplish the results of the 
+# command above: `grep ">" Athaliana_primaryTranscripts.fa | wc --lines`
+# using just `grep`, and not `wc`?
+\end{lstlisting}
+
+If this doesn't help (enough) or perhaps you don't know quite which tool
+to use yet: Google it, duck-duck-go-it, your-search-engine-of-choice it.
+
+Searching for any error messages you encounter can also be extremely 
+helpful during trouble shooting.
+
+% todo, this needs a cheat sheet
+
+\subsection{R}
+We will run a brief introduction to R, including installing packages, getting your data into R, exporting data from R and how
+to make a figure using the package "ggplot2". We will use RStudio to write and execute R code since it makes our life
+easy. As a rule, Interactive Development Environments (IDEs), like RStudio, make the programmer's life easier,
+particularly at the start. PyCharm is a nice equivalent for python.
+
+You can open Rstudio, either by entering \il{rstudio} 
+on the command line, or from the applications
+menu under `Development'.
+
+\subsubsection{Rstudio via second Docker file}
+Normally, you would open R by entering \il{rstudio} 
+on the command line, or from the applications
+menu under `Development'. However, for the duration
+of the course we'll use a second Docker container for
+this.
+
+\begin{itemize}
+\item acquire the \il{rstudio_docker.tar} file from sciebo or the ``Raumlaufwereke" folder (as before) 
+\item on the \emph{host} machine, open a new terminal
+\item run the following
+\end{itemize}
+\begin{lstlisting}
+docker image load -i </path/to/>rstudio_docker.tar
+docker run --rm -p 8787:8787 -e PASSWORD=rstudio --mount \
+  type=bind,source=${PWD}/rnaseq-workshop,target=/home/ZIM-gast/rnaseq-workshop \
+  rstudiotest
+\end{lstlisting}
+\begin{itemize}
+\item on the \emph{host} machine go to the browser and enter 'localhost:8787'
+\item enter 'rstudio' as the username and 'rstudio' as the password
+\item have fun learing R \& Rstudio!
+\end{itemize}
+
+We will use the basic R functions that come with R when you install it. We will also use R
+packages, little or large things other people wrote for us and those need to be installed. To install one we
+type in the Rstudio script window and execute by typing (Ctrl + Enter). We type only the lines without the
+preceding "\#" and we execute each line after typing it by pressing (Ctrl + Enter).
+
+
+\lstset{language=R, style=Rstyle}
+
+Within Rstudio, the first thing we'll want to do is navigate to the
+usual directory.
+
+\begin{lstlisting}
+setwd('/home/ZIM-gast/rnaseq-workshop/')
+\end{lstlisting}
+
+\subsubsection{basics in R and Rstudio}
+
+\begin{lstlisting}
+install.packages("ggplot2")
+# this installs the packages, you can see some logging information in the
+# console about what R is doing. If you get an error about permissions, 
+# don't worry, it's already installed
+library(ggplot2)
+# you only need to install the package once, afterwards you only need the
+# library() command. 
+\end{lstlisting}
+
+Let's assume you have an Excel file and want to do something to it using R. First, always, always
+use tab-delimited text files to move from program to program, no matter which programs you use. Look at
+the small tab-delimited txt file in the \$HOME/rnaseq-workshop/\_WorkshopReader/intro\_scripts\_data called "example\_table.txt" 
+using a text editor (e.g. `gedit') or LibreOffice Calc (which is just like Excel 
+except free, open source, and Linux compatible). Next we'll import this into R
+\begin{lstlisting}
+# set your working directory to the directory with the example_table.txt
+# file using the [Session > set working directory > choose directory] GUI, 
+# and look what happens in the console
+df0 <- read.delim("example_table.txt")
+head(df0)
+# the first command reads in a tab-delimited text file, our go-to format,
+# it stores the table under the name 'df0'
+# the second command, much like 'head' in bash shows the first rows
+\end{lstlisting}
+
+The standard format for biologists applications in R is the \il{data.frame} 
+which is essentially what we
+used to call a table. It is organized in columns and rows with column names and row names (not present in
+our table, that is why \il{head(df0)} gives us numbers. 
+Now we will briefly look at how to get particular
+columns out, and how to get particular rows out. We will always look at the data.frame, do something and
+look again. We will also look a bit at classes (or types) of data because that is a frequent cause for errors
+when writing scripts.
+
+\begin{lstlisting}
+# let's remind ourselves what the data.frame looks like
+head(df0)
+# now we will extract the first column
+df1 <- df0[, 1]
+head(df1)
+# now let's look at classes
+class(df0)
+class(df1)
+# when we extracted the first column only, we no longer have a data.frame
+# we now have a factor 
+# let's extract the first two columns
+df2 <- df0[, c(1, 2)]
+# this syntax means: take columns 1 and 2 from the data.frame 'df0'
+# subsetting data frames works with [rows, columns]
+# and c(1, 2) indicates which columns
+head(df2)
+df3 <- df0[, c("mpg", "cyl", "disp", "hp")]
+head(df3)
+# this time we have used the column names instead of the column number
+# now let's look at rows
+df4 <- df0[1:5, ]
+# notice how the comma is now on the other side
+# this extracts the first five rows, the column names are not a row, they
+# are column names
+df4
+# notice how I no longer ask for head(), but show everything. After all,
+# I now only have five rows to display
+# now some advanced magic, extract all rows in which there
+# are cars with at least six cylinders
+df5 <- df0[(df0$cyl >= 6), ]
+# this evaluates the stuff in the bracket which says check if column 'cyl'
+# of the data.frame df0 is equal or larger than 6
+# then it returns all rows (note the trailing comma) where this is TRUE
+# how many?
+dim(df5)
+dim(df0)
+df5
+\end{lstlisting}
+
+Let's assume you want to rename the columns but do not yet know how. If that happened in the lab,
+you'd ask a colleague or ask google. Same here: google "rename columns r". The best result is not always
+the first. To speed things up, we'll tell you the one we like, which is ``How to rename a single column in a
+data.frame in R? - Stack Overflow" and there the 2nd solution which is:
+\lstset{language=R, style=Rstyle, frame=trbl, rulecolor=\color{red}}
+
+\begin{lstlisting}
+# df = dataframe
+# old.var.name = The name you don't like anymore
+# new.var.name = The name you want to get
+names(df)[names(df) == "old.var.name"] <- "new.var.name"
+\end{lstlisting}
+
+Now we can try renaming a column!
+
+
+\lstset{language=R, style=Rstyle}
+
+\begin{lstlisting}
+# look at current column names
+colnames(df0)
+# we tell R to look at the names in the data.frame df and find the one for
+# which the name = mpg
+# and then we re-assign that name to gas_mileage
+# avoid spaces or number at the beginning of a name which are both bad in R
+names(df0)[names(df0) == "mpg"] <- "gas_mileage"
+# and look at after
+colnames(df0)
+\end{lstlisting}
+
+In R, we can make plots and make prettier plots. There are three major options
+for plotting in R, ggplot2, lattice and the basic plot function. 
+We will use the package ggplot2 which was installed and loaded earlier today. 
+Our example plot will come from the data.frame we have worked
+with all the time. We want to visualize whether horsepower of a car 
+and gas mileage have something to do with each other. Let's look at the 
+data.frame again:
+
+\lstinputlisting{intro_scripts_data/R/ggplot.R}
+You can spend days and weeks over figures in ggplot2. Refer to the cheat sheet 
+provided by Rstudio for the most important functions. 
+
+\url{https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf}
+
+We will cover common RNAseq-related examples as needed.
+
+\subsection{Perl/Java/others}
+\lstset{language=bash, style=bashstyle}
+While they're common starting points and the most critical for this workshop, 
+bash and R are just a tiny sampling of available languages. At the least,
+you'll want to know how to execute scripts in other languages
+This is a quick primer on how to do that. Very generally, you start 
+by checking whether the language is installed and you
+can do that either by typing the name and seeing what happens 
+or by typing the name followed by 
+\il{-v} or \il{--version}
+which usually checks the version or by typing the name followed 
+by \il{-h} which will call the help pages.
+
+
+\begin{lstlisting}
+
+perl
+# nothing seems to happen, use the interrupt (Ctrl + C)
+perl -v
+# you should see the version information
+perl -h
+# shows included help function, -h, -help, and --help are your 
+# first "goto"s, the second is google
+python3
+# ups, you can now directly write code and execute it, we need 
+# to get out of here
+quit
+# this is not working but python tells you what will work
+quit()
+# and now you made it out.
+# apparently both are installed
+\end{lstlisting}
+Google for "count\_fasta.pl". This will give you a script. 
+Look at the first line, apparently it is in perl (also standard for a .pl ending). 
+Copy the script into the text editor and save it in the same folder as 
+\il{Athaliana_primaryTranscripts.fa}
+naming it \il{count_fasta.pl}. Now it needs permission to be executed
+
+\begin{lstlisting}
+ls
+# the filename is white
+chmod u+x count_fasta.pl
+ls
+# the filename is green
+# find out how it works
+./count_fasta.pl -h
+# apparently, you can specify something using -i but you do 
+# not have to do it and you need a fasta file. 
+# Use the transcriptome file from earlier
+./count_fasta.pl Athaliana_primaryTranscripts.fa
+# this will compute for a moment and then give you statistics
+# about the fasta file. If you wanted that as a text file, 
+# you can redirect the output into file using >
+./count_fasta.pl Athaliana_primaryTranscripts.fa > ATstats.txt
+# you can look at the resulting file using the GUI or by typing
+less ATstats.txt
+\end{lstlisting}
+
+You can find scripts by googling what you want to do or what previous
+authors say they have used (or made) in papers. Hosting services such 
+as \url{https://github.com} provide a way for all of us to backup, track and share our 
+code. Many simple scripts stand alone and can be executed similar to
+above, while bundles of scripts or 'packages' may require installation. 
+Just check whatever documentation comes with the code you want to use
+(it should contain instructions for any required installation) 
+and give it a try. We will use Java later when we use trimmomatic and 
+a lot of the analyses are performed with python scripts and packages.
+
+\subsection{Python}
+\subsubsection{Python virtual environments}
+As more and more software is provided in python, it's more and more
+important to be at least partially familiar with installing things
+in python and with python virtual environments (yes including conda). Often, and
+particularly when installing less-stable bioinformatics software, there
+are good reasons to install it within a virtual environment. Basically
+it temporarily adds things to your PATH, while protecting you from any
+damage to your system that could be caused by, say, two executables 
+having the same name.
+
+We won't use a virtual environment for the primary software in 
+this course, because we're using Docker. 
+
+However, a quick demo is worth the effort.
+
+Virtual environments are also a great way to install things \emph{with}
+a package manager \emph{without} needing administrative rights.
+Pip is the general python package manager.
+
+First, we'll setup a virtual environment
+\begin{lstlisting}
+# setup environment, 'venv' can be any name or valid path of your choice
+virtualenv venv
+# activate or 'turn on'
+source venv/bin/activate
+\end{lstlisting}
+
+
+Now we can use the virtual environment, and install
+things with a package manager. 
+
+\begin{lstlisting}
+# first, we'll try and run what we're about to install
+jupyter
+# no surprise there
+pip install jupyter
+# see all those individual progress bars whizzing past,
+# those are all the dependencies that pip is taking care
+# of for us
+
+# start jupyter so that it's accessible from host
+jupyter notebook --port 8889 --ip 0.0.0.0
+# you can use 'Ctr + c', and then 'y' to leave
+# but leave it on until you've finished python basics
+\end{lstlisting}
+
+You can find much information on virtual environments
+at \url{https://docs.python-guide.org/dev/virtualenvs/}, or if you
+are doing a lot of Python work under windows or run into installation
+instructions that require \il{conda} you might be interested in Anaconda-based
+virtual environments \url{https://conda.io/docs/user-guide/index.html}.
+
+Notes from a installation mini-workshop can also be found here
+\url{https://github.com/weberlab-hhu/reproducbility-collection/blob/main/installation_intro/installation_intro.md}
+
+\subsubsection{python basics}
+While it probably won't be critical for the rest of the workshop,
+we would recommend learning some python basics to everyone. It's a very
+approachable and widely used language. 
+
+If you have time, we recommend the following.
+
+Above, when you ran `jupyter notebook` it should have opened
+a server and listed a link to access it. Maybe something like
+\il{http://127.0.0.1:8889/?token=8f463d2...}, the one with 127.0.0.1
+will be easiest.
+
+Copy this link and paste it into a browser on your host machine.
+
+Click on the 'new' button, and then make a new 'python3' notebook.
+Work through the examples here:
+
+\url{https://jckantor.github.io/CBE30338/01.01-Getting-Started-with-Python-and-Jupyter-Notebooks.html}
+
+Jupyter is also great for mixing code with documentation,
+visualization, or interpretation of results. A demo of what
+jupyter can do can be found here: 
+
+\url{https://github.com/weberlab-hhu/reproducbility-collection/blob/main/jupyter_intro/jupyter_demo.ipynb} 
+
+just as an example resources, there's much more.
+
+When you're done with the above, hit Ctr+c (Strg + c), followed by 'y'
+in the terminal to close the jupyter server.
diff --git a/_reader/sections/02_section_illumina.tex b/_reader/sections/02_section_illumina.tex
new file mode 100644
index 0000000000000000000000000000000000000000..1a647ebe567b4194678dad802ea017720237f8af
--- /dev/null
+++ b/_reader/sections/02_section_illumina.tex
@@ -0,0 +1,505 @@
+\section{Example short read RNAseq analysis}
+\subsection{Description of the datasets you have been given to work on}
+
+% \lstinputlisting{../studies/Bernsdorff2016_Arabidopsis_SAR/README.md}
+
+The data you see is part of an experiment to test signaling during systemic 
+acquired resistance (SAR) in \emph{A. thaliana.} During this experiment 
+you challenge a leaf with either mock or pathogen solution.
+Two days later, you harvest a different leaf for RNA-seq and see if the 
+"we are under attack" signal has arrived and how it transforms the 
+transcriptome. Your data is only from wild type \emph{Arabidopsis} treated with
+mock and pathogen solution. The original experiment also contains the analysis 
+of mutants with defects in different aspects of signaling, that through salicylic 
+acid and that through pipecolic acid. The complete
+experiment can be found here: \url{http://www.plantcell.org/content/28/1/102.short}
+
+\lstset{language=bash, style=bashstyle}
+\subsection{First look at reads}
+Many, but not all, bioinformatics tools support compressed data as input. 
+If you do need to extract it, any GUI archive tool or quickly googling the 
+ending of the file name (e.g. *.gz) with "extract Linux" should find an easy solution.
+For the workshop files, run
+\begin{lstlisting}
+gunzip assays/Bernsdorff2016_Illumina/dataset/Col0treatment1.fastq.gz
+\end{lstlisting}
+
+Take a quick look at the fasta file
+\begin{lstlisting}
+head assays/Bernsdorff2016_Illumina/dataset/Col0treatment1.fastq
+\end{lstlisting}
+
+Every four lines represents a read.
+\begin{verbatim}
+1: @ID
+2: Sequence[ATCGN]
+3: +ID
+4: Phred quality scores
+\end{verbatim}
+
+Note that if this was paired end data, each sample would have two files with 
+both having matching sorting and read IDs with all the forward reads in one 
+file and reverse reads in the other. The quality scores generally encode the 
+numbers from 0-40 that are -10 log 10 p, where p=probability that the base call was incorrect. So 30 is a 1 in 1000, and 20 a 1 in 100 chance of a miss-called
+base. Currently, the most common encoding is Phred+33, and looks like this:
+\begin{verbatim}
+!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI
+|  |                     |    |         |
+0..3.....................26...31........41
+\end{verbatim}
+
+No one wants to look at fastq files by hand, instead, 
+the program FastQC will create a full quality report for you.
+\begin{lstlisting}
+mkdir runs/fastqc_results
+fastqc assays/Bernsdorff2016_Illumina/dataset/Col0treatment1.fastq \ 
+  --outdir runs/fastqc_results
+\end{lstlisting}
+
+You can open the result in a browser. Either from a file manager, 
+or from the terminal. \emph{This has to be done from the host machine}.
+\begin{lstlisting}
+firefox runs/fastqc_results/Col0treatment1_fastqc.html
+\end{lstlisting}
+
+FastQC uses a simple code of green for "good" or "normal" data and 
+yellow then red for more questionable data.
+Go through the report items and compare them to good Illumina data:
+
+\url{https://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html}
+
+and to bad Illumina data:
+
+\url{https://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc.html}
+
+How do they measure up?
+
+\subsection{Trimming}
+You've seen from the FastQC reports that the data both a) has lower quality towards the end of
+reads, and b) contains some adapter sequences. Both of these can lead to problems in down stream
+analyses, and the solution of choice is generally trimming the reads. Trimmomatic is a java program that
+can be used to perform both adapter and trimming steps. The command looks complicated, but for that it
+is a very powerful and flexible tool.
+
+Breaking it down into pieces:
+
+\begin{tabular}{r l}
+    \toprule
+    \parbox[t][][t]{85mm}{SE} &
+    \parbox[t][][t]{65mm}{indicates single end reads} \\ \midrule
+
+    \parbox[t][][t]{85mm}{Col0treatment1.fq} &
+    \parbox[t][][t]{65mm}{your file to trim} \\ \midrule
+
+    \parbox[t][][t]{85mm}{Col0treatment1.trimmed.fq} &
+    \parbox[t][][t]{65mm}{name of output file} \\ \midrule
+
+    \parbox[t][][t]{85mm}{ILLUMINACLIP:\textless adapters \textgreater:\textless seed mismatches\textgreater : \\ 
+ \textless palindrome clip threshold\textgreater :\textless simple clip threshold\textgreater } &
+    \parbox[t][][t]{65mm}{this trims the adapters} \\ \midrule
+
+    \parbox[t][][t]{85mm}{MAXINFO:\textless targetLength\textgreater : \\
+ \textless strictness (0-1 for longer-stricter)\textgreater } &
+    \parbox[t][][t]{65mm}{this trims low quality bases from the 3' end} \\ \midrule
+
+    \parbox[t][][t]{85mm}{MINLEN:\textless min\textgreater } &
+    \parbox[t][][t]{65mm}{drop reads below this length} \\
+
+    \bottomrule
+\end{tabular}
+
+\begin{lstlisting}
+# run trimmomatic
+# note that the '\' in the following command is there to 'escape'
+# the end-of-line character. If you write this on one line,
+# skip the '\' character
+
+mkdir runs/trimmed_fastq/
+
+# $HOME/sw is just where trimmomatic is located in our Docker container
+# and would need to match the machine in question
+java -jar $HOME/sw/Trimmomatic-0.39/trimmomatic-0.39.jar SE \
+  assays/Bernsdorff2016_Illumina/dataset/Col0treatment1.fastq \
+  runs/trimmed_fastq/Col0treatment1.trimmed.fastq \
+  ILLUMINACLIP:$HOME/sw/Trimmomatic-0.39/adapters/TruSeq3-SE.fa:2:30:10 \
+  MAXINFO:50:0.8 MINLEN:36
+
+\end{lstlisting}
+
+It is normally best to check the quality of the data after trimming. This way you are confident the right
+adapters were removed.
+
+\begin{lstlisting}
+fastqc runs/trimmed_fastq/Col0treatment1.trimmed.fastq --outdir runs/fastqc_results
+
+# and to view it 
+firefox runs/fastqc_results/Col0treatment1.trimmed_fastqc.html
+
+\end{lstlisting}
+
+\subsection{From Reads to Quantified Transcripts}
+A file with 5 million reads, high quality or not, is still very far away from human legible data, let
+alone a biological conclusion. The next big step is to get from reads to transcripts and their abundance;
+however, what this looks like will depend heavily on what species you are using and how good of a
+genome and/or transcriptome annotation is already available.
+
+Some major options are shown in the following figure. Basically, the less data is already available
+as a reference sequence, and the lower quality the available annotation there of, the harder this will
+become. If you have a nice genome and a well annotated transcriptome with only modest duplication, like
+\emph{A. thaliana} used here. 
+You can just align the reads to the transcriptome or even just count the k-mers in the
+reads. However, if you are working with a non-sequenced species, you will either have to align the reads
+to a closely related species, or make your own \emph{de novo} assembly.
+
+All of the options come with their own caveats, and what is appropriate depends on your reference and
+your data.
+
+\includegraphics[width=\textwidth]{align_n_quant.pdf}
+
+For instance:
+\begin{itemize}
+\item If your DNAse treatment was incomplete, you don't want the DNA reads mapping to the most-
+similar transcript, you want to align the reads directly to the genome (including organellar
+genomes).
+\item If you have a species where the genome is more complete than the corresponding transcriptome
+annotation, your data will be easier to interpret if you align it directly to the genome.
+\item If your species of interest is not sequenced, and not phylogenetically close to a sequenced species,
+you will have little option but performing a \emph{de novo} assembly, but should be aware of biases such
+as splitting highly abundant transcripts, chimeric transcripts, partial or absent low-abundance
+transcripts, and detection of transcripts from other species (e.g. biotrophic fungi). Getting long reads
+will help with some, but not all of the above.
+\item If everything has gone well and you have a good genome available, a k-mer counting strategy will
+run very quickly and very easily and give you data you can work with.
+\end{itemize}
+
+We will focus on option 1. from the figure using Kallisto for k-mer counting, as we are working with
+clean, easy, A. thaliana data. However, we'll also run a brief example for branches 2. - 4., in case this more
+accurately reflects your data.
+
+\subsubsection{Counting k-mers with kallisto}
+Classically reads have been aligned to the reference sequence. While it can be sped up by efficient
+indexing, this is fundamentally a computationally intensive (and therefore slow) process, because the
+reference sequences are very large and there are many reads to search against them. Recently, options such
+as Kallisto have been developed, that do not actually align the reads, but count the k-mers they are made
+up of and assign these counts to transcripts with these k-mers. A k-mer is a stretch of bases from a read
+long enough to generally be unique, short enough to rarely contain errors and to be computationally
+feasible.
+
+Example sequence and 6-mers that can be derived from it:
+\begin{verbatim}
+TCTTCCGGAGGTGGAGGAAAACCGACGATT
+TCTTCC
+ CTTCCG
+  TTCCGG
+   TCCGGA
+    CCGGAG
+     .
+      .
+       .
+\end{verbatim}
+
+While much faster than read alignment, this will sometimes be less accurate for assigning reads to the
+correct transcripts as it does not fully utilize all the read information.
+
+When available, plant reference sequences can be obtained from major databases like Phytozome
+(phytozome.jgi.doe.gov) or Ensembl (plants.ensembl.org), which provides some naming consistency.
+Many more species are available through species specific resources.
+
+Here we have pre-downloaded the \emph{A. thaliana} information from Phytozome.
+
+\begin{lstlisting}
+# first kallisto needs to create an index (de Bruijn graph)
+# of the reference sequence
+mkdir runs/kallisto_index/
+
+kallisto index -i runs/kallisto_index/Ath_kallisto_index \
+studies/AthalianaReferences/resources/Athaliana_transcripts.fa
+
+\end{lstlisting}
+
+The indexing parameters are:
+
+\begin{tabular}{r l}
+    \toprule
+    \parbox[t][][t]{55mm}{index} &
+    \parbox[t][][t]{65mm}{tells kallisto to make an index} \\ \midrule
+
+    \parbox[t][][t]{55mm}{-i} &
+    \parbox[t][][t]{65mm}{assign the index name} \\ \midrule
+
+    \parbox[t][][t]{55mm}{Athaliana\_transcripts.fa} &
+    \parbox[t][][t]{65mm}{the downloaded reference transcriptome} \\
+
+    \bottomrule
+\end{tabular}
+
+\begin{lstlisting}
+# we'll want to keep our output in a separate folder here
+mkdir runs/kallisto_results
+
+# now kallisto can quantify how many reads (appear to) come
+# from each transcript
+ 
+kallisto quant -b 30 -i runs/kallisto_index/Ath_kallisto_index \
+-o runs/kallisto_results/treatment1 --single \
+-l 190 -s 20 runs/trimmed_fastq/Col0treatment1.trimmed.fastq
+\end{lstlisting}
+
+The general quantification parameters are:
+
+\begin{tabular}{r l}
+    \toprule
+    \parbox[t][][t]{55mm}{quant} &
+    \parbox[t][][t]{65mm}{tells kallisto to run quantification} \\ \midrule
+    
+    \parbox[t][][t]{55mm}{-b} &
+    \parbox[t][][t]{65mm}{number of bootstraps (optional, for use with sleuth)} \\ \midrule
+
+    \parbox[t][][t]{55mm}{-i} &
+    \parbox[t][][t]{65mm}{index name assigned above} \\ \midrule
+
+    \parbox[t][][t]{55mm}{-o} &
+    \parbox[t][][t]{65mm}{output directory name} \\
+
+    \bottomrule
+\end{tabular}
+
+And a few extra parameters are required for single end reads:
+
+\begin{tabular}{r l}
+    \toprule
+    \parbox[t][][t]{55mm}{\il{--single}} &
+    \parbox[t][][t]{65mm}{} \\ \midrule
+
+    \parbox[t][][t]{55mm}{-l} &
+    \parbox[t][][t]{65mm}{mean fragment length} \\ \midrule
+
+    \parbox[t][][t]{55mm}{-s} &
+    \parbox[t][][t]{65mm}{standard deviation of fragment length} \\
+
+    \bottomrule
+\end{tabular}
+
+\begin{lstlisting}
+# kallisto created a directory with three files
+ls runs/kallisto_results/treatment1
+
+# we just need one of them, take a look
+less runs/kallisto_results/treatment1/abundance.tsv
+\end{lstlisting}
+
+You should see tab separated values, most importantly showing us the transcript ID (target\_id), the
+estimated read counts (est\_counts), and the normalized abundance estimate Transcripts Per Million (tpm).
+
+\begin{verbatim}
+target_id    length eff_length est_counts tpm
+ATCG00020.1  1062   873        995        293.647
+ATCG00040.1  1581   1392       9          1.66579
+ATCG00050.1  240    51.331     0          0
+ATCG00065.1  114    5.2474     0          0
+\end{verbatim}
+
+We will later use the TPM for graphs and the counts for statistics. However, you may have noticed that
+we've so far ignored five of our six read files. You can save all the commands individually to a file for
+each of the other samples and run it.
+
+\begin{lstlisting}
+#!/bin/bash
+# trimmomatic
+
+java -jar $HOME/sw/Trimmomatic-0.39/trimmomatic-0.39.jar SE \
+assays/Bernsdorff2016_Illumina/dataset/Col0treatment2.fastq.gz \
+runs/trimmed_fastq/Col0treatment2.trimmed.fastq \
+ILLUMINACLIP:$HOME/sw/Trimmomatic-0.39/adapters/TruSeq3-SE.fa:2:30:10 \
+MAXINFO:50:0.8 MINLEN:36
+
+java -jar $HOME/sw/Trimmomatic-0.39/trimmomatic-0.39.jar SE \
+assays/Bernsdorff2016_Illumina/dataset/Col0treatment2.fastq.gz \
+runs/trimmed_fastq/Col0treatment3.trimmed.fastq \
+ILLUMINACLIP:$HOME/sw/Trimmomatic-0.39/adapters/TruSeq3-SE.fa:2:30:10 \
+MAXINFO:50:0.8 MINLEN:36
+
+# ... and so on 
+# kallisto
+kallisto quant -i runs/kallisto_index/Ath_kallisto_index \
+ -o runs/kallisto_results/treatment2 --single \
+ -b 30 -l 190 -s 20 runs/trimmed_fastq/Col0treatment2.trimmed.fastq
+
+kallisto quant -i runs/kallisto_index/Ath_kallisto_index \ 
+ -o runs/kallisto_results/treatment3 --single \
+ -b 30 -l 190 -s 20 runs/trimmed_fastq/Col0treatment3.trimmed.fastq
+# ... and so on.
+\end{lstlisting}
+
+However, if you recall loops from the bash introduction, we can use loops, which will almost certainly be easier 
+in the long run. Particularly if you
+have to change a parameter later. 
+Here is a very simple loop to catch all five of the other samples up with treatment1
+
+\lstset{language=bash, style=bashstyle}
+\lstinputlisting{../workflows/kallisto_loop.sh}
+
+Ideally, you will save the loop to a file, this is extremely helpful when it comes to repeating your work,
+taking on the next study, or writing the methods section.
+
+Now we have six directories with six abundance files. We'll put these together into organized tables later,
+but first lets take a look at some of the other options for when k-mer counts is not applicable.
+
+\subsubsection{Aligning reads to the genome}
+This is a very robust method, even in combination with a good reference it can clean up the results
+compared to using a primary transcriptome. For instance, it allows inclusion of splice variants without
+causing problems with ambiguous alignments for reads that map to shared exons. Also, in reality most
+RNAseq will have a lot of reads that properly map to intergenic regions, be they from DNA
+contamination, unannotated genes, rRNA, or less-than perfect transcriptional regulation. That and reads
+from introns that have not yet been spliced. Including the genomic background in the reference allows
+reads to find their best mapping and doesn't bias these towards the next-closest gene. This all comes at a
+cost of run time, but in the worst case it comes down to the computers time vs your time trying to interpret
+the data later.
+
+Recent developments in indexing have also greatly sped up the alignment stages. In particular Hisat2
+makes a 10-fold gain on speed, while producing highly similar results to its predecessor, TopHat.
+
+In contrast to k-mer counting with Kallisto, alignment and quantification is a two step process, so we'll
+first align reads with Hisat2, and then count the reads uniquely mapping to each locus with HTSeq.
+
+\begin{lstlisting}
+# make the index
+mkdir runs/hisat_index/
+
+hisat2-build studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa \
+ runs/hisat_index/Ath_hisat2_index
+
+# align the reads
+mkdir runs/hisat_results/
+
+hisat2 -x runs/hisat_index/Ath_hisat2_index \
+  -U runs/trimmed_fastq/Col0treatment1.trimmed.fastq \
+  -S runs/hisat_results/treatment1.hisat.sam
+ 
+# quantify the alignments
+htseq-count runs/hisat_results/treatment1.hisat.sam \
+studies/AthalianaReferences/resources/Athaliana_167_TAIR10.gene_exons.gtf \
+-s no > runs/hisat_results/treatment1.hisat.htseq
+
+ # check output
+less runs/hisat_results/treatment1.hisat.htseq
+\end{lstlisting}
+
+For information on the various parameters, check the help function of the individual programs.
+
+If you wanted to use this method, you could run a script with a loop in a very similar way to what we did
+with Kallisto.
+
+
+\subsubsection{Aligning reads to the transcriptome of a related species}
+Sometimes your species of interest hasn't been sequenced, but one that is phylogenetically close has.
+Because \emph{de novo} transcriptome assemblies can be very hard to work with, cross species alignment can be
+a very appealing option. Bear in mind, you won't get the same resolution between paralogs that you would
+have mapping to the actual genome, and the alignment will take much longer.
+
+We will demo BLAT (BLAST-like Alignment Tool), which falls in between BLAST and a short-read
+aligner like Bowtie2 on the sensitivity vs speed scale.
+
+\begin{lstlisting}
+# Still, on these computers we'll be using a smaller read set
+# (minidata.fastq), and we'll have to convert it to fasta
+mkdir runs/miniexample/
+
+awk 'NR % 4 == 1 {print ">" $0 } NR % 4 == 2 {print $0}' \
+studies/BLATexample/resources/minidata.fastq > runs/miniexample/minidata.fasta
+\end{lstlisting}
+
+You might be wondering what \il{awk} is and why that command looked so much more 
+\emph{complicated} and less \emph{human readable} than most?
+Basically, \il{awk} makes it easy to perform simple, custom, text manipulations.
+It's very useful when what you want to do is a bit too complicated for
+some combination of what we've seen before such as find (e.g. \il{grep}) 
+or find and replace (e.g. \il{sed}), but is not yet complicated enough
+that one wants to write a python script. 
+
+As a brief explanation, the command above checks the remainder when the
+number of rows (NR) is divided by $4$ in order to to determine if it's on an ID or 
+sequence row in a fastq file. It appends '>' to the ID and just prints
+the whole sequence, producing fasta format.
+
+A proper introduction to \il{awk} is more than we have time for here, but
+there's a lot of nice resources available on-line, e.g. 
+\url{https://www.tutorialspoint.com/awk/}. In any case, you can use the 
+commands here, or simply search for and use \il{awk} code for what you want to 
+accomplish, without already having mastered \il{awk}.
+
+\begin{lstlisting}
+## now we set up a BLAT database (not required, but faster)
+# we will map reads to Brassica rapa transcripts
+mkdir runs/blat_db/
+
+faToTwoBit studies/BrapaReferences/resources/Brapa_primaryTranscripts.fa \
+  runs/blat_db/Brapa.2bit
+## run blat (with 6-frame translation)
+mkdir runs/blat_results/
+
+blat runs/blat_db/Brapa.2bit runs/miniexample/minidata.fasta -out=blast8 \
+-q=dnax -t=dnax runs/blat_results/treatment1-Brapa.tsv
+## count up the results
+# count_blat.py is a mini-script to count the 
+# best blat/blast hits to each target sequence.
+workflows/count_blat.py -i runs/blat_results/treatment1-Brapa.tsv > \
+  runs/blat_results/treatment1-Brapa.counts
+# check output
+less runs/blat_results/treatment1-Brapa.counts
+\end{lstlisting}
+
+% \subsubsection{\emph{de novo} transcriptome assembly}
+% \fbox{\begin{minipage}{45em}
+
+% This section is entirely optional. \\
+
+% Honestly, the end of \emph{de novo} assembly of short read data is in 
+% sight, as long read sequencing makes the assembly puzzle fundamentally 
+% both easier and more possible to solve. If you do not already have the
+% data, \textbf{do not plan on a \emph{de novo} assembly of Illumina or other
+% short read data}, not as a first choice, particularly not for RNAseq. \\
+
+% That said, sometimes the data is already there, and at the very least
+% preliminary results are required before more funding for long read
+% sequencing can be found. \\
+
+% So we aren't deleting this section yet.
+% \end{minipage}}
+
+
+% If you have the first sequencing data for a species, and cross species alignment isn't sufficient you can try
+% \emph{de novo} assembly. This makes sense when, for instance, you are interested in novel genes for your species,
+% when the next closest species is dozens of millions of years of evolution away, or when you are interested
+% in the sequences themselves for anything from primer design to evolutionary analyses).
+
+% The leading current short read \emph{de novo} assembly options for transcriptomes (and genomes) work with a de Bruijn
+% graph. That is, they have nodes, or k-mers, with the sequence data. and they have edges connecting them.
+% This allows for a much smaller way to look for shared regions between reads than say all-on-all
+% comparisons. But it still needs a lot of RAM and makes sub-optimal use of the read information.
+% Therefore, make sure to run a thorough quality control once you are done, that all key genes of interest are
+% present, that you have a sufficient number of genes that are full length, and that you filter out contigs
+% coming from, for instance, biotrophic fungi or insects.
+
+% \begin{lstlisting}
+% # We'll run a de novo assembly of the subsetted data with Trinity
+% Trinity --seqType fq --single minidata.fastq --max_memory 1G
+% less trinity_out_dir/Trinity.fasta
+% # early quality control
+% ./count_fasta.pl trinity_out_dir/Trinity.fasta
+% \end{lstlisting}
+
+% \textbf{If you are planning to run a de novo assembly you should definitely use paired-end, stranded
+% sequencing.} This gives you a lot more information to work with. If you had paired-end, stranded Illumina
+% data you would run Trinity with something like the following parameters.
+
+
+% \begin{lstlisting}
+% # --left my_reads_1.fq --right my_reads_2.fq 
+% # --seqType fq --SS_lib_type RF
+% \end{lstlisting}
+
+% \begin{lstlisting}
+
+% \end{lstlisting}
+
diff --git a/_reader/sections/03_biological_data_extraction.tex b/_reader/sections/03_biological_data_extraction.tex
new file mode 100644
index 0000000000000000000000000000000000000000..c7a25b4b953b4c5cdb44a807e3c153250988607e
--- /dev/null
+++ b/_reader/sections/03_biological_data_extraction.tex
@@ -0,0 +1,233 @@
+\section{Biological data extraction}
+\subsection{Rstudio via second Docker file}
+Please see sub section "Rstudio via second Docker file" from "Basics" 
+on the first day to restart working environment.
+
+\subsection{The Plan}
+Now, we use R in earnest. Rstudio will let us see what is happening immediately when we do it, so it is a
+good environment for beginners.
+
+\fbox{\begin{minipage}{45em}
+Note: To allow for the level of customization that is often necessary for each project, this 
+section is a little lower-level and relies less on pre-made tools than before. In as much
+there is a lot of R code, and retyping it would be rough on the schedule, so
+we'll be using a different paradigm than we did with the bash scripts. You don't have to retype anything,
+rather you're encouraged to copy from the .pdf. It is then your responsibility to go through it
+slowly and make sure you understand what you are doing. We would recommend, for every section to
+make sure you have understood it, that you vary something(s). Look at a different column, change the
+plot color, etc... and then save your changes into a script. 
+
+If for some reason we haven't given you the .pdf yet, please notify us.
+\end{minipage}}
+
+ We begin by getting our data from the kallisto results into a data.frame in
+R. Open a GUI which allows you to see the data structure – all results are in folders with different names,
+but the files themselves all have the same name.
+
+
+\subsection{Import}
+\lstset{language=R, style=Rstyle}
+
+\lstinputlisting{../workflows/kallisto_data_import.R}
+The results from both methods are identical. This first step by step method is more to write, but safe for
+beginners since you can change the names by hand. The loop is more challenging to produce but certainly
+faster to write. We will use the laborious way for the rest of the training course.
+
+\subsection{Basic data.frame / matrix calculations}
+
+Before we do high level analyses, let's calculate means for the replicates and fold-changes.
+
+\lstinputlisting{../workflows/kallisto_data_frames.R}
+
+\subsection{PCA and HCL}
+Before we start, let's think about what to expect. We have an experiment with a single factor, the
+treatment. So when we look at how the samples group together, how many groups with information do we
+expect? Yes, right, only one. So for our principal component analysis we expect that we have one
+dimension reflecting the treatment variation. As we have no other variables, the other dimensions of the
+PCA should show "noise" or biological variation due to random variables outside of our control. Ideally,
+we want to see the treatment variation in the first dimension (meaning most of the variation is due to
+treatment) and the noise based variation should occupy the lower dimensions.
+Now we start with our first analysis. We ask whether the replicates are more similar to each other than the
+samples. To that we use principal component analysis and hierarchical clustering of the samples.
+
+\lstinputlisting{../workflows/clustering_PCA_HCL.R}
+
+\subsection{Differential Expression}
+One of the most typical questions for an RNA-seq analysis is what is different between two samples? To
+answer that question we use sleuth. There is some debate about which tool is the best to detect differential
+expression. Here we will use Sleuth, while other possibilities are deseq2, edgeR, and cufflinks. 
+Most meta analysis papers agree on cufflinks
+being less suitable. DESeq2 and edgeR are both well established and work well with counts
+per gene and assuming a negative binomial distribution, but appear to be less appropriate for the
+'estimated counts' produced by Kallisto. Sleuth is developed by the same people as kallisto and is 
+supposedly most suitable for use with the results thereof. 
+Note: In order to use sleuth, kallisto has to be run with the bootstrapping option. 
+
+The following has been adapted from this on-line tutorial:
+\url{https://pachterlab.github.io/sleuth_walkthroughs/trapnell/analysis.html}
+
+
+\lstinputlisting{../workflows/sleuth_differential_expression.R}
+
+\subsection{GO term enrichment}
+We have about 1k significantly changed genes, too many to go through by hand quickly. We now check
+if some functional categories are overrepresented. The idea behind this is as follows:
+we have 1k genes changed of 28k genes in the genome, or 3.5\%. When we now look through the
+categories, they should all have 3.5\% of their members changed if the changes were random. The
+statistics check whether the difference from 3.5\% we see is significant. To do so, we use the package
+TopGO.
+
+This analysis looks at the significantly different genes.
+
+The biggest impediment to the use of TopGO will be to have the gene to GO term assignment in the way
+the package wants it. To this end, look at the file Athid2go.map in the folder. This is one way the package
+accepts. If you have a different species and a different way of GO term annotation, you need to reformat
+the file to match this one (can be done in R, but will be different for each case).
+
+\lstinputlisting{../workflows/functional_GO.R}
+GO term lists are notoriously difficult to show in nice figures and also difficult to interpret.
+Generally, you want many similar terms up to be convinced something is real. Revigo (on-line tool) can
+help you to summarize the GO terms but essentially you need your biologist knowledge to understand
+them.
+
+We chose a rather simple example. Among the terms enriched in the upregulated genes you can
+always observe the words defense and immune. You have to know that systemic acquired resistance also
+refers to the immune system, as does response to biotic stimulus. Clearly, the treatment induces the
+defense response.
+
+Among the enriched terms in down-regulated genes appears photosynthesis, which occurs in the
+plastid, and involves glyceraldehyde-3-phosphate, pigments and NADP. So clearly, although the names
+are different, what they describe is similar. There is no way but simply knowing this. There are thousands
+of GO terms described. The figures which were also produced are sometimes helpful. The GO terms
+connected by lines are the ones defined as in parent:daughter relationships.
+
+\subsection{MapMan}
+MapMan is an alternative way of looking at the overall patterns. Usually, you load all genes with
+their fold-changes, not just the significant ones. In a perfect world, MapMan and GO should give you
+similar results. One caveat is that the annotation behind GO and MapMan was made by different people
+and therefore different areas of plant biology are covered differently.
+
+Before starting with MapMan we will be making use of the size of the class to install MapMan in
+parallel on all course computers. MapMan's provided install script can be found both on sciebo, 
+and in the Raumlaufwerke folder. Run
+the following with java, and follow the instructions of the installation wizard.
+
+\lstset{language=bash, style=bashstyle}
+
+\begin{lstlisting}
+java -jar </path/to/>/MapManInst-3_6_0RC1.jar
+\end{lstlisting}
+We only need to export our fold-change table from R. Again, we have to make sure that our gene
+identifiers match the ones used in the MapMan tool.
+
+Open MapMan, click on mappings and look at the Ath\_AGI mapping. Click on until you see AGI
+codes and observe how they are formatted--again, no transcript suffix, only the locus.
+Back to R.
+\lstset{language=R, style=Rstyle}
+
+\lstinputlisting{../workflows/functional_mapman.R}
+
+Now back to MapMan. Right click on Experiments, choose add data. Select your exported table
+named forMapmanloading.txt. Now you can choose if there is a header (yes!), whether you have decimal
+point or decimal comma, and so on and so forth. The default works for us. Click okay
+
+Now click on Pathways, on Overview and choose Metabolism\_overview by double click.
+You will have to choose a mapping; we use Ath\_AGI. Click okay.
+
+If nothing is displayed, click on your file shown in Experiments. I always change the color code to
++red and-blue and the scale to whatever fits my dataset (for this one I'd use 3).
+
+Clearly, photosynthesis, the CBB cycle and photorespiration are down. If you would like to test
+that statistically, Mapman does that for you. Look at the stuff below the figure and pull it up. You will see
+the Wilcoxon Rank Sum Test scores. Correct the p-values for multiple hypothesis testing and sort the table
+by probability. Some of the significantly changed pathways we can see, most are not displayed. Look at
+the other options for visualization to see, if anyone fits your needs.
+
+You can make your own pathway figures and map the MapMan bins onto them. Refer to the
+MapMan manual for that!
+
+\subsection{Adding information from public data sources}
+Combining information from multiple sources is often helpful, or even necessary to fully
+understand one's results. This becomes harder to write a standardized protocol for, however, as one
+researcher might simply be interested in including the TAIR annotation with their "mother table" for ease
+of looking through the gene descriptions, and another might be interested in knowing whether there's a
+significant overlap between their study and a particular paper. That said, there is frequently a similar line
+of attack. First, get the desired comparative data in an organized format such as a list of gene IDs or a
+table (csv/tsv/xls) with gene IDs and associated values. Second, make sure the gene IDs are comparable
+(are they from the same genome/annotation release, do you have transcript IDs when you wanted gene
+IDs, are they both upper/lower case). Third, combine data with yours (e.g. merge). Finally, in many cases
+you will visually and statistically evaluate whether there is a overlap or correlation between the studies.
+
+\subsubsection{Including gene descriptions from TAIR}
+First step first, we need to get the descriptions, and not by copying them one by one from the
+website. Most biological databases have a bulk download page if you look, and TAIR is on the easy side.
+From \url{https://arabidopsis.org} you simply need to go to Download:Genes, select the annotation (TAIR10), and you
+already have a list of what is available for download. We'll take "TAIR10\_functional\_descriptions", which
+is a tab-separated text file. Save, copy or move this file to studies/AthalianaReferences/resources for ease of access. The
+ID format matches, so adding the descriptions to our major data.frame in R is very simple.
+
+\lstinputlisting{../workflows/additional_TAIR.R}
+
+\subsubsection{Including MapMan annotations}
+% TODO, honestly we should move this to mapman4
+Frequently you want to have more flexibility working with a dataset than is possible with the provided
+GUI programs. We'll now import the MapMan annotations into R, and look briefly at what else one could
+do with them. For a handful of plant species, MapMan annotations are provided in the 'store' at
+\url{https://mapman.gabipd.org/web/guest/mapmanstore}. 
+More commonly, they can be created for a species
+using the Mercator webtool 
+\url{http://www.plabipd.de/portal/web/guest/mercator-sequence-annotation}
+Assigning
+MapMan annotations with Mercator normally runs in less than 15min and avoids any annotation version
+trouble. For today, the results are provided in the file mapmanTAIR10.txt.
+
+\lstinputlisting{../workflows/additional_MapMan.R}
+You can, of course, automate a Fisher's exact test or Wilcoxon test for every category in R, visualize many
+samples at once by different bin levels, or whatever suits your purposes.
+
+\subsubsection{Special list - from a review paper}
+Frequently, whole genome annotations have not been updated with the very latest information or
+you are interested in a more obscure list, so you want to compare data with another paper directly. This
+often makes the first step of getting the data in an organized table harder. Let's look at the review paper
+\url{https://doi.org/10.1007/s11427-016-0048-4}
+``Diverse roles of SERK family genes in
+plant growth, development and defense response". In table 1, it has a list of SERK interacting genes,
+which we might be interested in, but they are only available in pdf format. There are many ways to
+approach this. You can for instance copy the whole table, and clean up any formatting issues by hand.
+Often you can get the formatting to work by pasting it into either a spreadsheet program or a text file.
+However, as we've worked with \emph{A. thaliana} a lot before, we already had a handy script to find all of the
+AGIs out of a text file and will use this.
+
+First, copy all of table 1 (or even the whole paper) into a text file and save it as serk\_interacting.txt
+
+\lstset{language=bash, style=bashstyle}
+
+\begin{lstlisting}
+# now run the following in bash
+workflows/agi_finder.py -i studies/Fan2016_SERK_review/resources/serk_interacting.txt \
+  > studies/Fan2016_SERK_review/resources/serk_interacting.agi
+less studies/Fan2016_SERK_review/resources/serk_interacting.agi
+\end{lstlisting}
+
+For today, we only needed the AGIs and not the additional information, so this will do.
+
+\lstset{language=R, style=Rstyle}
+\lstinputlisting{../workflows/additional_SERK.R}
+
+OK, we'll never be able to cover all possible comparative data and data formats you might want to
+look at, but hopefully you have some ideas on where to start now. Take a look around, if you are ahead of
+your neighbors this would be a great time to try importing contextual data you would actually be
+interested in for your studies.
+
+\subsection{Further Clustering}
+\subsubsection{K-means}
+\lstinputlisting{../workflows/clustering_kmeans.R}
+\subsubsection{Heatmaps - gene set}
+We've seen heatmaps at a large scale when displaying the hierarchical clustering above.
+Sometimes however, they are useful just for looking at dozens of genes at once. Let's look at the MapMan
+category, stress.biotic.signalling as an example.
+\lstinputlisting{../workflows/miniheatmaps.R}
+
+\begin{lstlisting}
+\end{lstlisting}
+
diff --git a/_reader/sections/04_section_longreads.tex b/_reader/sections/04_section_longreads.tex
new file mode 100644
index 0000000000000000000000000000000000000000..7fd0c0ad786ded8c7a913b42dc3d7048f2ee5bbe
--- /dev/null
+++ b/_reader/sections/04_section_longreads.tex
@@ -0,0 +1,1137 @@
+\section{Long Read Sequencing}
+
+\fbox{\begin{minipage}{45em}
+Important qualifier: while we've worked a lot with RNAseq, we've so far only
+had our own projects working with long reads of \emph{DNA} (and more Nanopore than PacBio).
+So writing this section was learning by doing for us, too. We'd rather look 
+forward than look back, but please take everything with a grain of salt / don't
+think that the following represents stable repeatedly-tested pipelines.
+\end{minipage}}
+
+
+Recent advances in long read sequencing are making drastic changes to what
+is possible in terms of RNA sequencing. While we've covered the technologies
+more thoroughly in a lecture, here is a drastic over simplification.
+
+Pacific Bioscience's Iso-Seq provides a way to repeatedly sequence potentially full length
+cDNA of transcripts and their reverse complement, and the sequencing will proceed
+in a circle until the polymerase falls off. Thus the preliminary base called
+read produced requires more processing then some basic trimming, but can have very
+high quality after consensus calling.
+
+Oxford Nanopore Technology's direct RNAseq measures the current
+across a membrane while the RNA molecule itself (and in some cases
+it's single cDNA strand) proceeds through a pore across the membrane. While the
+error rate is high, the output appears better than PacBio, and there are some
+exciting possibilities in detecting RNA modifications.
+
+\subsection{The Plan}
+
+\begin{wrapfigure}{r}{0.5\textwidth}
+\includegraphics[width=0.5\textwidth]{longread_workflow.pdf}
+\end{wrapfigure}
+
+While the throughput of these technologies is improving, we would argue 
+that as the sheer read count is \emph{currently} (at least) an order of magnitude lower
+than for Illumina, if the goal is simply quantification, use Illumina. That 
+said, if the goal is establishing coding gene models (with or without a reference)
+or something more specific, like quantifying alternative splicing, the long read
+technologies can be extremely helpful. Here we'll look at three options for reconstructing
+coding gene models with example Iso-Seq data. As with the Illumina RNAseq analyses
+exactly which analysis one chooses will depend upon the exact data one has and
+what sort of reference is available.
+
+In the left-most option, we'll map the preprocessed reads directly to a reference,
+collapse them to transcripts, and use the transcripts to support gene calling.
+This is also the most similar to a tractable approach for ONT direct RNAseq data.
+In the middle option, we will cluster the preprocessed reads into draft transcripts
+and then map these to the reference genome and perform gene calling as in the first option.
+In the right-most option, we will look at a pipeline to make gene models without any 
+reference or with a very poor reference.
+
+\subsection{Resources}
+Many of the commands are based on the following resources, and even if I were to
+repeat this on similar data in just three months I would check back here to see if
+anything major has changed.
+
+\textbf{The main Iso-Seq pipeline (trimming-polishing):}
+
+Main organizational page:
+
+\url{https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki}
+
+In particular:
+
+\href{https://github.com/PacificBiosciences/isoseq3}{isoseq3}
+
+\href{https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki/Tutorial:-Installing-and-Running-Iso-Seq-3-using-Conda}
+{Tutorial:-Installing-and-Running-Iso-Seq-3-using-Conda}
+
+\href{https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake-ToFU:-supporting-scripts-for-Iso-Seq-after-clustering-step}
+{Cupcake ToFU: supporting scripts for Iso Seq after clustering step}
+
+The \emph{de novo} options:
+
+\href{https://github.com/Magdoll/Cogent/wiki/Running-Cogent}{Running-Cogent}
+
+\href{https://github.com/PacificBiosciences/ANGEL}{Angel}
+
+\subsection{Data description}
+
+You can find the data for this section here:
+\lstset{language=bash, style=bashstyle}
+
+% TODO double-check directory
+\begin{lstlisting}
+ls assays/Zhu2017_IsoSeq/dataset
+\end{lstlisting}
+
+
+This is a fairly early Iso-Seq dataset designed to investigate any role of alternative
+splicing in the abscisic acid response in Arabidopsis seedlings 
+(\url{https://www.ncbi.nlm.nih.gov/pubmed/28407323}). This study also relied heavily
+on Illumina data, which we won't be using here.
+Just a brief caveat that as Iso-Seq develops (and becomes cheaper) much deeper sequencing and more replicates
+will almost certainly be expected than is seen here. But it's plenty for the workshop;
+we'll even only be using one of their two conditions.
+
+While the raw .hd5 files were downloaded from SRA, we've done some of the initial
+pre-processing for you a) for the sake of time and b) because the early steps
+in particular are very specific to a given technology and can often be provided by a sequencing
+center.
+
+Still, it looked like this:
+
+\lstset{language=bash, style=bashstyle, frame=trbl, rulecolor=\color{red}}
+
+% TODO raw/m16*bax.h5 missing
+\begin{lstlisting}
+# convert from Hierarchical Data Format files to bam
+# (sequel anyways directly outputs bam for newer data)
+bax2bam raw/m16*bax.h5
+# organize things by moving output into it's own directory
+mkdir -p runs/isoseq/subreads runs/isoseq/ccs
+mv m16* subreads/
+# take the raw subreads and convert them to the 'circular consensus sequence'
+bam=m161031_124550_42199_c100941222550000001823217706101620_s1_X0.subreads.bam
+ccs --numThreads=4 --noPolish --minLength=50 \
+    --maxLength=15000 --minPasses=1 --minPredictedAccuracy=0.8  \
+    --reportFile=runs/isoseq/ccs/keep.ccs.report --minSnr=3.75 --minZScore=-999 \
+    --maxDropFraction=0.8 runs/isoseq/subreads/$bam runs/isoseq/ccs/m16.ccs.bam
+# we also truncated the output file name here for the workshop, although
+# we'd generally encourage you too keep the naming as consistent as possible
+\end{lstlisting}
+
+\subsection{First Look}
+
+So first thing first, take a look at the data.
+
+\lstset{language=bash, style=bashstyle}
+
+%TODO add `cd` or rephrase
+\begin{lstlisting}
+# check that you're in the IsoSeqData directory and
+ls -R runs/isoseq/
+# OK, this is new, as you can see there are neither fasta nor fastq sequencing files
+# Instead the .bam files are the main sequence containers, 
+# we've seen .sam/.bam files before for an _alignment_, hmmm...
+# further, the following won't get us very far
+less runs/isoseq/subreads/m16*.subreads.bam
+# note that throughout this whole section using the full file name instead
+# of m16* is always OK. Just hit tab.
+\end{lstlisting}
+
+So what's going on here? While they aren't aligned to a \emph{reference}, 
+one of PacBio's raw reads is well represented aligned back to \emph{itself}
+as the read is basically going in circles. The circular consensus sequence
+is no longer aligned to anything, but it avoids a trivial format conversion.
+It also allows a little more flexibility for PacBio to sneak some additional
+data in.
+
+We're going to need some ways to view .bam files now. They have a text-based
+(and much more hard-drive-hungry) sister format, .sam. Luckily interconversion
+is easy.
+
+\begin{lstlisting}
+# convert to the text format, and pipe the first few lines to a file
+# so we can look at them in more detail 
+samtools view runs/isoseq/subreads/m16*.bam|head > runs/isoseq/subreads/head_subreads.sam
+samtools view runs/isoseq/ccs/m16*.bam|head > runs/isoseq/ccs/head_ccs.sam
+\end{lstlisting}
+
+Now take a look at the files with either \il{less} or \il{gedit}.
+We'd also encourage you to use \il{cut -f N <file>} to look just at a
+specific column (N) when appropriate.
+There are a few things to notice.
+\begin{itemize}
+\item The basic format: google 'sam format' or checkout
+\url{https://samtools.github.io/hts-specs/SAMv1.pdf} for a full description,
+but the basic idea of the format is that it has different columns for: the sequence,
+its quality information, where and how it maps to a reference (or not),
+and some.
+\item OK, it's not quite normal .bam/.sam format, PacBio has expanded it some,
+if you're curious see: \url{https://pacbiofileformats.readthedocs.io/en/3.0/BAM.html}
+\item Let's look at the sequence itself (column 10), note the large stretches
+of AAAA's and TTTT's, these are your poly-A tail being read as the circle 
+turns round. 
+\item Judging just off of the poly-A tail, do you notice the
+quality difference between the subreads and the ccs?
+\item Look at the sequence identifiers in column 1. For the subreads you see 
+that after taking the top few lines with \il{head} 
+we only actually see sub reads from one read (.../8/) 
+while sub-coordinates are given in the final field. In contrast we see 
+different reads in the ccs and the final field just says 'ccs'.
+\item Can you find the primer sequences (see \il{clonetech_SMARTer.fa})
+in the reads? Hint: you can orient where to look with the poly-A tail.
+\end{itemize}
+
+Unfortunately there really is currently no equivalent of FastQC or any at-the-start
+quality control for PacBio data. The quality can be estimated by mapping
+to a reference, but we're getting there in good time anyways; so for now 
+just try and get a feel for the sort of sequence we're looking at. The subreads
+go in circles, the ccs is the consensus of the subreads from individual full-reads.
+Both still have adapters and poly-A tails. 
+
+\subsection{Trimming, Clustering and Polishing}
+\subsubsection{Leveling up relative to day one}
+On the first day with the Illumina RNAseq data, we always just put results
+in our main directory, and it was pretty full. While this made things a 
+bit easier at the beginning, it becomes very confusing as a project grows. 
+Both because today's
+work has the tendency to produce multiple and complicated output files
+and because it is generally better form, we will try and use sub directories
+to structure the project a bit more.
+
+Similarly, on the first day we wrote out, at least for the Kallisto pipeline,
+almost every parameter. Today we'll just take the basic explanations from each
+tool's usage function, and only stop to explain parameters that are particularly
+confusing or where we've deviated from the 'standard analysis' pipeline. But
+please feel free to ask questions (or point out anything confusing)!
+
+\subsubsection{Adapter trimming}
+
+We'll trim off the adapters with \il{lima}. If the run contained multiple
+multiplexed libraries, we could also perform demultiplexing with \il{lima}.
+
+\begin{lstlisting}
+mkdir -p runs/isoseq/trimmed/
+lima runs/isoseq/ccs/m16.ccs.bam workflows/pacbiosciences_lima/clonetech_SMARTer.fa \ 
+  runs/isoseq/trimmed/m16.ccs.bam --no-pbi --isoseq
+# The first three arguments are obvious from the usage
+# Usage: lima [options] INPUT BARCODES OUTPUT
+# --no-pbi since we won't need the .pbi output
+# --isoseq is necessary to tell it what to expect for the primers
+
+# let's check the output
+ls runs/isoseq/trimmed/
+# in particular, the summary file might be helpful
+less runs/isoseq/trimmed/m16.ccs.lima.summary
+# In addition to removing adapters
+# we see lima removed any reads with mismatched adpaters
+# and now let's convert some of the .bam output to .sam as above
+samtools view runs/isoseq/trimmed/m16.ccs.primer_5p--primer_3p.bam | head \
+  > runs/isoseq/trimmed/head_trimmed.sam
+less runs/isoseq/trimmed/head_trimmed.sam
+# notice that _most_ of the adapters are now gone and that
+# _most_ of the poly-A tails are at the end of the sequence now
+\end{lstlisting}
+
+\subsubsection{Clustering}
+So at this point we have some reads that are \emph{in some ways} similar
+to that which we had after running Trimmomatic on the Illumina data. But 
+not quite. We just saw the remaining poly-As and adapters that were in the
+middle of some reads, which may be caused by concatemers. Further, 
+as the reads have a poly-A in them, we essentially have strand information. 
+
+The next step, \il{isoseq3 cluster}, cleans up any concatemers and orients 
+reads while clustering different reads that \emph{appear to} represent the same transcript.
+
+Note that the primary goal of this step is the reconstruction of the 
+gene models, which should be done with all data, not sample by sample.
+So if we were using more samples, we'd want to run \il{dataset create}
+first, which would link files in a way that they could all be processed together. 
+
+\begin{lstlisting}
+mkdir -p runs/isoseq/unpolished/
+# from the help function:
+# isoseq3 cluster [options] input output
+isoseq3 cluster runs/isoseq/trimmed/m16.ccs.primer_5p--primer_3p.bam \
+  runs/isoseq/unpolished/m16.unpolished.bam --verbose --require-polya
+# again, we have a lot of output
+ls -sh runs/isoseq/unpolished/
+# there's two files we really care about, and both are .bam
+# *.unpolished.flnc.bam contains the fully cleaned, AKA
+# "full-length non-chimeric" reads. Finally.
+# *.unpolished.bam has draft reconstructed transcripts
+
+# we can also get a report on how many flnc reads were used
+# for each draft transcript
+isoseq3 summarize runs/isoseq/unpolished/m16.unpolished.bam \ 
+  runs/isoseq/unpolished/m16.summary.csv
+less runs/isoseq/unpolished/m16.summary.csv
+\end{lstlisting}
+
+After this, the pipelines we're looking at today start to diverge,
+with the first "ref based, mapped reads" option continuing with
+the flnc reads and the others continuing to polishing, below.
+
+\paragraph{Optional challenge: }
+We got to this step, compared it to the reported results
+from the paper to see if the basics (e.g. number of transcripts produced)
+seemed similar, and freaked out a bit.
+
+\begin{lstlisting}
+# how many draft reconstructed transcripts did we get?
+samtools view runs/isoseq/unpolished/m16.unpolished.bam|wc --lines
+\end{lstlisting}
+
+Can you see why we were worried? Can you also figure out why we decided 
+the difference was nothing to worry about, just a reason to read papers with care?
+
+\subsubsection{Polishing}
+In terms of quantity, each draft transcript was constructed
+from an average of around 10 full-length non-chimeric reads, which sounds
+OK at first. but if you glanced at \il{runs/isoseq/unpolished/m16.summary.csv} 
+or maybe even made a histogram,
+you know that they are definitely not evenly distributed.
+Indeed, well over half are constructed from just 2-3 flnc reads. 
+But at the very start of this pipeline we had a lot more 
+data, right? The subreads. 
+
+The next step is to take the draft `unpolished' transcripts and carefully polish
+them with all that subread data we had at the very very start. This will allow
+any reads that were dropped (e.g. because they were not full length) to still
+contribute to the quality of the final sequences, and further allow for a 
+high quality ccs read from 10+ subreads to essentially be trusted more than
+a ccs read with just a single subread.
+
+%%%%%%%%%
+%TODO : running into error: Missing .pbi
+
+\begin{lstlisting}
+mkdir -p runs/isoseq/polished/
+# isoseq3 polish -h  # uncomment for explanation
+isoseq3 polish runs/isoseq/unpolished/m16.unpolished.bam \ 
+  runs/isoseq/subreads/m16*.subreads.bam \
+  runs/isoseq/polished/m16.polished.bam
+\end{lstlisting}
+
+OK, this step is going to take a long time (maybe an hour?). It's working with 5GB of reads after all
+But we actually want to be able to compare the pipelines in the end, so we didn't 
+want to unnecessarily subset this data. 
+
+In the mean time we can continue with the "reference based, mapped reads" /"left most" 
+option.
+
+%%%%%%%%%
+%TODO : adapt back-up routine
+% Maybe add a back-up folder to runs?
+
+If something goes wrong or if you get through the "Mapping FLNC reads" part 
+before it finishes. There is a copy of the expected output at \il{runs/_BackUpData/isoseq/polished/}
+Feel free to copy the data from there if need be.
+\lstset{language=bash, style=bashstyle, frame=trbl, rulecolor=\color{red}}
+
+\begin{lstlisting}
+cp runs/_BackUpData/isoseq/polished/* runs/isoseq/polished/
+\end{lstlisting} 
+% printf "\nruns/isoseq/polished/\n" >> .gitignore
+
+\subsection{Coding Genome Definition}
+\subsubsection{Reference Based}
+\paragraph{Mapping FLNC reads.}
+
+While polishing is running happily, we're going to check out what happens
+when we take the cleaned up "full-length non-chimeric consensus reads" and
+map them to the genome directly. This method will depend upon the genome
+to help clean up any errors in the reads, but let's face it, \emph{sometimes}
+you probably should trust the genome over the new transcriptome. Not always of course,
+but sometimes, often even.
+
+GMAP is a pretty fast way to map full transcripts or similar back to the genome.
+But like most aligners, it expects .fasta or .fastq files.
+
+Open a new terminal (Ctr+Alt+T) so you can work while polishing runs. 
+
+\lstset{language=bash, style=bashstyle}
+\begin{lstlisting}
+# activate the virtual environment in the new terminal
+source $HOME/Documents/venv/bin/activate 
+\end{lstlisting}
+
+Now let's take a look at the `flnc' reads
+
+\begin{lstlisting}
+# we have a .bam file for the flnc reads
+samtools view runs/isoseq/unpolished/m16.unpolished.flnc.bam | \
+  head > runs/isoseq/unpolished/head_flnc.sam
+less runs/isoseq/unpolished/head_flnc.sam
+# two things:
+# First, we finally have some cleaned-up mappable looking reads
+# Second, the data from a fasta file is just a subset of what we see
+# the read name (column 1), and the sequence (column 10)
+# we're just going to make a fasta file out of the bam with simple text manipulation
+samtools view runs/isoseq/unpolished/m16.unpolished.flnc.bam | \
+  awk '{print ">"$1"\n"$10}' > \
+  runs/isoseq/unpolished/m16.unpolished.flnc.fa
+# we simply googled 'bam to fasta awk', just FYI
+head runs/isoseq/unpolished/m16.unpolished.flnc.fa
+\end{lstlisting} 
+
+Back to GMAP, like the other aligners we've seen, GMAP is going to need
+an index, and then we can map to this.
+
+\begin{lstlisting}
+# make index (-D where -d index_name input.fa)
+gmap_build -D runs/gmap_index -d Athaliana \
+  studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa
+# map reads
+mkdir -p runs/isoseq/mapped/
+# this should take about 8 min with one thread
+# add -t N to speed it up if polishing is finished
+gmap -D runs/gmap_index -d Athaliana -f samse -n 0 --cross-species \
+  --max-intronlength-ends 200000 -z sense_force \
+  runs/isoseq/unpolished/m16.unpolished.flnc.fa 2> \
+  runs/isoseq/mapped/m16.flnc.gmap.log > runs/isoseq/mapped/m16.flnc.sam
+# the parameters are simply those recommended for isoseq, but briefly
+# -D, -d : to find the index made above
+# -f samse : to export .sam format
+# -n 0 : allow chimeric alignments
+# --max-intronlength-ends 200000 : more realistic max for EUK introns
+# -z sense_force : since sequences were oriented 5'-3' by poly-A tail
+# 2> runs/isoseq/mapped/m16.flnc.gmap.log : redirect copius standard error output to file
+# > runs/isoseq/mapped/m16.flnc.sam : finally redirect standard out to our sam file
+
+# now we have a "normal" sam/bam file  
+less runs/isoseq/mapped/m16.flnc.sam
+# note: in columns 3, and 4 we have the chromosome and position of mapping
+
+# we'll want a sorted bam file for the next step
+samtools sort runs/isoseq/mapped/m16.flnc.sam > runs/isoseq/mapped/m16.flnc.sorted.bam
+\end{lstlisting} 
+
+OK, we have mapped data, it's time to take a break and really look at our long
+read data, and also how it compares to the Illumina reads we had before.
+Feel free to pair up with your neighbor for the next part and have one person 
+run the Illumina half and the other the Iso-Seq half.
+
+%TODO: create samtools index in runs/
+%TODO: data missing in runs/hisat_results/treatment1.hisat.sam -> earlier script didn't run. 
+%TODO: %TODO: Stopped here for now.
+
+
+\begin{lstlisting}
+# first to visualize the reads we'll need indexes
+# one for the genome
+samtools faidx studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa
+# one for the new mapping of flnc reads
+samtools index runs/isoseq/mapped/m16.flnc.sorted.bam
+# for the Illumina reads we need both to convert to bam and sort
+samtools sort runs/hisat_results/treatment1.hisat.sam \ 
+  > runs/isoseq/mapped/illumina.sorted.bam
+# and we need an index
+samtools index runs/isoseq/mapped/illumina.sorted.bam
+\end{lstlisting} 
+
+To look at the assembly, we want to briefly install a genome browser,
+Tablet, \emph{on the host machine}.
+
+Open a new terminal on the host machine, and run
+\begin{lstlisting}
+cd $HOME/rnaseq-workshop/
+./workflows/tablet_linux_x64_1_21_02_08.sh
+# click through the installer, it should open a 
+# GUI genome visualization tool when done.
+\end{lstlisting} 
+
+In Tablet, you'll want to click to "Import an assembly".
+Keep in mind that in Tablet lingo "assembly" is referring
+to what we've been calling "mapped" or "bam". You'll also
+want to open the "Reference" 
+
+\il{studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa}. 
+
+Finally, once
+you've imported the "assembly" you'll  want to click on "Import features"
+and open 
+
+\il{studies/AthalianaReferences/resources/Athaliana_167_TAIR10.gene_exons.gtf}.
+
+Now take some time to look at the data. Check out your favorite gene perhaps.
+Can you see why long reads are so much better for cases of complicated
+alternative splicing? Do you see other differences between the data (besides
+that they are from totally different samples with different genes expressed)?
+Is the typical coverage distribution the same?
+
+Once you've got a feel for the differences between Illumina and Iso-Seq data
+we'll move on to collapsing mapped reads into transcripts. 
+
+\begin{lstlisting}
+mkdir collapsed
+# the next script requires a _sorted_ sam file, so one more conversion
+samtools view runs/isoseq/mapped/m16.flnc.sorted.bam > \
+  runs/isoseq/mapped/m16.flnc.sorted.sam
+# and the actual command to collapse
+collapse_isoforms_by_sam.py --input runs/isoseq/unpolished/m16.unpolished.flnc.fa \
+  -s runs/isoseq/mapped/m16.flnc.sorted.sam -o collapsed/m16.flnc.to_genome \
+  --flnc_coverage 2
+
+ls collapsed
+\end{lstlisting}
+
+To understand everything going on and the parameters for the previous step, 
+we need to think a little bit about some of the wet lab details of Iso-Seq. 
+In particular, reading a PacBio pamphlet, or just looking at the naming of
+"full length, non chimeric" reads, it seems like one could simply collapse every
+read that maps to the same position, perhaps allowing a small margin of flexibility
+for what counts as "the same". Certainly, this is part of what is going on, and 
+the \il{--max_3_diff} of 100 bps is basically this "small margin of flexibility". 
+But what about the \il{--max_5_diff} which defaults to 1000 bps? This must be, 
+and is, much more
+lenient because of wet lab constraints. Nothing in the general Iso-Seq protocol
+actually has a way to tell if reads run from the 5' cap to the 3' poly-A tail, 
+but rather, what PacBio defines as "full length" are reads that have the poly-A tail
+on one end, and the 5' adapter on the other end. 
+The 5' adapters are attached with a blunt end
+ligation, so this does mean that reverse transcription successfully proceeded 
+to the end of the piece of RNA. It doesn't mean that this was still an intact 
+mRNA, and indeed, normally degradation will result in reads, particularly,
+from longer transcripts, that are slightly to severely truncated on the 5' end.
+That's why the collapsible margin is wider on the five prime end. 
+
+Some care should be taken here depending upon whether one is working with
+FLNC reads or high quality transcripts. This script appears to have been designed 
+primarily for performing the second round of collapsing for the high quality
+transcripts. Therefore it's likely that both 5' and 3' margins should be increased 
+when working with FLNC reads, although \emph{most} of the time it seemed ok for the
+FLNC reads here. If you have high quality transcripts from \il{isoseq3 polish} 
+that you trust a little more you could also set \il{--dun-merge-5-shorter}, 
+which will prevent alignments from collapsing that differ on the 5' end by less 
+than the margin, but which have a different set of exons.
+
+\includegraphics[width=0.65\textwidth]{dun_merge.pdf}
+
+Finally, the \il{--flnc_coverage 2} is \emph{only} appropriate for FLNC reads,
+as it will cause every possible transcript that is supported by less than 2
+alignments to be sorted into the 'bad' gff file while the rest end up in the 'good'
+gff file. With high quality transcripts, if the first step worked right for a transcript,
+you only would have a coverage of 1, but definitely want to keep it (and consider it
+'good') anyways.
+We were skeptical this made sense for FLNC reads either, as particularly with 
+a dataset like this that is a little low on coverage in general, there will be
+a lot of real transcripts with just one read. I personally hate throwing out
+precious data. However, the \il{collapse_isoforms_by_sam.py} tool does not have a 
+complicated error model or anything behind it, it basically just applies a series
+of thresholds to see if some alignments should be collapsed. This means it treats
+a single unique transcript, which provides the only data for a genetic region the same
+as it treats a single transcript that is 'unique' only because it was erroneous and
+slightly misaligned at a splice junction. In the first case, we want
+to keep the single read, because it's all we have. However, in the second case, 
+we definitely want to ignore the single outlying read and trust the majority 
+at this locus. Luckily splitting the transcripts into 'good'
+and 'bad' allows us to handle them in the future with different trustworthiness and
+basically set the 'bad' set to be ignored at every locus where there was a 'good' 
+model. This workaround is a bit unwieldy, and certainly not ideal, 
+but it introduces fewer errors than without using
+\il{--flnc_coverage 2}, while keeping more data than just taking high quality 
+transcripts.
+
+Now you could stop here and just use the 'good' transcripts, or add one more
+step to merge in the non-conflicting set of 'bad' transcripts. 
+That would provide a model for the transcribed parts of the genome.
+The recommend Iso-Seq tutorials kinda peter out here, and maybe there
+are projects where one would stop here, e.g. if you were running Iso-Seq to 
+learn about alternative splicing in your favorite tissue, but in an otherwise
+well annotated species. That said, one of the most promising applications of 
+Iso-Seq is to help define gene models for newly sequenced species. And
+here, where the goal is to describe full gene models, we really ought to take 
+full advantage of the genome to help clean up, in particular, omissions in our 
+transcript models. 
+
+For instance, the genome can help further when:
+\begin{itemize}
+\item None of the reads / transcripts for a locus were truly full length
+\item Transcripts for a locus were not expressed (enough to be measured) in any sequenced condition
+\end{itemize}
+
+\fbox{\begin{minipage}{45em}
+OK, typically to define genes for a new genome one would bring in \emph{all} 
+the extrinsic data that is available, whether it was Illumina RNAseq, Iso-Seq,
+protein alignments from related species, ESTs, or something else and feed 
+all of this to help support
+a \emph{de novo} gene caller such as SNAP or Augustus that has a model for typical
+codon usage of genic and non-genic regions and can look for likely ORFs. 
+This is somewhat organized and automated in public tools such as MAKER or PASA. 
+\\
+
+That said, that really starts to get beyond the scope of this course, so we will
+simply look at how you could use \emph{just} the Iso-Seq data to support the 
+\emph{de novo} gene caller Augustus. \\
+
+Hopefully, Iso-Seq will be incorporated more officially into gene calling 
+pipelines in the near future.
+\end{minipage}}
+
+Augustus can incorporate extrinsic evidence, like our Iso-Seq data, by adjusting
+its posterior probability of a particular call based upon the extrinsic data
+provided as 'hints'. That is to say, it runs its \emph{de novo} prediction
+and then adjusts them as more or less likely based on the hints. 
+
+So we have to convert our collapsed gff file into a slightly different gff file that
+Augustus can understand as hints. Basically what we're doing here is converting the 
+transcripts into a format where we can give AUGUSTUS more careful instructions on how to 
+consider each part. Are the splice junctions as trust worthy as the aligned regions 
+themselves? What about the transcription start site? Should exons and transcripts be considered 
+only as a whole, or also as a part?
+
+Let's just try it and look at the result. 
+
+\begin{lstlisting}
+# convert to gff3 (bc the hints script takes only gff3 correctly)
+pfx=collapsed/m16.flnc.to_genome.collapsed.
+gffread ${pfx}bad.gff -o ${pfx}bad.gff3
+gffread ${pfx}good.gff -o ${pfx}good.gff3
+# feel free to look at output with 'less' to see differences between gff and gff3
+
+# convert our exons and transcripts to 'hints' for augustus
+# note the 'good' hints (w/ 2+ reads) have a higher priority than the 'bad'
+gff3_to_hints_isoseq.py -i ${pfx}good.gff3 -o ${pfx}good.hints.gff3 --priority=2
+gff3_to_hints_isoseq.py -i ${pfx}bad.gff3 -o ${pfx}bad.hints.gff3 --priority=1
+
+# finally we'll need the hints to be in one file
+cat ${pfx}good.hints.gff3 ${pfx}bad.hints.gff3 > ${pfx}hints.gff3
+\end{lstlisting}
+
+Let's look at the 'features' column of the gff, and how they've now changed
+from the typical 'transcript' 'exon' that we had before.
+
+\begin{lstlisting}
+# open file | get 3rd column | sort | count unique lines
+less ${pfx}hints.gff3 |cut -f3 | sort | uniq -c
+\end{lstlisting}
+
+OK, so first, all 'ep' is very similar to the number of exons 
+(if you run a similar count on the input files) and the features
+tts and tss are very close to the number of transcripts. The 
+remaining features are in between. Huh, but the names are pretty 
+cryptic, so what's what?
+
+\begin{tabular}{r l}
+    \toprule
+    \parbox[t][][t]{15mm}{ass} & 
+    \parbox[t][][t]{135mm}{Acceptor Splice Site, on 3' end of the intron (most commonly AG).
+Not our name for it.} \\ \midrule
+    \parbox[t][][t]{15mm}{dss} & 
+    \parbox[t][][t]{135mm}{Donor Splice Site, on the 5' end of the intron (most commonly GT)} \\ \midrule
+
+    \parbox[t][][t]{15mm}{ep} & 
+    \parbox[t][][t]{135mm}{Exon Part, so an exon with the edges trimmed (by \il{--trim_exonparts}) and Augustus will try to contain it in an exon, not match exactly} \\ \midrule
+
+    \parbox[t][][t]{15mm}{ip} &
+    \parbox[t][][t]{135mm}{Intron Part, so an intron with the edges trimmed (by \il{--trim_intronparts}) and Augustus will try to contain it in an intron, not match exactly} \\ \midrule
+
+    \parbox[t][][t]{15mm}{tss} &
+    \parbox[t][][t]{135mm}{Transcription Start Site} \\ \midrule
+
+
+    \parbox[t][][t]{15mm}{tts} &
+    \parbox[t][][t]{135mm}{Transcription Termination Site} \\ \midrule
+
+    \bottomrule
+\end{tabular}
+
+
+I'd like to say we could run augustus now, but it would run overnight for the
+whole genome. So we're just going to run it for a subset of the genome. 
+
+The following script will cut a little chunk out of fa, gff3, and bam files 
+alike. This is appropriate for the workshop or other niche cases like iterating
+through different parameters, etc to see what works best. But for the record, 
+if you're doing this for the main run on a real project, please don't
+cut it smaller than your chromosomes/scaffolds.
+
+\begin{lstlisting}
+# feel free to just type in place
+official_gtf=studies/AthalianaReferences/resources/Athaliana_167_TAIR10.gene_exons.gtf
+
+subset_genome_related.py --fasta studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa \ 
+  -sChr1 -f1 -t300000 --gff ${pfx}good.gff3,${pfx}bad.gff3,${pfx}hints.gff3,$official_gtf \
+  --bam runs/isoseq/mapped/m16.flnc.sorted.bam
+
+# the script reports its output, but if it's easier
+find -name *__Chr1*
+\end{lstlisting}
+
+Now we can run augustus
+\begin{lstlisting}
+# I don't like typing the same things over and over again
+where=Chr1_1-300000
+mkdir gene_models
+augustus --hintsfile=${pfx}hints__${where}.gff3 --species=arabidopsis \
+  --alternatives-from-evidence=true --extrinsicCfgFile=extrinsic.E.cfg \
+  --UTR=on --allow_hinted_splicesites=atac \
+  studies/AthalianaReferences/resources/Athaliana_167_TAIR9__${where}.fa \ 
+  > gene_models/flnc.${where}.augustus
+
+less ${pfx}augustus
+# the augustus output has the hints, commented protein sequence, explanation,
+# and, what we are after, gtf lines with 'AUGUSTUS', the gene models.
+# let's subset it to have just these.
+less gene_models/flnc.${where}.augustus | grep AUGUSTUS > \
+  gene_models/flnc.${where}.augustus.gtf
+\end{lstlisting}
+%# copying the commands for the full runs here so as to be able to provide
+%# comparative results
+%augustus --hintsfile=collapsed/m16.flnc.to_genome.collapsed.hints.gff3 --species=arabidopsis --alternatives-from-evidence=true --extrinsicCfgFile=extrinsic.E.cfg --UTR=on --allow_hinted_splicesites=atac studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa > ../BackUpData/augustus/m16.flnc.to_genome.full.gtf
+%# and for the hq (note next section must also be finished)
+%augustus --hintsfile=collapsed/m16.hq.to_genome.collapsed.hints.gff3 --species=arabidopsis --alternatives-from-evidence=true --extrinsicCfgFile=extrinsic.E.cfg --UTR=on --allow_hinted_splicesites=atac studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa> ../BackUpData/augustus/m16.hq.to_genome.full.gtf
+%augustus --species=arabidopsis --UTR=on studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa> ../BackUpData/augustus/no_hints.full.gtf
+
+OK cool, we have our gene models, we recommend loading them into tablet
+(with the subsetted .bam and .fa files) and comparing them to the subsetted versions of the
+raw transcript (good/bad.gff3), and the official Arabidopsis annotation
+(studies/AthalianaReferences/resources/Athaliana\_167\_TAIR10.gene\_exons.gtf).
+
+Tablet is a little frustrating in that loading multiple 'transcript' features
+from different .gff's will pile them on top of each other in one line. You
+may a) open multiple tablet instances (think teamwork) or b) adjust the 'feature'
+column of the gff to have a different name. See below for an example
+
+\begin{lstlisting}
+less gene_models/flnc.${where}.augustus.gtf | awk 'BEGIN {OFS = FS = "\t"} \
+  { sub(/^/, "flnc.", $3) }1' > gene_models/flnc.${where}.augustus.gtf.tablet
+less gene_models/flnc.${where}.augustus.gtf.tablet  # check results 
+# don't worry about understanding all of the 'awk' command. The first bit
+# is to tell it to use only tabs as a column separator. The second bit says
+# find and replace the start of column 3 with "flnc."
+# if you want to run this for the other files. The only part of the awk command
+# you'll want to change is the "flnc." (e.g. to "original.")
+\end{lstlisting}
+
+Final note on viewing multiple gffs worth of feature sets in Tablet, you'll have to go to the features tab
+(blue triangle), click on 'select tracks', and then check all the tracks you want 
+to see
+
+In any case, congratulations for getting all the way to our first full
+gene models!
+
+\paragraph{Mapping high quality isoforms}
+We're now going to run (nearly) all the same steps for mapping high quality
+isoforms that we ran for mapping flnc reads above. As it is essentially
+the same pipeline, we are not going to give you the commands,
+although we will point out every instance where the commands should
+differ from above beyond the dataset being used. It is entirely up
+to you if, when and how much you check the intermediate output
+with e.g. less and tablet. For clarity, wherever you're supposed to fill
+in something it will be marked with \il{...} for a whole line or \il{< >}
+for a partial line.
+
+Since you no longer have  an exact script, we highly recommend filling
+in one as you go (e.g. in a text file)
+
+\begin{lstlisting}
+# first, we need to look at the output from polishing
+# if polishing did not finish, please stop it and copy the backup polishing results
+ls polished
+# The polished high quality draft transcripts are m16.polished.hq.fastq.gz
+# extract them with 'gunzip'
+...
+# we'll start with gmap.
+# we already have a database, so we don't have to run gmap_build, just gmap
+# your input file will be 'polished/m16.polished.hq.fastq' 
+# please name the output in a way you know what it is and that it doesn't
+# overwrite the previous analyses.
+...
+# we'll now need to sort, and convert back to sam
+# think 'samtools view' for conversions, and 'samtools sort' to sort
+...
+# while you're at it and for later, please index your sorted bam
+samtools index <your sorted bam>
+# for the collapse_isoforms_by_sam.py, there are quite a few changes
+collapse_isoforms_by_sam.py --input <the hq fastq file> -s \
+  <the gmap sam output> -o <your output prefix> --fq
+# we don't want a coverage filter as we had for the flnc reads,
+# and we have to explicitly tell the script to parse fastq input (--fq)
+
+# if you check your output, you'll see fewer output files than before,
+# as without the --flnc_coverage parameter, the output won't be split into
+# 'good' and 'bad'. Can you find the main output? It should be named
+# <your output prefix>.collapsed.gff
+# please convert this to gff3 with 'gffread'
+...
+# please convert the .gff3 file to hints using gff3_to_hints_isoseq.py
+# note that you do not need to set --priority (but it also won't hurt)
+...
+# please subset the new bam and gff files with ./subset_genome_related.py
+subset_genome_related.py --bam <your sorted.bam> \
+  --gff <your collapsed.gff>,<your hints.gff> -s Chr1 -f 1 -t 300000
+# and run augustus. 
+# All you will need to change are the hints and output files.
+...
+# and subset to just lines containing 'AUGUSTUS' with 'grep'
+...
+\end{lstlisting}
+% and with answers....
+%# first, we need to look at the output from polishing
+%# if polishing did not finish, please stop it and copy the backup polishing results
+%ls polished
+%# The polished high quality draft transcripts are m16.polished.hq.fastq.gz
+%# extract them with 'gunzip'
+%gunzip runs/isoseq/polished/m16.polished.hq.fastq.gz
+%# we'll start with gmap.
+%# we already have a database, so we don't have to run gmap_build, just gmap
+%# your input file will be 'polished/m16.polished.hq.fastq' 
+%# please name the output in a way you know what it is and that it doesn't
+%# overwrite the previous analyses.
+%gmap -D runs/gmap_index -d Athaliana -f samse -n 0 --cross-species --max-intronlength-ends 200000 -z sense_force runs/isoseq/polished/m16.polished.hq.fastq 2> runs/isoseq/mapped/m16.hq.gmap.log > runs/isoseq/mapped/m16.hq.sam
+%# we'll now need to convert to bam, sort, and convert back to sam
+%# think 'samtools view [-b]' for conversions, 'samtools sort' to sort
+%samtools view -b runs/isoseq/mapped/m16.hq.sam |samtools sort > runs/isoseq/mapped/m16.hq.sorted.bam
+%samtools view runs/isoseq/mapped/m16.hq.sorted.bam > runs/isoseq/mapped/m16.hq.sorted.sam
+%# while you're at it and for later, please index your sorted bam
+%samtools index runs/isoseq/mapped/m16.hq.sorted.bam
+%# for the collapse_isoforms_by_sam.py, there are quite a few changes
+%collapse_isoforms_by_sam.py --input runs/isoseq/polished/m16.polished.hq.fastq -s \
+%  runs/isoseq/mapped/m16.hq.sorted.sam -o collapsed/m16.hq.to_genome --fq
+%# we don't want a coverage filter as we had for the flnc reads,
+%# and we have to explicity tell the script to parse fastq input (--fq)
+%
+%# if you check your output, you'll see fewer output files than before,
+%# as without the --flnc_coverage parameter, the output won't be split into
+%# 'good' and 'bad'. Can you find the main output? It should be named
+%# <your output prefix>.collapsed.gff
+%# please convert this to gff3 with 'gffread'
+%gffread collapsed/m16.hq.to_genome.collapsed.gff -o collapsed/m16.hq.to_genome.collapsed.gff3
+
+%# please convert the .gff3 file to hints using gff3_to_hints_isoseq.py
+%# note that you do not need to set --priority (but it also won't hurt)
+%gffread collapsed/m16.hq.to_genome.collapsed.gff -o collapsed/m16.hq.to_genome.collapsed.gff3
+%# please subset the new bam and gff files with ./subset_genome_related.py
+%subset_genome_related.py -s Chr1 -f1 -t300000 --bam runs/isoseq/mapped/m16.hq.sorted.bam --gff collapsed/m16.hq.to_genome.collapsed.gff,collapsed/m16.hq.to_genome.collapsed.hints.gff3  --fasta studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa
+%# and run augustus. 
+%# All you will need to change are the hints and output files.
+%augustus --hintsfile=collapsed/m16.hq.to_genome.collapsed.hints__Chr1_1-300000.gff3 --species=arabidopsis --alternatives-from-evidence=true --extrinsicCfgFile=extrinsic.E.cfg --UTR=on --allow_hinted_splicesites=atac studies/AthalianaReferences/resources/Athaliana_167_TAIR9__${where}.fa > gene_models/hq.$where.augustus
+%# and subset to just lines containing 'AUGUSTUS' with 'grep'
+%less gene_models/hq.$where.augustus|grep AUGUSTUS > gene_models/hq.Chr1_1-300000.augustus.gtf
+
+Alright, well done. You can get surprisingly far in executing a bioinformatics
+pipeline by taking an example from somewhere else that does approximately
+what you want, checking help functions to see if all parameters make sense
+for your data, and then plugging in your data (as you basically just did above).
+That said it is extremely important that you check as you go to see if the 
+output is giving you what you expect, to read what information is available
+on the tools and generally try not to blindly trust it fits your use-case.
+
+We'll do a larger scale comparison of the methods later, but feel free to
+compare the final gene models in tablet. 
+
+\subsubsection{\emph{de Novo}}
+So you've now seen two variations on reference based gene calling, and you've 
+probably also gotten a chance to appreciate some of the major advantages, such
+as using the genome to fill in missing transcripts or 5' degraded transcripts. 
+
+But sometimes the genome is not available, or it can be so fragmented that many
+to most genes do not occur on one scaffold, or otherwise have quality issues.
+In such cases it can be preferable to use a \emph{de Novo} approach.
+
+That said, the only \emph{de Novo} approaches we found very much consist of
+\textbf{under-development software} that hasn't yet been polished for an end user.
+In as much, we'll be approaching the next section as an \textbf{example of how you
+have to sanity check work as you go and be ready to fill gaps.}
+
+The first step is, as with a genome based approach, to collapse draft transcripts
+to somewhat better draft transcripts. When a genome was available this made perfect
+sense, the genome brought in extra information to differentiate say alleles from paralogs.
+However, it is still quite necessary for a \emph{de Novo} approach, as the first 
+clustering step was intentionally conservative on the logic it is easier to collapse
+later than to deal with transcripts that have been too aggressively merged.
+Thus, the next clustering step with cogent is required to shift from 'under clustering'
+to a somewhat more aggressive 'best guess' sort of clustering.
+
+The main commands we've taken from \url{https://github.com/Magdoll/Cogent/wiki/Running-Cogent}
+
+Instead of having a \emph{clustering} step, as under-development
+software, Cogent has a multi step process, with distance calculation, 
+partitioning, and consensus calling all performed separately
+
+\begin{lstlisting}
+# distance calculation:
+run_mash.py -h
+# looks like we really just need an input fasta file
+# which we just need to extract from the polishing step
+gunzip runs/isoseq/polished/m16.polished.hq.fasta.gz 
+
+run_mash.py --cpus 4 runs/isoseq/polished/m16.polished.hq.fasta
+# check output
+less runs/isoseq/polished/m16.polished.hq.fasta.s1000k30.dist
+# you see pairs of transcripts with some info on 
+# distance and alignment length in the later columns.
+# Seems reasonable. We'll also check the size of the file
+# we don't know how large it is supposed to be, but I think
+# we can say that if has fewer lines than input sequences or more
+# than input sequences squared, we should worry.
+wc --lines  runs/isoseq/polished/m16.polished.hq.fasta.s1000k30.dist 
+\end{lstlisting}
+
+In the next step we move from pairwise distances to partitions / 
+clusters.
+
+\begin{lstlisting}
+# partitioning
+process_kmer_to_graph.py -h
+# we'll need the input and output from above and we'll need
+# to set an output directory & prefix
+mkdir cogent
+# also the wiki recommended using -c COUNTS_FILE, and showed an example
+# COUNTS_FILE with (hq transcript ID, N flnc reads).
+# Hmmm, do we have the number for how many full length reads
+# went into each high quality transcript?
+less runs/isoseq/polished/m16.polished.hq.fasta
+# we do, we just have to parse it out of the fasta headers
+less runs/isoseq/polished/m16.polished.hq.fasta | grep '>' | cut -f1 -d';' |\
+  sed -e 's/>//g' -e 's/full_length_coverage=//g' -e 's/ /\t/g' \
+  > runs/isoseq/polished/m16.polished.hq.weights
+# for help understanding the above command, please truncate it
+# to the grep, check the output, and then start extending it piece by
+# piece (grep, grep | cut, grep | cut | sed -e, grep | cut | sed -e -e, etc...)
+
+# back to partitioning
+process_kmer_to_graph.py -c runs/isoseq/polished/m16.polished.hq.weights \
+  runs/isoseq/polished/m16.polished.hq.fasta \ 
+  runs/isoseq/polished/m16.polished.hq.fasta.s1000k30.dist \
+  cogent/ m16.polished.hq
+ls cogent  # puh, that's a lot of output
+# let's keep looking at one folder
+ls cogent/m16.polished.hq_0  
+less cogent/m16.polished.hq_0/in.fa 
+less cogent/m16.polished.hq_0/in.weights 
+# alright, it looks like we have a bunch of folders that contain the
+# bits of fasta and weight assigned to their partition. Simple actually.
+\end{lstlisting}
+
+In the final Cogent step, the sequences within the partitions are finally
+assigned to transcripts and collapsed.
+\begin{lstlisting}
+# consensus
+reconstruct_contig.py -h
+# from the wiki (more than the help function) we know
+# this has to be ran for each partition. We'll use a for loop
+for item in `ls cogent`;
+do 
+  reconstruct_contig.py -p $item cogent/$item/;
+done
+
+ls cogent/m16.polished.hq_0/
+# the final output is always in cogent2.renamed.fasta
+# now lets sanity check the results\
+# from the genome-based hq analysis we have some idea what to expect for 
+# number of transcripts 
+less collapsed/m16.hq.to_genome.collapsed.gff|awk '$3 == "transcript"'|wc --lines
+# so nearly 2k
+# counting from this output format is a bit harder 
+# but ultimately is just concatenating all the fasta files, and counting the
+# headers as we learned on the first day
+cat cogent/m16.polished.hq_*/cogent2.renamed.fasta | grep '>' | wc --lines
+# a lot less... hmmm...
+# while it's same order of magnitude and _could_ be reasonable
+# it was worrisome enough to start double checking
+# let's count home many draft transcripts were still there after partitioning
+cat cogent/m16.polished.hq_*/in.fa |grep '>'|wc
+# and compare to total hq draft transcripts
+less runs/isoseq/polished/m16.polished.hq.fasta | grep '>' |wc
+# so barely over half of the sequences remained after partitioning. 
+# if we look at one of the meta info outputs from process_kmers_to_graph.py
+less m16.polished.hq.partition.txt
+# we see a) that the vast majority of partitions have 2+ sequences and
+# b) a list of unassigned sequences, we can count these
+less m16.polished.hq.partition.txt | grep unassigned | sed 's/,/ /g' | wc --words
+# exactly the number we were missing. Success.
+# OK, but what should we do with these, if we were making orthogroups,
+# we would just skip the 'unassigned' singletons. But in this case, being a
+# singleton doesn't even mean it's low quality or an outlier, just that the FLNCs from
+# this draft transcript were successfully collapsed in the polish step, and cogent
+# didn't have anything left to do.
+# We already put together a convenience python script that will join 
+# collapsed and singleton transcripts together in a single file.  
+clean_cogent_output.py -f runs/isoseq/polished/m16.polished.hq.fasta -c cogent/ \
+  -o collapsed/m16.hq.de_novo.fa
+# feel free to run with -h, or simply open if you want to know how it works
+less collapsed/m16.hq.de_novo.fa|grep '>'|wc
+# and now we have very comparable results to the genome based methods (numerically)
+\end{lstlisting}
+
+\fbox{\begin{minipage}{45em}
+To be fair to the Cogent developers, this was all documented, and they
+do provide a way to get everything back together, see: 
+\href{https://github.com/Magdoll/Cogent/wiki/Tutorial\%3A-Using-Cogent-to-collapse-redundant-transcripts-in-absence-of-genome}
+{Using-Cogent-to-collapse-redundant-transcripts-in-absence-of-genome}. 
+That said, we missed the documentation in the first run, and noticing the 
+discrepancy between expectation and result was therefore critical to getting
+the output right. 
+Ultimately our solution was easier for the workshop, so we kept it.
+\end{minipage}}
+
+OK, so we have transcript models now. Without a genome one certainly couldn't 
+use e.g. augustus to extend or fill gaps in the sequences, what's missing
+is irrecoverable (without more sequencing). But we aren't done yet, the other
+thing that Augustus gave us was predictions for cds/protein sequences. For this 
+we'll use ANGEL. When we were working with Augustus, we just used the pre-trained
+model for arabidopsis, ANGEL doesn't ship with pre-trained models, but it is 
+easy to train. 
+
+First we need a training set, we'll just take a naive ORF caller (you know, based
+on start/stop codons and length), and grab the transcripts with the longest predicted CDS.
+Then we'll filter them, e.g. to avoid having two nearly-identical transcripts for
+training. Then we'll train.
+
+\begin{lstlisting}
+mkdir angel
+
+dumb_predict.py -h
+# since this is such a small dataset we set --min_aa_length kinda low
+dumb_predict.py collapsed/m16.hq.de_novo.fa angel/m16.hq.de_novo --min_aa_length=290
+ls angel/  # we'll continue with *.final.*
+# pick low-redundancy training set
+angel_make_training_set.py -h
+angel_make_training_set.py angel/m16.hq.de_novo.final angel/m16.hq.de_novo.final.train
+ls angel/  # we'll continue with *.train.cds and *.train.utr
+# and train, this will take a minute
+angel_train.py -h
+angel_train.py angel/m16.hq.de_novo.final.train.cds \
+  angel/m16.hq.de_novo.final.train.utr angel/arabidopsis.pickle --cpus=4
+# our trained model is now at angel/arabidopsis.pickle, importantly the computer
+# can read it, even if we can't
+\end{lstlisting}
+
+Now that we have our trained model, we just need to run it on all the sequences
+
+\begin{lstlisting}
+angel_predict.py -h
+# angel can only output predictions to the same directory
+cd angel/
+# this will take several minutes
+angel_predict.py ../collapsed/m16.hq.de_novo.fa arabidopsis.pickle m16.predictions \
+    --output_mode=best --min_angel_aa_length 100 --min_dumb_aa_length 100
+ls
+cd ..
+\end{lstlisting}
+
+Now we have our completed gene models!
+
+\subsection{Comparing and Evaluating Gene Models}
+And with completed gene models, whether or not one has several methods to compare,
+one definitely wants some feedback on whether they are any good or not. 
+
+We'll run through a few examples of basic ways to evaluate the results. 
+\textbf{We'll just provide the first command, and then let you figure out how to change
+it / if it makes sense for the rest of the methods.}
+
+Exactly what you compare is up to you, you could compare the results of processing
+the high quality draft transcripts in a \emph{de novo} or reference based manner.
+You could compare the augustus output when it gets hints from the flnc reads,
+the hq transcripts, or is run without hints. You could compare any of the methods
+to the official \emph{Arabidopsis} gene models. You could compare the 
+genome-based IsoSeq transcripts to the Trinity assembly (minidata.fastq from
+day 1 consisted of just reads that mapped to Chr1:1-300000).
+There are a lot of options.
+
+As it can be hard to compare just the genome fragment we used for Augustus to the
+full genome/transcriptome predictions, we've provided full augustus 
+results if you want them.
+
+\begin{lstlisting}
+cp -r ../BackUpData/augustus ./  
+\end{lstlisting}
+
+First we need to have actual transcript/protein sequences, these are easy to create from
+the gtf/gffs.
+
+\begin{lstlisting}
+# get transcript sequences from gff with just transcripts/exons 
+# e.g. our collapsed gff sequences
+gffread collapsed/m16.flnc.to_genome.collapsed.good.gff -O \
+  -g studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa \
+  -w collapsed/m16.flnc.to_genome.collapsed.good.transcript.fa
+
+# get transcript, cds, and protein sequences from a full gff (e.g. from Augustus)
+basename=studies/AthalianaReferences/resources/Athaliana_167_TAIR10.gene_exons
+gffread ${basename}.gtf -g studies/AthalianaReferences/resources/Athaliana_167_TAIR9.fa \
+  -w ${basename}.transcript.fa -x ${basename}.cds.fa -y ${basename}.protein.fa
+\end{lstlisting}
+
+Next we'll want to run a numerical comparison. 
+How many transcripts/proteins were
+produced and how long were they?
+\begin{lstlisting}
+# quast is a program that can perform a fairly thorough quality control
+# on a variety of DNA sequences / assemblies. 
+quast.py angel/m16.predictions.ANGEL.cds --min-contig=1 -o quast/angel_cds
+# but quast will only work for DNA sequence, for proteins we can use the
+# count_fasta.pl from the first day. Copy it into the directory, then
+./count_fasta.pl -i 20 angel/m16.predictions.ANGEL.pep
+\end{lstlisting}
+
+Knowing you have long sequences is nice, and it's particularly encouraging
+when you have longer protein sequences. However, sometimes and particularly when
+there is no reference available, it can be hard to interpret. Moreover, given
+an 'omics sized analysis, you'll always have the occasional long ORF by chance,
+etc. So once one has a short list of methods that produced decent 
+\emph{quantitative} results, it's good to perform a more \emph{qualitative}
+check. This can include steps you've seen before, like \emph{looking} at the
+models with Tablet. But another important question is always whether the 
+sequences can be annotated. 
+
+In plant biology, Mercator (and the newly released Mercator 4) provide a nice
+quick way to get functional annotations. 
+You can upload your sequences to \url{http://www.plabipd.de/portal/web/guest/mercator4}, 
+and the results should be there within a 
+few minutes. That said, if everyone does this at once, it may take a good deal
+longer, so feel free to look over each others shoulders. 
+
+The web interface provides some summary output, but most importantly you can
+download the results as a tab separated annotation file and then calculate
+whatever you need.
+\begin{lstlisting}
+# count all transcripts
+less <your results> |grep "T$"|wc --lines
+# count annotated transcripts
+less <your results> |grep "T$"|grep -v "not assigned"|wc --lines
+\end{lstlisting}
+
+One could similarly look at assigning pfam domains, assigning GO terms,
+or comparing transcripts directly to other related species whether via 
+reciprocal best blast or orthogroup creation.
+
+\paragraph{Challenge Assignment.} If you've made it this far, and we haven't 
+yet interrupted you to start the Q\&A session, you might want to take this time
+to try and apply some of your R skills to the comparison above. Can you make
+a plot of the N50 or total length for your comparisons? 
+Can you make a comparative
+plot that summarizes the Mercator/MapMan annotations?
+
+\subsection{Future perspectives}
+
+We spent a lot of this section talking about how this was un-stable software,
+how it was changing a lot, how to check as you go when working with new
+pipelines, etc... You might almost get the idea that we don't expect you
+to be able to use this script as an appropriate guideline 6 months from now, and you'd be right.
+
+Here are some things that we think are needed, and we think will come
+\emph{soonish}.
+These are things I would definitely look for at the start of my next project.
+\begin{itemize}
+\item Generally, there will be a flux, better tools will come out, tools will change names and parameters
+\item Some equivalents of FastQC that give more tailored feedback for PacBio / Nanopore data will become available
+\item Cogent or a similar tool will likely provide a 1-step process for genome-free draft transcript collapsing
+\item More tools will become available that are specialized for direct RNAseq
+\begin{itemize}
+\item Genome-based collapse for Nanopore reads that has poly-A tail detection on the 3' end and, 
+of course, accounts for the higher / biased error.
+\item Read and or transcript polishing (self but very importantly also Illumina based) that are explicitly aware of the dynamic range of transcriptomics data
+\end{itemize}
+\item There's some interest in 5' and 3' selection in the Iso-Seq pipeline,
+which already even has some early software available for the collapse step 
+\href{https://github.com/GenomeRIK/tama}{TAMA}
+\item Finally, in good time, both Iso-Seq and direct RNAseq will be incorporated into 
+full gene annotation pipelines.
+\end{itemize}