From 41cb48b58c5701fe55dcb819bee75c6933cc9cd0 Mon Sep 17 00:00:00 2001
From: Dominik Brilhaus <brilhaus@nfdi4plants.org>
Date: Fri, 28 Mar 2025 10:11:16 +0100
Subject: [PATCH] update sub

---
 .cwl/README.md | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++
 .cwl/cwl-aux   |  2 +-
 2 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 .cwl/README.md

diff --git a/.cwl/README.md b/.cwl/README.md
new file mode 100644
index 0000000..6b3f620
--- /dev/null
+++ b/.cwl/README.md
@@ -0,0 +1,92 @@
+# CWL
+
+<!--
+add this repo as submodule to ARCs
+
+```bash
+git submodule add https://git.nfdi4plants.org/brilator/cwl-aux.git .cwl/cwl-aux
+git submodule update --init --recursive
+git submodule update --recursive --remote
+```
+
+ -->
+
+**Data analysis** in this ARC is packaged and made reusable via [Common Workflow Language (CWL)](https://www.commonwl.org).
+For details, visit the [DataPLANT knowledgebase](https://nfdi4plants.github.io/nfdi4plants.knowledgebase/cwl/).
+
+Briefly, every data analysis step (`runs`) is described with a `run.cwl` document. The `run.cwl` points to (i.e. executes) one or more `workflows` (stored as `workflow.cwl`). The input parameters required for the `workflow.cwl` are documented in the accompanying `run.yml`. A `workflow.cwl` can be a single command line tool or a more complex workflow pipeline that references and combines other `*.cwl` documents.
+
+```bash
+...
+├── runs
+│   ├── fastqc
+│   │   ├── run.cwl
+│   │   └── run.yml
+│   ...
+├── studies
+│   ├── ...
+│
+└── workflows
+    ├── fastqc
+    │   ├── collectFilesInDir.cwl
+    │   ├── fastqc.cwl
+    │   └── workflow.cwl
+    ...
+```
+
+```mermaid
+flowchart TD
+workflowcwl --- runcwl
+subgraph r["runs/fastqc/"]
+  runcwl(("run.cwl"))
+  runyml(("run.yml"))
+end
+i[input: DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz] --- runyml
+r ---> o[output: DB_097_CAMMD_CAGATC_L001_R1_001_fastqc.html]
+subgraph "workflows/fastqc"
+  workflowcwl(("workflow.cwl"))
+end
+```
+
+
+## Setup and dependencies
+
+For details, see the docs linked above.
+Executing CWL documents requires a CWL runner, e.g. [cwltool](https://github.com/common-workflow-language/cwltool).
+Software and package dependencies are ideally covered by Docker or Conda and described in the `hints` / `requirements` sections of CWL documents (e.g. `DockerRequirement` and / or `SoftwareRequirement`).
+
+Some workflows may have additional dependencies (e.g. a local installation of R or F#, or packages therein) if they are not yet packaged for full reusability.
+
+## Default cwltool commands
+
+Here's a list of frequently used `cwltool` commands to validate or execute runs and workflows.
+
+### Validate document
+
+```bash
+cwltool --validate run.cwl
+```
+
+### Execute workflow in `./runs/*`
+
+```bash
+cwltool run.cwl run.yml
+```
+
+### Capture log and run in background
+
+```bash
+cwltool run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
+```
+
+### Capture log, run in parallel and in background
+
+```bash
+cwltool --parallel --timestamp run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
+```
+
+### Print workflow to file
+
+```bash
+cwltool --print-dot ../arc.cwl | dot -Tsvg > arc-cwl.svg
+```
diff --git a/.cwl/cwl-aux b/.cwl/cwl-aux
index 11931ea..a2d591e 160000
--- a/.cwl/cwl-aux
+++ b/.cwl/cwl-aux
@@ -1 +1 @@
-Subproject commit 11931eaf85d05574e9a8c0aff32de00a6f3a46d7
+Subproject commit a2d591e879b619161d6482d5552a9d287691b99a
-- 
GitLab