# CWL **Data analysis** in this ARC is packaged and made reusable via [Common Workflow Language (CWL)](https://www.commonwl.org). For details, visit the [DataPLANT knowledgebase](https://nfdi4plants.github.io/nfdi4plants.knowledgebase/cwl/). Briefly, every data analysis step (`runs`) is described with a `run.cwl` document. The `run.cwl` points (i.e. executes) one or multiple `workflows` (stored as `workflow.cwl`). The input parameters required for the `workflow.cwl` are documented in the accompanying `run.yml`. A `workflow.cwl` can be a single command line tool or a more complex workflow pipeline that references and combines other `*.cwl` documents. ```bash ... ├── runs │ ├── fastqc │ │ ├── run.cwl │ │ └── run.yml │ ... ├── studies │ ├── ... │ └── workflows ├── fastqc │ ├── collectFilesInDir.cwl │ ├── fastqc.cwl │ └── workflow.cwl ... ``` ```mermaid flowchart TD workflowcwl --- runcwl subgraph r["runs/fastqc/"] runcwl(("run.cwl")) runyml(("run.yml")) end i[input: DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz] --- runyml r ---> o[output: DB_097_CAMMD_CAGATC_L001_R1_001_fastqc.html] subgraph "workflows/fastqc" workflowcwl(("workflow.cwl")) end ``` ## Setup and dependencies Again, for details check the docs linked above. Executing cwl documents requires a cwl runner, e.g. [cwltool](https://github.com/common-workflow-language/cwltool). Software and package dependencies are ideally covered by Docker or Conda and described in the hints / requirements sections of cwl documents (e.g. `DockerRequirement` and / or `SoftwareRequirement`). Additional dependencies may exist for one or the other workflow (e.g. a local installation of R or F# or packages therein), if the workflow is not yet packaged perfectly reusable. ## Default cwltool commands Here's a list of frequently used `cwltool` commands to validate or execute runs and workflows. ### Validate document ```bash cwltool --validate run.cwl ``` ### Execute workflow in `./runs/*` ```bash cwltool run.cwl run.yml ``` ### capture log and run in bg ```bash cwltool run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 & ``` ### capture log, run in parallel and in bg ```bash cwltool --parallel run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 & ``` ### Print workflow to file ```bash cwltool --print-dot run.cwl | dot -Tsvg > run.svg ```