# CWL

**Data analysis** in this ARC is packaged and made reusable via [Common Workflow Language (CWL)](https://www.commonwl.org).
For details, visit the [DataPLANT knowledgebase](https://nfdi4plants.github.io/nfdi4plants.knowledgebase/cwl/).

Briefly, every data analysis step (`runs`) is described by a `run.cwl` document. The `run.cwl` points to (i.e. executes) one or more `workflows` (stored as `workflow.cwl`). The input parameters required by the `workflow.cwl` are provided in the accompanying `run.yml`. A `workflow.cwl` can be a single command-line tool or a more complex workflow pipeline that references and combines other `*.cwl` documents.
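To make the relationship between `run.cwl`, `run.yml` and `workflow.cwl` concrete, here is a minimal sketch that scaffolds such a pair for the `fastqc` run. It assumes that `workflows/fastqc/workflow.cwl` takes a single `File` input named `fastqFile` and returns a `File` output named `report`; these names, the `cwlVersion`, and the input path under `studies/` are hypothetical and must be adapted to the actual workflow.

```shell
# Hypothetical scaffold for runs/fastqc; input/output names and paths are assumptions.
mkdir -p runs/fastqc

# run.cwl wraps the workflow as a single step and exposes its inputs/outputs.
cat > runs/fastqc/run.cwl <<'EOF'
cwlVersion: v1.2
class: Workflow

requirements:
  SubworkflowFeatureRequirement: {}

inputs:
  fastqFile: File

outputs:
  report:
    type: File
    outputSource: fastqc/report

steps:
  fastqc:
    run: ../../workflows/fastqc/workflow.cwl
    in:
      fastqFile: fastqFile
    out: [report]
EOF

# run.yml holds the concrete input values; relative paths are resolved
# relative to the run.yml file itself (study path is hypothetical).
cat > runs/fastqc/run.yml <<'EOF'
fastqFile:
  class: File
  path: ../../studies/my_study/resources/DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz
EOF
```

`SubworkflowFeatureRequirement` is only needed because the step runs another `Workflow`; if `workflow.cwl` is a plain `CommandLineTool`, it is harmless.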

```text
...
├── runs
│   ├── fastqc
│   │   ├── run.cwl
│   │   └── run.yml
│   ...
├── studies
│   ├── ...
│
└── workflows
    ├── fastqc
    │   ├── collectFilesInDir.cwl
    │   ├── fastqc.cwl
    │   └── workflow.cwl
    ...
```

```mermaid
flowchart TD
workflowcwl --- runcwl
subgraph r["runs/fastqc/"]
    runcwl(("run.cwl"))
    runyml(("run.yml"))
end
i[input: DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz] --- runyml
r ---> o[output: DB_097_CAMMD_CAGATC_L001_R1_001_fastqc.html]
subgraph "workflows/fastqc"
    workflowcwl(("workflow.cwl"))
end
```


## Setup and dependencies

Again, see the documentation linked above for details.
Executing CWL documents requires a CWL runner, e.g. [cwltool](https://github.com/common-workflow-language/cwltool).
Software and package dependencies are ideally provided via Docker or Conda and declared in the `hints` / `requirements` sections of the CWL documents (e.g. `DockerRequirement` and/or `SoftwareRequirement`).
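As an illustration of declaring dependencies in `hints`, the sketch below writes a minimal FastQC `CommandLineTool` wrapper with both a `DockerRequirement` and a `SoftwareRequirement`. The file name `fastqc-example.cwl`, the image tag and the version are examples chosen for illustration, not the ones used by the workflows in this ARC.

```shell
# Write a hypothetical FastQC tool wrapper that declares its dependencies
# as hints: a container image and, alternatively, a named software package.
cat > fastqc-example.cwl <<'EOF'
cwlVersion: v1.2
class: CommandLineTool
baseCommand: fastqc

hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/fastqc:0.11.9--0   # example image tag
  SoftwareRequirement:
    packages:
      fastqc:
        version: ["0.11.9"]

# Write the report into the job's output directory instead of next to the input.
arguments: ["--outdir", "$(runtime.outdir)"]

inputs:
  fastqFile:
    type: File
    inputBinding:
      position: 1

outputs:
  report:
    type: File
    outputBinding:
      glob: "*_fastqc.html"
EOF
```

The result can be checked with `cwltool --validate fastqc-example.cwl`. Putting the entries under `hints` rather than `requirements` lets a runner without Docker fall back to a locally installed `fastqc`.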

Some workflows may have additional dependencies (e.g. a local installation of R or F#, or of specific packages for them) if they are not yet packaged in a fully reusable way.
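A quick way to see which of these tools are available on the current machine is a small check like the one below. The list of commands (`cwltool`, `docker`, `conda`, `Rscript`, and `dotnet` for F#) is an assumption based on the tools mentioned in this section, and `check_deps` is a hypothetical helper; extend or trim the list to match the workflows you intend to run.

```shell
# Report which (assumed) tools are on PATH; returns the number of missing ones.
check_deps() {
  local missing=0 tool
  for tool in cwltool docker conda Rscript dotnet; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: found (%s)\n' "$tool" "$(command -v "$tool")"
    else
      printf '%s: missing\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_deps || echo "$? tool(s) missing"
```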

## Default cwltool commands

Here's a list of frequently used `cwltool` commands to validate or execute runs and workflows.

### Validate document

```bash
cwltool --validate run.cwl
```
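To validate every run and workflow in one go, the same command can be called in a loop. This sketch assumes the `runs/*/run.cwl` and `workflows/*/workflow.cwl` layout shown above and is meant to be run from the ARC root; the function name `validate_all` is hypothetical.

```shell
# Validate all run.cwl and workflow.cwl documents below the ARC root.
# Prints OK/INVALID per file and returns non-zero if any document fails.
validate_all() {
  local failed=0 doc out
  for doc in runs/*/run.cwl workflows/*/workflow.cwl; do
    [ -e "$doc" ] || continue            # skip glob patterns that matched nothing
    if out=$(cwltool --validate "$doc" 2>&1); then
      echo "OK      $doc"
    else
      echo "INVALID $doc"
      printf '%s\n' "$out"               # show cwltool's error details
      failed=1
    fi
  done
  return "$failed"
}

# Usage (from the ARC root):
#   validate_all
```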

### Execute a run from within `./runs/*`

```bash
cwltool run.cwl run.yml
```
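By default, cwltool writes the outputs of a run into the current working directory. The sketch below, assuming the `runs/<name>/` layout shown above and using a hypothetical helper `run_all`, executes every run sequentially from the ARC root and uses cwltool's `--outdir` option to place each run's outputs in its own folder.

```shell
# Execute every runs/<name>/run.cwl with its run.yml, one after another.
# Stops at the first failing run.
run_all() {
  local dir
  for dir in runs/*/; do
    [ -f "${dir}run.cwl" ] && [ -f "${dir}run.yml" ] || continue
    echo "Running $dir"
    cwltool --outdir "$dir" "${dir}run.cwl" "${dir}run.yml" || return 1
  done
  return 0
}
```

Paths inside `run.yml` are resolved relative to the `run.yml` file, so the runs do not need to be started from inside their folders.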

### Capture log and run in background

```bash
cwltool run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
```
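Because the command is sent to the background with `&`, the shell returns immediately. To keep track of such a run, `$!` (the PID of the last background job) can be stored, and `nohup` keeps the job alive if the terminal is closed. The helper name `run_bg` and the `run.pid` file are hypothetical.

```shell
# Start run.cwl/run.yml in the background, log to a timestamped file,
# and remember the PID so the job can be checked on later.
run_bg() {
  local log
  log="$(date +"%Y-%m-%d_%H-%M")-run.log"
  nohup cwltool run.cwl run.yml > "$log" 2>&1 &
  echo "$!" > run.pid
  echo "started PID $(cat run.pid), logging to $log"
}

# Later, from the same folder:
#   tail -f ./*-run.log                          # follow progress
#   kill -0 "$(cat run.pid)" && echo running     # is the job still alive?
```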

### Capture log, run in parallel and in background

```bash
cwltool --parallel run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
```
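`--parallel` lets cwltool execute independent steps of a single workflow at the same time (cwltool marks this option as experimental). To run several *runs* concurrently instead, each can be started as its own background job and collected with `wait`. A sketch, assuming the `runs/<name>/` layout and a hypothetical helper `run_all_parallel` that writes one log per run:

```shell
# Launch every runs/<name>/ in its own background cwltool process,
# write one timestamped log per run, then wait for all and report failures.
run_all_parallel() {
  local dir i failed=0
  local -a pids=() names=()
  for dir in runs/*/; do
    [ -f "${dir}run.cwl" ] && [ -f "${dir}run.yml" ] || continue
    cwltool --outdir "$dir" "${dir}run.cwl" "${dir}run.yml" \
      > "${dir}$(date +"%Y-%m-%d_%H-%M")-run.log" 2>&1 &
    pids+=("$!")
    names+=("$dir")
  done
  for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
      echo "done   ${names[$i]}"
    else
      echo "FAILED ${names[$i]}"
      failed=1
    fi
  done
  return "$failed"
}
```

Note that all runs then compete for the same CPU and memory, which may be a poor fit for large workflows.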

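### Render workflow graph to SVG

cwltool prints the workflow graph in Graphviz DOT format; converting it to SVG requires Graphviz (`dot`) to be installed.
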
```bash
cwltool --print-dot run.cwl | dot -Tsvg > run.svg
```
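To render every workflow at once, a loop like the following can be used. It assumes the `workflows/*/workflow.cwl` layout shown above; the helper `render_all` and the output name `workflow.svg` next to each document are hypothetical.

```shell
# Render each workflows/<name>/workflow.cwl to workflows/<name>/workflow.svg.
render_all() {
  command -v dot >/dev/null 2>&1 || { echo "Graphviz 'dot' not found" >&2; return 1; }
  local wf dotsrc
  for wf in workflows/*/workflow.cwl; do
    [ -e "$wf" ] || continue
    # Capture the DOT text first so a cwltool failure is not masked by the pipe.
    dotsrc=$(cwltool --print-dot "$wf") || return 1
    printf '%s\n' "$dotsrc" | dot -Tsvg > "${wf%.cwl}.svg" || return 1
  done
}
```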