CWL

Data analysis in this ARC is packaged and made reusable via the Common Workflow Language (CWL). For details, visit the DataPLANT knowledgebase.

Briefly, every data analysis step (a run) is described by a run.cwl document. The run.cwl points to (i.e. executes) one or more workflows (each stored as workflow.cwl). The input parameters required by the workflow.cwl are documented in the accompanying run.yml. A workflow.cwl can be a single command-line tool or a more complex pipeline that references and combines other *.cwl documents.

...
├── runs
│   ├── fastqc
│   │   ├── run.cwl
│   │   └── run.yml
│   ...
├── studies
│   ├── ...
└── workflows
    ├── fastqc
    │   ├── collectFilesInDir.cwl
    │   ├── fastqc.cwl
    │   └── workflow.cwl
    ...

flowchart TD
workflowcwl --- runcwl
subgraph r["runs/fastqc/"]
    runcwl(("run.cwl"))
    runyml(("run.yml"))
end
i[input: DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz] --- runyml
r ---> o[output: DB_097_CAMMD_CAGATC_L001_R1_001_fastqc.html]
subgraph "workflows/fastqc"
    workflowcwl(("workflow.cwl"))
end
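
To make the relationship between the three files more concrete, here is a minimal sketch of what runs/fastqc/run.cwl and runs/fastqc/run.yml could look like. It is not copied from this ARC: the input name fastq, the output name report, and the input file path are assumptions for illustration, and the actual workflow.cwl may declare different parameters.

```yaml
# runs/fastqc/run.cwl (illustrative sketch, not the file from this ARC)
cwlVersion: v1.2
class: Workflow

requirements:
  # Only needed if workflow.cwl is itself a Workflow rather than a single CommandLineTool
  SubworkflowFeatureRequirement: {}

inputs:
  fastq: File              # hypothetical input name

steps:
  fastqc:
    # Points to the workflow stored under workflows/fastqc/
    run: ../../workflows/fastqc/workflow.cwl
    in:
      fastq: fastq         # assumes workflow.cwl declares an input called "fastq"
    out: [report]          # assumes workflow.cwl declares an output called "report"

outputs:
  report:
    type: File
    outputSource: fastqc/report
```

```yaml
# runs/fastqc/run.yml (illustrative sketch)
fastq:
  class: File
  # Placeholder path; the real location of the input file is not shown here
  path: path/to/DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz
```

In a setup like this, run.yml supplies a value for each input declared in run.cwl, and run.cwl passes that value on to workflow.cwl.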

Setup and dependencies

As before, see the docs linked above for details. Executing CWL documents requires a CWL runner, e.g. cwltool. Software and package dependencies are ideally covered by Docker or Conda and described in the hints or requirements sections of the CWL documents (e.g. DockerRequirement and/or SoftwareRequirement).

Some workflows may have additional dependencies (e.g. a local installation of R or F#, or packages therein) if they are not yet packaged in a fully reusable way.
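
As a rough illustration of how the hints section mentioned above can declare dependencies, the following sketch shows a command-line tool document with both a DockerRequirement and a SoftwareRequirement. The image reference, the parameter names, and the use of fastqc's --outdir option are assumptions chosen for the example; the tool documents in this ARC may be set up differently.

```yaml
# Illustrative CommandLineTool (hypothetical; not taken from this ARC)
cwlVersion: v1.2
class: CommandLineTool
baseCommand: fastqc
# Write reports to the current working directory so the runner can collect them
arguments: ["--outdir", "."]

hints:
  DockerRequirement:
    # Placeholder image reference; replace with an image that provides fastqc
    dockerPull: "registry.example.org/fastqc:TAG"
  SoftwareRequirement:
    packages:
      - package: fastqc    # lets a runner resolve the tool by other means, e.g. Conda

inputs:
  fastq:                   # hypothetical input name
    type: File
    inputBinding:
      position: 1

outputs:
  report:                  # hypothetical output name
    type: File
    outputBinding:
      glob: "*_fastqc.html"
```

Listing these entries under hints rather than requirements means a runner may still attempt execution when it cannot satisfy them, for example on a machine without Docker.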

Default cwltool commands

Here's a list of frequently used cwltool commands to validate or execute runs and workflows.

Validate document

cwltool --validate run.cwl

Execute workflow in ./runs/*

cwltool run.cwl run.yml

Capture log and run in background

cwltool run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &

Capture log, run in parallel and in background

cwltool --parallel run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &

Print workflow graph to file (requires Graphviz dot)

cwltool --print-dot run.cwl | dot -Tsvg > run.svg