Dominik Brilhaus authored
CWL
Data analysis in this ARC is packaged and made reusable via Common Workflow Language (CWL). For details, visit the DataPLANT knowledgebase.
Briefly, every data analysis step (runs) is described by a run.cwl document. The run.cwl points to (i.e. executes) one or more workflows (stored as workflow.cwl). The input parameters required by the workflow.cwl are documented in the accompanying run.yml. A workflow.cwl can be a single command line tool or a more complex workflow pipeline that references and combines other *.cwl documents.
...
├── runs
│   ├── fastqc
│   │   ├── run.cwl
│   │   └── run.yml
│   ...
├── studies
│   ├── ...
│
└── workflows
    ├── fastqc
    │   ├── collectFilesInDir.cwl
    │   ├── fastqc.cwl
    │   └── workflow.cwl
    ...
flowchart TD
workflowcwl --- runcwl
subgraph r["runs/fastqc/"]
runcwl(("run.cwl"))
runyml(("run.yml"))
end
i[input: DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz] --- runyml
r ---> o[output: DB_097_CAMMD_CAGATC_L001_R1_001_fastqc.html]
subgraph "workflows/fastqc"
workflowcwl(("workflow.cwl"))
end
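To make the relationship between run.cwl, run.yml and workflow.cwl concrete, here is a minimal sketch that scaffolds a simplified version of the fastqc example in a scratch directory. It is not the content of this ARC's actual files: the workflow.cwl below collapses the workflow into a single command line tool, the input name fastq, the output name report, the container image tag and the input file location are all assumptions made for illustration.

```shell
# Scaffold a simplified, hypothetical runs/ + workflows/ layout in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/runs/fastqc" "$demo/workflows/fastqc"

# workflows/fastqc/workflow.cwl: sketched here as a single command line tool.
cat > "$demo/workflows/fastqc/workflow.cwl" <<'EOF'
cwlVersion: v1.2
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/fastqc:0.11.9--0  # assumed image tag
baseCommand: fastqc
arguments: ["--outdir", "."]   # write the report into the job's working directory
inputs:
  fastq:                        # hypothetical input name
    type: File
    inputBinding:
      position: 1
outputs:
  report:                       # hypothetical output name
    type: File
    outputBinding:
      glob: "*_fastqc.html"
EOF

# runs/fastqc/run.cwl: points to (executes) the workflow stored under workflows/.
cat > "$demo/runs/fastqc/run.cwl" <<'EOF'
cwlVersion: v1.2
class: Workflow
inputs:
  fastq: File
outputs:
  report:
    type: File
    outputSource: fastqc/report
steps:
  fastqc:
    run: ../../workflows/fastqc/workflow.cwl
    in:
      fastq: fastq
    out: [report]
EOF

# runs/fastqc/run.yml: the input parameters for this particular run.
cat > "$demo/runs/fastqc/run.yml" <<'EOF'
fastq:
  class: File
  path: /path/to/DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz  # hypothetical location
EOF

# List what was created.
find "$demo" -type f | sort
```

With a CWL runner installed and the path in run.yml pointing at a real file, the run would be executed from inside runs/fastqc with cwltool run.cwl run.yml.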
Setup and dependencies
Again, for details check the docs linked above.
Executing CWL documents requires a CWL runner, e.g. cwltool.
Software and package dependencies are ideally covered by Docker or Conda and described in the hints / requirements sections of the CWL documents (e.g. DockerRequirement and / or SoftwareRequirement).
Some workflows may have additional dependencies (e.g. a local installation of R or F#, or packages therein) if they are not yet packaged to be fully reusable.
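A quick way to see which of these tools are available locally is a small preflight check. The following is only a sketch: the function name check_tools is made up for illustration, and the list of tools (cwltool, docker, and Graphviz dot for rendering workflow graphs) is an assumption about a typical setup rather than a requirement of every workflow in this ARC.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent tool; returns 1 if any is missing.
check_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# cwltool runs the documents; docker is only needed for documents that declare
# a DockerRequirement; dot (Graphviz) is only needed to render workflow graphs.
check_tools cwltool docker dot || echo "install the tools listed above first"
```

Workflow-specific dependencies such as R or F# can be passed to the same function, e.g. check_tools Rscript dotnet, assuming those are the command names on your system.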
Default cwltool commands
Here's a list of frequently used cwltool commands to validate or execute runs and workflows.
Validate document
cwltool --validate run.cwl
Execute workflow in ./runs/*
cwltool run.cwl run.yml
Capture log and run in the background
cwltool run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
Capture log, run in parallel and in the background
cwltool --parallel run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
Print workflow graph to an SVG file (requires Graphviz dot)
cwltool --print-dot run.cwl | dot -Tsvg > run.svg
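The validation command above can be applied to every run at once. The following is a sketch under assumptions: the function name validate_runs and the CWLTOOL environment variable (an override for the runner command, defaulting to cwltool) are hypothetical and not part of cwltool or this ARC.

```shell
# Validate the run.cwl of every run directory below the given folder.
# CWLTOOL is a hypothetical override for the runner command (default: cwltool).
validate_runs() {
  runs_dir=${1:-runs}
  failed=0
  for run_cwl in "$runs_dir"/*/run.cwl; do
    [ -f "$run_cwl" ] || continue   # skip when the glob matched nothing
    if "${CWLTOOL:-cwltool}" --validate "$run_cwl" >/dev/null 2>&1; then
      echo "ok: $run_cwl"
    else
      echo "validation failed: $run_cwl"
      failed=1
    fi
  done
  return "$failed"
}
```

Called from the ARC root as validate_runs ./runs, it prints one line per run and returns a non-zero status if any document fails validation. The runner's own messages are discarded here to keep the summary short; rerun cwltool --validate on a failing document to see them.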