CWL
Data analysis in this ARC is packaged and made reusable via Common Workflow Language (CWL). For details, visit the DataPLANT knowledgebase.
Briefly, every data analysis step (runs) is described with a run.cwl document. The run.cwl points to (i.e. executes) one or more workflows (stored as workflow.cwl). The input parameters required for the workflow.cwl are documented in the accompanying run.yml. A workflow.cwl can be a single command line tool or a more complex workflow pipeline that references and combines other *.cwl documents.
...
├── runs
│   ├── fastqc
│   │   ├── run.cwl
│   │   └── run.yml
│   ...
├── studies
│   ├── ...
│
└── workflows
    ├── fastqc
    │   ├── collectFilesInDir.cwl
    │   ├── fastqc.cwl
    │   └── workflow.cwl
    ...
flowchart TD
    workflowcwl --- runcwl
    subgraph r["runs/fastqc/"]
        runcwl(("run.cwl"))
        runyml(("run.yml"))
    end
    i[input: DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz] --- runyml
    r ---> o[output: DB_097_CAMMD_CAGATC_L001_R1_001_fastqc.html]
    subgraph "workflows/fastqc"
        workflowcwl(("workflow.cwl"))
    end
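To make the relationship between run.cwl, run.yml and workflow.cwl more concrete, here is a minimal shell sketch that scaffolds a hypothetical run. The function name `scaffold_run`, the input name `fastq`, the output name `report`, the step name and the placeholder file path are all assumptions made for illustration; the actual workflows/fastqc/workflow.cwl in this ARC may expose different inputs and outputs.

```shell
#!/usr/bin/env bash
# Sketch only: scaffolds a hypothetical run that wraps workflows/fastqc/workflow.cwl.
# The input/output names (fastq, report) are assumptions, not this ARC's real interface.
set -eu

scaffold_run() {
  # $1: ARC root directory, $2: run name (hypothetical)
  local arc_root="$1" run_name="$2"
  local run_dir="$arc_root/runs/$run_name"
  mkdir -p "$run_dir"

  # run.cwl: a one-step workflow that executes the packaged workflow.cwl
  cat > "$run_dir/run.cwl" <<'EOF'
cwlVersion: v1.2
class: Workflow
inputs:
  fastq: File
outputs:
  report:
    type: File
    outputSource: fastqc/report
steps:
  fastqc:
    run: ../../workflows/fastqc/workflow.cwl
    in:
      fastq: fastq
    out: [report]
EOF

  # run.yml: the concrete parameter values for this run (placeholder path)
  cat > "$run_dir/run.yml" <<'EOF'
fastq:
  class: File
  path: path/to/sample.fastq.gz
EOF
}

# Usage (from the ARC root): scaffold_run . example
```

The split mirrors the description above: run.cwl only wires a run to a reusable workflow, while run.yml carries the values specific to that run.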
Setup and dependencies
Again, for details, see the DataPLANT knowledgebase mentioned above.
Executing CWL documents requires a CWL runner, e.g. cwltool.
Software and package dependencies are ideally covered by Docker or Conda and described in the hints / requirements sections of the CWL documents (e.g. DockerRequirement and / or SoftwareRequirement).
Some workflows may have additional dependencies (e.g. a local installation of R or F#, or packages therein) if they are not yet packaged in a fully reusable way.
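Before running anything, it can help to check which of these tools are installed. The following is a small sketch; the helper names `have_cmd` and `check_tools` are made up for this example, and which tools a given workflow actually needs depends on its hints / requirements sections.

```shell
#!/usr/bin/env bash
# Sketch: report which of the tools mentioned above are available on PATH.

# Returns 0 if the given command is found on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one "name: found" or "name: missing" line per tool.
check_tools() {
  for tool in "$@"; do
    if have_cmd "$tool"; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

# cwltool is the runner; docker and conda matter only if the CWL documents rely on them.
check_tools cwltool docker conda
```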
Default cwltool commands
Here's a list of frequently used cwltool commands to validate or execute runs and workflows.
Validate document
cwltool --validate run.cwl
Execute workflow in ./runs/*
cwltool run.cwl run.yml
Capture log and run in background
cwltool run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
Capture log, run in parallel and in background
cwltool --parallel run.cwl run.yml > $(date +"%Y-%m-%d_%H-%M")-run.log 2>&1 &
Print workflow graph to an SVG file (requires Graphviz dot)
cwltool --print-dot run.cwl | dot -Tsvg > run.svg
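The commands above operate on a single run directory. As a sketch of how they might be applied across all runs, the helpers below list every directory under runs that contains both run.cwl and run.yml, validate each one, and build the same timestamped log name used above. The function names (`log_name`, `list_runs`, `validate_runs`) are made up for this example, and the sketch assumes the directory layout shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: helpers for validating every run under a runs directory.
# Assumes each run directory holds run.cwl and run.yml, as in the layout above.

# Builds a timestamped log file name, mirroring the date format used above.
log_name() {
  echo "$(date +"%Y-%m-%d_%H-%M")-run.log"
}

# Lists run directories (one per line) that contain both run.cwl and run.yml.
list_runs() {
  local runs_root="$1" dir
  for dir in "$runs_root"/*/; do
    if [ -f "${dir}run.cwl" ] && [ -f "${dir}run.yml" ]; then
      echo "${dir%/}"
    fi
  done
}

# Validates each run with cwltool; stops at the first run that fails validation.
validate_runs() {
  local runs_root="$1" run
  list_runs "$runs_root" | while IFS= read -r run; do
    (cd "$run" && cwltool --validate run.cwl) || exit 1
  done
}

# Usage (from the ARC root, with cwltool installed):
#   validate_runs ./runs
#   (cd runs/fastqc && cwltool run.cwl run.yml > "$(log_name)" 2>&1 &)
```

Each run is validated from inside its own directory so that relative paths in run.cwl resolve the same way as when the run is executed.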