diff --git a/assays/Metrics/README.md b/assays/Metrics/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assays/Metrics/dataset/.gitkeep b/assays/Metrics/dataset/.gitkeep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assays/Metrics/isa.assay.xlsx b/assays/Metrics/isa.assay.xlsx
new file mode 100644
index 0000000000000000000000000000000000000000..ca5a7eeab188f221e28c5dbbc81b96c23d671d69
Binary files /dev/null and b/assays/Metrics/isa.assay.xlsx differ
diff --git a/assays/Metrics/protocols/.gitkeep b/assays/Metrics/protocols/.gitkeep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assays/Metrics/protocols/MetricsProtocol.md b/assays/Metrics/protocols/MetricsProtocol.md
new file mode 100644
index 0000000000000000000000000000000000000000..c09714a2f19afcff40d0c8fb83023a4d04c2e47a
--- /dev/null
+++ b/assays/Metrics/protocols/MetricsProtocol.md
@@ -0,0 +1,24 @@
+## Metrics
+
+Five metrics were used to evaluate model performance: the Poisson loss, the Pearson correlation coefficient (Pearson’s r), precision, recall, and F1.
+
+The most prominent peak caller for ChIP-seq data, MACS (Zhang et al. 2008), which is also frequently used for ATAC-seq data (Hiranuma et al. 2017, Thibodeau et al. 2018, Hentges et al. 2022), assumes that ChIP-seq coverage data is Poisson distributed. Therefore, PyTorch’s Poisson negative log-likelihood loss function (Poisson loss) was used as the loss function for all models (Equation 1).
+ 
+(1) $$\mathrm{loss}=\frac{1}{n} \sum_{i=1}^n \left( e^{x_i} - y_i \cdot x_i \right)$$
+
+The individual samples of the predictions $x$ and the targets $y$ are indexed with $i$, and the sample size is denoted with $n$ (https://pytorch.org/docs/stable/generated/torch.nn.PoissonNLLLoss.html). With this version of the Poisson loss, the network outputs log-scale predictions; the actual, desired predictions are therefore the exponential of the network’s output. The exponential function yields only positive real numbers, matching the ATAC- and ChIP-seq read coverage.
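Equation 1 can be sketched in a few lines of NumPy; this is a minimal illustration of the log-input Poisson NLL, not Predmoter's code, and the function names are invented for this example.

```python
import numpy as np

def poisson_nll_log_input(x, y):
    """Poisson NLL for log-scale predictions x and targets y (Equation 1).

    Mirrors the default behavior of torch.nn.PoissonNLLLoss
    (log_input=True, full=False, reduction='mean'):
    mean(exp(x) - y * x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(np.exp(x) - y * x)

# The actual coverage predictions are the exponential of the network output,
# which is always positive, like read coverage:
log_pred = np.array([0.0, 1.0, 2.0])
coverage_pred = np.exp(log_pred)
```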
+
+To translate the Poisson loss into a human-readable measure of the “accuracy” of the model’s predictions, Pearson’s r was chosen (Equation 2); it measures the linear correlation between two variables.
+ 
+(2) $$r=\frac{\sum_{i=1}^n (x_i - \overline{x}) (y_i - \overline{y})}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2 \sum_{i=1}^n (y_i - \overline{y})^2} + \epsilon}$$
+
+The sample size is denoted with $n$; the individual samples of the predictions $x$ and the targets $y$ are indexed with $i$. The additional epsilon ($\epsilon$) equals 1e-8 and is used to avoid a division by zero. A value of 1 represents a perfect positive linear relationship, meaning Predmoter’s predictions and the experimental NGS coverage data would be identical. A value of 0 means no linear relationship between the predictions and targets. Finally, a value of −1 represents a perfect negative linear relationship.
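Equation 2 with the stabilizing epsilon can be sketched as follows (a minimal NumPy illustration; the function name is invented for this example):

```python
import numpy as np

def pearson_r(x, y, eps=1e-8):
    """Pearson correlation with an epsilon in the denominator (Equation 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    # eps avoids division by zero for constant inputs
    return np.sum(xc * yc) / (np.sqrt(np.sum(xc**2) * np.sum(yc**2)) + eps)
```

Perfectly correlated inputs score (almost exactly) 1, anti-correlated inputs −1; the epsilon makes the result fall marginally short of the exact value.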
+
+Precision, recall, and F1 were used to compare predicted peaks to experimental peaks for both test species (Equations 3–5). An F1 score of 1 indicates that the predicted peaks are at the same positions as the experimental peaks; the lowest possible score is 0. Precision, recall, and F1 were calculated base-wise: called peaks were denoted with 1, all other base pairs with 0. A confusion matrix containing the sums of true positives (TP), false positives (FP), and false negatives (FN) for the two classes, peak and no peak, was computed for the average predicted coverage of both strands.
+
+Precision and recall were also utilized to plot precision-recall curves (PRC). The area under the precision-recall curve (AUPRC) was calculated using scikit-learn (Pedregosa et al. 2011). Flagged sequences were excluded from the calculations (see Section 2.1.2). The baseline AUPRC is equal to the fraction of positives, i.e. the percentage of peaks in the training set (Saito and Rehmsmeier 2015). The peak percentages were calculated using Predmoter’s compute_peak_f1.py script in “side_scripts.” The percentages are listed in Supplementary Table S8.
+
+(3) $$precision = \frac{TP}{TP+FP}$$
+
+(4) $$recall = \frac{TP}{TP+FN}$$
+
+(5) $$F_1 = 2 \cdot \frac{precision \cdot recall}{precision+recall}$$
\ No newline at end of file
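The base-wise computation of Equations 3–5 can be sketched as follows; this is a minimal illustration, not Predmoter's compute_peak_f1.py, and the function name is invented for this example.

```python
import numpy as np

def base_wise_prf1(pred, target):
    """Base-wise precision, recall, and F1 (Equations 3-5).

    pred, target: binary arrays with 1 at peak base pairs and 0 elsewhere.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)   # peak predicted and present
    fp = np.sum(pred & ~target)  # peak predicted but absent
    fn = np.sum(~pred & target)  # peak missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example with 2 TP, 1 FP, and 1 FN:
p, r, f = base_wise_prf1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```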