From b74acf24fc9026ae31158a274b9674fde7b290d7 Mon Sep 17 00:00:00 2001
From: Viktoria Petrova <vipet103@hhu.de>
Date: Mon, 20 Jan 2025 14:45:56 +0100
Subject: [PATCH] add metrics assay and protocol

---
 assays/Metrics/README.md                    |   0
 assays/Metrics/dataset/.gitkeep             |   0
 assays/Metrics/isa.assay.xlsx               | Bin 0 -> 6891 bytes
 assays/Metrics/protocols/.gitkeep           |   0
 assays/Metrics/protocols/MetricsProtocol.md |  24 ++++++++++++++++++++
 5 files changed, 24 insertions(+)
 create mode 100644 assays/Metrics/README.md
 create mode 100644 assays/Metrics/dataset/.gitkeep
 create mode 100644 assays/Metrics/isa.assay.xlsx
 create mode 100644 assays/Metrics/protocols/.gitkeep
 create mode 100644 assays/Metrics/protocols/MetricsProtocol.md

diff --git a/assays/Metrics/README.md b/assays/Metrics/README.md
new file mode 100644
index 0000000..e69de29
diff --git a/assays/Metrics/dataset/.gitkeep b/assays/Metrics/dataset/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/assays/Metrics/isa.assay.xlsx b/assays/Metrics/isa.assay.xlsx
new file mode 100644
index 0000000000000000000000000000000000000000..ca5a7eeab188f221e28c5dbbc81b96c23d671d69
GIT binary patch
literal 6891
zcmai3by(Ex(xzFuVd<3aR#Grv>1OF#y1P-NYbjAWL=XfFx?AZk0VSo8URu9Z4(hA#
z`OfplUVH7%J@Z`m%+Ad3UR4ET6ifsJ1OP&gt)O1QObZ4W5dmQr1p$Ew{;9sKlcPJt
z(cM_n#~I>g$nEW5Uz)7s*v5+=dMt5wnG1w%sHt_gEw7@NwCE6F)LVv=YG(gw^NbfE
zYcV@ru?Z1Js>qICRtLFCRjm93P);m8T1}W3V|s+mfvol<qqzXekPFa$=tAoDMwT#X
z(NTQPCqE>e>vLY*u#~(!Bo;??MO?2;lUoq~RG2eA+!X^W`M7=I5oczswK;`8o?1+y
zZfj4+GNc!Jd>e^^u=~Jh>a8o{CrU$^<vIiXqwqu;0J)bZMR8>zB36jFyfw!vVtA8R
zmf{d*5_!l88qUk*iLO|zUY^yI50E4xHbbp-!p9ssIMo)*$iaCOe8S9|6D5ca=$9AD
zif{GriQu{(?sOBJ1}uH`4G0-1K$ml|ig%QN4OBo4WTlS?kZEJm(NDGm7Lp8&<v)M5
zd4Qua;E?>hd|rHy0O#bJ;n$4TDzPK`5c-YYXA*=T&NkkdiE)H(py+!_l`K-K{wEU%
zKQaL)|K~x2zcY4)*t_vu+gxR9KU;o!9sZpH5&{C&PaE#<mZgcU%I&<^(mVclm(SB2
z<p8<Q>*?}8G7Cz!Fga3I#y;zK^kpY9UK%^r20hMYYt^cEvtK-RBUT+oh0Xbrf*7@T
zAbg;u_p6Ny9y=CVI3qg0b3S(ZishvuBdDmm5eg=e>q7HQILYy-l2Cfe5l7sk(rY8b
zHaNB(;n@;*SX{Q_e%Ae9C8VfoKzh3F#*0h75g~g=eW&$=ZPPqDE4_;zRtrVy$cH{-
zI~id*E4l2dL#u|`ll2!8g;{b#xj9uC)&v?w%8R{v%^p-Uh8U}ZF$yx(-)gDSLW3E-
zz!>Hi{39$PL;eWYjQq_SGMo`_`=6BkZGdrQ@Ndk9Ful?K&TK85S&HAyyqsL^%$%I;
zzW)-giMDO$#lN$2iTUPO&r=V?mf*(r&_MqfIi#XW(O%wK#NqN-i&BmXkc*z(Q8JbK
zUT(AhSTsUKaSZ)c4m${cP3d*x4za)B%()eQ$F#l#3KWx>X===CQ-n%zmD)9cDM3k`
z&AD&yE{GrZzS7-@wF4!}kry|6$GTti8+oznKu(!PXOTCsRNj<hjtpqirLiprHOS7u
z(2O75j9EJcg)n7>_3=<--m*ZwZ)iaqKe?o%@Xqa1r`uV!IPOb*Y`kdgd3Txf0XC`K
zhRD+s(!QlcUSs<0Zu))lWx=z%{qKv&efE|%X#YFWxK{*yr^d}10&)MdOsk{R>O$Z|
zw!sMp{$=|E!uOre$0=$~>Ad(4=8iey=O{%*nCOv;J=H4Z(*+$eVOv0=5~O=oNmb`Z
z^`Z*PWjW(`38IvexaUn72kEP2vA8mnMQ6b+ob(B6wZ#_A99@Iq=PyYjV;}*YdgIZ-
z4FFlk9=y|}5oiHH+daUV(rpU6a1ScVPZ4pcsT6mh0+l1w<-iR?do(WH&omp;Pno+J
zzC7LPj(k0_WBQ^l#lW6;&MU8)C4!Wz{ArWTvdKV2Tr>J4-(?MLS<MF89eW7JcJ!mf
z7fYLi^+q;T4twR<rUNIb+rc6y%C4m`#oO2}K|}%P{ossFIran4A{mvA#7G`@7gxy+
zYkW=L$NppPZl_ztclu$amt|R!7`1ksU(`ShmDQzNon$WfT*@R&J`ymsO2<A(U73+^
zo3p{I;QaN^B$SrmRJ$YMg&yQD`z>+j06U-$cj_>4(#m^3r#&<(ejsfR6BG|Z<90#6
z(e{j9d}Gff!;8AD9_H<yA%eHiFY)O}lm}<wIgOL#fPTm>DGd+xDwnLc0<;>qS}Nns
z0j;K5h0s91LZA3FjEAnSe7UH#zc7qPA@N((S-n#$`&6n)npbTPKOeur)jDsDzKda3
zC^)h%oGzh71cMnw8*G4N%>494AYcT;KifnG0l<&x|J<Yv5s--K|I$S0VP9~3lQ^#_
z9?skFGuTj(C4mLr(RS{)X(GQy2NSWq$LpC}N@G#e51kngi=cIOFVVb0wJNkGioZ|^
zO;v?T-g|MoG+6?DFY;V=>Jp!gi|@AA8EaQg#kiuLKk$7JM~1b?qYg{ia-S;Q5$sy;
zvA2m(MIaUNdnp!!H+s|PM+(|iE}0)fpTL6p=$fdSd!(+b{HnoDtg9aC!N(yxd>j&A
zRj!-0sVl@n!`;=!(aP;d)h4N^gz(~b?gT~^c&*o=n`6JGM<XNxW0asnNi3f*Up~q-
zOL21g0DLABK_zzJx1HW+S0lzDFWHjEdn-?*HH>hm(k4Xm?&0Tn!#TC!xXmzyXY_f<
zbRRo)=<^tjiuv+N-9Ac~>sIH8q7@)-i`_RT@%n15f-Q#C&`R3E)?Jm}&!u5t4YD1z
zBOBgMNi>Q+rlD!Va*1HTHO$b|+Na<r3%BR=Cm}eMcAcNnSIwcmiA{Y0Wk7!&#pHn_
zimL=`^hK_}r=R+1Lb^vVU57fyu`*#rkaOw`xzt67!_c%iel~V96<vV-;Z|+h$%;88
z7<V9wB2}8k0MrUz^l_c65mwKKzzkMw!2QiyC-bMK*)=Movq{Y_6sNwK)LM5vBUN^e
zj5gU2U?q#pJ4!W%brdFj+hS2Nzr6P(bxBRGHPiT<L5BjPSHv(vNIq?2#jNAP3Mz$o
zos0k4x!tWH4v?P>{wigO141k@@PXMuceTO5C%~22kJlf4W*e;HG$Ty>jXn5FnT7s-
zc8;TIuji<+P4K(zAx6egQge`?9DAf1F>rd68yU^Log3M6#uTV+f~vjs7LO>Wb0Q!@
zLjEh`v}Ssh&9JEpLVAAS&ffcD8d~Fymy;^8wp8ZafM`9-(<lcD(ftqcca~CooQ8OW
z6lLXzaR?v;j;R8ieVdbId0gc4_>^?ikz8Z^$ZUoBDL4Fq_ZBhJ7XewZK)i$yZd!fW
zIAv)&xzZ?y$rAj8E<pz3;E~4Rh?6NG6MO&jeVWi$a#0p7g%>R<R(%TjQw#YS7ZItu
ztRis)4yn9@@tN4WMS?g{AtC5oqsmeR9^1v0#CdO}g;^$5q}*z2Tb$M9KcAd9v#p7i
zY1r``>F{5s*BCYGY>ua#sUz$M=-Pmvl^gA4+!RLdNu&7q(YIPpd-vw$IFA2}(pO@C
zk5Dh+-OHxxx2CwsLLYG&W^6p`iUYkuUpdTgde(1_k@#a`vPLpfTF;bfHFnxQi6?kA
zqvt*9ady;0)g%M1UF%t91QXlQ-!f~!doWfjm)I3w&PGE|>r9ik98Y8(9)k>MV2gVC
zz_i}%c$?jyB}jQ1BoS4`+aqr!+cX6PodDZbE!g!}xQ1+3=y;49MEx!<w)S?cRI)N8
zzIsR=7t9Jrz4NJ^lnOW)#51M{`UYFtES%feV}J%7_3FI(dUBw&@myxJ-s)ABl;7#)
zWYFb>CbE;8`_phdxrk5~^lYFP7=SgV_yF&;ULG6i4lxEXf;#e6w>09XRQlp}kJq+s
z=T!WmxZaGC>NPYd;wg8K8>mAtouBYcn;O(-%yb)^e(o67mAVN%sHq1~Z?3>5sAi)J
z@%Qf(j$pl(amx;_9tPgmrRWwAZq;RkJ;3iQtM6D<e4QXjga7PN!~_$vE9;TW3=>Lo
zpF2wHc$9<Z4kh(HkB#~h-Mej7(wc>Y8l5pBs+?*sATcs&_rK7QugTH$)w}{4$n72i
zH7pOqbOD)tG}&zKPO@Rc^#M-?WJtPiq^`w_MZ6Jz+skb9psgrHATn4SR`66hg26mn
zo(rSkOME_FptM?U{SFc5oP++{BkX*<_+|M%4}By752=AQfNAEwFy7;>>6JT_)S2ET
zAu!x`#F;tt@7^`0-x5#fmsr<Fw0P`LQr^#%|IT&<`uYJsTale$+zlCw56I${sQ#W-
zc9B+LdH8z_{a(zrU5E3@qXi=p8!E|~An&&}M@+Cg5d+UXq@+Q*Hv)2@ljP7(9?*w<
z!+u@!UMUYd0v%_t@jK5Rkkhg9y04SoF>922o2mXrmqSEdWV;7vLa&i1ajmp-Q18Q1
zVA|W&XC&fe#v8=XlaG)J0mrnQc6F?AFun|?CtCdwt%INv+bGyRgv$-wYcRo|U6KNR
z%%lC4OQrY*r~hXGhGd#9@=h@ZE`OA*OH*33QDycD|8_x~oSWIot+6*jGiX-(SPYLI
zAt;%qx!!8m?bCvW@<EV5Eaj?RdcK_t+?1BjKJPC+X9m#KQiPA#@!^K1_1&{O8lS0u
zepVqu>VxE1e0ag+npDJ8OhI_GaVnR*9c?x|_S!mmuA{BT2NjnE`G!-wA&WW60vf%9
z;ze>dPlMk=7B_!E-By&PkJ4(JcxilBXgY|2EDVu~X0x4aYtvF=XCo@*UgI90zk|aH
z8by)B5(>;<992j%t*VdUVicv{8T0H;Vi7*|eXh|bZnDiPmZ`4v^>QfWoB_$DQcQ$y
zj}r0CBzDj!N^2f-c$^>Hn5>YJ>IH8qzk5aigS2)PV-51(m@cFRDBt2l8$oZ~?GVl9
zYmRPFb!^N4I?II^=()*xbJyr-8z|@;l)f;iDyd@48%5|>N%rR-oVW3lXQ_@$UOy}5
z%*4QZe;F<6F~?nzB8Y_>?`ukW?j7zQr=>O6Grm%1*&cYnwzq&T6Q?JuQd(T7?e1w{
zayve~OxvUylq({&E2s<A<IvG5XpGf&@bpGWm<zQPDp0rJd9R#%0V6$505i+oC;RYO
zlY)<h>mYZ#iu%=}?~VEe7)W{ijRH=!Ay=hgwceLnNc~KaIMTB&I?nnJvjHbo*#j$2
zVCyIEIs<B0D_@;(R_N-X>lZoux)_%c8_!ara2~wYUKe@UuUtA-Qgwc#aK<J%&8Spk
zW1_*gdhTfUUir5!!8ncAqh-FzrIW1VJy|-gww^N0!<^%ppy%6TGqG8tfr8aSCH{N$
zyiPrv)6Lg2?rM0Uiajv$gm22o@BxPNpMAsK#~$)y(RiM02w#th!*-52IOjBwaX2F%
z*xE#C7NxTbC3$W>kS{qyCLc{uI^PjPBP8^gE`asCe+A7xI1{K{&zI)u6Et{9jIJYj
zPm!c<+4jletMu7)^D*6G5PO9go8)V>ISbmvcQhz|(*l+R_?k(Jjg?1<@dZ-?^64Cg
z0;r6Sg%kDG(x`20+pRy;$g9T>9h)<->3Gv|LiVKa<@??A{X$cOD*by=tHZNG6F5_j
z#h&AQ#B`oU70}D!j!^P08B%x^C1(WuxYlEYn7mfJi0ZfPaho)RlnCRs6C*0YUc2c6
zSltFfskx>No(jkj3-7gh-u!YF=cY?nPofN#Lwu-uUpI<7d#z{=8M`{wtz3~dwA-#z
zQAQVhfZI~{ht1o#Nc`<;h`LpCohasA>6Oc5!~tL=O2<^So9;tECtRVYYR*DvirL4|
zt<2nTx|jjxL!7aXwi<0(8L%Z*NTkW*P>ir0SXpHqv*VE7)EXz-&5t&BLuF~KVh~NR
zm^+f{U(FR`dFs%Q_ZUo{^LIM$ecXW311Z`O1>C8=WeC5Xi@j^;(JhsGE<<*h6g4Yf
za1!lDX(p57&>Q-B1260$*fVJ|aIEgVD;O1c<E1sn){5ZMMYgYfpie!!$wytr71ZSg
zC#{*9`1KSe<#@5%`ePJXi`iZ;q0vW%G`?;@LdTYqocpB1w%d2hXU2Uqc`k;UFFjzk
zb9<KW!_nd)jC8#*(bWA?N_tU)w;o*verti%9^VC?XXmvX@7qc3TJmN39h}_S-)99(
z?%8{geI8CdUW;uGQ3*ETr{&^$qUhYoz3nuhCCaq+6+*D=O=xk}R6ciIp#Q#vSvZ-i
zx;i=k`NY@#xg8P$-<z4?&*1p*I{s;9>g@b|XU<H#?KH!S9lCQ)680rF!;v*iTAMGQ
z(pUElg8a-<OSe0xW(rDzP~zoTH}2xCP33xtM7+Ulw&EGt)hHUG5Az&v43hJton#y?
zn+T!xPCD6lZ(}#1k%?<BMld-Zp;yY&o0&yxE~BVCwoY9Ff%v&RrK81RJ|iJxh^zeD
z5Nis%UGAroxDvgaZyOT4O%8=7bX+ir3p=z1zftO;3+xV$egV2=X&allPav#jkKs+^
zOXn^uZnoxVxJqi?w2od80!wbFcl#7op;48&y>@INU_YHuJqEN72%8Yuj2{%=Z>J}*
z2wE*Qj9w(e#}$3NMi1WQRIbwCvn7j63dCIx)w{!dqi}_F^;2zC_Mxkwg^Nuxk=(?S
z2(ez=_50-yJHF8w4Mg5Y<1?>7k$MP-+rb~CJ?99dt^-K!s48zDif_5>DsR*u@f-rZ
z7I1-s3QhLK4p<nQ7u=(aiZ~wGHc6Bho1_;so8_yQCZdsrqaQXu_vc;crYzma@B=j&
z)_g_2PV80ZImO<3X9UmK61@8m{GKs$Cs)Xil&SZ^1tt831CoTD$RtfV;^##uFId>=
z)le2Kw;b%!rcxSj^v^AcrwQk=IEsmtF9nDiQ@!LrLEOyOc1yg&hav64l|MbuJ>yIM
zDXE2CBlxjiYlh%_S6)qbZFf!8u305wG@-$KY;tc+Y7%u`H>-y~I*D6?4y%e|MJdqN
zx_XSa$x1)RhH7pYGae%>rYbC#oVweD?Zny!w`vO;rBGD{6_Z*iFC?oHQ01zFb!$Zi
zEN_o(@HneWEx_HCc*28vyj*$XTxIS3#%U#`P;#*$3A*}nu+=P*tV6U&rB0pp&Bm@u
zaS&(j+KB8Lqu0q9T3moHiX`cL&11C7OkdZ;-PI{s4LgnuE0!0H#IjSiGlyg3KC1lT
z96?`VQTB#1{4Nzfw>zGbOQQs`fSX&14kZ2j^?2kH-mN<TSxhNjC4p0z!Bjz25-J;m
zVdJ}9&D1Jr?fywY;0vUGCc~0THqZc`40m`k@UF(d@48cLqLK<++Y8&dh+;i1<$8oT
zJ}nQ3_q<K@fMZ|XD>{Xzb6xkFgya1Ld@^pAtq!QM7`RkrB7*VW$S1A0Pgm7*G@6DT
zb-Jr33qO<+C$|WhG_-?Ya;}EPC6WQ+_Ki}20WZz&08nlpzq((Av>WZvp8V#s&<@90
z2r+|<tY3$Pl4t5_Z(p$Iz9Q>V5U|OivJL&!t!VB{dIB6KZ52Oi>pbugZAPf~TAQK|
zCGgGoy~4l{kf~Pfhnuvfb~T$CE)oeo=jaZVQCa@!D=}xJr-gG2gHemKu8p>pt;Yvr
zs3E)9o-fSud8OCi3}d8&T$Prj_H^hx7n(-HGVAX)#e**o2zk^5r(~=~I9{f$T@>`b
zY+%iwPsCVK7`v}35gtCPMjfEfSC(!mO5~`Sk0JR0aJEOp6h|dwLxX|ECNvk5zG902
zA(DTnR${@+xL(l3<MfU)Y2XY?RRIx+7~#JQ;_z(zcK$^k|K;HCj`>~3y*l5UViY{X
z*V^MhgZ`=OUZW5Y%6?*gh>!nYu<!Ec)%hN*1zt)&#L(+Vf8XxA9C~%W0~W3T|9-(g
z<<V;({BHz!Sa_T(*zZOAKkxfpMZ7xS!?VE^7r38)Xo}ZYzv@_js*2ZW1O%<Wu)j3M
zUyNOO`BUNf<pmG@#|{3{dwzNNUlgVTet5W6DgJr=KXr;<f!ltE{ZcJ{#r^kP@t1=j
zc+Y@y=k{}R{004ASEeKUZCL+;T}AwJs{e|3|99AbX8fP%KPSvD^eOzGrk@+a^}P8L
j{%2481rH<nSt{4cys84~kKY<TeEWC@uSh(y@8<spkz!Kr

literal 0
HcmV?d00001

diff --git a/assays/Metrics/protocols/.gitkeep b/assays/Metrics/protocols/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/assays/Metrics/protocols/MetricsProtocol.md b/assays/Metrics/protocols/MetricsProtocol.md
new file mode 100644
index 0000000..c09714a
--- /dev/null
+++ b/assays/Metrics/protocols/MetricsProtocol.md
@@ -0,0 +1,24 @@
+## Metrics
+
+Five metrics were used to evaluate model performance: the Poisson loss, the Pearson correlation coefficient (Pearson’s r), precision, recall, and F1.
+
+MACS (Zhang et al. 2008), the most prominent peak caller for ChIP-seq data, which has also been frequently used for ATAC-seq data (Hiranuma et al. 2017, Thibodeau et al. 2018, Hentges et al. 2022), assumes that ChIP-seq read coverage is Poisson distributed. Therefore, PyTorch’s Poisson negative log likelihood loss function (Poisson loss) was used as the loss function for all models (Equation 1).
+ 
+(1) $$loss=\frac{1}{n} \sum_{i=1}^n \left(e^{x_i} - y_i \ast x_i\right)$$
+
+The predictions and the targets are denoted by $x$ and $y$, respectively, individual samples are indexed by $i$, and the sample size is denoted by $n$
+(https://pytorch.org/docs/stable/generated/torch.nn.PoissonNLLLoss.html). In this version of the Poisson loss (PyTorch’s default setting `log_input=True`), the network’s output is interpreted as the logarithm of the predicted coverage, so the actual predictions are the exponential of the network’s output. Since the exponential function only yields positive real numbers, the predictions can never be negative, consistent with ATAC- and ChIP-seq read coverage.
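+
+The following Python sketch evaluates Equation 1 with NumPy. It assumes the formula documented for PyTorch’s `PoissonNLLLoss` with `log_input=True`, `full=False` and `reduction="mean"`, which drops the constant $\log(y_i!)$ term of the full negative log likelihood, so the loss value can be negative. The function name `poisson_loss_log_input` and the example values are hypothetical and are not taken from Predmoter.
+
+```python
+import numpy as np
+
+
+def poisson_loss_log_input(log_pred, target):
+    """Mean Poisson loss of Equation 1: mean over i of exp(x_i) - y_i * x_i.
+
+    Follows the formula PyTorch documents for PoissonNLLLoss with
+    log_input=True, full=False and reduction="mean" (the constant
+    log(y_i!) term is dropped, so the value can be negative).
+    """
+    x = np.asarray(log_pred, dtype=float)
+    y = np.asarray(target, dtype=float)
+    return float(np.mean(np.exp(x) - y * x))
+
+
+# Hypothetical network outputs (log space) and observed read coverage.
+network_output = np.array([0.0, np.log(2.0), np.log(5.0)])
+coverage = np.array([1.0, 3.0, 4.0])
+
+loss = poisson_loss_log_input(network_output, coverage)
+
+# The actual coverage predictions are the exponential of the network output,
+# i.e. approximately [1, 2, 5]; they are always strictly positive.
+predicted_coverage = np.exp(network_output)
+```
+
+For a target $y_i > 0$, the per-sample term $e^{x_i} - y_i x_i$ is smallest at $x_i = \log y_i$, so minimizing this loss pushes the exponentiated output toward the observed coverage.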
+
+To express the “accuracy” of the model’s predictions as a more human-readable number than the Poisson loss, Pearson’s r, which measures the linear correlation between two variables, was chosen (Equation 2).
+ 
+(2) $$r=\frac{\sum_{i=1}^n (x_i - \overline{x}) (y_i - \overline{y})}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2 \sum_{i=1}^n (y_i - \overline{y})^2} + \epsilon}$$
+
+The sample size is denoted by $n$, the individual samples of the predictions $(x)$ and the targets $(y)$ are indexed by $i$, and $\overline{x}$ and $\overline{y}$ are their respective means. The small constant $\epsilon = 10^{-8}$ is added to the denominator to avoid a division by zero. A value of 1 represents a perfect positive linear relationship: Predmoter’s predictions would be a positive linear function of the experimental NGS coverage data, though not necessarily identical to it, because Pearson’s r does not change when the predictions are multiplied by a positive constant or shifted by a constant. A value of 0 means no linear relationship between the predictions and targets. Finally, a value of −1 represents a perfect negative linear relationship.
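+
+A minimal NumPy sketch of Equation 2 follows, with $\epsilon = 10^{-8}$ added to the denominator as described above. The function name `pearson_r` and the example arrays are hypothetical. The example shows that a positively scaled and shifted copy of the targets still reaches $r \approx 1$, that negating the targets gives $r \approx -1$, and that the $\epsilon$ term makes a constant prediction return 0 instead of dividing by zero.
+
+```python
+import numpy as np
+
+
+def pearson_r(pred, target, eps=1e-8):
+    """Pearson's r following Equation 2, with eps added to the denominator."""
+    x = np.asarray(pred, dtype=float)
+    y = np.asarray(target, dtype=float)
+    x_centered = x - x.mean()
+    y_centered = y - y.mean()
+    numerator = np.sum(x_centered * y_centered)
+    denominator = np.sqrt(np.sum(x_centered ** 2) * np.sum(y_centered ** 2)) + eps
+    return float(numerator / denominator)
+
+
+# Hypothetical target coverage for five positions.
+target = np.array([0.0, 2.0, 5.0, 9.0, 4.0])
+
+r_identical = pearson_r(target, target)              # approximately 1
+r_rescaled = pearson_r(3.0 * target + 10.0, target)  # also approximately 1
+r_inverted = pearson_r(-target, target)              # approximately -1
+r_constant = pearson_r(np.full(5, 7.0), target)      # 0 instead of nan
+```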
+
+Precision, recall, and F1 were used to compare predicted peaks to experimental peaks for both test species (Equations 3–5). An F1 score of 1 indicates that the predicted peaks are at exactly the same positions as the experimental peaks; the lowest possible score is 0. Precision, recall, and F1 were calculated base-wise: bases within called peaks were labeled 1, all other base pairs 0. A confusion matrix containing the sums of True Positives (TP), False Positives (FP), and False Negatives (FN) for the two classes, peak and no peak, was computed for the average predicted coverage of both strands. Precision and recall were also used to plot precision-recall curves (PRC). The area under the precision-recall curve (AUPRC) was calculated using scikit-learn (Pedregosa et al. 2011). Flagged sequences were excluded from the calculations (see Section 2.1.2). The baseline AUPRC is equal to the fraction of positives, i.e. the percentage of peaks in the training set (Saito and Rehmsmeier 2015). The peak percentages were calculated using the compute_peak_f1.py script in Predmoter’s “side_scripts” and are listed in Supplementary Table S8.
+
+(3) $$precision = \frac{TP}{TP+FP}$$
+
+(4) $$recall = \frac{TP}{TP+FN}$$
+
+(5) $$F_1 = 2 \ast \frac{precision \ast recall}{precision+recall}$$
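+
+The base-wise computation of Equations 3–5 can be sketched in Python as follows. This is not Predmoter’s compute_peak_f1.py: the helper names `intervals_to_mask` and `basewise_scores`, the use of half-open peak intervals, the optional `keep` mask standing in for the exclusion of flagged sequences, and returning 0.0 when a denominator is zero are all assumptions made for illustration.
+
+```python
+from typing import Dict, Optional, Sequence, Tuple
+
+import numpy as np
+
+
+def intervals_to_mask(length: int, peaks: Sequence[Tuple[int, int]]) -> np.ndarray:
+    """Hypothetical helper: mark every base in a half-open interval [start, end) as a peak."""
+    mask = np.zeros(length, dtype=bool)
+    for start, end in peaks:
+        mask[start:end] = True
+    return mask
+
+
+def basewise_scores(
+    pred: np.ndarray, target: np.ndarray, keep: Optional[np.ndarray] = None
+) -> Dict[str, float]:
+    """Hypothetical helper: base-wise TP/FP/FN, precision, recall and F1."""
+    pred = np.asarray(pred, dtype=bool)
+    target = np.asarray(target, dtype=bool)
+    if keep is not None:  # drop excluded positions (e.g. flagged sequences)
+        keep = np.asarray(keep, dtype=bool)
+        pred, target = pred[keep], target[keep]
+    tp = int(np.sum(pred & target))
+    fp = int(np.sum(pred & ~target))
+    fn = int(np.sum(~pred & target))
+    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
+    precision = tp / (tp + fp) if tp + fp else 0.0
+    recall = tp / (tp + fn) if tp + fn else 0.0
+    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
+    return {"tp": tp, "fp": fp, "fn": fn,
+            "precision": precision, "recall": recall, "f1": f1}
+
+
+# Hypothetical 20 bp region: experimental vs. predicted peaks.
+experimental = intervals_to_mask(20, [(2, 8), (12, 16)])
+predicted = intervals_to_mask(20, [(4, 10), (12, 16)])
+scores = basewise_scores(predicted, experimental)
+```
+
+Here the predicted peak covering bases 4 to 9 overlaps the experimental peak covering bases 2 to 7 by four bases, and the second peak matches exactly, giving TP = 8, FP = 2 and FN = 2, hence precision = recall = F1 = 0.8.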
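+
+For the AUPRC, the sketch below uses scikit-learn’s `precision_recall_curve` together with `auc`, and `average_precision_score` as an alternative summary; which of these functions Predmoter uses is not stated here, and the labels and scores are synthetic. It illustrates the baseline described above: scores that carry no information about the labels give an AUPRC close to the fraction of positives, whereas a perfect ranking gives 1.
+
+```python
+import numpy as np
+from sklearn.metrics import auc, average_precision_score, precision_recall_curve
+
+rng = np.random.default_rng(0)
+
+# Synthetic base-wise labels with roughly 10% peak bases (1) and 90% other bases (0).
+labels = (rng.random(100_000) < 0.10).astype(int)
+baseline = labels.mean()  # fraction of positives = expected AUPRC of an uninformative model
+
+# Uninformative predictions: random scores unrelated to the labels.
+random_scores = rng.random(labels.size)
+precision, recall, _ = precision_recall_curve(labels, random_scores)
+auprc_random = auc(recall, precision)  # close to `baseline`
+
+# A perfect ranking of peak bases above all other bases.
+auprc_perfect = average_precision_score(labels, labels.astype(float))  # 1.0
+```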
\ No newline at end of file
-- 
GitLab