From b74acf24fc9026ae31158a274b9674fde7b290d7 Mon Sep 17 00:00:00 2001 From: Viktoria Petrova <vipet103@hhu.de> Date: Mon, 20 Jan 2025 14:45:56 +0100 Subject: [PATCH] add metrics assay and protocol --- assays/Metrics/README.md | 0 assays/Metrics/dataset/.gitkeep | 0 assays/Metrics/isa.assay.xlsx | Bin 0 -> 6891 bytes assays/Metrics/protocols/.gitkeep | 0 assays/Metrics/protocols/MetricsProtocol.md | 24 ++++++++++++++++++++ 5 files changed, 24 insertions(+) create mode 100644 assays/Metrics/README.md create mode 100644 assays/Metrics/dataset/.gitkeep create mode 100644 assays/Metrics/isa.assay.xlsx create mode 100644 assays/Metrics/protocols/.gitkeep create mode 100644 assays/Metrics/protocols/MetricsProtocol.md diff --git a/assays/Metrics/README.md b/assays/Metrics/README.md new file mode 100644 index 0000000..e69de29 diff --git a/assays/Metrics/dataset/.gitkeep b/assays/Metrics/dataset/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/assays/Metrics/isa.assay.xlsx b/assays/Metrics/isa.assay.xlsx new file mode 100644 index 0000000000000000000000000000000000000000..ca5a7eeab188f221e28c5dbbc81b96c23d671d69 GIT binary patch literal 6891 zcmai3by(Ex(xzFuVd<3aR#Grv>1OF#y1P-NYbjAWL=XfFx?AZk0VSo8URu9Z4(hA# z`OfplUVH7%J@Z`m%+Ad3UR4ET6ifsJ1OP>)O1QObZ4W5dmQr1p$Ew{;9sKlcPJt z(cM_n#~I>g$nEW5Uz)7s*v5+=dMt5wnG1w%sHt_gEw7@NwCE6F)LVv=YG(gw^NbfE zYcV@ru?Z1Js>qICRtLFCRjm93P);m8T1}W3V|s+mfvol<qqzXekPFa$=tAoDMwT#X z(NTQPCqE>e>vLY*u#~(!Bo;??MO?2;lUoq~RG2eA+!X^W`M7=I5oczswK;`8o?1+y zZfj4+GNc!Jd>e^^u=~Jh>a8o{CrU$^<vIiXqwqu;0J)bZMR8>zB36jFyfw!vVtA8R zmf{d*5_!l88qUk*iLO|zUY^yI50E4xHbbp-!p9ssIMo)*$iaCOe8S9|6D5ca=$9AD zif{GriQu{(?sOBJ1}uH`4G0-1K$ml|ig%QN4OBo4WTlS?kZEJm(NDGm7Lp8&<v)M5 zd4Qua;E?>hd|rHy0O#bJ;n$4TDzPK`5c-YYXA*=T&NkkdiE)H(py+!_l`K-K{wEU% zKQaL)|K~x2zcY4)*t_vu+gxR9KU;o!9sZpH5&{C&PaE#<mZgcU%I&<^(mVclm(SB2 z<p8<Q>*?}8G7Cz!Fga3I#y;zK^kpY9UK%^r20hMYYt^cEvtK-RBUT+oh0Xbrf*7@T zAbg;u_p6Ny9y=CVI3qg0b3S(ZishvuBdDmm5eg=e>q7HQILYy-l2Cfe5l7sk(rY8b 
zHaNB(;n@;*SX{Q_e%Ae9C8VfoKzh3F#*0h75g~g=eW&$=ZPPqDE4_;zRtrVy$cH{- zI~id*E4l2dL#u|`ll2!8g;{b#xj9uC)&v?w%8R{v%^p-Uh8U}ZF$yx(-)gDSLW3E- zz!>Hi{39$PL;eWYjQq_SGMo`_`=6BkZGdrQ@Ndk9Ful?K&TK85S&HAyyqsL^%$%I; zzW)-giMDO$#lN$2iTUPO&r=V?mf*(r&_MqfIi#XW(O%wK#NqN-i&BmXkc*z(Q8JbK zUT(AhSTsUKaSZ)c4m${cP3d*x4za)B%()eQ$F#l#3KWx>X===CQ-n%zmD)9cDM3k` z&AD&yE{GrZzS7-@wF4!}kry|6$GTti8+oznKu(!PXOTCsRNj<hjtpqirLiprHOS7u z(2O75j9EJcg)n7>_3=<--m*ZwZ)iaqKe?o%@Xqa1r`uV!IPOb*Y`kdgd3Txf0XC`K zhRD+s(!QlcUSs<0Zu))lWx=z%{qKv&efE|%X#YFWxK{*yr^d}10&)MdOsk{R>O$Z| zw!sMp{$=|E!uOre$0=$~>Ad(4=8iey=O{%*nCOv;J=H4Z(*+$eVOv0=5~O=oNmb`Z z^`Z*PWjW(`38IvexaUn72kEP2vA8mnMQ6b+ob(B6wZ#_A99@Iq=PyYjV;}*YdgIZ- z4FFlk9=y|}5oiHH+daUV(rpU6a1ScVPZ4pcsT6mh0+l1w<-iR?do(WH&omp;Pno+J zzC7LPj(k0_WBQ^l#lW6;&MU8)C4!Wz{ArWTvdKV2Tr>J4-(?MLS<MF89eW7JcJ!mf z7fYLi^+q;T4twR<rUNIb+rc6y%C4m`#oO2}K|}%P{ossFIran4A{mvA#7G`@7gxy+ zYkW=L$NppPZl_ztclu$amt|R!7`1ksU(`ShmDQzNon$WfT*@R&J`ymsO2<A(U73+^ zo3p{I;QaN^B$SrmRJ$YMg&yQD`z>+j06U-$cj_>4(#m^3r#&<(ejsfR6BG|Z<90#6 z(e{j9d}Gff!;8AD9_H<yA%eHiFY)O}lm}<wIgOL#fPTm>DGd+xDwnLc0<;>qS}Nns z0j;K5h0s91LZA3FjEAnSe7UH#zc7qPA@N((S-n#$`&6n)npbTPKOeur)jDsDzKda3 zC^)h%oGzh71cMnw8*G4N%>494AYcT;KifnG0l<&x|J<Yv5s--K|I$S0VP9~3lQ^#_ z9?skFGuTj(C4mLr(RS{)X(GQy2NSWq$LpC}N@G#e51kngi=cIOFVVb0wJNkGioZ|^ zO;v?T-g|MoG+6?DFY;V=>Jp!gi|@AA8EaQg#kiuLKk$7JM~1b?qYg{ia-S;Q5$sy; zvA2m(MIaUNdnp!!H+s|PM+(|iE}0)fpTL6p=$fdSd!(+b{HnoDtg9aC!N(yxd>j&A zRj!-0sVl@n!`;=!(aP;d)h4N^gz(~b?gT~^c&*o=n`6JGM<XNxW0asnNi3f*Up~q- zOL21g0DLABK_zzJx1HW+S0lzDFWHjEdn-?*HH>hm(k4Xm?&0Tn!#TC!xXmzyXY_f< zbRRo)=<^tjiuv+N-9Ac~>sIH8q7@)-i`_RT@%n15f-Q#C&`R3E)?Jm}&!u5t4YD1z zBOBgMNi>Q+rlD!Va*1HTHO$b|+Na<r3%BR=Cm}eMcAcNnSIwcmiA{Y0Wk7!&#pHn_ zimL=`^hK_}r=R+1Lb^vVU57fyu`*#rkaOw`xzt67!_c%iel~V96<vV-;Z|+h$%;88 z7<V9wB2}8k0MrUz^l_c65mwKKzzkMw!2QiyC-bMK*)=Movq{Y_6sNwK)LM5vBUN^e zj5gU2U?q#pJ4!W%brdFj+hS2Nzr6P(bxBRGHPiT<L5BjPSHv(vNIq?2#jNAP3Mz$o zos0k4x!tWH4v?P>{wigO141k@@PXMuceTO5C%~22kJlf4W*e;HG$Ty>jXn5FnT7s- 
zc8;TIuji<+P4K(zAx6egQge`?9DAf1F>rd68yU^Log3M6#uTV+f~vjs7LO>Wb0Q!@ zLjEh`v}Ssh&9JEpLVAAS&ffcD8d~Fymy;^8wp8ZafM`9-(<lcD(ftqcca~CooQ8OW z6lLXzaR?v;j;R8ieVdbId0gc4_>^?ikz8Z^$ZUoBDL4Fq_ZBhJ7XewZK)i$yZd!fW zIAv)&xzZ?y$rAj8E<pz3;E~4Rh?6NG6MO&jeVWi$a#0p7g%>R<R(%TjQw#YS7ZItu ztRis)4yn9@@tN4WMS?g{AtC5oqsmeR9^1v0#CdO}g;^$5q}*z2Tb$M9KcAd9v#p7i zY1r``>F{5s*BCYGY>ua#sUz$M=-Pmvl^gA4+!RLdNu&7q(YIPpd-vw$IFA2}(pO@C zk5Dh+-OHxxx2CwsLLYG&W^6p`iUYkuUpdTgde(1_k@#a`vPLpfTF;bfHFnxQi6?kA zqvt*9ady;0)g%M1UF%t91QXlQ-!f~!doWfjm)I3w&PGE|>r9ik98Y8(9)k>MV2gVC zz_i}%c$?jyB}jQ1BoS4`+aqr!+cX6PodDZbE!g!}xQ1+3=y;49MEx!<w)S?cRI)N8 zzIsR=7t9Jrz4NJ^lnOW)#51M{`UYFtES%feV}J%7_3FI(dUBw&@myxJ-s)ABl;7#) zWYFb>CbE;8`_phdxrk5~^lYFP7=SgV_yF&;ULG6i4lxEXf;#e6w>09XRQlp}kJq+s z=T!WmxZaGC>NPYd;wg8K8>mAtouBYcn;O(-%yb)^e(o67mAVN%sHq1~Z?3>5sAi)J z@%Qf(j$pl(amx;_9tPgmrRWwAZq;RkJ;3iQtM6D<e4QXjga7PN!~_$vE9;TW3=>Lo zpF2wHc$9<Z4kh(HkB#~h-Mej7(wc>Y8l5pBs+?*sATcs&_rK7QugTH$)w}{4$n72i zH7pOqbOD)tG}&zKPO@Rc^#M-?WJtPiq^`w_MZ6Jz+skb9psgrHATn4SR`66hg26mn zo(rSkOME_FptM?U{SFc5oP++{BkX*<_+|M%4}By752=AQfNAEwFy7;>>6JT_)S2ET zAu!x`#F;tt@7^`0-x5#fmsr<Fw0P`LQr^#%|IT&<`uYJsTale$+zlCw56I${sQ#W- zc9B+LdH8z_{a(zrU5E3@qXi=p8!E|~An&&}M@+Cg5d+UXq@+Q*Hv)2@ljP7(9?*w< z!+u@!UMUYd0v%_t@jK5Rkkhg9y04SoF>922o2mXrmqSEdWV;7vLa&i1ajmp-Q18Q1 zVA|W&XC&fe#v8=XlaG)J0mrnQc6F?AFun|?CtCdwt%INv+bGyRgv$-wYcRo|U6KNR z%%lC4OQrY*r~hXGhGd#9@=h@ZE`OA*OH*33QDycD|8_x~oSWIot+6*jGiX-(SPYLI zAt;%qx!!8m?bCvW@<EV5Eaj?RdcK_t+?1BjKJPC+X9m#KQiPA#@!^K1_1&{O8lS0u zepVqu>VxE1e0ag+npDJ8OhI_GaVnR*9c?x|_S!mmuA{BT2NjnE`G!-wA&WW60vf%9 z;ze>dPlMk=7B_!E-By&PkJ4(JcxilBXgY|2EDVu~X0x4aYtvF=XCo@*UgI90zk|aH z8by)B5(>;<992j%t*VdUVicv{8T0H;Vi7*|eXh|bZnDiPmZ`4v^>QfWoB_$DQcQ$y zj}r0CBzDj!N^2f-c$^>Hn5>YJ>IH8qzk5aigS2)PV-51(m@cFRDBt2l8$oZ~?GVl9 zYmRPFb!^N4I?II^=()*xbJyr-8z|@;l)f;iDyd@48%5|>N%rR-oVW3lXQ_@$UOy}5 z%*4QZe;F<6F~?nzB8Y_>?`ukW?j7zQr=>O6Grm%1*&cYnwzq&T6Q?JuQd(T7?e1w{ zayve~OxvUylq({&E2s<A<IvG5XpGf&@bpGWm<zQPDp0rJd9R#%0V6$505i+oC;RYO 
zlY)<h>mYZ#iu%=}?~VEe7)W{ijRH=!Ay=hgwceLnNc~KaIMTB&I?nnJvjHbo*#j$2 zVCyIEIs<B0D_@;(R_N-X>lZoux)_%c8_!ara2~wYUKe@UuUtA-Qgwc#aK<J%&8Spk zW1_*gdhTfUUir5!!8ncAqh-FzrIW1VJy|-gww^N0!<^%ppy%6TGqG8tfr8aSCH{N$ zyiPrv)6Lg2?rM0Uiajv$gm22o@BxPNpMAsK#~$)y(RiM02w#th!*-52IOjBwaX2F% z*xE#C7NxTbC3$W>kS{qyCLc{uI^PjPBP8^gE`asCe+A7xI1{K{&zI)u6Et{9jIJYj zPm!c<+4jletMu7)^D*6G5PO9go8)V>ISbmvcQhz|(*l+R_?k(Jjg?1<@dZ-?^64Cg z0;r6Sg%kDG(x`20+pRy;$g9T>9h)<->3Gv|LiVKa<@??A{X$cOD*by=tHZNG6F5_j z#h&AQ#B`oU70}D!j!^P08B%x^C1(WuxYlEYn7mfJi0ZfPaho)RlnCRs6C*0YUc2c6 zSltFfskx>No(jkj3-7gh-u!YF=cY?nPofN#Lwu-uUpI<7d#z{=8M`{wtz3~dwA-#z zQAQVhfZI~{ht1o#Nc`<;h`LpCohasA>6Oc5!~tL=O2<^So9;tECtRVYYR*DvirL4| zt<2nTx|jjxL!7aXwi<0(8L%Z*NTkW*P>ir0SXpHqv*VE7)EXz-&5t&BLuF~KVh~NR zm^+f{U(FR`dFs%Q_ZUo{^LIM$ecXW311Z`O1>C8=WeC5Xi@j^;(JhsGE<<*h6g4Yf za1!lDX(p57&>Q-B1260$*fVJ|aIEgVD;O1c<E1sn){5ZMMYgYfpie!!$wytr71ZSg zC#{*9`1KSe<#@5%`ePJXi`iZ;q0vW%G`?;@LdTYqocpB1w%d2hXU2Uqc`k;UFFjzk zb9<KW!_nd)jC8#*(bWA?N_tU)w;o*verti%9^VC?XXmvX@7qc3TJmN39h}_S-)99( z?%8{geI8CdUW;uGQ3*ETr{&^$qUhYoz3nuhCCaq+6+*D=O=xk}R6ciIp#Q#vSvZ-i zx;i=k`NY@#xg8P$-<z4?&*1p*I{s;9>g@b|XU<H#?KH!S9lCQ)680rF!;v*iTAMGQ z(pUElg8a-<OSe0xW(rDzP~zoTH}2xCP33xtM7+Ulw&EGt)hHUG5Az&v43hJton#y? 
zn+T!xPCD6lZ(}#1k%?<BMld-Zp;yY&o0&yxE~BVCwoY9Ff%v&RrK81RJ|iJxh^zeD z5Nis%UGAroxDvgaZyOT4O%8=7bX+ir3p=z1zftO;3+xV$egV2=X&allPav#jkKs+^ zOXn^uZnoxVxJqi?w2od80!wbFcl#7op;48&y>@INU_YHuJqEN72%8Yuj2{%=Z>J}* z2wE*Qj9w(e#}$3NMi1WQRIbwCvn7j63dCIx)w{!dqi}_F^;2zC_Mxkwg^Nuxk=(?S z2(ez=_50-yJHF8w4Mg5Y<1?>7k$MP-+rb~CJ?99dt^-K!s48zDif_5>DsR*u@f-rZ z7I1-s3QhLK4p<nQ7u=(aiZ~wGHc6Bho1_;so8_yQCZdsrqaQXu_vc;crYzma@B=j& z)_g_2PV80ZImO<3X9UmK61@8m{GKs$Cs)Xil&SZ^1tt831CoTD$RtfV;^##uFId>= z)le2Kw;b%!rcxSj^v^AcrwQk=IEsmtF9nDiQ@!LrLEOyOc1yg&hav64l|MbuJ>yIM zDXE2CBlxjiYlh%_S6)qbZFf!8u305wG@-$KY;tc+Y7%u`H>-y~I*D6?4y%e|MJdqN zx_XSa$x1)RhH7pYGae%>rYbC#oVweD?Zny!w`vO;rBGD{6_Z*iFC?oHQ01zFb!$Zi zEN_o(@HneWEx_HCc*28vyj*$XTxIS3#%U#`P;#*$3A*}nu+=P*tV6U&rB0pp&Bm@u zaS&(j+KB8Lqu0q9T3moHiX`cL&11C7OkdZ;-PI{s4LgnuE0!0H#IjSiGlyg3KC1lT z96?`VQTB#1{4Nzfw>zGbOQQs`fSX&14kZ2j^?2kH-mN<TSxhNjC4p0z!Bjz25-J;m zVdJ}9&D1Jr?fywY;0vUGCc~0THqZc`40m`k@UF(d@48cLqLK<++Y8&dh+;i1<$8oT zJ}nQ3_q<K@fMZ|XD>{Xzb6xkFgya1Ld@^pAtq!QM7`RkrB7*VW$S1A0Pgm7*G@6DT zb-Jr33qO<+C$|WhG_-?Ya;}EPC6WQ+_Ki}20WZz&08nlpzq((Av>WZvp8V#s&<@90 z2r+|<tY3$Pl4t5_Z(p$Iz9Q>V5U|OivJL&!t!VB{dIB6KZ52Oi>pbugZAPf~TAQK| zCGgGoy~4l{kf~Pfhnuvfb~T$CE)oeo=jaZVQCa@!D=}xJr-gG2gHemKu8p>pt;Yvr zs3E)9o-fSud8OCi3}d8&T$Prj_H^hx7n(-HGVAX)#e**o2zk^5r(~=~I9{f$T@>`b zY+%iwPsCVK7`v}35gtCPMjfEfSC(!mO5~`Sk0JR0aJEOp6h|dwLxX|ECNvk5zG902 zA(DTnR${@+xL(l3<MfU)Y2XY?RRIx+7~#JQ;_z(zcK$^k|K;HCj`>~3y*l5UViY{X z*V^MhgZ`=OUZW5Y%6?*gh>!nYu<!Ec)%hN*1zt)&#L(+Vf8XxA9C~%W0~W3T|9-(g z<<V;({BHz!Sa_T(*zZOAKkxfpMZ7xS!?VE^7r38)Xo}ZYzv@_js*2ZW1O%<Wu)j3M zUyNOO`BUNf<pmG@#|{3{dwzNNUlgVTet5W6DgJr=KXr;<f!ltE{ZcJ{#r^kP@t1=j zc+Y@y=k{}R{004ASEeKUZCL+;T}AwJs{e|3|99AbX8fP%KPSvD^eOzGrk@+a^}P8L j{%2481rH<nSt{4cys84~kKY<TeEWC@uSh(y@8<spkz!Kr literal 0 HcmV?d00001 diff --git a/assays/Metrics/protocols/.gitkeep b/assays/Metrics/protocols/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/assays/Metrics/protocols/MetricsProtocol.md b/assays/Metrics/protocols/MetricsProtocol.md new file mode 100644 
index 0000000..c09714a
--- /dev/null
+++ b/assays/Metrics/protocols/MetricsProtocol.md
@@ -0,0 +1,24 @@
+## Metrics
+
+Five metrics were used to evaluate model performance: the Poisson loss, the Pearson correlation coefficient (Pearson’s r), precision, recall, and F1.
+
+The most prominent peak caller for ChIP-seq data, MACS (Zhang et al. 2008), which has also frequently been used for ATAC-seq data (Hiranuma et al. 2017, Thibodeau et al. 2018, Hentges et al. 2022), assumes that ChIP-seq coverage data is Poisson distributed. Therefore, PyTorch’s Poisson negative log likelihood loss function (Poisson loss) was used as the loss function for all models (Equation 1).
+
+(1) $$loss=\frac{1}{n} \sum_{i=1}^n \left(e^{x_i} - y_i \ast x_i\right)$$
+
+The individual samples of the predictions $(x)$ and the targets $(y)$ are indexed with $(i)$. The sample size is denoted with $(n)$
+(https://pytorch.org/docs/stable/generated/torch.nn.PoissonNLLLoss.html). This version of the Poisson loss caused the network to output logarithmic predictions. The desired, actual predictions were thus the exponential of the network’s output. The exponential function yields only positive real numbers, consistent with ATAC- and ChIP-seq read coverage, which cannot be negative.
+
+To measure the “accuracy” of the model’s predictions, i.e. to translate the Poisson loss into a human-readable number, Pearson’s r was chosen (Equation 2); it measures the linear correlation between two variables.
+
+(2) $$r=\frac{\sum_{i=1}^n (x_i - \overline{x}) (y_i - \overline{y})}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2 \sum_{i=1}^n (y_i - \overline{y})^2} + \epsilon}$$
+
+The sample size is denoted with $n$, and the individual samples of the predictions $(x)$ and the targets $(y)$ are indexed with $i$; $\overline{x}$ and $\overline{y}$ are the respective means. The additional epsilon $(\epsilon)$ equals 1e-8 and is used to avoid a division by zero. A value of 1 represents a perfect positive linear relationship between Predmoter’s predictions and the experimental NGS coverage data. A value of 0 means no linear relationship between the predictions and targets. Finally, a value of −1 represents a perfect negative linear relationship.
+
+Precision, recall, and F1 were used to compare predicted peaks to experimental peaks for both test species (Equations 3–5). An F1 score of 1 indicates that the predicted peaks are at the same position as the experimental peaks. The lowest score possible is 0. Precision, recall, and F1 were calculated base-wise. Called peaks were denoted with 1, all other base pairs with 0. A confusion matrix containing the sum of True Positives (TP), False Positives (FP), and False Negatives (FN) for the two classes, peak and no peak, was computed for the average predicted coverage of both strands. Precision and recall were also utilized to plot precision-recall curves (PRC). The area under the precision-recall curve (AUPRC) was calculated using scikit-learn (Pedregosa et al. 2011). Flagged sequences were excluded from the calculations (see Section 2.1.2). The baseline AUPRC is equal to the fraction of positives, i.e. the percentage of peaks in the training set (Saito and Rehmsmeier 2015). The peak percentages were calculated using Predmoter’s compute_peak_f1.py script in “side_scripts.” The percentages are listed in Supplementary Table S8.
+
+(3) $$precision = \frac{TP}{TP+FP}$$
+
+(4) $$recall = \frac{TP}{TP+FN}$$
+
+(5) $$F_1 = 2 \ast \frac{precision \ast recall}{precision+recall}$$
\ No newline at end of file
-- 
GitLab
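
Note on MetricsProtocol.md: the five metrics described in the protocol can be sketched in plain Python as a sanity check of Equations 1 to 5. This is a minimal illustration, not Predmoter's implementation. The function names `poisson_loss`, `pearson_r`, and `precision_recall_f1` are hypothetical, the inputs are assumed to be equal-length lists, and returning 0.0 when a denominator is zero is an assumption that the protocol does not specify.

```python
import math

EPS = 1e-8  # epsilon from Equation 2


def poisson_loss(log_pred, target):
    """Equation 1: mean of exp(x_i) - y_i * x_i, with x the log-scale network output."""
    n = len(log_pred)
    return sum(math.exp(x) - y * x for x, y in zip(log_pred, target)) / n


def pearson_r(pred, target, eps=EPS):
    """Equation 2: Pearson's r with epsilon added to the denominator."""
    n = len(pred)
    mean_x = sum(pred) / n
    mean_y = sum(target) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(pred, target))
    sum_sq_x = sum((x - mean_x) ** 2 for x in pred)
    sum_sq_y = sum((y - mean_y) ** 2 for y in target)
    return numerator / (math.sqrt(sum_sq_x * sum_sq_y) + eps)


def precision_recall_f1(pred_peaks, true_peaks):
    """Equations 3-5, base-wise: 1 marks a peak base, 0 any other base."""
    pairs = list(zip(pred_peaks, true_peaks))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    # Assumption: return 0.0 instead of raising when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

For example, `precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])` counts one TP, one FP, and one FN, so precision, recall, and F1 all evaluate to 0.5.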