From ed09846983acc36387ed23b708faafe82934ce4b Mon Sep 17 00:00:00 2001 From: Viktoria Petrova <vipet103@hhu.de> Date: Mon, 20 Jan 2025 12:07:44 +0100 Subject: [PATCH] add architecture and models study and protocol --- studies/ArchitectureAndProposedModels/README.md | 0 .../ArchitectureAndProposedModels/isa.study.xlsx | Bin 0 -> 7510 bytes .../protocols/.gitkeep | 0 .../ArchitectureAndProposedModelsProtocol.md | 11 +++++++++++ .../resources/.gitkeep | 0 5 files changed, 11 insertions(+) create mode 100644 studies/ArchitectureAndProposedModels/README.md create mode 100644 studies/ArchitectureAndProposedModels/isa.study.xlsx create mode 100644 studies/ArchitectureAndProposedModels/protocols/.gitkeep create mode 100644 studies/ArchitectureAndProposedModels/protocols/ArchitectureAndProposedModelsProtocol.md create mode 100644 studies/ArchitectureAndProposedModels/resources/.gitkeep diff --git a/studies/ArchitectureAndProposedModels/README.md b/studies/ArchitectureAndProposedModels/README.md new file mode 100644 index 0000000..e69de29 diff --git a/studies/ArchitectureAndProposedModels/isa.study.xlsx b/studies/ArchitectureAndProposedModels/isa.study.xlsx new file mode 100644 index 0000000000000000000000000000000000000000..252260d95ac6aa250af6a437a7cd09d076d1dcad GIT binary patch literal 7510 zcmai3Wmpv45(eo`>F!vhLtzC8>F)0C4k_u*6_joy1f)Y!K)Op70g*;r8YJ&3SN$&E zckewv_Ten^o;lCVoSE5olw{%HQDI<UkYQT2cytoM9VptcFfe=YFfe$~Z*|2S9o)<v z+zi#coy=YJA9&f@y#dKPba7yZo&qIS*<{f5)zl@r@@fYON{(PAy+i?I;DeX#U{748 zQWmOGBRtku{E%l^-NcIZvC=a{QiAEx%6#}J^ApVWgiU7|?M3l4xyTyNoe5muis2@& zI0!9y7lfp<t>wjy3rkCz$6_e03+Ys-J>bFq&Ic$6cR@h|oxcAZ49JYNG9}T)QjRIs z>Ky1^H6M&WeR7WkxBt*!uG$55kyKxFwOLR1I6R30nb^~Vq_nOW7A-_b+KP1pHoVO< zOKuD`nK<MO5u<E%rY{z4kbNWNqj@qOv%dNk?h95;j0Uq+xS%`|PCka6nQ~Zrq^nB> zxoX{~{Fp9Bkba!+eqXMA{6a>GkffX~6C8jiBen5*Vj{;laFj9WNN2l#pOf_srPr>l z12I%b>_LS!%R>7&7-u*7*BPDlg2#3t)Z2skK-`b#+iy(-Swpwsbv=a3S4frq?FpEl zo`4$v_d<mJW$0pV=gR&|=GI#W+0ye{(ErKagMmT&E%V@q<eQ{Uh4&ojA`oAR)r&L- 
zDdgP37OH|z3_OoJ=p0DvV)MI$zd<4sM9^cck>Z@cZdeZP3=73>$Euu=p##cD@DT<_ z!bds=udSW2SkMr|X_2^`3eeNnEw1EfWlH*6<F)Tg^&$Esp5?gL1La?`#^Dbr4qA&c zk4|kxcyz=al~zF9&iez`LrUsLMCP06Ute)f@Y*@(I&LNI8s|}2>Rb*mnaPnyKKGu2 zWQ1w1=dvh`ZRl&vwp;>>v!upybLum!a8yeaRt9z2-O0fEC>x_OvZ4()O=M}ILA0LQ zD5jTO6O8<0zA(Qm@^`DjK`r8C_uFKD7eKic_$SMT(7o08Vc90AWl8=j=IQ8SYvSl= z`{O12GPKQm4s3DA73$kl9S<ED=0sQK=X$!Qa3Qtza(2>I{PtI;>ZDR+$hk<_-Q{zy zK1l5hp9(}M%1t3v=CH_MZ^~D-LhyZgz!#R--SfIY_;^$Xy16OO9ey&N4RRMhx<q*) zW~ZTj2^lU-NqLEg%|m(8iPsMYr}|$H8+bBlnt!L8T!7m~Q+!*4Ix(U_mBzdh@J<YT zf@m20FlO_+ObA_8*bqBO<|8u%NqsZQgxN2evc0a0J+9~3LYQT`=vdJj%Wk3<Bh12k z?;^jS5e$7v;xMG%?WaB<UgbHL82(T~?7jbGoASRK8uQjbKTP9lWp3_vH%z~1E4cMR zy%9nH14HyT*`E;3ANG7PHy)QRfRDI*@fcynAJNw?SsGMeBUq|qty$WvzgrjnvYW-o zEJMyR+n?Nm@zIvkR93XqTo`5W6~f;6ZhXbC0$<F?X}aD*inqnZJVdX3X*D3=!pk+) z(v;_Fq_(Xg{jxvqJ9sz!I3H{2LvS6ZEbLWdKNcruSsu|&@a~Y;4idOHr8j(}TMhlf zge-`+Wh<wGEeBAR$6Ya+ji>S1lgoFqaendQeD34p$Kmkcw3p4o6bl?}1@6a?Zg&P> zVNCuyNErV9vJPQ((U{QA_Vk7yB-q<(()<Q{V2b^t>SuM|jqPR0rEW0(X@I1m;bFBF z0kd_>Vv`%fMoSg~*zUm7<hYIH(4-~H<66M|K(^1Lp7R5r|8abFIhfLfgf`Rinn-sR zEc~^<zQj5?b=IBIS8!S$sK1-_ZC9)KqOQNLEn{U-yhb1C=4GKE`veo#x=@@6y+s_W z<C9{A*$uy?LUNVqmjlK2O%D{n?#ji6QO&Ec1Hp!CumgCh?@D%o4#D4Dr4^Rw2a*rH zqmql2veHH;GlXIfj(iPz`pc9Ic(WUpW2Wz|vH6p*E_m<<R0t|8L6jC<9y|8;>#le| zmD5wMrUMD6U?&RcX{AyXy@SzMz0YAXgGNC6oE*hu1ucc{IWvmMH#7o<=e#H;=V&R6 z&!0ph;4~RYnD7S$(rUTD7ttaM28Bu*0Y$EMYmQ)M;6f;jwqVPVkbxZ&aGLQ{qF7aT zztzIR_6aORs-mDef|137A~#`EC^~!s$KU|Kz+)-Vc(biUt$4a4sT7LHW2tTT=t4;o zVA2?^ccqgi#3_WLe}eWch4f}30EoMj_&>MGArw6m`BfAWo5}zn>dx=2UuD%1Q@JFL z4?cu+bDK6|EMLn9e<mLp7*!}~42=5cEebHLP|_5bHkPmL!!whAXIp~hpF$CPY$Y1M z&PJp6h6dY*M>f9KhbLFE=E$mC67N5@#WY810O0)@)vNMikuuldk6>Z>xfcH+9kAS! 
zt%}f%7ai7u=f?9gV%N*2N1tT3jD|<TT$zVI_FkN+FlD*kRPPIzJ#L)pw7zmn#(Las zq$h_|d>-IuV+BF-^tWM~uGZ<`TFbJFsK~M=QY{2l<`fLt>RqniIJbDcoFOP)F^4@| z2wtfUe0PQQD^uJSseSSU?Eq*RrGr)|{M*;>YGv$VZl>zyV(nn*`s3wirYJk6b6|g1 zI%Nhd5;i|pn*<e@6ue4E3JjFn&bNKgne_IRebFj-6krWPv=hx1-GdB$TJqdTe@w{| zb{(Xqw)g7P{$QDUSsWh(v0UUc^uPK>UC7DKfk$Sshyywmy5Ulnp*0<t9%8Ml2smq_ zp;yzA7spsA{~A(W#TT%)s96WE_ZnA3b11;q%j<wut|>gisxix3F`mLk;A&FD%bPQs z<x%#wcZwrnrK;g*vH!)&r3!x!js${}yeIzZt`C*8G%xliDP*riDP=T-9v%D+wHu3v z)9B}^1UV4}b$qzTaN36&%F{9t1X{Wt(rNrpQDmOOYAnN~f?(&DOKZx9xDeI^^AAh9 zl2e2fFy+V}Wm~O@N2dbL>#PI1cw`O)hRYKtfs)_f*2^AZ=8&0%-V<Fe=q@T*Y-Tj& z92BQT$vHCr2yv>uYKrQjieHKCnNEJ!2rM}9#z2T7c)v*#7nYkdUYKAd@q!|ZJkl-J zoMfjrDO%qlrjc)lIyUu*)o2Jk!@Sg3Y6>OHvr;`+)4Z*h7@;_3+Iu#UG)kzH<Ma4A z%$WmjSmEtgUx{Y*1qJdNtB0S1^f=xMV?Oc;jytXK3?6r2K;Nl9ZgH{=7M%4V*>EPR zv<rO@N+8Z5`M?0X#fmaPd3u_c40p?f%PUc@&)&vX7Sauw%mj+iYmvODHHIYaL|w<w zG^~5>;6|15O(M*CdV5POIk)j272n(WWcnH_bf1Brd^)A)DgEL1`+Tk-+I76Tf&&UW z`QgkDB}H?5s~Mey+<u&li691p2E|4}UA;;q3F_Eh2aAUM7!yfgMxz>s6XYnH$i1Gt zrS0%&_@&?t(yxI1uPuU`mASq7?@r=2u9HT18DpTGSvS>fs|;;5ZpD6n{w!#_+KP@~ zKKvV&pl=msx(C@g4#tBXlYG`ey}M(yw37s;GCWc&k;?c)^OFzY5Z&HCfb#$w6KNPB zXnd{4!prHI@rwXTU(?R3rPo`J8#}|K7x+W=Kb%rf8g`e>DvH^VnLa>{*0J~=Wlth- z@G(LBONzJS7(1_=m=rz+jyaCQD{jEh&TK^<8}TwWDHVAn+Y}cZbFpp;y)V(z71Z<< z<g8dCti+H9l)7SZ3L>^rZ=&pH%dr#tcxdp0CR)cM&gO{dScVG^C_*cxqRcvqFFO=1 zhhz)pJ{M$MM!ec%;*Z0zf5kDHkcqxm!h;bN5`x4wsUTeBzFS&{pI0ry$2hAf?Ap}S z;iMwHc6R2(yeUwjYRi7C$#s?9Xwa&;Go5m-0(0P}Wv!iGW3Zp`kPm4fjpWlOp9UR` zy@yxR7`|ZnYkXh#P*1+StG0$}V@we5CyaMsYmeqqf6vfL`{f;vmYu2lzNn~7kqo3( z;5X{6JvONcIQd{5uSxgw;{mcZQSGM9fmK><d|T>AChxEUr<$aa`VwlGDX1x(DDqYl z@Jz#F;E>-jN4*R*ZZSFCW$|SUP?(niM%8l+NLz}v%@N6*5p`{tvFNU|joGYIu^YY< zcy@XDbsu7>n3Vy%c7J?Yw7?hD>)kXf>~}bdWk?cmbMj@Ucxii|CO+VJP_y#->`;EY zP;{rovNB8f+4rm2fU8S2I7e5vm*H4a5uwgV*+ib&$Y@h?fmq*Lq|xt*<D=lisK9;g z7lB=TMP2&dy~?KRf{ZH^(~I`8N+SilP>ML*JMs`zr&P{)W4)FPaKGO7weE2(;fL{u zjV;LJJL@MiWDC*7*azap6KGYUuGv8i<3y5LB>micom#Xff!IA2E!`V(Rf#+l*!jT` 
zGjwn+Ovj>NI{5Y>H~7x!D0>eGDfv_P?UpkwiLQDPwPIY=o)~^5fbwhe7|}GzZ&bva zQWQgtl|*_{dq+g77Dr)P$eBYF+01T^VqxPgeyJm(_xtHzZ6*jtycMb*WH1QqDoNpv z3=%pidMOe?W120^hEnt`p#aNYL^-zwf(KZ#*If!mFThGzl^$}}y~ph?JhF*woO!^9 z_2TRNx;QC$rdN5$31%;TW)5|4Z)^G^p>!_bmM*N>3;Xh#VYY%^n~C_UKrZGITb?+2 zQIwBxLKX<V9+tL|mSK6=`!vIz3^sj7%b>}k3E;LONKMA8+WMI8L_A`o&|O$WMvLAr zH-45le$hSt`Ox^YzGcsp=iUAeV07%B^FU%MCJwhP0&$a8sp?FXw_2?HD*U?x7&AJp zcuAXYdPa3Ve(_JM-pIc%L}<8;UkEzBSB!j039xNuiaX)Vpi5OBHdj9kD7T3^IWT8) z)gIKF;mR&g(SE_Mam}V!N)PZ|<E8;o^bz+6(y;l$f4wrMM4VJ$sr7x&gOT$vTcI=d zp$r(&@&Jt{I2cCWIL+nJd#xe$_)t#sdoqkQN}f7CJ)a+tTI8=CtQ0aJQ#Fx<PuOx| zhNcZYwLP8&w-lb&@)LO9b0|H!q;p9wp(`c9J>LE<1=@`^8K0`M0xflS4R|A9GQ!b2 zzSn0oh5w954V1eC^|QZw_BrbTS5foVC<|}-jV_@#34NjIGBku?uw)cF?+L%|Sg1m_ zqf(x>?sNLu+pi;%lmNfLpXf~^@IFqfAHunugdcW7Jr_?Z!6uhvn@n{TYge>*<wD&e z1y2m{1HBQZ!*vZ#!Zray5J}!NzF>8~IHaGg6&4=U-cjhy$2l?Y>?=hZ<)WW2rbJeF z1VEfX>fGxVDBx_5?oe`Q%^-4;3enSXmGXMfsHveRt9kh5wO)OBJyYH!%&;QJmuqy{ z`k6FiLmX)9ycCd$g7x7lTEKnjL2U{T8fJozF~NmbxNn@g`sl#)db7oQ|3l{e&q$(i zI%0}%N{cnzJoJp7B&1hp7&XY`@(b_rXvOQWYHAj>#_HO8c)=$wh1&2IshF{UP{_SJ zAvjIcW{{F3{J5q@!pX>Xn7dm~Ub*5!-}3oHMq&D`EJlMqTb+J`&bKD>7I2Bsz5G5Z zK+DGkzcb71k@eJ*t+U=9zec9I$}>Q%mJX6`iIb1BVFkY70x3M;uu5Z#zie3H%~W~) z1${Bt8kA=6Ms<7UolnEk@xs%Zn?9a6)vCz~AB8uwOw$8dnl3gTqV3~==}eiz-6?Qv z)}%jA18=$SehY`=z|MU8uNrq-ybuHf4LqPt8632LVf?!(bMv+{|JjWdg7l%?7=9S! 
zloha~3WouR479P1QY%Sk;Z63~36w5Bha;X$l)r!oBI4q@&ljBxe5j1iK0N1c+A0uX zAL7v~!$;D5{8a9K^QukiN@e;&y6KcwsSHc43G?GB#3eJzq+Sa6XY<?^IM`~*E3I|M zNeM-B+|uc+`rHV#FZhylHq*$hZQfgbY?M|>7&|qkVb=7b0+{a$V@nUa>OKoi;jQx> zL}&=l3QYv0oC+3Vd_r}aN8r}Uc@QD*RX!$L86{;v^l5Xz02Z`ax`Oa**Zm1W2mu~S zl_M<z&i;Gj&)O>1=A>$y8rpMyIfCJXmWAzQ3m6Zb`v#Ik(d-jKRfhWE-B_9gatK*e z$R6eLcOgD;nTs;G<V1cVEIDr4#dhD<wh^yiDc6x?*@ar6LR1J@`yQ#oE9Hl7V?>Ua zyf2lVcoF5YPvgI4=7v+nj4&KwOntIZ?NZM;`NCu#Y4jo#B@A*>QP<4iFs3uN31EKs z$yy>*jKVSo)(DNEJGrHDsT9palX`kUZ~lU-$7%o5c04r^$$MCCH?o@yzN)2I34Ql| z;oJ*R!lUG<1?i%*=x3xRqB-`1p=;Y%Vb8Tal4t#=nm@Q`BM{M-S+Ray=Xtrpd_5$y zIIssg?lY_<uPHj~%v8f}At|rHirv+nBFS3G_AHB!KGvu3aSh--wU`AQ5RBXGO4NX- zeKOfE$J(#lPi&U<Ek1-JCYaMw4aP*14-3odM2$WQzVg56IB7bSAiBuT>o`5I72dPp z%zSou_UPb%33+zk&Yf^={MG4ZY<q}ekP#Op8(XTJQ_q84#}Rb_y3K2IoK-Jev-7r^ zrC$O1&rO(_qp6aMqtjh#VdZMYFcivHGC=7CY-k++7BhBo`uW2m>4_tl13eUSaX;)^ zY(@oBn1}{v0jZDHTNr7uY3G0&KrID5ku&M)ydQJrk-cIoFbQiko4GXI?@Ki%?Z;&_ z&g?ZBF-KABt9op#^d&-J@u%o*@PtAHD}i(lNBnhBxjlUZ%(5t1sLI?8Xw3anb1=gb zbH&p4P2e&H8eim=mUn`UJTQDxQoa4zoJTih0qpafN@RG1<N>_3=tUd(ABJ<~y>+ha zbS*KQ&ZYev<<aq3M*Md%9?=(zy$G7I@)$pOvLT}&TgSKT^eV_5Ax^2kA*@+!)3g2_ zsdQN!Avp;qC+!8h{Vp^miOhY$&kMHdtPz-x>&elVB|BD)s8plnVum(W504e<S|<)W z6S|v}<EW&wX&nH91?|(QGy$SQ*|=EfCS6u2S&^P=pW#-vOd%_&xz?S<<O9l|g2UNP z7L~8wUcaE2mv2PqolF`{x6v2t=4u^0LpBxrL_fdjb5o7j-gB_}U0_pjW#8+>#~(P@ zzVY1&r8q0ek+9+h?w4n8Q%}Rom~I28%f3L<56)j*X6oo-{?lhFgHY-c``zLFL`Y<^ zh6HwD3H)nD7HVbq6^pO-wrO)It@Oi7Uxd>5@)#Wi1#7<e3Fwjy(wxEW6ll06iF2Ze zII|VZkMx6mh!+z(s8xgR>vU#JE%(v1_cuy6)$f_q!A9fiEysdnn_eZ8=SefW`y$c1 zCTcP%I@Gce`&czhakN?L<|GTW71C4Tp%dGqi*XAiEvoEwPLSD+lOQYP2IAsBN)tx` z+Tz%4RS?t;g;s;KNE=yziNWWat5oHsETu(nrcSfR;-?R71#OtrYgjPpM&T8k5W?oy z9c?Pz3b}j|ZM{HRB8x>AW0K$Yjxoquj$nKN9nP#6=YsC5OrJ;JhBr@W2#m%lxiW`U zP1k>W&3%w#gsFMGK#H=G`}p)cb>RGy2X`RyrNV?mFyp%4`B|+&k}s~Jg>gnmvR5XZ zwUDgkX}0`m{&|h@N8tm<56Mr2mo^R8@6-WQAsz{Rs59K4&cM2@1Ak>WW0T|^yExFp zpwCD7+ZX|n-F>!+TKN?&XQLJcYG-TlHc)}c&DDZOCXe%H+z$$uXLZTP>!tS}24hX6 
z1zAjja5)xGt>X}`7TOiFm{1alJ3+ikV+8j;fv|R>kBfur7=yku;Mc~DuSw&2VWfqY zxEt*iR2$SpR0uP4dK=M9!6=uRr4$W{b6anU^j!hbb=8yJm#xJj;vq#K_0W`J)wvlL zSo>osa0l2dgy`;KJ15BW)`zR)dA9*cc0N8ao}K;bO!wsk@eTbc-yDkq&N1sj(Gozu z+Rmy;PgA9HmczX(<HD%9ERRVU(9KUx8xdq8Z_d8QzD#;LAr(4uD3m!_aQQ4UlkpzQ z%<D!;bz1Tzw<f8Y7+e5xC*++F_o7MT;O=y&q3pRkYcscrj|5j01e<jHB%8`)6%gE= zn3JHlBHw;L8WR*lH-axopSdMn2FQGKPMSY*0R7bodk-Jxzt3Kv?)c04he^zz1OA>f ze~b)n*N?Oq1$FT+n)^SM-W?(QLczdP{Kov`x&N24A58o0`k|}?8cRQU_g|X+e!Cyc z`|bJx_<Rfa=L_Ct;(r056Dz1%sGeKcU!(T_-uDM#eY<|BXNJzYpy&Jtef{g|w<+r` zf&B{&1Ec;o>>oP&kHv0Jxl4roIYkxu=MDZwi~Tv_|CtaTO0NATApXE^72jnl{wQYs zE9?(%arc0`-Qu4EHlZ0Ds@L^*@AwD$|E?SfB?Et#>p!sDE8nf_e>7J87548ce;0kX w#QZ^<LPrq4mHsL?cj0%F;ve`P;qOrSH3U+UMfmyFu%Z1&FEk>3h<}Lx3#tc~8UO$Q literal 0 HcmV?d00001 diff --git a/studies/ArchitectureAndProposedModels/protocols/.gitkeep b/studies/ArchitectureAndProposedModels/protocols/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/studies/ArchitectureAndProposedModels/protocols/ArchitectureAndProposedModelsProtocol.md b/studies/ArchitectureAndProposedModels/protocols/ArchitectureAndProposedModelsProtocol.md new file mode 100644 index 0000000..f390838 --- /dev/null +++ b/studies/ArchitectureAndProposedModels/protocols/ArchitectureAndProposedModelsProtocol.md @@ -0,0 +1,11 @@ +## Architecture and proposed models + +The model architectures were implemented using Pytorch Lightning (Falcon 2019) on top of PyTorch (Paszke et al. 2019). The model used supervised learning, a method that connects an input to an output based on example input–output pairs (Russell and Norvig 2016). + +The input for the model was a genomic DNA sequence. The nucleotides were encoded into four-dimensional vectors (see Supplementary Table S1). The DNA sequence of a given plant species was cut into subsequences of 21 384 bp. This number was large enough to contain typical gene lengths of plants while being divisible by ten of the numbers from one to twenty. 
An easily divisible subsequence length is a requirement for Predmoter (see Supplementary Section S1.2). As few chromosomes, scaffolds or contigs were divisible by 21 384 bp, sequence ends as well as short sequences were padded with the vector [0., 0., 0., 0.]. Padded base pairs were masked during training. If a subsequence only contained N bases, here referred to as a “gap subsequence,” it was filtered out. Both strands, plus and minus, were used. Because the ATAC-seq and ChIP-seq data were PCR amplified, it was not possible to determine from which strand a read originated, so the coverage information was always added to both strands. The model’s predictions for either ATAC-seq, ChIP-seq or both were compared to the experimental read coverage. The target data were represented per sample of experimental data. These were averaged beforehand, resulting in one coverage track per NGS dataset and plant species. + +Three main model architectures were evaluated for their performance. The first architecture consisted of convolutional layers followed by transposed convolutional layers for deconvolution (LeCun et al. 1989, LeCun and Bengio 1995). The deconvolution was added to output base-wise predictions. We refer here to this architecture as U-Net. To ensure that the new sequence lengths resulting from a convolution or deconvolution were correct, custom padding formulas were used (Supplementary Section S1.2). Our second approach was a hybrid network. A block of long short-term memory layers (LSTM) (Hochreiter and Schmidhuber 1997) was placed in between a convolutional layer block and a transposed convolutional layer block. The final approach was called bi-hybrid. Its architecture matched the hybrid architecture, except that the LSTM layers were replaced with bidirectional LSTM layers (BiLSTM) (Hochreiter and Schmidhuber 1997, Schuster and Paliwal 1997).
Each convolutional and transposed convolutional layer was followed in all three approaches by the ReLU activation function (Glorot et al. 2011). Additional augmentations to the bi-hybrid network included adding batch normalization after each convolutional and transposed convolutional layer and adding a dropout layer after each BiLSTM layer except the last (Fig. 2). The Adam algorithm was used as the optimization method (Kingma and Ba 2014). The network’s base-wise predictions can be smoothed via a postprocessing step that applies a rolling mean of a given window size. + +We examined 10 different model setups (Table 2). The best model of each architecture and dataset combination was used to develop the next combination test. The model reaching the highest Pearson’s correlation for the validation set was deemed the best model. Pre-tests showed that including gap subsequences (subsequences of 21 384 bp containing only Ns) led to a considerably lower Pearson’s correlation. The proportion of gap subsequences in the total data was 0.6%. Normalizing the NGS coverage data through a general approach of subtracting the average coverage from the dataset and using a ReLU transformation (Glorot et al. 2011) showed notably worse results during previous attempts. The approach of normalizing via an input sample was not feasible due to the considerable lack of available ATAC-seq input samples accompanying the experiments. Therefore, the target data were not adjusted for sequencing depth. For more information about the training process see Supplementary Section S1.3. + +All models excluded gap subsequences (subsequences of 21 384 bp containing only Ns). For more details on species selection and exact model parameters see Supplementary Table S4. Models excluding subsequences of unplaced scaffolds and non-nuclear sequences during training and testing are denoted with *.
\ No newline at end of file diff --git a/studies/ArchitectureAndProposedModels/resources/.gitkeep b/studies/ArchitectureAndProposedModels/resources/.gitkeep new file mode 100644 index 0000000..e69de29 -- GitLab