Erin provided me with the three missing fastq.gz files. These are unprocessed paired-end files in which both reads are interleaved in a single file. I couldn't find a tool to de-interlace them into separate R1 and R2 fastq files: Galaxy's fastq de-interlacer only works with the old fastq header style (read IDs ending in /1 and /2), not the new one (" 1:" and " 2:" after a space). I wrote a simple script for the de-interlacing. It is far from efficient, but it works.
As soon as the files are available at SRA, we should replace these files with the SRA versions!
Simple script for de-interlacing:
#r "nuget: FSharpAux.IO, 1.1.0"

open FSharpAux.IO

let path = @"\\syno.bio.uni-kl.de\CSB\RuZhang\Transcripts\sratoolkit.3.0.0-win64\bin\35C_C_14.fastq" //@"\\syno.bio.uni-kl.de\CSB\RuZhang\Transcripts\sratoolkit.3.0.0-win64\bin\40C_A_3.fastq"

let read = FSharpAux.IO.FileIO.readFile path

let mutable collR1 : string list = []
let mutable collR2 : string list = []
let mutable counterR1 = 0
let mutable counterR2 = 0
let mutable index = 1

let e = read.GetEnumerator()

let newR1 = @"\\syno.bio.uni-kl.de\CSB\RuZhang\Transcripts\sratoolkit.3.0.0-win64\bin\35C_C_14_R1.fastq" //@"\\syno.bio.uni-kl.de\CSB\RuZhang\Transcripts\sratoolkit.3.0.0-win64\bin\40C_A_3_R1.fastq"
let newR2 = @"\\syno.bio.uni-kl.de\CSB\RuZhang\Transcripts\sratoolkit.3.0.0-win64\bin\35C_C_14_R2.fastq" //@"\\syno.bio.uni-kl.de\CSB\RuZhang\Transcripts\sratoolkit.3.0.0-win64\bin\40C_A_3_R2.fastq"

while e.MoveNext() do
    let tmp : string = e.Current
    if tmp.StartsWith "@" then
        if counterR2 % 1_000_000 = 0 then
            if counterR1 % 1_000_000 = 0 then
                FileIO.writeToFile true newR1 (collR1 |> List.rev)
                collR1 <- []
                FileIO.writeToFile true newR2 (collR2 |> List.rev)
                collR2 <- []
                printfn "R1 %i\nR2 %i" counterR1 counterR2
        if tmp.Contains " 1:" then
            index <- 1
            counterR1 <- counterR1 + 1
            collR1 <- tmp :: collR1
        elif tmp.Contains " 2:" then
            index <- 2
            counterR2 <- counterR2 + 1
            collR2 <- tmp :: collR2
        else failwithf "no :1/:2 in %s" tmp
    else
        if index = 1 then collR1 <- tmp :: collR1
        else collR2 <- tmp :: collR2

//write the remaining chunk to the file
FileIO.writeToFile true newR1 (collR1 |> List.rev)
FileIO.writeToFile true newR2 (collR2 |> List.rev)

printfn "index should be 2 and it is: %i" index //should be 2
printfn "Counter should be equal\nR1Counter: %i\nR2Counter: %i" counterR1 counterR2
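If this ever has to be repeated, a streaming variant might be easier on memory. The following is only a minimal sketch, not what was run on the data: it assumes every record is exactly four lines (no wrapped sequences, no trailing blank lines) and that headers carry " 1:" or " 2:" as in the files above. `deinterleave` is a hypothetical helper name and uses only `System.IO`, not FSharpAux.IO. Reading four lines at a time also means a quality line that happens to start with "@" is never mistaken for a header.

```fsharp
open System.IO

/// Splits an interleaved FASTQ file into separate R1 and R2 files,
/// one 4-line record at a time. Returns the number of reads written to each.
/// Assumption: unwrapped 4-line records with " 1:" / " 2:" in the header line.
let deinterleave (inPath: string) (r1Path: string) (r2Path: string) =
    use reader = new StreamReader(inPath)
    use r1 = new StreamWriter(r1Path)
    use r2 = new StreamWriter(r2Path)
    // Write plain "\n" line endings regardless of platform.
    r1.NewLine <- "\n"
    r2.NewLine <- "\n"
    let mutable countR1 = 0
    let mutable countR2 = 0
    while not reader.EndOfStream do
        // One record: header, sequence, separator, qualities.
        let record = Array.init 4 (fun _ -> reader.ReadLine())
        if record |> Array.exists isNull then
            failwithf "truncated record after %i reads" (countR1 + countR2)
        let header = record.[0]
        // Pick the output file from the read number in the header.
        let writer =
            if header.Contains(" 1:") then
                countR1 <- countR1 + 1
                r1
            elif header.Contains(" 2:") then
                countR2 <- countR2 + 1
                r2
            else
                failwithf "no ' 1:' or ' 2:' in %s" header
        for line in record do
            writer.WriteLine(line)
    countR1, countR2
```

It could be called like this (the file names are placeholders); for properly paired data both counts should come out equal:

```fsharp
let nR1, nR2 = deinterleave "35C_C_14.fastq" "35C_C_14_R1.fastq" "35C_C_14_R2.fastq"
printfn "R1Counter: %i\nR2Counter: %i" nR1 nR2
```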
The missing samples are now available at SRA, but their format differs from that of the remaining samples: fasterq-dump does not de-interlace them into separate files, so each sample remains a single fastq file. Since I already have the data and de-interlaced it manually, I am closing this issue.