Hi Alex,
Does sample work with zipped files? Large files are usually compressed.
The command is producing a binary file for me, and if I try <(zcat file.fq.gz) it gives me an error.
What command should I use in this case?
Thank you in advance for the reply.
Because sample requires two passes through the input file being sampled from, it is not possible to use a file stream (such as what comes out of <(zcat foo.fq.gz)). A file stream can only be read once.
Native gzip itself has no ability to seek into the archive to get positions of newline characters. But adding support for file offsets into a block-gzipped file (like a bgzip-compressed file that gets used with tabix and other tools in the htslib family) might be a possibility. I'll take a look into that format some time and see if there is potential there to integrate support for bgzip-compressed files.
In the meantime, if you're running into memory errors using GNU shuf and you have sufficient disk space, you could extract the gzipped archive to a temporary file, run sample on that file, and delete the temporary file afterwards.
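The temporary-file workaround could be sketched roughly as follows. The helper name `with_decompressed` is made up for illustration and is not part of sample; it only decompresses to a seekable file, runs whatever command you pass it with that file as the last argument, and cleans up.

```shell
# Hypothetical helper (not part of sample): decompress a gzip file to a
# temporary, seekable file, run a command on it, then remove the file.
# Usage: with_decompressed <file.gz> <command> [args...]
with_decompressed() {
    wd_gz="$1"
    shift
    wd_tmp="$(mktemp)" || return 1
    # gunzip -c writes the decompressed data to stdout and keeps the archive.
    if ! gunzip -c "$wd_gz" > "$wd_tmp"; then
        rm -f "$wd_tmp"
        return 1
    fi
    # The temporary file is appended as the final argument of the command.
    "$@" "$wd_tmp"
    wd_status=$?
    rm -f "$wd_tmp"
    return "$wd_status"
}
```

You would then call it along the lines of `with_decompressed foo.fq.gz sample [options] > sample.fq`, taking the options for sample size and lines per record from `sample --help`. This assumes sample accepts the input file as its last argument.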
There is a time and disk space cost in extracting your gzipped archive to a temporary file. So if you don't think you will run into memory issues with GNU shuf, you could use that tool instead: linearize each FASTQ record onto a single line, sample some number of records, and then "un"-linearize the sample.
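A minimal sketch of the shuf approach, assuming strict four-line FASTQ records with no tab characters in any line. The function name `sample_fastq_gz` is made up for illustration.

```shell
# Hypothetical helper: draw N random FASTQ records from a gzipped file.
# Usage: sample_fastq_gz <file.fq.gz> <number-of-records>
sample_fastq_gz() {
    # gzip -dc : decompress to stdout
    # paste    : join every 4 lines into one tab-separated line (one record)
    # shuf -n  : pick N of those lines at random, without replacement
    # tr       : turn the tabs back into newlines to restore 4-line records
    gzip -dc "$1" | paste - - - - | shuf -n "$2" | tr '\t' '\n'
}
```

For example, `sample_fastq_gz foo.fq.gz 1000 > sample.fq` would write 1000 records. Because this reads from a stream, it never needs a seekable file.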