feature request: Get demultiplexing script to calculate coverage for each sample #68
Comments
The calculation should be 2x[read length]x[number of reads]/[genome size] and only be performed on R1. [read length] can be found in the upper parts of the SampleSheet.csv, or in the multiqc_general_stats.txt |
In /data/rawdata/220603_M06578_0105_000000000-KB7MY/SampleSheet.csv, is that field "read length" or is it "reads"? In either case, why is it written twice? If it is "reads", where is the read length, please? Or do I have to process R1? If I process R1, can I calculate the genome size from it? That would help avoid changing SampleSheet.csv much. Also, regarding 2x[read length]x[number of reads]/[genome size]: with integer arithmetic the result is always even, but there might be a small discrepancy due to integer division:
I'm doing the multiplication first and then dividing by genomeSize, which rounds down instead of up, and then multiplying by 2. The result will always be even, but it might differ depending on the order of calculation and on whether the division rounds up or down. |
In SampleSheet.csv, read length is for some reason called "reads". It is written twice because it applies to both the forward and the reverse read. In multiqc_general_stats.txt, however, [read length] is used. You cannot calculate genome size from anything in the SampleSheet or the demuxed stats files. We could provide a constant list of genome sizes for each genus separately if that helps (it is related to the VIGAS-PID, but then the VIGAS-PID needs to be included in the SampleSheet). Ideally I would want the resulting number with one decimal. |
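To make the rounding question concrete, here is a minimal sketch of the calculation, assuming the read length, the R1 read count and the genome size are already available as integers. The function names and the example numbers are made up for illustration and are not taken from the demultiplexing script. Using float division and rounding once at the end gives the one-decimal result asked for above; the integer variant is included only to show how flooring before doubling changes the number.

```python
def coverage(read_length: int, n_reads_r1: int, genome_size: int) -> float:
    """Estimated coverage: 2 x read length x number of R1 reads / genome size.

    Uses float division and rounds once, to one decimal.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of bases")
    return round(2 * read_length * n_reads_r1 / genome_size, 1)


def coverage_floor_then_double(read_length: int, n_reads_r1: int, genome_size: int) -> int:
    """Integer variant discussed in the thread: floor-divide first, then multiply by 2.

    The result is always even and can undershoot the float result.
    """
    return 2 * (read_length * n_reads_r1 // genome_size)


# Hypothetical values: 151 bp reads, 40,000 R1 reads, 5 Mbp genome.
example_float = coverage(151, 40_000, 5_000_000)
example_int = coverage_floor_then_double(151, 40_000, 5_000_000)
print(example_float, example_int)
```

With these made-up inputs the two variants disagree (the exact quotient is 2.416), which is the discrepancy described above; doing the division in floating point avoids it.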
I have made modifications to the sample sheet template. The changes are only applied to the template for the new workflow; I have not done anything with the "old-workflow-template". This is an example of the updated data section: Note that I have also removed "i7_index_id" and "i5_index_id" and replaced them with "index_id". |
And here is the commit for the template changes: |
Awesome! Can you please generate a fake SampleSheet.csv for me? Edit: @magnulei sent a made-up new SampleSheet; attaching it here. |
This is a bit unexpected, but it looks like I may have to create a SampleSheet class. Not a lot of work, but instead of just searching for one or two strings inside the file, it looks like I have to parse it properly and then fill in the appropriate fields. https://support-docs.illumina.com/SHARE/SampleSheetv2/Content/SHARE/FrontPages/SampleSheetv2.htm and |
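As a rough idea of what "properly parse it" could look like, here is a minimal sketch that splits a sample sheet into its bracketed sections and turns the data section into one dict per sample. It assumes the usual layout of `[Section]` lines followed by comma-separated rows; the function names, the section name `Data` and the sample content below are assumptions for illustration, not the actual template or the real class.

```python
import csv
import io


def parse_sample_sheet(text: str) -> dict[str, list[list[str]]]:
    """Split sample sheet text into {section name: list of rows}."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines and rows of empty cells
        first = cells[0]
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def data_rows(sections: dict[str, list[list[str]]], name: str = "Data") -> list[dict[str, str]]:
    """Return the data section as one dict per sample, in file order."""
    rows = sections.get(name, [])
    if not rows:
        return []
    header, *body = rows
    # Pad short rows so every header column gets a (possibly empty) value.
    return [dict(zip(header, r + [""] * (len(header) - len(r)))) for r in body]


# Made-up sample sheet content, for illustration only.
EXAMPLE = """[Header]
Date,2022-06-03
[Reads]
151
151
[Data]
Sample_ID,Sample_Well,index,index2
S1,A01,ATCACG,CGATGT
S2,A02,TTAGGC,TGACCA
"""

sheet = parse_sample_sheet(EXAMPLE)
samples = data_rows(sheet)
print(sheet["Reads"], [s["Sample_ID"] for s in samples])
```

Because the rows are kept in file order, any per-sample output derived from `samples` stays in the same order as the sample sheet.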
@magnulei can you please give me the dimensions of a plate well? A to I? And 00 to 10? Am I correct? |
@magnulei same question as above, but for Sample_Well: I want to put in a check that the well is never out of bounds. |
@magnulei also, can you please explain what index is and how it differs from index2? |
Index_Plate_Well and Sample_Well are in the same format, with A01 in the upper left corner. Like this: Btw: |
This should answer your question: If needed, we can have a quick call later this week and go through it. |
Hm. Is it ok if I add a secondary check, just in case? Just to satisfy my paranoia. |
are there different sizes of plates or is this industry standard? |
That's of course perfectly fine by me. And it is good to catch any well-ID typos in the index names. |
When you say size, I assume you mean format. The 96 well format (8x12) is industry standard. But the specific dimensions could be different depending on the application. You have low-profile, high-profile and deep-well. You also have 24-well format plates (8x3). But these are not used in clarity. In clarity we are using three different formats (see configuration -> consumables -> containers in the clarity gui):
|
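For the out-of-bounds check, a minimal sketch assuming the 96-well format (8 rows A to H, 12 columns) and the two-digit, zero-padded notation with A01 in the upper left corner, as described above. The function name is hypothetical, and accepting lower-case input is an assumption; other plate formats would need their own row and column limits.

```python
import re

# 96-well format: 8 rows (A-H) x 12 columns (01-12).
N_COLUMNS = 12
_WELL_PATTERN = re.compile(r"([A-H])(\d{2})")


def is_valid_well(well: str) -> bool:
    """Return True if `well` looks like A01..H12 on a 96-well plate."""
    match = _WELL_PATTERN.fullmatch(well.strip().upper())
    if match is None:
        return False
    return 1 <= int(match.group(2)) <= N_COLUMNS


for candidate in ["A01", "H12", "I01", "A13", "A00"]:
    print(candidate, is_valid_well(candidate))
```

The same check can be run on both Index_Plate_Well and Sample_Well, since they share the format.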
i got knowledge today. thanks! |
I have now created the script for extracting sample metadata at the sequencing step: At the moment, the metadata is saved to an Excel file. With time it might be pushed to an SQL database instead. So this is just to let you know that we would like the list of coverage numbers to be sorted in the same order as in the sample sheet (but I guess that was the plan anyway?). Feel free to pop by my office if you want to discuss the above script and/or the temporary metadata "Excel-logging" solution. |
Other things that need to change: