feature request: Get demultiplexing script to calculate coverage for each sample #68

karinlag · 2023-05-30T11:06:06Z

Other things that need to change:

@magnulei needs to add printing of genome size to the sample sheet.
@georgemarselis-nvi to add calculation and output of it to demux

CathrineAB · 2023-05-30T11:59:47Z

The calculation should be 2x[read length]x[number of reads]/[genome size] and only be performed on R1.

[read length] can be found in the upper parts of the SampleSheet.csv, or in the multiqc_general_stats.txt
[number of reads] from the multiqc_general_stats.txt
[genome size] from the SampleSheet.csv
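As a minimal sketch of the formula above (the function and parameter names are assumptions, not taken from the demux script, and the factor 2 presumably accounts for the paired R2 reads since only R1 is counted):

```python
def estimate_coverage(read_length: int, number_of_reads: int, genome_size: int) -> float:
    """Return 2 * read_length * number_of_reads / genome_size as a float."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of bases")
    # number_of_reads is the R1 count only; the factor 2 is taken from the formula above.
    return 2 * read_length * number_of_reads / genome_size


# Made-up example values, not taken from a real run.
print(estimate_coverage(read_length=300, number_of_reads=500_000, genome_size=5_000_000))  # 60.0
```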

georgemarselis-nvi · 2023-05-30T18:10:13Z

/data/rawdata/220603_M06578_0105_000000000-KB7MY/SampleSheet.csv

Is "reads" the read length, or is it the number of reads? In either case, why is it written twice?

If it is the number of reads, where is the read length, please?

Or do I have to process R1?

If I process R1, can I calculate the genome size? That would help avoid changing SampleSheet.csv much.

Also, about 2x[read length]x[number of reads]/[genome size]: computed as below, the result is always even, but there might be a small discrepancy due to integer division in computers:

 result = 2 * ( ( demux.readLength * numberOfReads ) // genomeSize ) # // is integer division, essentially divide and take math.floor( ) of result

I'm doing the multiplication first, then dividing by genomeSize (which rounds down instead of up), and then multiplying by 2.

The result will always be even, but it might differ depending on the order of calculation and whether we round up or down in the division.
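To illustrate the order-of-operations point, here is a small sketch with made-up numbers (not from a real run) that compares flooring before doubling, doubling before flooring, and plain floating-point division:

```python
# Made-up values chosen so that the three variants disagree.
read_length = 151
number_of_reads = 120_000
genome_size = 4_800_000

total_bases_r1 = read_length * number_of_reads  # 18_120_000

floor_then_double = 2 * (total_bases_r1 // genome_size)  # floor(3.775) = 3, then 2 * 3
double_then_floor = (2 * total_bases_r1) // genome_size  # floor(7.55)
true_division = 2 * total_bases_r1 / genome_size         # no rounding at all

print(floor_then_double, double_then_floor, true_division)  # 6 7 7.55
```

Only the first variant is guaranteed to be even. True division (`/`) keeps the fractional part and does not depend on where the factor 2 is applied.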

CathrineAB · 2023-05-31T06:34:26Z

In SampleSheet.csv, read length is for some reason called "Reads". It is written twice because there is one entry each for the forward and the reverse read. In multiqc_general_stats.txt, however, [read length] is used.

You cannot calculate genome size from anything in the SampleSheet or the demuxed stats files. We could provide the constant list of genome sizes for each genus separately if that helps (related to the VIGAS-PID, but then VIGAS-PID needs to be included in the SampleSheet).

Ideally, I would like the resulting number to have one decimal.
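A rough sketch of how a per-genus genome size list and one-decimal output could fit together. The genus names and sizes below are placeholders only (the real constant list is not part of this thread), and `coverage_one_decimal` is a hypothetical helper name:

```python
from typing import Optional

# PLACEHOLDER values in bases, keyed by genus. Illustrative only:
# replace with the lab's own constant list of genome sizes.
GENOME_SIZE_BY_GENUS = {
    "GenusA": 5_000_000,
    "GenusB": 2_800_000,
}


def coverage_one_decimal(genus: str, read_length: int, number_of_reads: int) -> Optional[str]:
    """Return the coverage formatted with one decimal, or None for an unknown genus."""
    genome_size = GENOME_SIZE_BY_GENUS.get(genus)
    if genome_size is None:
        return None  # no genome size available, so no coverage can be reported
    coverage = 2 * read_length * number_of_reads / genome_size
    return f"{coverage:.1f}"


print(coverage_one_decimal("GenusB", 151, 400_000))   # 43.1
print(coverage_one_decimal("Unknown", 151, 400_000))  # None
```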

magnulei · 2023-05-31T09:17:21Z

I have made modifications to the sample sheet template. The changes apply only to the template for the new workflow; I have not done anything with the "old-workflow-template".

This is an example of the updated data section:

Note that I have also removed "i7_index_id" and "i5_index_id" and replaced them with "index_id".
I have also removed the "Description" column (which will resolve https://github.com/NorwegianVeterinaryInstitute/nvi_lims_epps/issues/28).

magnulei · 2023-05-31T09:20:56Z

And here is the commit for the template changes:
https://github.com/NorwegianVeterinaryInstitute/nvi_lims_epps/commit/cc818c053dd6f9bdc09fac755cfdf076ac917b6c

georgemarselis-nvi · 2023-05-31T09:24:32Z

awesome! can you please generate a fake SampleSheet.csv for me?

edit: @magnulei sent a made-up new SampleSheet, attaching it here.

MiSeq_MS-TEST3.csv

georgemarselis-nvi · 2023-05-31T14:09:59Z

this is a bit unexpected, but it looks like I may have to create a SampleSheet class.

Not a lot of work, but instead of just searching for 1-2 strings inside the file, it looks like I have to parse it properly and then fill in the appropriate fields.

https://support-docs.illumina.com/SHARE/SampleSheetv2/Content/SHARE/FrontPages/SampleSheetv2.htm

and

https://support.illumina.com/content/dam/illumina-support/documents/documentation/system_documentation/miseq/miseq-sample-sheet-quick-ref-guide-15028392-j.pdf
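A minimal sketch of what "properly parsing" could look like, assuming the bracketed-section layout (`[Header]`, `[Reads]`, `[Data]`, ...) described in the Illumina guides linked above. The function names and the example sheet are made up for illustration; the columns of the new NVI template may differ:

```python
import csv
import io
from collections import defaultdict


def split_sections(sample_sheet_text: str) -> dict:
    """Group the rows of a sample sheet by the [Section] line that precedes them."""
    sections = defaultdict(list)
    current = None
    for row in csv.reader(io.StringIO(sample_sheet_text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines and rows made only of empty cells
        first = cells[0]
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current]  # register the section even if it stays empty
            continue
        if current is not None:
            sections[current].append(cells)
    return dict(sections)


def read_lengths(sections: dict) -> list:
    """Return the values listed under [Reads], one per read (forward, reverse)."""
    return [int(row[0]) for row in sections.get("Reads", []) if row[0].isdigit()]


def data_rows(sections: dict) -> list:
    """Return the [Data] section as dicts keyed by its column header row."""
    rows = sections.get("Data", [])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up example sheet, not the real template.
EXAMPLE = """[Header]
Date,2023-05-31
[Reads]
151
151
[Data]
Sample_ID,Sample_Well,index,index2
S1,A01,ATCACG,CGATGT
S2,B01,TTAGGC,TGACCA
"""

sections = split_sections(EXAMPLE)
print(read_lengths(sections))                            # [151, 151]
print([row["Sample_ID"] for row in data_rows(sections)])  # ['S1', 'S2']
```

Keeping the section split separate from the per-section accessors means new fields (such as a genome size column) only need a new accessor, not a new parser.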

georgemarselis-nvi · 2023-06-19T13:12:10Z

@magnulei can you please give me the dimensions of a Plate Well?

A to I ?

and 00 to 10 ?

am i correct?

georgemarselis-nvi · 2023-06-19T13:13:21Z

@magnulei same thing as above but for Sample_Well

i want to put in a check that the well is never out of bounds.

georgemarselis-nvi · 2023-06-19T13:16:28Z

@magnulei also can you please explain what index is and how it differs from index2?

magnulei · 2023-06-19T13:27:45Z

@georgemarselis-nvi

Index_Plate_Well and Sample_Well are in the same format.

A01 in the upper left corner.
H12 in the bottom right corner

Like this:

Btw:
Sample_Well will definitely never be out of bounds. It's defined in clarity. Index_Plate_Well could in theory be out of bounds since it is getting the position from the index name. A typo here could cause it to be out of bounds in the sample sheet.
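A bounds check for the 96-well format described above (A01 upper left, H12 bottom right) could be sketched as follows. It assumes well IDs are one row letter followed by a zero-padded two-digit column; `is_valid_96_well` is a hypothetical name:

```python
import re

# Rows A-H, columns 01-12, e.g. "A01" or "H12".
WELL_PATTERN = re.compile(r"[A-H](0[1-9]|1[0-2])")


def is_valid_96_well(well: str) -> bool:
    """Return True if `well` is a position on an 8x12 plate, written like 'A01'."""
    return WELL_PATTERN.fullmatch(well.strip().upper()) is not None


for candidate in ("A01", "H12", "I01", "A13", "A00"):
    print(candidate, is_valid_96_well(candidate))
```

Run on Index_Plate_Well, such a check would flag a well-ID typo in an index name before it reaches the rest of the demux script.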

magnulei · 2023-06-19T13:31:19Z

> @magnulei also can you please explain what index is and how it differs from index2?

This should answer your question:
https://support.illumina.com/content/dam/illumina-support/documents/documentation/system_documentation/miseq/indexed-sequencing-overview-guide-15057455-08.pdf

If needed, we can have a quick call later this week and go through it.

georgemarselis-nvi · 2023-06-20T12:11:34Z

> Btw:
> Sample_Well will definitely never be out of bounds. It's defined in clarity. Index_Plate_Well could in theory be out of bounds since it is getting the position from the index name. A typo here could cause it to be out of bounds in the sample sheet.

Hm. Is it ok if I add a secondary check, just in case? Just to satisfy my paranoia.

georgemarselis-nvi · 2023-06-20T12:13:19Z

> Like this:

are there different sizes of plates or is this industry standard?

magnulei · 2023-06-20T12:57:12Z

> > Btw:
> > Sample_Well will definitely never be out of bounds. It's defined in clarity. Index_Plate_Well could in theory be out of bounds since it is getting the position from the index name. A typo here could cause it to be out of bounds in the sample sheet.
>
> Hm. Is it ok if I add a secondary check, just in case? Just to satisfy my paranoia.

That's of course perfectly fine by me. And good to catch any well-id typos in the index names.

magnulei · 2023-06-20T13:10:48Z

> > Like this:
>
> are there different sizes of plates or is this industry standard?

When you say size, I assume you mean format. The 96-well format (8x12) is the industry standard, but the specific dimensions can differ depending on the application: there are low-profile, high-profile and deep-well plates.

There are also 24-well format plates (8x3), but these are not used in clarity.
And there are 8-well strips (8x1). Strictly speaking not a plate, though. Not used in clarity.

In clarity we are using three different formats (see configuration -> consumables -> containers in the clarity gui):

96 well plate (8x12)
Tube (clarity reads this position as 1:1, which is translated to index well 101 by the sample sheet function at the moment: see https://github.com/NorwegianVeterinaryInstitute/nvi_lims_epps/issues/62)
9x9 storage box for tubes (called "DNA for NGS" in clarity). Columns: 1->9, Rows: A -> I (81 places in a box)
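If the check needs to cover more than one container format, the dimensions listed above could be kept in a small lookup table. This is only a sketch: the dictionary keys reuse the container names from this comment, the tube (position 1:1) is left out because its position is not a letter/number pair, and `position_in_bounds` is a hypothetical helper:

```python
import string

# (number of rows, number of columns) per container format named above.
CONTAINER_DIMENSIONS = {
    "96 well plate": (8, 12),  # rows A-H, columns 1-12
    "DNA for NGS": (9, 9),     # rows A-I, columns 1-9
}


def position_in_bounds(container: str, row: str, column: int) -> bool:
    """Return True if (row letter, column number) exists in the given container."""
    n_rows, n_columns = CONTAINER_DIMENSIONS[container]
    valid_rows = string.ascii_uppercase[:n_rows]
    return len(row) == 1 and row.upper() in valid_rows and 1 <= column <= n_columns


print(position_in_bounds("96 well plate", "H", 12))  # True
print(position_in_bounds("96 well plate", "I", 1))   # False
print(position_in_bounds("DNA for NGS", "I", 9))     # True
```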

georgemarselis-nvi · 2023-06-20T13:12:20Z

i got knowledge today. thanks!

magnulei · 2023-08-09T12:17:59Z

Hi @georgemarselis-nvi.

I have now created the script for extracting sample metadata at the sequencing step:

https://github.com/NorwegianVeterinaryInstitute/nvi_lims_epps/blob/main/excel-logging/run_log_samples.py

At the moment, the metadata is saved to an Excel file. With time it might be pushed to an SQL database instead.
For writing the data, I have written a sorting function that sorts the samples in the same order as in the sample sheet. I also have an empty column in the dataframe/Excel file where @CathrineAB can paste the coverage numbers calculated by the demux script.

So this is just to let you know that we would like the list of coverage numbers to be sorted in the same order as in the sample sheet (but I guess that was the plan anyway?).

Feel free to pop by my office if you want to discuss the above script and/or the temporary metadata "excel-logging" solution.
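One way the demux side could emit coverage in sample sheet order, sketched under the assumption that coverage is held in a dict keyed by sample ID (the names and values here are made up, and this is not the sorting function in run_log_samples.py):

```python
def order_like_sample_sheet(sample_ids_in_sheet, coverage_by_sample):
    """Return (sample_id, coverage) pairs in sample sheet order.

    Samples without a coverage value get None, so the output always has
    one row per sample sheet row and can be pasted next to it.
    """
    return [(sample_id, coverage_by_sample.get(sample_id)) for sample_id in sample_ids_in_sheet]


# Made-up example data.
sheet_order = ["S3", "S1", "S2"]
coverage = {"S1": "43.1", "S2": "60.0"}

for sample_id, value in order_like_sample_sheet(sheet_order, coverage):
    print(sample_id, value)
```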

karinlag assigned georgemarselis-nvi and magnulei May 30, 2023

georgemarselis-nvi added the enhancement New feature or request label May 31, 2023

georgemarselis-nvi changed the title ~~Get demultiplexing script to calculate coverage for each sample~~ feature request: Get demultiplexing script to calculate coverage for each sample May 31, 2023

georgemarselis-nvi added this to the M11 Deliver of data from sequencer to VIGAS happens automatically milestone Aug 21, 2023

georgemarselis-nvi mentioned this issue Sep 22, 2023

bug: .md5 files and .sha512 files are not written correctly #71

Closed
