As @alimanfoo mentioned in #702, creating the tests "could be tricky because the simulated data is not guaranteed to generate data at the tag SNP positions."
Looking quickly at the code that simulates the SNP data, it looks like a "size" is chosen at random for each chromosome (between 50 000 and 100 000 for Ag3, and between 80 000 and 120 000 for Af1) and the positions are then assigned starting at 1. All simulated positions are therefore no higher than 120 000, so none of the tag SNPs is ever going to have simulated data.
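To make the mismatch concrete, here is a minimal sketch. The function `simulate_positions` and the tag positions are hypothetical stand-ins, not the package's actual simulator or the real tag SNP file; only the Af1 size range is taken from the description above.

```python
import random


def simulate_positions(min_size, max_size, n_sites, rng):
    """Hypothetical stand-in for the simulator: pick a random contig
    size, then draw sorted positions in the range 1..size."""
    size = rng.randint(min_size, max_size)
    return sorted(rng.sample(range(1, size + 1), n_sites))


rng = random.Random(0)

# Size range quoted above for Af1.
positions = simulate_positions(80_000, 120_000, 1_000, rng)
position_set = set(positions)

# Hypothetical tag SNP positions (assumed values, for illustration only),
# all well beyond the largest possible simulated position.
tag_positions = [2_500_000, 13_000_000, 28_000_000]

# Tags that happen to coincide with a simulated position.
covered = [p for p in tag_positions if p in position_set]
print(f"max simulated position: {max(positions)}, covered tags: {covered}")
```

Under these assumptions no tag can ever be covered, whatever the random seed, because the simulated coordinate range stops before the first tag.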
I see a few possible solutions:
1. Generate enough simulated SNPs to cover all the regions containing targets (i.e., use a minimum size > the highest value in the tags). That would blow up the size of the simulated data, which sounds sub-optimal.
2. Generate extra data for exactly the tags. This would cause the generation of more simulated data, but not at the same scale.
3. Use a similar method to the one used for the AIMs, i.e., generate the targets on the fly instead of using the ones from the file. This would require a bit of recoding of karyotype.py, as the path to the targets is hard-coded to a path in the package (i.e., it cannot be simulated) instead of a path in the data storage (i.e., it can be simulated).
I think 3) would make the most sense but differing opinions are welcome.
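A rough sketch of what generating the targets on the fly could look like. Everything here is an assumption for illustration: `simulate_tag_snps`, the contig names and the sizes are hypothetical and do not reflect the existing karyotype.py code or the AIMs simulator.

```python
import random


def simulate_tag_snps(positions_by_contig, n_tags_per_contig, rng):
    """Hypothetical helper: choose tag SNP positions from positions that
    actually exist in the simulated data, so every tag has data behind it."""
    tags = {}
    for contig, positions in positions_by_contig.items():
        # Never ask for more tags than there are simulated positions.
        n = min(n_tags_per_contig, len(positions))
        tags[contig] = sorted(rng.sample(positions, n))
    return tags


rng = random.Random(42)

# Illustrative simulated positions per contig (names and sizes are assumed).
simulated = {
    "2L": list(range(1, 60_001)),
    "2R": list(range(1, 90_001)),
}

tags = simulate_tag_snps(simulated, n_tags_per_contig=10, rng=rng)
for contig, tag_positions in tags.items():
    print(contig, tag_positions)
```

The simulated tags would then be written to the simulated data storage, and the code reading them would need to take that path rather than the one hard-coded in the package.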
Part of #689.
For the record: