Add tests on simulated data for `karyotype` #700

jonbrenas · 2024-12-11T13:15:02Z

Part of #689.

For the record:

It [the `karyotype` function] has tests on simulated data to be compliant with the rest of the package.

jonbrenas · 2025-01-03T13:59:57Z

As @alimanfoo mentioned in #702, creating the tests "could be tricky because the simulated data is not guaranteed to generate data at the tag SNP positions."

From a quick look at the code that simulates the SNP data, it seems that for each chromosome a "size" is chosen at random (between 50 000 and 100 000 for Ag3, and between 80 000 and 120 000 for Af1) and positions are then assigned starting at 1. All simulated positions are therefore at most 120 000, and since the tag positions all lie beyond that range, none of the tags will ever have simulated data.
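A minimal sketch of why the tags are never covered, assuming positions are assigned consecutively from 1 up to the drawn size (the actual simulation code may space them differently, but the largest position still cannot exceed the size). The size ranges are the ones quoted in this comment; `simulate_positions` and `covered_tags` are hypothetical names, not functions from the package.

```python
import numpy as np

# Size ranges quoted above; the real simulation code may use different bounds.
SIZE_RANGES = {"Ag3": (50_000, 100_000), "Af1": (80_000, 120_000)}


def simulate_positions(rng, release):
    """Hypothetical sketch: draw a random contig size, then assign positions 1..size."""
    low, high = SIZE_RANGES[release]
    size = int(rng.integers(low, high + 1))  # upper bound of integers() is exclusive
    return np.arange(1, size + 1)


def covered_tags(tag_positions, positions):
    """Boolean mask telling which tag positions appear among the simulated positions."""
    return np.isin(np.asarray(tag_positions), positions)
```

Any tag position greater than the largest possible size (120 000 here) gets `False` from `covered_tags`, whatever random seed is used.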

I see a few possible solutions:

1) Generate enough simulated SNPs to cover every region containing targets (i.e., use a minimum size greater than the highest tag position). This would greatly inflate the size of the simulated data, which seems suboptimal.
2) Generate extra data at exactly the tag positions. This would still increase the amount of simulated data, but on a much smaller scale.
3) Use a method similar to the one used for the AIMs, i.e., generate the targets on the fly instead of reading them from the file. This would require some recoding of `karyotype.py`, because the path to the targets is hard-coded to a location inside the package (which cannot be simulated) rather than a location in the data storage (which can).
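The second option (generating extra data at exactly the tag positions) could look roughly like the sketch below, which merges the tag positions into the simulated positions before the other simulated arrays are built. `add_tag_positions` is a hypothetical helper, and the tag positions used here are made-up values for illustration only.

```python
import numpy as np


def add_tag_positions(positions, tag_positions):
    """Hypothetical helper: return the simulated positions with every tag
    position added, sorted and de-duplicated (np.union1d does both)."""
    return np.union1d(np.asarray(positions), np.asarray(tag_positions))


# Made-up values, for illustration only.
positions = np.arange(1, 100_001)
augmented = add_tag_positions(positions, [150_000, 2_000_000])
# The simulated allele/genotype arrays would then need len(augmented) sites.
```

The cost grows only with the number of tags rather than with the span of chromosome they cover, which is the difference in scale compared with the first option.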

I think option 3) makes the most sense, but differing opinions are welcome.
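A minimal sketch of the third option, assuming tags may be any subset of the simulated positions and that the karyotype code is changed to read its targets from a configurable data path rather than from inside the package. `simulate_tags`, `write_simulated_tags`, the `karyotype/<inversion>_tags.csv` layout and the single `position` column are all hypothetical placeholders, not the real format used by `karyotype.py`; "2La" is just an example label.

```python
import csv
import pathlib
import tempfile

import numpy as np


def simulate_tags(rng, positions, n_tags):
    """Hypothetical: pick n_tags distinct simulated positions as tags,
    so every simulated tag is guaranteed to have data."""
    return np.sort(rng.choice(positions, size=n_tags, replace=False))


def write_simulated_tags(base_path, inversion, tag_positions):
    """Write tags under a data-storage path (not inside the package).
    The file layout and column name are placeholders, not the real format."""
    path = pathlib.Path(base_path) / "karyotype" / f"{inversion}_tags.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["position"])
        writer.writerows([[int(p)] for p in tag_positions])
    return path


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positions = np.arange(1, 50_001)
    tags = simulate_tags(rng, positions, n_tags=10)
    with tempfile.TemporaryDirectory() as tmp:
        print(write_simulated_tags(tmp, "2La", tags))
```

Because the tags are drawn from the simulated positions themselves, the test data stays small and every tag is covered by construction.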

Is there anything I missed? Any better ideas?

jonbrenas added the low priority label Dec 11, 2024

jonbrenas mentioned this issue on Dec 11, 2024: Refactoring karyotype #689 (open)

jonbrenas mentioned this issue on Jan 2, 2025: Moving karyotype to anoph #702 (open)
