Karyotype frequencies #652

alimanfoo · 2024-11-13T12:37:48Z

For Ag3 we now have a karyotype() function, which infers a karyotype for each sample at each of several possible inversions on Chromosome 2. I suggest also adding karyotype_frequencies() and karyotype_frequencies_advanced() functions, which compute frequencies within cohorts, analogous to the existing functions for SNPs and CNVs.
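A minimal sketch of a cohort-level karyotype frequency table, assuming per-sample karyotype calls coded as the number of inverted alleles (0, 1 or 2) in a `karyotype_{inversion}` column plus a cohort label column. The function name `karyotype_class_frequencies` and the toy data are hypothetical, not part of the malariagen_data API.

```python
import pandas as pd


def karyotype_class_frequencies(df, karyotype_column, cohort_column):
    """Fraction of samples per karyotype class (0, 1, 2 inverted alleles) in each cohort."""
    # Rows: karyotype class; columns: cohort; normalize="columns" makes each column sum to 1.
    freqs = pd.crosstab(df[karyotype_column], df[cohort_column], normalize="columns")
    # Keep all three classes even if one is absent from every cohort.
    return freqs.reindex([0, 1, 2], fill_value=0.0)


# Toy per-sample calls, made up for illustration.
df = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "cohort": ["A", "A", "A", "B", "B", "B"],
    "karyotype_2La": [0, 1, 2, 2, 2, 1],
})
print(karyotype_class_frequencies(df, "karyotype_2La", "cohort"))
```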


sanjaynagi · 2024-11-28T16:53:00Z

Haven't got time to implement this, but here's some basic code I had.

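```python
import pandas as pd
# Assumes `ag3 = malariagen_data.Ag3()` already exists in the session.

def calc_allele_freq(karyotypes):
    # Helper not shown in the original; assumed: karyotype = inverted-allele count (0, 1, 2).
    alt, total = karyotypes.sum(), 2 * karyotypes.count()
    return alt / total, alt, total

def calculate_karyo_frequencies(inversion, sample_sets, sample_query, cohort_column='cohort_admin2_year'):
    # Per-sample karyotype calls joined to sample metadata.
    df_karyo = ag3.karyotype(inversion, sample_sets=sample_sets, sample_query=sample_query)
    df_samples = ag3.sample_metadata(sample_sets=sample_sets, sample_query=sample_query)
    df_karyo = df_karyo.merge(df_samples, how='left', on='sample_id')

    # Inverted-allele frequency and sample count for each cohort.
    cohorts = df_karyo[cohort_column].unique()
    afs = {coh: calc_allele_freq(df_karyo.query(f"{cohort_column} == @coh")[f'karyotype_{inversion}']) for coh in cohorts}
    counts = {coh: df_karyo.query(f"{cohort_column} == @coh").shape[0] for coh in cohorts}
    af_df = pd.DataFrame(afs).T.reset_index().rename(columns={'index': cohort_column, 0: inversion, 1: 'alt_alleles', 2: 'total_alleles'})
    # Frequency of the standard (non-inverted) arrangement.
    af_df[f'{inversion}_standard'] = 1 - af_df[inversion]
    # Location, year and taxon taken from the first sample in each cohort.
    df_af = af_df.merge(df_samples[['latitude', 'longitude', 'year', 'taxon', cohort_column]].drop_duplicates(cohort_column), how='left', on=cohort_column)

    return df_af, counts
```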
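The two dict comprehensions in the snippet above re-filter the merged table once per cohort. A single `groupby` pass could compute the same quantities. This is a sketch assuming, as in the snippet, that `karyotype_{inversion}` holds the number of inverted alleles per diploid sample; the function name `cohort_inversion_frequencies` and the toy data are hypothetical.

```python
import pandas as pd


def cohort_inversion_frequencies(df_karyo, inversion, cohort_column):
    """One row per cohort: allele counts, inverted/standard frequencies and sample count."""
    karyo = df_karyo[f"karyotype_{inversion}"]
    # Named aggregation: sum of inverted-allele counts and number of called samples.
    out = (
        karyo.groupby(df_karyo[cohort_column])
        .agg(alt_alleles="sum", n_samples="count")
        .reset_index()
    )
    # Diploid samples contribute two alleles each.
    out["total_alleles"] = 2 * out["n_samples"]
    out[inversion] = out["alt_alleles"] / out["total_alleles"]
    out[f"{inversion}_standard"] = 1 - out[inversion]
    return out


# Toy per-sample calls, made up for illustration.
df_karyo = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3", "s4"],
    "cohort_admin2_year": ["X", "X", "Y", "Y"],
    "karyotype_2La": [2, 1, 0, 0],
})
print(cohort_inversion_frequencies(df_karyo, "2La", "cohort_admin2_year"))
```

Here the per-cohort sample count (the separate `counts` dict in the snippet) becomes the `n_samples` column of the same table.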

jonbrenas · 2024-12-04T13:52:22Z

I think, similarly to what we have with the other frequency functions, we will have 2 functions: karyotype_frequencies (which takes a cohort parameter) and karyotype_frequencies_advanced (which takes an area_by parameter and a period_by parameter). In both cases, there would be 3 rows/variants (hom. ref., het., hom. alt.) and the columns would be the frequencies (plus max_af, probably), the contig and the name of the inversion. Is there any other important data that needs to be added?
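A sketch of the layout described above for the "advanced" variant: cohorts are formed from an area column and a period column (standing in for area_by/period_by), and the output has three rows (hom. ref., het., hom. alt.), one `frq_{cohort}` column per cohort, plus `max_af`, `contig` and `inversion`. The `frq_` prefix is assumed to mirror the existing SNP frequency outputs; the function name, column names and toy data are hypothetical.

```python
import pandas as pd

KARYOTYPE_LABELS = {0: "hom. ref.", 1: "het.", 2: "hom. alt."}


def karyotype_frequencies_advanced_sketch(df, inversion, contig, area_by, period_by):
    """Rows: karyotype classes; columns: frq_<area>_<period> per cohort, max_af, contig, inversion."""
    karyo_col = f"karyotype_{inversion}"
    # Cohort label built from the area and period columns, e.g. "north_2012".
    cohort = df[area_by].astype(str) + "_" + df[period_by].astype(str)
    # Fraction of samples in each karyotype class within each cohort.
    freqs = pd.crosstab(df[karyo_col], cohort, normalize="columns")
    freqs = freqs.reindex(list(KARYOTYPE_LABELS), fill_value=0.0)
    freqs.index = [KARYOTYPE_LABELS[k] for k in freqs.index]
    freqs.columns = [f"frq_{c}" for c in freqs.columns]
    # Highest frequency of each class across cohorts (computed before adding text columns).
    freqs["max_af"] = freqs.max(axis=1)
    freqs["contig"] = contig
    freqs["inversion"] = inversion
    return freqs


# Toy per-sample calls, made up for illustration.
df = pd.DataFrame({
    "area": ["north", "north", "north", "south", "south"],
    "year": [2012, 2012, 2012, 2014, 2014],
    "karyotype_2La": [2, 2, 1, 0, 2],
})
print(karyotype_frequencies_advanced_sketch(df, "2La", "2L", "area", "year"))
```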

I think it would make sense to have multiple inversions in the same heatmap in the frequency functions. The question then is whether we modify karyotype to take a list of inversions or call it in a loop in the `karyotype_frequencies[_advanced]` functions.
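A sketch of the loop option: compute per-cohort inverted-allele frequencies for each inversion separately and stack them into one table (rows: inversions, columns: cohorts) suitable for a single heatmap. Here the per-inversion calls are passed in as a dict of DataFrames, standing in for repeated karyotype() calls; the function name, inversion names and toy data are hypothetical.

```python
import pandas as pd


def inversion_frequencies_multi(calls_by_inversion, cohort_column):
    """Stack per-cohort inverted-allele frequencies for several inversions into one table."""
    frames = []
    for inversion, df in calls_by_inversion.items():
        karyo = df[f"karyotype_{inversion}"]
        # Mean inverted-allele count per sample, divided by 2 alleles per diploid sample.
        frq = (karyo.groupby(df[cohort_column]).mean() / 2).rename(inversion)
        frames.append(frq)
    # Rows: inversions; columns: cohorts (NaN where a cohort has no calls for an inversion).
    return pd.concat(frames, axis=1).T


# Toy calls, made up for illustration.
calls = {
    "2La": pd.DataFrame({"cohort": ["A", "A", "B"], "karyotype_2La": [2, 1, 0]}),
    "2Rb": pd.DataFrame({"cohort": ["A", "A", "B"], "karyotype_2Rb": [0, 0, 2]}),
}
print(inversion_frequencies_multi(calls, "cohort"))
```

The loop leaves karyotype() unchanged; accepting a list of inversions would instead move the same loop inside karyotype().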

In my opinion, the frequency functions should go in the same folder, if not the same file, as the karyotype function.

alimanfoo · 2024-12-10T15:02:37Z

Just capturing some discussion here: it may be better to call this "inversion_frequencies()" or "inversion_allele_frequencies()" to make clear that it is about computing the frequency of the inverted allele within a cohort.
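For reference, with karyotypes coded as the number of inverted alleles per diploid sample (as in the snippet above), the inverted allele frequency in a cohort follows directly from the karyotype counts. The notation is introduced here for illustration: $n_{c,k}$ is the number of samples in cohort $c$ with karyotype $k$ (0 = hom. ref., 1 = het., 2 = hom. alt.), and $\hat{q}_c$ is the frequency of the standard arrangement.

```math
\hat{p}_c = \frac{n_{c,1} + 2\,n_{c,2}}{2N_c}, \qquad N_c = n_{c,0} + n_{c,1} + n_{c,2}, \qquad \hat{q}_c = 1 - \hat{p}_c
```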

leehart added the enhancement (New feature or request) label Nov 18, 2024

alimanfoo mentioned this issue Dec 3, 2024

Karyotype function demonstration malariagen/vobs-updates#52 (Open)

leehart added the high priority label Dec 3, 2024

leehart assigned jonbrenas Dec 3, 2024

jonbrenas mentioned this issue Dec 6, 2024

Refactoring karyotype #689 (Open)
