Karyotype frequencies #652

Open
alimanfoo opened this issue Nov 13, 2024 · 3 comments
Labels
enhancement New feature or request high priority

Comments

@alimanfoo
Member

For Ag3 we now have a karyotype() function which infers a karyotype for each sample at each of several possible inversions on Chromosome 2. Suggest also adding karyotype_frequencies() and karyotype_frequencies_advanced() functions which compute karyotype frequencies within cohorts, analogous to the existing frequency functions for SNPs and CNVs.

@leehart added the enhancement (New feature or request) label Nov 18, 2024
@sanjaynagi
Collaborator

Haven't got time to implement, but here's some basic code I had.

import pandas as pd

# Assumes `ag3` is an Ag3 API instance. calc_allele_freq() is not shown here; judging by the
# rename below it returns (inverted allele frequency, alt allele count, total allele count).
def calculate_karyo_frequencies(inversion, sample_sets, sample_query, cohort_column='cohort_admin2_year'):

    # Karyotype calls per sample, joined to the sample metadata
    df_karyo = ag3.karyotype(inversion, sample_sets=sample_sets, sample_query=sample_query)
    df_samples = ag3.sample_metadata(sample_sets=sample_sets, sample_query=sample_query)
    df_karyo = df_karyo.merge(df_samples, how='left', on='sample_id')

    # Frequency and sample count per cohort (groupby skips samples with no cohort assigned)
    calls_by_cohort = df_karyo.groupby(cohort_column)[f'karyotype_{inversion}']
    afs = {coh: calc_allele_freq(calls) for coh, calls in calls_by_cohort}
    counts = {coh: len(calls) for coh, calls in calls_by_cohort}
    af_df = pd.DataFrame(afs).T.reset_index().rename(columns={'index':cohort_column, 0:inversion, 1:'alt_alleles', 2:'total_alleles'})
    af_df[f'{inversion}_standard'] = 1 - af_df[inversion]

    # Attach one row of location/year/taxon metadata per cohort
    df_af = af_df.merge(df_samples[['latitude', 'longitude', 'year', 'taxon', cohort_column]].drop_duplicates(cohort_column), how='left', on=cohort_column)

    return df_af, counts

@jonbrenas
Collaborator

jonbrenas commented Dec 4, 2024

I think, similarly to what we have with other frequency functions, we will have 2 functions: karyotype_frequencies (which takes a cohort parameter) and karyotype_frequencies_advanced (which takes an area_by parameter and a period_by parameter). In both cases, there would be 3 rows/variants per inversion (hom. ref., het., hom. alt.) and the columns would be the frequencies (+ max_af, probably), the contig and the name of the inversion. Is there any other important data that needs to be added?
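A rough sketch of what that table could look like for a single inversion, using plain pandas on toy data. The helper name `karyotype_frequency_table`, the `frq_<cohort>` column naming and the `cohort` column are assumptions for illustration only, not an existing API; the contig column is left out because the mapping from inversion to contig isn't specified here.

```python
import pandas as pd


def karyotype_frequency_table(df, inversion, cohort_column="cohort"):
    """Hypothetical helper: one row per karyotype state, one frq_<cohort> column per cohort.

    Assumes df has a column karyotype_<inversion> coded 0 (hom. ref.), 1 (het.), 2 (hom. alt.).
    """
    labels = {0: "hom_ref", 1: "het", 2: "hom_alt"}
    calls = df[f"karyotype_{inversion}"]
    # Fraction of samples in each karyotype state within each cohort.
    freqs = pd.crosstab(calls, df[cohort_column], normalize="columns")
    # Keep all three rows even if a state was never observed.
    freqs = freqs.reindex([0, 1, 2], fill_value=0.0)
    freqs.index = [f"{inversion}_{labels[k]}" for k in freqs.index]
    freqs.columns = [f"frq_{c}" for c in freqs.columns]
    freqs.insert(0, "inversion", inversion)
    # Highest frequency of each state across cohorts.
    freqs["max_af"] = freqs.filter(like="frq_").max(axis=1)
    return freqs


# Toy data standing in for karyotype calls merged with sample metadata.
df_toy = pd.DataFrame(
    {
        "sample_id": ["s1", "s2", "s3", "s4", "s5", "s6"],
        "cohort": ["A", "A", "A", "A", "B", "B"],
        "karyotype_2La": [0, 1, 2, 2, 0, 0],
    }
)
table = karyotype_frequency_table(df_toy, "2La")
print(table)
```

Reindexing to all three states means every inversion contributes exactly three rows, which keeps the rows aligned if tables for several inversions are later concatenated.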

I think it would make sense to have multiple inversions in the same heatmap in the frequency functions (the question then is whether we modify karyotype to take a list of inversions or loop the function in the `karyotype_frequencies[_advanced]` functions).
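For the "loop the function" option, something like the following might work. This is only a sketch: `stack_karyotypes` and `toy_karyotype` are made-up names, and `toy_karyotype` is a stand-in for a per-inversion karyotype call that is assumed to return `sample_id` plus a `karyotype_<inversion>` column coded 0/1/2.

```python
import pandas as pd


def stack_karyotypes(karyotype_fn, inversions):
    """Call karyotype_fn once per inversion and stack the calls in long form."""
    frames = []
    for inv in inversions:
        df = karyotype_fn(inv)
        frames.append(
            pd.DataFrame(
                {
                    "sample_id": df["sample_id"],
                    "inversion": inv,
                    "karyotype": df[f"karyotype_{inv}"],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


def toy_karyotype(inversion):
    """Stand-in for a real per-inversion karyotype call (toy data only)."""
    calls = {"2La": [0, 1, 2, 2], "2Rb": [0, 0, 1, 2]}[inversion]
    return pd.DataFrame(
        {"sample_id": ["s1", "s2", "s3", "s4"], f"karyotype_{inversion}": calls}
    )


df_samples = pd.DataFrame(
    {"sample_id": ["s1", "s2", "s3", "s4"], "cohort": ["A", "A", "B", "B"]}
)

df_long = stack_karyotypes(toy_karyotype, ["2La", "2Rb"]).merge(df_samples, on="sample_id")

# Karyotype frequencies normalised within each (inversion, cohort) pair;
# rows are (inversion, karyotype), columns are cohorts, ready for one heatmap.
freqs = (
    df_long.groupby(["inversion", "cohort"])["karyotype"]
    .value_counts(normalize=True)
    .unstack("cohort", fill_value=0.0)
)
print(freqs)
```

Looping inside the frequency function leaves the existing single-inversion karyotype signature untouched, at the cost of one call per inversion.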

The frequency functions are going to go in the same folder, if not the same file, as the karyotype function, in my opinion.

@alimanfoo
Member Author

Just capturing some discussion here: it may be better to call this "inversion_frequencies()" or "inversion_allele_frequencies()" to be clear it is about computing the frequency of the inverted allele within a cohort.
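To make the "frequency of the inverted allele" concrete, here is a minimal sketch. It assumes diploid samples and that each karyotype call is the number of inverted copies (0, 1 or 2), so the frequency is the sum of calls divided by twice the number of called samples. The function name and the toy `cohort` column are illustrative only.

```python
import pandas as pd


def inversion_allele_frequency(karyotypes):
    """Return (frequency, inverted_alleles, total_alleles) for diploid karyotype calls.

    Assumes each call is the number of inverted copies carried: 0, 1 or 2.
    Missing calls are dropped before counting.
    """
    calls = pd.Series(karyotypes).dropna()
    total = 2 * len(calls)  # two chromosome copies per sample
    inverted = int(calls.sum())
    frq = inverted / total if total else float("nan")
    return frq, inverted, total


df_toy = pd.DataFrame(
    {
        "cohort": ["A", "A", "A", "A", "B", "B"],
        "karyotype_2La": [0, 1, 2, 2, 0, 0],
    }
)

# Inverted allele frequency per cohort.
per_cohort = df_toy.groupby("cohort")["karyotype_2La"].apply(
    lambda s: inversion_allele_frequency(s)[0]
)
print(per_cohort)
```

Unlike a three-row karyotype table, this gives a single value per inversion and cohort, which is what the "inversion_allele_frequencies" name would suggest.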

4 participants