
Clarifying the link between zdim and n_clusters #134

Open
DmitryKishkinev opened this issue Feb 12, 2024 · 1 comment

Comments

@DmitryKishkinev

Not really an issue, but a question to help my lab's understanding. I am trying to understand the link between zdim (the number of latent space dimensions) and the number of motifs (n_clusters in config.yaml). Are these two numbers completely independent, or is there a rule (e.g., zdim is always equal to, higher than, or lower than the number of clusters/motifs)?

My understanding is that we should optimize zdim so that we do not use more latent dimensions than needed (looking at the batch-normalized error curve and stopping at the zdim where the error curve stops going down), whereas the number of motifs could be anything: many if we want a very granular picture of behaviour, or few if we want a coarse structure of behaviour. So the choice of n_clusters is up to the researcher, while zdim is a computational optimization.
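A minimal sketch of the plateau rule described above, assuming the model has already been trained at several candidate zdim values and one error value was recorded per run. The function name `pick_zdim` and the 5% relative-improvement threshold `rel_tol` are hypothetical choices for illustration, not anything from the project's code, and the numbers in the usage line are made up.

```python
def pick_zdim(zdims, errors, rel_tol=0.05):
    """Return the smallest zdim after which the error stops dropping meaningfully.

    Hypothetical helper. rel_tol is an assumed threshold: moving to the next
    larger zdim counts as "plateaued" if it lowers the error by less than
    rel_tol as a fraction of the current error.
    """
    if not zdims or len(zdims) != len(errors):
        raise ValueError("zdims and errors must be non-empty and of equal length")
    pairs = sorted(zip(zdims, errors))  # order runs by increasing zdim
    for (z_prev, e_prev), (_, e_next) in zip(pairs, pairs[1:]):
        # relative improvement gained by going to the next larger zdim
        gain = (e_prev - e_next) / e_prev if e_prev else 0.0
        if gain < rel_tol:
            return z_prev
    # the error kept improving over the whole range: take the largest zdim tried
    return pairs[-1][0]


# Made-up error values, one per trained model, for illustration only.
print(pick_zdim([5, 10, 15, 20, 25, 30], [1.00, 0.60, 0.42, 0.40, 0.39, 0.39]))  # → 15
```

A relative threshold is used instead of an absolute one so the rule does not depend on the scale of the loss; reading the bend off a plot by eye is the same idea done manually.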

Any clarifications would be appreciated here.

Dmitry

@DmitryKishkinev
Author

Following up on my previous post: perhaps zdim (the number of latent dimensions) is something we need to determine, and it is fixed for a given data set, whereas n_clusters (the number of motifs) depends on the research question, i.e. the researcher can look at the data more or less granularly depending on how much detail they want about the behaviours in a given data set. In practice, zdim is found by looking for the bend or plateau in the batch-normalised mean squared error curve, while n_clusters could be pretty much any number. Please correct me if I am missing something here. Much appreciated.
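To illustrate the point that the two numbers are set independently, here is a minimal sketch assuming motifs are obtained by clustering the per-frame latent vectors with k-means (an assumption for illustration; the project may use a different clustering step). zdim only fixes the length of each vector, and `n_clusters` can be smaller or larger than it. The `kmeans` helper is a hypothetical plain-Python implementation of Lloyd's algorithm, and the latent vectors are random placeholders rather than real model output.

```python
import random


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Minimal Lloyd's k-means (hypothetical helper, not the project's code).

    points: list of equal-length tuples; each tuple is one latent vector of length zdim.
    Returns (labels, centroids): one cluster index per point, and n_clusters centroids.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    rng = random.Random(seed)
    # initialise centroids from n_clusters distinct input points
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = None
    for _ in range(max(1, n_iter)):
        # assignment step: each point goes to its nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(n_clusters),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Placeholder latent vectors: 200 frames, zdim = 30 (random numbers, not real embeddings).
data_rng = random.Random(1)
zdim = 30
latents = [tuple(data_rng.gauss(0, 1) for _ in range(zdim)) for _ in range(200)]

# The same latent space can be split into fewer, about half as many, or more motifs than zdim.
for n_clusters in (5, 15, 50):
    labels, centroids = kmeans(latents, n_clusters)
    print(f"n_clusters={n_clusters}: {len(centroids)} centroids, each of length {len(centroids[0])}")
```

In this sketch zdim appears only as the vector length, while n_clusters only sets how many centroids are fitted; nothing in the algorithm ties one to the other. A large n_clusters does still need enough frames per cluster for each motif to be meaningful.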
