wrong estimates with FEMap with single experimental value #123

Open
ijpulidos opened this issue Jul 10, 2024 · 2 comments

Comments

@ijpulidos
Collaborator

I'm running into issues when generating an FEMap from some computed DDGs and a single absolute experimental DG for the reference compound. An example notebook that reproduces this is at https://gist.github.com/ijpulidos/72aff8d9440800fc9230126c9168ce50

In the dataframe of absolute measurements/estimates, lig_a appears twice. I was expecting only one entry for this ligand, which is the reference ligand. The values after the MLE also don't seem to make much sense, which I think is related to the same issue.

@ijpulidos changed the title from "Duplicated entry in FEMap with single experimental value" to "wrong estimates with FEMap with single experimental value" on Jul 10, 2024
@ijpulidos
Collaborator Author

Now that I think about it, maybe the duplicated entry in the table is fine, but the real issue is that the values don't make sense. I would expect the values to be around the absolute experimental measurement plus or minus the computed relative energy. I hope that makes sense.
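To make that expectation concrete, here is a minimal sketch in plain Python. It does not use cinnabar. The numbers are invented for illustration, and the sign convention (the DDG for an edge a→b is DG(b) − DG(a)) and the helper name `expected_absolute` are assumptions, not anything taken from the notebook.

```python
# Hypothetical values in kcal/mol; none of these come from the notebook.
exp_dg = {"lig_a": -9.0}  # the single absolute experimental value
computed_ddg = {
    # (from, to): DG(to) - DG(from)  <- assumed sign convention
    ("lig_a", "lig_b"): 1.5,
    ("lig_a", "lig_c"): -0.5,
}


def expected_absolute(exp_dg, computed_ddg):
    """Propagate known absolute DGs along relative edges."""
    absolute = dict(exp_dg)
    pending = dict(computed_ddg)
    progressed = True
    while pending and progressed:
        progressed = False
        for (a, b), ddg in list(pending.items()):
            if a in absolute and b not in absolute:
                absolute[b] = absolute[a] + ddg
            elif b in absolute and a not in absolute:
                absolute[a] = absolute[b] - ddg
            elif a not in absolute and b not in absolute:
                continue  # neither end is anchored yet; retry on the next pass
            # Either an end was just filled in, or both were already known.
            del pending[(a, b)]
            progressed = True
    return absolute


expected = expected_absolute(exp_dg, computed_ddg)
print(expected)
```

With these inputs, lig_b and lig_c land at the experimental DG of lig_a plus their respective relative values, which is the behaviour I had in mind.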

@ianmkenney
Member

ianmkenney commented Jul 11, 2024

Summary after a call with @ijpulidos:
The generate_absolute_values method iterates over all edges in the underlying networkx graph and reports the DG for every edge whose first node is a ReferenceState (its label becomes the source shown in the resulting table) and whose second node is not a reference state. This is what that graph looks like:

[image: the FEMap's underlying networkx graph]

Notice that the Zero reference state, which is what @ijpulidos created by supplying an experimental value, connects to only a single node, the one representing lig_a. Given that, it makes sense that we see one entry for lig_a where the source is empty and the DG is the value you provided.
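The filtering described above can be sketched roughly as follows. This is not cinnabar's implementation: the `Ref` class is a hypothetical stand-in for a reference-state node, the edges are a plain list rather than a networkx graph, and all numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a reference-state node (not cinnabar's class)."""
    label: str = ""


# (first node, second node, value); all values are invented for illustration.
edges = [
    (Ref(), "lig_a", -9.0),         # experimental value, empty source label
    ("lig_a", "lig_b", 1.5),        # relative edge, skipped by the filter
    (Ref("MLE"), "lig_a", -0.75),   # added after the MLE step
    (Ref("MLE"), "lig_b", 0.75),
]


def absolute_rows(edges):
    """Keep edges that go from a reference state to a non-reference node."""
    rows = []
    for first, second, dg in edges:
        if isinstance(first, Ref) and not isinstance(second, Ref):
            rows.append({"label": second, "DG": dg, "source": first.label})
    return rows


rows = absolute_rows(edges)
for row in rows:
    print(row)
```

In this toy version lig_a shows up twice, once per reference state, which mirrors the duplicated entry in the absolute dataframe.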

There is another reference state, "MLE", that is created after maximizing the log-likelihood function. The edges between this ReferenceState and the ligand nodes are generated iteratively from the results of the cinnabar.stats.mle function. This is slightly problematic because those outputs are state free energies, not free energy differences. These free energies are only defined up to a shared constant and don't mean anything physically when taken alone. You can see that the differences within the MLE source group in the absolute dataframe reproduce the correct DDGs from get_relative_dataframe. In short, the numbers you see with MLE as the source are arbitrary and don't give you anything useful on their own, but they do make sense. I suspect this is the point of #111.
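A small sketch of the "arbitrary up to a shared constant" point, in plain Python with made-up numbers. Nothing here calls cinnabar. The final shift onto the experimental value is only one possible way to read the MLE output, an assumption for illustration rather than something the library currently does.

```python
import math

# Hypothetical MLE-style state free energies in kcal/mol (made up).
mle_dg = {"lig_a": -0.75, "lig_b": 0.75, "lig_c": -1.25}


def ddg(values, a, b):
    """Relative free energy from a to b: DG(b) - DG(a)."""
    return values[b] - values[a]


# Adding any shared constant leaves every difference unchanged.
shifted = {name: dg + 100.0 for name, dg in mle_dg.items()}
same_differences = all(
    math.isclose(ddg(mle_dg, a, b), ddg(shifted, a, b))
    for a in mle_dg
    for b in mle_dg
)

# One possible choice of constant (an assumption): pick the offset that makes
# lig_a match its single experimental value.
exp_dg_a = -9.0
offset = exp_dg_a - mle_dg["lig_a"]
anchored = {name: dg + offset for name, dg in mle_dg.items()}

print(same_differences)
print(anchored)
```

The differences are identical before and after the shift, so the MLE rows are internally consistent even though their absolute level carries no physical meaning.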

EDIT: there might be some room to rework this underlying representation or change the paradigm for how MLE is applied to input data, possibly a more functional approach where state isn't so important.
