The inconsistency of the batch files #9

Open
wangwpi opened this issue Jun 26, 2024 · 1 comment


wangwpi commented Jun 26, 2024

An inconsistent positive-binding percentage was found in our previous NumPy array file; the large discrepancy might lead to poor model performance. Umair has helped reshuffle the original dataset, and I am now splitting the shuffled data into a new train file and a new validation file. After that I will regenerate the batch files containing the Morgan fingerprints.
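
For reference, a minimal sketch of the split step, not the actual script used here. The function name `split_train_validation`, the 10% validation fraction, and the fixed seed are all assumptions for illustration; the records can be any per-molecule items (for example, rows of the shuffled dataset).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    records: Sequence[T],
    validation_fraction: float = 0.1,  # assumed value, not from the project
    seed: int = 0,                     # assumed value, fixed for reproducibility
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and split it into (train, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Shuffling before the split is what should keep the positive-binding rate similar in both files; if the data are already shuffled, the extra shuffle is harmless.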
TODO: once the new batch files are generated, we need to check them and make sure the inconsistency no longer occurs. (A little variation is fine and probably good for the model, but large variation is not preferred.) We should also see whether more balanced training batch files improve the model performance.
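
One possible way to run the check in the TODO, sketched under assumptions: each batch is represented by its sequence of binary labels (1 = positive binding), and a batch is flagged when its positive fraction differs from the pooled fraction by more than a tolerance. The function names and the 0.05 default tolerance are hypothetical, not something already in the repo.

```python
from typing import Dict, List, Sequence


def positive_fraction(labels: Sequence[int]) -> float:
    """Fraction of labels equal to 1 (positive binding)."""
    if len(labels) == 0:
        raise ValueError("empty batch")
    return sum(1 for y in labels if y == 1) / len(labels)


def find_outlier_batches(
    batches: Dict[str, Sequence[int]],
    tolerance: float = 0.05,  # assumed threshold; tune to what counts as "large"
) -> List[str]:
    """Names of batches whose positive fraction is far from the pooled one."""
    total = sum(len(labels) for labels in batches.values())
    positives = sum(
        sum(1 for y in labels if y == 1) for labels in batches.values()
    )
    overall = positives / total
    return [
        name
        for name, labels in batches.items()
        if abs(positive_fraction(labels) - overall) > tolerance
    ]
```

The label sequences could come from the batch files via `numpy.load`, but how labels are stored inside those files is not stated in this issue, so that part is left out. An empty result would mean every batch is within the tolerance.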


wangwpi commented Jun 26, 2024

  • Currently generating the new batch files using the shuffled data

  • Waiting to double-check the new batch files
