RDKit generation of new non-SMILES representations #3

Open
kaichop opened this issue Jun 11, 2024 · 1 comment

@kaichop
Contributor

kaichop commented Jun 11, 2024

Assess the different features that can be generated with RDKit.

For example, convert the SMILES strings to Morgan fingerprints, use them as features for a simple neural network, and assess its performance on the test data. Compare the result with the scores currently reported on Kaggle so we know how much we need to improve.
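As a starting point, the workflow described above might be sketched as follows. This is only an illustration under stated assumptions, not the project's agreed pipeline: the fingerprint radius (2) and length (2048 bits), the hidden-layer size, and the helper names `smiles_to_morgan`, `train_mlp`, `predict_proba`, and `average_precision` are all hypothetical choices. The network is a small NumPy one-hidden-layer model standing in for whatever framework we end up using, and average precision is included on the assumption that the Kaggle leaderboard uses a precision-based ranking metric (please check the competition page).

```python
import numpy as np


def smiles_to_morgan(smiles, radius=2, n_bits=2048):
    """Return a 0/1 Morgan fingerprint vector for one SMILES string."""
    # RDKit is imported lazily so the NumPy-only helpers below work without it.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for an unparsable SMILES
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return gen.GetFingerprintAsNumPy(mol).astype(np.float32)


def train_mlp(X, y, hidden=64, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.maximum(X @ W1 + b1, 0.0)          # ReLU hidden layer
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output
        g = (p - y) / n                           # gradient of mean BCE w.r.t. logits
        gh = np.outer(g, W2) * (h > 0)            # backpropagate through ReLU
        W2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    return W1, b1, W2, b2


def predict_proba(params, X):
    """Predicted binding probability for each row of X."""
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive example."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    hits = np.asarray(y_true)[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / max(hits.sum(), 1))
```

To use it, stack the fingerprints with something like `X = np.stack([smiles_to_morgan(s) for s in smiles_list])`, fit with `params = train_mlp(X_train, y_train)`, and score held-out data with `average_precision(y_test, predict_proba(params, X_test))`. Full-batch training like this will not scale to the whole dataset; it is only meant to give a quick baseline on a sample.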

Paste the code here.

@wangwpi wangwpi self-assigned this Jun 14, 2024
@wangwpi
Contributor

wangwpi commented Jun 14, 2024

I have generated Morgan fingerprints, one-hot-encoded protein names, and binding labels (binds) for all training and validation data, saved as NumPy arrays in chunks of 500,000 rows each. The data are located in "/mnt/isilon/wang_lab/shared/Belka/analysis/morgan" and "/mnt/isilon/wang_lab/shared/Belka/analysis/morgan_validation".
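For anyone who wants to reproduce or read a layout like this, here is a rough sketch of chunked storage. It is not the script that produced the files above, and it makes several assumptions: the protein vocabulary (`BRD4`, `HSA`, `sEH`), the `.npz` file layout, the `chunk_00000.npz` naming, and the helper names `one_hot_proteins`, `save_chunks`, and `iter_chunks` are all hypothetical. The actual files in those directories may be organized differently.

```python
import os

import numpy as np

# Assumed target names; the thread does not list them.
PROTEINS = ["BRD4", "HSA", "sEH"]


def one_hot_proteins(names, vocabulary=PROTEINS):
    """Map protein names to one-hot rows, in the column order of `vocabulary`."""
    index = {name: i for i, name in enumerate(vocabulary)}
    ids = np.array([index[n] for n in names])
    return np.eye(len(vocabulary), dtype=np.uint8)[ids]


def save_chunks(out_dir, fingerprints, proteins, binds, chunk_rows=500_000):
    """Write row-aligned arrays to out_dir, at most chunk_rows rows per file."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for k, start in enumerate(range(0, len(binds), chunk_rows)):
        stop = start + chunk_rows
        path = os.path.join(out_dir, f"chunk_{k:05d}.npz")  # hypothetical naming
        np.savez_compressed(
            path,
            fingerprints=fingerprints[start:stop],
            proteins=proteins[start:stop],
            binds=binds[start:stop],
        )
        paths.append(path)
    return paths


def iter_chunks(out_dir):
    """Yield (features, labels) per chunk, with features = [fingerprint | protein]."""
    for name in sorted(os.listdir(out_dir)):
        if name.endswith(".npz"):
            with np.load(os.path.join(out_dir, name)) as data:
                X = np.hstack([data["fingerprints"], data["proteins"]])
                yield X, data["binds"]
```

Iterating with `for X, y in iter_chunks(out_dir): ...` keeps only one chunk in memory at a time, which matters when each chunk holds 500,000 fingerprint rows.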
