About the results reproduction #5
Hi, thanks for reaching out. I suggest using or referencing our evaluation pipeline found in the shepherd-score repo, which is what we used for the x4 conditional evaluation task. To address your question, I see a few inconsistencies between our pipeline and your implementation that would lead to lower scores:
Let me know if this helps.
Thank you for your reply. With your guidance I reproduced the box-plot results in the upper right of Figure 3. I have two more questions:
1. Are the partial charges obtained this way consistent with those used by ShEPhERD?
2. ShEPhERD seems to use the complete pharmacophore information of the template molecule in the P(x1 | x4) scenario. Can we use only part of the molecule's pharmacophore information to generate analogs? If so, can we compute matches against only that subset of pharmacophores when calculating the 3D pharmacophore similarity score?

Looking forward to your reply.
ShEPhERD was trained on xTB-optimized conformers and xTB-computed partial charges, so we recommend using xTB to generate the partial charges. Given a conformer, this can be done directly with the xtb program.
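As an illustration of that workflow (a hedged sketch, not the exact snippet from this thread), the following shells out to the `xtb` binary with GFN2 and reads the plain-text `charges` file that xtb writes after a run (one partial charge per atom, one per line). The helper names `xtb_partial_charges` and `parse_xtb_charges` are my own, and this assumes `xtb` is on your PATH:

```python
import os
import subprocess
import tempfile


def parse_xtb_charges(text: str) -> list[float]:
    """Parse xtb's plain-text `charges` output: one float per atom per line."""
    return [float(line.split()[0]) for line in text.splitlines() if line.strip()]


def xtb_partial_charges(xyz_block: str, charge: int = 0) -> list[float]:
    """Run a GFN2-xTB single point on an XYZ geometry and return per-atom partial charges.

    Assumes the `xtb` binary is on PATH; xtb writes a `charges` file into the
    working directory after a successful run.
    """
    with tempfile.TemporaryDirectory() as tmp:
        xyz_path = os.path.join(tmp, "mol.xyz")
        with open(xyz_path, "w") as f:
            f.write(xyz_block)
        # GFN2-xTB single point; xtb drops its output files into cwd
        subprocess.run(
            ["xtb", xyz_path, "--gfn", "2", "--chrg", str(charge)],
            cwd=tmp, check=True, capture_output=True,
        )
        with open(os.path.join(tmp, "charges")) as f:
            return parse_xtb_charges(f.read())
```

Alternatively, shepherd-score's own `optimize_conformer_with_xtb` helper (used later in this thread) returns charges alongside the relaxed conformer.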
Yes, but expect the subselection of the pharmacophores to be a manual task. Assuming that you already have a set of pharmacophore features (say n of them) and want to generate molecules that contain those pharmacophores as a subset, you could set n4 ≥ n and only inpaint the n pharmacophores that you want. This would require you to adjust the mask of the inpainting such that only the specified n pharmacophores are regenerated while the remaining n4-n pharmacophores are free to diffuse. Please let me know if you need some clarity on this strategy.
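A minimal sketch of that masking idea, using hypothetical names (`n`, `n4`, `inpaint_mask`), since the real mask construction lives inside ShEPhERD's inference code: hold the n known pharmacophores fixed as inpainting targets and let the remaining n4 − n slots diffuse freely.

```python
import numpy as np

n = 4    # pharmacophores you want the generated molecules to contain
n4 = 10  # total pharmacophore slots requested from the model (n4 >= n)

# True = inpainted (held fixed to your target features); False = free to diffuse.
inpaint_mask = np.zeros(n4, dtype=bool)
inpaint_mask[:n] = True
```

The first n slots then carry your target types/positions/directions, while the model is free to place the other n4 − n features wherever the diffusion takes them.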
For this case, you can use a Tversky similarity instead of a Tanimoto similarity; it is implemented in the pharmacophore scoring and alignment functions and lets you compute an asymmetric score against the reference features.
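To illustrate the asymmetry, here is a generic set-based Tversky index (for intuition only; it is not shepherd-score's Gaussian-overlap implementation, and the function name `tversky` is mine). With alpha = 1, beta = 0 the score only penalizes reference features that the generated molecule fails to match, so extra features in the generated molecule cost nothing:

```python
def tversky(ref: set, fit: set, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Set-based Tversky similarity.

    alpha weighs features unique to `ref`, beta those unique to `fit`.
    alpha = beta = 1 recovers Tanimoto; alpha = 1, beta = 0 scores how well
    `fit` covers the reference features, ignoring extras in `fit`.
    """
    inter = len(ref & fit)
    denom = inter + alpha * len(ref - fit) + beta * len(fit - ref)
    return inter / denom if denom else 0.0
```

For example, `tversky({"Aromatic", "Donor"}, {"Aromatic", "Donor", "Acceptor"})` is 1.0 even though the fit molecule carries an extra pharmacophore, whereas the Tanimoto score (alpha = beta = 1) would be 2/3.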
Hello, I am currently running a baseline comparison of pharmacophore-based molecular generation models. I'm interested in generating molecules from partial pharmacophore information (only a few pharmacophore positions and types are provided). I tried the method you mentioned, but I am not sure my settings are correct, and I worry that incorrect usage will hurt ShEPhERD's performance and lead to an unfair comparison. Could you please provide an example or script? If possible, please send it to [email protected]. I would be deeply grateful. My usage is similar to the following, assuming the 0th, 3rd, 5th, and 9th pharmacophores identified from mol are the ones I need:
Hi, I've just pushed changes. Here are some corrections and important notes on the code you sent:

```python
# Use fancy indexing with a list ([[0,3,5,9]]), not pharm_types[0,3,5,9],
# which would index across multiple array dimensions.
keep = [0, 3, 5, 9]
pharm_types, pharm_pos, pharm_direction = (
    pharm_types[keep],
    pharm_pos[keep],
    pharm_direction[keep],
)

generated_samples_batch = inference_sample(
    ...
    N_x4 = desired_num_pharms,  # this has been updated so that it can handle n4 >= len(pharm_types)
    ...
    # these are the inpainting targets
    center_of_mass = np.zeros(3),  # assuming you have already centered your target molecule's COM
    ...
    pharm_types = pharm_types,
    pharm_pos = pharm_pos,
    pharm_direction = pharm_direction,
)
```

In particular, you should scan through values for …
1. I understand that the script `paper_experiments/run_inference_gdb_conditional_x4.py` directly corresponds to conditional generation (P(x1|x4)), i.e., the two box plots in the upper right of Figure 3. Is this correct?
2. By running this script, I will generate 20 molecular analogs for each of 100 template molecules, keep the valid molecules after filtering, and then calculate the 3D pharmacophore similarity score following the demonstration in shepherd-score.

Is the process I described correct? If so, are the parameters in `run_inference_gdb_conditional_x4.py` consistent with those in your test, or do I need to change any? The overall distribution and median of my box plot are worse than those in your paper; the median is about 0.1 lower. I wonder if I have set something incorrectly.
Code Supplement:

```python
for i in trange(100):
    with open(f'samples/GDB_conditional/x4/samples_{i}.pickle', 'rb') as f:
        molblocks_and_charges = pickle.load(f)

with open(f'conformers/gdb/molblock_charges_9_test100.pkl', 'rb') as f:
    molblocks_and_charges = pickle.load(f)  # note: this overwrites the samples loaded above

record = {f'{i}': [] for i in name_list}
for idx in trange(70, 100):
    ref_mol_rdkit = rdkit.Chem.MolFromMolBlock(molblocks_and_charges[idx][0], removeHs=False)
    ref_mol, _, ref_charges = optimize_conformer_with_xtb(ref_mol_rdkit)
    fit_mol_rdkits = Chem.SDMolSupplier(f"/data/lbh/Code/shepherd/samples/GDB_conditional/x4/sdfs/sample{idx}.sdf", removeHs=False)
    for i, fit_mol_rdkit in enumerate(fit_mol_rdkits):
        # Local relaxation with xTB
        try:
            fit_mol, _, fit_charges = optimize_conformer_with_xtb(fit_mol_rdkit)
```