Error filtering by functional group #18

Closed
felipeZ opened this issue Nov 26, 2020 · 9 comments

felipeZ (Member) commented Nov 26, 2020

Given the following candidates:

,smiles
0,CN1C=NC2=C1C(=O)N(C(=O)N2C)C
1,OC(=O)C1CNC2C3C4CC2C1N34
2,C1=CC=CC=C1
3,OC(=O)C1CNC2COC1(C2)C#C
4,CCO
5,CCCCCCCCC=CCCCCCCCC(=O)O
6,CC(=O)O
7,O=C(O)Cc1ccccc1
8,CC(C(=O)O)O
9,CC12C3C(=O)OCC14CC=C(C4)C32

When applying the filter:

filters:
  include_functional_groups:
    - "C(=O)O[H]"

It returns both the carboxylic acids and the esters, although only the carboxylic acids should match:

'O=C(O)C1CNC2C3CC4C2N4C13'
'C#CC12CC(CO1)NCC2C(=O)O'
'CCCCCCCCC=CCCCCCCCC(=O)O'
'CC(=O)O'
'O=C(O)Cc1ccccc1'
'CC(O)C(=O)O'
 'CC12C3C(=O)OCC14CC=C(C4)C32'

felipeZ added the bug label Nov 26, 2020
felipeZ self-assigned this Nov 26, 2020

BvB93 (Member) commented Nov 26, 2020

The issue here is that RDKit's Chem.MolFromSmiles function automatically removes explicit hydrogens under its default sanitization settings.
This is how I managed to fix the issue in CAT: https://github.com/nlesc-nano/CAT/blob/e14032d5e4e1e1d680e155c25baa7392eebdfb5e/CAT/attachment/ligand_anchoring.py#L153-L162
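
A minimal sketch of the idea, assuming RDKit is installed. The helper names `parse_keep_hs` and `has_group` are hypothetical, and this is not necessarily what the linked CAT code does. It parses the pattern without sanitization so the `[H]` survives, then sanitizes with the hydrogen-adjustment step switched off. The targets get explicit hydrogens via `Chem.AddHs`, since otherwise the pattern's hydrogen atom has nothing to match.

```python
from rdkit import Chem


def parse_keep_hs(smiles):
    """Parse SMILES while keeping explicit hydrogens such as the [H] in C(=O)O[H]."""
    # With sanitize=False the parser does not strip explicit hydrogens.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    # Run every sanitization step except the hydrogen adjustment.
    ops = Chem.SanitizeFlags.SANITIZE_ALL ^ Chem.SanitizeFlags.SANITIZE_ADJUSTHS
    Chem.SanitizeMol(mol, sanitizeOps=ops)
    return mol


def has_group(smiles, pattern):
    """Check a candidate for the pattern after adding explicit hydrogens."""
    target = Chem.AddHs(Chem.MolFromSmiles(smiles))
    return target.HasSubstructMatch(pattern)


default_pattern = Chem.MolFromSmiles("C(=O)O[H]")  # the [H] is dropped here
kept_pattern = parse_keep_hs("C(=O)O[H]")          # the [H] is kept here
atom_counts = (default_pattern.GetNumAtoms(), kept_pattern.GetNumAtoms())

# Candidates 6 (acetic acid) and 9 (a lactone, i.e. a cyclic ester) from the report.
acid, ester = "CC(=O)O", "CC12C3C(=O)OCC14CC=C(C4)C32"
default_hits = [has_group(s, default_pattern) for s in (acid, ester)]
kept_hits = [has_group(s, kept_pattern) for s in (acid, ester)]
```

With the default parse the pattern reduces to `C(=O)O`, which is why the lactone slips through. Once the hydrogen is kept, only the acid matches.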

BvB93 (Member) commented Nov 26, 2020

Shall I make a PR?

felipeZ (Member, Author) commented Nov 26, 2020

@BvB93 I am going to switch to SMARTS. I think SMILES is not expressive enough, but I am still checking how it works.

BvB93 (Member) commented Nov 26, 2020

I think Chem.MolFromSmarts does parse explicit hydrogens by default.
Just be aware that there are some subtle differences between SMILES and SMARTS, e.g. with aromatic systems.
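
To illustrate the SMARTS route, here is a minimal sketch, assuming RDKit is installed. The pattern `[CX3](=O)[OX2H1]` is an assumed, commonly used way to write a carboxylic acid in SMARTS and is not taken from this thread; the helper name `matching` is hypothetical. It encodes the hydrogen count as a property of the oxygen, so the candidates do not need explicit hydrogen atoms.

```python
from rdkit import Chem

# The ten candidates from the report, in the same order (index = row number).
candidates = [
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "OC(=O)C1CNC2C3C4CC2C1N34",
    "C1=CC=CC=C1",
    "OC(=O)C1CNC2COC1(C2)C#C",
    "CCO",
    "CCCCCCCCC=CCCCCCCCC(=O)O",
    "CC(=O)O",
    "O=C(O)Cc1ccccc1",
    "CC(C(=O)O)O",
    "CC12C3C(=O)OCC14CC=C(C4)C32",
]
mols = [Chem.MolFromSmiles(smi) for smi in candidates]


def matching(pattern):
    """Return the indices of the candidates that contain the pattern."""
    return [i for i, mol in enumerate(mols) if mol.HasSubstructMatch(pattern)]


# SMILES-derived pattern: the explicit [H] is removed, leaving C(=O)O.
smiles_pattern = Chem.MolFromSmiles("C(=O)O[H]")
# SMARTS pattern: H1 requires exactly one hydrogen on the hydroxyl oxygen.
smarts_pattern = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")

hits_smiles = matching(smiles_pattern)
hits_smarts = matching(smarts_pattern)
```

The SMILES-derived pattern reproduces the seven hits from the report, including the lactone at index 9, while the SMARTS pattern keeps only the six acids. One of the differences mentioned above shows up here too: in SMARTS an uppercase `C` matches only aliphatic carbon, so patterns meant to hit aromatic atoms need lowercase symbols.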

felipeZ (Member, Author) commented Nov 26, 2020

> I think Chem.MolFromSmarts does parse explicit hydrogens by default.
> Just be aware that there are some subtle differences between SMILES and SMARTS, e.g. with aromatic systems.

Yes, but SMILES may not be enough for filtering. I am still deciding which system to use.

felipeZ (Member, Author) commented Nov 26, 2020

@BvB93 I think I am going to use your method, but I still need to check it against other functional groups :)

BvB93 (Member) commented Nov 26, 2020

For reference, the various RDKit sanitization flags:
https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html?rdkit.Chem.rdmolops.SanitizeFlags#rdkit.Chem.rdmolops.SanitizeFlags

felipeZ added a commit that referenced this issue Nov 26, 2020

BvB93 (Member) commented Dec 9, 2020

I suspect this can be closed now with the merging of #19.

felipeZ (Member, Author) commented Dec 9, 2020

Thanks @BvB93 for the feedback while solving this issue!

felipeZ closed this as completed Dec 9, 2020