kgcnn.molecule package

Submodules

kgcnn.molecule.base module

class kgcnn.molecule.base.MolGraphInterface(mol=None, make_directed: bool = False)[source]

Bases: object

The MolGraphInterface defines the base class interface to extract a molecular graph.

The method implementation to generate a molecule instance from SMILES etc. can be obtained from different backends like RDKit. The mol instance of a chemical informatics package like RDKit is treated via composition. The interface is designed to extract a graph from a mol instance, not to make a mol object from a graph.

__init__(mol=None, make_directed: bool = False)[source]

Set the mol attribute for composition. This mol instance will be the backend molecule class.

Parameters
  • mol – Instance of a molecule from chemical informatics package.

  • make_directed (bool) – Whether the edges are directed. Default is False.

static _check_encoder(encoder: dict, possible_keys: list, raise_error: bool = False)[source]

Verify that the keys of the encoder dictionary are among the possible properties. If a key has to be removed, a warning is issued.

Parameters
  • encoder (dict) – Dictionary of callable encoder functions or classes. Keys match properties.

  • possible_keys (list) – List of allowed keys for encoder.

  • raise_error (bool) – Whether to raise an error on a wrong identifier.

Returns

Cleaned encoder dictionary.

Return type

dict
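
The filtering described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the name `check_encoder_sketch` and the use of `KeyError` for `raise_error=True` are assumptions.

```python
import warnings


def check_encoder_sketch(encoder: dict, possible_keys: list, raise_error: bool = False) -> dict:
    """Hypothetical stand-in: keep only encoder entries whose key is allowed."""
    cleaned = {}
    for key, value in encoder.items():
        if key in possible_keys:
            cleaned[key] = value
        elif raise_error:
            # Assumption: an unknown key is reported as a KeyError.
            raise KeyError(f"Encoder key {key!r} is not a known property.")
        else:
            warnings.warn(f"Encoder key {key!r} is not a known property and is removed.")
    return cleaned


# Record warnings so the removed key can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = check_encoder_sketch({"AtomicNum": int, "Typo": str}, ["AtomicNum", "Symbol"])

print(sorted(result))
print(len(caught))
```

Only the allowed key survives, and one warning is recorded for the removed one.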

static _check_properties_list(properties: list, possible_properties: list, attribute_name: str, raise_error: bool = False)[source]

Verify that a list of string identifiers matches the expected properties. If an identifier has to be removed, a warning is issued. Non-string properties, i.e. classes or functions to extract properties, are ignored.

Parameters
  • properties (list) – List of requested string identifiers. Identifiers match properties.

  • possible_properties (list) – List of allowed string identifiers for properties.

  • attribute_name (str) – A name for the properties. E.g. bond, node or graph.

  • raise_error (bool) – Whether to raise an error on a wrong identifier.

Returns

Cleaned encoder dictionary.

Return type

dict
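
A minimal sketch of this check for a mixed list of string identifiers and callables follows. It is not the library's code. It assumes that the cleaned identifiers come back as a list, that non-string entries are passed through unchecked, and that `ValueError` is raised for `raise_error=True`. The name `check_properties_sketch` is hypothetical.

```python
import warnings


def check_properties_sketch(properties: list, possible_properties: list,
                            attribute_name: str, raise_error: bool = False) -> list:
    """Hypothetical stand-in: drop unknown string identifiers, keep callables."""
    cleaned = []
    for prop in properties:
        if not isinstance(prop, str):
            # Functions or classes are not validated against the allowed names.
            cleaned.append(prop)
        elif prop in possible_properties:
            cleaned.append(prop)
        elif raise_error:
            # Assumption: a wrong identifier is reported as a ValueError.
            raise ValueError(f"Unknown {attribute_name} property {prop!r}.")
        else:
            warnings.warn(f"Unknown {attribute_name} property {prop!r} is removed.")
    return cleaned


def custom_property(atom):
    """A user-defined extractor that bypasses the name check."""
    return 1.0


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = check_properties_sketch(
        ["Symbol", "Sybmol", custom_property], ["Symbol", "AtomicNum"], "node")

print(kept[0], len(kept), len(caught))
```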

add_hs(**kwargs)[source]

Add hydrogen to molecule instance.

clean(**kwargs)[source]
compute_partial_charges(method='gasteiger', **kwargs)[source]
edge_attributes(properties: list, encoder: dict)[source]

Make edge attributes.

Parameters
  • properties (list) – List of string identifiers for molecular properties. Must match backend features.

  • encoder (dict) – A dictionary of callable encoder functions or classes, one for each string identifier.

Returns

List of attributes after processing by the encoder.

Return type

list

property edge_indices

Return a list of edge indices of the molecule.

property edge_number

Return a list of edge numbers that represent the bond order.

from_mol_block(mol_block: str, keep_hs: bool = True, **kwargs)[source]

Set mol-instance from a more extensive string representation containing coordinates and bond information.

Parameters
  • mol_block (str) – Mol-block representation of a molecule.

  • keep_hs (bool) – Whether to keep hydrogen in mol-block. Default is True.

Returns

self

from_smiles(smile: str, **kwargs)[source]

Main method to generate a molecule from smiles string representation.

Parameters

smile (str) – Smile string representation of a molecule.

Returns

self

graph_attributes(properties: list, encoder: dict)[source]

Make graph attributes.

Parameters
  • properties (list) – List of string identifiers for molecular properties. Must match backend features.

  • encoder (dict) – A dictionary of callable encoder functions or classes, one for each string identifier.

Returns

List of attributes after processing by the encoder.

Return type

list

make_conformer(**kwargs)[source]

Generate a conformer guess for molecule instance.

node_attributes(properties: list, encoder: dict)[source]

Make node attributes.

Parameters
  • properties (list) – List of string identifiers for molecular properties. Must match backend features.

  • encoder (dict) – A dictionary of callable encoder functions or classes, one for each string identifier.

Returns

List of attributes after processing by the encoder.

Return type

list
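
How `properties` and `encoder` work together can be sketched with toy data. Everything here is hypothetical: atoms are plain dictionaries, `atom_fun` stands in for a backend's property lookup, and flattening each atom's values into one list is an assumption about the output layout.

```python
# Toy extractors keyed by property identifier; not taken from any backend.
atom_fun = {
    "AtomicNum": lambda atom: atom["z"],
    "IsAromatic": lambda atom: atom["aromatic"],
}


def one_hot_z(z):
    """Encode an atomic number over the categories H, C, O."""
    return [int(z == c) for c in (1, 6, 8)]


def node_attributes_sketch(atoms, properties, encoder):
    """For each atom, extract each property, encode it if an encoder exists, and flatten."""
    rows = []
    for atom in atoms:
        row = []
        for prop in properties:
            value = atom_fun[prop](atom)
            if prop in encoder:
                value = encoder[prop](value)
            # Scalars and lists both end up in one flat feature list per atom.
            row.extend(value if isinstance(value, (list, tuple)) else [value])
        rows.append(row)
    return rows


atoms = [{"z": 6, "aromatic": False}, {"z": 8, "aromatic": False}]
features = node_attributes_sketch(
    atoms, ["AtomicNum", "IsAromatic"], {"AtomicNum": one_hot_z})
print(features)
```

Properties without an entry in `encoder` are appended as raw values, which is why `IsAromatic` stays a plain boolean here.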

property node_coordinates

Return a list of atomic coordinates of the molecule.

property node_number

Return a list of node numbers, i.e. the atomic number of each atom in the molecule.

property node_symbol

Return a list of atomic symbols of the molecule.

optimize_conformer(**kwargs)[source]

Optimize conformer of molecule instance.

remove_hs(**kwargs)[source]

Remove hydrogen from molecule instance.

to_mol_block(**kwargs)[source]

Make a more extensive string representation containing coordinates and bond information from self.

Returns

Mol-block representation of a molecule.

Return type

mol_block (str)

to_smiles(**kwargs)[source]

Return a smile string representation of the mol instance.

Returns

Smile string.

Return type

smile (str)

kgcnn.molecule.convert module

class kgcnn.molecule.convert.MolConverter(base_path: Optional[str] = None)[source]

Bases: object

__init__(base_path: Optional[str] = None)[source]

Initialize a converter to transform smile or coordinates into mol block information.

Parameters

base_path (str) – Base path for temporary files.

smile_to_mol(smiles_path: str, sdf_path: str, external_program: Optional[dict] = None, num_workers: Optional[int] = None, sanitize: bool = True, add_hydrogen: bool = True, make_conformers: bool = True, optimize_conformer: bool = True, logger=None, batch_size: int = 5000)[source]

Convert a smiles file to SDF structure file.

Parameters
  • smiles_path

  • sdf_path

  • external_program

  • num_workers

  • sanitize

  • add_hydrogen

  • make_conformers

  • optimize_conformer

  • logger

  • batch_size

Returns

List of mol-strings.

Return type

list

xyz_to_mol(xyz_path: str, sdf_path: str, charge: Optional[Union[list, int]] = None)[source]

Convert xyz info to structure file.

Parameters
  • xyz_path

  • sdf_path

  • charge

Returns

List of mol blocks as string.

Return type

list

kgcnn.molecule.convert.openbabel_smile_to_mol(smile: str, sanitize: bool = True, add_hydrogen: bool = True, make_conformers: bool = True, optimize_conformer: bool = True, random_seed: int = 42, stop_logging: bool = False)[source]
kgcnn.molecule.convert.openbabel_xyz_to_mol(xyz_string: str, charge: int = 0, stop_logging: bool = False)[source]

Convert xyz-string to mol-string.

The order of atoms in the output should match the order in the xyz-string. Uses openbabel for conversion.

Parameters
  • xyz_string (str) – The xyz-string to convert to a mol-string.

  • charge (int) – Charge of the molecule. Default is 0.

  • stop_logging (bool) – Whether to stop logging. Default is False.

Returns

Mol-string. Generates bond information in addition to coordinates from xyz-string.

Return type

str

kgcnn.molecule.convert.rdkit_smile_to_mol(smile: str, sanitize: bool = True, add_hydrogen: bool = True, make_conformers: bool = True, optimize_conformer: bool = True, random_seed: int = 42, stop_logging: bool = False)[source]
kgcnn.molecule.convert.rdkit_xyz_to_mol(xyz_string: str, charge: Optional[Union[list, int]] = None)[source]

Convert xyz-string to mol-string.

The order of atoms in the output should match the order in the xyz-string.

Parameters
  • xyz_string (str) – The xyz-string to convert to a mol-string.

  • charge (int, list) – Possible charges of the molecule.

Returns

Mol-string. Generates bond information in addition to coordinates from xyz-string.

Return type

str

kgcnn.molecule.encoder module

class kgcnn.molecule.encoder.OneHotEncoder(categories: list, add_unknown: bool = True, dtype: str = 'int')[source]

Bases: object

Simple One-Hot-Encoding for python lists.

Uses a list of possible values for a one-hot encoding of a single value. The translated values must support the __eq__ operator. The list of possible values must be set beforehand. It is used as a basic encoder example for MolecularGraphRDKit. All categories must have the same dtype.

__call__(value)[source]

Encode a single feature or value, mapping it to a one-hot python list. E.g. [0, 0, 1, 0]

Parameters

value – Any object that can be compared to items in self.one_hot_values.

Returns

Python List with 1 at value match. E.g. [0, 0, 1, 0]

Return type

list

__init__(categories: list, add_unknown: bool = True, dtype: str = 'int')[source]

Initialize the encoder beforehand with a set of all possible values to encounter.

Parameters
  • categories (list) – List of possible values, matching the one-hot encoding.

  • add_unknown (bool) – Whether to add an unknown bit. Default is True.

  • dtype (str) – Data type to cast value into before comparing to category entries. Default is “int”.
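
The roles of categories, add_unknown and dtype can be illustrated with a small stand-in class. This sketch is not the library's encoder. It assumes that the unknown bit is appended as the last entry and is set when no category matches, and it supports only a few dtype names. The name `OneHotSketch` is hypothetical.

```python
class OneHotSketch:
    """Hypothetical stand-in for a one-hot encoder over a fixed category list."""

    def __init__(self, categories, add_unknown=True, dtype="int"):
        self.categories = list(categories)
        self.add_unknown = add_unknown
        # Map the dtype name to a Python cast; only these names are handled here.
        self.cast = {"int": int, "float": float, "str": str}[dtype]

    def __call__(self, value):
        try:
            value = self.cast(value)
        except (TypeError, ValueError):
            value = None  # an uncastable value matches no category
        encoded = [1 if value == c else 0 for c in self.categories]
        if self.add_unknown:
            # Assumption: the last bit flags a value outside the categories.
            encoded.append(0 if 1 in encoded else 1)
        return encoded


encoder = OneHotSketch([1, 6, 7, 8])
print(encoder(7))
print(encoder(16))
print(encoder("6"))
```

The last call shows the effect of the cast: the string "6" is converted to the integer 6 before it is compared to the categories.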

classmethod from_config(config)[source]
get_config()[source]
report(name='')[source]

kgcnn.molecule.graph_babel module

class kgcnn.molecule.graph_babel.MolecularGraphOpenBabel(mol=None, make_directed: bool = False)[source]

Bases: kgcnn.molecule.base.MolGraphInterface

A graph object representing a strict molecular graph, i.e. only chemical bonds. This class is an interface to the OBMol class to retrieve graph properties.

import numpy as np
from kgcnn.molecule.graph_babel import MolecularGraphOpenBabel

# Build the molecule from SMILES, then add hydrogens and a 3D conformer.
mg = MolecularGraphOpenBabel()
mg.from_smiles("CC(C)C(C(=O)O)N")
mg.add_hs()
mg.make_conformer()
mg.optimize_conformer()
mg.compute_partial_charges()
# List the available atom and bond property identifiers.
print(MolecularGraphOpenBabel.atom_fun_dict.keys(), MolecularGraphOpenBabel.bond_fun_dict.keys())
print(mg.node_coordinates)
print(mg.edge_indices)
# Property names must be keys of atom_fun_dict.
print(mg.node_attributes(properties=["AtomicNum", "PartialCharge"], encoder={}))
__init__(mol=None, make_directed: bool = False)[source]

Set the mol attribute for composition. This mol instance will be the backend's molecule class.

Parameters
  • mol (openbabel.OBMol) – OpenBabel molecule.

  • make_directed (bool) – Whether the edges are directed. Default is False.

add_hs(**kwargs)[source]

Add Hydrogen.

atom_fun_dict = {'AtomicMass': <function MolecularGraphOpenBabel.<lambda>>, 'AtomicNum': <function MolecularGraphOpenBabel.<lambda>>, 'Coordinate': <function MolecularGraphOpenBabel.<lambda>>, 'CoordinateIdx': <function MolecularGraphOpenBabel.<lambda>>, 'Data': <function MolecularGraphOpenBabel.<lambda>>, 'ExactMass': <function MolecularGraphOpenBabel.<lambda>>, 'ExplicitDegree': <function MolecularGraphOpenBabel.<lambda>>, 'ExplicitValence': <function MolecularGraphOpenBabel.<lambda>>, 'FormalCharge': <function MolecularGraphOpenBabel.<lambda>>, 'HasAlphaBetaUnsat': <function MolecularGraphOpenBabel.<lambda>>, 'HasAromaticBond': <function MolecularGraphOpenBabel.<lambda>>, 'HasBondOfOrder1': <function MolecularGraphOpenBabel.<lambda>>, 'HasBondOfOrder2': <function MolecularGraphOpenBabel.<lambda>>, 'HasBondOfOrder3': <function MolecularGraphOpenBabel.<lambda>>, 'HasDoubleBond': <function MolecularGraphOpenBabel.<lambda>>, 'HasNonSingleBond': <function MolecularGraphOpenBabel.<lambda>>, 'HasResidue': <function MolecularGraphOpenBabel.<lambda>>, 'HasSingleBond': <function MolecularGraphOpenBabel.<lambda>>, 'HeteroDegree': <function MolecularGraphOpenBabel.<lambda>>, 'HvyDegree': <function MolecularGraphOpenBabel.<lambda>>, 'Hyb': <function MolecularGraphOpenBabel.<lambda>>, 'ImplicitHCount': <function MolecularGraphOpenBabel.<lambda>>, 'Index': <function MolecularGraphOpenBabel.<lambda>>, 'IsAmideNitrogen': <function MolecularGraphOpenBabel.<lambda>>, 'IsAromatic': <function MolecularGraphOpenBabel.<lambda>>, 'IsAromaticNOxide': <function MolecularGraphOpenBabel.<lambda>>, 'IsAxial': <function MolecularGraphOpenBabel.<lambda>>, 'IsCarboxylOxygen': <function MolecularGraphOpenBabel.<lambda>>, 'IsChiral': <function MolecularGraphOpenBabel.<lambda>>, 'IsHbondAcceptor': <function MolecularGraphOpenBabel.<lambda>>, 'IsHbondAcceptorSimple': <function MolecularGraphOpenBabel.<lambda>>, 'IsHbondDonor': <function MolecularGraphOpenBabel.<lambda>>, 'IsHbondDonorH': <function MolecularGraphOpenBabel.<lambda>>, 'IsHetAtom': <function MolecularGraphOpenBabel.<lambda>>, 'IsHeteroatom': <function MolecularGraphOpenBabel.<lambda>>, 'IsInRing': <function MolecularGraphOpenBabel.<lambda>>, 'IsInRingSize5': <function MolecularGraphOpenBabel.<lambda>>, 'IsInRingSize6': <function MolecularGraphOpenBabel.<lambda>>, 'IsMetal': <function MolecularGraphOpenBabel.<lambda>>, 'IsNitroOxygen': <function MolecularGraphOpenBabel.<lambda>>, 'IsNonPolarHydrogen': <function MolecularGraphOpenBabel.<lambda>>, 'IsPeriodic': <function MolecularGraphOpenBabel.<lambda>>, 'IsPhosphateOxygen': <function MolecularGraphOpenBabel.<lambda>>, 'IsPolarHydrogen': <function MolecularGraphOpenBabel.<lambda>>, 'IsSulfateOxygen': <function MolecularGraphOpenBabel.<lambda>>, 'Isotope': <function MolecularGraphOpenBabel.<lambda>>, 'PartialCharge': <function MolecularGraphOpenBabel.<lambda>>, 'Residue': <function MolecularGraphOpenBabel.<lambda>>, 'SpinMultiplicity': <function MolecularGraphOpenBabel.<lambda>>, 'Title': <function MolecularGraphOpenBabel.<lambda>>, 'TotalDegree': <function MolecularGraphOpenBabel.<lambda>>, 'TotalValence': <function MolecularGraphOpenBabel.<lambda>>, 'Type': <function MolecularGraphOpenBabel.<lambda>>, 'Vector': <function MolecularGraphOpenBabel.<lambda>>, 'Visit': <function MolecularGraphOpenBabel.<lambda>>, 'X': <function MolecularGraphOpenBabel.<lambda>>, 'Y': <function MolecularGraphOpenBabel.<lambda>>, 'Z': <function MolecularGraphOpenBabel.<lambda>>}
bond_fun_dict = {'Aromatic': <function MolecularGraphOpenBabel.<lambda>>, 'BeginAtom': <function MolecularGraphOpenBabel.<lambda>>, 'BeginAtomIdx': <function MolecularGraphOpenBabel.<lambda>>, 'BondOrder': <function MolecularGraphOpenBabel.<lambda>>, 'CisOrTrans': <function MolecularGraphOpenBabel.<lambda>>, 'EndAtom': <function MolecularGraphOpenBabel.<lambda>>, 'EndAtomIdx': <function MolecularGraphOpenBabel.<lambda>>, 'EquibLength': <function MolecularGraphOpenBabel.<lambda>>, 'Flags': <function MolecularGraphOpenBabel.<lambda>>, 'Id': <function MolecularGraphOpenBabel.<lambda>>, 'Idx': <function MolecularGraphOpenBabel.<lambda>>, 'IsAmide': <function MolecularGraphOpenBabel.<lambda>>, 'IsAromatic': <function MolecularGraphOpenBabel.<lambda>>, 'IsCarbonyl': <function MolecularGraphOpenBabel.<lambda>>, 'IsCisOrTrans': <function MolecularGraphOpenBabel.<lambda>>, 'IsClosure': <function MolecularGraphOpenBabel.<lambda>>, 'IsDoubleBondGeometry': <function MolecularGraphOpenBabel.<lambda>>, 'IsEster': <function MolecularGraphOpenBabel.<lambda>>, 'IsHash': <function MolecularGraphOpenBabel.<lambda>>, 'IsInRing': <function MolecularGraphOpenBabel.<lambda>>, 'IsPeriodic': <function MolecularGraphOpenBabel.<lambda>>, 'IsPrimaryAmide': <function MolecularGraphOpenBabel.<lambda>>, 'IsTertiaryAmide': <function MolecularGraphOpenBabel.<lambda>>, 'IsWedge': <function MolecularGraphOpenBabel.<lambda>>, 'IsWedgeOrHash': <function MolecularGraphOpenBabel.<lambda>>, 'Length': <function MolecularGraphOpenBabel.<lambda>>, 'Parent': <function MolecularGraphOpenBabel.<lambda>>, 'Visit': <function MolecularGraphOpenBabel.<lambda>>}
compute_partial_charges(method='gasteiger', **kwargs)[source]

Compute partial charges.

Parameters
  • method (str) – Name of charge model.

  • kwargs – Not used.

Returns

Compute charges return value.

Return type

bool

edge_attributes(properties: list, encoder: dict)[source]

Make edge attributes.

Parameters
  • properties (list) – List of string identifiers for molecular properties. Must match backend features.

  • encoder (dict) – A dictionary of callable encoder functions or classes, one for each string identifier.

Returns

List of attributes after processing by the encoder.

Return type

list

property edge_indices

Return a list of edge indices of the molecule.

property edge_number

Return a list of edge numbers that represent the bond order.

from_mol_block(mol_block: str, keep_hs: bool = True, sanitize: bool = True)[source]

Set mol-instance from a string representation containing coordinates and bond information that is MDL mol format equivalent.

Parameters
  • mol_block (str) – Mol-block representation of a molecule.

  • sanitize (bool) – Whether to sanitize the mol-object.

  • keep_hs (bool) – Whether to keep hydrogen.

Returns

self

from_smiles(smile: str, sanitize: bool = True)[source]

Make molecule from smile.

Parameters
  • smile (str) – Smile string for the molecule.

  • sanitize (bool) – Whether to sanitize molecule.

from_xyz(xyz_string)[source]

Set the mol-instance from an external xyz-string. Does not add hydrogens or make conformers.

Parameters

xyz_string – String of xyz block.

Returns

self

graph_attributes(properties: list, encoder: dict)[source]

Make graph attributes.

Parameters
  • properties (list) – List of string identifiers for molecular properties. Must match backend features.

  • encoder (dict) – A dictionary of callable encoder functions or classes, one for each string identifier.

Returns

List of attributes after processing by the encoder.

Return type

list

make_conformer(**kwargs)[source]

Make conformer for mol-object.

Parameters

kwargs – Not used.

Returns

Whether conformer generation was successful.

Return type

bool

mol_fun_dict = {'ExactMass': <function MolecularGraphOpenBabel.<lambda>>, 'NumAtoms': <function MolecularGraphOpenBabel.<lambda>>, 'NumBonds': <function MolecularGraphOpenBabel.<lambda>>, 'TotalCharge': <function MolecularGraphOpenBabel.<lambda>>}
node_attributes(properties: list, encoder: dict)[source]

Make node attributes.

Parameters
  • properties (list) – List of string identifiers for molecular properties. Must match backend features.

  • encoder (dict) – A dictionary of callable encoder functions or classes, one for each string identifier.

Returns

List of attributes after processing by the encoder.

Return type

list

property node_coordinates

Return a list of atomic coordinates of the molecule.

property node_number

Return a list of node numbers, i.e. the atomic number of each atom in the molecule.

property node_symbol

Return a list of atomic symbols of the molecule.

optimize_conformer(force_field='mmff94', steps=100, **kwargs)[source]

Optimize conformer. Requires an initial conformer. See make_conformer.

Parameters
  • force_field (str) – Force field type.

  • steps (int) – Number of iteration steps.

  • kwargs – Kwargs for SteepestDescent.

Returns

Whether conformer optimization was successful.

Return type

bool

remove_hs(**kwargs)[source]

Remove Hydrogen.

to_mol_block()[source]

Make a more extensive string representation containing coordinates and bond information from self.

Returns

Mol-block representation of a molecule.

Return type

mol_block (str)

to_smiles()[source]

Return a smile string representation of the mol instance.

Returns

Smile string.

Return type

smile (str)

kgcnn.molecule.graph_rdkit module

class kgcnn.molecule.graph_rdkit.MolecularGraphRDKit(mol=None, make_directed: bool = False)[source]

Bases: kgcnn.molecule.base.MolGraphInterface

A graph object representing a strict molecular graph, i.e. only chemical bonds, using a mol-object from the RDKit chemical informatics package.

Generate attributes for nodes, edges, and graph, which in a molecular graph are atoms, bonds and the molecule itself. The class is used to get a graph from an RDKit molecule object but also offers some functionality defined in MolGraphInterface.

import numpy as np
from kgcnn.molecule.graph_rdkit import MolecularGraphRDKit

# Build the molecule from SMILES, then add hydrogens and a 3D conformer.
mg = MolecularGraphRDKit()
mg.from_smiles("CC(C)C(C(=O)O)N")
mg.add_hs()
mg.make_conformer()
mg.optimize_conformer()
mg.compute_partial_charges()
# List the available atom and bond property identifiers.
print(MolecularGraphRDKit.atom_fun_dict.keys(), MolecularGraphRDKit.bond_fun_dict.keys())
print(mg.node_coordinates)
print(mg.edge_indices)
# Property names must be keys of atom_fun_dict.
print(mg.node_attributes(properties=["NumBonds", "GasteigerCharge"], encoder={}))
__init__(mol=None, make_directed: bool = False)[source]

Initialize MolecularGraphRDKit with mol object.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – Mol object from rdkit. Default is None.

  • make_directed (bool) – Whether the edges are directed. Default is False.

add_hs(**kwargs)[source]

Add hydrogen atoms.

Parameters

kwargs – Kwargs for rdkit method, e.g. can specify explicit or implicit.

Returns

self.

atom_fun_dict = {'AtomFeatures': <function MolecularGraphRDKit.<lambda>>, 'AtomMapNum': <function MolecularGraphRDKit.<lambda>>, 'AtomicNum': <function MolecularGraphRDKit.<lambda>>, 'CIPCode': <function MolecularGraphRDKit.<lambda>>, 'CIPRank': <function MolecularGraphRDKit.<lambda>>, 'ChiralTag': <function MolecularGraphRDKit.<lambda>>, 'ChiralityPossible': <function MolecularGraphRDKit.<lambda>>, 'Degree': <function MolecularGraphRDKit.<lambda>>, 'DescribeQuery': <function MolecularGraphRDKit.<lambda>>, 'ExplicitValence': <function MolecularGraphRDKit.<lambda>>, 'FormalCharge': <function MolecularGraphRDKit.<lambda>>, 'GasteigerCharge': <function MolecularGraphRDKit.<lambda>>, 'GasteigerHCharge': <function MolecularGraphRDKit.<lambda>>, 'HasOwningMol': <function MolecularGraphRDKit.<lambda>>, 'Hybridization': <function MolecularGraphRDKit.<lambda>>, 'Idx': <function MolecularGraphRDKit.<lambda>>, 'ImplicitValence': <function MolecularGraphRDKit.<lambda>>, 'IsAromatic': <function MolecularGraphRDKit.<lambda>>, 'IsInRing': <function MolecularGraphRDKit.<lambda>>, 'Isotope': <function MolecularGraphRDKit.<lambda>>, 'Mass': <function MolecularGraphRDKit.<lambda>>, 'MassScaled': <function MolecularGraphRDKit.<lambda>>, 'MolFileRLabel': <function MolecularGraphRDKit.<lambda>>, 'MonomerInfo': <function MolecularGraphRDKit.<lambda>>, 'NoImplicit': <function MolecularGraphRDKit.<lambda>>, 'NumBonds': <function MolecularGraphRDKit.<lambda>>, 'NumExplicitHs': <function MolecularGraphRDKit.<lambda>>, 'NumImplicitHs': <function MolecularGraphRDKit.<lambda>>, 'NumRadicalElectrons': <function MolecularGraphRDKit.<lambda>>, 'PDBResidueInfo': <function MolecularGraphRDKit.<lambda>>, 'Rcovalent': <function MolecularGraphRDKit.<lambda>>, 'RcovalentScaled': <function MolecularGraphRDKit.<lambda>>, 'Rvdw': <function MolecularGraphRDKit.<lambda>>, 'RvdwScaled': <function MolecularGraphRDKit.<lambda>>, 'Smarts': <function MolecularGraphRDKit.<lambda>>, 'Symbol': <function MolecularGraphRDKit.<lambda>>, 'TotalDegree': <function MolecularGraphRDKit.<lambda>>, 'TotalNumHs': <function MolecularGraphRDKit.<lambda>>, 'TotalValence': <function MolecularGraphRDKit.<lambda>>}
bond_fun_dict = {'BeginAtom': <function MolecularGraphRDKit.<lambda>>, 'BeginAtomIdx': <function MolecularGraphRDKit.<lambda>>, 'BondDir': <function MolecularGraphRDKit.<lambda>>, 'BondType': <function MolecularGraphRDKit.<lambda>>, 'BondTypeAsDouble': <function MolecularGraphRDKit.<lambda>>, 'DescribeQuery': <function MolecularGraphRDKit.<lambda>>, 'EndAtom': <function MolecularGraphRDKit.<lambda>>, 'EndAtomIdx': <function MolecularGraphRDKit.<lambda>>, 'Idx': <function MolecularGraphRDKit.<lambda>>, 'IsAromatic': <function MolecularGraphRDKit.<lambda>>, 'IsConjugated': <function MolecularGraphRDKit.<lambda>>, 'IsInRing': <function MolecularGraphRDKit.<lambda>>, 'Smarts': <function MolecularGraphRDKit.<lambda>>, 'Stereo': <function MolecularGraphRDKit.<lambda>>}
clean(**kwargs)[source]

Clean or sanitize mol.

compute_partial_charges(method='gasteiger', **kwargs)[source]

Compute partial charges.

Parameters
  • method (str) – Method to compute partial charges. Defaults to ‘gasteiger’.

  • **kwargs

Returns

self

edge_attributes(properties: list, encoder: dict)[source]

Return edge or bond attributes together with bond indices of the molecule. If the flag _make_directed is set to true, then only the bonds as defined by RDKit are returned, otherwise a table of sorted undirected bond indices is returned.

Parameters
  • properties (list) – List of identifiers for properties to retrieve from bonds, or a callable object that receives the RDKit bond class and returns a list or value.

  • encoder (dict) – A dictionary of optional encoders for each string identifier.

Returns

Indices, Attributes.

Return type

tuple

property edge_indices

Return edge or bond indices of the molecule. If the flag _make_directed is set to true, then only the bonds as defined by RDKit are returned, otherwise a table of sorted undirected bond indices is returned.

Returns

Array of bond indices.

Return type

np.ndarray
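
The difference between the two index tables can be sketched without RDKit. The sketch assumes that "sorted undirected" means every bond is listed in both directions and the pairs are sorted by begin and end index. It uses plain lists instead of an array, and the name `edge_indices_sketch` is hypothetical.

```python
def edge_indices_sketch(bonds, make_directed=False):
    """Build an index table from (begin, end) atom index pairs."""
    if make_directed:
        # Keep each bond once, in the direction it was defined.
        return [list(bond) for bond in bonds]
    # Add the reverse of every bond, remove duplicates, then sort by (begin, end).
    both = {(i, j) for i, j in bonds} | {(j, i) for i, j in bonds}
    return [list(pair) for pair in sorted(both)]


bonds = [(0, 1), (2, 1)]
directed = edge_indices_sketch(bonds, make_directed=True)
undirected = edge_indices_sketch(bonds)
print(directed)
print(undirected)
```

With two bonds, the directed table has two rows and the undirected table has four.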

property edge_number

Make list of the bond order or type of each bond in the molecule.

from_list(atoms: Union[list, numpy.ndarray], bond_idx: Union[list, numpy.ndarray], bond_order: Union[list, numpy.ndarray], conformer: Optional[Union[list, numpy.ndarray]] = None)[source]
Parameters
  • atoms

  • bond_idx

  • bond_order

  • conformer

Returns

self.

from_mol_block(mol_block, sanitize: bool = True, keep_hs: bool = True, strictParsing: bool = True)[source]

Set mol-instance from a mol-block string.

Parameters
  • mol_block (str) – Mol-block representation of a molecule.

  • sanitize (bool) – Whether to sanitize the mol-object.

  • keep_hs (bool) – Whether to keep hydrogen.

  • strictParsing (bool) – If this is false, the parser is more lax about correctness of the content. Defaults to true.

Returns

self

from_smiles(smile, sanitize: bool = True, **kwargs)[source]

Make molecule from smile.

Parameters
  • smile (str) – Smile string for the molecule.

  • sanitize (bool) – Whether to sanitize molecule.

  • kwargs – Kwargs for MolFromSmiles.

Returns

self

from_xyz(xyz_string: str, charge: Optional[Union[list, int]] = None)[source]

Set the mol-instance from an external xyz-string. Does not add hydrogens or make conformers.

Parameters
  • xyz_string (str) – String of xyz block.

  • charge (int, list) – Charge or possible charges of the molecule. Default is [0, 1, -1, 2, -2].

Returns

self.

graph_attributes(properties: list, encoder: dict)[source]

Return graph or molecular attributes.

Parameters
  • properties (list) – List of identifiers for properties to retrieve from the molecule, or a callable object that receives the RDKit molecule class and returns a list or value.

  • encoder (dict) – A dictionary of optional encoders for each string identifier.

Returns

List of molecular graph-level properties.

Return type

list

make_conformer(**kwargs)[source]

Make conformer for mol-object.

Parameters

kwargs – Kwargs for rdkit EmbedMolecule.

Returns

Whether conformer generation was successful.

Return type

bool

mol_fun_dict = {'AtomsIsAromatic': <function MolecularGraphRDKit.<lambda>>, 'AtomsIsInRing': <function MolecularGraphRDKit.<lambda>>, 'BondsIsAromatic': <function MolecularGraphRDKit.<lambda>>, 'BondsIsConjugated': <function MolecularGraphRDKit.<lambda>>, 'C': <function MolecularGraphRDKit.<lambda>>, 'Cl': <function MolecularGraphRDKit.<lambda>>, 'ExactMolWt': <function <lambda>>, 'F': <function MolecularGraphRDKit.<lambda>>, 'FpDensityMorgan3': <function MolecularGraphRDKit.<lambda>>, 'FractionCSP3': <function MolecularGraphRDKit.<lambda>>, 'H': <function MolecularGraphRDKit.<lambda>>, 'MolLogP': <function MolecularGraphRDKit.<lambda>>, 'MolMR': <function MolecularGraphRDKit.<lambda>>, 'N': <function MolecularGraphRDKit.<lambda>>, 'NumAtoms': <function MolecularGraphRDKit.<lambda>>, 'NumBonds': <function MolecularGraphRDKit.<lambda>>, 'NumRotatableBonds': <function MolecularGraphRDKit.<lambda>>, 'O': <function MolecularGraphRDKit.<lambda>>, 'S': <function MolecularGraphRDKit.<lambda>>, 'fr_Al_COO': <function MolecularGraphRDKit.<lambda>>, 'fr_Al_OH': <function MolecularGraphRDKit.<lambda>>, 'fr_Ar_COO': <function MolecularGraphRDKit.<lambda>>, 'fr_Ar_OH': <function MolecularGraphRDKit.<lambda>>, 'fr_C_O_noCOO': <function MolecularGraphRDKit.<lambda>>, 'fr_NH2': <function MolecularGraphRDKit.<lambda>>, 'fr_SH': <function MolecularGraphRDKit.<lambda>>, 'fr_alkyl_halide': <function MolecularGraphRDKit.<lambda>>, 'fr_sulfide': <function MolecularGraphRDKit.<lambda>>}
node_attributes(properties: list, encoder: dict)[source]

Return node or atom attributes.

Parameters
  • properties (list) – List of string identifiers for properties to retrieve from atoms, or a callable object that receives the RDKit atom class and returns a list or value.

  • encoder (dict) – A dictionary of optional encoders for each string identifier.

Returns

List of atomic properties.

Return type

list

property node_coordinates

Return a list or array of atomic coordinates of the molecule.

property node_number

Return a list of node numbers, i.e. the atomic number of each atom in the molecule.

property node_symbol

Return a list of atomic symbols of the molecule.

optimize_conformer(force_field='mmff94', **kwargs)[source]

Optimize conformer. Requires an initial conformer. See make_conformer.

Parameters
  • force_field (str) – Force field type.

  • kwargs – Kwargs for rdkit optimization function. Includes iterations and force field sub type.

Returns

Whether conformer optimization was successful.

Return type

bool

remove_hs(**kwargs)[source]

Remove hydrogen atoms.

Parameters

kwargs – Kwargs for rdkit method, e.g. can specify explicit or implicit.

Returns

self.

to_mol_block(**kwargs)[source]

Make mol-block from mol-object.

Returns

Mol-block representation of a molecule.

Return type

mol_block (str)

to_smiles(**kwargs)[source]

Return a smile string representation of the mol instance.

Returns

Smile string.

Return type

smile (str)

kgcnn.molecule.io module

kgcnn.molecule.io.parse_list_to_xyz_str(mol: list, comment: str = '', number_coordinates: Optional[int] = None)[source]

Convert list of atom and coordinates list into xyz-string.

Parameters
  • mol (list) – Tuple or list of [[‘C’, ‘H’, …], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], … ]].

  • comment (str) – Comment for comment line in xyz string. Default is “”.

  • number_coordinates (int) – Number of allowed coordinates.

Returns

Information in xyz-string format.

Return type

str
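
The conversion can be sketched for the common xyz layout: atom count, comment line, then one atom per line. This is an illustration only. The name `list_to_xyz_str_sketch` and the six-decimal formatting are assumptions, and the `number_coordinates` argument is left out.

```python
def list_to_xyz_str_sketch(mol, comment=""):
    """Format [symbols, coordinates] as an xyz block."""
    symbols, coords = mol
    if len(symbols) != len(coords):
        raise ValueError("Need one coordinate triple per atom.")
    # Header: number of atoms, then the comment line.
    lines = [str(len(symbols)), comment]
    for symbol, position in zip(symbols, coords):
        lines.append(" ".join([symbol] + [f"{c:.6f}" for c in position]))
    return "\n".join(lines) + "\n"


xyz = list_to_xyz_str_sketch(
    [["O", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.96]]], comment="hydroxyl")
print(xyz)
```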

kgcnn.molecule.io.parse_mol_str(mol_str: str)[source]

Parse MDL mol table string into nested list. Only supports V2000 format and CTab. Better rely on OpenBabel to do this. This function was a temporary solution.

Parameters

mol_str (str) – String of mol block.

Returns

[title, program, comment, counts, atoms, bonds, properties]

Return type

list
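
A fixed-column parse of a V2000 mol block could look roughly like the following. The outer layout follows the return value listed above, while the inner layout of atoms and bonds is an assumption. The names `parse_mol_sketch` and `atom_line` are hypothetical, and the sample block is built in code only to keep the column alignment exact.

```python
def parse_mol_sketch(mol_str):
    """Parse a V2000 mol block by column position (illustrative only)."""
    lines = mol_str.splitlines()
    title, program, comment, counts = lines[0:4]
    # The counts line starts with two 3-character integers: atoms and bonds.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Three 10-character coordinate fields, one space, then the element symbol.
        position = [float(line[0:10]), float(line[10:20]), float(line[20:30])]
        atoms.append(position + [line[31:34].strip()])
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Begin atom, end atom and bond type, each 3 characters wide and 1-based.
        bonds.append([int(line[0:3]), int(line[3:6]), int(line[6:9])])
    rest = lines[4 + n_atoms + n_bonds:]
    properties = [line for line in rest if line.startswith("M  ") and line != "M  END"]
    return [title, program, comment, counts, atoms, bonds, properties]


def atom_line(x, y, z, symbol):
    """Write one atom line with the fixed V2000 column widths."""
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0"


water = "\n".join([
    "water",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    atom_line(0.0, 0.0, 0.0, "O"),
    atom_line(0.757, 0.586, 0.0, "H"),
    atom_line(-0.757, 0.586, 0.0, "H"),
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])
parsed = parse_mol_sketch(water)
print(parsed[4])
print(parsed[5])
```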

kgcnn.molecule.io.read_mol_list_from_sdf_file(filepath, line_by_line=False)[source]

Simple loader to load an SDF file by only splitting.

Parameters
  • filepath (str) – File path for SDF file.

  • line_by_line (bool) – Whether to read SDF file line by line.

Returns

List of mol blocks as string.

Return type

list
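
Splitting an SDF file relies on the `$$$$` line that ends each record. The sketch below works on a string instead of a file and keeps everything before the delimiter, including any data fields after `M  END`. Whether the library trims those fields is not stated here, and the name `split_sdf_sketch` is hypothetical.

```python
def split_sdf_sketch(sdf_text):
    """Split SDF text into one string per record, using the '$$$$' delimiter line."""
    blocks = []
    current = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            blocks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that is not terminated by a delimiter.
    if any(line.strip() for line in current):
        blocks.append("\n".join(current))
    return blocks


record = ["  sketch", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$"]
sdf_text = "\n".join(["mol A"] + record + ["mol B"] + record) + "\n"
mol_blocks = split_sdf_sketch(sdf_text)
print(len(mol_blocks))
```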

kgcnn.molecule.io.read_smiles_file(file_path)[source]

Simple Python function to read SMILES from a file.

Parameters

file_path (str) – File path for smiles file.

Returns

List of smiles.

Return type

list

kgcnn.molecule.io.read_xyz_file(file_path, delimiter: Optional[str] = None, line_by_line=False)[source]

Simple python script to read xyz-file and parse into a nested python list. Always returns a list with the geometries in xyz file.

Parameters
  • file_path (str) – Full path to xyz-file.

  • delimiter (str) – Delimiter for xyz separation. Default is ‘ ‘.

  • line_by_line (bool) – Whether to read XYZ file line by line.

Returns

Nested coordinates from xyz-file.

Return type

list
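
Reading several geometries from one xyz file can be sketched as follows. The parser works on a string, and the returned layout of one [symbols, coordinates] pair per geometry is an assumption chosen to mirror the input of parse_list_to_xyz_str. The name `parse_xyz_sketch` is hypothetical.

```python
def parse_xyz_sketch(text, delimiter=None):
    """Parse concatenated xyz geometries into [symbols, coordinates] pairs."""
    lines = text.splitlines()
    geometries = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # skip blank lines between geometries
            continue
        n_atoms = int(lines[pos].strip())
        symbols, coords = [], []
        # Line pos + 1 is the comment line; atom lines start at pos + 2.
        for line in lines[pos + 2:pos + 2 + n_atoms]:
            fields = line.split(delimiter)
            symbols.append(fields[0])
            coords.append([float(value) for value in fields[1:4]])
        geometries.append([symbols, coords])
        pos += 2 + n_atoms
    return geometries


text = "2\nfirst\nH 0 0 0\nH 0 0 0.74\n1\n\nHe 0 0 0\n"
geometries = parse_xyz_sketch(text)
print(geometries)
```

With `delimiter=None`, `str.split` separates fields on any run of whitespace, so aligned columns with several spaces are handled as well.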

kgcnn.molecule.io.write_list_to_xyz_file(filepath: str, mol_list: list)[source]

Write a list of molecules, each a nested list of atoms and coordinates, to an xyz file. Uses parse_list_to_xyz_str.

Parameters
  • filepath (str) – Full path to file including name.

  • mol_list (list) – List of molecules, which is a list of pairs of atoms and coordinates of [[[‘C’, ‘H’, … ], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], … ]], … ].

kgcnn.molecule.io.write_mol_block_list_to_sdf(mol_block_list, filepath)[source]

Write a list of mol blocks as string into an SDF file.

Parameters
  • mol_block_list (list) – List of mol blocks as string.

  • filepath (str) – File path for SDF file.

Returns

None.

kgcnn.molecule.io.write_smiles_file(file_path, smile_list)[source]

Simple Python function to write SMILES to a file.

Parameters
  • file_path (str) – File path for smiles file.

  • smile_list (list) – List of smiles to write to file.

Returns

None
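A write-then-read round trip can be sketched with stand-in functions. The sketch assumes one SMILES string per line, which the documentation above does not state explicitly; `write_smiles_sketch` and `read_smiles_sketch` are hypothetical names, not the library functions.

```python
import os
import tempfile


def write_smiles_sketch(file_path, smile_list):
    """Hypothetical stand-in: write one SMILES string per line."""
    with open(file_path, "w") as f:
        for smile in smile_list:
            f.write(smile + "\n")


def read_smiles_sketch(file_path):
    """Hypothetical stand-in: read non-empty lines as SMILES strings."""
    with open(file_path, "r") as f:
        return [line.strip() for line in f if line.strip()]


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.smi")
    write_smiles_sketch(path, ["CCO", "c1ccccc1"])
    smiles_back = read_smiles_sketch(path)
```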

kgcnn.molecule.methods module

kgcnn.molecule.methods.get_connectivity_from_inverse_distance_matrix(inv_dist_mat, protons, radii_dict=None, k1=16.0, k2=1.3333333333333333, cutoff=0.85, force_bonds=True)[source]

Get the connectivity table from an inverse distance matrix defined on the last dimensions (…, N, N) and the corresponding bond radii. The output keeps the shape (…, N, N). Covalent radii are taken from Pyykko and Atsumi, Chem. Eur. J. 15, 2009, 188-197; values for metals are decreased by 10% according to Robert Paton’s Sterimol implementation. Partially based on code from Robert Paton’s Sterimol script, which in turn based this part on Grimme’s D3 code. This is a vectorized version of the original code for numpy arrays and takes atomic numbers as input.

Parameters
  • inv_dist_mat (np.ndarray) – Inverse distance matrix defined on the last dimensions (…, N, N). Distances must be in Angstrom, not in Bohr.

  • protons (np.ndarray) – An array of atomic numbers matching the inv_dist_mat (…, N), for which the radii are to be computed.

  • radii_dict (np.ndarray) – Covalent radii for each element. If None, stored values are used. Otherwise, a numpy array of covalent bonding radii indexed by atomic number is expected. Example: np.array([0, 0.34, 0.46, 1.2, ...]) for atomic number np.array([0, 1, 2, ...]) that would match [None, 'H', 'He', 'Li', ...].

  • k1 (float) – K1-value. Defaults to 16.

  • k2 (float) – K2-value. Defaults to 4.0/3.0.

  • cutoff (float) – Cutoff below which values are set to zero (no bond). Defaults to 0.85.

  • force_bonds (bool) – Whether to force at least one bond in the bond table per atom. Default is True.

Returns

Connectivity table with 1 for chemical bond and zero otherwise of shape (…, N, N).

Return type

np.ndarray
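The roles of k1, k2 and cutoff can be illustrated with a non-vectorized sketch. It assumes the damping has the form used for D3 coordination numbers, 1 / (1 + exp(-k1 * (k2 * (R_i + R_j) / r_ij - 1))), and it omits the force_bonds step. The function `connectivity_sketch`, the plain nested lists, and the radius used for oxygen are illustrative assumptions, not the library's stored values or code.

```python
import math


def connectivity_sketch(inv_dist, numbers, radii, k1=16.0, k2=4.0 / 3.0, cutoff=0.85):
    """Hypothetical loop-based sketch for a single molecule."""
    n = len(numbers)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Scaled sum of covalent radii times the inverse distance 1 / r_ij.
            rr = k2 * (radii[numbers[i]] + radii[numbers[j]]) * inv_dist[i][j]
            damp = 1.0 / (1.0 + math.exp(-k1 * (rr - 1.0)))
            # Values below the cutoff count as "no bond".
            table[i][j] = 1 if damp >= cutoff else 0
    return table


# Water geometry in Angstrom; the diagonal of the inverse distances is set to 0.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
numbers = [8, 1, 1]
radii = {1: 0.34, 8: 0.63}  # illustrative radii keyed by atomic number
inv_dist = [
    [0.0 if i == j else 1.0 / math.dist(a, b) for j, b in enumerate(coords)]
    for i, a in enumerate(coords)
]
bonds = connectivity_sketch(inv_dist, numbers, radii)
```

For this geometry the two O-H pairs end up well above the cutoff, while the H-H pair is damped to nearly zero, so only the O-H entries are set to 1.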

kgcnn.molecule.preprocessor module

class kgcnn.molecule.preprocessor.SetMolAttributes(*, nodes: Optional[list] = None, edges: Optional[list] = None, graph: Optional[list] = None, encoder_nodes: Optional[dict] = None, encoder_edges: Optional[dict] = None, encoder_graph: Optional[dict] = None, node_coordinates: str = 'node_coordinates', node_symbol: str = 'node_symbol', node_number: str = 'node_number', edge_indices: str = 'edge_indices', edge_number: str = 'edge_number', node_attributes: str = 'node_attributes', edge_attributes: str = 'edge_attributes', graph_attributes: str = 'graph_attributes', name='set_mol_attributes', **kwargs)[source]

Bases: kgcnn.graph.base.GraphPreProcessorBase

Preprocessor to compute molecular attributes from graph arrays that make a valid molecule via a MolGraphInterface. See MoleculeNetDataset, which uses callbacks but has identical nomenclature.

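from kgcnn.data.datasets.QM7Dataset import QM7Dataset
from kgcnn.molecule.preprocessor import SetMolAttributes
# Load the dataset and create the preprocessor with default arguments.
ds = QM7Dataset()
pp = SetMolAttributes()
# Apply the preprocessor to the first graph of the dataset.
print(pp(ds[0]))
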
Parameters
  • nodes (list) – List of atomic properties for attributes.

  • edges (list) – List of bond properties for attributes.

  • graph (list) – List of molecular properties for attributes.

  • encoder_nodes (dict) – Dictionary of node attribute encoders.

  • encoder_edges (dict) – Dictionary of edge attribute encoders.

  • encoder_graph (dict) – Dictionary of graph attribute encoders.

  • node_coordinates (str) – Name of numpy array storing atomic coordinates.

  • node_symbol (str) – Name of numpy array storing atomic symbol.

  • node_number (str) – Name of numpy array storing atomic number.

  • edge_indices (str) – Name of numpy array storing atomic bond indices.

  • edge_number (str) – Name of numpy array storing atomic bond order.

  • node_attributes (str) – Name to assign node attributes to.

  • edge_attributes (str) – Name to assign edge attributes to.

  • graph_attributes (str) – Name to assign graph attributes to.

  • name (str) – Name of the preprocessor.

call(nodes: list, edges: list, graph: list, encoder_nodes: dict, encoder_edges: dict, encoder_graph: dict, node_coordinates: numpy.ndarray, node_symbol: numpy.ndarray, node_number: numpy.ndarray, edge_indices: numpy.ndarray, edge_number: numpy.ndarray)[source]

class kgcnn.molecule.preprocessor.SetMolBondIndices(*, node_coordinates: str = 'node_coordinates', node_symbol: str = 'node_symbol', node_number: str = 'node_number', edge_indices: str = 'edge_indices', edge_number: str = 'edge_number', name='set_mol_bond_indices', **kwargs)[source]

Bases: kgcnn.graph.base.GraphPreProcessorBase

Preprocessor to compute chemical bonds from coordinates via a MolGraphInterface.

Parameters
  • node_coordinates (str) – Name of atomic coordinates array of shape (N, 3).

  • node_symbol (str) – Name of atomic symbol as numpy array of shape (N, ).

  • node_number (str) – Name of atomic numbers array of shape (N, ).

  • edge_indices (str) – Name to assign edge indices to.

  • edge_number (str) – Name to assign the edge number/order to.

  • name (str) – Name of this preprocessor.

call(node_coordinates: numpy.ndarray, node_symbol: numpy.ndarray, node_number: numpy.ndarray)[source]

kgcnn.molecule.serial module

kgcnn.molecule.serial.deserialize_encoder(encoder_identifier)[source]

Deserialize an encoder from its identifier.

Parameters

encoder_identifier – Identifier, class or function of an encoder.

Returns

Deserialized encoder.

Return type

obj

Module contents