kgcnn.data.transform.scaler package

Submodules

kgcnn.data.transform.scaler.force module

class kgcnn.data.transform.scaler.force.EnergyForceExtensiveLabelScaler(standardize_coordinates: bool = False, energy: str = 'energy', force: str = 'force', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)

Bases: kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase

Extensive scaler for jointly scaling energies and forces.

Inherits from kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase but uses X and y as atomic_number and (energy, force), in contrast to kgcnn.data.transform.scaler.molecule.ExtensiveMolecularLabelScaler, which uses only y as energy.

The interface is modeled on scikit-learn scalers and adds functions to apply the scaler to datasets: fit_dataset() and transform_dataset().

Note

Units for energy and forces must match.

Code example for scaler:

import numpy as np
from kgcnn.data.transform.scaler.force import EnergyForceExtensiveLabelScaler
# One energy per molecule, shape (n_samples, 1).
energy = np.random.rand(5).reshape((5,1))
# Atomic numbers of each molecule.
mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]),
    np.array([6, 6, 1, 1, 1, 1]), np.array([6, 6, 1, 1]), np.array([6, 6, 1, 1, 1, 1, 1, 1])
]
# One force array of shape (n_atoms, 3) per molecule.
force = [np.random.rand(len(m)*3).reshape((len(m),3)) for m in mol_num]
scaler = EnergyForceExtensiveLabelScaler()
scaler.fit(y=[energy, force], X=mol_num)
print(scaler.get_weights())
print(scaler.get_config())
scaler._plot_predict(energy, mol_num)  # For debugging.
y, f = scaler.transform(y=[energy, force], X=mol_num)
print(energy, y)
# Round trip: the inverse transform should reproduce the original forces.
print(scaler.inverse_transform(y=[y, f], X=mol_num)[1][1][0], force[1][0])
scaler.save("example.json")
# The reloaded scaler should give the same result.
new_scaler = EnergyForceExtensiveLabelScaler()
new_scaler.load("example.json")
print(new_scaler.inverse_transform(y=[y, f], X=mol_num)[1][1][0], force[1][0])
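The note that units for energy and forces must match can be motivated with a small NumPy sketch. It assumes, without claiming this is the kgcnn implementation, that the scaler subtracts a per-element offset from the energy, divides by a single standard deviation, and divides the forces by that same value, since forces are derivatives of the energy and a coordinate-independent offset drops out. The names atom_offset, scale, scale_energy_force and unscale_energy_force are hypothetical.

```python
import numpy as np

# Hypothetical fitted parameters: one energy offset per element and one shared scale.
atom_offset = {1: -0.5, 6: -38.0, 7: -54.5}  # same unit as the energies
scale = 2.0  # assumed standard deviation of the offset-free energies


def molecule_offset(atomic_number):
    """Sum the per-element offsets of all atoms in each molecule, shape (n_samples, 1)."""
    return np.array([[sum(atom_offset[int(z)] for z in zs)] for zs in atomic_number])


def scale_energy_force(energy, force, atomic_number):
    """Remove the additive atom offset from energies, then rescale energies and forces."""
    energy_scaled = (np.asarray(energy) - molecule_offset(atomic_number)) / scale
    # Forces are derivatives of the energy, so only the scale applies to them.
    force_scaled = [np.asarray(f) / scale for f in force]
    return energy_scaled, force_scaled


def unscale_energy_force(energy_scaled, force_scaled, atomic_number):
    """Invert scale_energy_force."""
    energy = np.asarray(energy_scaled) * scale + molecule_offset(atomic_number)
    force = [np.asarray(f) * scale for f in force_scaled]
    return energy, force


mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1])]
energy = np.array([[-40.5], [-56.5]])
force = [np.full((len(m), 3), 0.25) for m in mol_num]

e_scaled, f_scaled = scale_energy_force(energy, force, mol_num)
e_back, f_back = unscale_energy_force(e_scaled, f_scaled, mol_num)
```

Because one scale is shared, an energy in one unit system combined with forces in another would leave the scaled forces inconsistent with the scaled energy.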

__init__(standardize_coordinates: bool = False, energy: str = 'energy', force: str = 'force', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)

Initialize scaler with arguments for kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase.

Parameters

standardize_coordinates (bool) – Whether to standardize coordinates. Must always be False.
kwargs – Kwargs for the kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase parent class. See docs for this class.

fit(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, sample_weight: Union[None, numpy.ndarray] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None) → Tuple[List[numpy.ndarray], List[numpy.ndarray]]

Fit Scaler to data.

Parameters

y (tuple) – Tuple of (energy, forces). Energies must be a single array or list of energies of shape (n_samples, n_states). For a single energy per sample the shape must still be (n_samples, 1). Forces are given as a list with one NumPy array per molecule. Note that you can also pass the forces separately via the argument force, in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers, one array per molecule. Example: [np.array([7,1,1,1]), …]. The number of arrays must match the number of samples. Note that you can also pass the atomic numbers separately via the argument atomic_number, in which case X is ignored.
sample_weight (list, np.ndarray) – Weights for each sample.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .

Returns

self.

fit_dataset(dataset: List[Dict[str, numpy.ndarray]], **fit_params)

Fit to dataset with relevant X , y information.

Parameters

dataset (list) – Dataset of type List[Dict] containing energies and forces and atomic numbers.
fit_params – Fit parameters handed to fit()

Returns

self.
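The dataset methods read the relevant arrays from each dictionary by key. The following is a rough sketch of that gathering step, assuming the default keys from the constructor ('energy', 'force', 'atomic_number') and one energy array of shape (n_states,) per entry. The helper collect_fit_arguments is hypothetical and not part of kgcnn.

```python
import numpy as np


def collect_fit_arguments(dataset, energy="energy", force="force", atomic_number="atomic_number"):
    """Gather per-molecule dictionary entries into the (y, X) arguments of fit()."""
    energies = np.array([np.asarray(entry[energy]) for entry in dataset])
    forces = [np.asarray(entry[force]) for entry in dataset]
    numbers = [np.asarray(entry[atomic_number]) for entry in dataset]
    for f, z in zip(forces, numbers):
        if len(f) != len(z):
            raise ValueError("Forces and atomic numbers must have one row per atom.")
    return (energies, forces), numbers


dataset = [
    {"energy": np.array([-40.5]), "force": np.zeros((5, 3)), "atomic_number": np.array([6, 1, 1, 1, 1])},
    {"energy": np.array([-56.5]), "force": np.zeros((4, 3)), "atomic_number": np.array([7, 1, 1, 1])},
]
y, X = collect_fit_arguments(dataset)
print(y[0].shape, [f.shape for f in y[1]], [len(z) for z in X])
```

The resulting y and X have the layout described for fit(): energies of shape (n_samples, n_states), a list of force arrays, and a list of atomic number arrays.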

fit_transform(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, sample_weight: Union[None, numpy.ndarray] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None, copy: bool = True) → Tuple[List[numpy.ndarray], List[numpy.ndarray]]

Fit Scaler to data and subsequently transform data.

Parameters

y (tuple) – Tuple of (energy, forces). Energies must be a single array or list of energies of shape (n_samples, n_states). For a single energy per sample the shape must still be (n_samples, 1). Forces are given as a list with one NumPy array per molecule. Note that you can also pass the forces separately via the argument force, in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers, one array per molecule. Example: [np.array([7,1,1,1]), …]. The number of arrays must match the number of samples. Note that you can also pass the atomic numbers separately via the argument atomic_number, in which case X is ignored.
sample_weight (list, np.ndarray) – Weights for each sample.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
copy (bool) – Not yet implemented.

Returns

Tuple of transformed (energy, forces) .

Return type

tuple

fit_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False, **fit_params) → List[Dict[str, numpy.ndarray]]

Fit and transform to dataset with relevant X, y information.

Parameters

dataset (list) – Dataset of type List[Dict] containing energies and forces and atomic numbers.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.
fit_params – Fit parameters handed to fit()

Returns

Transformed dataset.

Return type

dataset

get_config() → dict: Get configuration for scaler.

inverse_transform(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None, copy: bool = True) → Tuple[List[numpy.ndarray], List[numpy.ndarray]]

Scale back data for atoms.

Parameters

y (tuple) – Tuple of (energy, forces). Energies must be a single array or list of energies of shape (n_samples, n_states). For a single energy per sample the shape must still be (n_samples, 1). Forces are given as a list with one NumPy array per molecule. Note that you can also pass the forces separately via the argument force, in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers, one array per molecule. Example: [np.array([7,1,1,1]), …]. The number of arrays must match the number of samples. Note that you can also pass the atomic numbers separately via the argument atomic_number, in which case X is ignored.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
copy (bool) – Not yet implemented.

Returns

Tuple of reverse-transformed (energy, forces) .

Return type

tuple

inverse_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Inverse transform dataset with relevant X, y information.

Parameters

dataset (list) – Dataset of type List[Dict] containing energies and forces and atomic numbers.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Inverse-transformed dataset.

Return type

dataset

set_config(config: dict)

Set configuration for scaler.

Parameters: config (dict) – Config dictionary.

transform(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None, copy: bool = True) → Tuple[List[numpy.ndarray], List[numpy.ndarray]]

Perform scaling of atomic energies and forces.

Parameters

y (tuple) – Tuple of (energy, forces). Energies must be a single array or list of energies of shape (n_samples, n_states). For a single energy per sample the shape must still be (n_samples, 1). Forces are given as a list with one NumPy array per molecule. Note that you can also pass the forces separately via the argument force, in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers, one array per molecule. Example: [np.array([7,1,1,1]), …]. The number of arrays must match the number of samples. Note that you can also pass the atomic numbers separately via the argument atomic_number, in which case X is ignored.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
copy (bool) – Not yet implemented.

Returns

Tuple of transformed (energy, forces) .

Return type

tuple

transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Transform dataset with relevant X , y information.

Parameters

dataset (list) – Dataset of type List[Dict] containing energies and forces and atomic numbers.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset

kgcnn.data.transform.scaler.molecule module

class kgcnn.data.transform.scaler.molecule.ExtensiveMolecularLabelScaler(y: str = 'graph_labels', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)

Bases: kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase

Equivalent of ExtensiveMolecularScaler for labels, which uses the y argument for labels. For X the atomic numbers can be passed.

import numpy as np
from kgcnn.data.transform.scaler.molecule import ExtensiveMolecularLabelScaler
# One label per molecule, shape (n_samples, 1).
data = np.random.rand(5).reshape((5,1))
mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]),
    np.array([6, 6, 1, 1, 1, 1]), np.array([6, 6, 1, 1]), np.array([6, 6, 1, 1, 1, 1, 1, 1])
]
scaler = ExtensiveMolecularLabelScaler()
scaler.fit(X=mol_num, y=data)
print(scaler.get_weights())
print(scaler.get_config())
scaler._plot_predict(data, mol_num)  # For debugging.
# Round trip: should reproduce data.
print(scaler.inverse_transform(X=mol_num, y=scaler.transform(X=mol_num, y=data)))
print(data)
scaler.save("example.json")
# The reloaded scaler should give the same result.
new_scaler = ExtensiveMolecularLabelScaler()
new_scaler.load("example.json")
print(new_scaler.inverse_transform(X=mol_num, y=scaler.transform(X=mol_num, y=data)))

fit(y: Union[None, list, numpy.ndarray] = None, *, X=None, sample_weight=None, atomic_number=None)

Fit labels with atomic number information.

Parameters

y – Array of atomic labels of shape (n_samples, n_labels).
X – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X. Note that if atomic_number is assigned, X is ignored.
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

self.

fit_transform(y=None, *, X=None, copy=True, atomic_number=None, sample_weight=None)

Fit and transform.

Parameters

y – Array of atomic labels of shape (n_samples, n_labels).
X – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X. Note that if atomic_number is assigned, X is ignored.
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

Transformed y.

Return type

np.ndarray

get_config(): Get configuration for scaler.

inverse_transform(y=None, *, X=None, copy=True, atomic_number=None)

Reverse the transform: rescale to the original units and add the fitted offset back to recover the original labels.

Parameters

y – Array of atomic labels of shape (n_samples, n_labels).
X – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X. Note that if atomic_number is assigned, X is ignored.

Returns

Back-transformed y.

Return type

np.ndarray

set_config(config)

Set configuration for scaler.

Parameters: config (dict) – Config dictionary.

transform(y=None, *, X=None, copy=True, atomic_number=None)

Transform labels with their matching list of atomic numbers based on the previous fit: the fitted offset is removed, followed by std-scaling.

Parameters

y – Array of atomic labels of shape (n_samples, n_labels).
X – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X. Note that if atomic_number is assigned, X is ignored.

Returns

Transformed y.

Return type

np.ndarray

class kgcnn.data.transform.scaler.molecule.ExtensiveMolecularScaler(X: str = 'graph_attributes', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)

Bases: kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase

Scaler for extensive properties like energy to remove a simple linear behaviour with additive atom contributions. The interface is modeled on scikit-learn scalers. Internally, Ridge regression is used. Only the atomic number is used for the extensive scaling. This could be further improved by also taking bonds and interactions into account, e.g. as energy contribution.

import numpy as np
from kgcnn.data.transform.scaler.molecule import ExtensiveMolecularScaler
# One property per molecule, shape (n_samples, 1).
data = np.random.rand(5).reshape((5,1))
mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]),
    np.array([6, 6, 1, 1, 1, 1]), np.array([6, 6, 1, 1]), np.array([6, 6, 1, 1, 1, 1, 1, 1])
]
scaler = ExtensiveMolecularScaler()
scaler.fit(X=data, atomic_number=mol_num)
print(scaler.get_weights())
print(scaler.get_config())
scaler._plot_predict(data, mol_num)  # For debugging.
# Round trip: should reproduce data.
print(scaler.inverse_transform(scaler.transform(X=data, atomic_number=mol_num), atomic_number=mol_num))
print(data)
scaler.save("example.json")
# The reloaded scaler should give the same result.
new_scaler = ExtensiveMolecularScaler()
new_scaler.load("example.json")
print(new_scaler.inverse_transform(scaler.transform(X=data, atomic_number=mol_num), atomic_number=mol_num))
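The idea of removing additive atom contributions can be sketched without kgcnn. The example below counts the atoms of each element per molecule, fits a ridge regression without intercept in closed form with NumPy (a stand-in for scikit-learn's Ridge(), not the library code), subtracts the predicted offset, and divides by the standard deviation of the remainder. The helpers count_matrix and fit_ridge, and the synthetic data, are assumptions made for illustration.

```python
import numpy as np


def count_matrix(atomic_number, elements):
    """Count how often each element occurs in each molecule."""
    return np.array([[np.sum(np.asarray(zs) == z) for z in elements] for zs in atomic_number], dtype=float)


def fit_ridge(counts, prop, alpha=1e-9):
    """Closed-form ridge regression without intercept: w = (A^T A + alpha I)^-1 A^T y."""
    gram = counts.T @ counts + alpha * np.eye(counts.shape[1])
    return np.linalg.solve(gram, counts.T @ prop)


mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]),
           np.array([6, 6, 1, 1, 1, 1]), np.array([6, 6, 1, 1]),
           np.array([6, 6, 1, 1, 1, 1, 1, 1])]
elements = np.unique(np.concatenate(mol_num))  # elements present: H, C, N
counts = count_matrix(mol_num, elements)

# Synthetic extensive property: additive atom contributions plus a small residual.
true_weights = np.array([[-0.5], [-38.0], [-54.5]])
residual = np.array([[0.01], [-0.02], [0.03], [-0.01], [0.02]])
data = counts @ true_weights + residual

weights = fit_ridge(counts, data)
offset_free = data - counts @ weights               # remove the linear atom contribution
scale = np.std(offset_free, axis=0, keepdims=True)  # shape (1, n_properties)
transformed = offset_free / scale                   # standardize what is left
restored = transformed * scale + counts @ weights   # inverse transform
```

The small alpha only stabilizes the solve; with fit_intercept disabled the model is purely a sum over atoms, which is what makes the removed offset extensive.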

fit(X, *, y: Union[None, numpy.ndarray] = None, sample_weight=None, atomic_number=None)

Fit atomic number to the molecular properties.

Parameters

X (np.ndarray) – Array of atomic properties of shape (n_samples, n_properties).
y (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

self.

fit_transform(X, *, y=None, copy=True, atomic_number=None, sample_weight=None)

Fit and transform.

Parameters

X (np.ndarray) – Array of atomic properties of shape (n_samples, n_properties).
y (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

Transformed properties.

Return type

np.ndarray

get_config(): Get configuration for scaler.

inverse_transform(X, *, y=None, copy=True, atomic_number=None)

Reverse the transform: rescale to the original units and add the fitted offset back to recover the original properties.

Parameters

X (np.ndarray) – Array of atomic properties of shape (n_samples, n_properties).
y (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.

Returns

Original atomic properties. Shape is (n_samples, n_properties).

Return type

np.ndarray

set_config(config)

Set configuration for scaler.

Parameters: config (dict) – Config dictionary.

transform(X, *, y=None, copy=True, atomic_number=None)

Transform properties with their matching list of atomic numbers based on the previous fit: the fitted offset is removed, followed by std-scaling.

Parameters

X (np.ndarray) – Array of atomic properties of shape (n_samples, n_properties).
y (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.

Returns

Transformed atomic properties. Shape is (n_samples, n_properties).

Return type

np.ndarray

class kgcnn.data.transform.scaler.molecule.QMGraphLabelScaler(scaler: list, y: str = 'graph_labels', X: Optional[str] = None, atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None)

Bases: object

A scaler that scales QM targets differently. For now, the main difference is that intensive and extensive properties are scaled differently. In principle, dipole, polarizability or rotational constants could also be standardized differently. The interface is modeled on scikit-learn scalers.

The class is simply a list of separate scalers and scales each target of shape [N_samples, target] with the corresponding scaler from its list. QMGraphLabelScaler is intended as a scaler list class.

Note

The scaler uses the y argument.

Each label is simply passed to the corresponding scaler in the list as the first positional argument, without the keyword X or y.

import numpy as np
from kgcnn.data.transform.scaler.molecule import QMGraphLabelScaler, ExtensiveMolecularScaler
from kgcnn.data.transform.scaler.standard import StandardScaler
# Two targets per molecule, shape (n_samples, 2).
data = np.random.rand(10).reshape((5,2))
mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]),
    np.array([6, 6, 1, 1, 1, 1]), np.array([6, 6, 1, 1]), np.array([6, 6, 1, 1, 1, 1, 1, 1])
]
# One scaler per target column.
scaler = QMGraphLabelScaler([ExtensiveMolecularScaler(), StandardScaler()])
scaler.fit(y=data, atomic_number=mol_num)
print(scaler.get_weights())
print(scaler.get_config())
# Round trip: should reproduce data.
print(scaler.inverse_transform(scaler.transform(y=data, atomic_number=mol_num), atomic_number=mol_num))
print(data)
scaler.save("example.json")
# The reloaded scaler should give the same result.
new_scaler = QMGraphLabelScaler([ExtensiveMolecularScaler(), StandardScaler()])
new_scaler.load("example.json")
print(new_scaler.inverse_transform(scaler.transform(y=data, atomic_number=mol_num), atomic_number=mol_num))
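The column-wise dispatch described above can be illustrated with a minimal sketch. All three classes here (MeanStdScaler, PerAtomScaler, ColumnwiseScaler) are hypothetical stand-ins written for this example, not kgcnn classes. PerAtomScaler in particular is a deliberately simple substitute for an extensive scaler and does not reproduce the Ridge-based behaviour.

```python
import numpy as np


class MeanStdScaler:
    """Stand-in for an intensive-property scaler: plain standardization."""

    def fit(self, values, **kwargs):
        self.mean_, self.std_ = values.mean(axis=0), values.std(axis=0)
        return self

    def transform(self, values, **kwargs):
        return (values - self.mean_) / self.std_

    def inverse_transform(self, values, **kwargs):
        return values * self.std_ + self.mean_


class PerAtomScaler:
    """Stand-in for an extensive-property scaler: divide by the number of atoms."""

    def fit(self, values, atomic_number=None, **kwargs):
        return self

    def transform(self, values, atomic_number=None, **kwargs):
        return values / np.array([[len(z)] for z in atomic_number])

    def inverse_transform(self, values, atomic_number=None, **kwargs):
        return values * np.array([[len(z)] for z in atomic_number])


class ColumnwiseScaler:
    """Apply the i-th scaler of a list to the i-th target column."""

    def __init__(self, scaler):
        self.scaler = scaler

    def fit(self, y, atomic_number=None):
        for i, s in enumerate(self.scaler):
            # Each column is passed as the first positional argument.
            s.fit(y[:, i:i + 1], atomic_number=atomic_number)
        return self

    def transform(self, y, atomic_number=None):
        cols = [s.transform(y[:, i:i + 1], atomic_number=atomic_number) for i, s in enumerate(self.scaler)]
        return np.concatenate(cols, axis=-1)

    def inverse_transform(self, y, atomic_number=None):
        cols = [s.inverse_transform(y[:, i:i + 1], atomic_number=atomic_number) for i, s in enumerate(self.scaler)]
        return np.concatenate(cols, axis=-1)


mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]), np.array([6, 6, 1, 1])]
data = np.array([[-40.0, 0.1], [-56.0, 0.3], [-77.0, 0.5]])
scaler = ColumnwiseScaler([PerAtomScaler(), MeanStdScaler()]).fit(data, atomic_number=mol_num)
scaled = scaler.transform(data, atomic_number=mol_num)
```

Keeping each column as a slice of shape (n_samples, 1) lets every sub-scaler see the same two-dimensional layout it would get when used on its own.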

fit(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, sample_weight=None)

Fit scaling of QM graph labels or targets.

Parameters

y (np.ndarray) – Array of atomic labels of shape (n_samples, n_labels).
X (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

self

fit_dataset(dataset: List[Dict[str, numpy.ndarray]])

Fit to dataset with relevant X , y information.

Parameters: dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
Returns: self.

fit_transform(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, copy: bool = True, sample_weight=None)

Fit and transform all target labels for QM.

Parameters

y (np.ndarray) – Array of atomic labels of shape (n_samples, n_labels).
X (np.ndarray) – Not used.
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

Transformed labels of shape (n_samples, n_labels).

Return type

np.ndarray

fit_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Fit and transform to dataset with relevant X , y information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset

get_config(): Get configuration for scaler.

get_scaling(): Get scale of shape (1, n_properties).

get_weights(): Get weights for this scaler after fit.

inverse_transform(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, copy: bool = True)

Back-transform all target labels for QM.

Parameters

y (np.ndarray) – Array of atomic labels of shape (n_samples, n_labels).
X (np.ndarray) – Not used.
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].

Returns

Back-transformed labels of shape (n_samples, n_labels).

Return type

np.ndarray

inverse_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Inverse transform dataset with relevant X , y information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Inverse-transformed dataset.

Return type

dataset

load(file_path: str)

Load scaler serialization from file.

Parameters: file_path – Filepath to load scaler serialization.

save(file_path: str)

Save scaler serialization to file.

Parameters: file_path – Filepath to save scaler serialization.

save_weights(file_path: str)

Save weights as numpy to file.

Parameters: file_path – Filepath to save weights.

property scale_: Composite scale of all scalers in list.

set_config(config)

Set configuration for scaler.

Parameters: config (dict) – Config dictionary.

set_weights(weights: dict)

Set weights for this scaler.

Parameters: weights (dict) – Weight dictionary.

transform(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, copy=True)

Transform all target labels for QM. Requires fit() called previously.

Parameters

y (np.ndarray) – Array of QM unscaled labels of shape (n_samples, n_labels).
X (np.ndarray) – Not used.
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of atomic numbers for each molecule. E.g. [np.array([6,1,1,1]), …].

Returns

Transformed labels of shape (n_samples, n_labels).

Return type

np.ndarray

transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Transform dataset with relevant X information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset

class kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase(alpha: float = 1e-09, fit_intercept: bool = False, standardize_scale: bool = True, **kwargs)

Bases: object

Scaler base class for extensive properties like energy to remove a simple linear behaviour with additive atom contributions.

__init__(alpha: float = 1e-09, fit_intercept: bool = False, standardize_scale: bool = True, **kwargs)

Initialize scaler with parameters directly passed to scikit-learn's Ridge().

Parameters

alpha (float) – Regularization parameter for regression.
fit_intercept (bool) – Whether to allow a constant offset per target.
standardize_scale (bool) – Whether to standardize output after offset removal.
kwargs – Additional arguments passed to Ridge().

_fit(molecular_property, atomic_number, sample_weight=None)

Fit atomic number to the molecular properties.

Parameters

molecular_property (np.ndarray) – Molecular properties of shape (n_samples, n_properties) .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

self

_fit_transform(molecular_property, atomic_number, copy=True, sample_weight=None)

Combine fit and transform methods in one call.

Parameters

molecular_property (np.ndarray) – Molecular properties of shape (n_samples, n_properties) .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
sample_weight – Sample weights (n_samples,) directly passed to Ridge(). Default is None.

Returns

Transformed atomic properties. Shape is (n_samples, n_properties).

Return type

np.ndarray

_inverse_transform(molecular_property: numpy.ndarray, atomic_number: List[numpy.ndarray], copy: bool = True) → numpy.ndarray

Reverse the transform: rescale to the original units and add the fitted offset back to recover the original properties.

Parameters

molecular_property (np.ndarray) – Molecular properties of shape (n_samples, n_properties) .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.

Returns

Original atomic properties. Shape is (n_samples, n_properties).

Return type

np.ndarray

_plot_predict(molecular_property: numpy.ndarray, atomic_number: List[numpy.ndarray]): Debug function to check prediction.

_predict(atomic_number)

Predict the offset from atomic numbers. Requires fit() called previously.

Parameters: atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
Returns: Offset of atomic properties fitted previously. Shape is (n_samples, n_properties).
Return type: np.ndarray
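A possible reading of this prediction step, given as a sketch rather than the library code: if the fitted weights are stored as one row per atomic number (here bounded by a constant that mirrors the class attribute max_atomic_number), the offset of a molecule is the sum of the rows selected by its atoms. The weight layout and the function predict_offset are assumptions.

```python
import numpy as np

MAX_ATOMIC_NUMBER = 95  # mirrors the class attribute max_atomic_number
N_PROPERTIES = 1

# Hypothetical fitted weights: row z holds the contribution of element z.
weights = np.zeros((MAX_ATOMIC_NUMBER, N_PROPERTIES))
weights[1] = -0.5    # H
weights[6] = -38.0   # C
weights[7] = -54.5   # N


def predict_offset(atomic_number):
    """Sum the per-element contributions of every atom in each molecule."""
    return np.array([weights[np.asarray(zs)].sum(axis=0) for zs in atomic_number])


# The result has shape (n_samples, n_properties).
offset = predict_offset([np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1])])
print(offset)
```

Indexing the weight array with the atomic numbers of a molecule is equivalent to multiplying its element-count vector with the weights, only without building the count matrix explicitly.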

_transform(molecular_property, atomic_number, copy=True)

Transform properties with their matching list of atomic numbers based on the previous fit: the fitted offset is removed, followed by std-scaling.

Parameters

molecular_property (np.ndarray) – Molecular properties of shape (n_samples, n_properties) .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.

Returns

Transformed atomic properties. Shape is (n_samples, n_properties).

Return type

np.ndarray

fit_dataset(dataset: List[Dict[str, numpy.ndarray]])

Fit to dataset with relevant X , y information.

Parameters: dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
Returns: self.

fit_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Fit and transform to dataset with relevant X , y information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset

get_config() → dict: Get configuration for scaler.

get_scaling(): Get scale of shape (1, n_properties).

get_weights() → dict: Get weights for this scaler after fit.

inverse_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Inverse transform dataset with relevant X , y information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Inverse-transformed dataset.

Return type

dataset

load(file_path: str)

Load scaler serialization from file.

Parameters: file_path – Filepath to load scaler serialization.

max_atomic_number = 95

save(file_path: str)

Save scaler serialization to file.

Parameters: file_path – Filepath to save scaler serialization.

save_weights(file_path: str)

Save weights as numpy to file.

Parameters: file_path – Filepath to save weights.

set_config(config)

Set configuration for scaler.

Parameters: config (dict) – Config dictionary.

set_weights(weights: dict)

Set weights for this scaler.

Parameters: weights (dict) – Weight dictionary.

transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]]

Transform dataset with relevant X information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset

kgcnn.data.transform.scaler.serial module

kgcnn.data.transform.scaler.serial.deserialize(name: Union[str, dict], **kwargs)

Deserialize a scaler class.

Parameters

name (str, dict) – Serialization dictionary of class. This can also be a name of a scaler.
kwargs – Kwargs for scaler initialization, if name is a string.

Returns

Instance of the deserialized scaler.

Return type

GraphPreProcessorBase
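The two accepted forms of name can be illustrated with a small registry-based sketch. The dictionary layout with the keys "class_name" and "config", the registry SCALER_REGISTRY, the class DemoScaler and the function deserialize_sketch are all assumptions made for this example; the actual serialization format of kgcnn may differ.

```python
from typing import Union


class DemoScaler:
    """Hypothetical scaler used only to illustrate deserialization."""

    def __init__(self, with_mean: bool = True, with_std: bool = True):
        self.with_mean = with_mean
        self.with_std = with_std


# Hypothetical registry that maps class names to classes.
SCALER_REGISTRY = {"DemoScaler": DemoScaler}


def deserialize_sketch(name: Union[str, dict], **kwargs):
    """Build a scaler from a class name plus kwargs, or from a serialization dictionary."""
    if isinstance(name, dict):
        # Assumed layout: {"class_name": str, "config": dict}.
        return SCALER_REGISTRY[name["class_name"]](**name.get("config", {}))
    # A plain string selects the class; kwargs are used for initialization.
    return SCALER_REGISTRY[name](**kwargs)


from_name = deserialize_sketch("DemoScaler", with_std=False)
from_dict = deserialize_sketch({"class_name": "DemoScaler", "config": {"with_mean": False}})
```

In this sketch kwargs are only consulted when name is a string, which matches the parameter description above.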

kgcnn.data.transform.scaler.standard module

class kgcnn.data.transform.scaler.standard.StandardLabelScaler(*, y: str = 'graph_labels', sample_weight: Optional[str] = None, copy=True, with_mean=True, with_std=True)

Bases: kgcnn.data.transform.scaler.standard._StandardScalerSklearnMixin

Standard scaler for labels that holds a sklearn.preprocessing.StandardScaler as a member. The unused kwarg ‘atomic_number’ is included for compatibility with some material-oriented scalers. Uses the y argument for scaling labels; X is ignored.

import numpy as np
from kgcnn.data.transform.scaler.standard import StandardLabelScaler
data = np.random.rand(5).reshape((5,1))
scaler = StandardLabelScaler()
scaler.fit(y=data)
print(scaler.fit_transform(y=data))
print(scaler.get_weights())
print(scaler.get_config())
print(scaler.inverse_transform(y=scaler.transform(y=data)))
print(data)
scaler.save("example.json")
new_scaler = StandardLabelScaler()
new_scaler.load("example.json")
print(new_scaler.inverse_transform(y=scaler.transform(y=data)))

fit(y: numpy.ndarray, *, X=None, sample_weight=None, atomic_number=None)

Compute the mean and std to be used for later scaling.

Parameters

y (np.ndarray) – Array of shape (n_samples, n_labels). The data used to compute the mean and standard deviation used for later scaling along the features axis.
X (None) – Ignored.
sample_weight (np.ndarray) – Individual weights for each sample.
atomic_number (list) – Ignored.

Returns

Fitted scaler.

Return type

self

fit_transform(y: numpy.ndarray, *, X=None, atomic_number=None, copy=None, **fit_params)[source]¶

Perform fit and standardization by centering and scaling.

Parameters

y (np.ndarray) – Array of shape (n_samples, n_labels). The data used to compute the mean and standard deviation for later scaling along the features axis.
X (None) – Ignored.
atomic_number (list) – Ignored.
copy (bool) – Copy the input y or not.
fit_params (Any) – Kwargs for fit.

Returns

Transformed array of shape (n_samples, n_labels).

Return type

y_tr (np.ndarray)

get_config() → dict [source]¶: Get configuration for scaler.

inverse_transform(y: Optional[numpy.ndarray] = None, *, X=None, copy: Optional[bool] = None, atomic_number=None)[source]¶

Scale back the data to the original representation.

Parameters

y (np.ndarray) – Array of shape (n_samples, n_labels). The data used to scale along the features axis.
X (np.ndarray, None) – Ignored. Default is None.
atomic_number (list) – Ignored.
copy (bool) – Copy the input y or not.

Returns

Transformed array of shape (n_samples, n_labels).

Return type

y_tr (np.ndarray)

partial_fit(y: numpy.ndarray, X=None, sample_weight=None, atomic_number=None)[source]¶

Online computation of mean and std on y for later scaling. All of y is processed as a single batch. This is intended for cases when fit() is not feasible because n_samples is very large or because y is read from a continuous stream. The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan et al. (1982).

Parameters

y (np.ndarray) – Array of shape (n_samples, n_labels). The data used to compute the mean and standard deviation for later scaling along the features axis.
X (None) – Ignored.
sample_weight (np.ndarray) – Individual weights for each sample.
atomic_number (list) – Ignored.

Returns

Fitted scaler.

Return type

self
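
The idea behind batch-wise fitting can be sketched in plain NumPy. The example below uses a mean-based form of a batch-merge update for the running mean and the sum of squared deviations; it is meant to convey the technique, not to reproduce the library or sklearn code. The function name `merge_batch` and the toy data are assumptions.

```python
import numpy as np


def merge_batch(count, mean, m2, batch):
    """Fold one batch of shape (n_batch, n_labels) into running statistics.

    count: number of samples seen so far.
    mean: running mean per label.
    m2: running sum of squared deviations from the mean per label.
    """
    n_b = batch.shape[0]
    mean_b = batch.mean(axis=0)
    m2_b = ((batch - mean_b) ** 2).sum(axis=0)

    total = count + n_b
    delta = mean_b - mean
    new_mean = mean + delta * n_b / total
    new_m2 = m2 + m2_b + delta ** 2 * count * n_b / total
    return total, new_mean, new_m2


rng = np.random.default_rng(0)
y = rng.normal(size=(1000, 2))

# Start from empty statistics and feed the data in seven chunks.
count, mean, m2 = 0, np.zeros(2), np.zeros(2)
for chunk in np.array_split(y, 7):
    count, mean, m2 = merge_batch(count, mean, m2, chunk)

std = np.sqrt(m2 / count)  # population std (ddof=0)
```

After the loop, the running statistics agree with those computed on the full array in one pass, up to floating-point error.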

transform(y: numpy.ndarray, *, X=None, copy=None, atomic_number=None)[source]¶

Perform standardization by centering and scaling.

Parameters

y (np.ndarray) – Array of shape (n_samples, n_labels). The data used to scale along the features axis.
X (None) – Ignored.
atomic_number (list) – Ignored.
copy (bool) – Copy the input y or not.

Returns

Transformed array of shape (n_samples, n_labels).

Return type

y_tr (np.ndarray)

class kgcnn.data.transform.scaler.standard.StandardScaler(*, X: str = 'graph_attributes', sample_weight: Optional[str] = None, copy=True, with_mean=True, with_std=True)[source]¶

Bases: kgcnn.data.transform.scaler.standard._StandardScalerSklearnMixin

Standard scaler that uses sklearn.preprocessing.StandardScaler. The unused keyword argument ‘atomic_number’ is accepted for compatibility with some material-oriented scalers.

import numpy as np
from kgcnn.data.transform.scaler.standard import StandardScaler
data = np.random.rand(5).reshape((5,1))
scaler = StandardScaler()
scaler.fit(X=data)
print(scaler.get_weights())
print(scaler.get_config())
print(scaler.inverse_transform(scaler.transform(X=data)))
print(data)
scaler.save("example.json")
new_scaler = StandardScaler()
new_scaler.load("example.json")
print(new_scaler.inverse_transform(scaler.transform(X=data)))

fit(X, *, y=None, sample_weight=None, atomic_number=None)[source]¶

Compute the mean and std to be used for later scaling.

Parameters

X (np.ndarray) – Array of shape (n_samples, n_features). The data used to compute the mean and standard deviation for later scaling along the features axis.
y (None) – Ignored.
sample_weight (np.ndarray) – Individual weights for each sample.
atomic_number (list, None) – Ignored.

Returns

Fitted scaler.

Return type

self

fit_transform(X, *, y=None, atomic_number=None, **fit_params)[source]¶

Perform fit and standardization by centering and scaling.

Parameters

X (np.ndarray) – Array of shape (n_samples, n_features). The data used to scale along the feature’s axis.
y (np.ndarray, None) – Ignored.
atomic_number (list) – Not used.
fit_params – Additional fit kwargs.

Returns

Transformed array of shape (n_samples, n_features).

Return type

X_tr (np.ndarray)

get_config() → dict [source]¶: Get configuration for scaler.

inverse_transform(X, *, copy: Optional[bool] = None, atomic_number=None)[source]¶

Scale back the data to the original representation.

Parameters

X (np.ndarray) – Array of shape (n_samples, n_features). The data used to scale along the feature’s axis.
copy (bool) – Copy the input X or not.
atomic_number (list) – Not used.

Returns

Transformed array of shape (n_samples, n_features).

Return type

X_tr (np.ndarray)

partial_fit(X, y=None, sample_weight=None, atomic_number=None)[source]¶

Online computation of mean and std on X for later scaling. All of X is processed as a single batch. This is intended for cases when fit() is not feasible because n_samples is very large or because X is read from a continuous stream. The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan et al. (1982).

Parameters

X (np.ndarray) – Array of shape (n_samples, n_features). The data used to compute the mean and standard deviation for later scaling along the features axis.
y (np.ndarray, None) – Ignored.
sample_weight (np.ndarray) – Array-like of shape (n_samples,). Individual weights for each sample. Default is None.
atomic_number (list) – Not used.

Returns

Fitted scaler.

Return type

self

transform(X, *, copy=None, atomic_number=None)[source]¶

Perform standardization by centering and scaling.

Parameters

X (np.ndarray) – Array of shape (n_samples, n_features). The data used to scale along the feature’s axis.
copy (bool) – Copy the input X or not.
atomic_number (list) – Not used.

Returns

Transformed array of shape (n_samples, n_features).

Return type

X_tr (np.ndarray)
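
The effect of the with_mean and with_std flags, and of the inverse transform, can be sketched column-wise in NumPy. This is an illustration under assumptions rather than the wrapped sklearn code: `standardize` is a hypothetical helper, and replacing a zero scale by 1 for constant columns follows the behavior of sklearn.preprocessing.StandardScaler.

```python
import numpy as np


def standardize(X, with_mean=True, with_std=True):
    """Column-wise standardization of an array of shape (n_samples, n_features)."""
    n_features = X.shape[1]
    mean = X.mean(axis=0, keepdims=True) if with_mean else np.zeros((1, n_features))
    scale = X.std(axis=0, keepdims=True) if with_std else np.ones((1, n_features))
    # Guard against division by zero for constant columns.
    scale = np.where(scale == 0.0, 1.0, scale)
    return (X - mean) / scale, mean, scale


X = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])  # second column is constant

X_tr, mean, scale = standardize(X)
X_back = X_tr * scale + mean  # inverse transform restores the input

centered_only, _, _ = standardize(X, with_std=False)
```

With with_std=False the data is only centered, so the spread of each column is left unchanged.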

class kgcnn.data.transform.scaler.standard._StandardScalerSklearnMixin[source]¶

Bases: object

Mixin class for sklearn scalers that adds functionality to save and load the weights of a scaler, similar to Keras layers and objects.

Note

This class is only meant to add functionality. The scaler is accessed via the _scaler_reference property.

fit_dataset(dataset: List[Dict[str, numpy.ndarray]])[source]¶

Fit to dataset with relevant X, y information.

Parameters: dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
Returns: self.

fit_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶

Fit and transform dataset with relevant X, y information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset
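
A dataset-level fit and transform over a List[Dict] can be pictured with the sketch below. It is a hypothetical reimplementation for illustration only: the key name follows the default y='graph_labels' of StandardLabelScaler, while the per-graph array shape (n_labels,) and the deep-copy reading of copy_dataset are assumptions.

```python
import copy

import numpy as np

# Toy dataset: one dictionary of numpy arrays per graph.
dataset = [
    {"graph_labels": np.array([1.0])},
    {"graph_labels": np.array([2.0])},
    {"graph_labels": np.array([6.0])},
]


def fit_transform_dataset_sketch(dataset, key="graph_labels", copy_dataset=False):
    """Standardize one property across all graphs and write it back."""
    if copy_dataset:
        dataset = copy.deepcopy(dataset)  # leave the caller's list untouched
    # Stack the per-graph arrays to shape (n_samples, n_labels).
    y = np.stack([graph[key] for graph in dataset], axis=0)
    mean, std = y.mean(axis=0), y.std(axis=0)
    for graph, row in zip(dataset, (y - mean) / std):
        graph[key] = row
    return dataset


scaled = fit_transform_dataset_sketch(dataset, copy_dataset=True)
```

Because copy_dataset=True is passed in this sketch, the original list keeps its unscaled labels and the scaled values live only in the returned list.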

get_config() → dict [source]¶: Get configuration for scaler.

get_mean_shift()[source]¶: Get mean shift of shape (1, n_properties).

get_scaling()[source]¶: Get scale of shape (1, n_properties).

get_weights() → dict [source]¶: Get weights for this scaler after fit.

inverse_transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶

Inverse transform dataset with relevant X, y information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Inverse-transformed dataset.

Return type

dataset

load(file_path: str)[source]¶

Load scaler serialization from file.

Parameters: file_path – Filepath to load scaler serialization.

save(file_path: str)[source]¶

Save scaler serialization to file.

Parameters: file_path – Filepath to save scaler serialization.
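
Saving and loading a scaler amounts to a round trip of its configuration and fitted weights through a file. The sketch below shows one way such a JSON round trip could look; the payload layout, the key names, and the helper `to_jsonable` are hypothetical, and the actual file format written by save() may differ.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical scaler state; the real serialization layout may differ.
state = {
    "config": {"with_mean": True, "with_std": True},
    "weights": {"mean_": np.array([[3.0]]), "scale_": np.array([[2.0]])},
}


def to_jsonable(weights):
    """Convert numpy arrays to nested lists so that json can encode them."""
    return {name: value.tolist() for name, value in weights.items()}


path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w") as f:
    json.dump({"config": state["config"], "weights": to_jsonable(state["weights"])}, f)

with open(path) as f:
    loaded = json.load(f)

# Turn the nested lists back into numpy arrays.
restored = {name: np.asarray(value) for name, value in loaded["weights"].items()}
```

Converting arrays to lists before encoding is needed because the standard json module cannot serialize numpy arrays directly.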

save_weights(file_path: str)[source]¶

Save weights as numpy to file.

Parameters: file_path – Filepath to save weights.

property scale_¶

set_config(config: dict)[source]¶

Set configuration for scaler.

Parameters: config (dict) – Config dictionary.

set_weights(weights: dict)[source]¶

Set weights for this scaler.

Parameters: weights (dict) – Weight dictionary.

transform_dataset(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶

Transform dataset with relevant X information.

Parameters

dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
copy (bool) – Whether to copy data for transformation. Default is True.
copy_dataset (bool) – Whether to copy full dataset. Default is False.

Returns

Transformed dataset.

Return type

dataset
