kgcnn.data.transform.scaler package
Submodules
kgcnn.data.transform.scaler.force module
-
class
kgcnn.data.transform.scaler.force.
EnergyForceExtensiveLabelScaler
(standardize_coordinates: bool = False, energy: str = 'energy', force: str = 'force', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)[source]¶ Bases:
kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase
Extensive scaler for jointly scaling energies and forces.
Inherits from
kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase
but uses X and y as atomic_number and (energy, force), in contrast to
kgcnn.data.transform.scaler.molecule.ExtensiveMolecularLabelScaler
, which uses only y as energy. The interface is modeled on scikit-learn scalers and adds functions to apply the scaler to datasets:
fit_dataset()
and
transform_dataset()
.
Note
Units for energy and forces must match.
Code example for scaler:
    import numpy as np
    from kgcnn.data.transform.scaler.force import EnergyForceExtensiveLabelScaler

    # Random energies of shape (n_samples, 1) and atomic numbers per molecule.
    energy = np.random.rand(5).reshape((5,1))
    mol_num = [
        np.array([6, 1, 1, 1, 1]),
        np.array([7, 1, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1]),
        np.array([6, 6, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1, 1, 1])
    ]
    # One force array of shape (n_atoms, 3) per molecule.
    force = [np.random.rand(len(m)*3).reshape((len(m),3)) for m in mol_num]

    scaler = EnergyForceExtensiveLabelScaler()
    scaler.fit(y=[energy, force], X=mol_num)
    print(scaler.get_weights())
    print(scaler.get_config())
    scaler._plot_predict(energy, mol_num)  # For debugging.

    y, f = scaler.transform(y=[energy, force], X=mol_num)
    print(energy, y)
    print(scaler.inverse_transform(y=[y, f], X=mol_num)[1][1][0], f[0])

    # Save the scaler and restore it into a new instance.
    scaler.save("example.json")
    new_scaler = EnergyForceExtensiveLabelScaler()
    new_scaler.load("example.json")
    print(new_scaler.inverse_transform(y=[y, f], X=mol_num)[1][1][0], f[0])
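The note that energy and force units must match can be motivated with a small sketch. Forces are the negative gradient of the energy with respect to the coordinates, so as long as coordinates are left unscaled, an offset that depends only on the composition does not change the forces, while dividing the energy by a scale divides the forces by the same scale. The function below is a hypothetical illustration of that relation, not the library's implementation; the name scale_energy_and_forces and all numbers are made up.

```python
import numpy as np

def scale_energy_and_forces(energy, forces, offset, scale):
    """Apply E' = (E - offset) / scale and F' = F / scale (assumed relation)."""
    energy_scaled = (energy - offset) / scale
    # The offset does not depend on coordinates, so only the scale reaches the forces.
    forces_scaled = [f / scale for f in forces]
    return energy_scaled, forces_scaled

# Made-up values: two molecules with 5 and 4 atoms.
energy = np.array([[-40.5], [-56.6]])            # shape (n_samples, 1)
forces = [np.full((5, 3), 0.2), np.full((4, 3), -0.4)]
offset = np.array([[-40.6], [-56.6]])            # per-molecule extensive offset
scale = 0.5                                      # same energy unit as `energy`

energy_scaled, forces_scaled = scale_energy_and_forces(energy, forces, offset, scale)
print(energy_scaled)
print(forces_scaled[0][0])
```

If energies and forces were given in different energy units, a single shared scale could no longer be applied consistently to both.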
-
__init__
(standardize_coordinates: bool = False, energy: str = 'energy', force: str = 'force', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)[source]¶ Initialize scaler with arguments for
kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase
.
- Parameters
standardize_coordinates (bool) – Whether to standardize coordinates. Must always be False.
kwargs – Keyword arguments for the
kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase
parent class. See the docs for this class.
-
fit
(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, sample_weight: Union[None, numpy.ndarray] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None) → Tuple[List[numpy.ndarray], List[numpy.ndarray]][source]¶ Fit Scaler to data.
- Parameters
y (tuple) – Tuple of (energy, forces) . Energies must be a single array or list of energies of shape (n_samples, n_states) . For a single energy the shape must still be (n_samples, 1) . Forces are a list with one numpy array per sample. Note that you can also pass the forces separately via the argument force , in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers. Example: [np.array([7,1,1,1]), …] . They must match in length. Note that you can also pass the atomic numbers separately via the argument atomic_number , in which case X is ignored.
sample_weight (list, np.ndarray) – Weights for each sample.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
- Returns
self.
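The shape requirements described for y and X can be checked before calling fit. The helper below is a hypothetical sketch, not part of kgcnn; it assumes one force array per sample with one row per atom, as in the class example above.

```python
import numpy as np

def check_energy_force_inputs(energy, forces, atomic_numbers):
    """Validate the (energy, forces) and atomic-number inputs described above."""
    energy = np.asarray(energy)
    if energy.ndim != 2:
        raise ValueError("energy must have shape (n_samples, n_states)")
    if not (len(energy) == len(forces) == len(atomic_numbers)):
        raise ValueError("energy, forces and atomic numbers must have equal length")
    for f, z in zip(forces, atomic_numbers):
        # Assumption: each force array has one row per atom of the molecule.
        if len(f) != len(z):
            raise ValueError("each force array needs one row per atom")
    return energy

numbers = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1])]
forces = [np.zeros((5, 3)), np.zeros((4, 3))]
checked = check_energy_force_inputs(np.zeros((2, 1)), forces, numbers)
print(checked.shape)
```

A one-dimensional energy array of shape (n_samples,) is rejected by this check and would need a reshape to (n_samples, 1) first.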
-
fit_dataset
(dataset: List[Dict[str, numpy.ndarray]], **fit_params)[source]¶ Fit to dataset with relevant X , y information.
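The dataset methods take a list of dictionaries of numpy arrays. A plausible reading, treated here as an assumption, is that the constructor arguments energy, force and atomic_number name the dictionary keys to read from each entry. The helper collect_columns below is hypothetical and only sketches how such entries could be gathered into the y and X arguments of fit.

```python
import numpy as np

def collect_columns(dataset, energy="energy", force="force",
                    atomic_number="atomic_number"):
    """Gather per-graph entries from a list of dicts into (y, X) inputs."""
    energies = np.array([graph[energy] for graph in dataset])   # (n_samples, n_states)
    forces = [graph[force] for graph in dataset]                # ragged: one array per graph
    numbers = [graph[atomic_number] for graph in dataset]
    return (energies, forces), numbers

# Made-up dataset with two graphs.
dataset = [
    {"energy": np.array([-40.5]), "force": np.zeros((5, 3)),
     "atomic_number": np.array([6, 1, 1, 1, 1])},
    {"energy": np.array([-56.6]), "force": np.zeros((4, 3)),
     "atomic_number": np.array([7, 1, 1, 1])},
]

(y_energy, y_force), x_numbers = collect_columns(dataset)
print(y_energy.shape, len(y_force), len(x_numbers))
```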
-
fit_transform
(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, sample_weight: Union[None, numpy.ndarray] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None, copy: bool = True) → Tuple[List[numpy.ndarray], List[numpy.ndarray]][source]¶ Fit Scaler to data and subsequently transform data.
- Parameters
y (tuple) – Tuple of (energy, forces) . Energies must be a single array or list of energies of shape (n_samples, n_states) . For a single energy the shape must still be (n_samples, 1) . Forces are a list with one numpy array per sample. Note that you can also pass the forces separately via the argument force , in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers. Example: [np.array([7,1,1,1]), …] . They must match in length. Note that you can also pass the atomic numbers separately via the argument atomic_number , in which case X is ignored.
sample_weight (list, np.ndarray) – Weights for each sample.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
copy (bool) – Not yet implemented.
- Returns
Tuple of transformed (energy, forces) .
- Return type
-
fit_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False, **fit_params) → List[Dict[str, numpy.ndarray]][source]¶ Fit and transform to dataset with relevant X , y information.
- Parameters
- Returns
Transformed dataset.
- Return type
dataset
-
inverse_transform
(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None, copy: bool = True) → Tuple[List[numpy.ndarray], List[numpy.ndarray]][source]¶ Scale back data for atoms.
- Parameters
y (tuple) – Tuple of (energy, forces) . Energies must be a single array or list of energies of shape (n_samples, n_states) . For a single energy the shape must still be (n_samples, 1) . Forces are a list with one numpy array per sample. Note that you can also pass the forces separately via the argument force , in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers. Example: [np.array([7,1,1,1]), …] . They must match in length. Note that you can also pass the atomic numbers separately via the argument atomic_number , in which case X is ignored.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
copy (bool) – Not yet implemented.
- Returns
Tuple of reverse-transformed (energy, forces) .
- Return type
-
inverse_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Inverse transform dataset with relevant X , y information.
-
set_config
(config: dict)[source]¶ Set configuration for scaler.
- Parameters
config (dict) – Config dictionary.
-
transform
(y: Optional[Tuple[List[numpy.ndarray], List[numpy.ndarray]]] = None, *, X: Optional[List[numpy.ndarray]] = None, force: Union[None, List[numpy.ndarray]] = None, atomic_number: Union[None, List[numpy.ndarray]] = None, copy: bool = True) → Tuple[List[numpy.ndarray], List[numpy.ndarray]][source]¶ Perform scaling of atomic energies and forces.
- Parameters
y (tuple) – Tuple of (energy, forces) . Energies must be a single array or list of energies of shape (n_samples, n_states) . For a single energy the shape must still be (n_samples, 1) . Forces are a list with one numpy array per sample. Note that you can also pass the forces separately via the argument force , in which case y should contain only the energies (not a tuple).
X (list) – List of arrays of atomic numbers. Example: [np.array([7,1,1,1]), …] . They must match in length. Note that you can also pass the atomic numbers separately via the argument atomic_number , in which case X is ignored.
force (list) – List of forces as numpy arrays. Deprecated, since they can be contained in y .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Deprecated, since they can be contained in X .
copy (bool) – Not yet implemented.
- Returns
Tuple of transformed (energy, forces) .
- Return type
-
transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Transform dataset with relevant X , y information.
-
kgcnn.data.transform.scaler.molecule module
-
class
kgcnn.data.transform.scaler.molecule.
ExtensiveMolecularLabelScaler
(y: str = 'graph_labels', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)[source]¶ Bases:
kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase
Equivalent of
ExtensiveMolecularScaler
for labels, which uses the y argument for labels. The atomic numbers can be passed as X.
    import numpy as np
    from kgcnn.data.transform.scaler.molecule import ExtensiveMolecularLabelScaler

    data = np.random.rand(5).reshape((5,1))
    mol_num = [
        np.array([6, 1, 1, 1, 1]),
        np.array([7, 1, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1]),
        np.array([6, 6, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1, 1, 1])
    ]

    scaler = ExtensiveMolecularLabelScaler()
    scaler.fit(X=mol_num, y=data)
    print(scaler.get_weights())
    print(scaler.get_config())
    scaler._plot_predict(data, mol_num)  # For debugging.
    print(scaler.inverse_transform(X=mol_num, y=scaler.transform(X=mol_num, y=data)))
    print(data)

    # Save the scaler and restore it into a new instance.
    scaler.save("example.json")
    new_scaler = ExtensiveMolecularLabelScaler()
    new_scaler.load("example.json")
    print(new_scaler.inverse_transform(X=mol_num, y=scaler.transform(X=mol_num, y=data)))
-
fit
(y: Union[None, list, numpy.ndarray] = None, *, X=None, sample_weight=None, atomic_number=None)[source]¶ Fit labels with atomic number information.
- Parameters
y – Array of atomic labels of shape (n_samples, n_labels).
X – List of array of atomic numbers. Example [np.array([7,1,1,1]), …].
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X . Note that if assigning atomic_numbers then X is ignored.
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
Transformed y.
- Return type
np.ndarray
-
fit_transform
(y=None, *, X=None, copy=True, atomic_number=None, sample_weight=None)[source]¶ Fit and transform.
- Parameters
y – Array of atomic labels of shape (n_samples, n_labels).
X – List of array of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X . Note that if assigning atomic_numbers then X is ignored.
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
Returns:
-
inverse_transform
(y=None, *, X=None, copy=True, atomic_number=None)[source]¶ Reverse the transform to recover the original labels, with the offset added back and the values rescaled to original units.
- Parameters
y – Array of atomic labels of shape (n_samples, n_labels).
X – List of array of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X . Note that if assigning atomic_numbers then X is ignored.
- Returns
Transformed y.
- Return type
np.ndarray
-
set_config
(config)[source]¶ Set configuration for scaler.
- Parameters
config (dict) – Config dictionary.
-
transform
(y=None, *, X=None, copy=True, atomic_number=None)[source]¶ Transform any atomic number list with matching labels based on previous fit with sequential std-scaling.
- Parameters
y – Array of atomic labels of shape (n_samples, n_labels).
X – List of array of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …]. Optional, since they should be contained in X . Note that if assigning atomic_numbers then X is ignored.
- Returns
Transformed y.
- Return type
np.ndarray
-
-
class
kgcnn.data.transform.scaler.molecule.
ExtensiveMolecularScaler
(X: str = 'graph_attributes', atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None, **kwargs)[source]¶ Bases:
kgcnn.data.transform.scaler.molecule._ExtensiveMolecularScalerBase
Scaler for extensive properties like energy to remove a simple linear behaviour with additive atom contributions. The interface is modeled on scikit-learn scalers. Internally, Ridge regression is used. Only the atomic number is used as the extensive descriptor. This could be further improved by also taking bonds and interactions into account, e.g. as energy contributions.
    import numpy as np
    from kgcnn.data.transform.scaler.molecule import ExtensiveMolecularScaler

    data = np.random.rand(5).reshape((5,1))
    mol_num = [
        np.array([6, 1, 1, 1, 1]),
        np.array([7, 1, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1]),
        np.array([6, 6, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1, 1, 1])
    ]

    scaler = ExtensiveMolecularScaler()
    scaler.fit(X=data, atomic_number=mol_num)
    print(scaler.get_weights())
    print(scaler.get_config())
    scaler._plot_predict(data, mol_num)  # For debugging.
    print(scaler.inverse_transform(scaler.transform(X=data, atomic_number=mol_num), atomic_number=mol_num))
    print(data)

    # Save the scaler and restore it into a new instance.
    scaler.save("example.json")
    new_scaler = ExtensiveMolecularScaler()
    new_scaler.load("example.json")
    print(new_scaler.inverse_transform(scaler.transform(X=data, atomic_number=mol_num), atomic_number=mol_num))
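The idea of removing additive atom contributions can be sketched independently of the library. The example below is a minimal illustration under stated assumptions: it uses a closed-form ridge solution in NumPy instead of scikit-learn's Ridge, fits no intercept, and divides the residual by its standard deviation. The helper names element_counts and ridge_no_intercept and the energy values are made up, and the real class may differ in detail.

```python
import numpy as np

def element_counts(atomic_numbers, elements):
    """Count how often each element in `elements` occurs per molecule."""
    return np.array([[np.sum(z == e) for e in elements] for z in atomic_numbers],
                    dtype=float)

def ridge_no_intercept(a, y, alpha=1e-9):
    """Closed-form ridge solution of (A^T A + alpha * I) w = A^T y."""
    gram = a.T @ a + alpha * np.eye(a.shape[1])
    return np.linalg.solve(gram, a.T @ y)

mol_num = [np.array([6, 1, 1, 1, 1]), np.array([7, 1, 1, 1]),
           np.array([6, 6, 1, 1, 1, 1]), np.array([6, 6, 1, 1]),
           np.array([6, 6, 1, 1, 1, 1, 1, 1])]
energy = np.array([[-40.5], [-56.6], [-78.6], [-77.3], [-79.8]])  # made-up values

elements = np.unique(np.concatenate(mol_num))   # elements present: H, C, N
counts = element_counts(mol_num, elements)      # shape (n_samples, n_elements)
weights = ridge_no_intercept(counts, energy)    # one contribution per element

offset = counts @ weights                       # extensive, linear part
scale = np.std(energy - offset, axis=0, keepdims=True)
scaled = (energy - offset) / scale              # residual with unit standard deviation
restored = scaled * scale + offset              # inverse transform

print(weights.ravel())
print(scaled.ravel())
```

The residual is much smaller in magnitude than the raw energies, which is the point of fitting the per-element contributions first.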
-
fit
(X, *, y: Union[None, numpy.ndarray] = None, sample_weight=None, atomic_number=None)[source]¶ Fit atomic number to the molecular properties.
- Parameters
X (np.ndarray) – Array of atomic properties of shape (n_samples, n_properties).
y (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
self.
-
fit_transform
(X, *, y=None, copy=True, atomic_number=None, sample_weight=None)[source]¶ Fit and transform.
- Parameters
X (np.ndarray) – Array of atomic properties of shape (n_samples, n_properties).
y (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
Transformed properties.
- Return type
np.ndarray
-
inverse_transform
(X, *, y=None, copy=True, atomic_number=None)[source]¶ Reverse the transform to recover the original properties, with the offset added back and the values rescaled to original units.
- Parameters
- Returns
Original atomic properties. Shape is (n_samples, n_properties).
- Return type
np.ndarray
-
set_config
(config)[source]¶ Set configuration for scaler.
- Parameters
config (dict) – Config dictionary.
-
-
class
kgcnn.data.transform.scaler.molecule.
QMGraphLabelScaler
(scaler: list, y: str = 'graph_labels', X: Optional[str] = None, atomic_number: str = 'atomic_number', sample_weight: Optional[str] = None)[source]¶ Bases:
object
A scaler that scales QM targets differently. For now, the main difference is that intensive and extensive properties are scaled differently. In principle, dipole, polarizability or rotational constants could also be standardized differently. The interface is modeled on scikit-learn scalers.
The class is simply a list of separate scalers and scales each target of shape [N_samples, target] with a scaler from its list.
QMGraphLabelScaler
is intended as a scaler list class.
Note
The scaler uses the y argument. Each label is passed to the corresponding scaler in the list simply as the first argument, without keyword X or y.
    import numpy as np
    from kgcnn.data.transform.scaler.molecule import QMGraphLabelScaler, ExtensiveMolecularScaler
    from kgcnn.data.transform.scaler.standard import StandardScaler

    data = np.random.rand(10).reshape((5,2))
    mol_num = [
        np.array([6, 1, 1, 1, 1]),
        np.array([7, 1, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1]),
        np.array([6, 6, 1, 1]),
        np.array([6, 6, 1, 1, 1, 1, 1, 1])
    ]

    # One scaler per target column.
    scaler = QMGraphLabelScaler([ExtensiveMolecularScaler(), StandardScaler()])
    scaler.fit(y=data, atomic_number=mol_num)
    print(scaler.get_weights())
    print(scaler.get_config())
    print(scaler.inverse_transform(scaler.transform(y=data, atomic_number=mol_num), atomic_number=mol_num))
    print(data)

    # Save the scaler and restore it into a new instance.
    scaler.save("example.json")
    new_scaler = QMGraphLabelScaler([ExtensiveMolecularScaler(), StandardScaler()])
    new_scaler.load("example.json")
    print(new_scaler.inverse_transform(scaler.transform(y=data, atomic_number=mol_num), atomic_number=mol_num))
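The list-of-scalers pattern described above can be sketched with plain NumPy. Both classes below are hypothetical stand-ins (ColumnStandardizer and ScalerList are not kgcnn names), and the sketch leaves out atomic numbers entirely. It only shows how the i-th scaler can be applied to the i-th target column.

```python
import numpy as np

class ColumnStandardizer:
    """Hypothetical stand-in for any scaler with fit/transform/inverse_transform."""

    def fit(self, y):
        self.mean_ = y.mean(axis=0, keepdims=True)
        self.scale_ = y.std(axis=0, keepdims=True)
        return self

    def transform(self, y):
        return (y - self.mean_) / self.scale_

    def inverse_transform(self, y):
        return y * self.scale_ + self.mean_


class ScalerList:
    """Hypothetical container: the i-th scaler handles the i-th target column."""

    def __init__(self, scalers):
        self.scalers = scalers

    def _columns(self, y):
        # Slice with i:i+1 so every scaler still sees a 2D (n_samples, 1) array.
        return [y[:, i:i + 1] for i in range(len(self.scalers))]

    def fit(self, y):
        for scaler, column in zip(self.scalers, self._columns(y)):
            scaler.fit(column)
        return self

    def transform(self, y):
        parts = [s.transform(c) for s, c in zip(self.scalers, self._columns(y))]
        return np.concatenate(parts, axis=-1)

    def inverse_transform(self, y):
        parts = [s.inverse_transform(c) for s, c in zip(self.scalers, self._columns(y))]
        return np.concatenate(parts, axis=-1)


data = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0], [4.0, 70.0], [5.0, 90.0]])
scaler = ScalerList([ColumnStandardizer(), ColumnStandardizer()]).fit(data)
scaled = scaler.transform(data)
restored = scaler.inverse_transform(scaled)
print(scaled)
```

Keeping each column two-dimensional means a scaler written for arrays of shape (n_samples, n_labels) can be reused unchanged for a single target.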
-
fit
(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, sample_weight=None)[source]¶ Fit scaling of QM graph labels or targets.
- Parameters
y (np.ndarray) – Array of atomic labels of shape (n_samples, n_labels).
X (np.ndarray) – Ignored.
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
self
-
fit_dataset
(dataset: List[Dict[str, numpy.ndarray]])[source]¶ Fit to dataset with relevant X , y information.
- Parameters
dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
- Returns
self.
-
fit_transform
(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, copy: bool = True, sample_weight=None)[source]¶ Fit and transform all target labels for QM.
- Parameters
y (np.ndarray) – Array of atomic labels of shape (n_samples, n_labels).
X (np.ndarray) – Not used.
copy (bool) – Whether to copy or change in place.
atomic_number (list) – List of array of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
Transformed labels of shape (n_samples, n_labels).
- Return type
np.ndarray
-
fit_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Fit and transform to dataset with relevant X , y information.
-
inverse_transform
(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, copy: bool = True)[source]¶ Back-transform all target labels for QM.
- Parameters
- Returns
Back-transformed labels of shape (n_samples, n_labels).
- Return type
np.ndarray
-
inverse_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Inverse transform dataset with relevant X , y information.
-
load
(file_path: str)[source]¶ Load scaler serialization from file.
- Parameters
file_path – Filepath to load scaler serialization.
-
save
(file_path: str)[source]¶ Save scaler serialization to file.
- Parameters
file_path – Filepath to save scaler serialization.
-
save_weights
(file_path: str)[source]¶ Save weights as numpy to file.
- Parameters
file_path – Filepath to save weights.
-
property
scale_
¶ Composite scale of all scalers in the list.
-
set_config
(config)[source]¶ Set configuration for scaler.
- Parameters
config (dict) – Config dictionary.
-
set_weights
(weights: dict)[source]¶ Set weights for this scaler.
- Parameters
weights (dict) – Weight dictionary.
-
transform
(y: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, X: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, atomic_number: Optional[Union[numpy.ndarray, List[numpy.ndarray]]] = None, copy=True)[source]¶ Transform all target labels for QM. Requires
fit()
to have been called previously.
- Parameters
- Returns
Transformed labels of shape (n_samples, n_labels).
- Return type
np.ndarray
-
transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Transform dataset with relevant X information.
-
-
class
kgcnn.data.transform.scaler.molecule.
_ExtensiveMolecularScalerBase
(alpha: float = 1e-09, fit_intercept: bool = False, standardize_scale: bool = True, **kwargs)[source]¶ Bases:
object
Scaler base class for extensive properties like energy to remove a simple linear behaviour with additive atom contributions.
-
__init__
(alpha: float = 1e-09, fit_intercept: bool = False, standardize_scale: bool = True, **kwargs)[source]¶ Initialize scaler with parameters directly passed to scikit-learn's
Ridge()
.
-
_fit
(molecular_property, atomic_number, sample_weight=None)[source]¶ Fit atomic number to the molecular properties.
- Parameters
molecular_property (np.ndarray) – Molecular properties of shape (n_samples, n_properties) .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
self
-
_fit_transform
(molecular_property, atomic_number, copy=True, sample_weight=None)[source]¶ Combine fit and transform methods in one call.
- Parameters
molecular_property (np.ndarray) – Molecular properties of shape (n_samples, n_properties) .
atomic_number (list) – List of arrays of atomic numbers. Example [np.array([7,1,1,1]), …].
copy (bool) – Whether to copy or change in place.
sample_weight – Sample weights (n_samples,) directly passed to
Ridge()
. Default is None.
- Returns
Transformed atomic properties fitted. Shape is (n_samples, n_properties).
- Return type
np.ndarray
-
_inverse_transform
(molecular_property: numpy.ndarray, atomic_number: List[numpy.ndarray], copy: bool = True) → numpy.ndarray[source]¶ Reverse the transform to recover the original properties, with the offset added back and the values rescaled to original units.
- Parameters
- Returns
Original atomic properties. Shape is (n_samples, n_properties).
- Return type
np.ndarray
-
_plot_predict
(molecular_property: numpy.ndarray, atomic_number: List[numpy.ndarray])[source]¶ Debug function to check prediction.
-
_predict
(atomic_number)[source]¶ Predict the offset from atomic numbers. Requires
fit()
to have been called previously.
- Parameters
atomic_number (list) – List of array of atomic numbers. Example [np.array([7,1,1,1]), …].
- Returns
Offset of atomic properties fitted previously. Shape is (n_samples, n_properties).
- Return type
np.ndarray
-
_transform
(molecular_property, atomic_number, copy=True)[source]¶ Transform any atomic number list with matching properties based on previous fit with sequential std-scaling.
- Parameters
- Returns
Transformed atomic properties fitted. Shape is (n_samples, n_properties).
- Return type
np.ndarray
-
fit_dataset
(dataset: List[Dict[str, numpy.ndarray]])[source]¶ Fit to dataset with relevant X , y information.
- Parameters
dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
- Returns
self.
-
fit_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Fit and transform to dataset with relevant X , y information.
-
inverse_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Inverse transform dataset with relevant X , y information.
-
load
(file_path: str)[source]¶ Load scaler serialization from file.
- Parameters
file_path – Filepath to load scaler serialization.
-
max_atomic_number
= 95¶
-
save
(file_path: str)[source]¶ Save scaler serialization to file.
- Parameters
file_path – Filepath to save scaler serialization.
-
save_weights
(file_path: str)[source]¶ Save weights as numpy to file.
- Parameters
file_path – Filepath to save weights.
-
set_config
(config)[source]¶ Set configuration for scaler.
- Parameters
config (dict) – Config dictionary.
-
set_weights
(weights: dict)[source]¶ Set weights for this scaler.
- Parameters
weights (dict) – Weight dictionary.
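The config and weight accessors together with save and load suggest a simple serialization pattern: write the configuration and the learned weights to one JSON file and restore both into a fresh instance. The class below is a hypothetical sketch of that pattern. TinyScaler and its file layout are assumptions for illustration and do not describe the file format the library actually writes.

```python
import json
import os
import tempfile

class TinyScaler:
    """Hypothetical scaler with a config dict and learned weights."""

    def __init__(self, with_std=True):
        self.with_std = with_std
        self.mean_ = None
        self.scale_ = None

    def fit(self, values):
        n = len(values)
        self.mean_ = sum(values) / n
        std = (sum((v - self.mean_) ** 2 for v in values) / n) ** 0.5
        self.scale_ = std if self.with_std else 1.0
        return self

    def get_config(self):
        return {"with_std": self.with_std}

    def get_weights(self):
        return {"mean_": self.mean_, "scale_": self.scale_}

    def set_config(self, config):
        self.with_std = config["with_std"]

    def set_weights(self, weights):
        self.mean_ = weights["mean_"]
        self.scale_ = weights["scale_"]

    def save(self, file_path):
        # Assumed layout: one JSON object holding both config and weights.
        with open(file_path, "w") as f:
            json.dump({"config": self.get_config(), "weights": self.get_weights()}, f)

    def load(self, file_path):
        with open(file_path) as f:
            state = json.load(f)
        self.set_config(state["config"])
        self.set_weights(state["weights"])
        return self


scaler = TinyScaler().fit([1.0, 2.0, 3.0, 4.0])
path = os.path.join(tempfile.mkdtemp(), "example.json")
scaler.save(path)
restored = TinyScaler().load(path)
print(restored.get_config(), restored.get_weights())
```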
-
transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Transform dataset with relevant X information.
-
kgcnn.data.transform.scaler.serial module
kgcnn.data.transform.scaler.standard module
-
class
kgcnn.data.transform.scaler.standard.
StandardLabelScaler
(*, y: str = 'graph_labels', sample_weight: Optional[str] = None, copy=True, with_mean=True, with_std=True)[source]¶ Bases:
kgcnn.data.transform.scaler.standard._StandardScalerSklearnMixin
Standard scaler for labels that holds an instance of
sklearn.preprocessing.StandardScaler
. Accepts an unused keyword argument 'atomic_number' for compatibility with some material-oriented scalers. Uses the y argument for scaling labels; X is ignored.
    import numpy as np
    from kgcnn.data.transform.scaler.standard import StandardLabelScaler

    data = np.random.rand(5).reshape((5,1))

    scaler = StandardLabelScaler()
    scaler.fit(y=data)
    print(scaler.fit_transform(y=data))
    print(scaler.get_weights())
    print(scaler.get_config())
    print(scaler.inverse_transform(y=scaler.transform(y=data)))
    print(data)

    # Save the scaler and restore it into a new instance.
    scaler.save("example.json")
    new_scaler = StandardLabelScaler()
    new_scaler.load("example.json")
    print(new_scaler.inverse_transform(y=scaler.transform(y=data)))
-
fit
(y: numpy.ndarray, *, X=None, sample_weight=None, atomic_number=None)[source]¶ Compute the mean and std to be used for later scaling.
- Parameters
- Returns
Fitted scaler.
- Return type
self
-
fit_transform
(y: numpy.ndarray, *, X=None, atomic_number=None, copy=None, **fit_params)[source]¶ Perform fit and standardization by centering and scaling.
- Parameters
- Returns
Transformed array of shape (n_samples, n_labels).
- Return type
y_tr (np.ndarray)
-
inverse_transform
(y: Optional[numpy.ndarray] = None, *, X=None, copy: Optional[bool] = None, atomic_number=None)[source]¶ Scale back the data to the original representation.
- Parameters
- Returns
Transformed array of shape (n_samples, n_labels).
- Return type
y_tr (np.ndarray)
-
partial_fit
(y: numpy.ndarray, X=None, sample_weight=None, atomic_number=None)[source]¶ Online computation of mean and std on y for later scaling. All of y is processed as a single batch. This is intended for cases when
fit()
is not feasible because n_samples is very large or because y is read from a continuous stream. The algorithm for incremental mean and std is given in Equation 1.5a,b of Chan et al. (1982).
- Parameters
- Returns
Fitted scaler.
- Return type
self
-
transform
(y: numpy.ndarray, *, X=None, copy=None, atomic_number=None)[source]¶ Perform standardization by centering and scaling.
-
-
class
kgcnn.data.transform.scaler.standard.
StandardScaler
(*, X: str = 'graph_attributes', sample_weight: Optional[str] = None, copy=True, with_mean=True, with_std=True)[source]¶ Bases:
kgcnn.data.transform.scaler.standard._StandardScalerSklearnMixin
Standard scaler that uses sklearn.preprocessing.StandardScaler . Accepts an unused keyword argument 'atomic_number' for compatibility with some material-oriented scalers.
    import numpy as np
    from kgcnn.data.transform.scaler.standard import StandardScaler

    data = np.random.rand(5).reshape((5,1))

    scaler = StandardScaler()
    scaler.fit(X=data)
    print(scaler.get_weights())
    print(scaler.get_config())
    print(scaler.inverse_transform(scaler.transform(X=data)))
    print(data)

    # Save the scaler and restore it into a new instance.
    scaler.save("example.json")
    new_scaler = StandardScaler()
    new_scaler.load("example.json")
    print(new_scaler.inverse_transform(scaler.transform(X=data)))
-
fit
(X, *, y=None, sample_weight=None, atomic_number=None)[source]¶ Compute the mean and std to be used for later scaling.
- Parameters
- Returns
Fitted scaler.
- Return type
self
-
fit_transform
(X, *, y=None, atomic_number=None, **fit_params)[source]¶ Perform fit and standardization by centering and scaling.
- Parameters
- Returns
Transformed array of shape (n_samples, n_features).
- Return type
X_tr (np.ndarray)
-
inverse_transform
(X, *, copy: Optional[bool] = None, atomic_number=None)[source]¶ Scale back the data to the original representation.
-
partial_fit
(X, y=None, sample_weight=None, atomic_number=None)[source]¶ Online computation of mean and std on X for later scaling. All of X is processed as a single batch. This is intended for cases when
fit()
is not feasible because n_samples is very large or because X is read from a continuous stream. The algorithm for incremental mean and std is given in Equation 1.5a,b of Chan et al. (1982).
- Parameters
X (np.ndarray) – Array of shape (n_samples, n_features). The data used to compute the mean and standard deviation for later scaling along the feature axis.
y (np.ndarray, None) – Ignored.
sample_weight (np.ndarray) – Array-like of shape (n_samples,), default=None. Individual weights for each sample.
atomic_number (list) – Not used.
- Returns
Fitted scaler.
- Return type
self
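The incremental computation behind partial_fit can be illustrated with the well-known pairwise update for a running count, mean and sum of squared deviations, usually attributed to Chan et al. The sketch below is a plain-Python illustration for a single feature. The function name merge_stats is made up, and whether the formulas match Equation 1.5a,b of the paper term for term has not been checked here.

```python
import math

def merge_stats(n_a, mean_a, m2_a, batch):
    """Merge running (count, mean, sum of squared deviations) with a new batch."""
    n_b = len(batch)
    mean_b = sum(batch) / n_b
    m2_b = sum((x - mean_b) ** 2 for x in batch)

    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Cross term accounts for the shift between the two batch means.
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

# Feed the data in three batches instead of all at once.
state = (0, 0.0, 0.0)
for batch in ([1.0, 2.0, 3.0], [4.0, 5.0], [6.0]):
    state = merge_stats(*state, batch)

n, mean, m2 = state
std = math.sqrt(m2 / n)   # population standard deviation
print(n, mean, std)
```

Only the three running values are kept between batches, so the full data never has to be held in memory at once.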
-
-
class
kgcnn.data.transform.scaler.standard.
_StandardScalerSklearnMixin
[source]¶ Bases:
object
Mixin class for scaler of
sklearn
with added functionality to save and load weights of a scaler similar to keras layers and objects.
Note
This class is only meant to add functionality. Scaler is accessed via
_scaler_reference
property.
-
fit_dataset
(dataset: List[Dict[str, numpy.ndarray]])[source]¶ Fit to dataset with relevant X , y information.
- Parameters
dataset (list) – Dataset of type List[Dict] with dictionary of numpy arrays.
- Returns
self.
-
fit_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Fit and transform to dataset with relevant X , y information.
-
inverse_transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Inverse transform dataset with relevant X , y information.
-
load
(file_path: str)[source]¶ Load scaler serialization from file.
- Parameters
file_path – Filepath to load scaler serialization.
-
save
(file_path: str)[source]¶ Save scaler serialization to file.
- Parameters
file_path – Filepath to save scaler serialization.
-
save_weights
(file_path: str)[source]¶ Save weights as numpy to file.
- Parameters
file_path – Filepath to save weights.
-
property
scale_
¶
-
set_config
(config: dict)[source]¶ Set configuration for scaler.
- Parameters
config (dict) – Config dictionary.
-
set_weights
(weights: dict)[source]¶ Set weights for this scaler.
- Parameters
weights (dict) – Weight dictionary.
-
transform_dataset
(dataset: List[Dict[str, numpy.ndarray]], copy: bool = True, copy_dataset: bool = False) → List[Dict[str, numpy.ndarray]][source]¶ Transform dataset with relevant X information.
-