kgcnn.literature.MEGAN package

Module contents

class kgcnn.literature.MEGAN.ExplanationSparsityRegularization(*args, **kwargs)[source]

Bases: keras.src.layers.layer.Layer

build(input_shape)[source]
call(inputs, **kwargs)[source]

Computes a loss from importance scores.

Parameters

inputs – Importance tensor of shape ([batch], [N], K) .

Returns

None.
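
A minimal sketch of how such a sparsity term is typically realized: an L1-style penalty on the mean absolute importance, registered through the layer's add_loss mechanism. The class name and the handling of the coefficient below are assumptions for illustration, not the exact kgcnn implementation.

    import keras
    from keras import ops

    class SparsityRegularizationSketch(keras.layers.Layer):
        """Hypothetical re-implementation for illustration only."""

        def __init__(self, factor: float = 1.0, **kwargs):
            super().__init__(**kwargs)
            self.factor = factor  # plays the role of MEGAN's sparsity_factor

        def call(self, inputs):
            # Mean absolute value over all nodes and channels pushes the
            # [0, 1] importance scores towards sparse (near-zero) values.
            self.add_loss(self.factor * ops.mean(ops.abs(inputs)))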

class kgcnn.literature.MEGAN.MEGAN(*args, **kwargs)[source]

Bases: keras.src.models.model.Model

MEGAN: Multi Explanation Graph Attention Network

This model currently supports graph regression and graph classification problems. It was mainly designed with a focus on explainable AI (XAI). Alongside the main prediction, the model outputs multiple attention-based explanations for that prediction. More specifically, it outputs node and edge attributional explanations (assigning [0, 1] values to every node / edge of the input graph) in K separate explanation channels, where K can be chosen as an independent model parameter.

__init__(units: List[int], activation: str = 'kgcnn>leaky_relu', use_bias: bool = True, dropout_rate: float = 0.0, use_edge_features: bool = True, input_node_embedding: Optional[dict] = None, importance_units: List[int] = [], importance_channels: int = 2, importance_activation: str = 'sigmoid', importance_dropout_rate: float = 0.0, importance_factor: float = 0.0, importance_multiplier: float = 10.0, sparsity_factor: float = 0.0, concat_heads: bool = True, final_units: List[int] = [1], final_dropout_rate: float = 0.0, final_activation: str = 'linear', final_pooling: str = 'sum', regression_limits: Optional[Tuple[float, float]] = None, regression_reference: Optional[float] = None, return_importances: bool = True, **kwargs)[source]
Parameters
  • units – A list of ints where each element configures an additional attention layer. The numeric value determines the number of hidden units to be used in the attention heads of that layer

  • activation – The activation function to be used within the attention layers of the network

  • use_bias – Whether the layers of the network should use bias weights at all

  • dropout_rate – The dropout rate to be applied after each of the attention layers of the network.

  • input_node_embedding – Dictionary of embedding kwargs for input embedding layer.

  • use_edge_features – Whether edge features should be used. Generally the network supports the usage of edge features, but if the input data does not contain edge features, this should be set to False.

  • importance_units – A list of ints where each element configures another dense layer in the subnetwork that produces the node importance tensor from the main node embeddings. The numeric value determines the number of hidden units in that layer.

  • importance_channels – The int number of explanation channels to be produced by the network. This is the value referred to as “K”. Note that this will also determine the number of attention heads used within the attention subnetwork.

  • importance_factor – The weight of the explanation-only train step. If this is set to exactly zero, the explanation train step will not be executed at all (which is less computationally expensive).

  • importance_multiplier – An additional hyperparameter of the explanation-only train step. This is essentially the scaling factor that is applied to the values of the dataset such that the target values can reasonably be approximated by a sum of [0, 1] importance values.

  • sparsity_factor – The coefficient for the sparsity regularization of the node importance tensor.

  • concat_heads – Whether to concat the heads of the attention subnetwork. The default is True. In that case the output of each individual attention head is concatenated and the concatenated vector is then used as the input of the next attention layer’s heads. If this is False, the vectors are average pooled instead.

  • final_units – A list of ints where each element configures another dense layer in the MLP at the tail end of the network. The numeric value determines the number of hidden units in that layer. Note that the final element in this list has to match the output dimension expected for the samples of the training dataset!

  • final_dropout_rate – The dropout rate to be applied after every layer of the final MLP.

  • final_activation – The activation to be applied at the very last layer of the MLP to produce the actual output of the network.

  • final_pooling – The pooling method to be used during the global pooling phase in the network.

  • regression_limits – A tuple where the first value is the lower limit for the expected value range of the regression task and the second value the upper limit.

  • regression_reference – A reference value which is inside the range of expected values (ideally near the middle of the range, though this is not required). Choosing different references will result in different explanations.

  • return_importances – Whether the importance / explanation tensors should be returned as an output of the model. If this is True, the output of the model will be a 3-tuple: (output, node importances, edge importances); otherwise it is just the output itself.
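
A minimal instantiation sketch for a graph regression task; every hyperparameter value below is an illustrative assumption, not a recommended setting.

    from kgcnn.literature.MEGAN import MEGAN

    model = MEGAN(
        units=[32, 32, 32],             # three attention layers, 32 units each
        importance_channels=2,          # K = 2 explanation channels
        importance_factor=1.0,          # enable the explanation-only train step
        sparsity_factor=0.1,            # sparsity regularization on importances
        final_units=[16, 1],            # last entry must match the target dimension
        final_activation="linear",
        regression_limits=(-3.0, 3.0),  # assumed target value range
        regression_reference=0.0,       # reference inside that range
    )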

build(input_shape)[source]
call(inputs, training: bool = False, return_importances: bool = False)[source]
property doing_regression
explain_importances(x: Sequence[keras.src.backend.common.keras_tensor.KerasTensor], **kwargs) → Tuple[keras.src.backend.common.keras_tensor.KerasTensor, keras.src.backend.common.keras_tensor.KerasTensor][source]
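
Assuming model is a built MEGAN instance and x holds the same input structure that call accepts, a hedged usage sketch:

    # x: sequence of input tensors as accepted by call(); the unpacking order
    # is an assumption following the documented output (node before edge).
    node_importances, edge_importances = model.explain_importances(x)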
get_config()[source]

Returns the config of the object.

An object config is a Python dictionary (serializable) containing the information needed to re-instantiate it.

regression_augmentation(out_true: keras.src.backend.common.keras_tensor.KerasTensor)[source]

Given the tensor ([B], 1) of true regression target values, this method returns two derived tensors: the first is a ([B], 2) tensor of normalized distances of the corresponding true values to self.regression_reference, and the second is a ([B], 2) boolean mask tensor.

Parameters

out_true – A tensor of shape ([B], 1) of the true target values of the current batch.

Returns

A tuple of two tensors, each of shape ([B], 2).
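
A sketch of the described computation under stated assumptions: the normalization by the regression_limits range and the exact construction of the below/above mask are plausible readings of the parameter documentation, not verified source code.

    from keras import ops

    def regression_augmentation_sketch(out_true, reference, limits, multiplier):
        # out_true: ([B], 1) tensor of true targets.
        width = limits[1] - limits[0]
        # Distance of each target to the reference, scaled so that it can be
        # approximated by a sum of [0, 1] importances (importance_multiplier).
        distances = multiplier * ops.abs(out_true - reference) / (0.5 * width)
        samples = ops.concatenate([distances, distances], axis=-1)  # ([B], 2)
        # Boolean mask selecting the below- vs. above-reference channel.
        mask = ops.concatenate([out_true < reference, out_true > reference], axis=-1)
        return samples, mask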

kgcnn.literature.MEGAN.make_model(inputs: list = None, name: str = None, input_tensor_type: str = None, cast_disjoint_kwargs: dict = None, units: list = None, activation: str = None, use_bias: bool = None, dropout_rate: float = None, use_edge_features: bool = None, input_embedding: dict = None, input_node_embedding: dict = None, importance_units: list = None, importance_channels: int = None, importance_activation: str = None, importance_dropout_rate: float = None, importance_factor: float = None, importance_multiplier: float = None, sparsity_factor: float = None, concat_heads: bool = None, final_units: list = None, final_dropout_rate: float = None, final_activation: str = None, final_pooling: str = None, regression_limits: tuple = None, regression_reference: float = None, return_importances: bool = True, output_embedding: dict = None, output_tensor_type: str = None)[source]

Functional model definition of MEGAN. Please check the documentation of kgcnn.literature.MEGAN .

Model inputs: The model uses the list template of inputs and the standard output template. The supported inputs are [nodes, edges, edge_indices, graph_labels, ...] with ‘…’ indicating mask or ID tensors following the template below. Graph labels are used to generate explanations but not to influence the model output.

Template of listed graph input tensors, which should be compatible with previous kgcnn versions and defines the order as follows: [nodes, edges, angles, edge_indices, angle_indices, graph_state, ...] . Here ‘…’ denotes further mask or ID tensors, which are required for certain input types (see below). Depending on the model, some inputs may not be used (see the model description for information on supported inputs). For example, if the model supports neither angles nor a graph attribute input, the input becomes: [nodes, edges, edge_indices, ...] . In the case of crystal graphs, lattice and translation information has to be added, giving a possible input of [nodes, edges, angles, edge_indices, angle_indices, graph_state, image_translation, lattice, ...] . Note that in place of nodes or edges more than one tensor can be provided, depending on the model, for example [nodes_1, nodes_2, edges_1, edges_2, edge_indices, ...] .

However, for future models we intend to use named inputs rather than a list that is sensitive to ordering. Whether to use a mask or a length tensor for padded inputs, as well as further casting parameters, has to be set with (dict) cast_disjoint_kwargs .

Padded or Masked Inputs:

list: [nodes, edges, angles, edge_indices, angle_indices, graph_state, image_translation, lattice, node_mask/node_count, edge_mask/edge_count, angle_mask/angle_count]

  • nodes (Tensor): Node attributes of shape (batch, N, F) or (batch, N) using an embedding layer.

  • edges (Tensor): Edge attributes of shape (batch, M, F) or (batch, M) using an embedding layer.

  • angles (Tensor): Angle attributes of shape (batch, K, F) or (batch, K) using an embedding layer.

  • edge_indices (Tensor): Index list for edges of shape (batch, M, 2) referring to nodes.

  • angle_indices (Tensor): Index list for angles of shape (batch, K, 2) referring to edges.

  • graph_state (Tensor): Graph attributes of shape (batch, F) .

  • image_translation (Tensor): Indices of the periodic image the sending node is located in. Shape is (batch, M, 3) .

  • lattice (Tensor): Lattice matrix of the periodic structure of shape (batch, 3, 3) .

  • node_mask (Tensor): Mask for padded nodes of shape (batch, N) .

  • edge_mask (Tensor): Mask for padded edges of shape (batch, M) .

  • angle_mask (Tensor): Mask for padded angles of shape (batch, K) .

  • node_count (Tensor): Total number of nodes if padding is used of shape (batch, ) .

  • edge_count (Tensor): Total number of edges if padding is used of shape (batch, ) .

  • angle_count (Tensor): Total number of angles if padding is used of shape (batch, ) .

Ragged or Jagged Inputs:

list: [nodes, edges, angles, edge_indices, angle_indices, graph_state, image_translation, lattice]

  • nodes (RaggedTensor): Node attributes of shape (batch, None, F) or (batch, None) using an embedding layer.

  • edges (RaggedTensor): Edge attributes of shape (batch, None, F) or (batch, None) using an embedding layer.

  • angles (RaggedTensor): Angle attributes of shape (batch, None, F) or (batch, None) using an embedding layer.

  • edge_indices (RaggedTensor): Index list for edges of shape (batch, None, 2) referring to nodes.

  • angle_indices (RaggedTensor): Index list for angles of shape (batch, None, 2) referring to edges.

  • graph_state (Tensor): Graph attributes of shape (batch, F) .

  • image_translation (RaggedTensor): Indices of the periodic image the sending node is located in. Shape is (batch, None, 3) .

  • lattice (Tensor): Lattice matrix of the periodic structure of shape (batch, 3, 3) .

Disjoint Input:

list: [nodes, edges, angles, edge_indices, angle_indices, graph_state, image_translation, lattice, graph_id_node, graph_id_edge, graph_id_angle, nodes_id, edges_id, angle_id, nodes_count, edges_count, angles_count]

  • nodes (Tensor): Node attributes of shape ([N], F) or ([N], ) using an embedding layer.

  • edges (Tensor): Edge attributes of shape ([M], F) or ([M], ) using an embedding layer.

  • angles (Tensor): Angle attributes of shape ([K], F) or ([K], ) using an embedding layer.

  • edge_indices (Tensor): Index list for edges of shape (2, [M]) referring to nodes.

  • angle_indices (Tensor): Index list for angles of shape (2, [K]) referring to edges.

  • graph_state (Tensor): Graph attributes of shape (batch, F) .

  • image_translation (Tensor): Indices of the periodic image the sending node is located in. Shape is ([M], 3) .

  • lattice (Tensor): Lattice matrix of the periodic structure of shape (batch, 3, 3) .

  • graph_id_node (Tensor): ID tensor of batch assignment in disjoint graph of shape ([N], ) .

  • graph_id_edge (Tensor): ID tensor of batch assignment in disjoint graph of shape ([M], ) .

  • graph_id_angle (Tensor): ID tensor of batch assignment in disjoint graph of shape ([K], ) .

  • nodes_id (Tensor): The ID-tensor to assign each node to its respective graph of shape ([N], ) .

  • edges_id (Tensor): The ID-tensor to assign each edge to its respective graph of shape ([M], ) .

  • angle_id (Tensor): The ID-tensor to assign each angle to its respective graph of shape ([K], ) .

  • nodes_count (Tensor): Tensor of number of nodes for each graph of shape (batch, ) .

  • edges_count (Tensor): Tensor of number of edges for each graph of shape (batch, ) .

  • angles_count (Tensor): Tensor of number of angles for each graph of shape (batch, ) .

Model outputs: The standard output template:

The standard model output template returns a single tensor of either “graph”, “node”, or “edge” embeddings specified by output_embedding within the model. The return tensor type is determined by output_tensor_type . Options are:

graph:

Tensor: Graph labels of shape (batch, F) .

nodes:

Tensor: Node labels for the graph of either type:

  • ragged (RaggedTensor): Single tensor of shape (batch, None, F) .

  • padded (Tensor): Padded tensor of shape (batch, N, F) .

  • disjoint (Tensor): Disjoint representation of shape ([N], F) .

edges:

Tensor: Edge labels for the graph of either type:

  • ragged (RaggedTensor): Single tensor of shape (batch, None, F) .

  • padded (Tensor): Padded tensor of shape (batch, M, F) .

  • disjoint (Tensor): Disjoint representation of shape ([M], F) .

Parameters
  • name – Name of the model.

  • inputs (list) – List of dictionaries unpacked in keras.layers.Input. Order must match model definition.

  • input_tensor_type (str) – Input type of graph tensor. Default is “padded”.

  • cast_disjoint_kwargs (dict) – Dictionary of arguments for casting layer.

  • units – A list of ints where each element configures an additional attention layer. The numeric value determines the number of hidden units to be used in the attention heads of that layer

  • activation – The activation function to be used within the attention layers of the network

  • use_bias – Whether the layers of the network should use bias weights at all

  • dropout_rate – The dropout rate to be applied after each of the attention layers of the network.

  • input_node_embedding – Dictionary of embedding kwargs for input embedding layer.

  • use_edge_features – Whether edge features should be used. Generally the network supports the usage of edge features, but if the input data does not contain edge features, this should be set to False.

  • importance_units – A list of ints where each element configures another dense layer in the subnetwork that produces the node importance tensor from the main node embeddings. The numeric value determines the number of hidden units in that layer.

  • importance_channels – The int number of explanation channels to be produced by the network. This is the value referred to as “K”. Note that this will also determine the number of attention heads used within the attention subnetwork.

  • importance_factor – The weight of the explanation-only train step. If this is set to exactly zero, the explanation train step will not be executed at all (which is less computationally expensive).

  • importance_multiplier – An additional hyperparameter of the explanation-only train step. This is essentially the scaling factor that is applied to the values of the dataset such that the target values can reasonably be approximated by a sum of [0, 1] importance values.

  • sparsity_factor – The coefficient for the sparsity regularization of the node importance tensor.

  • concat_heads – Whether to concat the heads of the attention subnetwork. The default is True. In that case the output of each individual attention head is concatenated and the concatenated vector is then used as the input of the next attention layer’s heads. If this is False, the vectors are average pooled instead.

  • final_units – A list of ints where each element configures another dense layer in the MLP at the tail end of the network. The numeric value determines the number of hidden units in that layer. Note that the final element in this list has to match the output dimension expected for the samples of the training dataset!

  • final_dropout_rate – The dropout rate to be applied after every layer of the final MLP.

  • final_activation – The activation to be applied at the very last layer of the MLP to produce the actual output of the network.

  • final_pooling – The pooling method to be used during the global pooling phase in the network.

  • regression_limits – A tuple where the first value is the lower limit for the expected value range of the regression task and the second value the upper limit.

  • regression_reference – A reference value which is inside the range of expected values (ideally near the middle of the range, though this is not required). Choosing different references will result in different explanations.

  • return_importances – Whether the importance / explanation tensors should be returned as an output of the model. If this is True, the output of the model will be a 3-tuple: (output, node importances, edge importances); otherwise it is just the output itself.

  • output_tensor_type (str) – Output type of graph tensors such as nodes or edges. Default is “padded”.

  • output_embedding (str) – Main embedding task for graph network. Either “node”, “edge” or “graph”.

Returns

keras.models.Model
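
A rough usage sketch for a regression dataset with padded inputs. All shapes, feature dimensions and input names below are assumptions chosen for illustration and must be adapted to the actual dataset and the input template above.

    from kgcnn.literature.MEGAN import make_model

    model = make_model(
        inputs=[
            {"shape": (None, 41), "name": "node_attributes", "dtype": "float32"},
            {"shape": (None, 11), "name": "edge_attributes", "dtype": "float32"},
            {"shape": (None, 2), "name": "edge_indices", "dtype": "int64"},
            {"shape": (1,), "name": "graph_labels", "dtype": "float32"},
            {"shape": (), "name": "total_nodes", "dtype": "int64"},
            {"shape": (), "name": "total_edges", "dtype": "int64"},
        ],
        input_tensor_type="padded",
        units=[32, 32, 32],
        importance_channels=2,
        final_units=[16, 1],
        final_activation="linear",
        output_embedding="graph",
        output_tensor_type="padded",
    )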

kgcnn.literature.MEGAN.shifted_sigmoid(x: keras.src.backend.common.keras_tensor.KerasTensor, multiplier: float = 1.0, shift: float = 10) → float[source]
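
A plausible definition consistent with the name, the defaults and the float return type (an assumption; consult the source for the exact form):

    from keras import ops

    def shifted_sigmoid_sketch(x, multiplier: float = 1.0, shift: float = 10.0):
        # Standard sigmoid shifted along x by `shift` and scaled by `multiplier`.
        return ops.sigmoid(multiplier * (x - shift))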