kgcnn.optimizers package

Submodules

kgcnn.optimizers.optimizers module

class kgcnn.optimizers.optimizers.Adan(learning_rate: float = 0.001, name: str = 'Adan', beta_1: float = 0.98, beta_2: float = 0.92, beta_3: float = 0.99, eps: float = 1e-08, weight_decay: float = 0.0, amsgrad: bool = False, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, **kwargs)[source]

Bases: keras.src.optimizers.optimizer.Optimizer

Optimizer Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models.

‘Adan develops a Nesterov momentum estimation method to estimate stable and accurate first and second moments of the gradient in adaptive gradient algorithms for acceleration’.

Algorithm of Adan:

Input: Initialization \(\theta_0\), step size \(\eta\), averaging parameters \((\beta_1, \beta_2, \beta_3) \in [0, 1]^3\), stable parameter \(\epsilon > 0\), weight decay \(\lambda_k > 0\), restart condition.

Output: some average of \(\{\theta_k\}^K_{k=1}\).

(set \(m_0 = g_0\) and \(v_1 = g_1 - g_0\))

while \(k < K\) do:

\[\begin{split}
m_k &= (1 - \beta_1) m_{k-1} + \beta_1 g_k \\
v_k &= (1 - \beta_2) v_{k-1} + \beta_2 (g_k - g_{k-1}) \\
n_k &= (1 - \beta_3) n_{k-1} + \beta_3 [g_k + (1 - \beta_2)(g_k - g_{k-1})]^2 \\
\eta_k &= \eta / \sqrt{n_k + \epsilon} \\
\theta_{k+1} &= (1 + \lambda_k \eta)^{-1} [\theta_k - \eta_k \circ (m_k + (1 - \beta_2) v_k)] \\
&\text{if restart condition holds:} \\
&\quad \text{get stochastic gradient estimator } g_0 \text{ at } \theta_{k+1} \\
&\quad \text{set } m_0 = g_0, \; v_0 = 0, \; n_0 = g_0^2, \; k = 1 \\
&\quad \text{update } \theta_k
\end{split}\]
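The following is a minimal NumPy sketch of a single Adan update for one parameter tensor, following the equations exactly as displayed above. The function name, its arguments, and the way the running moments are threaded through are illustrative only and are not part of the kgcnn API.

   import numpy as np

   def adan_update_step(theta, grad, prev_grad, m, v, n, k,
                        lr=1e-3, beta_1=0.98, beta_2=0.92, beta_3=0.99,
                        eps=1e-8, weight_decay=0.0):
       """One illustrative Adan step for a single parameter tensor."""
       # Gradient difference; at the first step there is no previous gradient.
       diff = grad - prev_grad if k > 0 else np.zeros_like(grad)
       m = (1.0 - beta_1) * m + beta_1 * grad                                  # first moment
       v = (1.0 - beta_2) * v + beta_2 * diff                                  # gradient-difference moment
       n = (1.0 - beta_3) * n + beta_3 * (grad + (1.0 - beta_2) * diff) ** 2   # second moment
       lr_k = lr / np.sqrt(n + eps)                                            # element-wise step size
       update = lr_k * (m + (1.0 - beta_2) * v)
       theta = (theta - update) / (1.0 + weight_decay * lr)                    # decoupled weight decay
       return theta, m, v, n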
__init__(learning_rate: float = 0.001, name: str = 'Adan', beta_1: float = 0.98, beta_2: float = 0.92, beta_3: float = 0.99, eps: float = 1e-08, weight_decay: float = 0.0, amsgrad: bool = False, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, **kwargs)[source]

Initialize optimizer.

Parameters
  • learning_rate (float) – Learning rate. Default is 1e-3.

  • name (str) – Name of the optimizer. Defaults to ‘Adan’.

  • beta_1 (float) – Beta 1 parameter. Default is 0.98.

  • beta_2 (float) – Beta 2 parameter. Default is 0.92.

  • beta_3 (float) – Beta 3 parameter. Default is 0.99.

  • eps (float) – Numerical epsilon for denominators. Default is 1e-8.

  • weight_decay (float) – Decoupled weight decay. Default is 0.0.

  • amsgrad (bool) – Use the maximum of all 2nd moment running averages. Default is False.
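A minimal usage sketch, assuming a Keras 3 backend as used by kgcnn; the model and loss are placeholders, and only the import path and keyword arguments come from the class signature above.

   import keras
   from kgcnn.optimizers.optimizers import Adan

   # Placeholder model; any Keras model can be compiled with Adan in the same way.
   model = keras.Sequential([keras.layers.Dense(1)])
   model.compile(
       optimizer=Adan(learning_rate=1e-3, weight_decay=1e-2),
       loss="mean_squared_error",
   )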

build(var_list)[source]

Initialize optimizer variables.

Parameters

var_list – list of model variables to build Adan variables on.

get_config()[source]

Get config dictionary.
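Since get_config() returns the optimizer's hyperparameters, a configured instance can typically be re-created from it. This round-trip is a sketch that assumes the standard Keras Optimizer.from_config classmethod inherited from the base class.

   opt = Adan(learning_rate=5e-4, beta_1=0.98, beta_2=0.92, beta_3=0.99)
   config = opt.get_config()            # dict with learning_rate, betas, eps, weight_decay, ...
   restored = Adan.from_config(config)  # rebuild an equivalent optimizer from the config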

update_step(grad, var, learning_rate)[source]

Update step given gradient and the associated model variable.

Module contents