jittor.optim

This is the API documentation for Jittor's optimizer module. You can access it via from jittor import optim.
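A minimal end-to-end sketch of the module is shown below; the toy linear model and the random data are placeholders invented for illustration, and only the optimizer calls follow the API documented on this page:

import jittor as jt
from jittor import nn, optim

model = nn.Linear(3, 1)                                  # toy model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x = jt.random([16, 3])                                   # random inputs (placeholder data)
y = jt.random([16, 1])                                   # random targets (placeholder data)
for i in range(5):
    pred = model(x)
    loss = ((pred - y) ** 2).mean()                      # plain MSE built from Var ops
    optimizer.step(loss)                                 # backward pass + parameter update in one call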

class jittor.optim.Adam(params, lr, eps=1e-08, betas=(0.9, 0.999), weight_decay=0)[源代码]

Adam Optimizer.

Example:

optimizer = nn.Adam(model.parameters(), lr, eps=1e-8, betas=(0.9, 0.999))
optimizer.step(loss)
add_param_group(group)[源代码]
step(loss=None, retain_graph=False)[源代码]
class jittor.optim.AdamW(params, lr, eps=1e-08, betas=(0.9, 0.999), weight_decay=0)[源代码]

AdamW Optimizer.

Example:

optimizer = nn.AdamW(model.parameters(), lr, eps=1e-8, betas=(0.9, 0.999))
optimizer.step(loss)
add_param_group(group)[源代码]
step(loss=None, retain_graph=False)[源代码]
class jittor.optim.LRScheduler(optimizer, last_epoch=-1)[源代码]
get_last_lr()[源代码]
get_lr()[源代码]
step(epoch=None)[源代码]
class jittor.optim.LambdaLR(optimizer, lr_lambda, last_epoch=-1)[源代码]
get_lr()[源代码]
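A brief usage sketch of LambdaLR together with the base LRScheduler methods; the toy model and the stand-in loss are assumptions made for illustration:

import jittor as jt
from jittor import nn, optim

model = nn.Linear(3, 1)                                  # toy model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)
for epoch in range(10):
    loss = (model(jt.random([4, 3])) ** 2).mean()        # stand-in for a real training loss
    optimizer.step(loss)
    scheduler.step()                                     # rescale the base lr by lr_lambda(epoch)
print(scheduler.get_last_lr())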
class jittor.optim.Optimizer(params, lr, param_sync_iter=10000)[源代码]

Basic class of Optimizer.

Example:

optimizer = nn.SGD(model.parameters(), lr)
optimizer.step(loss)
add_param_group(group)[源代码]
backward(loss, retain_graph=False)[源代码]

optimizer.backward(loss) is used to accumulate gradients over multiple steps. It can be used as follows:

Original source code:

n_iter = 10000
batch_size = 100
…
for i in range(n_iter):
    …
    loss = calc_loss()
    optimizer.step(loss)

Accumulation version:

n_iter = 10000
batch_size = 100
accumulation_steps = 10
n_iter *= accumulation_steps
batch_size //= accumulation_steps
…
for i in range(n_iter):
    …
    loss = calc_loss()
    # if loss is a mean over the batch, we need to divide it by accumulation_steps
    optimizer.backward(loss / accumulation_steps)
    if (i+1) % accumulation_steps == 0:
        optimizer.step()

clip_grad_norm(max_norm: float, norm_type: int = 2)[源代码]

Clips the gradient norm of this optimizer. The norm is computed over all gradients together.

Args:

max_norm (float or int): max norm of the gradients
norm_type (int): 1-norm or 2-norm

Example:

a = jt.ones(2)
opt = jt.optim.SGD([a], 0.1)

loss = a*a
opt.zero_grad()
opt.backward(loss)

print(opt.param_groups[0]['grads'][0].norm()) # output: 2.83
opt.clip_grad_norm(0.01, 2)
print(opt.param_groups[0]['grads'][0].norm()) # output: 0.01

opt.step()
property defaults
find_grad(v: jittor_core.jittor_core.Var) → jittor_core.jittor_core.Var[源代码]
load_state_dict(state)[源代码]
post_step()[源代码]

Things that should be done after the optimization step, such as zeroing gradients, and so on.

Example:

class MyOptimizer(Optimizer):
    def step(self, loss):
        self.pre_step(loss)
        ...
        self.post_step()
pre_step(loss, retain_graph=False)[源代码]

Things that should be done before the optimization step, such as computing gradients, MPI synchronization, and so on.

Example:

class MyOptimizer(Optimizer):
    def step(self, loss):
        self.pre_step(loss)
        ...
        self.post_step()
state_dict()[源代码]
step(loss=None, retain_graph=False)[源代码]
zero_grad()[源代码]
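Putting pre_step and post_step together, the following hypothetical subclass sketches a plain gradient-descent update. The 'params'/'grads' layout of param_groups (visible in the clip_grad_norm example above), the self.lr attribute, and Var.update are assumptions about Jittor internals; this is a sketch, not the built-in SGD implementation:

from jittor.optim import Optimizer

class MySGD(Optimizer):
    def step(self, loss=None, retain_graph=False):
        self.pre_step(loss, retain_graph)        # compute gradients, MPI sync, ...
        for pg in self.param_groups:
            lr = pg.get("lr", self.lr)           # assumed per-group lr with a global fallback
            for p, g in zip(pg["params"], pg["grads"]):
                p.update(p - lr * g)             # assumed in-place update via Var.update
        self.post_step()                         # zero gradients, ...

It would then be used like the built-in optimizers: opt = MySGD(model.parameters(), lr=0.01) followed by opt.step(loss).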
class jittor.optim.RMSprop(params, lr=0.01, eps=1e-08, alpha=0.99)[源代码]

RMSprop Optimizer.

Args:

params (list): parameters of model.
lr (float): learning rate.
eps (float): term added to the denominator to avoid division by zero, default 1e-8.
alpha (float): smoothing constant, default 0.99.

Example:

optimizer = nn.RMSprop(model.parameters(), lr)
optimizer.step(loss)

add_param_group(group)[源代码]
step(loss=None, retain_graph=False)[源代码]
class jittor.optim.SGD(params, lr, momentum=0, weight_decay=0, dampening=0, nesterov=False)[源代码]

SGD Optimizer.

Example:

optimizer = nn.SGD(model.parameters(), lr, momentum=0.9)
optimizer.step(loss)
add_param_group(group)[源代码]
step(loss=None, retain_graph=False)[源代码]
jittor.optim.opt_grad(v: jittor_core.jittor_core.Var, opt: jittor.optim.Optimizer)[源代码]

Get the gradient of a certain variable in the optimizer.

Example:

model = Model()
optimizer = SGD(model.parameters(), lr)
…
optimizer.backward(loss)

for p in model.parameters():
    grad = p.opt_grad(optimizer)

The following is the API documentation for Jittor's learning rate scheduler module. The schedulers must be used together with an optimizer; you can access the module via from jittor import lr_scheduler.
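A typical pairing of a scheduler with an optimizer is sketched below, stepping the scheduler once per epoch; the toy model and the stand-in loss are placeholders invented for illustration:

import jittor as jt
from jittor import nn, optim, lr_scheduler

model = nn.Linear(3, 1)                                  # toy model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)
for epoch in range(50):
    loss = (model(jt.random([4, 3])) ** 2).mean()        # stand-in for a real training loss
    optimizer.step(loss)
    scheduler.step()                                     # anneal the lr along a cosine schedule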

class jittor.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)[源代码]
get_lr(base_lr, now_lr)[源代码]
step()[源代码]
update_lr()[源代码]
class jittor.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)[源代码]

The learning rate is multiplied by gamma at each step.

get_lr(base_lr, now_lr)[源代码]
step()[源代码]
update_lr()[源代码]
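A short sketch of the resulting decay; the toy parameters passed to the optimizer are placeholders invented for illustration:

from jittor import nn, optim, lr_scheduler

optimizer = optim.SGD(nn.Linear(3, 1).parameters(), lr=0.1)   # toy parameters, for illustration only
scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
for epoch in range(3):
    scheduler.step()                                          # each step multiplies the lr by gamma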
class jittor.lr_scheduler.MultiStepLR(optimizer, milestones=[], gamma=0.1, last_epoch=-1)[源代码]
get_gamma()[源代码]
get_lr()[源代码]
step()[源代码]
update_lr()[源代码]
class jittor.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)[源代码]
better(a, b)[源代码]
step(loss, epoch=None)[源代码]
update_lr(epoch)[源代码]
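Unlike the epoch-based schedulers above, ReduceLROnPlateau is stepped with a monitored metric; in this sketch the toy model and the stand-in loss are placeholders, and the training loss is passed as the monitored value:

import jittor as jt
from jittor import nn, optim, lr_scheduler

model = nn.Linear(3, 1)                                  # toy model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
for epoch in range(100):
    loss = (model(jt.random([16, 3])) ** 2).mean()       # stand-in for a real training loss
    optimizer.step(loss)
    scheduler.step(loss.item())                          # reduce the lr when the metric stops improving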
class jittor.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)[源代码]
get_gamma()[源代码]
get_lr()[源代码]
step()[源代码]
update_lr()[源代码]