Issue
I have a neural network that essentially has two parts, A and B, and I have defined a custom loss function, say l = l1(A) + l2(A, B). Only the layers of part A are responsible for the loss l1, so during backpropagation I want to apply the penalty from l1 to those layers alone; l2 comes from the whole network, so its penalty should apply to all the layers. How can this be achieved in Keras?
So far I have been using a weighted sum of the losses, but that is not what I have in mind. Here is some code from my training loop:
net_loss = lambda_1 * l1 + lambda_2 * l2
gradients = tape.gradient(net_loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
I have not found a way to do partial gradient calculation and application. Please help me with this.
Solution
If you want to update the weights of different layers with respect to different loss functions, you can split model.trainable_variables into separate groups, compute the gradients for each group from its own loss, and apply the weight updates per group.
Like this (minimal example):
import tensorflow as tf
import tensorflow_datasets as tfds

# One-hot encode the three iris classes.
preprocess = lambda x, y: (x, tf.one_hot(y, 3))
ds = tfds.load('iris', split='train', as_supervised=True).map(preprocess)
train = ds.batch(4)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', name='pen1'),
    tf.keras.layers.Dense(32, activation='relu', name='pen2'),
    tf.keras.layers.Dense(3, activation='softmax', name='pen1b')
])
model.build(input_shape=(None, 4))

loss_fn1 = tf.losses.BinaryCrossentropy(from_logits=False)
loss_fn2 = tf.losses.BinaryFocalCrossentropy(from_logits=False)

# Split the trainable variables into two groups by layer name.
# Note that 'pen1b' also starts with 'pen1', so the output layer is
# grouped with the first layer and updated by loss1.
all_variables = model.trainable_variables
variables1 = list(filter(lambda x: x.name.startswith('pen1'), all_variables))
variables2 = list(filter(lambda x: x.name.startswith('pen2'), all_variables))

optimizer = tf.optimizers.Adam()
optimizer.build(variables1 + variables2)

verbose = "Epoch {:2d} Loss1: {:.3f} Loss2: {:.3f}"

for epoch in range(1, 5 + 1):
    train_loss1 = tf.metrics.Mean()
    train_loss2 = tf.metrics.Mean()

    for x, y in train:
        # The tape must be persistent because tape.gradient is called twice.
        with tf.GradientTape(persistent=True) as tape:
            out = model(x, training=True)
            loss1 = loss_fn1(y, out)  # Keras losses expect (y_true, y_pred)
            loss2 = loss_fn2(y, out)

        # Each loss only updates its own group of variables.
        grads1 = tape.gradient(loss1, variables1)
        grads2 = tape.gradient(loss2, variables2)
        optimizer.apply_gradients(zip(grads1, variables1))
        optimizer.apply_gradients(zip(grads2, variables2))
        del tape  # release the resources held by the persistent tape

        train_loss1.update_state(loss1)
        train_loss2.update_state(loss2)

    print(verbose.format(epoch, train_loss1.result(), train_loss2.result()))
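To map this back to the question's setup, the same idea applies: take the gradient of l1 only with respect to part A's variables and the gradient of l2 with respect to all trainable variables, then apply both updates. Below is a rough sketch of that adaptation, reusing the model, optimizer, and batch (x, y) from the example above; the 'part_a' name prefix, the loss functions l1_fn/l2_fn, and the lambda weights are placeholders for your own setup, not names from the original code:

# Sketch only, not a drop-in solution: adapt the grouping to the l1/l2 split.
variables_A = [v for v in model.trainable_variables
               if v.name.startswith('part_a')]  # hypothetical part-A prefix
all_vars = model.trainable_variables

with tf.GradientTape(persistent=True) as tape:
    out = model(x, training=True)
    loss1 = lambda_1 * l1_fn(y, out)  # penalty meant for part A only
    loss2 = lambda_2 * l2_fn(y, out)  # penalty meant for the whole network

grads_A = tape.gradient(loss1, variables_A)  # l1 flows into part A's layers
grads_all = tape.gradient(loss2, all_vars)   # l2 flows into every layer
optimizer.apply_gradients(zip(grads_A, variables_A))
optimizer.apply_gradients(zip(grads_all, all_vars))
del tape  # free the persistent tape

Variables that belong to both groups simply receive two successive updates, one per loss, since both gradients are computed from the same forward pass.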
Answered By - Nicolas Gervais