Issue
I have a TensorFlow 2.x model whose purpose is to dynamically choose a computational path between two sub-networks. [Schematic drawing of the model omitted.]
The only trainable block is the Decision Module (DM), which is essentially a fully connected layer with a single binary output (0 or 1; it's made differentiable using a technique called Improved Semantic Hashing). Nets A and B have the same network architecture.
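For context, here is a minimal sketch of what a differentiable 0/1 output via Improved Semantic Hashing can look like. This is an illustration following Kaiser & Bengio's formulation (saturating sigmoid plus a straight-through estimator), not the asker's actual DM code:

import tensorflow as tf

def saturating_sigmoid(x):
    # Saturating sigmoid: 1.2 * sigmoid(x) - 0.1, clipped to [0, 1]
    return tf.clip_by_value(1.2 * tf.sigmoid(x) - 0.1, 0.0, 1.0)

def semantic_hash(logits, training=True):
    # Gaussian noise during training pushes the activation toward saturation
    noise = tf.random.normal(tf.shape(logits)) if training else 0.0
    v = saturating_sigmoid(logits + noise)
    # Straight-through estimator: the forward pass uses the hard 0/1 value,
    # the backward pass uses the gradient of the soft value v
    v_hard = tf.round(v)
    return v + tf.stop_gradient(v_hard - v)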
During training, I feed forward a batch of images up to the output of the DM, and then process the decision image by image, directing each image to the chosen net (A or B). The predictions are concatenated into a single tensor, which is used to evaluate the performance. Here's the training code (sigma is the output of the DM; model includes the feature extractor and the DM):
import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are custom layers with different
        # behavior during training versus inference (e.g. Dropout).
        _, sigma = model(images, training=True)
        out = []
        for img, s in zip(images, sigma):
            if s == 0:
                o = binary_classifier_model_a(tf.expand_dims(img, axis=0), training=False)
            else:
                o = binary_classifier_model_b(tf.expand_dims(img, axis=0), training=False)
            out.append(o)
        predictions = tf.concat(out, axis=0)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)
The problem: when running this code, gradients returns [None, None].
What I know so far:
- The first part of the model (up to the DM's output) is differentiable; I tested it by running only this section, applying a loss function (MSE), and then calling tape.gradient - I got actual gradients.
- I tried choosing a single (constant) net - e.g., net A - and simply multiplying its output by s (which is either 0 or 1) instead of the if-else block in the code. In this case I also got gradients (see the sketch after this list).
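A minimal sketch of that second test, assuming the same model, binary_classifier_model_a, loss_object, images, and labels as above, and that sigma broadcasts against the classifier's output shape:

with tf.GradientTape() as tape:
    _, sigma = model(images, training=True)
    # Scale net A's output by sigma so the DM's output stays in the computation graph
    predictions = sigma * binary_classifier_model_a(images, training=False)
    loss = loss_object(labels, predictions)
# Here tape.gradient returns actual gradients: predictions depend on sigma
# through a differentiable multiplication rather than Python control flow
gradients = tape.gradient(loss, model.trainable_variables)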
My concern is that such a thing might not be possible - quoting from the official docs:
x = tf.constant(1.0)
v0 = tf.Variable(2.0)
v1 = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    if x > 0.0:
        result = v0
    else:
        result = v1**2
Depending on the value of x in the above example, the tape either records result = v0 or result = v1**2. The gradient with respect to x is always None.
dx = tape.gradient(result, x)
print(dx)  # None
I'm not 100% sure that this is my case, but I wanted to ask here for the experts' opinion. Is what I'm trying to do possible? And if so, what should I change in order for this to work? Thanks
Solution
You correctly identified the issue. The control statement of the conditional is not differentiable, so you lose your link to the model variables that produced sigma.
In your case, because you state that sigma is either 1 or 0, you can use the value of sigma as a mask, and skip the conditional statement (and even the loop).
with tf.GradientTape() as tape:
    _, sigma = model(images, training=True)
    # sigma acts as a per-image mask: 0 selects net A, 1 selects net B,
    # and the multiplication keeps the DM inside the differentiable graph
    predictions = (1.0 - sigma) * binary_classifier_model_a(images, training=False) \
                + sigma * binary_classifier_model_b(images, training=False)
    loss = loss_object(labels, predictions)
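Folded back into the original train_step, this looks roughly like the following. This is a sketch, not the answerer's verbatim code; note that sigma may need a reshape (e.g. to shape (batch, 1)) so it broadcasts against the classifiers' outputs:

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        _, sigma = model(images, training=True)
        # Run both nets on the whole batch and blend per-image with the mask
        preds_a = binary_classifier_model_a(images, training=False)
        preds_b = binary_classifier_model_b(images, training=False)
        predictions = (1.0 - sigma) * preds_a + sigma * preds_b
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)

The trade-off is that both nets now run on every image: you buy differentiability at the cost of extra compute per batch.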
Answered By - Lescurel