Issue
I have a TensorFlow 2.x model whose purpose is to dynamically choose a computational path between two sub-networks. [Schematic drawing of the model omitted.]
The only trainable block is the Decision Module (DM), which is essentially a fully connected layer with a single binary output (0 or 1; it's made differentiable using a technique called Improved Semantic Hashing). Nets A and B have the same network architecture.
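For context, here is a minimal sketch of what a differentiable 0/1 output via Improved Semantic Hashing can look like. This is an illustration following Kaiser & Bengio's formulation (saturating sigmoid plus a straight-through estimator), not the asker's actual DM code:

import tensorflow as tf

def saturating_sigmoid(x):
    # Saturating sigmoid: 1.2 * sigmoid(x) - 0.1, clipped to [0, 1]
    return tf.clip_by_value(1.2 * tf.sigmoid(x) - 0.1, 0.0, 1.0)

def semantic_hash(logits, training=True):
    # Gaussian noise during training pushes the activation toward saturation
    noise = tf.random.normal(tf.shape(logits)) if training else 0.0
    v = saturating_sigmoid(logits + noise)
    # Straight-through estimator: the forward pass uses the hard 0/1 value,
    # the backward pass uses the gradient of the soft value v
    v_hard = tf.round(v)
    return v + tf.stop_gradient(v_hard - v)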
During training, I feed forward a batch of images up to the output of the DM, and then process the decision image by image, directing each image to the chosen net (A or B). The predictions are concatenated into a single tensor, which is used to evaluate the performance. Here's the training code (sigma is the output of the DM; model includes the feature extractor and the DM):
import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are custom layers with different
        # behavior during training versus inference (e.g. Dropout).
        _, sigma = model(images, training=True)
        out = []
        for img, s in zip(images, sigma):
            if s == 0:
                o = binary_classifier_model_a(tf.expand_dims(img, axis=0), training=False)
            else:
                o = binary_classifier_model_b(tf.expand_dims(img, axis=0), training=False)
            out.append(o)
        predictions = tf.concat(out, axis=0)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)
The problem: when running this code, gradients returns [None, None].
What I know so far:
- The first part of the model (up to the DM's output) is differentiable; I tested it by running only this section, applying a loss function (MSE), and then calling tape.gradient - I got actual gradients.
- I tried choosing a single (constant) net - e.g., net A - and simply multiplying its output by s (which is either 0 or 1) instead of the if-else block in the code. In this case I also got gradients (see the sketch after this list).
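A minimal sketch of that second test, assuming the same model, binary_classifier_model_a, loss_object, images, and labels as above, and that sigma broadcasts against the classifier's output shape:

with tf.GradientTape() as tape:
    _, sigma = model(images, training=True)
    # Scale net A's output by sigma so the DM's output stays in the computation graph
    predictions = sigma * binary_classifier_model_a(images, training=False)
    loss = loss_object(labels, predictions)
# Here tape.gradient returns actual gradients: predictions depend on sigma
# through a differentiable multiplication rather than Python control flow
gradients = tape.gradient(loss, model.trainable_variables)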
My concern is that such a thing might not be possible - quoting from the official docs:
x = tf.constant(1.0)
v0 = tf.Variable(2.0)
v1 = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    if x > 0.0:
        result = v0
    else:
        result = v1**2
Depending on the value of x in the above example, the tape either records result = v0 or result = v1**2. The gradient with respect to x is always None.
dx = tape.gradient(result, x)
print(dx)  # None
I'm not 100% sure that this is my case, but I wanted to ask here for the experts' opinion. Is what I'm trying to do possible? And if so, what should I change in order for this to work? Thanks
Solution
You correctly identified the issue. The control statement of the conditional is not differentiable, so you lose your link to the model variables that produced sigma.
In your case, because you state that sigma is either 1 or 0, you can use the value of sigma as a mask, and skip the conditional statement (and even the loop).
with tf.GradientTape() as tape:
    _, sigma = model(images, training=True)
    # sigma acts as a per-image mask: 0 selects net A, 1 selects net B,
    # and the multiplication keeps the DM inside the differentiable graph
    predictions = (1.0 - sigma) * binary_classifier_model_a(images, training=False) \
                + sigma * binary_classifier_model_b(images, training=False)
    loss = loss_object(labels, predictions)
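Folded back into the original train_step, this looks roughly like the following. This is a sketch, not the answerer's verbatim code; note that sigma may need a reshape (e.g. to shape (batch, 1)) so it broadcasts against the classifiers' outputs:

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        _, sigma = model(images, training=True)
        # Run both nets on the whole batch and blend per-image with the mask
        preds_a = binary_classifier_model_a(images, training=False)
        preds_b = binary_classifier_model_b(images, training=False)
        predictions = (1.0 - sigma) * preds_a + sigma * preds_b
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)

The trade-off is that both nets now run on every image: you buy differentiability at the cost of extra compute per batch.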
Answered By - Lescurel