Issue
I'm trying to train a model using LSTM layers. I'm using a GPU and all needed libraries are loaded.
When I'm building the model this way:
model = keras.Sequential()
model.add(layers.LSTM(256, activation="relu", return_sequences=False)) # note the activation function
model.add(layers.Dropout(0.2))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))
model.compile(
loss=keras.losses.BinaryCrossentropy(),
optimizer="adam",
metrics=["accuracy"]
)
It works. But it's using activation="relu"
on the LSTM layer, so it's not CuDNNLSTM - that's automatically chosen when the activation function is tanh (default) - if I'm not wrong.
So, it's painfully slow and I would like to run the faster CuDNNLSTM. My code for that:
model = keras.Sequential()
model.add(layers.LSTM(256, return_sequences=False))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))
model.compile(
loss=keras.losses.BinaryCrossentropy(),
optimizer="adam",
metrics=["accuracy"]
)
It's basically the same, only without the activation function provided, so tanh will be used. But now it's not training, and the end of output looks like this:
2021-04-19 22:41:46.046218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-19 22:41:46.046426: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:41:46.046642: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:41:46.046942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-19 22:41:46.047124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-19 22:41:46.047312: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-19 22:41:46.047489: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-19 22:41:46.047663: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-19 22:41:46.047936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-19 22:41:46.665456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-19 22:41:46.665712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-04-19 22:41:46.665876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-04-19 22:41:46.666186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2982 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-04-19 22:41:46.667505: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-19 22:42:07.374456: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/50
2021-04-19 22:42:08.922891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:42:09.272264: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:42:09.302667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
Process finished with exit code -1073740791 (0xC0000409)
It just starts the first epoch, then freezes for a minute and exits with this weird exit code.
- Shape of the input data:
tf.Tensor([50985 29 7], shape=(3,), dtype=int32)
- My GPU:
Nvidia GTX 1050 Ti
- CUDA:
v11.3
- OS:
Windows 10
- IDE:
PyCharm
Finding solutions for this problem is a bit challenging as I don't have any error outputed. Am I doing something wrong? Has anyone encountered a similar issue? What should help?
// Edit; I tried:
- running this model with much fewer units (2 instead of 256) and lower batch_size
- downgrading tensorflow to
2.4.0
, CUDA to11.0
and cudnn to8.0.1
with python3.7.1
(this should be a right combination according to this list from TensorFlow website) - restarting my PC :)
Solution
I found the solution... kinda.
So it works as it should when I downgraded tensorflow to 2.1.0
, CUDA to 10.1
and cudnn to 7.6.5
(at the time 4th combination from this list on TensorFlow website)
I don't know why it didn't work at the newest version, or at the valid combination for tensorflow 2.4.0
.
It's working well so my issue is solved. Nonetheless it would be nice to know why using LSTM with cudnn on higher versions didn't work for me, as I haven't found this issue anywhere.
Answered By - Brunon Blok
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.