Issue
Problem: the following command fails:
python -c "import tensorflow as tf; tf.test.is_gpu_available(); print('version :' + tf.__version__)"
Error:
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
Details:
WARNING:tensorflow:From <string>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU')
instead.
2021-04-18 21:02:51.839069: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-04-18 21:02:51.846775: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2500000000 Hz
2021-04-18 21:02:51.847076: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc3bc000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-04-18 21:02:51.847104: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-04-18 21:02:51.849876: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-04-18 21:02:51.911161: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
2021-04-18 21:02:51.911285: I tensorflow/compiler/jit/xla_gpu_device.cc:161] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA
2021-04-18 21:02:51.911546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 21:02:51.912210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:07.0 name: GRID T4-4Q computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 3.97GiB deviceMemoryBandwidth: 298.08GiB/s
2021-04-18 21:02:51.912446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-04-18 21:02:51.914362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-04-18 21:02:51.916358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-04-18 21:02:51.916679: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-04-18 21:02:51.918787: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-04-18 21:02:51.919993: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-04-18 21:02:51.924652: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-18 21:02:51.924792: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 21:02:51.925488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-18 21:02:51.926100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-04-18 21:02:51.926146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/miniconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/miniconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/test_util.py", line 1496, in is_gpu_available
    for local_device in device_lib.list_local_devices():
  File "/home/miniconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/client/device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
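The deprecation warning in the log recommends tf.config.list_physical_devices('GPU'). A minimal sketch of that check follows. It assumes that listing devices alone may not surface the failure, so it also runs a tiny op on the GPU to force CUDA initialization. The function name check_gpu and its optional tf parameter are hypothetical, added only for illustration.

```python
import importlib


def check_gpu(tf=None):
    """Describe whether TensorFlow can actually use a GPU.

    `tf` is an optional, already-imported tensorflow module (a hypothetical
    parameter, added so the function can be exercised without a GPU).
    """
    if tf is None:
        try:
            tf = importlib.import_module("tensorflow")
        except ImportError:
            return "tensorflow is not installed"

    # Non-deprecated replacement for tf.test.is_gpu_available().
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return "no GPU visible to TensorFlow"

    try:
        # Running a small op on the GPU should force the CUDA runtime to
        # initialize, which is where the error in this question is raised.
        with tf.device("/GPU:0"):
            tf.constant([1.0]) + 1.0
    except (RuntimeError, tf.errors.InternalError) as exc:
        return "GPU visible but unusable: {}".format(exc)
    return "{} GPU(s) usable".format(len(gpus))
```

Calling print(check_gpu()) on the affected machine would be expected to report the GPU as visible but unusable, since the log shows the device being found before initialization fails.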
System information:
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: cloud server
TensorFlow installed from (source or binary): source
TensorFlow version: 2.2.0.
Python version: 3.7.7
Installed using virtualenv? pip? conda?: pip & conda.
Bazel version (if compiling from source): 2.0.0
GCC/Compiler version (if compiling from source): 7.5
CUDA/cuDNN version: CUDA 10.1 & cuDNN 7.6.5
GPU model and memory:
00:07.0 VGA compatible controller: NVIDIA Corporation Device 1eb8 (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 130e
Physical Slot: 7
Flags: bus master, fast devsel, latency 0, IRQ 37
Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at c500 [size=128]
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
I tried looking for solutions to this problem but none of them solved it:
https://github.com/tensorflow/tensorflow/issues/41990
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#recommended-post
https://github.com/tensorflow/tensorflow/issues/48558
https://programmersought.com/article/94034772029/
Solution
I can confirm the case mentioned in a comment.
I hit this problem in an Ubuntu VM running on a VMware ESXi host and using a vGPU partition of an NVIDIA V100 GPU.
I had already tried changing CUDA versions and installing (via pip) packages compiled for those specific CUDA versions, but this did NOT solve the issue. The error was:
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
In my case I had forgotten to set the license server in /etc/nvidia/gridd.conf, so it was a GRID license issue. Fixing the GRID config file and rebooting solved the problem.
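A quick way to rule this cause in or out is to check whether the GRID config file actually names a license server. The sketch below assumes the legacy key=value format with a ServerAddress entry, as in NVIDIA's gridd.conf template. Newer vGPU releases may use a different licensing setup, so treat it as illustrative only. The helper name license_server is hypothetical.

```python
def license_server(conf_text):
    """Return the ServerAddress value from GRID config text, or None.

    Assumes plain key=value lines with '#' comments (legacy gridd.conf style).
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ServerAddress":
            return value.strip() or None  # an empty value counts as unset
    return None
```

To use it, read the config file on the VM (NVIDIA's vGPU documentation names it /etc/nvidia/gridd.conf) and pass its text to license_server(). A result of None suggests the license server was never set, which matches the situation described in this answer.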
Answered By - Fabiano Tarlao