Issue
I am developing image classification models in a Jupyter notebook. After getting my model to work on the CPU, I am trying to use the latest TensorFlow Docker image with Jupyter and GPU support (tensorflow/tensorflow:latest-gpu-py3-jupyter) so I can train on my GPU. The GPU configuration is not the problem (the nvidia-smi command shows the GPU is available), but I'm now stuck on how to set up my image data pipeline.
I have folders containing images with the following structure:
my_folder
│
└───Training
│ │
│ └───Class_A
│ │ 01234.jpg
│ │ 56789.jpg
│ │ ...
│ │
│ └───Class_B
│ │ 01234.jpg
│ │ 56789.jpg
│ │ ...
│
└───Validation
│ │
│ └───Class_A
│ │ 01234.jpg
│ │ 56789.jpg
│ │ ...
│ │
│ └───Class_B
│ │ 01234.jpg
│ │ 56789.jpg
│ │ ...
path_training = 'my_folder/Training/'
path_validation = 'my_folder/Validation/'
image_size = (90, 90)
With tensorflow == 2.6.2, I can easily load in my training/validation image datasets with the following code:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(path_training,
seed=1993,
image_size = image_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(path_validation,
seed=1993,
image_size = image_size)
However, it became apparent that this command does not work when using the Docker image:
----> 3 train_ds = tf.keras.preprocessing.image_dataset_from_directory(path_training,
      4 seed=1993,
      5 image_size = image_size)

AttributeError: module 'tensorflow_core.keras.preprocessing' has no attribute 'image_dataset_from_directory'
It turns out the TensorFlow version in the Docker image is 2.1.0, whose API does not include that function, which leaves me with this option:
# Read in all image files and split into training/validation sets (tensorflow-gpu 2.1.0)
train_ds = tf.keras.preprocessing.image.load_img(path_training, target_size = image_size)
val_ds = tf.keras.preprocessing.image.load_img(path_validation, target_size = image_size)
As might be expected, load_img() in TensorFlow 2.1.0 loads a single image file; it does not read directories the way image_dataset_from_directory() does:
IsADirectoryError: [Errno 21] Is a directory: 'my_folder/Training/'
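One way to make a notebook tolerate both environments is to check at runtime whether the installed TensorFlow exposes the directory loader before calling it. The sketch below is only an illustration: the helper name find_directory_loader is hypothetical, and it relies on nothing beyond the attribute path tf.keras.preprocessing.image_dataset_from_directory used above.

```python
import importlib


def find_directory_loader():
    """Return image_dataset_from_directory if this install has it, else None."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None  # TensorFlow is not installed at all
    # Walk tf.keras.preprocessing defensively; any missing piece yields None.
    preprocessing = getattr(getattr(tf, "keras", None), "preprocessing", None)
    return getattr(preprocessing, "image_dataset_from_directory", None)


loader = find_directory_loader()
if loader is None:
    print("image_dataset_from_directory is unavailable; fall back to per-file loading")
else:
    print("image_dataset_from_directory is available")
```

On the 2.1.0 image described above this should take the fallback branch, since that is the attribute the AttributeError reports as missing.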
I'm not sure what the best or easiest path forward would be here, as I'm not very familiar with building Docker images. Would it be better to write a Dockerfile based on TensorFlow's latest official GPU and Jupyter Docker image so I can use tf.keras.preprocessing.image_dataset_from_directory(), or should I make do with this pre-built image and load my data with tf.keras.preprocessing.image.load_img() by looping through the files in each directory and building the training/validation datasets that way? For the latter approach, I searched and found some similar examples, notably this example code:
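# Imports this snippet relies on (not shown in the example as found)
import os

import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array, load_img


def get_data(data_dir):
    X_train, Y_train = [], []
    X_test, Y_test = [], []
    # One subfolder per class, sorted so the class indices are stable
    subfolders = sorted([entry.path for entry in os.scandir(data_dir) if entry.is_dir()])
    for idx, folder in enumerate(subfolders):
        for file in sorted(os.listdir(folder)):
            # Load as grayscale and scale pixel values to [0, 1]
            img = load_img(os.path.join(folder, file), color_mode='grayscale')
            img = img_to_array(img).astype('float32') / 255
            img = img.reshape(img.shape[0], img.shape[1], 1)
            # In this example the first 35 class folders go to the training
            # set and the remaining folders go to the test set
            if idx < 35:
                X_train.append(img)
                Y_train.append(idx)
            else:
                X_test.append(img)
                Y_test.append(idx - 35)
    X_train = np.array(X_train)
    X_test = np.array(X_test)
    Y_train = np.array(Y_train)
    Y_test = np.array(Y_test)
    return (X_train, Y_train), (X_test, Y_test)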
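For the Training/Validation layout shown earlier, where each split already has its own folder, the directory-walking part could be separated from the image decoding. The following is a minimal sketch using only the standard library; the function name index_image_folder and the IMAGE_EXTENSIONS tuple are hypothetical, and the extension list is an assumption about which files count as images.

```python
import os

# Assumed set of image extensions; adjust to match the actual data.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def index_image_folder(root):
    """Return (paths, labels, class_names) for a one-subfolder-per-class tree."""
    # Sorted class folder names give each class a stable integer label.
    class_names = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    paths, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like an image file.
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(class_dir, file_name))
                labels.append(label)
    return paths, labels, class_names
```

Calling it once with path_training and once with path_validation would give one index per split. Each returned path could then be passed to load_img(path, target_size=image_size) and img_to_array() as in the example above, with the labels stacked into a NumPy array alongside the images.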
Solution
OK, so I gave up on the refactoring approach, but I learned how to build a Docker image! It was easier than I anticipated. Now I am able to use tf.keras.preprocessing.image_dataset_from_directory() as well as my GPU for deep learning.
This is basically the Dockerfile I built:
FROM tensorflow/tensorflow:latest-gpu-jupyter
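ENV python_version=3.8
# Install desired Python version (the current TF image is based on Ubuntu at the moment)
# Refresh the package index first so the install does not fail on a stale or empty apt cache
RUN apt-get update && apt-get install -y python${python_version}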
# Set default version for root user
RUN update-alternatives --install /usr/local/bin/python python /usr/bin/python${python_version} 1
# Update pip: https://packaging.python.org/tutorials/installing-packages/#ensure-pip-setuptools-and-wheel-are-up-to-date
RUN python -m pip install --upgrade pip setuptools wheel
COPY requirements.txt requirements.txt
RUN python -m pip install -r requirements.txt
EXPOSE 8888
Here are the contents of the basic requirements.txt file I used:
numpy
matplotlib
pandas
scikit-learn
tensorflow-gpu==2.10.0
After importing tensorflow, here is the printout of the TensorFlow version and the available devices:
print(tf.__version__)
print(tf.config.get_visible_devices())
2.10.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
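Rather than reading the device list by eye, a notebook could fail fast when no GPU is visible. This is a small sketch under stated assumptions: gpu_names is a hypothetical helper, and the Device namedtuple is only a stand-in with the same two fields (name, device_type) as the PhysicalDevice entries printed above, so the block runs without TensorFlow. In the notebook, tf.config.get_visible_devices() would be passed instead of the sample list.

```python
from collections import namedtuple

# Stand-in for TensorFlow's PhysicalDevice entries (same field names).
Device = namedtuple("Device", ["name", "device_type"])


def gpu_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


sample = [
    Device("/physical_device:CPU:0", "CPU"),
    Device("/physical_device:GPU:0", "GPU"),
]
found = gpu_names(sample)
if not found:
    raise RuntimeError("No GPU is visible; check the container's GPU settings")
print(found)  # ['/physical_device:GPU:0']
```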
I can then load image datasets easily!
Here are some useful links:
Setting up TensorFlow with GPU acceleration the quick way
Installing TensorFlow and Jupyter, with GPU Support
For anyone who comes across this post, I hope you find my thought process useful. Cheers!
Answered By - 1337nerd