Sunday, November 6, 2022

[FIXED] Is it better to create a Dockerfile based on tensorflow:latest-gpu-py3-jupyter or refactor to use load_img()?


Issue

I am developing image classification models in the Jupyter notebook environment. After getting my model to work with the CPU, I am trying to use the latest TensorFlow Docker image supported for Jupyter & GPU (tensorflow/tensorflow:latest-gpu-py3-jupyter) so I can take advantage of my GPU for training. The GPU configuration is not the problem (nvidia-smi command shows the GPU is available), but I'm now stuck on what I should do with my image data pipeline setup.

I have folders containing images with the following structure:

my_folder
│
└───Training
│   │
│   └───Class_A
│   │       01234.jpg
│   │       56789.jpg
│   │       ...
│   │        
│   └───Class_B
│   │       01234.jpg
│   │       56789.jpg
│   │       ...
│   
└───Validation
│   │
│   └───Class_A
│   │       01234.jpg
│   │       56789.jpg
│   │       ...
│   │        
│   └───Class_B
│   │       01234.jpg
│   │       56789.jpg
│   │       ...

path_training = 'my_folder/Training/'
path_validation = 'my_folder/Validation/'
image_size = (90, 90)

With tensorflow == 2.6.2, I can easily load in my training/validation image datasets with the following code:

import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(path_training,
                                                               seed=1993,
                                                               image_size=image_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(path_validation,
                                                             seed=1993,
                                                             image_size=image_size)

However, it became apparent that this command does not work when using the Docker image:

----> 3 train_ds = tf.keras.preprocessing.image_dataset_from_directory(path_training,
      4                                                                seed=1993,
      5                                                                image_size = image_size)

AttributeError: module 'tensorflow_core.keras.preprocessing' has no attribute 'image_dataset_from_directory'

I then discovered that the Docker image ships TensorFlow 2.1.0, whose API does not include image_dataset_from_directory() (it was added in a later release). That leaves me with this option:

# Read in all image files and split into training/validation sets (tensorflow-gpu 2.1.0)
train_ds = tf.keras.preprocessing.image.load_img(path_training, target_size = image_size)
val_ds = tf.keras.preprocessing.image.load_img(path_validation, target_size = image_size)

As might be expected, load_img() in TensorFlow 2.1.0 does not read directories the way image_dataset_from_directory() does; it expects the path to a single image file:

IsADirectoryError: [Errno 21] Is a directory: 'my_folder/Training/'

I'm not sure what the best or easiest path forward is, as I'm not very familiar with building Docker images. I see two options. The first is to write a Dockerfile based on TensorFlow's latest official GPU & Jupyter Docker image so I can use tf.keras.preprocessing.image_dataset_from_directory(). The second is to make do with the pre-built Docker image and load my image data with tf.keras.preprocessing.image.load_img(), looping through the files in each directory to build the training/validation datasets myself. For the second approach, I searched and found some similar examples, notably this example code:

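import os

import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array, load_img

def get_data(data_dir):
    X_train, Y_train = [], []
    X_test, Y_test = [], []
    # One subfolder per class, sorted so the class indices are reproducible
    subfolders = sorted([entry.path for entry in os.scandir(data_dir) if entry.is_dir()])
    for idx, folder in enumerate(subfolders):
        for file in sorted(os.listdir(folder)):
            img = load_img(os.path.join(folder, file), color_mode='grayscale')
            # Convert to a float array and scale pixel values to [0, 1]
            img = img_to_array(img).astype('float32')/255
            img = img.reshape(img.shape[0], img.shape[1], 1)
            # This example splits by class index: the first 35 classes go to
            # training, the remaining classes go to testing
            if idx < 35:
                X_train.append(img)
                Y_train.append(idx)
            else:
                X_test.append(img)
                Y_test.append(idx-35)

    X_train = np.array(X_train)
    X_test = np.array(X_test)
    Y_train = np.array(Y_train)
    Y_test = np.array(Y_test)
    return (X_train, Y_train), (X_test, Y_test)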
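
For the Training/Validation layout shown earlier, the looping approach could start from something like the sketch below. It is only an illustration, not code from the original example: the helper name list_labeled_images and the IMAGE_EXTENSIONS tuple are hypothetical, and it uses only the standard library to collect file paths and integer labels, one label per class subfolder.

```python
import os

# Assumption: which file extensions count as images; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def list_labeled_images(root):
    """Return (paths, labels, class_names) for a root/Class_X/file.jpg layout."""
    # Sort class folders so that label indices are stable between runs.
    class_names = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    paths, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like an image file.
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(class_dir, file_name))
                labels.append(label)
    return paths, labels, class_names
```

Each returned path could then be passed to load_img(path, target_size=image_size) and img_to_array(), once for path_training and once for path_validation, so no index-based split like the one in the example above would be needed.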

Solution

OK, so I gave up on the refactoring approach, but I learned how to build a Docker image! It was easier than I anticipated. Now I can use tf.keras.preprocessing.image_dataset_from_directory() as well as my GPU for deep learning.

This is basically the Dockerfile I built (note that it is based on the latest-gpu-jupyter tag, not the latest-gpu-py3-jupyter tag from the question):

FROM tensorflow/tensorflow:latest-gpu-jupyter

ENV python_version=3.8

# Install the desired Python version (the TF image is currently based on Ubuntu)
RUN apt-get update && apt-get install -y python${python_version}

# Set default version for root user 
RUN update-alternatives --install /usr/local/bin/python python /usr/bin/python${python_version} 1

# Update pip: https://packaging.python.org/tutorials/installing-packages/#ensure-pip-setuptools-and-wheel-are-up-to-date
RUN python -m pip install --upgrade pip setuptools wheel

COPY requirements.txt requirements.txt

RUN python -m pip install -r requirements.txt

EXPOSE 8888

Here are the contents of the basic requirements.txt file I used:

numpy
matplotlib
pandas
scikit-learn
tensorflow-gpu==2.10.0

After importing tensorflow, here is the printout of the TensorFlow version and the available devices:

print(tf.__version__)
print(tf.config.get_visible_devices())

2.10.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
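
A version string like the one printed above can also be checked programmatically before calling image_dataset_from_directory(). The sketch below is a hypothetical helper, not part of TensorFlow, and it assumes the function first shipped in TensorFlow 2.3; treat that threshold as an assumption and adjust it if the release notes say otherwise. Versions are compared as integer tuples because a plain string comparison would rank "2.10.0" below "2.3.0".

```python
import re

# Assumption: first TensorFlow release that includes image_dataset_from_directory().
MIN_VERSION = (2, 3)

def version_tuple(version):
    """Turn a string such as '2.10.0' into (2, 10) for numeric comparison."""
    major, minor = re.match(r"(\d+)\.(\d+)", version).groups()
    return int(major), int(minor)

def has_image_dataset_from_directory(version):
    """Return True if the given version is at least MIN_VERSION."""
    return version_tuple(version) >= MIN_VERSION
```

In a notebook this would be called as has_image_dataset_from_directory(tf.__version__), which separates the 2.1.0 image from the question from the 2.6.2 and 2.10.0 environments mentioned in this post.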

I can then load image datasets easily!

Here are some useful links:

Setting up TensorFlow with GPU acceleration the quick way

Installing TensorFlow and Jupyter, with GPU Support

For anyone who comes across this post, I hope you find my thought process useful. Cheers!

Answered By - 1337nerd

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
