Saturday, October 30, 2021

[FIXED] Cupy works well with TITAN V, but not with TITAN RTX

Labels: cupy, nvidia, python, pytorch

Issue

I am using CuPy to run CUDA code together with PyTorch.

My environment is Ubuntu 20, Anaconda Python 3.7.6, NVIDIA driver 440, CUDA 10.2, cupy-cuda102, and torch 1.4.0.

First, I wrote a simple main script:

import data_load_test
from tqdm import tqdm
import torch
from torch.utils.data import DataLoader

def main():

    dataset = data_load_test.DataLoadTest()
    training_loader = DataLoader(dataset, batch_size=1)
    with torch.cuda.device(0):
        pbar = tqdm(training_loader)
        for epoch in range(3):
            for i, img in enumerate(pbar):
                print("see the message")

if __name__ == "__main__":
    main()

and a data loader module, data_load_test, like this:

from torch.utils.data import Dataset
import cv2
import cupy as cp

def read_cuda_file(cuda_path):
    # Read the whole CUDA source file into a single string.
    with open(cuda_path, 'r') as f:
        return f.read()

class DataLoadTest(Dataset):
    def __init__(self):
        source = read_cuda_file("cuda/cuda_code.cu")
        cuda_source = '''{}'''.format(source)
        module = cp.RawModule(code=cuda_source)
        self.myfunc = module.get_function('myfunc')

        self.input = cp.asarray(cv2.imread("hi.png",-1), cp.uint8)
        h, w, c = self.input.shape
        self.h = h
        self.w = w
        self.output = cp.zeros((w, h, 3), dtype=cp.uint8)

        self.block_size = (32, 32)
        self.grid_size = (h // self.block_size[1], w // self.block_size[0])

    def __len__(self):
        return 1

    def __getitem__(self, idx):
        self.myfunc(self.grid_size, self.block_size, (self.input, self.output, self.h, self.w))
        return cp.asnumpy(self.output)

And my CUDA code (cuda/cuda_code.cu) is:

#define PI 3.14159265358979323846f
extern "C"{
__global__ void myfunc(const unsigned char* refImg, unsigned char* warpImg, const long long cols, const long long rows)
{

    long long x = blockDim.x * blockIdx.x + threadIdx.x;
    long long y = blockDim.y * blockIdx.y + threadIdx.y;

    long long indexImg = y * cols + x;

    warpImg[indexImg * 3] = 0;
    warpImg[indexImg * 3 + 1] = 1;
    warpImg[indexImg * 3 + 2] = 2;
}
}

I have two GPUs: a TITAN V (device 0) and a TITAN RTX (device 1).

When I run this code on the TITAN V (the third line of main()),

with torch.cuda.device(0):

it works fine, but on the TITAN RTX,

with torch.cuda.device(1):

it gives an error message like this:

  File "cupy/core/raw.pyx", line 66, in cupy.core.raw.RawKernel.__call__
  File "cupy/cuda/function.pyx", line 162, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 144, in cupy.cuda.function._launch
  File "cupy/cuda/driver.pyx", line 293, in cupy.cuda.driver.launchKernel
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_CONTEXT_IS_DESTROYED: context is destroyed

Please help.

Solution

In main(), the DataLoadTest class is instantiated before the "with" block, while the current device is still the default device 0, so CuPy compiles myfunc() there.

Is the next line, "with torch.cuda.device(0):", where you switch to device 1 in the version that fails? If so, the kernel is compiled on device 0 but launched while device 1 is current.

What happens if you call

cupy.cuda.Device(1).use()

as the first line in main() (after importing cupy in the main script), to make sure myfunc() gets compiled on device 1?
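A minimal sketch of that suggestion, assuming the explanation above is the cause: the kernel is compiled, its buffer allocated, and the launch issued while the same device is current. The kernel fill, the helper launch_config, and the function run_on_device are hypothetical names used only for illustration; they stand in for myfunc and the DataLoadTest class from the question.

```python
try:
    import cupy as cp
except ImportError:  # keeps the sketch importable on machines without CuPy
    cp = None

# Hypothetical stand-in for cuda/cuda_code.cu: writes 7 into every element.
KERNEL_SOURCE = r'''
extern "C" __global__ void fill(unsigned char* out, const long long n)
{
    long long i = (long long)blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = 7;
}
'''


def launch_config(n, threads_per_block=256):
    """Return (grid, block) tuples with enough threads to cover n elements."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return (blocks,), (threads_per_block,)


def run_on_device(device_id, n=1024):
    """Compile, allocate, and launch with device_id as the current device."""
    if cp is None:
        raise RuntimeError("CuPy is not installed")
    # Everything inside this block uses device_id, including compilation.
    with cp.cuda.Device(device_id):
        module = cp.RawModule(code=KERNEL_SOURCE)
        kernel = module.get_function("fill")
        out = cp.zeros(n, dtype=cp.uint8)
        grid, block = launch_config(n)
        kernel(grid, block, (out, n))  # a Python int is passed as long long
        return cp.asnumpy(out)
```

On the two-GPU machine from the question, run_on_device(1) would target the TITAN RTX. In the original code, the equivalent change is to construct DataLoadTest() only after device 1 has been selected, either with cupy.cuda.Device(1).use() at the top of main() or by moving the construction inside the "with" block.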

Answered By - Stripedbass

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
