Issue
I need to be able to preserve the order in which the data is fed to the model when training on multiple GPUs. According to https://github.com/Lightning-AI/lightning/discussions/13342, each GPU gets a consecutive fraction of the dataset, so with 2 GPUs the first one gets the first half of the dataset and the other one gets the second half. I need to preserve the order and don't know how to override the dataset-splitting logic. Any advice?
Solution
I got an answer here https://github.com/Lightning-AI/lightning/discussions/15164: write a custom DistributedSampler, pass it to the DataLoader, and set Trainer(replace_sampler_ddp=False) so that Lightning does not replace the sampler with its own.
My code looks something like this:
import torch
from torch.utils.data import DataLoader

def train_dataloader(self):
    """Returns a DataLoader for training according to hparams.

    Returns:
        DataLoader: DataLoader ready to deliver samples for training.
    """
    # define a distributed sampler in case we are using multiple GPUs
    if self.hparams.num_gpus > 1:
        sampler = torch.utils.data.distributed.DistributedSampler(
            self.train_dataset, shuffle=False)
    else:
        sampler = None
    # only use the sampler if using multiple GPUs
    return DataLoader(
        self.train_dataset,
        shuffle=False,
        num_workers=self.hparams.num_workers,
        batch_size=self.hparams.batch_size,
        pin_memory=False,
        sampler=sampler)
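For completeness, a minimal sketch of how the Trainer might be set up so that it keeps the sampler returned above (the model class, hparams object, and device counts here are placeholders, not from the original post). Note that in Lightning 2.x the replace_sampler_ddp argument was renamed to use_distributed_sampler.

import pytorch_lightning as pl

# hypothetical LightningModule that defines train_dataloader() as above
model = MyModel(hparams)  # hparams: placeholder config with num_gpus, batch_size, num_workers

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
    # keep the user-provided sampler; Lightning will not swap in its own DistributedSampler
    replace_sampler_ddp=False,
)
trainer.fit(model)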
Answered By - malfonsoarquimea