Issue
I have some libtorch code doing CPU inference with a model that was trained in PyTorch and then exported to TorchScript. The code below is a simplified version of a method that is called repeatedly.
void Backend::perform(std::vector<float *> in_buffer,
                      std::vector<float *> out_buffer) {
  c10::InferenceMode guard;
  at::Tensor tensor_out;
  at::Tensor tensor_in = torch::zeros({1, 16, 2});
  std::vector<torch::jit::IValue> inputs = {tensor_in};

  // calling forward on the model's "decode" method; this is where
  // the memory leak happens
  tensor_out = m_model.get_method("decode")(inputs).toTensor();

  auto out_ptr = tensor_out.contiguous().data_ptr<float>();
  // n_vec (defined elsewhere in the class) is the number of samples
  // copied into each output channel
  for (size_t i = 0; i < out_buffer.size(); i++) {
    memcpy(out_buffer[i], out_ptr + i * n_vec, n_vec * sizeof(float));
  }
}
m_model is a torch::jit::Module loaded from the exported .ts file via:
m_model = torch::jit::load(path);
m_model.eval();
On every call it seems that more of the torch graph is allocated and never freed, causing the program to eventually run out of memory and crash. Commenting out the forward call causes the memory usage to stabilize.
My understanding is that the c10::InferenceMode guard should disable autograd graph recording, which is the usual cause of this kind of memory buildup.
I tried mimicking this in PyTorch (by repeatedly calling forward in a loop) and saw no memory growth, which seems to point to this being a libtorch issue rather than a problem with the model itself.
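On the libtorch side, the pattern being exercised reduces to a standalone loop like the one below. This is only a minimal sketch of my test harness, not the original code: the model path ("model.ts"), the iteration count, and the assumption that decode takes a {1, 16, 2} tensor are mine.

#include <torch/script.h>
#include <iostream>

int main() {
  // Load the exported TorchScript module (the path is an assumption).
  torch::jit::Module model = torch::jit::load("model.ts");
  model.eval();

  c10::InferenceMode guard;
  for (int iter = 0; iter < 100000; ++iter) {
    std::vector<torch::jit::IValue> inputs = {torch::zeros({1, 16, 2})};
    at::Tensor out = model.get_method("decode")(inputs).toTensor();
    // Watching the process's memory while this runs isolates whether the
    // growth comes from the forward call itself.
    if (iter % 1000 == 0)
      std::cout << "iter " << iter << ", output numel " << out.numel() << "\n";
  }
  return 0;
}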
My system:
OS: Windows 10/11
pytorch version: 1.11.0
libtorch version: 1.11.0
Solution
This ended up being a bug in the Windows implementation of libtorch. Memory leaks can happen when forward is called on a thread other than the main thread (https://github.com/pytorch/pytorch/issues/24237), and moving the forward call to the main thread fixed the issue (a sketch of that restructuring is below).
Even though the issue is marked as closed, the bug is still present.
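For anyone needing a workaround: one way to apply this fix is to have worker threads hand their inputs to the main thread and run forward only there. The following is a rough sketch of that idea, not the original code; the request struct, the queue, and all of the names are mine, and it omits shutdown handling.

#include <torch/script.h>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical request type: one input tensor in, one output tensor back.
struct InferenceRequest {
  at::Tensor input;
  at::Tensor output;
  bool done = false;
  std::mutex m;
  std::condition_variable cv;
};

std::deque<InferenceRequest *> g_queue;
std::mutex g_queue_mutex;
std::condition_variable g_queue_cv;

// Called from worker / audio threads: enqueue the input and wait for the
// result instead of calling forward directly on this thread.
at::Tensor request_decode(at::Tensor input) {
  InferenceRequest req;
  req.input = std::move(input);
  {
    std::lock_guard<std::mutex> lock(g_queue_mutex);
    g_queue.push_back(&req);
  }
  g_queue_cv.notify_one();
  std::unique_lock<std::mutex> lock(req.m);
  req.cv.wait(lock, [&] { return req.done; });
  return req.output;
}

// Runs on the main thread: the only place forward is ever called.
void main_thread_loop(torch::jit::Module &model) {
  c10::InferenceMode guard;
  while (true) {
    InferenceRequest *req = nullptr;
    {
      std::unique_lock<std::mutex> lock(g_queue_mutex);
      g_queue_cv.wait(lock, [] { return !g_queue.empty(); });
      req = g_queue.front();
      g_queue.pop_front();
    }
    std::vector<torch::jit::IValue> inputs = {req->input};
    at::Tensor out = model.get_method("decode")(inputs).toTensor();
    {
      std::lock_guard<std::mutex> lock(req->m);
      req->output = out;
      req->done = true;
    }
    req->cv.notify_one();
  }
}

With this arrangement a worker thread calls request_decode(torch::zeros({1, 16, 2})) instead of touching the module directly, and only the main thread ever calls get_method("decode").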
Answered By - Nicholas Shaheed