Issue
I have 1,000,000 agents, each associated with an (x, y)
coordinate. I am trying to find agents that are close to each other (radius = 1.5). I tried to implement this using PyTorch:
import torch

X = torch.DoubleTensor(1000000, 2).uniform_(0, 10000)
torch.cdist(X, X, p=2)  # builds the full 10^6 x 10^6 pairwise distance matrix
However, with this the session crashes. I am running this on Google Colab. The crash is no surprise: the dense distance matrix has 10^12 entries, which at 8 bytes per double is roughly 8 TB. The same happened when I tried constructing the graph with scikit-learn's radius_neighbors_graph. It would be of great help if someone suggested a memory-efficient way to implement this.
Solution
I found three solutions.
Solution 1
import torch
from pykeops.torch import LazyTensor

x = torch.randn(3000000, 2).cuda()  # one row of (x, y) coordinates per agent
y = x
# Turn our Tensors into KeOps symbolic variables:
x_i = LazyTensor(x[:, None, :])  # (M, 1, 2)
y_j = LazyTensor(y[None, :, :])  # (1, M, 2)
# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j) ** 2).sum(dim=2)  # symbolic (M, M) matrix of squared distances
D_ij.argKmin(20, dim=1)  # indices of the 20 nearest neighbours of each agent
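Note that argKmin returns a fixed number of nearest neighbours rather than the neighbours within a fixed radius. A minimal sketch of how the radius = 1.5 constraint from the question could be recovered, assuming KeOps's Kmin reduction to fetch the matching squared distances and a mask to discard anything outside the radius (any agent with more than K neighbours inside the radius would need a larger K):
K = 20
indices = D_ij.argKmin(K, dim=1)   # (M, K) neighbour indices
sq_dists = D_ij.Kmin(K, dim=1)     # (M, K) matching squared distances
within = sq_dists <= 1.5 ** 2      # True where a neighbour is inside the radius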
Solution 2
import numpy as np
from pykeops.numpy import LazyTensor as LazyTensor_np

M = 3000000
x = np.random.rand(M, 2)
y = x
x_i = LazyTensor_np(
    x[:, None, :]
)  # (M, 1, 2) KeOps LazyTensor, wrapped around the numpy array x
y_j = LazyTensor_np(
    y[None, :, :]
)  # (1, M, 2) KeOps LazyTensor, wrapped around the numpy array y
D_ij = ((x_i - y_j) ** 2).sum(-1)  # **Symbolic** (M, M) matrix of squared distances
s_i = D_ij.argKmin(20, dim=1).ravel()  # genuine (M * 20,) array of integer indices
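A small usage note, assuming the M and k = 20 above: the flattened index array produced by ravel() can be reshaped back into one row of neighbours per agent:
neighbors = s_i.reshape(M, 20)  # row i lists the 20 nearest neighbours of agent i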
Solution 3
from sklearn.neighbors import NearestNeighbors
import numpy as np

M = 3000000
x = np.random.rand(M, 2)
# Ball trees work well for low-dimensional data like 2D coordinates:
nbrs = NearestNeighbors(n_neighbors=20, algorithm='ball_tree').fit(x)
distances, indices = nbrs.kneighbors(x)
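Since the original goal was a fixed radius of 1.5 rather than a fixed neighbour count, it is worth noting that the same fitted estimator can answer radius queries directly via radius_neighbors, which returns ragged arrays (one per agent); a minimal sketch:
# Radius query with the same ball tree; entry i of `indices`
# holds the neighbours of agent i within distance 1.5:
distances, indices = nbrs.radius_neighbors(x, radius=1.5)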
All three solutions run in roughly the same time, about a minute; their peak memory requirements are approximately 2 GB, 1 GB, and 1.3 GB, respectively. It would be great to hear ideas for lowering the execution time.
Answered By - user3856486