Issue
I would like to know the fastest way to extract the indices of the first n
non zero values per column in a 2D array.
For example, with the following array:
arr = [
[4, 0, 0, 0],
[0, 0, 0, 0],
[0, 4, 0, 0],
[2, 0, 9, 0],
[6, 0, 0, 0],
[0, 7, 0, 0],
[3, 0, 0, 0],
[1, 2, 0, 0],
With n=2
I would have [0, 0, 1, 1, 2]
as xs and [0, 3, 2, 5, 3]
as ys. 2 values in the first and second columns and 1 in the third.
Here is how it is currently done:
x = []
y = []
n = 3
for i, c in enumerate(arr.T):
a = c.nonzero()[0][:n]
if len(a):
x.extend([i]*len(a))
y.extend(a)
In practice I have arrays of size (405, 256)
.
Is there a way to make it faster?
Solution
Here is a method, although quite confusing as it uses a lot of functions, that does not require sorting the array (only a linear scan is necessary to get non null values):
n = 2
# Get indices with non null values, columns indices first
nnull = np.stack(np.where(arr.T != 0))
# split indices by unique value of column
cols_ids= np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] +1 )
# Take n in each (max) and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis = 1)
outputs:
array([[0, 0, 1, 1, 2],
[0, 3, 2, 5, 3]], dtype=int64)
Answered By - PlainRavioli
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.