Issue
I want to add a new column with custom buckets (see the example below) based on the values in the price column.
< 400 = low
>= 401 and <= 1000 = medium
> 1000 = high
Table
product_id  price
2           1203
4           500
5           490
6           200
3           429
5           321
Output table
product_id  price  price_category
2           1203   high
4           500    medium
5           490    medium
6           200    low
3           429    medium
5           321    low
This is what I have tried so far:
import numpy as np
from numba import njit

def cut(arr):
    # Assign each value a numeric bin code (1-5) based on its range
    bins = np.empty(arr.shape[0])
    for idx, x in enumerate(arr):
        if (x >= 0) & (x <= 50):
            bins[idx] = 1
        elif (x >= 51) & (x <= 100):
            bins[idx] = 2
        elif (x >= 101) & (x <= 250):
            bins[idx] = 3
        elif (x >= 251) & (x <= 1000):
            bins[idx] = 4
        else:
            bins[idx] = 5
    return bins

a = cut(df2['average_listings'].to_numpy())

# Map the numeric bin codes to size labels
conversion_dict = {1: 'S',
                   2: 'M',
                   3: 'L',
                   4: 'XL',
                   5: 'XXL'}
bins = list(map(conversion_dict.get, a))
But I am struggling to add this to the main df.
Solution
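You can use np.select:
import numpy as np

# One boolean condition per bucket, checked in order
conditions = [
    df['price'].lt(400),
    df['price'].ge(401) & df['price'].le(1000),
    df['price'].gt(1000)]
choices = ['low', 'medium', 'high']
# A price of exactly 400 matches none of the conditions and gets the default;
# a string default also avoids mixing the integer default 0 with string choices
df['price_category'] = np.select(conditions, choices, default='unknown')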
# print(df)
   product_id  price price_category
0           2   1203           high
1           4    500         medium
2           5    490         medium
3           6    200            low
4           3    429         medium
5           5    321            low
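As an alternative sketch, not part of the original answer, the same buckets can be built with pd.cut, which takes a list of bin edges and labels instead of explicit conditions. The df below is rebuilt from the question's table so the snippet runs on its own. The names bins and labels are illustrative. One assumption to note: the question leaves a price of exactly 400 unassigned, and with the default right-closed intervals used here it lands in "low".

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "product_id": [2, 4, 5, 6, 3, 5],
    "price": [1203, 500, 490, 200, 429, 321],
})

# Intervals are right-closed by default: (-inf, 400], (400, 1000], (1000, inf)
# Assumption: a price of exactly 400 is treated as "low"
bins = [float("-inf"), 400, 1000, float("inf")]
labels = ["low", "medium", "high"]

# pd.cut returns a categorical Series aligned with df's index,
# so it can be assigned directly as a new column
df["price_category"] = pd.cut(df["price"], bins=bins, labels=labels)

print(df)
```

The same pattern would apply to the average_listings column from the question: pass the five bin edges and the labels 'S' to 'XXL', then assign the result straight to a new column of df2, with no intermediate list needed.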
Answered By - Shubham Sharma