Issue
I am currently converting code from pandas to polars because I really like the API. This question is a more general version of a previous question of mine.
I have the following dataframe:
import polars as pl

# Dummy data
df = pl.DataFrame({
    "Buy_Signal": [1, 0, 1, 0, 1, 0, 0],
    "Returns": [0.01, 0.02, 0.03, 0.02, 0.01, 0.00, -0.01],
})
Ultimately, I want to run aggregations on the column Returns over different, overlapping intervals, which are defined by the column Buy_Signal. In the example above, each interval runs from a row where Buy_Signal is 1 to the end of the dataframe, so every 1 starts a new group. The resulting dataframe should therefore look like this:
shape: (15, 2)
┌───────┬─────────┐
│ group ┆ Returns │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═══════╪═════════╡
│ 1 ┆ 0.01 │
│ 1 ┆ 0.02 │
│ 1 ┆ 0.03 │
│ 1 ┆ 0.02 │
│ 1 ┆ 0.01 │
│ 1 ┆ 0.0 │
│ 1 ┆ -0.01 │
│ 2 ┆ 0.03 │
│ 2 ┆ 0.02 │
│ 2 ┆ 0.01 │
│ 2 ┆ 0.0 │
│ 2 ┆ -0.01 │
│ 3 ┆ 0.01 │
│ 3 ┆ 0.0 │
│ 3 ┆ -0.01 │
└───────┴─────────┘
One approach posted as an answer to my previous question is the following:
# Build overlapping group index
idx = df.select(index=
    pl.when(pl.col("Buy_Signal") == 1)
    .then(pl.int_ranges(pl.int_range(pl.len()), pl.len()))
).explode(pl.col("index")).drop_nulls().cast(pl.UInt32)
# Join index with original data
df = (df.with_row_index()
    .join(idx, on="index")
    .with_columns(group = (pl.col("index") == pl.col("index").max())
                  .shift().cum_sum().backward_fill() + 1)
    .select(["group", "Returns"])
)
Question: Are there other solutions to this problem that are both readable and fast?
My actual problem contains much larger datasets. Thanks
Solution
For completeness, here is an alternative solution that doesn't rely on experimental functionality.
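(
    df
    # Running count of buy signals: the latest group each row belongs to.
    .with_columns(
        pl.col("Buy_Signal").cum_sum().alias("group")
    )
    # Replace that id with the list of all group ids from the smallest one
    # up to the row's own id, i.e. every group the row is part of.
    .with_columns(
        pl.int_ranges(pl.col("group").min(), pl.col("group")+1)
    )
    # One row per (row, group) pair, then collect the groups together.
    .explode("group")
    .sort("group")
)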
Output.
shape: (15, 3)
┌────────────┬─────────┬───────┐
│ Buy_Signal ┆ Returns ┆ group │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 │
╞════════════╪═════════╪═══════╡
│ 1 ┆ 0.01 ┆ 1 │
│ 0 ┆ 0.02 ┆ 1 │
│ 1 ┆ 0.03 ┆ 1 │
│ 0 ┆ 0.02 ┆ 1 │
│ 1 ┆ 0.01 ┆ 1 │
│ … ┆ … ┆ … │
│ 0 ┆ 0.0 ┆ 2 │
│ 0 ┆ -0.01 ┆ 2 │
│ 1 ┆ 0.01 ┆ 3 │
│ 0 ┆ 0.0 ┆ 3 │
│ 0 ┆ -0.01 ┆ 3 │
└────────────┴─────────┴───────┘
Answered By - Hericks