Issue
I have a dataset with three columns, Point_A, Point_B and Range, like this:
   Point_A  Point_B  Range
0  1001400  1001402    9.7
1  1001402  1001404   20.2
2  1001404  1001406   16.0
3  1001406  1001408   21.7
4  1001408  1001410   11.1
5  1001410  1001412   15.6
Now I want to create a dataframe that contains all possible combinations together with their cumulative values, like the dataset below. For example, the value from 1001400 to 1001404 should be 9.7 + 20.2 = 29.9, and so on. Output dataframe:
Point_A  Point_B  Cumulative_Range
1001400  1001402               9.7
1001400  1001404              29.9
1001400  1001406              45.9
1001400  1001408              67.6
1001400  1001410              78.7
1001400  1001412              94.3
1001402  1001404              20.2
1001402  1001406              36.2
1001402  1001408              57.9
1001402  1001410              69.0
1001402  1001412              84.6
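Each cumulative value is the sum of Range over the consecutive segments between the two points, so it can also be read off a running total (prefix sum): if P_k is the sum of the first k ranges, the value from point i to point j is P_j - P_i (e.g. 29.9 - 0 = 29.9 for 1001400 to 1001404). A minimal pandas sketch of that idea, assuming the rows are sorted and form one unbroken chain (each Point_B equals the next row's Point_A, as in the sample); `cumulative_ranges` is a hypothetical helper name:

```python
import itertools

import pandas as pd


def cumulative_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Every (start, end) pair along a chain of segments, with the summed Range.

    Assumes the rows are sorted so that each Point_B equals the next Point_A.
    """
    chained = df['Point_B'].iloc[:-1].to_numpy() == df['Point_A'].iloc[1:].to_numpy()
    if not chained.all():
        raise ValueError("segments do not form a single contiguous chain")
    # Chain points: the first start, then every end (1001400, 1001402, ..., 1001412).
    points = [int(df['Point_A'].iloc[0]), *df['Point_B']]
    # prefix[k] is the total Range of the first k segments; prefix[0] == 0.
    prefix = [0.0, *df['Range'].cumsum()]
    rows = [
        (points[i], points[j], prefix[j] - prefix[i])
        for i, j in itertools.combinations(range(len(points)), 2)
    ]
    return pd.DataFrame(rows, columns=['Point_A', 'Point_B', 'Cumulative_Range'])


df = pd.DataFrame({
    'Point_A': [1001400, 1001402, 1001404, 1001406, 1001408, 1001410],
    'Point_B': [1001402, 1001404, 1001406, 1001408, 1001410, 1001412],
    'Range': [9.7, 20.2, 16.0, 21.7, 11.1, 15.6],
})
out = cumulative_ranges(df)
print(out.round(1))
```

`itertools.combinations` yields the (i, j) pairs in the same order as the expected output. The subtraction can leave tiny floating-point noise, hence the `.round(1)` for display.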
I have tried the code below, but it only returns the original dataframe:
import pandas as pd

df = df.sort_values(['Point_A', 'Point_B'])
rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows in a list
for Point_A in df['Point_A'].unique():
    subset_df = df[df['Point_A'] == Point_A].reset_index(drop=True)
    cumulative_range = 0
    for index, row in subset_df.iterrows():
        Point_B = row['Point_B']
        c_range = row['Range']
        cumulative_range += c_range
        rows.append({'Point_A': Point_A, 'Point_B': Point_B, 'cumulative_range': cumulative_range})
cumulative_df = pd.DataFrame(rows, columns=['Point_A', 'Point_B', 'cumulative_range'])
print(cumulative_df)
Does anyone have the logic or a solution for this problem?
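The attempt returns only the original rows because each Point_A value occurs exactly once, so every `subset_df` holds a single row and the inner loop never reaches the later segments. A minimal sketch of the same running-total loop that instead starts at every row and keeps walking forward through the rest of the sorted data, assuming (as in the sample) that the rows form one contiguous chain:

```python
import pandas as pd

df = pd.DataFrame({
    'Point_A': [1001400, 1001402, 1001404, 1001406, 1001408, 1001410],
    'Point_B': [1001402, 1001404, 1001406, 1001408, 1001410, 1001412],
    'Range': [9.7, 20.2, 16.0, 21.7, 11.1, 15.6],
})
df = df.sort_values(['Point_A', 'Point_B']).reset_index(drop=True)

rows = []
for start in range(len(df)):
    cumulative_range = 0.0
    # Walk forward from this segment, adding each following segment's Range.
    # Assumes each row's Point_B equals the next row's Point_A (one unbroken chain).
    for end in range(start, len(df)):
        cumulative_range += df.at[end, 'Range']
        rows.append({
            'Point_A': df.at[start, 'Point_A'],
            'Point_B': df.at[end, 'Point_B'],
            'Cumulative_Range': cumulative_range,
        })

cumulative_df = pd.DataFrame(rows)
print(cumulative_df.round(1))
```

This is O(n²) in the number of rows, which is unavoidable here since the output itself has n(n+1)/2 rows.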
Solution
I would use networkx. A simple (but not optimized) way is to sum the weights of all possible paths between any two nodes, whenever at least one such path exists.
There are likely more efficient ways to do this.
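import networkx as nx

# Build a directed graph: one edge per row, weighted by Range.
G = nx.DiGraph()
for _, r in df.iterrows():
    G.add_edge(r['Point_A'], r['Point_B'], weight=r['Range'])

# then, for every pair (a, b) joined by at least one path,
# sum the weights of all simple paths from a to b
# (the := operator needs Python 3.8+)
result = [
    (a, b, sum([nx.path_weight(G, path, 'weight') for path in paths]))
    for a in G.nodes() for b in G.nodes()
    if (paths := list(nx.all_simple_paths(G, a, b)))
]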
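Enumerating every simple path can blow up on larger graphs. If the graph is acyclic (a DAG, as in this data), the same quantity, the summed weight of all paths from a to b, can be computed without listing the paths, by dynamic programming over a topological order: for each source, track how many paths reach each node and their total weight; an edge u → v of weight w then adds total[u] + count[u] * w to total[v]. A sketch under that DAG assumption; `all_path_weight_sums` is a hypothetical helper, not a networkx function:

```python
import networkx as nx


def all_path_weight_sums(G, weight='weight'):
    """(a, b, s) for every a != b with at least one path a -> b, where s is the
    summed weight of all a -> b paths. Assumes G is a DAG."""
    order = list(nx.topological_sort(G))  # raises NetworkXUnfeasible on a cycle
    rows = []
    for source in order:
        count = {source: 1}    # number of distinct source -> v paths
        total = {source: 0.0}  # summed weight of those paths
        for u in order:
            if u not in count:  # u is not reachable from source
                continue
            # All predecessors of u come earlier in the order, so count[u]
            # and total[u] are final here and can be pushed to successors.
            for v in G.successors(u):
                w = G[u][v][weight]
                count[v] = count.get(v, 0) + count[u]
                total[v] = total.get(v, 0.0) + total[u] + count[u] * w
        rows.extend((source, v, total[v]) for v in order if v != source and v in count)
    return rows


G = nx.DiGraph()
G.add_weighted_edges_from([
    (1001400, 1001402, 9.7), (1001402, 1001404, 20.2), (1001404, 1001406, 16.0),
    (1001406, 1001408, 21.7), (1001408, 1001410, 11.1), (1001410, 1001412, 15.6),
])
result = all_path_weight_sums(G)
```

On a diamond a → b → d, a → c → d with weights 1, 3 and 2, 4, it gives 10 for (a, d), i.e. (1 + 3) + (2 + 4), the same as summing `path_weight` over `all_simple_paths`.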
>>> result
[(1001400.0, 1001402.0, 9.7),
(1001400.0, 1001404.0, 29.9),
(1001400.0, 1001406.0, 45.9),
(1001400.0, 1001408.0, 67.6),
...
(1001406.0, 1001412.0, 48.4),
(1001408.0, 1001410.0, 11.1),
(1001408.0, 1001412.0, 26.7),
(1001410.0, 1001412.0, 15.6)]
You can of course put that in a new DataFrame if you like:
out = pd.DataFrame(result, columns='Point_A Point_B Cumulative_Range'.split())
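`iterrows` returns each row as a single Series, so the integer Point_A/Point_B values are upcast to float along with Range; that is why the result above shows 1001400.0. If integer labels are wanted in the new DataFrame, one option (a sketch, assuming the same `df` columns) is to build the graph with `itertuples`, which keeps each column's own type:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    'Point_A': [1001400, 1001402, 1001404, 1001406, 1001408, 1001410],
    'Point_B': [1001402, 1001404, 1001406, 1001408, 1001410, 1001412],
    'Range': [9.7, 20.2, 16.0, 21.7, 11.1, 15.6],
})

G = nx.DiGraph()
# itertuples does not upcast across columns, so the node labels stay integers.
for a, b, w in df[['Point_A', 'Point_B', 'Range']].itertuples(index=False):
    G.add_edge(a, b, weight=w)

result = [
    (a, b, sum(nx.path_weight(G, path, 'weight') for path in paths))
    for a in G.nodes() for b in G.nodes()
    # a != b skips self-pairs; keep only pairs joined by at least one path
    if a != b and (paths := list(nx.all_simple_paths(G, a, b)))
]
out = pd.DataFrame(result, columns=['Point_A', 'Point_B', 'Cumulative_Range'])
print(out.dtypes)
```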
Addendum
You haven't specified what should happen if there are multiple paths between two given nodes. In that case, you may be interested in the shortest (weighted) path instead, which you can get with the Floyd-Warshall algorithm:
import numpy as np

result = [
    (a, b, w)
    for a, d in nx.floyd_warshall(G, weight='weight').items()
    for b, w in d.items() if 0 < w < np.inf  # drop self-pairs (0) and unreachable pairs (inf)
]
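Floyd-Warshall does work cubic in the number of nodes. Because all Range values here are non-negative (which Dijkstra's algorithm requires), a possibly cheaper alternative on sparse graphs is `nx.all_pairs_dijkstra_path_length`, which runs Dijkstra from each node and reports only reachable targets, so no infinity filter is needed. The shortcut edge 1001400 → 1001406 below is hypothetical, added only to create a second path:

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    (1001400, 1001402, 9.7), (1001402, 1001404, 20.2), (1001404, 1001406, 16.0),
    (1001406, 1001408, 21.7), (1001408, 1001410, 11.1), (1001410, 1001412, 15.6),
    (1001400, 1001406, 20.0),  # hypothetical shortcut, not in the question's data
])

result = [
    (a, b, w)
    for a, dists in nx.all_pairs_dijkstra_path_length(G, weight='weight')
    for b, w in dists.items()
    if a != b  # each source also reports itself at distance 0
]
```

With the shortcut, 1001400 to 1001406 becomes 20.0 instead of 45.9, while pairs that do not start at 1001400 keep their chain values.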
Answered By - Pierre D