Issue
I have website visitor data that resembles the example below:
id | pages |
---|---|
001 | /ice-cream, /bagels, /bagels/flavors |
002 | /pizza, /pizza/flavors, /pizza/recipe |
I would like to transform it into the tables below, counting the number of times each visitor has viewed a part of my site that deals with specific content. A general count of all pageviews, split on the commas, would be great as well.
id | bagel_count |
---|---|
001 | 2 |
002 | 0 |

id | pizza_count |
---|---|
001 | 0 |
002 | 3 |

id | total_pages_count |
---|---|
001 | 3 |
002 | 3 |
I can do this in either SQL or Python, but I am not sure which is easier, which is why I am asking.
I have referenced the following questions, but they do not serve my purpose:
Count the number of occurrences of a character in a string (this was close, but I am not sure how to apply it to a dataframe)
Solution
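We can split each pages string into its path segments, explode the resulting lists so that every segment gets its own row, and then get the counts per id with crosstab: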
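import pandas as pd
# Split on '/', ',' and ' ' so each path segment becomes a list element
df['pages'] = df.pages.str.split(r'[/, ]')
# One row per (id, segment) pair
s = df.explode('pages')
# Count segments per id; drop the empty strings left between adjacent delimiters
out = pd.crosstab(s['id'], s['pages']).drop('', axis=1)
out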
Out[427]:
pages  bagels  flavors  ice-cream  pizza  recipe
id
1           2        1          1      0       0
2           0        1          0      3       1
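Note that the crosstab above counts every path segment, so flavors mixes bagel and pizza pages and there is no overall total. If you want the per-section and total counts from the question, a minimal sketch could look like the following. It assumes that the first path segment identifies the section and that pages are separated by commas; the sample frame and the page, section, and total_pages_count names are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the question; ids are kept as strings to preserve leading zeros
df = pd.DataFrame({
    "id": ["001", "002"],
    "pages": [
        "/ice-cream, /bagels, /bagels/flavors",
        "/pizza, /pizza/flavors, /pizza/recipe",
    ],
})

# One row per visited page: split on commas, explode, then strip whitespace
pages = df.assign(page=df["pages"].str.split(",")).explode("page")
pages["page"] = pages["page"].str.strip()

# Top-level section of each path, e.g. "/bagels/flavors" -> "bagels"
pages["section"] = pages["page"].str.split("/").str[1]

# Pageviews per id and section, plus a total across all sections
counts = pd.crosstab(pages["id"], pages["section"])
counts["total_pages_count"] = counts.sum(axis=1)
print(counts)
```

Each section column can then be selected or renamed as needed, for example counts["bagels"] for the bagel count.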
Answered By - BENY