Issue
I have a CSV file with the columns "name, place, thing". The thing column often contains "word\nanotherword\nanotherword\n". I'm trying to figure out how to parse this out into individual lines instead of multiline entries in a single column, i.e.:
name, place, word
name, place, anotherword
name, place, anotherword
I'm certain this is simple, but I'm having a hard time grasping what I need to do.
Solution
Without going into the code, what you want to do is check whether there are any newline characters in your 'thing'. If there are, split the value on them. This gives you a list of tokens (the lines in the 'thing'), and since this is essentially an inner loop, you can reuse the original name and place along with each new thing_token. A generator function lends itself well to this.
This brings me to kroolik's answer. However, there's a slight error in kroolik's answer: if you want to go with the column_wrapper generator, you need to account for the fact that the newlines in this data are not real line breaks but the two characters backslash and n. The csv reader passes them through unchanged, so in Python they have to be matched as '\\n' instead of '\n'. Also, you need to check for blank 'things'.
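def column_wrapper(reader):
    for name, place, thing in reader:
        # Split on the literal backslash-n sequence, not on a real newline.
        for split_thing in thing.strip().split('\\n'):
            # Skip the empty strings left behind by trailing separators.
            if split_thing:
                yield name, place, split_thing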
Then you can obtain the data like this:
with open('filewithdata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    data = [[name, place, thing] for name, place, thing in column_wrapper(reader)]
Or, without column_wrapper:
data = []
with open('filewithdata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        name, place, thing = row
        # A thing without any backslash-n simply yields a single item.
        for item in thing.strip().split('\\n'):
            if item:  # skip empty strings left by trailing separators
                data.append([name, place, item])
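Which separator to split on depends on what is actually in the file, so it is worth checking before picking '\\n' or '\n'. A rough sketch of such a check, using made-up in-memory data rather than the asker's file: print the repr of the field and see whether it holds a literal backslash followed by n or a real line break inside a quoted field. The helper name split_things and the sample rows are hypothetical, and normalising both forms to real newlines is just one possible way to handle both cases at once.

```python
import csv
import io

# Hypothetical sample data: row 1 stores the two characters backslash + n,
# row 2 stores real line breaks inside a quoted field.
sample = (
    'alice,paris,word\\nanotherword\\n\n'
    'bob,rome,"word\nanotherword\n"\n'
)

def split_things(reader):
    """Yield one (name, place, thing) tuple per non-empty token."""
    for name, place, thing in reader:
        # Normalise literal backslash-n to a real newline, then split on lines.
        for token in thing.replace('\\n', '\n').splitlines():
            if token.strip():
                yield name, place, token.strip()

rows = list(csv.reader(io.StringIO(sample, newline='')))
print(repr(rows[0][2]))  # 'word\\nanotherword\\n'  (literal backslash + n)
print(repr(rows[1][2]))  # 'word\nanotherword\n'    (real line breaks)

result = list(split_things(rows))
for row in result:
    print(row)
```

If the repr shows a doubled backslash, the '\\n' split used in this answer applies; if it shows a single one, the field contains real newlines and '\n' (or splitlines) is the right separator.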
I recommend using column_wrapper, as generators are more generic and Pythonic.
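One practical consequence of the generator being generic is that its output can be streamed straight into csv.writer without building a list first. A minimal sketch, assuming the literal backslash-n separator discussed in this answer and using in-memory buffers in place of real files (the sample row is made up):

```python
import csv
import io

def column_wrapper(reader):
    # Same logic as the generator in this answer: one row per non-empty token.
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\\n'):
            if split_thing:
                yield name, place, split_thing

# Made-up input; the thing column holds literal backslash-n separators.
source = io.StringIO('name,place,word\\nanotherword\\nanotherword\\n\n')
target = io.StringIO()

writer = csv.writer(target, lineterminator='\n')
# writerows accepts any iterable of rows, so the generator is consumed lazily.
writer.writerows(column_wrapper(csv.reader(source)))

output = target.getvalue()
print(output)
```

With real files, the two StringIO objects would be replaced by files opened with newline='', as the csv module documentation recommends.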
Be sure to add import csv to the top of your file (although I'm sure you knew that already). Hope that helps!
Answered By - jcomo