Thursday, March 17, 2022

[FIXED] How to parse a csv with python, when one column has multiple lines

March 17, 2022 · csv, python

Issue

I have a CSV file with the columns "name, place, thing". The thing column often contains "word\nanotherword\nanotherword\n". I'm trying to figure out how to split this into individual rows instead of multiline entries in a single column, i.e.:

name, place, word

name, place, anotherword

name, place, anotherword

I'm certain this is simple, but I'm having a hard time grasping what I need to do.

Solution

Without going into the code, what you essentially want to do is check whether your 'thing' contains any newline characters. If it does, split it on them. This gives you a list of tokens (the lines in the 'thing'), and since this is essentially an inner loop, you can pair the original name and place with each new thing_token. A generator function lends itself well to this.
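A minimal sketch of that inner loop, assuming the thing column holds real line breaks inside a quoted field (which csv.reader returns as genuine '\n' characters). The function name expand_rows and the sample row are hypothetical, for illustration only:

```python
import csv
import io


def expand_rows(reader):
    """Yield one (name, place, line) tuple per non-empty line of the thing column.

    Assumes every row has exactly three columns and that the line breaks in
    the thing column are real newline characters (a quoted multiline field).
    """
    for name, place, thing in reader:
        for line in thing.splitlines():
            if line.strip():  # skip blank lines, e.g. from a trailing newline
                yield name, place, line


# Hypothetical sample: one quoted field spanning several physical lines.
sample = 'alice,paris,"word\nanotherword\nanotherword\n"\n'

# StringIO stands in for a real file opened with newline=''.
rows = list(expand_rows(csv.reader(io.StringIO(sample, newline=''))))
for row in rows:
    print(row)
```

Using str.splitlines() rather than split('\n') means a trailing newline does not leave an empty piece behind, and Windows-style '\r\n' breaks are handled too.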

This brings me to kroolik's answer. However, there's a slight error in it:

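If you want to go with the column_wrapper generator, you need to account for how the newlines are actually stored. The csv reader does not escape anything itself, but if your file contains the two characters backslash and n (as "word\nanotherword" in the question suggests), the resulting Python string holds '\\n' rather than a real newline '\n', so that is what you have to split on. You also need to skip blank 'things', such as the empty piece left by a trailing \n.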

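def column_wrapper(reader):
    # Yield one (name, place, piece) row per non-empty piece of the thing column.
    for name, place, thing in reader:
        # Split on the literal two-character sequence backslash-n.
        for split_thing in thing.strip().split('\\n'):
            if split_thing:  # skip empty pieces, e.g. after a trailing \n
                yield name, place, split_thing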

Then you can obtain the data like this:

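import csv

# newline='' is what the csv module docs recommend when opening CSV files.
with open('filewithdata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    data = [[name, place, thing] for name, place, thing in column_wrapper(reader)]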

OR (without column_wrapper):

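import csv

data = []
with open('filewithdata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for name, place, thing in reader:
        # A field without a literal \n splits into a single item,
        # so such rows are kept instead of silently dropped.
        for item in thing.strip().split('\\n'):
            if item:  # skip empty pieces, e.g. after a trailing \n
                data.append([name, place, item])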
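If you are unsure which form your file uses, printing repr() of the parsed field settles it. This sketch uses two hypothetical in-memory rows to show that csv.reader leaves a literal backslash-n as two characters, while a quoted multiline field comes back with a real newline:

```python
import csv
import io

# Hypothetical rows: the first stores a literal backslash-n (two characters),
# the second is a quoted field with a real line break inside it.
literal = 'alice,paris,word\\nanotherword\n'
quoted = 'bob,rome,"word\nanotherword"\n'

literal_thing = next(csv.reader(io.StringIO(literal)))[2]
quoted_thing = next(csv.reader(io.StringIO(quoted)))[2]

print(repr(literal_thing))  # 'word\\nanotherword'
print(repr(quoted_thing))   # 'word\nanotherword'
```

If the repr shows a doubled backslash, split on '\\n' as above; if it shows a single '\n', split on real newlines instead.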

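I recommend using column_wrapper: as a generator, it keeps the splitting logic separate from the file handling and yields rows one at a time, so the same function can feed a list, a loop or a csv.writer.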
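Since the goal is one row per line of the thing column, the generated rows can go straight into csv.writer. A sketch, assuming the literal backslash-n form; expand_csv is a hypothetical helper, and in-memory buffers stand in for the real input and output files:

```python
import csv
import io


def column_wrapper(reader):
    # Same generator as above: split on the literal two-character sequence \n.
    for name, place, thing in reader:
        for split_thing in thing.strip().split('\\n'):
            if split_thing:
                yield name, place, split_thing


def expand_csv(src, dst):
    """Copy rows from src to dst, one output row per piece of the thing column."""
    writer = csv.writer(dst)
    writer.writerows(column_wrapper(csv.reader(src)))


# Hypothetical sample data in place of filewithdata.csv and an output file.
src = io.StringIO('alice,paris,word\\nanotherword\\nanotherword\\n\n')
dst = io.StringIO(newline='')
expand_csv(src, dst)
print(dst.getvalue())
```

With real files, open both with newline='' as the csv documentation recommends, e.g. open('filewithdata.csv', newline='') for reading and an output file opened with 'w' and newline=''. Note that csv.writer ends rows with '\r\n' by default.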

Be sure to add import csv to the top of your file (although I'm sure you knew that already). Hope that helps!

Answered By - jcomo

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
