Issue
I'd like to create a dictionary in Python where the key is the HTML tag name and the value is the number of times the tag appears. Is there a way to do this with Beautiful Soup or something else?
Solution
With BeautifulSoup you can search for all tags by omitting the search criteria:
# print all tags (soup is an already-parsed BeautifulSoup object)
for tag in soup.find_all():
    print(tag.name)  # TODO: add/update dict
If you're only interested in the number of occurrences, BeautifulSoup may be overkill. In that case you could use the standard library's HTMLParser instead:
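from html.parser import HTMLParser  # Python 3; the Python 2 module was named HTMLParser
class print_tags(HTMLParser):
    # called once for every opening tag the parser encounters
    def handle_starttag(self, tag, attrs):
        print(tag)  # TODO: add/update dict
parser = print_tags()
parser.feed(html)  # html is a string holding the markup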
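For well-formed markup this prints the same tag names as the BeautifulSoup loop. Some BeautifulSoup parsers, such as lxml or html5lib, insert missing html, head, or body tags, so the output can differ on fragments.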
To create the dictionary of { 'tag': count } you could use collections.defaultdict:
from collections import defaultdict
occurrences = defaultdict(int)
# ...
occurrences[tag_name] += 1
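Putting the pieces together, here is a minimal sketch of the HTMLParser approach with the defaultdict filled in. It assumes Python 3. The names TagCounter, count_tags and sample are made up for illustration and are not part of any library.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.occurrences = defaultdict(int)

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already lowercased.
        self.occurrences[tag] += 1


def count_tags(html_text):
    """Return a plain dict mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.occurrences)


# Made-up sample markup, only for demonstration.
sample = "<html><body><p>one</p><p>two</p><a href='#'>link</a></body></html>"
print(count_tags(sample))  # {'html': 1, 'body': 1, 'p': 2, 'a': 1}
```

Only opening tags are counted, so each element is counted once. Self-closing tags such as <br/> are included too, because HTMLParser routes them through handle_starttag by default.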
Answered By - Anonymous Coward