Issue
I have a metadata file that looks like this:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:title>Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review &amp; Tools</dc:title>
<dc:creator opf:file-as="Princeton Review, The" opf:role="aut">The Princeton Review</dc:creator>
<dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
<dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
<dc:identifier opf:scheme="GOODREADS">63139948</dc:identifier>
<dc:identifier opf:scheme="GOOGLE">o6i4EAAAQBAJ</dc:identifier>
</metadata>
</package>
I know how to use BeautifulSoup to extract fields like <dc:title>. I'm struggling to extract only the ISBN field (<dc:identifier opf:scheme="ISBN">).
from bs4 import BeautifulSoup
with open('metadata.opf', 'r') as f:
    file = f.read()
metadata = BeautifulSoup(file, 'xml')
title = metadata.find('dc:title')
print(title.text)
author = metadata.find('dc:creator')
print(author.text)
# isbn = metadata.find_all('dc:identifier')  # This finds 4 fields, as expected.
How do I limit it? I can't depend on the order of the fields, and the ISBN length can vary.
Solution
According to the documentation, the find method accepts an attrs argument. Passing the attribute name and value there should select only the ISBN identifier:
isbn = metadata.find('dc:identifier', attrs={"opf:scheme": "ISBN"})
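Note that find returns None when no element matches, so a file without an ISBN entry would make a later isbn.text fail. A minimal sketch of a guarded lookup follows; the helper name find_identifier and the inlined SAMPLE string are assumptions made for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Two identifiers from the question's file, inlined so the sketch runs on its own.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="GOODREADS">63139948</dc:identifier>
<dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
</metadata>
</package>"""


def find_identifier(soup, scheme):
    """Return the text of the dc:identifier with this opf:scheme, or None."""
    # Hypothetical helper: wraps the same find(..., attrs=...) call shown above.
    tag = soup.find("dc:identifier", attrs={"opf:scheme": scheme})
    # find() returns None when nothing matches, so guard before reading the text.
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE, "xml")  # the "xml" parser requires lxml to be installed
print(find_identifier(soup, "ISBN"))
print(find_identifier(soup, "DOI"))  # no such scheme in SAMPLE
```

Because the lookup is keyed on the opf:scheme value, it does not depend on the order of the identifier elements or on the length of the ISBN.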
So the full script could be written like this:
from bs4 import BeautifulSoup
with open('metadata.opf', 'r') as f:
    file = f.read()
metadata = BeautifulSoup(file, 'xml')
title = metadata.find('dc:title')
print(title.text)
author = metadata.find('dc:creator')
print(author.text)
isbn = metadata.find('dc:identifier', attrs={"opf:scheme": "ISBN"})  # This finds only the ISBN identifier.
print(isbn.text)
and should print:
Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review & Tools
The Princeton Review
9780593516874
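BeautifulSoup's 'xml' parser depends on lxml. If that is not installed, a similar lookup could be sketched with the standard library's xml.etree.ElementTree instead. This is a swapped-in alternative, not what the answer above uses; the function name find_isbn and the inlined SAMPLE string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the metadata file in the question.
DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# Two identifiers from the question's file, inlined so the sketch runs on its own.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="GOODREADS">63139948</dc:identifier>
<dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
</metadata>
</package>"""


def find_isbn(opf_xml):
    """Return the ISBN text from an OPF document string, or None if absent."""
    root = ET.fromstring(opf_xml)
    # ElementTree spells namespaced names as "{namespace-uri}localname".
    scheme_attr = f"{{{OPF_NS}}}scheme"
    for ident in root.iter(f"{{{DC_NS}}}identifier"):
        if ident.get(scheme_attr) == "ISBN":
            return (ident.text or "").strip()
    return None


print(find_isbn(SAMPLE))
```

As with the attrs lookup, the match is made on the scheme attribute, so the position of the ISBN element in the file does not matter.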
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find
Answered By - RQussous