Issue
I'd like to test that some (de)serialization code is 100% symmetrical but I've found that, when serializing my data as a nested Tag, the inner Tags don't have newline characters between them like the original Tag did.
Is there some way to force newline characters between nested Tags without having to prettify -> re-parse?
Example:
import bs4
from bs4 import BeautifulSoup
tag_string = "<OUTER>\n<INNER/>\n</OUTER>"
tag = BeautifulSoup(tag_string, "xml").find("OUTER")
print(tag)
... <OUTER>
... <INNER/>
... </OUTER>
new_tag = bs4.Tag(name="OUTER")
new_tag.append(bs4.Tag(name="INNER", can_be_empty_element=True))
tag == new_tag
... False
print(new_tag)
... <OUTER><INNER/></OUTER>
In order to get a 100% symmetrical serialization, I have to:
- prettify the Tag into a string
- replace "\n " with "\n"
- parse this string as a new XML Tag
- find the OUTER Tag
new_tag = BeautifulSoup(
new_tag.prettify().replace("\n ", "\n"), "xml"
).find("OUTER")
tag == new_tag
... True
print(new_tag)
... <OUTER>
... <INNER/>
... </OUTER>
Justification for this need:
- I'm deserializing thousands of tags into objects using some test data specific to me
- as I developed this feature, I found that there were many mismatches between input / output for several of the attributes
- mismatches are not acceptable because they cause the closed-source binary that reads this data to perform operations that should be avoided unless there's a genuine difference between that program's database and the entries in this XML
- I was able to handle these mismatches for my test data, but I am not able to guarantee (de)serialization symmetry for all users of this code
- as a result, I think it's best if the test suite enumerates the tags in the users' data and asserts that input == output; to facilitate this, I've had to use this rather annoying workaround (prettify -> re-parse), which I'd like to avoid if possible
Solution
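BeautifulSoup parses the whitespace between Tags into NavigableString children, which is why a Tag built without them compares unequal to the parsed one. Appending NavigableString("\n") elements yourself reproduces the newline characters in the XML structure without having to convert to a string and parse back into a bs4.element.Tag.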
Example:
import bs4
from bs4 import BeautifulSoup
tag_string = "<OUTER>\n<INNER/>\n</OUTER>"
tag = BeautifulSoup(tag_string, "xml").find("OUTER")
new_tag = bs4.Tag(name="OUTER")
new_tag.append(bs4.NavigableString("\n"))
new_tag.append(bs4.Tag(name="INNER", can_be_empty_element=True))
new_tag.append(bs4.NavigableString("\n"))
tag == new_tag
... True
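When there are many children, the same idea can be wrapped in a small helper. This is only a sketch: append_with_newlines is a hypothetical name rather than part of bs4, and it assumes every child should be separated by exactly one "\n", as in the example above.

```python
import bs4


def append_with_newlines(parent, children):
    """Append each child to parent with a "\n" string before the first
    child and after every child (hypothetical helper, not a bs4 API)."""
    parent.append(bs4.NavigableString("\n"))
    for child in children:
        parent.append(child)
        parent.append(bs4.NavigableString("\n"))
    return parent


outer = bs4.Tag(name="OUTER")
append_with_newlines(
    outer,
    [bs4.Tag(name="INNER", can_be_empty_element=True) for _ in range(2)],
)

# Serialize the Tag; each INNER should now sit on its own line.
serialized = str(outer)
print(serialized)
```

If the original data uses other whitespace between Tags (indentation, blank lines), the separator would have to match it exactly for the comparison to hold.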
Answered By - aweeeezy