Issue
I'm trying to extract all of the direct text inside a td tag, but I'm only able to get the first part of the text with this code:
td_tag = driver.find_element(By.ID, "td_id")
driver.execute_script('return arguments[0].firstChild;', td_tag)['textContent']
Here is my DOM.
<td id="td_id">
    <p>Name</p>
    <div>
        <span>agdsf</span>
    </div>
    John Smith
    <span>dfsdf</span>
    Address:
    <br>
    NewYork
</td>
What I expect here is Name John Smith Address: NewYork
Solution
To extract all direct text inside the td tag, meaning the text that is not enclosed in any child tag, use JavaScript execution through Selenium to walk the child nodes of the td element and concatenate the text content of those that are text nodes. The updated code is:
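from selenium.webdriver.common.by import By

# `driver` is the already-created WebDriver instance from the question.
td_tag = driver.find_element(By.ID, "td_id")
all_text = driver.execute_script("""
    var node = arguments[0];
    var text = '';
    for (var child = node.firstChild; child; child = child.nextSibling) {
        // Keep only text nodes; element children such as <p> and <span> are skipped.
        if (child.nodeType === Node.TEXT_NODE) {
            text += child.textContent.trim() + ' ';
        }
    }
    return text.trim();
""", td_tag)
print(all_text)  # This should print: "John Smith Address: NewYork"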
The script above iterates through all child nodes of the td element. It checks whether each child is a text node (using child.nodeType === Node.TEXT_NODE) and, if so, appends its text content to the resulting string. Text inside child elements, such as Name in the p tag, is skipped because it is not a direct text node of the td. The trim call inside the loop removes the whitespace and line breaks around each piece of text, and the trim call at the end removes the extra space left at either end of the combined string.
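One caveat with the loop described above: it appends a space even for text nodes that contain only whitespace, so a whitespace-only node sitting between two pieces of text would leave a doubled space in the result. A minimal sketch of a variant that avoids this is shown below, assuming it is acceptable to return the raw text nodes to Python and clean them up there. The names direct_text and _DIRECT_TEXT_JS are hypothetical and not part of Selenium; the only Selenium call relied on is execute_script.

```python
# JavaScript that returns the raw text of each direct text-node child.
# Node.TEXT_NODE is the standard DOM constant for text nodes.
_DIRECT_TEXT_JS = """
var parts = [];
for (var child = arguments[0].firstChild; child; child = child.nextSibling) {
    if (child.nodeType === Node.TEXT_NODE) {
        parts.push(child.textContent);
    }
}
return parts;
"""


def direct_text(driver, element):
    """Join the non-empty direct text nodes of `element` with single spaces.

    `direct_text` is a hypothetical helper name. `driver` is assumed to be
    any object exposing Selenium's execute_script(script, *args) method.
    """
    raw_parts = driver.execute_script(_DIRECT_TEXT_JS, element)
    # Strip each piece and drop whitespace-only nodes (the line breaks
    # between child tags) so they cannot produce doubled spaces.
    return " ".join(part.strip() for part in raw_parts if part.strip())
```

With the td from the question, it would be called as direct_text(driver, driver.find_element(By.ID, "td_id")). Doing the cleanup in Python keeps the JavaScript short and makes the joining logic easy to check without a browser.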
Answered By - Bilesh Ganguly