Issue
Say I have a string:
string = '<img src="image.png"><input type=text>'
I have a function which turns the string into HTML markup and removes all tags except <img>, like so:
VALID_TAGS = ['img']

def sanitizeHTML(value):
    soup = BeautifulSoup(value)
    for tag in soup.findAll(True):
        if tag.name not in VALID_TAGS:
            tag.hidden = True
    return Markup(soup.renderContents())
If I pass the string through the function, it returns <img src="image.png">, as that is the only valid HTML tag.
As you can see, the <input> tag doesn't appear in the output at all. How would I keep '<input type=text>' in the string but NOT render it, so that it appears as text and not as HTML? Thanks.
Solution
For this, I would use the bleach module (see its documentation).
Bleach sanitizes your HTML: tags on your allowed list are kept as markup, and the "unsafe" tags are HTML-escaped so they show up as text instead of being rendered.
Here's a sample program illustrating how you might use bleach:
#!/usr/bin/env python
from bs4 import BeautifulSoup
import bleach

VALID_TAGS = ['img']
VALID_ATTRIBUTES = ['src']

def sanitizeHTML(value):
    # bleach.clean() keeps the allowed tags and attributes;
    # disallowed tags are HTML-escaped rather than removed
    cleaned = bleach.clean(value, tags=VALID_TAGS, attributes=VALID_ATTRIBUTES)
    soup = BeautifulSoup(cleaned, "html5lib")
    return soup.renderContents()

string = '<img src="image.png"><input type=text>'
result = sanitizeHTML(string)
print(result)
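To see what "escape instead of strip" means without installing anything, here is a minimal sketch that uses only the Python 3 standard library. It is not how bleach works internally and it is not a replacement for it (it does not validate URLs in src, for example). The names EscapingFilter, escape_disallowed, ALLOWED_TAGS and ALLOWED_ATTRS are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"img"}    # hypothetical allow-list, mirrors VALID_TAGS
ALLOWED_ATTRS = {"src"}   # hypothetical allow-list, mirrors VALID_ATTRIBUTES


class EscapingFilter(HTMLParser):
    """Keep allowed tags as markup; emit every other tag as escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Rebuild the tag with only the allowed attributes.
            kept = "".join(
                ' {}="{}"'.format(name, escape(value or "", quote=True))
                for name, value in attrs
                if name in ALLOWED_ATTRS
            )
            self.parts.append("<{}{}>".format(tag, kept))
        else:
            # get_starttag_text() returns the tag exactly as written in the
            # source; escaping it makes it display as text.
            self.parts.append(escape(self.get_starttag_text(), quote=False))

    def handle_startendtag(self, tag, attrs):
        # Treat self-closing tags such as <img/> like ordinary start tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        closing = "</{}>".format(tag)
        if tag in ALLOWED_TAGS:
            self.parts.append(closing)
        else:
            self.parts.append(escape(closing, quote=False))

    def handle_data(self, data):
        # Plain text is escaped as well so it cannot introduce markup.
        self.parts.append(escape(data, quote=False))


def escape_disallowed(value):
    parser = EscapingFilter()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(escape_disallowed('<img src="image.png"><input type=text>'))
# <img src="image.png">&lt;input type=text&gt;
```

The <img> tag survives as markup, while <input type=text> comes back with its angle brackets escaped, so a browser shows it as literal text. For real input, prefer bleach as in the answer above.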
Answered By - Matt Healy