java.lang.Object
   ↳ org.jsoup.safety.Whitelist
Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed.
Start with one of the default whitelists described under Public Methods. If you need to allow more through (please be careful!), tweak a base whitelist with:
addTags(String...)
addAttributes(String, String...)
addEnforcedAttribute(String, String, String)
addProtocols(String, String, String...)
The cleaner and these whitelists assume that you want to clean a body fragment of HTML (to add user-supplied HTML into a templated page), not a full HTML document. If you need a full document, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate.
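A minimal sketch of the first option above: a fixed document template is wrapped around body HTML that is assumed to have already been cleaned. `DocumentTemplate`, `wrap` and `escapeText` are hypothetical names, not part of jsoup. The title is escaped because, unlike the body, it never passed through the cleaner.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of jsoup): wraps an already-cleaned body
 * fragment in a fixed document template, so the whitelist itself never
 * needs to allow html or head elements.
 */
final class DocumentTemplate {
    private DocumentTemplate() {}

    /** Escapes plain text for use inside an element such as title. */
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Builds a full document around cleanedBody. The body is assumed to be
     * the output of the cleaner and is inserted verbatim.
     */
    static String wrap(String title, String cleanedBody) {
        Objects.requireNonNull(cleanedBody, "cleanedBody");
        return "<!DOCTYPE html>\n<html><head><title>"
                + escapeText(title)
                + "</title></head>\n<body>"
                + cleanedBody
                + "</body></html>";
    }
}
```

Only the template is trusted here. Anything user-supplied either goes through the cleaner (the body) or through escaping (the title).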
If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to
XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See
http://ha.ckers.org/xss.html for some XSS attack examples.
Public Constructors

Constructor | Description
---|---
Whitelist() | Create a new, empty whitelist.
Public Methods

Method | Description
---|---
addAttributes(String, String...) | Add a list of allowed attributes to a tag.
addEnforcedAttribute(String, String, String) | Add an enforced attribute to a tag.
addProtocols(String, String, String...) | Add allowed URL protocols for an element's URL attribute.
addTags(String...) | Add a list of allowed elements to a whitelist.

Default whitelists:

This whitelist allows a fuller range of text tags: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.

This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.

This whitelist allows only text nodes: all HTML will be stripped.

This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul. Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

This whitelist allows only simple text formatting: b, em, i, strong, u.
Inherited Methods

From class java.lang.Object
Whitelist()
Create a new, empty whitelist. Generally it is better to start with one of the prepared default whitelists instead.
addAttributes(String tag, String... keys)
Add a list of allowed attributes to a tag. (If an attribute is not allowed on an element, it will be removed.)
To make an attribute valid for all tags, use the pseudo tag :all, e.g. addAttributes(":all", "class").

Parameter | Description
---|---
tag | The tag the attributes are for
keys | List of valid attributes for the tag
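A minimal model of the per-tag attribute check described above, including the `:all` pseudo tag. `AttributeAllowlist` and `isAllowed` are hypothetical names, not jsoup's implementation. Matching tag and attribute names case-insensitively is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model (not jsoup's code) of a per-tag attribute allowlist
 * with an ":all" pseudo tag whose attributes are valid on every element.
 */
final class AttributeAllowlist {
    static final String ALL = ":all";

    // tag name -> attribute keys allowed on that tag
    private final Map<String, Set<String>> allowed = new HashMap<>();

    /** Registers keys as allowed on tag; returns this so calls can be chained. */
    AttributeAllowlist addAttributes(String tag, String... keys) {
        Set<String> set = allowed.computeIfAbsent(tag.toLowerCase(Locale.ROOT), t -> new HashSet<>());
        for (String key : keys) {
            set.add(key.toLowerCase(Locale.ROOT));
        }
        return this;
    }

    /** An attribute survives if it is allowed on this tag or on ":all". */
    boolean isAllowed(String tag, String key) {
        String k = key.toLowerCase(Locale.ROOT);
        Set<String> empty = Collections.emptySet();
        return allowed.getOrDefault(tag.toLowerCase(Locale.ROOT), empty).contains(k)
                || allowed.getOrDefault(ALL, empty).contains(k);
    }
}
```

Keeping `:all` as its own entry means a global attribute costs one extra lookup instead of being copied into every tag's set.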
addEnforcedAttribute(String tag, String key, String value)
Add an enforced attribute to a tag. An enforced attribute will always be added to the element. If the element already has the attribute set, it will be overridden.
E.g.: addEnforcedAttribute("a", "rel", "nofollow") will make all a tags output as <a href="..." rel="nofollow">

Parameter | Description
---|---
tag | The tag the enforced attribute is for
key | The attribute key
value | The enforced attribute value
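A sketch of the override rule described above: enforced values are written after the element's own attributes, so an existing `rel` value is replaced. `EnforcedAttributes` and `apply` are hypothetical names, and plain maps stand in for real element attributes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model (not jsoup's code) of enforced attributes: every
 * enforced key/value registered for a tag is written onto the element,
 * overriding any value the input already had.
 */
final class EnforcedAttributes {
    // tag -> (attribute key -> enforced value)
    private final Map<String, Map<String, String>> enforced = new HashMap<>();

    EnforcedAttributes addEnforcedAttribute(String tag, String key, String value) {
        enforced.computeIfAbsent(tag, t -> new LinkedHashMap<>()).put(key, value);
        return this;
    }

    /** Returns a copy of attrs with the enforced attributes for tag applied. */
    Map<String, String> apply(String tag, Map<String, String> attrs) {
        Map<String, String> out = new LinkedHashMap<>(attrs);
        // putAll overwrites existing keys, which is the "overridden" behavior
        out.putAll(enforced.getOrDefault(tag, Collections.emptyMap()));
        return out;
    }
}
```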
addProtocols(String tag, String key, String... protocols)
Add allowed URL protocols for an element's URL attribute. This restricts the possible values of the attribute to URLs with one of the defined protocols.
E.g.: addProtocols("a", "href", "ftp", "http", "https")

Parameter | Description
---|---
tag | Tag the URL protocol is for
key | Attribute key
protocols | List of valid protocols
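A sketch of a protocol restriction for one tag/attribute pair, using the example above. `ProtocolAllowlist` and `isValid` are hypothetical names, not jsoup's API. The sketch trims surrounding whitespace and compares the scheme case-insensitively, because values such as `" JavaScript:..."` are the kind of URL-attribute XSS vector the class overview warns about. It also rejects relative URLs, which a real cleaner may handle differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model (not jsoup's implementation) of URL protocol
 * restrictions on a tag's URL attribute, e.g. a/href.
 */
final class ProtocolAllowlist {
    // "tag attr" -> allowed protocols, lowercase, without the ':'
    private final Map<String, Set<String>> protocols = new HashMap<>();

    ProtocolAllowlist addProtocols(String tag, String key, String... allowed) {
        Set<String> set = protocols.computeIfAbsent(tag + " " + key, k -> new HashSet<>());
        for (String p : allowed) {
            set.add(p.toLowerCase(Locale.ROOT));
        }
        return this;
    }

    /**
     * True if value starts with an allowed protocol followed by ':'.
     * Whitespace around the value is trimmed and case is ignored.
     */
    boolean isValid(String tag, String key, String value) {
        Set<String> set = protocols.get(tag + " " + key);
        if (set == null) {
            return true; // no protocols registered: this sketch applies no restriction
        }
        String v = value.trim().toLowerCase(Locale.ROOT);
        int colon = v.indexOf(':');
        return colon > 0 && set.contains(v.substring(0, colon));
    }
}
```

Checking against an allowlist of schemes, rather than a blocklist of known-bad ones, means obfuscated schemes simply fail to match.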
addTags(String... tags)
Add a list of allowed elements to a whitelist. (If a tag is not allowed, it will be removed from the HTML.)

Parameter | Description
---|---
tags | Tag names to allow
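A toy model of tag allowlisting over a small element tree. `SketchNode` and `TagAllowlist` are hypothetical classes, not jsoup types. The text above does not say what happens to the content of a removed tag. This sketch drops the disallowed element but keeps its cleaned children, an assumption made here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical minimal element tree; text nodes have tag == null. */
final class SketchNode {
    final String tag;   // null for a text node
    final String text;  // only used by text nodes
    final List<SketchNode> children = new ArrayList<>();

    private SketchNode(String tag, String text) { this.tag = tag; this.text = text; }

    static SketchNode element(String tag, SketchNode... kids) {
        SketchNode n = new SketchNode(tag, null);
        n.children.addAll(Arrays.asList(kids));
        return n;
    }

    static SketchNode text(String text) { return new SketchNode(null, text); }

    /** Serializes the tree without escaping (demonstration only). */
    String render() {
        if (tag == null) return text;
        StringBuilder sb = new StringBuilder("<").append(tag).append('>');
        for (SketchNode c : children) sb.append(c.render());
        return sb.append("</").append(tag).append('>').toString();
    }
}

/** Hypothetical tag allowlist in the spirit of addTags(String...). */
final class TagAllowlist {
    private final Set<String> tags = new HashSet<>();

    TagAllowlist addTags(String... names) {
        for (String n : names) tags.add(n.toLowerCase(Locale.ROOT));
        return this;
    }

    /**
     * Cleans node recursively: text is kept, allowed elements are copied,
     * disallowed elements are dropped but their cleaned children are kept.
     */
    List<SketchNode> clean(SketchNode node) {
        List<SketchNode> out = new ArrayList<>();
        if (node.tag == null) {
            out.add(SketchNode.text(node.text));
            return out;
        }
        List<SketchNode> kids = new ArrayList<>();
        for (SketchNode c : node.children) kids.addAll(clean(c));
        if (tags.contains(node.tag.toLowerCase(Locale.ROOT))) {
            SketchNode copy = SketchNode.element(node.tag);
            copy.children.addAll(kids);
            out.add(copy);
        } else {
            out.addAll(kids); // unwrap: drop the tag, keep its cleaned content
        }
        return out;
    }
}
```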
This whitelist allows a fuller range of text tags: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.
Links (a elements) can point to http, https, ftp, or mailto URLs, and have an enforced rel=nofollow attribute.
Does not allow images.

This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.
This whitelist allows only text nodes: all HTML will be stripped.
This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.
Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

This whitelist allows only simple text formatting: b, em, i, strong, u. All other HTML (tags and attributes) will be removed.
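The tag lists given on this page for the default whitelists nest inside one another. Simple text formatting is a subset of basic, basic with images adds img, and the relaxed list covers all of those. The sketch below copies the lists as plain data so that nesting can be checked. `PresetTags` is a hypothetical class, and attributes, protocols and enforced attributes are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical data-only view of the default whitelists' tag lists as
 * given on this page. Attributes, protocols and enforced attributes
 * are not modeled.
 */
final class PresetTags {
    private PresetTags() {}

    private static Set<String> of(String... tags) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(tags)));
    }

    private static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> s = new HashSet<>(a);
        s.addAll(b);
        return Collections.unmodifiableSet(s);
    }

    // "allows only text nodes": no tags at all
    static final Set<String> NONE = Collections.emptySet();

    static final Set<String> SIMPLE_TEXT = of("b", "em", "i", "strong", "u");

    static final Set<String> BASIC = of(
            "a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em", "i", "li",
            "ol", "p", "pre", "q", "small", "strike", "strong", "sub", "sup", "u", "ul");

    // same text tags as basic, plus img
    static final Set<String> BASIC_WITH_IMAGES = union(BASIC, of("img"));

    static final Set<String> RELAXED = of(
            "a", "b", "blockquote", "br", "caption", "cite", "code", "col", "colgroup",
            "dd", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6", "i", "img", "li",
            "ol", "p", "pre", "q", "small", "strike", "strong", "sub", "sup", "table",
            "tbody", "td", "tfoot", "th", "thead", "tr", "u", "ul");
}
```

Because each list contains the previous one, a reasonable habit is to start from the smallest list that covers the tags you need, in line with the advice to extend a whitelist only with care.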