public class Whitelist
extends Object

java.lang.Object
   ↳ org.jsoup.safety.Whitelist

Class Overview

Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed.

Start with one of the defaults:

  • none()
  • simpleText()
  • basic()
  • basicWithImages()
  • relaxed()

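If you need to allow more through (please be careful!), tweak a base whitelist with:

  • addTags(String... tags)
  • addAttributes(String tag, String... keys)
  • addEnforcedAttribute(String tag, String key, String value)
  • addProtocols(String tag, String key, String... protocols)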

The cleaner and these whitelists assume that you want to clean a body fragment of HTML (for example, to add user-supplied HTML to a templated page), not a full HTML document. To handle a full document, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate.
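The wrapping approach described above might look like the following sketch. It assumes a jsoup release that still ships org.jsoup.safety.Whitelist and uses Jsoup.clean(String, Whitelist); the class name PageTemplate, the method render, and the page skeleton are hypothetical, and the title argument is assumed to be trusted.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class PageTemplate {
    // Cleans the untrusted body fragment, then wraps it in trusted page markup.
    // Assumption: "title" comes from the application, not from the user.
    static String render(String title, String userHtml) {
        String safeBody = Jsoup.clean(userHtml, Whitelist.basic());
        return "<html><head><title>" + title + "</title></head><body>"
                + safeBody + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(render("Comments", "<p>Hi<script>alert(1)</script></p>"));
    }
}
```

Only the user-supplied fragment passes through the cleaner; the html, head, and body elements come from the template, so the whitelist does not need to allow them.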

If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See http://ha.ckers.org/xss.html for some XSS attack examples.

Summary

Public Constructors
Whitelist()
Create a new, empty whitelist.
Public Methods
Whitelist addAttributes(String tag, String... keys)
Add a list of allowed attributes to a tag.
Whitelist addEnforcedAttribute(String tag, String key, String value)
Add an enforced attribute to a tag.
Whitelist addProtocols(String tag, String key, String... protocols)
Add allowed URL protocols for an element's URL attribute.
Whitelist addTags(String... tags)
Add a list of allowed elements to a whitelist.
static Whitelist basic()
This whitelist allows a fuller range of text elements: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.
static Whitelist basicWithImages()
This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.
static Whitelist none()
This whitelist allows only text nodes: all HTML will be stripped.
static Whitelist relaxed()
This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.

Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

static Whitelist simpleText()
This whitelist allows only simple text formatting: b, em, i, strong, u.
Inherited Methods
From class java.lang.Object

Public Constructors

public Whitelist ()

Create a new, empty whitelist. Generally it will be better to start with a default prepared whitelist instead.

Public Methods

public Whitelist addAttributes (String tag, String... keys)

Add a list of allowed attributes to a tag. (If an attribute is not allowed on an element, it will be removed.)

To make an attribute valid for all tags, use the pseudo tag :all, e.g. addAttributes(":all", "class").

Parameters
tag The tag the attributes are for
keys List of valid attributes for the tag
Returns
  • this (for chaining)
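As a rough illustration of the :all pseudo tag, the sketch below extends basic() so that a class attribute survives on any allowed element while other attributes are still removed. The class name AllowClassAttribute and the sample markup are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class AllowClassAttribute {
    static String clean(String html) {
        // Start from basic() and additionally permit "class" on every allowed tag.
        Whitelist whitelist = Whitelist.basic().addAttributes(":all", "class");
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        // "class" should be kept; "style" is not whitelisted and should be dropped.
        System.out.println(clean("<p class=\"note\" style=\"color:red\">Hi</p>"));
    }
}
```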

public Whitelist addEnforcedAttribute (String tag, String key, String value)

Add an enforced attribute to a tag. An enforced attribute will always be added to the element. If the element already has the attribute set, it will be overridden.

E.g.: addEnforcedAttribute("a", "rel", "nofollow") will make all a tags output as <a href="..." rel="nofollow">

Parameters
tag The tag the enforced attribute is for
key The attribute key
value The enforced attribute value
Returns
  • this (for chaining)
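A minimal sketch of the rel=nofollow case from the example above, applied to relaxed(), which does not enforce that attribute on its own. The class name NofollowLinks and the sample link are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class NofollowLinks {
    static String clean(String html) {
        // relaxed() keeps links as they are; enforce rel="nofollow" on every a element.
        Whitelist whitelist = Whitelist.relaxed()
                .addEnforcedAttribute("a", "rel", "nofollow");
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        System.out.println(clean("<a href=\"http://example.com/\">link</a>"));
    }
}
```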

public Whitelist addProtocols (String tag, String key, String... protocols)

Add allowed URL protocols for an element's URL attribute. This restricts the possible values of the attribute to URLs with the defined protocol.

E.g.: addProtocols("a", "href", "ftp", "http", "https")

Parameters
tag Tag the URL protocol is for
key Attribute key
protocols List of valid protocols
Returns
  • this (for chaining)
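To show how protocol restrictions interact with tag and attribute rules, here is a sketch that builds a whitelist from scratch allowing only a elements whose href uses ftp, http, or https. The class name LinkProtocols and the sample inputs are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class LinkProtocols {
    static String clean(String html) {
        Whitelist whitelist = new Whitelist()
                .addTags("a")                                       // allow the element
                .addAttributes("a", "href")                         // allow the attribute
                .addProtocols("a", "href", "ftp", "http", "https"); // restrict its URL schemes
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        // The https link should keep its href; the javascript: href should be removed.
        System.out.println(clean("<a href=\"https://example.com/\">ok</a>"));
        System.out.println(clean("<a href=\"javascript:alert(1)\">bad</a>"));
    }
}
```

The tag, the attribute, and the protocols are three separate rules: the attribute must be allowed with addAttributes before addProtocols can constrain its value.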

public Whitelist addTags (String... tags)

Add a list of allowed elements to a whitelist. (If a tag is not allowed, it will be removed from the HTML.)

Parameters
tags tag names to allow
Returns
  • this (for chaining)
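The following sketch extends basic(), which has no heading elements, with h1 via addTags. The class name AllowHeadings, its two helper methods, and the sample markup are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class AllowHeadings {
    // Cleans with the unmodified basic() whitelist, for comparison.
    static String cleanBasic(String html) {
        return Jsoup.clean(html, Whitelist.basic());
    }

    // Cleans with basic() plus the h1 element.
    static String cleanWithHeadings(String html) {
        return Jsoup.clean(html, Whitelist.basic().addTags("h1"));
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>Text</p>";
        System.out.println(cleanBasic(html));        // h1 tag removed, text kept
        System.out.println(cleanWithHeadings(html)); // h1 tag preserved
    }
}
```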

public static Whitelist basic ()

This whitelist allows a fuller range of text elements: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.

Links (a elements) can point to http, https, ftp, mailto, and have an enforced rel=nofollow attribute.

Does not allow images.

Returns
  • whitelist
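A short sketch of basic() applied to an untrusted link: the onclick handler is not a whitelisted attribute, and rel=nofollow is enforced on a elements. The class name BasicClean and the sample input are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class BasicClean {
    static String clean(String html) {
        return Jsoup.clean(html, Whitelist.basic());
    }

    public static void main(String[] args) {
        String unsafe =
                "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
        // The href is kept, onclick is dropped, and rel="nofollow" is added.
        System.out.println(clean(unsafe));
    }
}
```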

public static Whitelist basicWithImages ()

This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.

Returns
  • whitelist
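For comparison with basic(), this sketch passes an image through basicWithImages(): the img element and its http src are kept, while an event-handler attribute is removed. The class name ImagesClean and the sample markup are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class ImagesClean {
    static String clean(String html) {
        return Jsoup.clean(html, Whitelist.basicWithImages());
    }

    public static void main(String[] args) {
        // src uses http, so it is allowed; onerror is not a whitelisted attribute.
        System.out.println(clean("<img src=\"http://example.com/a.png\" onerror=\"x()\">"));
    }
}
```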

public static Whitelist none ()

This whitelist allows only text nodes: all HTML will be stripped.

Returns
  • whitelist
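A minimal sketch of none() used to reduce markup to plain text. The class name StripAll and the sample input are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class StripAll {
    static String clean(String html) {
        // No elements are allowed, so only text content remains.
        return Jsoup.clean(html, Whitelist.none());
    }

    public static void main(String[] args) {
        System.out.println(clean("<b>Hello</b> <i>world</i>"));
    }
}
```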

public static Whitelist relaxed ()

This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.

Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

Returns
  • whitelist
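The sketch below contrasts relaxed() with basic() on structural markup: headings and tables pass through relaxed() but are removed by basic(), which keeps only their text. The class name RelaxedVsBasic, its helper methods, and the sample markup are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class RelaxedVsBasic {
    static String cleanRelaxed(String html) {
        return Jsoup.clean(html, Whitelist.relaxed());
    }

    static String cleanBasic(String html) {
        return Jsoup.clean(html, Whitelist.basic());
    }

    public static void main(String[] args) {
        String html = "<h2>Report</h2><table><tr><td>cell</td></tr></table>";
        System.out.println(cleanRelaxed(html)); // heading and table elements kept
        System.out.println(cleanBasic(html));   // only the text survives
    }
}
```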

public static Whitelist simpleText ()

This whitelist allows only simple text formatting: b, em, i, strong, u. All other HTML (tags and attributes) will be removed.

Returns
  • whitelist
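As a brief illustration, simpleText() keeps inline formatting such as b but removes a link element while retaining its text. The class name SimpleTextClean and the sample input are hypothetical; it assumes a jsoup release that still ships org.jsoup.safety.Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class SimpleTextClean {
    static String clean(String html) {
        return Jsoup.clean(html, Whitelist.simpleText());
    }

    public static void main(String[] args) {
        // b is allowed; a is not, so the tag is dropped and its text is kept.
        System.out.println(clean("<b>bold</b> and <a href=\"http://example.com/\">a link</a>"));
    }
}
```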