public abstract class XMLScanner
extends Object
implements XMLComponent
java.lang.Object
   ↳ org.apache.xerces.impl.XMLScanner

Class Overview

This class is responsible for holding scanning methods common to scanning the XML document structure and content as well as the DTD structure and content. Both XMLDocumentScanner and XMLDTDScanner inherit from this base class.

This component requires the following features and properties from the component manager that uses it:

  • http://xml.org/sax/features/validation
  • http://xml.org/sax/features/namespaces
  • http://apache.org/xml/features/scanner/notify-char-refs
  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter
  • http://apache.org/xml/properties/internal/entity-manager
@xerces.internal
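
As an illustration of the two SAX feature identifiers listed above, the following sketch sets and reads them through the standard JAXP API. It assumes the default JAXP parser recognizes both features (the JDK's bundled parser does); the class name ScannerFeatureSketch and its helper methods are hypothetical, and the three internal properties are left out because they are supplied by the parser configuration rather than by application code.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of Xerces.
final class ScannerFeatureSketch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Creates a reader with both features set; checked exceptions are wrapped. */
    static XMLReader newReader(boolean validation, boolean namespaces) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, validation);
            reader.setFeature(NAMESPACES, namespaces);
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException("parser rejected a feature", e);
        }
    }

    /** Reads a feature state back from the reader. */
    static boolean feature(XMLReader reader, String featureId) {
        try {
            return reader.getFeature(featureId);
        } catch (Exception e) {
            throw new IllegalStateException("feature not available: " + featureId, e);
        }
    }
}
```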

Summary

Constants
boolean DEBUG_ATTR_NORMALIZATION Debug attribute normalization.
String ENTITY_MANAGER Property identifier: entity manager.
String ERROR_REPORTER Property identifier: error reporter.
String NAMESPACES Feature identifier: namespaces.
String NOTIFY_CHAR_REFS Feature identifier: notify character references.
String PARSER_SETTINGS
String SYMBOL_TABLE Property identifier: symbol table.
String VALIDATION Feature identifier: validation.
Fields
protected static final String fAmpSymbol Symbol: "amp".
protected static final String fAposSymbol Symbol: "apos".
protected String fCharRefLiteral Literal value of the last character reference scanned.
protected static final String fEncodingSymbol Symbol: "encoding".
protected int fEntityDepth Entity depth.
protected XMLEntityManager fEntityManager Entity manager.
protected XMLEntityScanner fEntityScanner Entity scanner.
protected XMLErrorReporter fErrorReporter Error reporter.
protected static final String fGtSymbol Symbol: "gt".
protected static final String fLtSymbol Symbol: "lt".
protected boolean fNamespaces Namespaces.
protected boolean fNotifyCharRefs Character references notification.
protected boolean fParserSettings Internal parser-settings feature.
protected static final String fQuotSymbol Symbol: "quot".
protected boolean fReportEntity Report entity boundary.
protected XMLResourceIdentifierImpl fResourceIdentifier
protected boolean fScanningAttribute Scanning attribute.
protected static final String fStandaloneSymbol Symbol: "standalone".
protected SymbolTable fSymbolTable Symbol table.
protected boolean fValidation Validation.
protected static final String fVersionSymbol Symbol: "version".
Public Constructors
XMLScanner()
Public Methods
void endEntity(String name, Augmentations augs)
This method notifies of the end of an entity.
boolean getFeature(String featureId)
void reset(XMLComponentManager componentManager)
Resets the component.
String scanPseudoAttribute(boolean scanningTextDecl, XMLString value)
Scans a pseudo attribute.
void setFeature(String featureId, boolean value)
Sets the state of a feature.
void setProperty(String propertyId, Object value)
Sets the value of a property during parsing.
void startEntity(String name, XMLResourceIdentifier identifier, String encoding, Augmentations augs)
This method notifies of the start of an entity.
Protected Methods
String getVersionNotSupportedKey()
boolean isInvalid(int value)
boolean isInvalidLiteral(int value)
int isUnchangedByNormalization(XMLString value)
Checks whether this string would be unchanged by normalization.
boolean isValidNCName(int value)
boolean isValidNameChar(int value)
boolean isValidNameStartChar(int value)
boolean isValidNameStartHighSurrogate(int value)
void normalizeWhitespace(XMLString value, int fromIndex)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
void normalizeWhitespace(XMLString value)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
void reportFatalError(String msgId, Object[] args)
Convenience function used in all XML scanners.
void reset()
boolean scanAttributeValue(XMLString value, XMLString nonNormalizedValue, String atName, boolean checkEntities, String eleName)
Scans an attribute value and normalizes whitespace converting all whitespace characters to space characters.
int scanCharReferenceValue(XMLStringBuffer buf, XMLStringBuffer buf2)
Scans a character reference and appends the corresponding chars to the specified buffer.
void scanComment(XMLStringBuffer text)
Scans a comment.
void scanExternalID(String[] identifiers, boolean optionalSystemId)
Scans an external ID and returns the public and system IDs.
void scanPI()
Scans a processing instruction.
void scanPIData(String target, XMLString data)
Scans processing instruction data.
boolean scanPubidLiteral(XMLString literal)
Scans public ID literal.
boolean scanSurrogates(XMLStringBuffer buf)
Scans surrogates and appends them to the specified buffer.
void scanXMLDeclOrTextDecl(boolean scanningTextDecl, String[] pseudoAttributeValues)
Scans an XML or text declaration.
boolean versionSupported(String version)
Inherited Methods
From class java.lang.Object
From interface org.apache.xerces.xni.parser.XMLComponent

Constants

protected static final boolean DEBUG_ATTR_NORMALIZATION

Debug attribute normalization.

Constant Value: false

protected static final String ENTITY_MANAGER

Property identifier: entity manager.

Constant Value: "http://apache.org/xml/properties/internal/entity-manager"

protected static final String ERROR_REPORTER

Property identifier: error reporter.

Constant Value: "http://apache.org/xml/properties/internal/error-reporter"

protected static final String NAMESPACES

Feature identifier: namespaces.

Constant Value: "http://xml.org/sax/features/namespaces"

protected static final String NOTIFY_CHAR_REFS

Feature identifier: notify character references.

Constant Value: "http://apache.org/xml/features/scanner/notify-char-refs"

protected static final String PARSER_SETTINGS

Constant Value: "http://apache.org/xml/features/internal/parser-settings"

protected static final String SYMBOL_TABLE

Property identifier: symbol table.

Constant Value: "http://apache.org/xml/properties/internal/symbol-table"

protected static final String VALIDATION

Feature identifier: validation.

Constant Value: "http://xml.org/sax/features/validation"

Fields

protected static final String fAmpSymbol

Symbol: "amp".

protected static final String fAposSymbol

Symbol: "apos".

protected String fCharRefLiteral

Literal value of the last character reference scanned.

protected static final String fEncodingSymbol

Symbol: "encoding".

protected int fEntityDepth

Entity depth.

protected XMLEntityManager fEntityManager

Entity manager.

protected XMLEntityScanner fEntityScanner

Entity scanner.

protected XMLErrorReporter fErrorReporter

Error reporter.

protected static final String fGtSymbol

Symbol: "gt".

protected static final String fLtSymbol

Symbol: "lt".

protected boolean fNamespaces

Namespaces.

protected boolean fNotifyCharRefs

Character references notification.

protected boolean fParserSettings

Internal parser-settings feature.

protected static final String fQuotSymbol

Symbol: "quot".

protected boolean fReportEntity

Report entity boundary.

protected XMLResourceIdentifierImpl fResourceIdentifier

protected boolean fScanningAttribute

Scanning attribute.

protected static final String fStandaloneSymbol

Symbol: "standalone".

protected SymbolTable fSymbolTable

Symbol table.

protected boolean fValidation

Validation. The feature identifier is http://xml.org/sax/features/validation.

protected static final String fVersionSymbol

Symbol: "version".

Public Constructors

public XMLScanner ()

Public Methods

public void endEntity (String name, Augmentations augs)

This method notifies of the end of an entity. The document entity has the pseudo-name "[xml]"; the DTD has the pseudo-name "[dtd]"; parameter entity names start with '%'; and general entities are specified by their name alone.

Parameters
name The name of the entity.
augs Additional information that may include infoset augmentations
Throws
XNIException Thrown by handler to signal an error.
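
The naming convention described for entity notifications can be sketched as a small classifier. This is a standalone illustration only; the class EntityNameSketch and its Kind enum are hypothetical and do not exist in Xerces.

```java
// Hypothetical helper illustrating the entity naming convention.
final class EntityNameSketch {
    enum Kind { DOCUMENT, DTD, PARAMETER, GENERAL }

    /** Maps an entity name, as passed to startEntity/endEntity, to its kind. */
    static Kind classify(String name) {
        if ("[xml]".equals(name)) {
            return Kind.DOCUMENT;      // pseudo-name of the document entity
        }
        if ("[dtd]".equals(name)) {
            return Kind.DTD;           // pseudo-name of the DTD
        }
        if (name.startsWith("%")) {
            return Kind.PARAMETER;     // parameter entity names start with '%'
        }
        return Kind.GENERAL;           // anything else is a general entity
    }
}
```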

public boolean getFeature (String featureId)

public void reset (XMLComponentManager componentManager)

Resets the component. The component can query the component manager about any features and properties that affect the operation of the component.

Parameters
componentManager The component manager.
Throws
XMLConfigurationException Thrown if required features and properties cannot be found.

public String scanPseudoAttribute (boolean scanningTextDecl, XMLString value)

Scans a pseudo attribute.

Parameters
scanningTextDecl True if scanning this pseudo-attribute for a TextDecl; false if scanning XMLDecl. This flag is needed to report the correct type of error.
value The string to fill in with the attribute value.
Returns
  • The name of the attribute.

Note: This method uses fStringBuffer2; anything in it at the time of calling is lost.

public void setFeature (String featureId, boolean value)

Sets the state of a feature. This method is called by the component manager any time after reset when a feature changes state.

Note: Components should silently ignore features that do not affect the operation of the component.

Parameters
featureId The feature identifier.
value The state of the feature.
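
A minimal sketch of the "silently ignore" rule, assuming a component that cares about only two features. The class FeatureStateSketch is hypothetical and is not the XMLScanner implementation; which features the real class reacts to is not stated here.

```java
// Hypothetical component that tracks two feature states.
final class FeatureStateSketch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NOTIFY_CHAR_REFS =
            "http://apache.org/xml/features/scanner/notify-char-refs";

    boolean validation;
    boolean notifyCharRefs;

    /** Records the state of a known feature; unknown identifiers are ignored. */
    void setFeature(String featureId, boolean value) {
        if (VALIDATION.equals(featureId)) {
            validation = value;
        } else if (NOTIFY_CHAR_REFS.equals(featureId)) {
            notifyCharRefs = value;
        }
        // Any other identifier does not affect this component, so no error is raised.
    }
}
```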

public void setProperty (String propertyId, Object value)

Sets the value of a property during parsing.

Parameters
propertyId The property identifier.
value The value of the property.

public void startEntity (String name, XMLResourceIdentifier identifier, String encoding, Augmentations augs)

This method notifies of the start of an entity. The document entity has the pseudo-name "[xml]"; the DTD has the pseudo-name "[dtd]"; parameter entity names start with '%'; and general entities are specified by their name alone.

Parameters
name The name of the entity.
identifier The resource identifier.
encoding The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
augs Additional information that may include infoset augmentations
Throws
XNIException Thrown by handler to signal an error.

Protected Methods

protected String getVersionNotSupportedKey ()

protected boolean isInvalid (int value)

protected boolean isInvalidLiteral (int value)

protected int isUnchangedByNormalization (XMLString value)

Checks whether this string would be unchanged by normalization.

Returns
  • -1 if the value would be unchanged by normalization, otherwise the index of the first whitespace character which would be transformed.
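
The return contract can be illustrated on a plain character range. The sketch below assumes that tab, line feed, and carriage return are the characters normalization would change, and that the returned index is relative to the start of the range; both points are assumptions, and NormalizationCheckSketch is a hypothetical name.

```java
// Hypothetical stand-in operating on a char range instead of an XMLString.
final class NormalizationCheckSketch {
    /**
     * Returns -1 if no character in ch[offset .. offset+length) would change,
     * otherwise the index (relative to offset) of the first one that would.
     */
    static int firstChangedIndex(char[] ch, int offset, int length) {
        for (int i = 0; i < length; i++) {
            char c = ch[offset + i];
            if (c == '\t' || c == '\n' || c == '\r') {
                return i;   // this whitespace character would become a space
            }
        }
        return -1;          // plain spaces and other characters are left as-is
    }
}
```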

protected boolean isValidNCName (int value)

protected boolean isValidNameChar (int value)

protected boolean isValidNameStartChar (int value)

protected boolean isValidNameStartHighSurrogate (int value)

protected void normalizeWhitespace (XMLString value, int fromIndex)

Normalize whitespace in an XMLString converting all whitespace characters to space characters.

protected void normalizeWhitespace (XMLString value)

Normalize whitespace in an XMLString converting all whitespace characters to space characters.
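
Both overloads describe the same in-place transformation. A minimal sketch on a char array follows, assuming fromIndex is relative to the start of the range; WhitespaceSketch and its methods are hypothetical names.

```java
// Hypothetical stand-in operating on a char range instead of an XMLString.
final class WhitespaceSketch {
    /** Replaces tab, LF and CR with a space, starting at fromIndex within the range. */
    static void normalizeWhitespace(char[] ch, int offset, int length, int fromIndex) {
        int end = offset + length;
        for (int i = offset + fromIndex; i < end; i++) {
            char c = ch[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                ch[i] = ' ';
            }
        }
    }

    /** Convenience wrapper that normalizes a whole string. */
    static String normalize(String s) {
        char[] ch = s.toCharArray();
        normalizeWhitespace(ch, 0, ch.length, 0);
        return new String(ch);
    }
}
```

Note that the length of the value does not change: each whitespace character maps to exactly one space.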

protected void reportFatalError (String msgId, Object[] args)

Convenience function used in all XML scanners.

Throws
XNIException

protected void reset ()

protected boolean scanAttributeValue (XMLString value, XMLString nonNormalizedValue, String atName, boolean checkEntities, String eleName)

Scans an attribute value and normalizes whitespace, converting all whitespace characters to space characters.

 [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

Parameters
value The XMLString to fill in with the value.
nonNormalizedValue The XMLString to fill in with the non-normalized value.
atName The name of the attribute being parsed (for error msgs).
checkEntities true if undeclared entities should be reported as VC violation, false if undeclared entities should be reported as WFC violation.
eleName The name of element to which this attribute belongs.
Returns
  • true if the non-normalized and normalized values are the same.

Note: This method uses fStringBuffer2; anything in it at the time of calling is lost.

protected int scanCharReferenceValue (XMLStringBuffer buf, XMLStringBuffer buf2)

Scans a character reference and appends the corresponding chars to the specified buffer.

 [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
 
Note: This method uses fStringBuffer, anything in it at the time of calling is lost.

Parameters
buf the character buffer to append chars to
buf2 the character buffer to append non-normalized chars to
Returns
  • the character value or (-1) on conversion failure
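
Production [66] and the -1 failure value can be illustrated with a string-based sketch. It works on a complete reference string rather than on scanner input, checks only that the value is within the Unicode range (not that it is a legal XML Char), and uses the hypothetical name CharRefSketch.

```java
// Hypothetical helper; works on a complete reference string, not on scanner input.
final class CharRefSketch {
    /** Returns the code point named by "&#65;" or "&#x41;", or -1 on failure. */
    static int charRefValue(String ref) {
        if (ref == null || !ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        boolean hex = ref.startsWith("&#x");
        int radix = hex ? 16 : 10;
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (digits.isEmpty()) {
            return -1;                       // the grammar requires at least one digit
        }
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digitValue(digits.charAt(i), radix);
            if (d < 0) {
                return -1;                   // character outside [0-9] or [0-9a-fA-F]
            }
            value = value * radix + d;
            if (value > Character.MAX_CODE_POINT) {
                return -1;                   // beyond the Unicode range
            }
        }
        return value;
    }

    private static int digitValue(char c, int radix) {
        if (c >= '0' && c <= '9') return c - '0';
        if (radix == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (radix == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```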

protected void scanComment (XMLStringBuffer text)

Scans a comment.

 [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
 

Note: Called after scanning past '<!--'.

Note: This method uses fString; anything in it at the time of calling is lost.

Parameters
text The buffer to fill in with the text.
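
Production [15] amounts to two constraints on the text between '<!--' and '-->': it may not contain "--", and it may not end with '-'. A minimal sketch of that check follows; CommentSketch is a hypothetical name, and the check ignores the separate requirement that every character be a legal XML Char.

```java
// Hypothetical helper checking the body of a comment against production [15].
final class CommentSketch {
    /** Returns true if body could appear between "<!--" and "-->". */
    static boolean isValidCommentBody(String body) {
        // "--" is forbidden inside a comment, and a trailing '-' would
        // produce "--->", which the grammar also excludes.
        return !body.contains("--") && !body.endsWith("-");
    }
}
```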

protected void scanExternalID (String[] identifiers, boolean optionalSystemId)

Scans an external ID and returns the public and system IDs.

Parameters
identifiers An array of size 2 to return the system id, and public id (in that order).
optionalSystemId Specifies whether the system id is optional.

Note: This method uses fString and fStringBuffer; anything in them at the time of calling is lost.
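
The shape of the identifiers array can be illustrated with a regex-based sketch that parses a complete external ID string. It follows the XML ExternalID syntax ('SYSTEM' or 'PUBLIC' followed by quoted literals) but does not validate PubidChar and accepts slightly more whitespace than the XML S production; ExternalIdSketch is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; parses a whole string instead of scanner input.
final class ExternalIdSketch {
    private static final String LITERAL = "(?:\"([^\"]*)\"|'([^']*)')";
    private static final Pattern SYSTEM = Pattern.compile("SYSTEM\\s+" + LITERAL);
    private static final Pattern PUBLIC =
            Pattern.compile("PUBLIC\\s+" + LITERAL + "(?:\\s+" + LITERAL + ")?");

    /** Returns {systemId, publicId}; a missing identifier is null. */
    static String[] parse(String text, boolean optionalSystemId) {
        String[] ids = new String[2];
        Matcher m = SYSTEM.matcher(text);
        if (m.matches()) {
            ids[0] = pick(m, 1, 2);
            return ids;
        }
        m = PUBLIC.matcher(text);
        if (m.matches()) {
            ids[1] = pick(m, 1, 2);
            ids[0] = pick(m, 3, 4);
            if (ids[0] == null && !optionalSystemId) {
                throw new IllegalArgumentException("system id required: " + text);
            }
            return ids;
        }
        throw new IllegalArgumentException("not an external ID: " + text);
    }

    // Returns whichever of the two alternative quote groups matched.
    private static String pick(Matcher m, int doubleQuoted, int singleQuoted) {
        String value = m.group(doubleQuoted);
        return value != null ? value : m.group(singleQuoted);
    }
}
```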

protected void scanPI ()

Scans a processing instruction.

 [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
 [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
 
Note: This method uses fString, anything in it at the time of calling is lost.
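
Production [17] excludes only the exact name "xml" in any letter case; a longer name that merely begins with those letters is not excluded by this production. A minimal sketch of that test, using the hypothetical name PITargetSketch:

```java
// Hypothetical helper for the PITarget exclusion in production [17].
final class PITargetSketch {
    /** Returns true if the target is "xml" in any combination of letter case. */
    static boolean isReservedTarget(String target) {
        return target.equalsIgnoreCase("xml");
    }
}
```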

protected void scanPIData (String target, XMLString data)

Scans processing instruction data. This is needed to handle the situation where a document starts with a processing instruction whose target name starts with "xml" (e.g. xmlfoo).

Note: This method uses fStringBuffer; anything in it at the time of calling is lost.

Parameters
target The PI target
data The string to fill in with the data

protected boolean scanPubidLiteral (XMLString literal)

Scans a public ID literal.

 [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
 [13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

The returned string is normalized according to the following rule, from http://www.w3.org/TR/REC-xml#dt-pubid: before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.

Parameters
literal The string to fill in with the public ID literal.
Returns
  • True on success.

Note: This method uses fStringBuffer; anything in it at the time of calling is lost.
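
The normalization rule for public identifiers can be sketched as follows. The code works on the literal's content without its quotes, returns null for a character outside PubidChar (an illustrative choice, not the scanner's error handling), and uses the hypothetical name PubidSketch.

```java
// Hypothetical helper applying the public identifier normalization rule.
final class PubidSketch {
    /** Production [13]: the characters allowed in a public identifier. */
    static boolean isPubidChar(char c) {
        return c == 0x20 || c == 0xD || c == 0xA
                || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || "-'()+,./:=?;!*#@$_%".indexOf(c) >= 0;
    }

    /** Collapses whitespace runs to one space and trims; null if a char is invalid. */
    static String normalize(String literal) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (!isPubidChar(c)) {
                return null;
            }
            if (c == 0x20 || c == 0xD || c == 0xA) {
                // Remember the space only after real content, which drops leading runs.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();   // a trailing pending space is never appended
    }
}
```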

protected boolean scanSurrogates (XMLStringBuffer buf)

Scans surrogates and appends them to the specified buffer.

Note: This assumes the current char has already been identified as a high surrogate.

Parameters
buf The StringBuffer to append the read surrogates to.
Returns
  • True if it succeeded.
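
The pairing of a high surrogate with the low surrogate that follows it can be illustrated with the standard java.lang.Character methods. This sketch takes both chars as arguments instead of reading them from an entity scanner and does not check that the resulting code point is a legal XML character; SurrogateSketch is a hypothetical name.

```java
// Hypothetical helper; takes both chars directly instead of scanning them.
final class SurrogateSketch {
    /** Appends the pair to buf and returns true only if it is a valid surrogate pair. */
    static boolean appendSurrogates(char high, char low, StringBuilder buf) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            return false;               // nothing is appended on failure
        }
        buf.append(high).append(low);
        return true;
    }

    /** Returns the supplementary code point encoded by a valid pair. */
    static int codePoint(char high, char low) {
        return Character.toCodePoint(high, low);
    }
}
```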

protected void scanXMLDeclOrTextDecl (boolean scanningTextDecl, String[] pseudoAttributeValues)

Scans an XML or text declaration.

 [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
 [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
 [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' |  "'" EncName "'" )
 [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
 [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'")
                 | ('"' ('yes' | 'no') '"'))

 [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
 

Parameters
scanningTextDecl True if a text declaration is to be scanned instead of an XML declaration.
pseudoAttributeValues An array of size 3 to return the version, encoding and standalone pseudo attribute values (in that order).

Note: This method uses fString; anything in it at the time of calling is lost.
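
The layout of the three-element result can be illustrated with a regex-based sketch that extracts pseudo-attributes from a complete declaration string. Unlike the grammar, it does not enforce the order of the pseudo-attributes or validate their values, and XmlDeclSketch is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; parses a whole declaration string, not scanner input.
final class XmlDeclSketch {
    private static final Pattern PSEUDO_ATTR = Pattern.compile(
            "\\s+(version|encoding|standalone)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns {version, encoding, standalone}; absent pseudo-attributes are null. */
    static String[] parse(String decl) {
        if (!decl.startsWith("<?xml") || !decl.endsWith("?>")) {
            throw new IllegalArgumentException("not an XML or text declaration");
        }
        String[] values = new String[3];
        // Look only at the text between "<?xml" and "?>".
        Matcher m = PSEUDO_ATTR.matcher(decl.substring(5, decl.length() - 2));
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            switch (m.group(1)) {
                case "version":    values[0] = value; break;
                case "encoding":   values[1] = value; break;
                default:           values[2] = value; break;  // "standalone"
            }
        }
        return values;
    }
}
```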

protected boolean versionSupported (String version)