com.itextpdf.text.xml.simpleparser
Class SimpleXMLParser

java.lang.Object
  extended by com.itextpdf.text.xml.simpleparser.SimpleXMLParser

public final class SimpleXMLParser
extends Object

A simple XML and HTML parser. Like a SAX parser, it is event-based, but it offers much less functionality.

The parser can:


Field Summary
private static int ATTRIBUTE_EQUAL
           
private static int ATTRIBUTE_KEY
           
private static int ATTRIBUTE_VALUE
           
(package private)  String attributekey
          the attribute key.
(package private)  HashMap<String,String> attributes
          current attributes
(package private)  String attributevalue
          the attribute value.
private static int CDATA
           
(package private)  int character
          The current character.
(package private)  int columns
          the column where the current character occurs
(package private)  SimpleXMLDocHandlerComment comment
          The handler to which we are going to forward comments.
private static int COMMENT
           
(package private)  SimpleXMLDocHandler doc
          The handler to which we are going to forward document content
(package private)  StringBuffer entity
          current entity (whatever is encountered between & and ;)
private static int ENTITY
           
(package private)  boolean eol
          was the last character equivalent to a newline?
private static int EXAMIN_TAG
           
(package private)  boolean html
          Are we parsing HTML?
private static int IN_CLOSETAG
           
(package private)  int lines
          the line we are currently reading
(package private)  int nested
          Keeps track of the number of tags that are open.
(package private)  boolean nowhite
          A boolean indicating if the next character should be taken into account if it's a space character.
private static int PI
           
(package private)  int previousCharacter
          The previous character.
private static int QUOTE
           
(package private)  int quoteCharacter
          the quote character that was used to open the quote.
private static int SINGLE_TAG
           
(package private)  Stack<Integer> stack
          the state stack
(package private)  int state
          the current state
(package private)  String tag
          current tagname
private static int TAG_ENCOUNTERED
           
private static int TAG_EXAMINED
           
(package private)  StringBuffer text
          current text (whatever is encountered between tags)
private static int TEXT
           
private static int UNKNOWN
          possible states
 
Constructor Summary
private SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html)
          Creates a Simple XML parser object.
 
Method Summary
private  void doTag()
          Sets the name of the tag.
static String escapeXML(String s, boolean onlyASCII)
          Escapes a string with the appropriate XML codes.
private  void flush()
          Flushes the text that is currently in the buffer.
private static String getDeclaredEncoding(String decl)
           
private static String getEncodingName(byte[] b4)
          Returns the IANA encoding name that is auto-detected from the bytes specified, with the endian-ness of that encoding where appropriate.
private  void go(Reader r)
          Does the actual parsing.
private  void initTag()
          Initializes the tag name and attributes.
static void parse(SimpleXMLDocHandler doc, InputStream in)
          Parses the XML document firing the events to the handler.
static void parse(SimpleXMLDocHandler doc, Reader r)
           
static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html)
          Parses the XML document firing the events to the handler.
private  void processTag(boolean start)
          Processes the tag.
private  int restoreState()
          Gets a state from the stack
private  void saveState(int s)
          Adds a state to the stack.
private  void throwException(String s)
          Throws an exception
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UNKNOWN

private static final int UNKNOWN
possible states

See Also:
Constant Field Values

TEXT

private static final int TEXT
See Also:
Constant Field Values

TAG_ENCOUNTERED

private static final int TAG_ENCOUNTERED
See Also:
Constant Field Values

EXAMIN_TAG

private static final int EXAMIN_TAG
See Also:
Constant Field Values

TAG_EXAMINED

private static final int TAG_EXAMINED
See Also:
Constant Field Values

IN_CLOSETAG

private static final int IN_CLOSETAG
See Also:
Constant Field Values

SINGLE_TAG

private static final int SINGLE_TAG
See Also:
Constant Field Values

CDATA

private static final int CDATA
See Also:
Constant Field Values

COMMENT

private static final int COMMENT
See Also:
Constant Field Values

PI

private static final int PI
See Also:
Constant Field Values

ENTITY

private static final int ENTITY
See Also:
Constant Field Values

QUOTE

private static final int QUOTE
See Also:
Constant Field Values

ATTRIBUTE_KEY

private static final int ATTRIBUTE_KEY
See Also:
Constant Field Values

ATTRIBUTE_EQUAL

private static final int ATTRIBUTE_EQUAL
See Also:
Constant Field Values

ATTRIBUTE_VALUE

private static final int ATTRIBUTE_VALUE
See Also:
Constant Field Values

stack

Stack<Integer> stack
the state stack


character

int character
The current character.


previousCharacter

int previousCharacter
The previous character.


lines

int lines
the line we are currently reading


columns

int columns
the column where the current character occurs


eol

boolean eol
was the last character equivalent to a newline?
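
The lines, columns and eol fields suggest position tracking in which a carriage return followed by a line feed counts as a single line break. The following is a minimal sketch of that idea, not the parser's actual code. The class name LineTrackingSketch, the method track, and the starting values (line 1, column 0) are assumptions made for illustration.

```java
/** Hypothetical helper; illustrates line/column tracking with an eol flag. */
final class LineTrackingSketch {

    /**
     * Reads the whole input and returns {lines, columns} at the end.
     * Assumption: counting starts at line 1, column 0.
     */
    static int[] track(String input) {
        int lines = 1;
        int columns = 0;
        boolean eol = false; // true if the previous character was '\r'
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (eol && c == '\n') {
                // "\r\n" was already counted when '\r' was seen
                eol = false;
                continue;
            }
            eol = false;
            if (c == '\r') {
                eol = true;
                lines++;
                columns = 0;
            } else if (c == '\n') {
                lines++;
                columns = 0;
            } else {
                columns++;
            }
        }
        return new int[] { lines, columns };
    }
}
```

With this sketch, the input "a\r\nb\rc\nd" contains three line breaks, so tracking ends on line 4 at column 1.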


nowhite

boolean nowhite
A boolean indicating if the next character should be taken into account if it's a space character. When nowhite is false, the previous character wasn't whitespace.

Since:
2.1.5

state

int state
the current state


html

boolean html
Are we parsing HTML?


text

StringBuffer text
current text (whatever is encountered between tags)


entity

StringBuffer entity
current entity (whatever is encountered between & and ;)
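
Once the text between & and ; has been collected, it has to be mapped to a character. The sketch below shows one way this could be done for the five predefined XML entities and for numeric character references. It is an illustration only: the class EntityDecodeSketch and the method decode are hypothetical names, and returning -1 for an unknown entity is an assumption, not the parser's documented behavior.

```java
/** Hypothetical helper; maps an entity name (without & and ;) to a code point. */
final class EntityDecodeSketch {

    /** Returns the code point for the entity, or -1 if it is not recognized. */
    static int decode(String name) {
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                // hexadecimal character reference, e.g. "#x41"
                int cp = Integer.parseInt(name.substring(2), 16);
                return Character.isValidCodePoint(cp) ? cp : -1;
            }
            if (name.startsWith("#")) {
                // decimal character reference, e.g. "#65"
                int cp = Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp) ? cp : -1;
            }
        } catch (NumberFormatException e) {
            return -1;
        }
        switch (name) {
            case "amp":  return '&';
            case "lt":   return '<';
            case "gt":   return '>';
            case "quot": return '"';
            case "apos": return '\'';
            default:     return -1;
        }
    }
}
```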


tag

String tag
current tagname


attributes

HashMap<String,String> attributes
current attributes


doc

SimpleXMLDocHandler doc
The handler to which we are going to forward document content


comment

SimpleXMLDocHandlerComment comment
The handler to which we are going to forward comments.


nested

int nested
Keeps track of the number of tags that are open.


quoteCharacter

int quoteCharacter
the quote character that was used to open the quote.


attributekey

String attributekey
the attribute key.


attributevalue

String attributevalue
the attribute value.

Constructor Detail

SimpleXMLParser

private SimpleXMLParser(SimpleXMLDocHandler doc,
                        SimpleXMLDocHandlerComment comment,
                        boolean html)
Creates a Simple XML parser object. Call go(Reader) immediately after creation.

Method Detail

go

private void go(Reader r)
         throws IOException
Does the actual parsing. Perform this immediately after creating the parser object.

Throws:
IOException

restoreState

private int restoreState()
Gets a state from the stack

Returns:
the previous state

saveState

private void saveState(int s)
Adds a state to the stack.

Parameters:
s - a state to add to the stack
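
The saveState/restoreState pair lets the parser leave a state temporarily (for example to read an entity or a quoted value) and return to it afterwards. A minimal sketch of that pattern follows. The class StateStackSketch, the constant values, and the choice to return UNKNOWN when the stack is empty are assumptions for illustration; the sketch also uses ArrayDeque instead of the Stack<Integer> field listed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration of a state stack for a hand-written parser. */
final class StateStackSketch {
    // Assumed values; the real constants are private and not shown on this page.
    static final int UNKNOWN = 0;
    static final int TEXT = 1;
    static final int ENTITY = 2;

    private final Deque<Integer> stack = new ArrayDeque<>();

    /** Adds a state to the stack. */
    void saveState(int s) {
        stack.push(s);
    }

    /** Returns the most recently saved state, or UNKNOWN if none was saved. */
    int restoreState() {
        return stack.isEmpty() ? UNKNOWN : stack.pop();
    }
}
```

States come back in reverse order of saving, so nested constructs unwind naturally.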

flush

private void flush()
Flushes the text that is currently in the buffer. Depending on the current state, the text is ignored, added to the document as content, or added as a comment.


initTag

private void initTag()
Initializes the tag name and attributes.


doTag

private void doTag()
Sets the name of the tag.


processTag

private void processTag(boolean start)
Processes the tag.

Parameters:
start - if true we are dealing with a tag that has just been opened; if false we are closing a tag.

throwException

private void throwException(String s)
                     throws IOException
Throws an exception

Throws:
IOException

parse

public static void parse(SimpleXMLDocHandler doc,
                         SimpleXMLDocHandlerComment comment,
                         Reader r,
                         boolean html)
                  throws IOException
Parses the XML document firing the events to the handler.

Parameters:
doc - the document handler
comment - the handler to which comments are forwarded
r - the document. The encoding is already resolved. The reader is not closed
html - true if the input is parsed as HTML
Throws:
IOException - on error
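
To illustrate what "firing the events to the handler" means in an event-based parser, here is a small self-contained sketch. It does not use iText: the interface TagEvents, the class EventParsingSketch, and the methods scan and collect are hypothetical, and the toy scanner only understands plain start tags, end tags, and text (no attributes, entities, comments, or CDATA).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of event-based parsing; not the iText API. */
final class EventParsingSketch {

    /** Hypothetical callback interface, loosely modeled on a SAX-style handler. */
    interface TagEvents {
        void startElement(String tag);
        void endElement(String tag);
        void text(String str);
    }

    /** Toy scanner: reports each tag and each run of text as it is encountered. */
    static void scan(String xml, TagEvents handler) {
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (c == '<') {
                // flush pending text before reporting the tag
                if (text.length() > 0) {
                    handler.text(text.toString());
                    text.setLength(0);
                }
                int close = xml.indexOf('>', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed tag at index " + i);
                }
                String name = xml.substring(i + 1, close);
                if (name.startsWith("/")) {
                    handler.endElement(name.substring(1));
                } else {
                    handler.startElement(name);
                }
                i = close + 1;
            } else {
                text.append(c);
                i++;
            }
        }
        if (text.length() > 0) {
            handler.text(text.toString());
        }
    }

    /** Convenience method that records the events as strings. */
    static List<String> collect(String xml) {
        List<String> events = new ArrayList<>();
        scan(xml, new TagEvents() {
            @Override public void startElement(String tag) { events.add("start:" + tag); }
            @Override public void endElement(String tag) { events.add("end:" + tag); }
            @Override public void text(String str) { events.add("text:" + str); }
        });
        return events;
    }
}
```

The caller never receives a document tree. It only sees a sequence of callbacks in document order, which is the trade-off an event-based parser makes for simplicity and low memory use.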

parse

public static void parse(SimpleXMLDocHandler doc,
                         InputStream in)
                  throws IOException
Parses the XML document firing the events to the handler.

Parameters:
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed
Throws:
IOException - on error

getDeclaredEncoding

private static String getDeclaredEncoding(String decl)

parse

public static void parse(SimpleXMLDocHandler doc,
                         Reader r)
                  throws IOException
Throws:
IOException

escapeXML

public static String escapeXML(String s,
                               boolean onlyASCII)
Escapes a string with the appropriate XML codes.

Parameters:
s - the string to be escaped
onlyASCII - if true, characters with codes above 127 are always escaped as &#nn;
Returns:
the escaped string
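
The behavior described above can be approximated as follows. This is a sketch under assumptions, not the library's implementation: the class EscapeXmlSketch and the method escape are hypothetical, the set of escaped characters (the five XML special characters) is assumed, and any filtering of characters that are illegal in XML is left out.

```java
/** Hypothetical illustration of XML escaping with an ASCII-only option. */
final class EscapeXmlSketch {

    static String escape(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder();
        // Iterate by code point so that characters outside the BMP
        // produce a single numeric reference.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && cp > 127) {
                        // decimal numeric character reference, e.g. &#233;
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }
}
```

Passing true for onlyASCII is useful when the output encoding is unknown or cannot represent every character, since the result then contains only ASCII.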

getEncodingName

private static String getEncodingName(byte[] b4)
Returns the IANA encoding name that is auto-detected from the bytes specified, with the endian-ness of that encoding where appropriate. (method found in org.apache.xerces.impl.XMLEntityManager, originally published by the Apache Software Foundation under the Apache Software License; now being used in iText under the MPL)

Parameters:
b4 - The first four bytes of the input.
Returns:
an IANA-encoding string
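
Detecting an encoding from the first four bytes usually means checking for a byte order mark and, failing that, looking at how the opening "<?" is laid out. The sketch below is a simplified illustration of that idea and is not the Xerces or iText code: the class EncodingSniffSketch and the method guess are hypothetical, only UTF-8 and UTF-16 are distinguished, and falling back to UTF-8 is an assumption.

```java
/** Hypothetical, simplified illustration of encoding detection from four bytes. */
final class EncodingSniffSketch {

    static String guess(byte[] b4) {
        if (b4 == null || b4.length < 2) {
            return "UTF-8"; // assumed default
        }
        int b0 = b4[0] & 0xFF;
        int b1 = b4[1] & 0xFF;
        // UTF-16 byte order marks
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b4.length < 3) {
            return "UTF-8";
        }
        int b2 = b4[2] & 0xFF;
        // UTF-8 byte order mark
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b4.length < 4) {
            return "UTF-8";
        }
        int b3 = b4[3] & 0xFF;
        // No byte order mark: "<?" encoded as two 16-bit units
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        return "UTF-8";
    }
}
```

A fuller detector would also cover the four-byte UCS-4 layouts and EBCDIC, which this sketch deliberately omits.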
