Class SimpleXMLParser
java.lang.Object
com.gitlab.pdftk_java.com.lowagie.text.pdf.SimpleXMLParser
A simple XML and HTML parser. Like the SAX parser, this is an event-based parser, but with much less functionality.
The parser:
- recognizes the encoding used
- recognizes all the elements' start tags and end tags
- lists attributes, where attribute values can be enclosed in single or double quotes
- recognizes the <![CDATA[ ... ]]> construct
- recognizes the standard entities &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
- maps the line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with some extra code from XERCES to recognize the encoding.
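The line-ending rule in the list above can be illustrated with a minimal, self-contained sketch. The class and method names (LineEndingSketch, normalize) are hypothetical and are not part of this class; the parser applies the rule while reading, whereas this sketch works on a complete string for simplicity.

```java
final class LineEndingSketch {
    /** Maps "\r\n" and a lone "\r" to "\n", as described in XML 1.0, Section 2.11. */
    static String normalize(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\r') {
                // Skip the '\n' of a "\r\n" pair so the pair yields a single '\n'.
                if (i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append('\n');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape the result so the line breaks are visible when printed.
        System.out.println(normalize("a\r\nb\rc").replace("\n", "\\n"));
    }
}
```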
-
Field Summary
Fields
private static final int ATTRIBUTE_EQUAL
private static final int ATTRIBUTE_LVALUE
private static final int ATTRIBUTE_RVALUE
private static final int CDATA
private static final int CLOSE_TAG
private static final int COMMENT
private static final int DOCTYPE
private static final int DONE
private static final int ENTITY
private static final HashMap entityMap
private static final HashMap fIANA2JavaMap
private static final int IN_TAG
private static final int OPEN_TAG
private static final int PRE
private static final int QUOTE
private static final int SINGLE_TAG
private static final int START_TAG
private static final int TEXT
-
Constructor Summary
Constructors
private SimpleXMLParser()
-
Method Summary
Methods
static char decodeEntity
static String escapeXML
    Escapes a string with the appropriate XML codes.
private static void exc
private static String getDeclaredEncoding(String decl)
private static String getEncodingName(byte[] b4)
static String getJavaEncoding(String iana)
    Gets the Java encoding from the IANA encoding.
static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html)
    Parses the XML document, firing the events to the handler.
static void parse(SimpleXMLDocHandler doc, InputStream in)
    Parses the XML document, firing the events to the handler.
static void parse(SimpleXMLDocHandler doc, Reader r)
private static int popMode
-
Field Details
-
fIANA2JavaMap
private static final HashMap fIANA2JavaMap
-
entityMap
private static final HashMap entityMap
-
TEXT
private static final int TEXT
-
ENTITY
private static final int ENTITY
-
OPEN_TAG
private static final int OPEN_TAG
-
CLOSE_TAG
private static final int CLOSE_TAG
-
START_TAG
private static final int START_TAG
-
ATTRIBUTE_LVALUE
private static final int ATTRIBUTE_LVALUE
-
ATTRIBUTE_EQUAL
private static final int ATTRIBUTE_EQUAL
-
ATTRIBUTE_RVALUE
private static final int ATTRIBUTE_RVALUE
-
QUOTE
private static final int QUOTE
-
IN_TAG
private static final int IN_TAG
-
SINGLE_TAG
private static final int SINGLE_TAG
-
COMMENT
private static final int COMMENT
-
DONE
private static final int DONE
-
DOCTYPE
private static final int DOCTYPE
-
PRE
private static final int PRE
-
CDATA
private static final int CDATA
-
-
Constructor Details
-
SimpleXMLParser
private SimpleXMLParser()
-
-
Method Details
-
popMode
-
parse
Parses the XML document, firing the events to the handler.
Parameters:
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed.
Throws:
IOException - on error
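How an encoding can be deduced from the start of a stream is sketched below. This is an illustrative guess based on the first four bytes, loosely following the auto-detection approach outlined in Appendix F of the XML 1.0 specification; it is not the code of this class. The names EncodingSniffSketch and guess are hypothetical, and the sketch ignores UCS-4, EBCDIC, and the encoding pseudo-attribute of the XML declaration.

```java
final class EncodingSniffSketch {
    /** Guesses a charset name from the first four bytes; falls back to UTF-8. */
    static String guess(byte[] b4) {
        if (b4 == null || b4.length < 4) {
            return "UTF-8"; // assumption: too little data means the default
        }
        int b0 = b4[0] & 0xFF, b1 = b4[1] & 0xFF, b2 = b4[2] & 0xFF, b3 = b4[3] & 0xFF;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";    // UTF-8 byte order mark
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";               // big-endian byte order mark
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";               // little-endian byte order mark
        // No byte order mark: look for "<?" encoded as two 16-bit units.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        return "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(guess(new byte[] {'<', '?', 'x', 'm'}));
    }
}
```

Because the stream is not closed by parse, the caller stays responsible for closing it, for example with a try-with-resources block around the call.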
-
getDeclaredEncoding
-
getJavaEncoding
Gets the Java encoding from the IANA encoding. If the encoding cannot be found, the input is returned.
Parameters:
iana - the IANA encoding
Returns:
the Java encoding
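The lookup-with-fallback behaviour described above can be sketched as follows. The class name EncodingNameSketch, the method toJavaName, the three table entries, and the case-insensitive lookup are all assumptions made for illustration; the actual contents of fIANA2JavaMap are not shown on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class EncodingNameSketch {
    // A tiny, illustrative subset of IANA-to-Java name pairs (assumed, not the real table).
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
    }

    /** Returns the mapped Java name, or the input unchanged when no mapping exists. */
    static String toJavaName(String iana) {
        // Assumption: the lookup ignores case by upper-casing the key.
        String java = IANA_TO_JAVA.get(iana.toUpperCase(Locale.ROOT));
        return java != null ? java : iana;
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("utf-8"));
        System.out.println(toJavaName("x-unknown"));
    }
}
```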
-
parse
Throws:
IOException
-
parse
public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html) throws IOException
Parses the XML document, firing the events to the handler.
Parameters:
doc - the document handler
r - the document. The encoding is already resolved. The reader is not closed.
Throws:
IOException - on error
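What "firing the events to the handler" means can be shown with a deliberately small, self-contained sketch of an event-based scanner. It is not this class's implementation: the Handler interface here is a hypothetical stand-in (not SimpleXMLDocHandler), and the sketch assumes well-formed input with no comments, CDATA, entities, or '>' characters inside attribute values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EventParserSketch {
    /** Hypothetical callback interface; NOT the library's SimpleXMLDocHandler. */
    interface Handler {
        void startElement(String tag, Map<String, String> attrs);
        void endElement(String tag);
        void text(String str);
    }

    /** Scans the input once and fires one callback per tag or text run. */
    static void parse(String xml, Handler h) {
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) != '<') {
                // Everything up to the next '<' is reported as one text event.
                int lt = xml.indexOf('<', i);
                if (lt < 0) lt = xml.length();
                h.text(xml.substring(i, lt));
                i = lt;
                continue;
            }
            int gt = xml.indexOf('>', i);
            if (gt < 0) throw new IllegalArgumentException("unterminated tag at offset " + i);
            String body = xml.substring(i + 1, gt);
            i = gt + 1;
            if (body.startsWith("/")) {
                h.endElement(body.substring(1).trim());
                continue;
            }
            boolean selfClosing = body.endsWith("/");
            if (selfClosing) body = body.substring(0, body.length() - 1);
            int p = 0;
            while (p < body.length() && !Character.isWhitespace(body.charAt(p))) p++;
            String tag = body.substring(0, p);
            h.startElement(tag, readAttributes(body, p));
            if (selfClosing) h.endElement(tag); // a single tag yields start and end
        }
    }

    /** Reads name='value' or name="value" pairs starting at offset p. */
    private static Map<String, String> readAttributes(String body, int p) {
        Map<String, String> attrs = new LinkedHashMap<>();
        while (true) {
            int eq = body.indexOf('=', p);
            if (eq < 0) break;
            String name = body.substring(p, eq).trim();
            p = eq + 1;
            while (p < body.length() && Character.isWhitespace(body.charAt(p))) p++;
            if (p >= body.length()) break;
            char quote = body.charAt(p);
            if (quote != '"' && quote != '\'') break; // unquoted values are out of scope here
            int end = body.indexOf(quote, p + 1);
            if (end < 0) break;
            attrs.put(name, body.substring(p + 1, end));
            p = end + 1;
        }
        return attrs;
    }

    /** Collects the fired events as strings, which is convenient for inspection. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        parse(xml, new Handler() {
            public void startElement(String tag, Map<String, String> attrs) {
                events.add("start " + tag + " " + attrs);
            }
            public void endElement(String tag) { events.add("end " + tag); }
            public void text(String str) { events.add("text " + str); }
        });
        return events;
    }
}
```

The caller never receives a document tree; it only sees callbacks in document order, which is the trade-off an event-based parser makes for low memory use.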
-
exc
Throws:
IOException
-
escapeXML
Escapes a string with the appropriate XML codes.
Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
Returns:
the escaped string
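The escaping rule described above can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: the names EscapeSketch and escape are hypothetical, the five markup characters are replaced by their named entities, and code points above 127 are written as decimal references only when onlyASCII is true.

```java
final class EscapeSketch {
    /** Escapes markup characters; optionally escapes everything above 127 as &#nn;. */
    static String escape(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // Work on code points so a surrogate pair becomes one reference.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && cp > 127) {
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\"", false));
    }
}
```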
-
decodeEntity
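No description is given for this method. As an illustration of decoding the standard and numeric entities that the class description mentions, here is a minimal sketch. Everything in it is an assumption: the names EntitySketch and decode, the input form (the entity text without the leading & and trailing ;), and the choice to return '\0' for anything unrecognized.

```java
import java.util.HashMap;
import java.util.Map;

final class EntitySketch {
    // The five entities predefined by XML.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
        NAMED.put("apos", '\'');
    }

    /** Decodes "amp", "#65", or "#x41" style input; returns '\0' when unknown. */
    static char decode(String name) {
        try {
            if (name.startsWith("#x")) {
                return (char) Integer.parseInt(name.substring(2), 16); // hexadecimal reference
            }
            if (name.startsWith("#")) {
                return (char) Integer.parseInt(name.substring(1));     // decimal reference
            }
        } catch (NumberFormatException e) {
            return '\0'; // malformed numeric reference
        }
        Character c = NAMED.get(name);
        return c != null ? c : '\0';
    }

    public static void main(String[] args) {
        System.out.println(decode("amp"));
        System.out.println(decode("#x41"));
    }
}
```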
-
getEncodingName
-