Class SimpleXMLParser

java.lang.Object
com.gitlab.pdftk_java.com.lowagie.text.pdf.SimpleXMLParser

public class SimpleXMLParser extends Object
A simple XML and HTML parser. Like a SAX parser, it is event-based, but it offers far less functionality.

The parser:

  • Recognizes the encoding used
  • Recognizes all the elements' start tags and end tags
  • Lists attributes, where attribute values can be enclosed in single or double quotes
  • Recognizes the <![CDATA[ ... ]]> construct
  • Recognizes the standard entities &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
  • Maps the line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
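The last two behaviors can be illustrated with a minimal, self-contained sketch. It is not this class's code: the class name XmlTextSketch and both method names are hypothetical, and returning '\0' for an unknown entity is an assumption made for the example.

```java
import java.util.Map;

final class XmlTextSketch {
    // The five predefined XML entities, keyed by name (without '&' and ';').
    private static final Map<String, Character> NAMED = Map.of(
            "amp", '&', "lt", '<', "gt", '>', "quot", '"', "apos", '\'');

    /** Maps "\r\n" and a lone "\r" to "\n" (XML 1.0, Section 2.11). */
    static String normalizeLineEndings(String s) {
        // Replace the two-character sequence first so it yields a single '\n'.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    /**
     * Decodes an entity body such as "amp", "#65" or "#x41".
     * Returns '\0' for unknown or malformed input (an assumption of this sketch).
     */
    static char decodeEntityBody(String body) {
        try {
            if (body.startsWith("#x")) {
                return (char) Integer.parseInt(body.substring(2), 16);
            }
            if (body.startsWith("#")) {
                return (char) Integer.parseInt(body.substring(1));
            }
        } catch (NumberFormatException e) {
            return '\0';
        }
        Character c = NAMED.get(body);
        return c == null ? '\0' : c;
    }
}
```

Replacing "\r\n" before the lone "\r" matters: in the opposite order, every Windows line ending would turn into two newlines.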

The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with some extra code from XERCES to recognize the encoding.

  • Field Details

  • Constructor Details

    • SimpleXMLParser

      private SimpleXMLParser()
  • Method Details

    • popMode

      private static int popMode(Stack st)
    • parse

      public static void parse(SimpleXMLDocHandler doc, InputStream in) throws IOException
      Parses the XML document firing the events to the handler.
      Parameters:
      doc - the document handler
      in - the document. The encoding is deduced from the stream. The stream is not closed
      Throws:
      IOException - on error
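      How the encoding is deduced is not documented here. As a rough sketch of the general technique, the following follows the byte-pattern autodetection described in Appendix F of the XML 1.0 specification. The class and method names are hypothetical, only a subset of patterns is covered, and the actual private getEncodingName(byte[] b4) may behave differently.

```java
final class EncodingSniffSketch {
    /**
     * Guesses an encoding name from the first four bytes of an XML stream.
     * UCS-4 and EBCDIC patterns are deliberately omitted from this sketch.
     */
    static String guessEncoding(byte[] b4) {
        if (b4 == null || b4.length < 4) {
            throw new IllegalArgumentException("need at least four bytes");
        }
        // Mask to 0..255 so comparisons with hex constants work as expected.
        int b0 = b4[0] & 0xFF, b1 = b4[1] & 0xFF, b2 = b4[2] & 0xFF, b3 = b4[3] & 0xFF;

        // Byte order marks.
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";

        // No byte order mark: look for "<?" encoded in two bytes per character.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // Fall back to UTF-8; a real parser would next read the XML
        // declaration to pick up an explicit encoding="..." value.
        return "UTF-8";
    }
}
```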
    • getDeclaredEncoding

      private static String getDeclaredEncoding(String decl)
    • getJavaEncoding

      public static String getJavaEncoding(String iana)
      Gets the Java encoding name for an IANA encoding name. If no mapping is found, the input is returned unchanged.
      Parameters:
      iana - the IANA encoding
      Returns:
      the Java encoding
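      The lookup-with-fallback contract can be sketched as follows. The class name, method name, and table entries are illustrative assumptions, as is the case-insensitive lookup; the real mapping table is not shown in this documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class IanaToJavaSketch {
    // A few illustrative IANA-name to historical Java-name pairs (assumed subset).
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JAVA.put("WINDOWS-1252", "Cp1252");
    }

    /** Returns the mapped Java name, or the input itself when no mapping exists. */
    static String toJavaEncoding(String iana) {
        // Upper-casing the key is an assumption that makes the lookup case-insensitive.
        String java = IANA_TO_JAVA.get(iana.toUpperCase(Locale.ROOT));
        return java != null ? java : iana;
    }
}
```

      Returning the input on a miss lets callers pass the result straight to an API that accepts encoding names, at the cost of deferring any "unsupported encoding" error to that later call.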
    • parse

      public static void parse(SimpleXMLDocHandler doc, Reader r) throws IOException
      Throws:
      IOException
    • parse

      public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html) throws IOException
      Parses the XML document firing the events to the handler.
      Parameters:
      doc - the document handler
      comment - the comment handler
      r - the document. The encoding is already resolved. The reader is not closed
      html - true to parse the input as HTML, false to parse it as XML
      Throws:
      IOException - on error
    • exc

      private static void exc(String s, int line, int col) throws IOException
      Throws:
      IOException
    • escapeXML

      public static String escapeXML(String s, boolean onlyASCII)
      Escapes a string with the appropriate XML codes.
      Parameters:
      s - the string to be escaped
      onlyASCII - if true, characters with codes above 127 are always escaped as &#nn;
      Returns:
      the escaped string
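      A minimal sketch of this kind of escaping, assuming the five standard entities are used for markup characters and decimal references for non-ASCII characters. The class and method names are hypothetical, and this is not the library's implementation.

```java
final class EscapeXmlSketch {
    /** Escapes markup characters and, optionally, every character above 127. */
    static String escape(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && c > 127) {
                        // Decimal numeric reference. Characters outside the BMP
                        // arrive as two surrogates and are escaped separately
                        // here, which is a limitation of this sketch.
                        sb.append("&#").append((int) c).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: a&lt;b &amp; &quot;caf&#233;&quot;
        System.out.println(escape("a<b & \"caf\u00e9\"", true));
    }
}
```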
    • decodeEntity

      public static char decodeEntity(String s)
    • getEncodingName

      private static String getEncodingName(byte[] b4)