Class TaggedPdfReaderTool

java.lang.Object
com.itextpdf.text.pdf.parser.TaggedPdfReaderTool
Direct Known Subclasses:
CompareTool.CmpTaggedPdfReaderTool

public class TaggedPdfReaderTool extends Object
Converts a tagged PDF document into an XML file.
Since:
5.0.2
  • Field Details

    • reader

      protected PdfReader reader
      The reader object from which the content streams are read.
    • out

      protected PrintWriter out
      The writer object to which the XML will be written.
  • Constructor Details

    • TaggedPdfReaderTool

      public TaggedPdfReaderTool()
  • Method Details

    • convertToXml

      public void convertToXml(PdfReader reader, OutputStream os, String charset) throws IOException
      Converts the tagged PDF opened by the reader into XML and writes the result to the output stream, encoded with the given charset.
      Parameters:
      reader - the PdfReader that has access to the PDF file
      os - the OutputStream to which the resulting XML will be written
      charset - the name of the charset used to encode the XML output
      Throws:
      IOException
      Since:
      5.0.5
    • convertToXml

      public void convertToXml(PdfReader reader, OutputStream os) throws IOException
      Converts the tagged PDF opened by the reader into XML and writes the result to the output stream. The output is encoded with the default charset.
      Parameters:
      reader - the PdfReader that has access to the PDF file
      os - the OutputStream to which the resulting XML will be written
      Throws:
      IOException
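
The two convertToXml overloads differ only in how the XML text is turned into bytes. The following is a minimal sketch of that difference using only the JDK; it does not call iText. The class CharsetSketch and its methods are hypothetical names introduced for illustration, and wrapping the stream in a PrintWriter is an assumption suggested by the type of the `out` field, not a statement about the library's internals.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, NOT part of iText: shows how the charset name
// passed alongside an OutputStream changes the bytes that are written.
// With iText 5 on the classpath, the documented call would be
//   new TaggedPdfReaderTool().convertToXml(reader, os, "UTF-8");
class CharsetSketch {

    /** Writes one XML element to os, encoded with the named charset. */
    static void writeXml(OutputStream os, String charset, String text) {
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(os, Charset.forName(charset)));
        out.print("<P>" + text + "</P>");
        out.flush(); // push buffered characters through the encoder
    }

    /** Returns the encoded bytes so the effect of the charset is visible. */
    static byte[] encode(String charset, String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        writeXml(os, charset, text);
        return os.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8 and one byte in ISO-8859-1.
        System.out.println(encode("UTF-8", "\u00e9").length);
        System.out.println(encode("ISO-8859-1", "\u00e9").length);
    }
}
```

Passing the charset explicitly keeps the output reproducible across machines, whereas the two-argument overload depends on the default charset of the environment in which it runs.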
    • inspectChild

      public void inspectChild(PdfObject k) throws IOException
      Inspects a child of a structure element. The child can be an array or a dictionary.
      Parameters:
      k - the child to inspect
      Throws:
      IOException
    • inspectChildArray

      public void inspectChildArray(PdfArray k) throws IOException
      If the child of a structure element is an array, we loop over its elements and inspect each one.
      Parameters:
      k - the child array to inspect
      Throws:
      IOException
    • inspectChildDictionary

      public void inspectChildDictionary(PdfDictionary k) throws IOException
      If the child of a structure element is a dictionary, we inspect the child; we may also draw a tag.
      Parameters:
      k - the child dictionary to inspect
      Throws:
      IOException
    • inspectChildDictionary

      public void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) throws IOException
      If the child of a structure element is a dictionary, we inspect the child; we may also draw a tag.
      Parameters:
      k - the child dictionary to inspect
      inspectAttributes - whether the attributes of the structure element are inspected as well
      Throws:
      IOException
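
The three inspect methods describe a recursive walk: inspectChild dispatches on the kind of child, inspectChildArray loops over the elements, and inspectChildDictionary may draw a tag around whatever the child contains. The sketch below illustrates that pattern with stand-in types and only the JDK. It is not the iText implementation: a List stands in for PdfArray, a Map stands in for PdfDictionary, the keys "S" (structure type) and "K" (kids) follow the PDF structure element convention, and treating leaf values as plain text is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model, NOT the iText classes: a List plays the role of PdfArray,
// a Map plays the role of PdfDictionary, and any other value is leaf text.
class StructureWalkSketch {
    private final StringBuilder out = new StringBuilder();

    /** Dispatches on the kind of child: array, dictionary, or leaf. */
    void inspectChild(Object k) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k);
        } else if (k != null) {
            out.append(k); // assumption: leaves are plain text, not escaped
        }
    }

    /** Loops over the elements of an array child. */
    void inspectChildArray(List<?> k) {
        for (Object child : k) {
            inspectChild(child);
        }
    }

    /** Draws a tag for the dictionary (if it has one) around its kids. */
    void inspectChildDictionary(Map<?, ?> k) {
        Object tag = k.get("S");
        if (tag != null) out.append('<').append(tag).append('>');
        inspectChild(k.get("K"));
        if (tag != null) out.append("</").append(tag).append('>');
    }

    String result() {
        return out.toString();
    }

    /** Hypothetical factory for a stand-in structure element. */
    static Map<String, Object> element(String tag, Object kids) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("S", tag);
        m.put("K", kids);
        return m;
    }

    /** Builds a small tree and returns the XML the walk produces. */
    static String demo() {
        Object tree = element("Document", Arrays.asList(
                element("H1", "Title"),
                element("P", "Body")));
        StructureWalkSketch walker = new StructureWalkSketch();
        walker.inspectChild(tree);
        return walker.result();
    }
}
```

Calling StructureWalkSketch.demo() walks a Document element with an H1 and a P child and returns the nested tags in document order. In a real tagged PDF the leaves are references to marked content rather than strings, which is where a lookup such as parseTag would come in.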
    • xmlName

      protected String xmlName(PdfName name)
    • fixTagName

      private static String fixTagName(String tag)
    • parseTag

      public void parseTag(String tag, PdfObject object, PdfDictionary page) throws IOException
      Searches for a tag in a page.
      Parameters:
      tag - the name of the tag
      object - an identifier to find the marked content
      page - a page dictionary
      Throws:
      IOException