com.itextpdf.text.pdf.parser
Class PdfTextExtractor

java.lang.Object
  extended by com.itextpdf.text.pdf.parser.PdfTextExtractor

public class PdfTextExtractor
extends Object

Extracts text from a PDF file.

Since:
2.1.4

Field Summary
private  PdfReader reader
          The PdfReader that holds the PDF file.
private  TextProvidingRenderListener renderListener
          The TextProvidingRenderListener that receives render notifications and provides the resulting text.
 
Constructor Summary
PdfTextExtractor(PdfReader reader)
          Creates a new text extractor that uses the most current text extraction algorithm (currently LocationAwareTextExtractingPdfContentRenderListener) as the render listener.
PdfTextExtractor(PdfReader reader, TextProvidingRenderListener renderListener)
          Creates a new text extractor that uses the given render listener.
 
Method Summary
 String getTextFromPage(int page)
          Gets the text from a page.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reader

private final PdfReader reader
The PdfReader that holds the PDF file.


renderListener

private final TextProvidingRenderListener renderListener
The TextProvidingRenderListener that receives render notifications and provides the resulting text.

Constructor Detail

PdfTextExtractor

public PdfTextExtractor(PdfReader reader)
Creates a new text extractor that uses the most current text extraction algorithm (currently LocationAwareTextExtractingPdfContentRenderListener) as the render listener.

Parameters:
reader - the reader with the PDF

PdfTextExtractor

public PdfTextExtractor(PdfReader reader,
                        TextProvidingRenderListener renderListener)
Creates a new text extractor that uses the given render listener.

Parameters:
reader - the reader with the PDF
renderListener - the render listener that will be used to analyze renderText operations and provide resultant text

Method Detail

getTextFromPage

public String getTextFromPage(int page)
                       throws IOException
Gets the text from a page.

Parameters:
page - the number of the page to extract text from
Returns:
a String with the content as plain text (without PDF syntax)
Throws:
IOException
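
Example:
Because getTextFromPage(int) handles one page at a time, extracting a whole document means looping over its pages. The following is a minimal sketch of such a loop, not part of this class. PageTextCollector, PageTextSource and extractAll are hypothetical names introduced for illustration, and the sketch assumes page numbers start at 1. The stand-in interface mirrors the getTextFromPage signature so that the snippet compiles without the library on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class PageTextCollector {

    /**
     * Hypothetical stand-in for any object exposing getTextFromPage(int),
     * such as a PdfTextExtractor instance.
     */
    interface PageTextSource {
        String getTextFromPage(int page) throws IOException;
    }

    /**
     * Concatenates the text of pages 1..pageCount, ending each page with a newline.
     * Assumes 1-based page numbers. An IOException from the source is rethrown
     * as an UncheckedIOException.
     */
    static String extractAll(PageTextSource source, int pageCount) {
        StringBuilder text = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            try {
                text.append(source.getTextFromPage(page)).append('\n');
            } catch (IOException e) {
                throw new UncheckedIOException("Could not read page " + page, e);
            }
        }
        return text.toString();
    }
}
```

With the library available and an extractor built as new PdfTextExtractor(reader), the call would presumably be PageTextCollector.extractAll(extractor::getTextFromPage, reader.getNumberOfPages()).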
