com.itextpdf.text.pdf.parser
Class SimpleTextExtractingPdfContentRenderListener

java.lang.Object
  extended by com.itextpdf.text.pdf.parser.SimpleTextExtractingPdfContentRenderListener
All Implemented Interfaces:
RenderListener, TextProvidingRenderListener

public class SimpleTextExtractingPdfContentRenderListener
extends Object
implements TextProvidingRenderListener

A simple text extraction renderer. This renderer tracks the Y position of each string it receives. When it detects that the Y position has changed, it inserts a line break into the output. If the PDF does not render its text from top to bottom, the extracted text will not be a true representation of how the text appears on the page. This renderer also uses a simple strategy based on font metrics to decide whether a blank space should be inserted into the output.
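The behaviour described above can be sketched without any iText types. The following is an illustration only, not the library source: the class name SimpleLineTracker, the addChunk method, its coordinate parameters, and the half-a-space-width threshold are all assumptions made for the example.

```java
// Hypothetical sketch of the heuristic described above; NOT the iText implementation.
// All names and the spacing threshold are assumptions made for illustration.
final class SimpleLineTracker {
    private final StringBuilder result = new StringBuilder();
    private boolean first = true;
    private float lastY;     // baseline Y of the previous chunk
    private float lastEndX;  // X where the previous chunk ended

    /**
     * Records one chunk of text.
     *
     * @param text       the characters of the chunk
     * @param startX     X coordinate where the chunk starts
     * @param endX       X coordinate where the chunk ends
     * @param y          baseline Y coordinate of the chunk
     * @param spaceWidth width of a space character in the current font
     */
    void addChunk(String text, float startX, float endX, float y, float spaceWidth) {
        if (!first) {
            if (y != lastY) {
                // The Y position changed: treat it as a new line.
                result.append('\n');
            } else if (startX - lastEndX > spaceWidth / 2f
                    && !text.startsWith(" ")
                    && !endsWithSpace()) {
                // Same line, but a visible gap: insert a single blank.
                result.append(' ');
            }
        }
        result.append(text);
        first = false;
        lastY = y;
        lastEndX = endX;
    }

    private boolean endsWithSpace() {
        return result.length() > 0 && result.charAt(result.length() - 1) == ' ';
    }

    /** Returns the text collected so far. */
    String getResultantText() {
        return result.toString();
    }
}
```

Because the sketch only compares each chunk with the one before it, chunks that arrive out of visual order (for example, a footer drawn before the body) come out in drawing order, which is the limitation noted in the description.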

Since:
2.1.5

Field Summary
private  Vector lastEnd
           
private  Vector lastStart
           
private  StringBuffer result
          Used to store the resulting String.
 
Constructor Summary
SimpleTextExtractingPdfContentRenderListener()
          Creates a new text extraction renderer.
 
Method Summary
 void beginTextBlock()
          Called when a new text block is beginning (i.e. BT)
 void endTextBlock()
          Called when a text block has ended (i.e. ET)
 String getResultantText()
          Returns the result so far.
 void renderImage(ImageRenderInfo renderInfo)
          No-op method: this renderer is not interested in image events.
 void renderText(TextRenderInfo renderInfo)
          Captures text using a simplified algorithm for inserting hard returns and spaces.
 void reset()
          Resets the internal state of the RenderListener.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lastStart

private Vector lastStart

lastEnd

private Vector lastEnd

result

private StringBuffer result
Used to store the resulting String.

Constructor Detail

SimpleTextExtractingPdfContentRenderListener

public SimpleTextExtractingPdfContentRenderListener()
Creates a new text extraction renderer.

Method Detail

reset

public void reset()
Description copied from interface: RenderListener
Resets the internal state of the RenderListener.

Specified by:
reset in interface RenderListener

beginTextBlock

public void beginTextBlock()
Description copied from interface: RenderListener
Called when a new text block is beginning (i.e. BT)

Specified by:
beginTextBlock in interface RenderListener
Since:
5.0.1

endTextBlock

public void endTextBlock()
Description copied from interface: RenderListener
Called when a text block has ended (i.e. ET)

Specified by:
endTextBlock in interface RenderListener
Since:
5.0.1

getResultantText

public String getResultantText()
Returns the result so far.

Specified by:
getResultantText in interface TextProvidingRenderListener
Returns:
a String with the resulting text.
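The method list above implies a callback order: a parser resets the listener, signals the start of a text block, delivers text, signals the end of the block, and the caller then reads the accumulated text. The sketch below illustrates that order with stand-in types. SketchListener, CollectingListener, and ListenerDriver are hypothetical names, renderText takes a plain String here rather than a TextRenderInfo, and the assumption that reset() discards previously collected text is not confirmed by this page.

```java
// Hypothetical stand-ins for the listener contract; these are NOT the iText types.
interface SketchListener {
    void reset();
    void beginTextBlock();
    void renderText(String text); // simplified: the documented method takes a TextRenderInfo
    void endTextBlock();
    String getResultantText();
}

// Minimal listener that simply concatenates the text it receives.
final class CollectingListener implements SketchListener {
    private StringBuilder result = new StringBuilder();

    // Assumption: resetting discards everything collected so far.
    public void reset() { result = new StringBuilder(); }

    public void beginTextBlock() { /* nothing to do at BT in this sketch */ }

    public void renderText(String text) { result.append(text); }

    public void endTextBlock() { /* nothing to do at ET in this sketch */ }

    public String getResultantText() { return result.toString(); }
}

final class ListenerDriver {
    /** Feeds each page's chunks to the listener, resetting it between pages. */
    static java.util.List<String> extractPages(SketchListener listener, String[][] pages) {
        java.util.List<String> texts = new java.util.ArrayList<String>();
        for (String[] chunks : pages) {
            listener.reset();
            listener.beginTextBlock();
            for (String chunk : chunks) {
                listener.renderText(chunk);
            }
            listener.endTextBlock();
            texts.add(listener.getResultantText());
        }
        return texts;
    }
}
```

Since getResultantText returns the result so far, skipping the reset() call in this sketch would make each page's text include the text of the pages before it.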

renderText

public void renderText(TextRenderInfo renderInfo)
Captures text using a simplified algorithm for inserting hard returns and spaces.

Specified by:
renderText in interface RenderListener
Parameters:
renderInfo - information specifying the text to render

renderImage

public void renderImage(ImageRenderInfo renderInfo)
No-op method: this renderer is not interested in image events.

Specified by:
renderImage in interface RenderListener
Parameters:
renderInfo - information specifying what to render
Since:
5.0.1
See Also:
RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)
