Re: Copying text from a PDF



"Terry Pinnell" <terrypinDELETE@xxxxxxxxxxxxxxxxxxx> wrote in message
news:oajq911s3q95buepdthrl0ekpc0jnfrmm7@xxxxxxxxxx
> Quite often I have trouble extracting text from a PDF. I use the Text
> tool, copy, but on then pasting into my text editor I get garbage.
> Each individual character gets a return inserted. Typical example is
> at http://www.fairchildsemi.com/ds/BU/BUZ11.pdf, where I just wanted
> to extract the details under 'Absolute Maximum Ratings'.
>
> What's the deal here please? If the document is proprietorially
> protected, wouldn't the Text tool be inaccessible?

I just tried it and it worked OK for me when I pasted the text into the PFE
editor. Here are a couple of lines:

Drain to Source Breakdown Voltage (Note 1) . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .V
DS
50 V
Drain to Gate Voltage (R
GS
= 20k
Ù
) (Note 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . V
DGR
50 V
Continuous Drain Current T
C

It's not perfect, but I haven't got a CR after every character.

I often extract text from PDFs when creating PCB parts and don't have many
problems.
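
If a paste does come out with a return after every character, as Terry
describes, one rough workaround is to rejoin the fragments afterwards. The
Python sketch below is a hypothetical helper (`join_single_char_lines` is an
illustrative name, not part of any tool mentioned in this thread). It assumes
that every one-character line, including a line holding just a space, is a
fragment of a longer line, and it keeps every other line as-is. That heuristic
will also merge genuine one-character lines that happen to sit next to each
other, so check the result by eye.

```python
def join_single_char_lines(text: str) -> str:
    """Rejoin text that was pasted with a line break after every character.

    Hypothetical heuristic: a line holding exactly one character (a space
    counts) is treated as a fragment of a longer line; any other line,
    including an empty one, is kept as a line of its own.
    """
    result = []   # finished output lines
    pending = []  # consecutive one-character fragments awaiting a join
    # splitlines() copes with both \n and Windows \r\n line endings.
    for line in text.splitlines():
        if len(line) == 1:
            pending.append(line)
            continue
        if pending:
            result.append("".join(pending))
            pending = []
        result.append(line)
    if pending:  # flush fragments left at the end of the text
        result.append("".join(pending))
    return "\n".join(result)


if __name__ == "__main__":
    pasted = "D\nr\na\ni\nn\n \nC\nu\nr\nr\ne\nn\nt"
    print(join_single_char_lines(pasted))
```

For example, `join_single_char_lines("B\nU\nZ\n1\n1")` returns `"BUZ11"`.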

Leon
