The PDF file format is a great tool for sharing documents while retaining their formatting, and for ensuring that documents such as contracts aren’t changed. But sometimes you need to use the text from a PDF. You may need to copy a paragraph, a page, or more and edit it in a Microsoft Word document, or in another word processor or text editor. You can select and copy text in a PDF, but chances are that the pasted text will be seriously munged. You’ll often see odd line breaks, or no breaks at all, and styles will be lost. There are ways, however, to convert a PDF to formatted text. Here’s how to do this.
Create a workflow that extracts text
Open Automator, which is in your Applications folder. On the first screen that displays, choose to make a Workflow. Click on Files & Folders in the leftmost column, then drag Ask For Finder Items from the second column to the larger section at the right of the Automator window.
Next, click on PDFs in the leftmost column, and drag Extract PDF Text from the second column to a point below the first item you dragged to the right.
A simple (and inexpensive) way to extract text from a PDF is to use an Automator workflow. After you’ve added the two Automator actions, the right side of your window should show Ask For Finder Items followed by Extract PDF Text.
The second Automator action, Extract PDF Text, allows you to choose whether you want to save the text extracted from your PDFs as Plain Text or Rich Text. In most cases, you’ll want to choose Rich Text, as this will retain formatting, such as bold and italic text. Word, Apple’s TextEdit, Pages and most other text editors can handle Rich Text format.
Press Command-S. Give your workflow a name, such as PDF to RTF, and then choose Application from the File Format pop-up menu. Finally, click on Save. Launch this application, select a PDF file in the screen that appears, and then let Automator do its work. Open the file that appears – it will have the same name as your source file, but will end with the file extension .rtf. Open this document in Word and you’ll see the text of your PDF file, with text formatting but no layout (no columns, and so on). This text can be a bit messy, but you can now edit it or copy it and use it in other documents.
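If you chose Plain Text rather than Rich Text in the workflow, much of the messiness comes down to hard line breaks in the middle of paragraphs and words hyphenated across lines. Here is a minimal Python sketch of one way to tidy that up. It is not part of Automator or of any tool mentioned here; the function name tidy_extracted_text and the sample text are made up for illustration, and it assumes that paragraphs in the extracted text are separated by blank lines, which won’t be true of every PDF.

```python
import re


def tidy_extracted_text(raw: str) -> str:
    """Rejoin hard-wrapped lines of PDF-extracted plain text into paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split by a hyphen at the end of a line ("docu-\nment").
    # Caveat: this also merges real compounds that happen to break at a
    # line end ("well-\nknown" becomes "wellknown").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split on blank lines, then collapse every run of whitespace
    # (including single line breaks) inside a paragraph to one space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)


sample = "This is a docu-\nment that was\nhard wrapped.\n\n\nSecond   paragraph\nhere.\n"
print(tidy_extracted_text(sample))
```

To use it on a real file, read the .txt file that the workflow produced into a string, pass it to the function, and write the result back out. It won’t help with Rich Text output, since an .rtf file contains formatting codes as well as the text.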
Use a dedicated program to convert a document
There is a plethora of programs that can convert PDFs to Word documents, retaining formatting and images. If you need more than just the text, and want to make Word documents that look like your PDFs, you’ll need to go this route.
One of the most effective is Solid Documents’ US$79.95 Solid PDF To Word For Mac. It can convert a PDF into a Word document that retains much, if not all, of the original formatting. (The program can also convert PDFs to Apple’s Pages format, Excel, HTML and more.)
I converted a number of complex PDFs using the program, notably an issue of Macworld, a Take Control book and a booklet for a CD. While Solid PDF To Word takes a bit of time to make its conversions, the Word files do resemble the originals.
I used Solid PDF To Word to convert a Macworld issue with complex formatting. As you can see, the resulting Word file (below right) looks a lot like the original PDF (below left). The program had some problems with numbers, though, as you can also see in the page to the right.
These conversions are not perfect – while similar fonts are used, graphics retained and the approximate layout kept, there can be some glitches. In my tests, Word had difficulty displaying the Macworld conversion, with text blinking as the program struggled to paginate and display the complex formatting. But the Take Control book and the CD booklet both displayed almost perfectly.
You’ll find the results are certainly good enough for accessing the content of most PDFs. For PDFs with simpler layouts, the resulting content is nearly perfect. You could use it to amend documents and print them out, or, in some cases, create new PDFs.
Depending on your needs – just the text, ma’am, or the full monty – you have two choices for how you convert your PDFs. Take your pick.