The PDF file format is a great tool for sharing documents, such as contracts, while retaining their formatting and ensuring that they don’t get changed in any way. But sometimes you do need to use the text in a PDF.
While you can copy and paste text from a PDF, chances are it will end up in serious disarray. You’ll see odd line breaks or no breaks at all, and styles and formatting will be lost. There are a couple of ways, however, to convert a PDF to formatted text.
A workflow that extracts text
The first method is the cheapest and uses a tool that is already part of Mac OS X: Automator. Use Automator to create a workflow that extracts text from PDFs and saves it as a text or Rich Text Format (RTF) document.
Open Automator, which you’ll find in the Applications folder. On the first screen that appears, choose the Workflow document type. Click on Files & Folders in the leftmost column and then drag Ask For Finder Items from the second column to the larger section at the right of the Automator window. Next, click on PDFs in the leftmost column and drag Extract PDF Text below the first item.
The second Automator action, Extract PDF Text, lets you choose whether to save the text extracted from your PDFs as plain text or as RTF. In most cases, you’ll want RTF, as it retains formatting such as bold and italic text. Microsoft Word, Apple’s TextEdit and Pages, and most other word processors can handle RTF.
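To see why RTF keeps bold and italic text while plain text drops it, here is a minimal Python sketch. It is not what Automator does internally; the function names (escape_rtf, to_rtf, to_plain) and the sample sentence are made up for illustration, and the escaping covers only the backslash and braces, not non-ASCII characters.

```python
def escape_rtf(text):
    """Escape the characters RTF treats as syntax: backslash and braces."""
    return "".join("\\" + ch if ch in "\\{}" else ch for ch in text)


def to_rtf(runs):
    """Build a minimal RTF document from (text, style) pairs.

    style is "" for plain text, "b" for bold or "i" for italic.
    Assumption: ASCII text only; a real converter must also encode
    non-ASCII characters.
    """
    body = []
    for text, style in runs:
        escaped = escape_rtf(text)
        # A styled run becomes an RTF group such as {\b bold text}
        body.append("{\\%s %s}" % (style, escaped) if style else escaped)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0 "
    return header + "".join(body) + "\\par}"


def to_plain(runs):
    """Plain text keeps the words and discards every style."""
    return "".join(text for text, _ in runs)


# Hypothetical sample: one sentence with a bold phrase in the middle
runs = [("Payment is due ", ""), ("within 30 days", "b"), (" of signing.", "")]
print(to_plain(runs))
print(to_rtf(runs))
```

Saving the second printed line to a file with an .rtf extension and opening it in TextEdit should show the middle phrase in bold; the first line carries no styling at all.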
Press Cmd-S. Give your workflow a name, such as ‘PDF to RTF,’ and then choose Application from the File Format pop-up menu. Finally, click on Save. Now launch your new application, select a PDF file in the screen that appears and let Automator do its work. Open the resulting file in Word to see the text of your PDF file, complete with text formatting but no layout elements (no columns and so on). The result can look a bit messy, but you can now edit the text and use it in other documents.
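If you saved the output as plain text, much of the mess is stray line breaks, and those can be tidied with a short script. The following Python sketch is one possible approach, not part of the Automator workflow: the function name reflow and the sample text are hypothetical, and it assumes paragraphs are separated by blank lines.

```python
import re


def reflow(text):
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph break; every other
    line break is treated as an artefact of the PDF layout.
    """
    cleaned = []
    for para in re.split(r"\n\s*\n", text.strip()):
        # Rejoin words split across lines: "shar-\ning" -> "sharing".
        # Caveat: this also fuses genuine compounds such as "well-\nknown".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse the remaining line breaks and runs of spaces
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


# Hypothetical sample of extracted text with layout line breaks
sample = (
    "The PDF file format is a great tool for shar-\n"
    "ing documents while\n"
    "retaining their formatting.\n"
    "\n"
    "But sometimes you do need\n"
    "to use the text."
)
print(reflow(sample))
```

Check the result by eye afterwards, as the hyphen rule is a blunt instrument and headings or lists that were never separated by blank lines will be merged into the surrounding paragraph.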
A dedicated conversion program
If you need more than just the text and want to create Word documents that look like your original PDFs, a dedicated conversion program can help. One of the most effective is Solid Documents’ US$80 (approx. $78) Solid PDF to Word (www.mac-pdf-converter.com). It’s capable of converting a PDF into a Word document that retains most formatting.
We used the program to convert several complex PDFs: an issue of Macworld, one of the Take Control series of books and an issue of Adobe Photoshop Elements Techniques. While Solid PDF to Word takes some time to perform the task, the resulting Word files do resemble the originals.
These conversions aren’t always perfect – they use similar fonts, retain graphics and keep the approximate layout, but there can be glitches. In our tests, Word had difficulty displaying the Macworld file, with text blinking as the program struggled to paginate and display complex formatting. However, both the Take Control book and the Adobe Photoshop Elements Techniques magazine displayed almost perfectly. The results are certainly good enough to let you use the content with considerably less hassle.