Saturday, October 09, 2010

How to convert PDF to text file

We can use the open source PDFBox library.
Download PDFBox .NET version

Download source files

You need four DLL files from the bin folder of the extracted download archive:

PDFBox-0.7.3.dll
IKVM.GNU.Classpath.dll
IKVM.Runtime.dll
FontBox-0.1.0-dev.dll

Add references to these assemblies in your project, then import the following namespaces:

using System.IO; // needed for File.WriteAllText
using org.pdfbox.pdmodel;
using org.pdfbox.util;
 
Given a PDF file such as StudentsResults.pdf in the root folder of the web site, we can convert it to text as follows:

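// Load the PDF from the root folder of the web application.
PDDocument doc = PDDocument.load(Server.MapPath("~/StudentsResults.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
string text = stripper.getText(doc); // text of all pages
doc.close(); // release the underlying file
File.WriteAllText(Server.MapPath("~/StudentsResults.txt"), text);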

The text extracted from the PDF is written to StudentsResults.txt.
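
The same conversion can also be wrapped in a small helper that runs outside ASP.NET (no Server.MapPath) and extracts only a range of pages. This is a minimal sketch, not part of PDFBox itself: the class name PdfTextExtractor and the method ConvertToText are hypothetical, and it assumes the setStartPage and setEndPage methods of PDFTextStripper are available in the PDFBox version you reference.

```csharp
using System.IO;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

// Hypothetical helper class for illustration; not part of PDFBox.
static class PdfTextExtractor
{
    // Extracts the text of pages firstPage..lastPage (1-based, inclusive)
    // from pdfPath and writes it to txtPath.
    public static void ConvertToText(string pdfPath, string txtPath,
                                     int firstPage, int lastPage)
    {
        PDDocument doc = PDDocument.load(pdfPath);
        try
        {
            PDFTextStripper stripper = new PDFTextStripper();
            // Assumption: these setters exist in the referenced PDFBox build.
            stripper.setStartPage(firstPage);
            stripper.setEndPage(lastPage);
            File.WriteAllText(txtPath, stripper.getText(doc));
        }
        finally
        {
            // Close the document even if extraction fails.
            doc.close();
        }
    }
}
```

For example, PdfTextExtractor.ConvertToText(@"C:\data\StudentsResults.pdf", @"C:\data\StudentsResults.txt", 1, 2) would write the text of the first two pages; the paths here are placeholders.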
