Pdfer

A basic C# library meant to make accessing and manipulating PDFs complicated but extremely powerful.

Usage

Pdfer uses Streams to read and write PDFs. It's best to open a Stream with your PDF content, like a FileStream, to reduce memory usage while loading and parsing the PDF.

For a basic example, see the TestConsole Project.

What, you really want more detail? Fine.

Parsing

To parse a PDF, create a PdfDocumentParser with the PdfDocumentParserFactory. If you want to adjust the parser's behaviour, you can of course build your own and override the behaviour of the various helper classes, but I recommend not doing that unless it's extremely necessary. If you need another feature, just make a PR and make the world better for everyone.

using var stream = File.OpenRead("test.pdf");
var parser = PdfDocumentParserFactory.Create();
var document = parser.Parse(stream);

You can also load the PDF into memory and parse a byte array:

byte[] pdf = File.ReadAllBytes("test.pdf");
var parser = PdfDocumentParserFactory.Create();
var document = parser.Parse(pdf);

Manipulating

Currently, manipulation of PDFs is very limited. You can access and edit the objects at the parsed level, for example to change the Producer entry in the document information dictionary:

// "document" is the PdfDocument returned by parser.Parse(...) above
var infoReference = ObjectIdentifier.ParseReference(document.Trailer.TrailerDictionary["/Info"]);
var infoDictionary = document.Body[infoReference] as DictionaryObject ?? throw new InvalidOperationException("Info dictionary not found");
infoDictionary.Value["/Producer"] = new PdfStringHelper().GetHexString("My PDFer");
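For orientation, this is what the two values in that snippet look like in the PDF format itself. In a PDF file, the trailer's /Info entry is an indirect reference written as object number, generation number and R (for example 12 0 R), and a hexadecimal string is written as pairs of hex digits between angle brackets. The sketch below is a standalone illustration, not Pdfer's implementation: ParseIndirectReference and ToPdfHexString are hypothetical names, and it only assumes that ObjectIdentifier.ParseReference and GetHexString deal with values of this shape.

```csharp
using System;
using System.Globalization;
using System.Text;

// Demo: an indirect reference as it appears in a trailer, and a Producer value.
var (objectNumber, generation) = ParseIndirectReference("12 0 R");
Console.WriteLine($"object {objectNumber}, generation {generation}"); // object 12, generation 0
Console.WriteLine(ToPdfHexString("My PDFer"));                       // <4D79205044466572>

// Hypothetical helper, not Pdfer's ObjectIdentifier: splits an indirect reference
// such as "12 0 R" into its object number and generation number.
static (int ObjectNumber, int Generation) ParseIndirectReference(string reference)
{
    string[] parts = reference.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    if (parts.Length != 3 || parts[2] != "R")
        throw new FormatException($"Not an indirect reference: '{reference}'");

    return (int.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture));
}

// Hypothetical helper, not Pdfer's PdfStringHelper: writes each byte of the text as two
// uppercase hex digits inside angle brackets. Non-ASCII characters become '?' (0x3F),
// which is one more reason to stick to ASCII for now.
static string ToPdfHexString(string text)
{
    var builder = new StringBuilder("<");
    foreach (byte b in Encoding.ASCII.GetBytes(text))
        builder.Append(b.ToString("X2", CultureInfo.InvariantCulture));
    return builder.Append('>').ToString();
}
```

A hex string is a convenient choice for values like Producer because it never needs escaping of parentheses or backslashes, unlike a literal string in (...).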

What you currently can't do is change the raw data: even though DocumentObject has a RawValue, it is currently ignored. This might change at some point; if you need it, make an issue. It's just not a priority for me right now.

Writing

Writing is done with the PdfDocumentWriter and, you guessed it, you can create one with the PdfDocumentWriterFactory.

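var writer = PdfDocumentWriterFactory.Create();
// File.Create truncates an existing file (File.OpenWrite would leave stale bytes behind a shorter output),
// and "using" makes sure the stream is flushed and closed; writing to a separate file avoids clobbering test.pdf while it is still open for reading
using var stream = File.Create("output.pdf");
writer.Write(stream, document);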

Known Issues

The parser doesn't currently support multilayer PDFs.
Things like signed PDFs with multiple trailers don't work.
Writing PDFs currently only really works for extraordinarily simple, standard-conforming PDFs.
PDFs with \r\n line delimiters are not supported.
Encoding is all over the place. Best to just use ASCII in your PDFs.
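If you want to reject files with \r\n delimiters before they reach the parser, a rough pre-flight check is possible. The sketch below is a hypothetical helper, not part of Pdfer: it only inspects how the %PDF header line ends, assuming that a file whose header ends in \r\n uses that delimiter throughout. That is a heuristic, since a file can mix line endings; scanning the whole file instead would produce false positives on CR LF byte pairs inside binary stream data.

```csharp
using System;
using System.IO;
using System.Text;

// Demo with in-memory data; in practice you would pass the stream you are about to parse.
using var crlfPdf = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.7\r\n1 0 obj"));
using var lfPdf = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.7\n1 0 obj"));
Console.WriteLine(HeaderEndsWithCrLf(crlfPdf)); // True
Console.WriteLine(HeaderEndsWithCrLf(lfPdf));   // False

// Hypothetical helper, not part of Pdfer. Heuristic: checks only whether the first
// line (the "%PDF-x.y" header) ends in CR LF, then rewinds so the parser can still
// read the whole stream. Requires a seekable stream such as a FileStream.
static bool HeaderEndsWithCrLf(Stream stream)
{
    long start = stream.Position;
    try
    {
        for (int i = 0; i < 1024; i++)
        {
            int value = stream.ReadByte();
            if (value == -1 || value == '\n') return false; // end of data or a bare LF
            if (value == '\r') return stream.ReadByte() == '\n';
        }
        return false; // no line break in the first 1024 bytes
    }
    finally
    {
        stream.Position = start; // rewind to where the caller left the stream
    }
}
```

Because the helper rewinds, you can call it on the same FileStream right before passing that stream to parser.Parse.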

Why

Someone I know and I want a library that allows us to easily manipulate PDFs at the object level, so I decided to parse PDFs.

Help

Make an issue and pray I have the time to help

I want to give help

Yes!

Tiefseetauchner/Pdfer

Folders and files

Latest commit

History

Repository files navigation

Pdfer

Usage

Parsing

Manipulating

Writing

Known Issues

Why

Help

I want to give help

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages