How to Repair a Corrupted PDF File
Interrupted downloads, failed email attachments, storage errors, and incomplete PDF exports can all leave you with a file that won't open. Adobe Reader reports a damaged file, Preview shows a blank page, and browsers display an error. Often the content is still there, and only the file's internal index is corrupted.
How PDF corruption happens
PDFs have a cross-reference table near the end of the file that records the byte offset of every internal object. If the file is truncated, for example by an interrupted download, that table may be missing, incomplete, or pointing to offsets past the end of the file. The objects themselves (pages, text, images) may be perfectly intact, but viewers can't find them without the index.
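As a rough illustration of the index described above, the following Python sketch looks at the last bytes of a file for the two markers that normally close a PDF: a startxref line giving the byte offset of the cross-reference table, and the %%EOF marker. The function name inspect_pdf_tail and the 1024-byte window are assumptions made for this example, and the check is a quick diagnostic rather than a validator.

```python
import re

def inspect_pdf_tail(data: bytes, tail_size: int = 1024) -> dict:
    """Report whether the end-of-file index markers are present and plausible."""
    # Only the end of the file matters here; short files are used whole.
    tail = data[-tail_size:]

    has_eof = b"%%EOF" in tail

    # "startxref" is followed by the byte offset of the cross-reference section.
    # If the file was updated incrementally there can be several; use the last.
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    offset = int(matches[-1]) if matches else None

    # An offset at or beyond the end of the file means the index is unusable.
    offset_in_file = offset is not None and offset < len(data)

    return {
        "has_eof": has_eof,
        "startxref": offset,
        "offset_in_file": offset_in_file,
    }
```

A file cut off mid-download typically comes back with has_eof set to False and startxref set to None, which matches the "damaged file" symptom even when the page objects earlier in the file are untouched.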
How PDFPuddle repairs the file
PDFPuddle relies on pdf-lib, whose parser is more forgiving than many viewers. It scans the file's raw bytes looking for valid PDF objects, rebuilds the cross-reference table from what it finds, and re-serializes the document. The output PDF has a fresh, valid index pointing to the recoverable content.
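The scan-and-rebuild idea can be sketched in a few lines of Python. This is not PDFPuddle's or pdf-lib's actual code: the names scan_objects and build_xref are hypothetical, the scan is a plain regular-expression search for object headers such as "12 0 obj", and gaps in the numbering are simply written as free entries. A real parser also has to validate each object, skip over stream data that happens to look like a header, and write a new trailer.

```python
import re

# An indirect object header such as "12 0 obj" at the start of the data or a line.
OBJ_HEADER = re.compile(rb"(?:^|[\r\n])(\d+)\s+(\d+)\s+obj\b")

def scan_objects(data: bytes) -> dict:
    """Map object number -> (generation, byte offset) by scanning raw bytes."""
    found = {}
    for match in OBJ_HEADER.finditer(data):
        number = int(match.group(1))
        generation = int(match.group(2))
        # A later definition replaces an earlier one, as with incremental updates.
        found[number] = (generation, match.start(1))
    return found

def build_xref(entries: dict) -> bytes:
    """Serialize a classic cross-reference table from the scanned offsets."""
    size = max(entries, default=0) + 1
    lines = [b"xref", b"0 %d" % size]
    for number in range(size):
        if number in entries:
            generation, offset = entries[number]
            # In-use entry: 10-digit offset, 5-digit generation, "n".
            lines.append(b"%010d %05d n " % (offset, generation))
        else:
            # Object 0 is always free; gaps are marked free too (a simplification).
            lines.append(b"0000000000 65535 f ")
    return b"\n".join(lines) + b"\n"
```

Because the offsets come from where the objects actually sit in the bytes, the rebuilt table stays correct even when the original table is missing or pointed at the wrong places.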
What's recoverable and what isn't
Files truncated at the end (the most common case) often recover fully when the missing bytes were only the original cross-reference table, which gets rebuilt. If the truncation also cut off page objects, those pages are lost. Files corrupted in the middle may have damaged page objects that can't be reconstructed. Severely corrupted files, with bytes scrambled throughout, usually can't be repaired by any tool.
Prevention
Always verify file integrity after long transfers: compare file sizes, and run a checksum if one is available. For critical documents, keep a backup before processing. Cloud storage with versioning (Dropbox, Google Drive) lets you roll back to an earlier version if a file becomes corrupted.
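If the sender publishes a checksum, the comparison can be done with a short Python script like the one below. It assumes the checksum is a SHA-256 hex digest; the function names sha256_of and transfer_is_intact are made up for this example.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_is_intact(path, expected_sha256: str) -> bool:
    """Compare a file's digest with the one published by the sender."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A mismatch means the copy differs from the original by at least one byte, so it is worth downloading again before attempting a repair.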