
PDF Redaction vs. Black-Box Overlay: Why the Difference Matters

Jury D'Ambros · 6 min read

Open almost any online PDF tool, find the annotation toolbar, draw a black rectangle over a social security number, and export. The page looks redacted. The number is no longer visible. To anyone scrolling through the file, the job is done.

It isn't. The original text is still sitting inside the PDF. A copy-paste, a text search, or opening the file in a capable viewer will often pull it right back out. Incidents where exactly this happened — to government agencies, law firms, and Fortune 500 companies — fill a small library of legal and press reports.

This post is a technical walkthrough of why a black-box overlay is not redaction, what real redaction looks like inside a PDF file, and how to tell the difference from the outside.

How PDFs Store Content

A PDF is not an image of a page. It is a structured file that describes a page as a series of objects: text runs, fonts, images, vector paths, and annotations. A PDF viewer takes this description and renders a visual page from it, but the underlying objects remain addressable in the file.

Concretely, a piece of text in a PDF is typically stored as a series of character codes positioned at specific coordinates, along with a reference to a font. When you draw a selection box with your mouse in a PDF viewer, you are selecting against this text layer, not against the visual rendering. This is why copy-paste works; it is also why a text search inside a PDF returns results that the naked eye may not even notice on the page.
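To make that concrete, here is a minimal Python sketch. The content stream is hand-written and uncompressed, and the function name shown_strings and the sample values are made up for illustration. Real files usually compress their streams and often encode text through font-specific tables, so treat this as a picture of the idea rather than a parser.

```python
import re

# A hand-written, uncompressed content stream for one page (illustrative only).
# BT/ET bracket a text object, Tf selects a font and size, Td moves the text
# position, and Tj paints the string in parentheses.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Name: Jane Example) Tj
0 -18 Td
(SSN: 123-45-6789) Tj
ET
"""

# Matches a literal string immediately followed by the Tj operator.
# Assumption: no escaped or nested parentheses inside the string.
TJ_PATTERN = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def shown_strings(stream: bytes) -> list[str]:
    """Return every string painted with Tj, in stream order."""
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(stream)]


if __name__ == "__main__":
    for text in shown_strings(CONTENT_STREAM):
        print(text)
```

Nothing in this lookup depends on how the page is drawn. The strings are read straight from the page description, which is the same data a viewer uses for selection and search.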

Annotations (highlights, sticky notes, drawn shapes, and yes, the rectangles most "redaction" tools use) are a separate category of object. They are stored in the page's annotation list, outside the content stream, and the viewer paints them on top of the rendered page. They do not modify, remove, or replace the content underneath.

This architectural separation is useful: it lets you add and remove comments without touching the original document. It is also exactly what makes a drawn rectangle a terrible redaction.

Why Black-Box Overlays Fail

When you "redact" by drawing a black filled rectangle in an annotation tool and exporting, the resulting PDF contains two things in that region:

  1. The original text objects, untouched in the content stream
  2. A new black rectangle annotation layered above them

Open the exported PDF in a typical viewer and the redaction fails in three ways:

  • Select across the "redacted" region. The text layer is still there. The viewer selects it. Copy, paste into a text editor, and the "redacted" content is visible.
  • Run a text search for a word you redacted. Many viewers will happily find it, highlight its position (which happens to be under your black box), and tell you how many matches it found.
  • Delete the annotation. Some viewers expose annotations in a side panel, where an end user can click the rectangle and delete it — revealing the original text.

Even if a particular viewer does not surface these capabilities, the content stream is trivially readable with any of several open-source PDF inspection tools. The data is in the file. A black box on top of data is not redaction; it is wallpaper.
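The same failure can be shown on a hand-written fragment of PDF syntax. This is a sketch, not a complete file: the object numbers, the sample SSN, and the helper names page_text and drop_annotations are all assumptions made for the example. It only handles uncompressed streams with simple literal strings.

```python
import re

# Fragment of a hypothetical exported PDF, hand-written for illustration.
# Object 4 is the page, object 5 its content stream, and object 6 the black
# rectangle drawn by an annotation tool.
EXPORTED = b"""
4 0 obj
<< /Type /Page /Contents 5 0 R /Annots [6 0 R] >>
endobj
5 0 obj
<< /Length 47 >>
stream
BT /F1 12 Tf 72 720 Td (SSN: 123-45-6789) Tj ET
endstream
endobj
6 0 obj
<< /Type /Annot /Subtype /Square /Rect [70 715 200 735] /IC [0 0 0] >>
endobj
"""

STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
TJ_RE = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def page_text(pdf: bytes) -> list[str]:
    """Collect strings painted by Tj inside every stream in the fragment."""
    found = []
    for stream in STREAM_RE.findall(pdf):
        found += [s.decode("latin-1") for s in TJ_RE.findall(stream)]
    return found


def drop_annotations(pdf: bytes) -> bytes:
    """Simulate a user deleting the page's annotations in a side panel."""
    return re.sub(rb"/Annots\s*\[[^\]]*\]", b"", pdf)


if __name__ == "__main__":
    print(page_text(EXPORTED))                    # text with the black box in place
    print(page_text(drop_annotations(EXPORTED)))  # text after the box is deleted
```

The extraction never looks at object 6 at all. The rectangle and the text live in different objects, so the text comes back whether the box is present or deleted.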

Rasterisation (converting the PDF to a flat image) is sometimes proposed as a fix. It works — there is no text layer left to recover — but it destroys the text layer for the entire page, not just the redacted region. You lose searchability, selectability, and accessibility for the whole document to protect a single field. It is the nuclear option, and it creates its own problems.

What Real Redaction Does

Real redaction operates on the PDF's content stream, not on top of it. Specifically:

  • The text objects in the redacted region are removed from the content stream, not merely hidden.
  • Any images or vector graphics intersecting the redacted region are cropped or replaced so the underlying pixels are not recoverable.
  • The region is typically filled with a solid colour (white or black) in the content stream itself, not as an annotation on top.
  • Annotations that overlap the redacted region are removed, since they may themselves contain the sensitive content.

The defining test: after redaction, there is nothing to recover. Selecting the region returns nothing. Text search does not match. Extracting the content stream yields no trace of the original. Deleting annotations changes nothing, because the fill is part of the page content and no sensitive data remains beneath it.

This is what RedaktPDF's whiteout redaction does. Rather than painting over content, it strips the underlying text and image objects from the PDF content stream before export. The output is a clean file whose redactions cannot be reversed, and because nothing is rasterised, the rest of the page keeps its text layer.
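As a rough illustration of content-stream redaction in general (this is not RedaktPDF's code, and the redact function below is hypothetical), here is a Python sketch that works on a deliberately tidy stream. It assumes every text run sits in its own BT...ET block with one absolute Td position and one simple Tj string. Real streams use text matrices, TJ arrays, font encodings, and compression, and a production tool has to compute glyph bounds rather than compare a single origin point.

```python
import re

# One BT...ET block per text run, each with a single absolute Td position.
# Real content streams are rarely this tidy; this layout is an assumption.
BLOCK = re.compile(
    rb"BT\s+/(\w+)\s+([\d.]+)\s+Tf\s+([\d.]+)\s+([\d.]+)\s+Td\s+"
    rb"\(([^()\\]*)\)\s*Tj\s+ET\s*"
)


def redact(stream: bytes, rect: tuple[float, float, float, float]) -> bytes:
    """Remove text runs whose origin lies inside rect, then paint rect black.

    rect is (x, y, width, height) in page units.
    """
    x, y, w, h = rect

    def keep_or_drop(match: re.Match) -> bytes:
        tx, ty = float(match.group(3)), float(match.group(4))
        inside = x <= tx <= x + w and y <= ty <= y + h
        return b"" if inside else match.group(0)

    cleaned = BLOCK.sub(keep_or_drop, stream)
    # The fill is written into the content stream itself, not added as an annotation.
    fill = b"q 0 0 0 rg %g %g %g %g re f Q\n" % (x, y, w, h)
    return cleaned + fill


STREAM = (
    b"BT /F1 12 Tf 72 720 Td (Name: Jane Example) Tj ET\n"
    b"BT /F1 12 Tf 72 702 Td (SSN: 123-45-6789) Tj ET\n"
)

if __name__ == "__main__":
    print(redact(STREAM, (70, 698, 200, 16)).decode("latin-1"))
```

The order of operations is the point: the text run is deleted first, and only then is the rectangle painted. If the fill were removed afterwards, the region would simply be blank.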

How to Tell the Difference from the Outside

You do not need to trust a tool's marketing. Three tests will tell you whether a redacted PDF is actually redacted.

Test 1 — Copy-paste. Open the redacted PDF in a standard viewer. Click and drag across the redacted region. Paste into a plain text editor. If you get text, the redaction failed.

Test 2 — Text search. Search for a word you know was in the redacted content. If the viewer finds it — even if the match is drawn under a black box — the data is still in the file.

Test 3 — Inspect the content stream. Open the file with a PDF inspector (pdftotext from Poppler, or any of several GUI tools). Dump the text content for the page. The redacted string should be absent.

Any redaction that passes all three tests has removed the text from the file, which is the baseline most compliance workflows require. Anything that fails even one of them is, from a data-protection standpoint, not a redaction at all.
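For batch checks, the spirit of Test 3 can be approximated with Python's standard library alone. The sketch below is a crude stand-in for a real inspector, and the names searchable_bytes and leaked are made up for the example. It looks for known sensitive strings in the raw file bytes and in any stream that inflates as Flate data. It will miss text stored as hex strings or font-specific glyph codes, streams using other filters, and encrypted files, so a clean result here is weaker evidence than a clean pdftotext dump.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def searchable_bytes(pdf: bytes) -> bytes:
    """Raw file bytes plus the inflated form of every stream that is Flate data."""
    parts = [pdf]
    for raw in STREAM_RE.findall(pdf):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            parts.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            pass  # not Flate data; the raw bytes are already in parts[0]
    return b"\n".join(parts)


def leaked(pdf: bytes, secrets: list[str]) -> list[str]:
    """Return the secrets that still appear somewhere in the file."""
    haystack = searchable_bytes(pdf)
    return [s for s in secrets if s.encode("latin-1") in haystack]


if __name__ == "__main__":
    body = b"BT /F1 12 Tf 72 720 Td (SSN: 123-45-6789) Tj ET"
    sample = (
        b"5 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(body)
        + b"\nendstream\nendobj\n"
    )
    print(leaked(sample, ["123-45-6789", "Jane Example"]))
```

An empty list means none of the listed strings were found by this particular scan. A non-empty list is conclusive: the data is still in the file.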

The Practical Takeaway

If the tool you use treats redaction as an annotation, something you draw on top of a page, assume everything you have ever "redacted" with it is still present in the files you exported. Go back, redact again using a tool that operates on the content stream, and verify with the three-test procedure above.

For a working example, upload a sample PDF to the RedaktPDF redactor, white out a region, export, and run the copy-paste test. You will see the difference immediately. For the broader workflow, especially in GDPR, HIPAA, and legal-discovery contexts, see how to redact a PDF for GDPR compliance and how to redact sensitive information from a PDF.
