PDF Accessibility

Tags, reading order, checking tools, and remediation


PDFs are useful for distributing print-ready documents, but they carry their own accessibility model, separate from HTML. A PDF can be fully accessible, partially accessible, or completely inaccessible, regardless of how accessible the website that links to it is.

When HTML is better

For most web content, accessible HTML is preferable to PDF:

Event information, schedules, announcements
News and blog content
Policies and procedures that are regularly updated
Any content users will want to navigate, search, resize, or translate

HTML is readable on any device, resizable, searchable, and inherently more accessible when properly structured.

When PDF is appropriate

Documents that must maintain a specific print layout (forms, certificates, branded reports)
Documents distributed for offline use or archiving
Official records that must not be modified

Making PDF links clear

Always indicate when a link leads to a PDF, and include the file size:

<!-- Minimal -->
<a href="/files/annual-report-2024.pdf">2024 Annual Report (PDF)</a>

<!-- Better -->
<a href="/files/annual-report-2024.pdf">2024 Annual Report (PDF, 1.2 MB)</a>

Do not force PDFs to open in a new tab without warning the user.
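
The link guidance above can be checked mechanically. The Python sketch below is illustrative only: the names PdfLinkAudit and audit_pdf_links are hypothetical, it assumes any href whose path ends in .pdf is a PDF link, and it only inspects the text inside the link, so a warning supplied through aria-label or a title attribute would be missed.

```python
import re
from html.parser import HTMLParser

# Matches a size such as "1.2 MB" or "340 KB" in the link text.
SIZE = re.compile(r"\d+(?:\.\d+)?\s*(?:KB|MB|GB)", re.IGNORECASE)


class PdfLinkAudit(HTMLParser):
    """Collects PDF links whose text lacks a PDF label, a size, or a new-tab warning."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (href, [problems])
        self._link = None   # (href, target) of the PDF link being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # Ignore any fragment or query string when testing the extension.
        path = href.split("#")[0].split("?")[0]
        if path.lower().endswith(".pdf"):
            self._link = (href, attributes.get("target"))
            self._text = []

    def handle_data(self, data):
        if self._link is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._link is None:
            return
        href, target = self._link
        text = " ".join("".join(self._text).split())
        problems = []
        if "pdf" not in text.lower():
            problems.append("no PDF label")
        if not SIZE.search(text):
            problems.append("no file size")
        if target == "_blank" and "new tab" not in text.lower():
            problems.append("opens in a new tab without warning")
        if problems:
            self.findings.append((href, problems))
        self._link = None


def audit_pdf_links(html):
    """Return (href, problems) for every PDF link that breaks the guidance."""
    parser = PdfLinkAudit()
    parser.feed(html)
    parser.close()
    return parser.findings
```

Run against the two example links above, the "Better" link passes and the "Minimal" link is reported with "no file size".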

What makes a PDF accessible

Tags - An accessible PDF must be tagged. Tags define reading order and structure (headings, paragraphs, lists, tables) in a way assistive technology can interpret. An untagged PDF is a flat sequence of visual objects with no structure.

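Reading order - The logical reading order (the order in which assistive technology encounters the tagged content) must match the visual order a sighted reader would follow. In multi-column layouts and tables, the two can diverge.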

Alt text on images - Images within the PDF that convey information need alt text.

Document title and language - The document's metadata title should be set and descriptive. The primary language must be declared.

No security restrictions on accessibility - Some PDFs are locked with settings that block screen reader access. Security settings must permit assistive technology.
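
Before opening a full checker, a quick triage can look for the markers that a tagged PDF normally carries. The Python sketch below is a crude byte-level heuristic, not a PDF parser: the function name scan_pdf_markers is hypothetical, and the search only sees keys stored uncompressed. When the file uses compressed object streams (/ObjStm), a missing marker is reported as None (unknown) rather than False. A hit only shows that the key exists somewhere in the file; it says nothing about whether the tags, reading order, or language value are correct.

```python
import re
from typing import Dict, Optional

# Keys defined by the PDF specification that tagged, language-declared files carry.
MARKERS = {
    "struct_tree": re.compile(rb"/StructTreeRoot\b"),  # root of the tag tree
    "marked": re.compile(rb"/Marked\s+true\b"),        # MarkInfo flag for tagged PDFs
    "language": re.compile(rb"/Lang\s*[(<]"),          # /Lang followed by a string value
}


def scan_pdf_markers(data: bytes) -> Dict[str, Optional[bool]]:
    """Report True, False, or None (cannot tell) for each marker in raw PDF bytes."""
    # Dictionaries packed into compressed object streams are invisible to a byte search.
    may_be_hidden = b"/ObjStm" in data
    result = {}
    for name, pattern in MARKERS.items():
        if pattern.search(data):
            result[name] = True
        else:
            result[name] = None if may_be_hidden else False
    return result
```

A typical use would be scan_pdf_markers(open(path, "rb").read()), with any False or None result sent on to a real checker and manual review.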

The worst case: scanned PDFs

A scanned PDF - a document printed, scanned, and saved as a PDF image - is a picture with no text content. There is nothing for a screen reader to read. OCR (optical character recognition) must be run on it before any accessibility work is possible. OCR produces text, not structure, so tagging and review are still required afterward.

Scanned PDFs are not accessible

If the PDF was created by scanning a physical document, it has no machine-readable text at all. Users relying on screen readers will encounter a completely blank document.
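
A rough way to spot likely scans in bulk is to look for files that contain image objects but no font resources. The Python sketch below is a heuristic under stated assumptions, not a definitive test: the function name looks_image_only is hypothetical, it searches raw bytes rather than parsing the file, and it returns None when compressed object streams (/ObjStm) could be hiding font dictionaries. A scan that has already been through OCR usually carries a font for its text layer, so it is not flagged.

```python
import re
from typing import Optional

# Image XObjects are stream objects, so their dictionaries stay visible in raw bytes.
IMAGE = re.compile(rb"/Subtype\s*/Image\b")


def looks_image_only(data: bytes) -> Optional[bool]:
    """Return True if the bytes look like an image-only PDF, None if undecidable."""
    if b"/Font" in data:
        return False  # some font resource exists, so some text is probably present
    if b"/ObjStm" in data:
        return None   # font dictionaries may be hidden in compressed object streams
    return IMAGE.search(data) is not None
```

A True result means "open the file and confirm", since a page made entirely of vector drawings or an empty file needs different handling than a scan.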

Checking tools

PAC 2024 (PDF Accessibility Checker) - Free. Windows. A thorough automated checker that tests against the PDF/UA and WCAG standards.
Adobe Acrobat Accessibility Checker - Built into Acrobat Pro. Good for quick checks and guided remediation.
CommonLook PDF Validator - Commercial tool for enterprise-scale PDF accessibility work.

Automated checkers catch structural issues but cannot evaluate whether alt text is meaningful or reading order is logical. Manual review is always part of a complete PDF audit.

Who owns the fix

For websites built by an agency or development team, PDF accessibility is usually the responsibility of the content owner - the organization that produced the PDF - not the website developer. The fix happens inside the PDF itself, not through a website code change. This matters when reporting PDF issues: the finding is real and should be reported, but the remediation path runs through the client's document team.

Vesper Audit surfaces all PDFs found during a site crawl and flags them for review. The audit report includes PDF links as findings - the fix happens in the document, not the website code.

WCAG criteria

Referenced criteria

2.4.4 Link Purpose (In Context), Level A - Link purpose is determinable from context (applies to PDF download links).

1.1.1 Non-text Content, Level A - PDFs need accessible alternatives when they contain non-text content.