PDF Accessibility

Tags, reading order, checking tools, and remediation

PDFs are useful for distributing print-ready documents, but they carry their own accessibility model, separate from HTML. A PDF can be fully accessible, partially accessible, or completely inaccessible, regardless of how accessible the website that links to it is.

When HTML is better

For most web content, accessible HTML is preferable to PDF:

  • Event information, schedules, announcements
  • News and blog content
  • Policies and procedures that are regularly updated
  • Any content users will want to navigate, search, resize, or translate

Properly structured HTML is readable on any device, resizable, searchable, and generally easier for assistive technology to interpret than PDF.

When PDF is appropriate

  • Documents that must maintain a specific print layout (forms, certificates, branded reports)
  • Documents distributed for offline use or archiving
  • Official records that must not be modified

Making PDF links clear

Always indicate when a link leads to a PDF, and ideally include the file size:

<!-- Minimal -->
<a href="/files/annual-report-2024.pdf">2024 Annual Report (PDF)</a>

<!-- Better -->
<a href="/files/annual-report-2024.pdf">2024 Annual Report (PDF, 1.2 MB)</a>

Do not force PDFs to open in a new tab without warning the user.
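If link markup is generated by a template or script, the labelling rules above can be applied in one place. The following is a minimal Python sketch, not a prescribed implementation. The function names `format_size` and `pdf_link_html` are hypothetical, the decimal definition of a megabyte (1 MB = 1,000,000 bytes) is an assumption, and the "opens in a new tab" wording is just one possible way to warn the user.

```python
from html import escape


def format_size(size_bytes: int) -> str:
    """Return a short human-readable size (assumes 1 MB = 1,000,000 bytes)."""
    if size_bytes >= 1_000_000:
        return f"{size_bytes / 1_000_000:.1f} MB"
    return f"{max(1, round(size_bytes / 1_000))} KB"


def pdf_link_html(href: str, title: str, size_bytes: int, new_tab: bool = False) -> str:
    """Build an anchor whose visible text names the file format and size."""
    details = ["PDF", format_size(size_bytes)]
    attrs = f'href="{escape(href, quote=True)}"'
    if new_tab:
        # Put the warning in the link text itself so every user receives it.
        details.append("opens in a new tab")
        attrs += ' target="_blank" rel="noopener"'
    return f'<a {attrs}>{escape(title)} ({", ".join(details)})</a>'


print(pdf_link_html("/files/annual-report-2024.pdf", "2024 Annual Report", 1_200_000))
```

Called with `new_tab=True`, the same function adds `target="_blank"` and appends the warning to the visible link text, so the two never drift apart.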

What makes a PDF accessible

Tags - An accessible PDF must be tagged. Tags define reading order and structure (headings, paragraphs, lists, tables) in a way assistive technology can interpret. An untagged PDF is a flat sequence of visual objects with no structure.

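Reading order - The logical reading order (the order in which assistive technology presents the content) must match the order a sighted reader would follow on the page. In multi-column layouts and tables, the two can diverge.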

Alt text on images - Images within the PDF that convey information need alt text.

Document title and language - The document's metadata title should be set and descriptive. The primary language must be declared.

No security restrictions on accessibility - Some PDFs are locked with settings that block screen reader access. Security settings must permit assistive technology.
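Three of the requirements above (tags, language, title) leave traces in the PDF's own data structures, so a first-pass check can be scripted. The sketch below is a rough illustration under stated assumptions: it takes the document catalog and the document information dictionary as plain Python dicts, and how those dicts are obtained from a real file (for example through a PDF library) is deliberately left out. The key names `/MarkInfo`, `/Marked`, `/StructTreeRoot`, `/Lang`, and `/Title` come from the PDF specification. The function name `check_pdf_basics` is hypothetical.

```python
def check_pdf_basics(catalog: dict, info: dict) -> list[str]:
    """Return a list of basic accessibility problems found in the given dicts.

    `catalog` stands in for the PDF document catalog and `info` for the
    document information dictionary, both as plain dicts (an assumption).
    """
    problems = []

    # A tagged PDF has a structure tree and is marked as tagged.
    mark_info = catalog.get("/MarkInfo") or {}
    if "/StructTreeRoot" not in catalog or not mark_info.get("/Marked"):
        problems.append("not tagged")

    # The primary language is declared on the catalog, e.g. "en-US".
    if not str(catalog.get("/Lang", "")).strip():
        problems.append("no document language")

    # The metadata title must be present and non-empty.
    if not str(info.get("/Title", "")).strip():
        problems.append("no document title")

    return problems


good = {"/MarkInfo": {"/Marked": True}, "/StructTreeRoot": {}, "/Lang": "en-US"}
print(check_pdf_basics(good, {"/Title": "2024 Annual Report"}))
print(check_pdf_basics({}, {}))
```

A clean result here only means the markers are present. It says nothing about whether the tags are correct, the reading order is logical, the alt text is meaningful, or the security settings permit assistive technology.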

The worst case: scanned PDFs

A scanned PDF - a document printed, scanned, and saved as a PDF image - is a picture with no text content. There is nothing for a screen reader to read. OCR (optical character recognition) must be run on it before any accessibility work is possible. OCR produces text, not structure - the recognized text still has to be proofread, and tagging and review are still required afterward.

Scanned PDFs are not accessible

If the PDF was created by scanning a physical document, it has no machine-readable text at all. Users relying on screen readers will encounter a completely blank document.
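A simple heuristic for spotting likely scans is to look for pages that yield no extractable text. The sketch below assumes the text of each page has already been extracted into a list of strings by some PDF library, and that step is not shown. The function names `pages_without_text` and `looks_scanned` are hypothetical.

```python
def pages_without_text(page_texts: list[str]) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is empty or whitespace."""
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


def looks_scanned(page_texts: list[str]) -> bool:
    """True when the document has pages and none of them contains any text."""
    return bool(page_texts) and len(pages_without_text(page_texts)) == len(page_texts)


print(pages_without_text(["Introduction", "  \n", ""]))
print(looks_scanned(["", "   "]))
```

This is only a hint for triage. A page without text may be intentionally blank or contain only a decorative image, and a scan that has already been through OCR will pass this check while still lacking tags.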

Checking tools

  • PAC 2024 (PDF Accessibility Checker) - Free. Windows. The most thorough automated checker. Checks against PDF/UA and WCAG standards.
  • Adobe Acrobat Accessibility Checker - Built into Acrobat Pro. Good for quick checks and guided remediation.
  • CommonLook PDF Validator - Commercial tool for enterprise-scale PDF accessibility work.

Automated checkers catch structural issues but cannot evaluate whether alt text is meaningful or reading order is logical. Manual review is always part of a complete PDF audit.

Who owns the fix

For websites built by an agency or development team, PDF accessibility is usually the responsibility of the content owner - the organization that produced the PDF - not the website developer. The fix happens inside the PDF itself, not through a website code change. This matters when reporting PDF issues: the finding is real and should be reported, but the remediation path runs through the client's document team.

Vesper Audit surfaces all PDFs found during a site crawl and flags them for review. The audit report includes PDF links as findings - the fix happens in the document, not the website code.
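To illustrate what collecting PDF links from a crawled page can look like, here is a minimal sketch using only the Python standard library. It is not a description of how Vesper Audit works. The names `PdfLinkFinder` and `find_pdf_links` are hypothetical, and the sketch assumes PDFs can be recognized by a `.pdf` file extension in the URL path.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PdfLinkFinder(HTMLParser):
    """Collect links to .pdf files and note whether the link text mentions PDF."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[dict] = []
        self._current = None  # the PDF link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Compare the URL path only, so query strings do not hide the extension.
        if urlparse(href).path.lower().endswith(".pdf"):
            self._current = {"href": href, "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            text = " ".join(self._current["text"].split())
            self._current["text"] = text
            self._current["labelled"] = "pdf" in text.lower()
            self.links.append(self._current)
            self._current = None


def find_pdf_links(html: str) -> list[dict]:
    """Return one dict per PDF link with its href, text, and labelled flag."""
    finder = PdfLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


page = '<a href="/files/annual-report-2024.pdf">2024 Annual Report</a>'
for link in find_pdf_links(page):
    print(link["href"], "labelled" if link["labelled"] else "missing PDF label")
```

Links with `labelled` set to false are website-side findings (the link text does not announce the format), while the accessibility of each PDF itself still has to be checked in the document. A real crawler would also need to inspect the response `Content-Type`, since PDFs can be served from URLs without a `.pdf` extension.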

WCAG criteria

Referenced criteria
  • 2.4.4 Link Purpose (In Context) (Level A) - The purpose of each link can be determined from the link text or its context (applies to PDF download links).
  • 1.1.1 Non-text Content (Level A) - Non-text content inside a PDF, such as images, needs a text alternative.