PDF Accessibility
Tags, reading order, checking tools, and remediation
PDFs are useful for distributing print-ready documents, but they carry their own accessibility model, separate from HTML. A PDF can be fully accessible, partially accessible, or completely inaccessible, regardless of how accessible the website linking to it is.
When HTML is better
For most web content, accessible HTML is preferable to PDF:
- Event information, schedules, announcements
- News and blog content
- Policies and procedures that are regularly updated
- Any content users will want to navigate, search, resize, or translate
Properly structured HTML is readable on any device, resizable, searchable, and generally more accessible than PDF.
When PDF is appropriate
- Documents that must maintain a specific print layout (forms, certificates, branded reports)
- Documents distributed for offline use or archiving
- Official records that must not be modified
Making PDF links clear
Always indicate when a link leads to a PDF, and ideally include the file size:
<!-- Minimal -->
<a href="/files/annual-report-2024.pdf">2024 Annual Report (PDF)</a>
<!-- Better -->
<a href="/files/annual-report-2024.pdf">2024 Annual Report (PDF, 1.2 MB)</a>
Do not force PDFs to open in a new tab without warning the user.
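The link guidance above can be checked mechanically. The following is a minimal sketch, not a complete linter: it assumes a PDF link is any `<a>` whose URL path ends in `.pdf`, and that a new-tab warning appears in the visible link text as "new tab" or "new window". The names `PdfLinkChecker` and `check_pdf_links` are illustrative.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

SIZE = re.compile(r"\d+(?:\.\d+)?\s*(?:KB|MB|GB)", re.IGNORECASE)
NEW_TAB = re.compile(r"new (?:tab|window)", re.IGNORECASE)


class PdfLinkChecker(HTMLParser):
    """Collects labelling problems on links that point at .pdf files."""

    def __init__(self):
        super().__init__()
        self.findings = []    # list of (href, problem) tuples
        self._current = None  # attributes of the PDF link being read, if any
        self._text = []       # text fragments inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        # Assumption: a PDF link is one whose URL path ends in .pdf.
        if urlparse(href).path.lower().endswith(".pdf"):
            self._current = attrs
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._current is None:
            return
        href = self._current["href"]
        label = " ".join("".join(self._text).split())
        if "pdf" not in label.lower():
            self.findings.append((href, "link text does not mention PDF"))
        if not SIZE.search(label):
            self.findings.append((href, "link text does not give a file size"))
        # Assumption: the warning, if present, is part of the visible link text.
        if self._current.get("target") == "_blank" and not NEW_TAB.search(label):
            self.findings.append((href, "opens in a new tab without warning"))
        self._current = None


def check_pdf_links(html):
    """Return (href, problem) tuples for every PDF link in the given HTML."""
    checker = PdfLinkChecker()
    checker.feed(html)
    checker.close()
    return checker.findings
```

Calling `check_pdf_links` on a page's HTML returns an empty list when every PDF link names the format and size and none opens a new tab unannounced. A warning supplied some other way (an icon with alt text, for example) would be reported as missing, so treat the results as prompts for review.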
What makes a PDF accessible
Tags - An accessible PDF must be tagged. Tags define reading order and structure (headings, paragraphs, lists, tables) in a way assistive technology can interpret. An untagged PDF is a flat sequence of visual objects with no structure.
Reading order - The logical reading order (the order in which assistive technology encounters the content) must match the visual order. In multi-column layouts and tables, the two can diverge.
Alt text on images - Images within the PDF that convey information need alt text.
Document title and language - The document's metadata title should be set and descriptive. The primary language must be declared.
No security restrictions on accessibility - Some PDFs are locked with settings that block screen reader access. Security settings must permit assistive technology.
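Several of these requirements leave traces in the PDF's own dictionaries: `/MarkInfo` and `/StructTreeRoot` in the document catalog for tagging, `/Lang` for the primary language, and `/Title` in the document information dictionary. The sketch below assumes those dictionaries have already been extracted into plain Python mappings by whatever PDF library you use; the function name `check_document_properties` and the wording of the messages are illustrative.

```python
def check_document_properties(catalog, info, encrypted=False):
    """Return a list of problems found in already-extracted PDF properties.

    catalog:   mapping of document catalog entries ("/MarkInfo", "/Lang", ...)
    info:      mapping of document information entries ("/Title", ...)
    encrypted: whether the file is encrypted (checked by the caller)
    """
    problems = []

    # Tagged PDFs declare /Marked true and carry a structure tree.
    mark_info = catalog.get("/MarkInfo") or {}
    if not mark_info.get("/Marked"):
        problems.append("not marked as tagged (/MarkInfo /Marked is not true)")
    if "/StructTreeRoot" not in catalog:
        problems.append("no structure tree (/StructTreeRoot missing)")

    # Primary language and title must be present and non-empty.
    if not str(catalog.get("/Lang") or "").strip():
        problems.append("no primary language (/Lang missing)")
    if not str(info.get("/Title") or "").strip():
        problems.append("no document title (/Title missing)")

    # Encryption is not a failure by itself, but the permissions need a look.
    if encrypted:
        problems.append("encrypted: confirm permissions allow assistive technology")

    return problems
```

An empty result means only that the markers are present. It says nothing about whether the tags are correct, whether the title is descriptive, or whether images have alt text. Some files also store the title in XMP metadata rather than the information dictionary, which this sketch does not inspect.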
The worst case: scanned PDFs
A scanned PDF (a document printed, scanned, and saved as a PDF image) is a picture with no machine-readable text, so a screen reader has nothing to read. OCR (optical character recognition) must be run on it before any accessibility work is possible. OCR produces text, not structure, so tagging and review are still required afterwards.
Until OCR has been run, users relying on screen readers will encounter what is effectively a blank document.
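A rough way to triage for scanned pages is to look for pages that contain images but almost no extractable text. The sketch below assumes the per-page text and image counts have already been extracted with a PDF library of your choice; the function name `likely_scanned_pages` and the 20-character threshold are arbitrary illustrative choices.

```python
def likely_scanned_pages(pages, min_chars=20):
    """Return 1-based numbers of pages that look like unprocessed scans.

    pages: list of (extracted_text, image_count) tuples, one per page.
    A page is flagged when it has at least one image and fewer than
    min_chars non-whitespace characters of extractable text.
    """
    flagged = []
    for number, (text, image_count) in enumerate(pages, start=1):
        visible = "".join(text.split())  # drop all whitespace
        if image_count > 0 and len(visible) < min_chars:
            flagged.append(number)
    return flagged
```

A scan that has already been through OCR has a text layer and will not be flagged, even though it may still be untagged, so this heuristic only identifies files that need OCR as a first step.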
Checking tools
- PAC 2024 (PDF Accessibility Checker) - Free, Windows only. One of the most thorough automated checkers. Checks against PDF/UA and WCAG standards.
- Adobe Acrobat Accessibility Checker - Built into Acrobat Pro. Good for quick checks and guided remediation.
- CommonLook PDF Validator - Commercial tool for enterprise-scale PDF accessibility work.
Automated checkers catch structural issues but cannot evaluate whether alt text is meaningful or reading order is logical. Manual review is always part of a complete PDF audit.
Who owns the fix
For websites built by an agency or development team, PDF accessibility is usually the responsibility of the content owner - the organization that produced the PDF - not the website developer. The fix happens inside the PDF itself, not through a website code change. This matters when reporting PDF issues: the finding is real and should be reported, but the remediation path runs through the client's document team.
Vesper Audit surfaces all PDFs found during a site crawl and flags them for review. The audit report includes PDF links as findings; as noted above, the fix happens in the document, not the website code.
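To illustrate what reporting PDFs as findings can look like, the sketch below groups the links gathered during a crawl into one finding per distinct PDF, with the pages that link to it. This is not Vesper Audit's implementation or report format; the function name `collect_pdf_findings`, the input shape, and the field names are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urldefrag, urljoin, urlparse


def collect_pdf_findings(pages):
    """Group crawled links into one finding per distinct PDF.

    pages: mapping of page URL -> list of href values found on that page.
    Returns a list of dicts, sorted by PDF URL.
    """
    linked_from = defaultdict(set)
    for page_url, hrefs in pages.items():
        for href in hrefs:
            # Resolve relative links and drop fragments such as #page=2,
            # so the same file is not reported twice.
            absolute, _fragment = urldefrag(urljoin(page_url, href))
            if urlparse(absolute).path.lower().endswith(".pdf"):
                linked_from[absolute].add(page_url)

    return [
        {
            "pdf": pdf_url,
            "linked_from": sorted(sources),
            "owner": "content owner",  # the fix happens in the document itself
            "status": "needs manual review",
        }
        for pdf_url, sources in sorted(linked_from.items())
    ]
```

Listing the referring pages alongside each PDF gives the document team enough context to find and prioritize the file without needing access to the website code.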