Domain media archiver

Crawl any domain and download every PDF, image, audio file, and video into a clean per-site folder. macOS desktop app, free public beta.

macOS Desktop App · Public Beta

Point Orbit at any site
and walk away.

Vesper Orbit walks a domain breadth-first and downloads every PDF, image, audio file, and video into a clean per-site folder in your Downloads. No accounts, no setup, no chasing links by hand.

Download for Apple Silicon · Intel Mac · Read the docs

v1.0.0-beta.1 · macOS · Updates automatically · What's new

Requires macOS · Google Chrome

Vesper Orbit

https://archive.example.org Start

11:02:14 ✓ 3 PDFs · oral-history-jane-doe.pdf

11:02:15 ✓ 5 images · archive-photo-1924.jpg

11:02:17 ✓ 1 audio · interview-track-3.mp3

11:02:18 → Crawling /collections/manuscripts/…

11:02:19 → Skipped /event/calendar/?date=2025-08…

11:02:20 → Writing _manifest.csv…

142 Pages

487 Files

2.4 GB Saved

What it does

Domain crawl

Breadth-first walk of one domain through a real headless Chrome via Puppeteer. Renders JavaScript, so single-page apps and dynamically-injected media are captured. Default depth 8, max 5,000 pages.
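As a rough illustration of how a breadth-first walk with a depth limit and a page cap behaves, here is a minimal Python sketch. It is not Orbit's code: the `crawl_bfs` function, the `get_links` callback, and the toy `SITE` graph are assumptions made for the example, and a real crawler would get its links from rendered pages rather than a dictionary.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_bfs(start: str,
              get_links: Callable[[str], Iterable[str]],
              max_depth: int = 8,
              max_pages: int = 5000) -> List[str]:
    """Visit pages level by level and return them in visit order."""
    seen = {start}                 # every URL ever queued, to avoid repeats
    queue = deque([(start, 0)])    # FIFO queue gives breadth-first order
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue               # page is visited, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical four-page site used only to show the visit order.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/"],
    "/c": [],
}

order = crawl_bfs("/", lambda url: SITE.get(url, []))
print(order)  # ['/', '/a', '/b', '/c']
```

Pages closest to the start URL are visited first, so when the page cap is reached the crawl has covered the shallow levels of the site rather than one deep branch.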

Four media types

PDFs, images (jpg, png, webp, svg, avif, more), audio (mp3, wav, flac, m4a, more), and video (mp4, webm, mov, more). Toggle any combination on the idle screen.
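One simple way to sort URLs into the four types is by file extension, sketched below. This is an assumption about approach, not a description of Orbit's internals: the `MEDIA_EXTENSIONS` table and `media_type` helper are hypothetical, and the table contains only the extensions named above, not the full list.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Only the extensions named in the text; the real list is longer.
MEDIA_EXTENSIONS = {
    "pdf": {".pdf"},
    "image": {".jpg", ".png", ".webp", ".svg", ".avif"},
    "audio": {".mp3", ".wav", ".flac", ".m4a"},
    "video": {".mp4", ".webm", ".mov"},
}


def media_type(url: str) -> Optional[str]:
    """Return 'pdf', 'image', 'audio', 'video', or None for other URLs."""
    # Use only the URL path, so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    for kind, extensions in MEDIA_EXTENSIONS.items():
        if suffix in extensions:
            return kind
    return None


print(media_type("https://archive.example.org/files/oral-history-jane-doe.pdf?dl=1"))  # pdf
print(media_type("https://archive.example.org/collections/manuscripts/"))              # None
```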

Smart URL exclusions

Defaults match Vesper Audit so library and civic sites with Drupal calendars and faceted search don't bloat the crawl. Add your own path, query, or regex patterns. Universal traps like mailto:, /wp-json/, and /feed/ are always skipped.
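The sketch below shows how always-skipped traps can be combined with user-supplied path, query, and regex rules. The `is_excluded` function and its parameter names are hypothetical, and the matching semantics (substring match for traps, prefix match for paths, key match for queries) are assumptions; Orbit's actual rules may differ.

```python
import re
from urllib.parse import parse_qs, urlparse

# The universal traps named in the text; they apply regardless of user rules.
ALWAYS_SKIPPED = ("mailto:", "/wp-json/", "/feed/")


def is_excluded(url, path_prefixes=(), query_keys=(), regexes=()):
    """Return True if the URL should be left out of the crawl."""
    if any(trap in url for trap in ALWAYS_SKIPPED):
        return True
    parts = urlparse(url)
    # Path rule: skip everything under a given path prefix.
    if any(parts.path.startswith(prefix) for prefix in path_prefixes):
        return True
    # Query rule: skip URLs that carry a given query parameter.
    keys = parse_qs(parts.query, keep_blank_values=True).keys()
    if any(key in keys for key in query_keys):
        return True
    # Regex rule: skip URLs matching any user-supplied pattern.
    return any(re.search(pattern, url) for pattern in regexes)


print(is_excluded("https://archive.example.org/feed/"))                      # True
print(is_excluded("https://archive.example.org/collections/manuscripts/"))   # False
print(is_excluded("https://archive.example.org/event/calendar/",
                  path_prefixes=("/event/calendar/",)))                      # True
```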

Custom save folder

Pick where downloads go; the default is ~/Downloads. Output folders always include a date-and-time stamp, so re-scans of the same site never collide.
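A folder name such as archive-example-org-2026-05-07-1102 can be built from the host name and the scan start time. The sketch below infers the pattern from that one example name; the `output_folder` helper is hypothetical, and the exact naming rules Orbit applies are an assumption.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlparse


def output_folder(start_url: str, when: datetime,
                  base: Path = Path("~/Downloads")) -> Path:
    """Build '<host-with-dashes>-<YYYY-MM-DD-HHMM>' under the base folder."""
    host = urlparse(start_url).hostname or "site"
    slug = host.replace(".", "-")            # archive.example.org -> archive-example-org
    stamp = when.strftime("%Y-%m-%d-%H%M")   # date plus 24-hour time
    # Call .expanduser() on the result before creating it on disk.
    return base / f"{slug}-{stamp}"


folder = output_folder("https://archive.example.org", datetime(2026, 5, 7, 11, 2))
print(folder.name)  # archive-example-org-2026-05-07-1102
```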

Manifest CSV

Every attempted asset gets a row: source URL, local path, type, bytes, status. Statuses: ok, too_large, error, skipped_type. Open in Excel and see exactly what was captured, what was skipped, and why.
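If you would rather script against the manifest than open it in a spreadsheet, a few lines of Python can tally the statuses. The column header names in `FIELDS` are assumptions (the text lists the columns but not their exact headers), and the sample rows are invented for illustration only.

```python
import csv
import io
from collections import Counter

# Assumed header names for: source URL, local path, type, bytes, status.
FIELDS = ["source_url", "local_path", "type", "bytes", "status"]

# Invented sample rows; a real manifest is written by the app.
rows = [
    {"source_url": "https://archive.example.org/files/oral-history-jane-doe.pdf",
     "local_path": "pdfs/oral-history-jane-doe.pdf",
     "type": "pdf", "bytes": 48213, "status": "ok"},
    {"source_url": "https://archive.example.org/video/full-tour.mp4",
     "local_path": "", "type": "video", "bytes": 0, "status": "too_large"},
    {"source_url": "https://archive.example.org/img/archive-photo-1924.jpg",
     "local_path": "images/archive-photo-1924.jpg",
     "type": "image", "bytes": 20871, "status": "ok"},
    {"source_url": "https://archive.example.org/audio/missing.mp3",
     "local_path": "", "type": "audio", "bytes": 0, "status": "error"},
]

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

# Read the CSV back and count how many assets ended in each status.
buffer.seek(0)
summary = Counter(row["status"] for row in csv.DictReader(buffer))
print(dict(summary))  # {'ok': 2, 'too_large': 1, 'error': 1}
```

To run this against a real scan, replace the in-memory buffer with `open(path, newline="")` on the `_manifest.csv` in the output folder and adjust `FIELDS` to the headers you find there.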

Auto-updates

Vesper Orbit checks for updates automatically. New versions download silently in the background. A slim banner at the top of the window prompts you to restart when an update is ready.

Subdomain control

Default: exact-host match. Toggle on to expand to the registrable domain so docs.example.com and assets.example.com come along when crawling example.com.
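The difference between the two modes can be sketched as a scope check. The `in_scope` and `naive_registrable_domain` helpers are hypothetical, and the last-two-labels rule is a deliberate simplification: finding the registrable domain correctly requires the Public Suffix List, since hosts such as example.co.uk break this rule.

```python
from urllib.parse import urlparse


def naive_registrable_domain(host: str) -> str:
    """Simplification: keep the last two labels (wrong for e.g. co.uk)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def in_scope(url: str, start_url: str, include_subdomains: bool = False) -> bool:
    """Exact-host match by default; registrable-domain match when toggled on."""
    host = (urlparse(url).hostname or "").lower()
    start_host = (urlparse(start_url).hostname or "").lower()
    if not include_subdomains:
        return host == start_host
    root = naive_registrable_domain(start_host)
    # Require a dot boundary so notexample.com does not match example.com.
    return host == root or host.endswith("." + root)


start = "https://example.com"
print(in_scope("https://docs.example.com/guide", start))                           # False
print(in_scope("https://docs.example.com/guide", start, include_subdomains=True))  # True
```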

Original filenames

When the URL has a filename (oral-history-jane-doe.pdf), Orbit keeps it. Collisions get -2, -3 suffixes. Nameless CDN URLs get a short hash fallback. Folder structure stays clean and meaningful.
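The naming rules above can be sketched in a few lines. The `local_filename` helper, the 10-character SHA-256 prefix, and the `.bin` fallback extension are all assumptions for illustration; the text does not say which hash or length Orbit uses, or how it decides that a URL is nameless.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def local_filename(url: str, taken: set, fallback_ext: str = ".bin") -> str:
    """Keep the URL's filename, suffix collisions, hash nameless URLs."""
    name = PurePosixPath(unquote(urlparse(url).path)).name
    if not PurePosixPath(name).suffix:
        # Assumption: a last path segment without an extension counts as nameless.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:10] + fallback_ext
    stem, suffix = PurePosixPath(name).stem, PurePosixPath(name).suffix
    candidate, n = name, 2
    while candidate in taken:          # collisions get -2, -3, ...
        candidate = f"{stem}-{n}{suffix}"
        n += 1
    taken.add(candidate)
    return candidate


taken = set()
first = local_filename("https://archive.example.org/a/oral-history-jane-doe.pdf", taken)
second = local_filename("https://archive.example.org/b/oral-history-jane-doe.pdf", taken)
hashed = local_filename("https://cdn.example.org/asset/9f3a?w=800", taken)
print(first)   # oral-history-jane-doe.pdf
print(second)  # oral-history-jane-doe-2.pdf
print(hashed)  # ten hex characters followed by .bin
```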

Import Audit settings

Already tuned a Vesper Audit profile for a site? Import the JSON in Orbit and use the same exclusions. Reset returns to Audit defaults.

Built for

Librarians, archivists,
and researchers.

Manually downloading every PDF or image from a site is slow and error-prone. Browser extensions max out at a single page or require pasting URLs by hand. wget and curl don't render JavaScript and miss og:image or <picture>/srcset references.

Vesper Orbit walks the site through a real Chrome via Puppeteer, captures media from the rendered DOM, and downloads files with clean original filenames into typed subfolders. A manifest CSV maps every source URL to its local file. Originally built for a library client capturing hundreds of PDFs from an old archive site, it is useful for any domain with media worth keeping.
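To show what collecting media references from page markup involves, here is a small sketch that pulls og:image, src, and srcset URLs out of an HTML string using Python's standard library. It assumes the HTML has already been rendered by a browser; the `MediaRefs` class is hypothetical and is not how Orbit itself is implemented. Splitting srcset on commas is a simplification that can misread URLs containing commas.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaRefs(HTMLParser):
    """Collect absolute media URLs from og:image, src, and srcset."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:image" and attr.get("content"):
            self._add(attr["content"])
        if tag in ("img", "source", "audio", "video") and attr.get("src"):
            self._add(attr["src"])
        if tag in ("img", "source") and attr.get("srcset"):
            # Each candidate is "<url> <descriptor>"; keep only the URL part.
            for candidate in attr["srcset"].split(","):
                parts = candidate.split()
                if parts:
                    self._add(parts[0])

    def _add(self, ref: str):
        url = urljoin(self.base_url, ref)   # resolve relative references
        if url not in self.urls:
            self.urls.append(url)


html = """
<meta property="og:image" content="/img/cover.jpg">
<picture>
  <source srcset="/img/a-800.webp 800w, /img/a-1600.webp 1600w">
  <img src="/img/a.jpg">
</picture>
"""
parser = MediaRefs("https://archive.example.org/")
parser.feed(html)
print(parser.urls)
```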

~/Downloads/archive-example-org-2026-05-07-1102/

├── pdfs/
│   ├── oral-history-jane-doe.pdf
│   ├── annual-report-1987.pdf
│   └── + 41 more
├── images/
│   ├── archive-photo-1924.jpg
│   └── + 312 more
├── audio/
│   └── interview-track-3.mp3
├── video/
│   └── tour-segment-1.mp4
└── _manifest.csv

Defaults

Sensible out of the box. Tweak any setting before starting a scan.

Max crawl depth

8

Max pages

5,000

Max file size

100 MB

Media types

All four ON

Subdomains

OFF (exact host)

Exclusions

Audit defaults

Start archiving today.

Free public beta. No account required. macOS, Apple Silicon and Intel.

Download for Apple Silicon · Intel Mac
