Media Types
Vesper Orbit captures four categories of media. Each is independently controllable from the idle screen via a pill toggle. All four are on by default.
The four pills
| Pill | Extensions captured |
|---|---|
| PDFs | .pdf |
| Images | .jpg, .jpeg, .png, .gif, .webp, .svg, .ico, .bmp, .avif, .tiff |
| Audio | .mp3, .wav, .ogg, .flac, .aac, .m4a, .opus, .aiff |
| Video | .mp4, .webm, .mov, .m4v, .ogv, .avi, .mkv |
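As a rough illustration of how the table could be applied, the sketch below maps a URL to its pill by file extension. The extension lists come from the table; the names `Pill`, `EXTENSIONS`, and `classifyUrl` are hypothetical, and matching case-insensitively while ignoring the query string and fragment is an assumption, not documented Orbit behavior.

```typescript
// Extension lists are copied from the table above; all names are illustrative.
type Pill = "PDFs" | "Images" | "Audio" | "Video";

const EXTENSIONS: Record<Pill, string[]> = {
  PDFs: [".pdf"],
  Images: [".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".ico", ".bmp", ".avif", ".tiff"],
  Audio: [".mp3", ".wav", ".ogg", ".flac", ".aac", ".m4a", ".opus", ".aiff"],
  Video: [".mp4", ".webm", ".mov", ".m4v", ".ogv", ".avi", ".mkv"],
};

// Return the pill a URL belongs to, or null when its extension is not listed.
function classifyUrl(rawUrl: string): Pill | null {
  let pathname: string;
  try {
    // Using only the pathname drops the query string and fragment (assumption).
    pathname = new URL(rawUrl).pathname.toLowerCase();
  } catch {
    return null; // not an absolute URL
  }
  const dot = pathname.lastIndexOf(".");
  // A dot that appears before the last "/" belongs to a directory name, not a file.
  if (dot === -1 || dot < pathname.lastIndexOf("/")) return null;
  const ext = pathname.slice(dot);
  const pills = Object.keys(EXTENSIONS) as Pill[];
  for (const pill of pills) {
    if (EXTENSIONS[pill].indexOf(ext) !== -1) return pill;
  }
  return null;
}
```

Under these assumptions, `classifyUrl("https://example.com/a/Report.PDF?dl=1")` falls under the PDFs pill, while a page URL such as `https://example.com/about` matches no pill.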
How Orbit finds media
Orbit crawls the site in a real headless Chrome instance driven by Puppeteer. On each page it visits, it inspects the rendered DOM and looks for media in three places:
- Direct links - any <a href="..."> pointing at a media URL.
- Image elements - <img src>, <img srcset>, and <source srcset> inside <picture> elements.
- Embedded media - <audio src>, <video src>, <source src>, and og:image meta tags.
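A srcset attribute differs from the other sources in that it holds several candidate URLs, each followed by a width or density descriptor. The sketch below shows one way such a value could be split into absolute URLs. The function name `parseSrcset` is hypothetical, and splitting on commas is a simplification that breaks for URLs that themselves contain commas; it is not a description of Orbit's actual parser.

```typescript
// Split a srcset value into absolute URLs, resolved against the page URL.
// Simplification: candidates are split on commas, so URLs containing commas are not handled.
function parseSrcset(srcset: string, pageUrl: string): string[] {
  const urls: string[] = [];
  for (const candidate of srcset.split(",")) {
    // Each candidate looks like "<url> <descriptor>", e.g. "a-480.jpg 480w".
    const first = candidate.trim().split(/\s+/)[0];
    if (!first) continue; // skip empty candidates such as a trailing comma
    try {
      urls.push(new URL(first, pageUrl).href);
    } catch {
      // skip candidates that cannot be resolved to a URL
    }
  }
  return urls;
}
```

For example, calling it with `"/img/a-480.jpg 480w, /img/a-960.jpg 960w"` and the page URL `https://example.com/gallery/` yields the two image URLs on `https://example.com/img/`.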
Because Orbit renders JavaScript before extracting URLs, it captures media that plain HTTP tools such as wget and curl miss: lazy-loaded images, media on pages reached through single-page-app routing, and dynamically injected video sources.
What Orbit does not capture
- Streaming media - HLS, DASH, and other adaptive streaming formats are not captured. Orbit downloads complete files, not segmented streams.
- Embedded third-party players - YouTube, Vimeo, SoundCloud, and similar embeds. Their media lives on those services, not on the domain you're crawling.
- Files behind authentication - Orbit is not logged in. Members-only or paywalled assets will not be captured.
- Pages on other domains - by default, Orbit only follows internal navigation, so pages on other domains are not crawled. Media hosted on a different host, such as a CDN, is still fetched as long as the link comes from a page on the target domain.
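The cross-domain rule separates two decisions: which pages to crawl and which media to fetch. A minimal sketch of that split follows. The function names are hypothetical, and comparing hostnames for exact equality is an assumption; this page does not say whether Orbit treats subdomains as internal.

```typescript
// Pages are crawled only when they live on the target host.
// Assumption: exact hostname match; subdomain handling is not specified.
function isInternalPage(link: string, targetHost: string): boolean {
  try {
    return new URL(link).hostname === targetHost;
  } catch {
    return false; // unparseable links are never crawled
  }
}

// Media may live on any host (for example a CDN), provided the page
// that linked to it is itself on the target host.
function shouldFetchMedia(mediaUrl: string, foundOnPage: string, targetHost: string): boolean {
  let protocol: string;
  try {
    protocol = new URL(mediaUrl).protocol;
  } catch {
    return false;
  }
  const isHttp = protocol === "http:" || protocol === "https:";
  return isHttp && isInternalPage(foundOnPage, targetHost);
}
```

With `example.com` as the target, a video at `https://cdn.example.net/v/intro.mp4` linked from `https://example.com/about` would be fetched, while any link to a page on `other.org` would not be followed.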
The "all off" guard
The renderer prevents you from turning all four pills off at once. At least one media type must be on for a scan to start. If you click the last enabled pill, it will not toggle - this is intentional.
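The guard can be pictured as a toggle that ignores any click that would leave every pill off. The sketch below is an illustration only; `PillState` and `togglePill` are hypothetical names, not Orbit's renderer code.

```typescript
type PillState = { pdfs: boolean; images: boolean; audio: boolean; video: boolean };

// Return the state after clicking a pill. If the click would turn off the
// last enabled pill, the original state is returned unchanged.
function togglePill(state: PillState, pill: keyof PillState): PillState {
  const next: PillState = { ...state, [pill]: !state[pill] };
  const anyOn = next.pdfs || next.images || next.audio || next.video;
  return anyOn ? next : state;
}
```

Returning the unchanged state rather than throwing mirrors the described behavior: the click has no visible effect.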
File size cap
By default, Vesper Orbit will not download any single file larger than 100 MB. Files exceeding this size appear in the manifest CSV with status too_large so you can see what was skipped and why. The cap exists to prevent accidental disk-fills on sites with very large video archives.
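A minimal sketch of the size check, under stated assumptions: "100 MB" is read as 100 × 1024 × 1024 bytes, the size is known up front (for example from a Content-Length header), and files of unknown size are allowed through. Only the `too_large` status comes from this page; `MAX_FILE_BYTES`, `checkSize`, and the `download` value are hypothetical.

```typescript
// Assumption: "100 MB" means 100 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 100 * 1024 * 1024;

// "too_large" is the manifest status named above; "download" is illustrative.
type SizeDecision = "download" | "too_large";

// contentLength is the reported size in bytes, or null when the server gives none.
function checkSize(contentLength: number | null): SizeDecision {
  if (contentLength !== null && contentLength > MAX_FILE_BYTES) {
    return "too_large";
  }
  return "download"; // assumption: unknown sizes are not rejected up front
}
```

A file of exactly the cap size passes in this sketch, since the text says only files exceeding the limit are skipped. A check like this cannot catch a server that reports no size, so a real downloader would also need to enforce the limit while streaming.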
See also
- Output Structure - how captured files are organized.
- Exclusions - skip URL patterns that don't have useful media.