Skip to content

Tools - RetroBIOS

All tools are Python scripts in scripts/. Single dependency: pyyaml.

Pipeline

Run everything in sequence:

python scripts/pipeline.py --offline              # DB + verify + packs + manifests + integrity + readme + site
python scripts/pipeline.py --offline --skip-packs  # DB + verify + readme + site
python scripts/pipeline.py --offline --skip-docs   # skip readme + site generation
python scripts/pipeline.py --offline --target switch  # filter by hardware target
python scripts/pipeline.py --offline --with-truth  # include truth generation + diff
python scripts/pipeline.py --offline --with-export # include native format export
python scripts/pipeline.py --check-buildbot        # check buildbot data freshness

Pipeline steps:

Step Description Skipped by
1/8 Generate database -
2/8 Refresh data directories --offline
2a Refresh MAME BIOS hashes --offline
2a2 Refresh FBNeo BIOS hashes --offline
2b Check buildbot staleness only with --check-buildbot
2c Generate truth YAMLs only with --with-truth / --with-export
2d Diff truth vs scraped only with --with-truth / --with-export
2e Export native formats only with --with-export
3/8 Verify all platforms -
4/8 Generate packs --skip-packs
4b Generate install manifests --skip-packs
4c Generate target manifests --skip-packs
5/8 Consistency check if verify or pack skipped
6/8 Pack integrity (extract + hash) --skip-packs
7/8 Generate README --skip-docs
8/8 Generate site --skip-docs

Individual tools

generate_db.py

Scan bios/ and build database.json with multi-indexed lookups. Large files in .gitignore are preserved from the existing database and downloaded from GitHub release assets if not cached locally.

python scripts/generate_db.py --force --bios-dir bios --output database.json

verify.py

Check BIOS coverage for each platform using its native verification mode.

python scripts/verify.py --all                     # all platforms
python scripts/verify.py --platform batocera       # single platform
python scripts/verify.py --platform retroarch --verbose  # with ground truth details
python scripts/verify.py --emulator dolphin        # single emulator
python scripts/verify.py --emulator dolphin --standalone  # standalone mode only
python scripts/verify.py --system atari-lynx       # single system
python scripts/verify.py --platform retroarch --target switch  # filter by hardware
python scripts/verify.py --list-emulators          # list all emulators
python scripts/verify.py --list-systems            # list all systems
python scripts/verify.py --platform retroarch --list-targets  # list available targets

Verification modes per platform:

Platform Mode Logic
RetroArch, Lakka, RetroPie existence file present = OK
Batocera, RetroBat md5 MD5 hash match
Recalbox md5 MD5 multi-hash, 3 severity levels
EmuDeck md5 MD5 whitelist per system
RetroDECK md5 MD5 per file via component manifests
RomM md5 size + any hash (MD5/SHA1/CRC32)
BizHawk sha1 SHA1 per firmware from FirmwareDatabase.cs

generate_pack.py

Build platform-specific BIOS ZIP packs.

# Full platform packs
python scripts/generate_pack.py --all --output-dir dist/
python scripts/generate_pack.py --platform batocera
python scripts/generate_pack.py --emulator dolphin
python scripts/generate_pack.py --system atari-lynx

# Granular options
python scripts/generate_pack.py --platform retroarch --system sony-playstation
python scripts/generate_pack.py --platform batocera --required-only
python scripts/generate_pack.py --platform retroarch --split
python scripts/generate_pack.py --platform retroarch --split --group-by manufacturer

# Hash-based lookup and custom packs
python scripts/generate_pack.py --from-md5 d8f1206299c48946e6ec5ef96d014eaa
python scripts/generate_pack.py --platform batocera --from-md5-file missing.txt
python scripts/generate_pack.py --platform retroarch --list-systems

# Hardware target filtering
python scripts/generate_pack.py --all --target x86_64
python scripts/generate_pack.py --platform retroarch --target switch

# Source variants
python scripts/generate_pack.py --platform retroarch --source platform  # YAML baseline only
python scripts/generate_pack.py --platform retroarch --source truth     # emulator profiles only
python scripts/generate_pack.py --platform retroarch --source full      # both (default)
python scripts/generate_pack.py --all --all-variants --output-dir dist/ # all 6 combinations
python scripts/generate_pack.py --all --all-variants --verify-packs --output-dir dist/

# Data refresh
python scripts/generate_pack.py --all --refresh-data  # force re-download data dirs

# Install manifests (consumed by install.py)
python scripts/generate_pack.py --all --manifest --output-dir install/
python scripts/generate_pack.py --manifest-targets --output-dir install/targets/

Packs include platform baseline files plus files required by the platform's cores. When a file passes platform verification but fails emulator validation, the tool searches for a variant that satisfies both. If none exists, the platform version is kept and the discrepancy is reported.

Granular options:

  • --system with --platform: filter to specific systems within a platform pack
  • --required-only: exclude optional files, keep only required
  • --split: generate one ZIP per system instead of one big pack
  • --split --group-by manufacturer: group split packs by manufacturer (Sony, Nintendo, Sega...)
  • --from-md5: look up a hash in the database, or build a custom pack with --platform/--emulator
  • --from-md5-file: same, reading hashes from a file (one per line, comments with #)
  • --target: filter by hardware target (e.g. switch, rpi4, x86_64)
  • --source {platform,truth,full}: select file source (platform YAML only, emulator profiles only, or both)
  • --all-variants: generate all 6 combinations of source x required_only
  • --refresh-data: force re-download all data directories before packing

cross_reference.py

Compare emulator profiles against platform configs. Reports files that cores need beyond what platforms declare.

python scripts/cross_reference.py                    # all
python scripts/cross_reference.py --emulator dolphin  # single
python scripts/cross_reference.py --emulator dolphin --json  # JSON output
python scripts/cross_reference.py --platform batocera        # single platform
python scripts/cross_reference.py --platform retroarch --target switch

truth.py, generate_truth.py, diff_truth.py

Generate ground truth from emulator profiles, diff against scraped platform data.

python scripts/generate_truth.py --platform retroarch     # single platform truth
python scripts/generate_truth.py --all --output-dir dist/truth/  # all platforms
python scripts/diff_truth.py --platform retroarch         # diff truth vs scraped
python scripts/diff_truth.py --all                        # diff all platforms

export_native.py

Export truth data to native platform formats (System.dat, es_bios.xml, checkBIOS.sh, etc.).

python scripts/export_native.py --platform batocera
python scripts/export_native.py --all --output-dir dist/upstream/

validation.py

Validation index and ground truth formatting. Used by verify.py for emulator-level checks (size, CRC32, MD5, SHA1, crypto). Separates reproducible hash checks from cryptographic validations that require console-specific keys.

refresh_data_dirs.py

Fetch data directories (Dolphin Sys, PPSSPP assets, blueMSX databases) from upstream repositories into data/.

python scripts/refresh_data_dirs.py
python scripts/refresh_data_dirs.py --key dolphin-sys --force
python scripts/refresh_data_dirs.py --dry-run              # preview without downloading
python scripts/refresh_data_dirs.py --platform batocera    # single platform only
python scripts/refresh_data_dirs.py --registry path/to/_data_dirs.yml

Other tools

Script Purpose
common.py Shared library: hash computation, file resolution, platform config loading, emulator profiles, target filtering
dedup.py Deduplicate bios/ (--dry-run, --bios-dir), move duplicates to .variants/. RPG Maker and ScummVM excluded (NODEDUP)
validate_pr.py Validate BIOS files in pull requests, post markdown report
auto_fetch.py Fetch missing BIOS files from known sources (4-step pipeline)
list_platforms.py List active platforms (--all includes archived, used by CI)
download.py Download packs from GitHub releases (Python, multi-threaded)
generate_readme.py Generate README.md and CONTRIBUTING.md from database
generate_site.py Generate all MkDocs site pages (this documentation)
deterministic_zip.py Rebuild MAME BIOS ZIPs deterministically (same ROMs = same hash)
crypto_verify.py 3DS RSA signature and AES crypto verification
sect233r1.py Pure Python ECDSA verification on sect233r1 curve (3DS OTP cert)
check_buildbot_system.py Detect stale data directories by comparing with buildbot
migrate.py Migrate flat bios structure to Manufacturer/Console/ hierarchy

Installation tools

Cross-platform BIOS installer for end users:

# Python installer (auto-detects platform)
python install.py

# Shell one-liner (Linux/macOS)
bash scripts/download.sh retroarch ~/RetroArch/system/
bash scripts/download.sh --list

# Or via install.sh wrapper (detects curl/wget, runs install.py)
bash install.sh

install.py auto-detects the user's platform by checking config files, downloads the matching BIOS pack from GitHub releases with SHA1 verification, and extracts files to the correct directory. install.ps1 provides equivalent functionality for Windows/PowerShell.

Large files

Files over 50 MB are stored as assets on the large-files GitHub release. They are listed in .gitignore to keep the git repository lightweight. generate_db.py downloads them from the release when rebuilding the database, using fetch_large_file() from common.py. The same function is used by generate_pack.py when a file has a hash mismatch with the local variant.

Scrapers

Located in scripts/scraper/. Each inherits BaseScraper and implements fetch_requirements().

Scraper Source Format
libretro_scraper System.dat + core-info .info files clrmamepro DAT
batocera_scraper batocera-systems script Python dict
recalbox_scraper es_bios.xml XML
retrobat_scraper batocera-systems.json JSON
emudeck_scraper checkBIOS.sh Bash + CSV
retrodeck_scraper component manifests JSON per component
romm_scraper known_bios_files.json JSON
coreinfo_scraper .info files from libretro-core-info INI-like
bizhawk_scraper FirmwareDatabase.cs C# source
mame_hash_scraper mamedev/mame source tree C source (sparse clone)
fbneo_hash_scraper FBNeo source tree C source (sparse clone)

Internal modules: base_scraper.py (abstract base with _fetch_raw() caching and shared CLI), dat_parser.py (clrmamepro DAT format parser), mame_parser.py (MAME C source BIOS root set parser), fbneo_parser.py (FBNeo C source BIOS set parser), _hash_merge.py (text-based YAML patching that preserves formatting).

Adding a scraper: inherit BaseScraper, implement fetch_requirements(), call scraper_cli(YourScraper) in __main__.

Target scrapers

Located in scripts/scraper/targets/. Each inherits BaseTargetScraper and implements fetch_targets().

Scraper Source Targets
retroarch_targets_scraper libretro buildbot nightly 20+ architectures
batocera_targets_scraper Config.in + es_systems.yml 35+ boards
emudeck_targets_scraper EmuScripts GitHub API steamos, windows
retropie_targets_scraper scriptmodules + rp_module_flags 7 platforms
python -m scripts.scraper.targets.retroarch_targets_scraper --dry-run
python -m scripts.scraper.targets.batocera_targets_scraper --dry-run

Exporters

Located in scripts/exporter/. Each inherits BaseExporter and implements export().

Exporter Output format
systemdat_exporter clrmamepro DAT (RetroArch System.dat)
batocera_exporter Python dict (batocera-systems)
recalbox_exporter XML (es_bios.xml)
retrobat_exporter JSON (batocera-systems.json)
emudeck_exporter Bash script (checkBIOS.sh)
retrodeck_exporter JSON (component_manifest.json)
romm_exporter JSON (known_bios_files.json)
lakka_exporter clrmamepro DAT (delegates to systemdat)
retropie_exporter clrmamepro DAT (delegates to systemdat)