Docker troubleshooting
argus scan runs each scanner inside an immutable container by default. That
buys reproducibility — but it also means most scan failures bottom out in
something Docker did or did not do. This page is a single-stop reference for
the Docker-shaped pain points users hit most often: runtime detection, bind-mount
permissions, image pulls, network proxies, cache mounts, and the way the
engine reports execution / parse failures.
The runtime path is implemented in
argus/core/engine.py (_detect_runtime,
_pull_image, _run_in_container). When in doubt about what the engine is
doing, run with --verbose — every command, exit code, and stderr summary is
logged at DEBUG.
Backend selection: this page assumes
execution.backend: auto (the default) or docker. If you run with backend: local, the engine never launches a container, so none of the symptoms below apply. See Configuration Reference → execution for the full backend semantics.
Docker not installed / not on PATH / not running
Problem. The engine couldn't find a usable container runtime, or it found
one but the daemon isn't running. With backend: auto you'll see scanners
fall through to local execution (and fail with not installed if the local
binary is missing). With backend: docker the run aborts with
Docker not available. No container runtime available..
How to verify.
docker info # daemon up?
docker run --rm hello-world # full round-trip works?
argus scan --verbose # check the "Using container runtime: ..." log line
Fix.
- Install Docker Desktop (macOS / Windows), Docker Engine (Linux), Podman, or nerdctl. The engine accepts any of them — see the next section.
- On Linux, ensure your user is in the docker group, or use sudo. After adding yourself to the group, log out and back in.
- On macOS / Windows, start Docker Desktop and wait for the whale icon to stop animating before scanning.
- Force the local code path while you debug Docker by editing
argus.yml:
execution:
backend: local
This is fine for one-off triage but loses Argus's version pinning — the
engine will warn about scanner-version drift unless you also pass
--allow-local-versions.
Wrong runtime selected (docker vs podman vs nerdctl)
Problem. You have multiple container CLIs on PATH and the engine picked
the wrong one — typically because Docker is installed but the daemon is down,
yet docker is still on PATH and gets selected over a working Podman.
How the engine picks. From ArgusEngine._detect_runtime in
argus/core/engine.py:
- ARGUS_CONTAINER_RUNTIME environment variable (explicit override, must resolve via shutil.which).
- Auto-detect, in order: docker, podman, nerdctl. First match wins.
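The selection order above can be sketched as follows. This is a minimal illustration, not the code in _detect_runtime: the function name detect_runtime and the injectable env / which parameters are assumptions, added so the logic can be exercised without touching the real PATH. Returning None for an override that does not resolve is also an assumption; the engine may report that case differently.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Auto-detect order described above; first match wins.
AUTO_DETECT_ORDER = ("docker", "podman", "nerdctl")


def detect_runtime(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the runtime name to use, or None if nothing resolves (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get("ARGUS_CONTAINER_RUNTIME")
    if override:
        # Assumption: an override that is not on PATH yields no runtime at all,
        # rather than silently falling back to auto-detection.
        return override if which(override) else None
    for candidate in AUTO_DETECT_ORDER:
        if which(candidate):
            # Only PATH is consulted, so a docker CLI with a stopped daemon still wins.
            return candidate
    return None
```

Because the check is PATH-only in this sketch, it reproduces the symptom in this section: a docker binary with a stopped daemon is still selected ahead of a working podman.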
How to verify.
argus scan --verbose 2>&1 | grep -i "container runtime"
# Using container runtime: docker
Fix — environment override (one shot).
ARGUS_CONTAINER_RUNTIME=podman argus scan
Fix: persistent override. The runtime is auto-detected rather than configured, so to make Podman win on every run, export the override in your shell profile:
export ARGUS_CONTAINER_RUNTIME=podman
The auto-detector never tries combinations — it commits to the first runtime
it finds. If docker is on PATH but broken, set ARGUS_CONTAINER_RUNTIME
explicitly rather than uninstalling Docker.
Permission denied writing to /output (uid mismatch)
Problem. A scanner runs but produces no output files. Verbose logs show
something like Permission denied: /output/results.json from the container,
followed by Scanner 'X' produced no output files from the engine. This is
the most common silent-failure mode on macOS.
Root cause. Argus mounts a host tempdir at /output for the scanner to
write into. Most Argus scanner images run as a non-root user (USER argus,
uid 1000); the invoking host user on macOS is commonly uid 501. Python's
tempfile.TemporaryDirectory creates the host dir with mode 0o700, which
the container's uid 1000 process can't write into.
Fix. Already shipped — see commit 3e7dc78 and
ArgusEngine._run_in_container in argus/core/engine.py
(search for os.chmod(output_dir, 0o777)). On Linux and macOS the engine
chmods the host tempdir before launching the container. On Windows the
chmod is skipped because NTFS doesn't honor POSIX bits and Docker Desktop
maps uids differently — calling chmod 0o777 there is at best a no-op.
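The permission problem and the chmod fix can be reproduced outside Argus. The sketch below is illustrative only: output_dir_modes is a hypothetical helper, not engine code. It reads the mode of a fresh temporary directory, applies the same os.chmod(output_dir, 0o777) call on non-Windows platforms, and reads the mode again.

```python
import os
import stat
import sys
import tempfile
from typing import Tuple


def output_dir_modes() -> Tuple[int, int]:
    """Return (mode before chmod, mode after chmod) for a fresh temp directory."""
    with tempfile.TemporaryDirectory() as output_dir:
        before = stat.S_IMODE(os.stat(output_dir).st_mode)
        if sys.platform != "win32":
            # Open the directory up so a container process running under a
            # different uid can write into the bind mount.
            os.chmod(output_dir, 0o777)
        after = stat.S_IMODE(os.stat(output_dir).st_mode)
    return before, after


if __name__ == "__main__":
    before, after = output_dir_modes()
    print(f"before={before:o} after={after:o}")
```

On Linux and macOS the first value should be 700 (owner-only) and the second 777. The directory is removed when the with-block exits, which is part of why the wide mode is tolerable here.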
How to verify the fix is in effect.
argus version # should be the release containing 3e7dc78 (>= 0.7.x)
argus scan --verbose 2>&1 | grep -i "no output files"
If you still see permission-denied output after upgrading, check that
$TMPDIR (macOS / Linux) points at a directory where Docker Desktop's File
Sharing settings allow bind mounts. On macOS, that's typically /private/tmp,
/tmp, /Users/... — the defaults. If you've redirected $TMPDIR to a
custom path, add it to Docker Desktop → Settings → Resources → File Sharing.
For details on why 0o777 is safe in this specific context (host-only
tempdir, random name, removed at end of with-block, no secrets), read the
inline comment in _run_in_container directly.
Container exits with code 127 (executable not found)
Problem. A scanner's container starts and immediately exits with
return code 127. No findings, no output files, no useful stderr. Most often
seen with osv-scanner.
Root cause. Docker's --entrypoint override does not consult the
image's $PATH when given a bare binary name. Some scanner images declare an
absolute entrypoint (e.g. ghcr.io/google/osv-scanner uses
ENTRYPOINT ["/osv-scanner"]) and require the absolute path on the override.
A bare osv-scanner resolves nowhere and exits 127.
How to verify. Inspect the image's actual entrypoint:
docker image inspect ghcr.io/google/osv-scanner \
--format '{{json .Config.Entrypoint}}'
# ["/osv-scanner"]
Fix. This is fixed for OSV in PR #125 — the scanner class declares
container_entrypoint = "/osv-scanner" and the engine strips argv[0] for
ENTRYPOINT-based images. See
argus/scanners/osv.py.
If you hit exit 127 on a different scanner you're maintaining, follow the
same pattern: set container_entrypoint on the scanner class to the absolute
path read from Config.Entrypoint. The full failure pattern is documented
in .ai/errors.yaml under
category: scanner-execution / pattern: "exit 127|exec.*not found".
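A sketch of the entrypoint pattern, assuming a hypothetical build_run_argv helper (the engine's real command assembly lives in _run_in_container and may differ). When a scanner declares container_entrypoint, the absolute path goes on --entrypoint and argv[0] of the scanner command is dropped, because the entrypoint already names the binary. The scanner arguments used in the usage note are placeholders.

```python
from typing import List, Optional


def build_run_argv(
    runtime: str,
    image: str,
    scanner_cmd: List[str],
    container_entrypoint: Optional[str] = None,
) -> List[str]:
    """Assemble a `<runtime> run` command line (illustrative helper)."""
    argv = [runtime, "run", "--rm"]
    if container_entrypoint:
        # Pass the absolute path taken from the image's Config.Entrypoint.
        argv += ["--entrypoint", container_entrypoint, image]
        # scanner_cmd[0] is the binary name; the entrypoint replaces it.
        argv += scanner_cmd[1:]
    else:
        argv += [image, *scanner_cmd]
    return argv
```

For example, build_run_argv("docker", "ghcr.io/google/osv-scanner", ["osv-scanner", "--format", "json", "/src"], "/osv-scanner") places /osv-scanner after --entrypoint and passes only the arguments after the image name.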
Image pull failures: registry auth, rate limits, air-gapped GHES
Problem. A scanner aborts with Failed to pull container image: <image>
during the first phase of _run_in_container. The cause is one of: Docker
Hub rate limits (toomanyrequests), missing credentials for a private
registry, or an air-gapped environment where the registry is simply
unreachable.
How to verify.
argus scan --verbose 2>&1 | grep -i "Pulling container image\|Failed to pull"
docker pull <image> # reproduce the pull outside argus
Fix — Docker Hub rate limits. Authenticate. Anonymous pulls cap at 100 / 6h per IP; authenticated pulls at 200 / 6h.
docker login docker.io -u <user>
Fix — private registry. For composite-action workflows, pass credentials to the action:
with:
registry_username: ${{ secrets.REGISTRY_USER }}
registry_password: ${{ secrets.REGISTRY_TOKEN }}
For SDK / CLI runs, log in to the registry on the host before invoking
argus scan:
echo "$REGISTRY_TOKEN" | docker login ghcr.io -u "$REGISTRY_USER" --password-stdin
argus scan
.ai/errors.yaml → "registry.*authentication.*failed" covers the registry
auth case end-to-end.
Fix — air-gapped GHES (mirror images). Pre-pull every Argus scanner
image on a machine with internet, push them to your internal registry, and
point Argus at the mirror via execution.registry:
execution:
registry: registry.internal.corp/argus
pull_policy: if-not-present
The mirror path is implemented in _resolve_image (argus/core/engine.py). Argus
preserves the original <repo>/<image>:<tag> shape, just swaps the registry
host. See GHES with private container registry below for the full mirror
playbook.
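The host swap can be sketched as below. This is not the body of _resolve_image: the function name resolve_image and the handling of host-less references are assumptions. The sketch follows Docker's convention that the first path component is a registry host only if it contains a dot or a colon, or is localhost.

```python
def resolve_image(image: str, registry: str = "") -> str:
    """Swap the registry host of `image` for `registry`, keeping <repo>/<image>:<tag>."""
    if not registry:
        return image
    first, sep, rest = image.partition("/")
    # A leading component counts as a registry host only if it looks like one.
    has_host = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_host else image
    return f"{registry.rstrip('/')}/{path}"
```

With registry set to registry.internal.corp/argus, the reference ghcr.io/aquasecurity/trivy:0.50.0 becomes registry.internal.corp/argus/aquasecurity/trivy:0.50.0.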
Network proxies inside the container
Problem. The host can reach the internet through an HTTPS proxy, but
scanners that fetch databases at runtime (e.g. Trivy hitting its
vulnerability database, OSV-scanner hitting osv.dev) hang or time out
inside the container.
Root cause. docker run does not propagate HTTP_PROXY / HTTPS_PROXY
/ NO_PROXY from the host into the container automatically. Argus likewise
does not (by design — proxy semantics are environment-specific and
auto-injecting them is a footgun).
How to verify.
docker run --rm alpine sh -c 'env | grep -i proxy'
# (empty) → confirms host proxy isn't reaching the container
Fix — Docker Desktop. Configure proxies in Docker Desktop → Settings → Resources → Proxies. Docker Desktop injects the variables into every container automatically.
Fix — Docker Engine on Linux. Configure the proxy in the daemon and restart it:
# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.corp:3128"
Environment="HTTPS_PROXY=http://proxy.corp:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.corp"
sudo systemctl daemon-reload && sudo systemctl restart docker
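The systemd-level config only reaches the daemon, which is what performs docker pull. Exporting the variables in the environment that runs Argus covers Argus itself and any scanner that falls through to local execution:
export HTTPS_PROXY=http://proxy.corp:3128
export NO_PROXY=localhost,127.0.0.1,.corp
argus scan
Argus does not currently re-emit these as -e HTTPS_PROXY=... flags on
docker run, so the exports above do not reach containerized scanners. If your
scanners need in-container proxy access, configure it on the Docker side: the
Docker Desktop proxy settings, or on Docker Engine the proxies section of the
Docker CLI's ~/.docker/config.json, which the CLI applies to the containers it starts.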
Pull-policy gotchas (always / if-not-present / never)
Problem. A CI run pulls a fresh scanner image every time and burns minutes; or a dev's local cache goes stale and they get last week's CVE database; or an air-gapped runner errors out trying to reach a registry it can't see.
How to verify.
argus scan --verbose 2>&1 | grep -i "Pull policy\|Pulled\|skipping pull"
Fix. Three policies, configured under execution.pull_policy in
argus.yml (default: if-not-present). Implementation:
ArgusEngine._pull_image in argus/core/engine.py.
| Policy | Behavior | Use when |
|---|---|---|
| always | Pull on every run, no local-cache check | Local dev where you want bleeding-edge CVE DBs |
| if-not-present | Pull only if the image isn't cached locally (default) | CI, day-to-day dev (best speed/freshness tradeoff) |
| never | Never pull; fail if the image isn't already cached | Air-gapped runners, bandwidth-constrained environments |
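The three policies reduce to a small decision function. The sketch below is a reading of the table, not the code in _pull_image; should_pull and its error messages are hypothetical.

```python
def should_pull(policy: str, image_cached: bool) -> bool:
    """Decide whether to pull, given the policy and whether the image is cached locally."""
    if policy == "always":
        return True
    if policy == "if-not-present":
        return not image_cached
    if policy == "never":
        if not image_cached:
            # `never` fails fast instead of reaching for a registry.
            raise RuntimeError("image is not cached and pull_policy is 'never'")
        return False
    raise ValueError(f"unknown pull policy: {policy!r}")
```

Under this reading, never is the only policy that can fail without any network activity, which is the behavior an air-gapped runner needs.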
Recommended pairings:
# CI — predictable speed, ephemeral runners pull once per workflow
execution:
pull_policy: if-not-present
# Air-gapped — registry is unreachable, all images pre-staged
execution:
pull_policy: never
registry: registry.internal.corp/argus
# Local dev — always grab the latest scanner DBs before a manual run
execution:
pull_policy: always
On GitHub Actions, Docker's image cache is local to each runner, so
if-not-present on an ephemeral runner still pulls on every job. To avoid that
cost, pair Argus with the actions/cache action keyed off ~/.docker or use a
self-hosted runner with persistent disk.
Cache mount permission issues for scanner DBs (Trivy, Grype)
Problem. Trivy and Grype each maintain a vulnerability database (~100MB each) that's expensive to download every run. Argus mounts a host-side cache into the container so the DB persists across scans — but if the host cache directory is owned by the wrong uid or has restrictive permissions, the scanner can't read or write it and falls back to "no DB available".
How the cache works. From argus/containers.py::get_cache_mount:
| Scanner | Container path | Default host path |
|---|---|---|
| Trivy | /root/.cache/trivy | $TMPDIR/argus-cache/trivy |
| Grype | /root/.cache/grype | $TMPDIR/argus-cache/grype |
The mount is added unconditionally unless you pass --no-cache.
How to verify.
argus scan --verbose 2>&1 | grep -i "DB cache mount"
ls -la "$TMPDIR/argus-cache/"
Fix — wrong owner. Wipe the cache and let Argus recreate it on next run:
rm -rf "$TMPDIR/argus-cache"
argus scan
Fix — bypass cache entirely. When the cache directory is wedged and you just want a clean run:
argus scan --no-cache
--no-cache skips the -v <host>:<container> mount altogether — the scanner
downloads its DB into the ephemeral container layer and discards it on
exit. Slower per-run, but always-correct. Useful in CI where the runner is
ephemeral anyway and the cache provides no value.
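How the mount flag comes and goes with --no-cache can be sketched as below. This is not get_cache_mount itself: cache_mount_args, its parameters, and the use of tempfile.gettempdir() as a stand-in for $TMPDIR are assumptions. The container paths are the ones from the table above.

```python
import os
import tempfile
from typing import List

# Container-side DB cache paths from the table above.
CACHE_PATHS = {
    "trivy": "/root/.cache/trivy",
    "grype": "/root/.cache/grype",
}


def cache_mount_args(scanner: str, no_cache: bool = False, base_dir: str = "") -> List[str]:
    """Return the `-v host:container` arguments for a scanner's DB cache, or []."""
    container_path = CACHE_PATHS.get(scanner)
    if no_cache or container_path is None:
        # No mount: the DB lands in the ephemeral container layer and is discarded.
        return []
    base = base_dir or tempfile.gettempdir()
    host_path = os.path.join(base, "argus-cache", scanner)
    os.makedirs(host_path, exist_ok=True)
    return ["-v", f"{host_path}:{container_path}"]
```

Scanners without a DB cache get no mount in this sketch, and no_cache=True suppresses it for the ones that do.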
"All containers fail with no output produced"
Problem. Every scanner finishes with produced no output files warnings
and the dashboard shows zero findings — but you can't tell whether the code
is genuinely clean or the whole pipeline silently failed.
How the engine handles this. Three states, all visible in
argus-results.json under each scanner's metadata:
- metadata['execution_failed'] = True: the scanner couldn't run, or ran but produced no output. metadata['execution_failure_reason'] carries a clipped stderr summary and the exit code.
- metadata['parse_failed'] = True: the scanner ran and wrote a file, but Argus couldn't parse it. See the next section.
- Neither set: the scanner ran clean.
The terminal reporter labels the run Status: PASS (degraded — N did not
run, M unparsable) whenever any result has execution_failed or
parse_failed, so the warning row above the status is consistent with the
verdict below it.
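The three states can be read back out of a results list as sketched below. The helper names scanner_state and summarize are hypothetical, and the input shape (a list of dicts with a metadata key) is assumed from the jq queries on this page rather than taken from the reporter's code.

```python
from typing import Any, Dict, Iterable, Mapping


def scanner_state(metadata: Mapping[str, Any]) -> str:
    """Map one scanner's metadata onto the three states described above."""
    if metadata.get("execution_failed"):
        return "execution_failed"
    if metadata.get("parse_failed"):
        return "parse_failed"
    return "clean"


def summarize(results: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count scanners that did not run or were unparsable, and flag a degraded run."""
    states = [scanner_state(r.get("metadata", {})) for r in results]
    did_not_run = states.count("execution_failed")
    unparsable = states.count("parse_failed")
    return {
        "did_not_run": did_not_run,
        "unparsable": unparsable,
        "degraded": bool(did_not_run or unparsable),
    }
```

A run is degraded in this sketch as soon as a single result carries either flag, matching the rule the terminal reporter applies.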
How to verify.
argus scan --verbose # full DEBUG firehose, every docker invocation logged
jq '.results[] | select(.metadata.execution_failed == true)' \
./argus-results/argus-results.json
--verbose is the right starting point for any "scan looks weird" report —
it logs every docker run command, exit code, stderr (capped at 800 bytes),
and the list of output files the scanner wrote.
Fix — exit non-zero on scanner failures in CI. The default exit code
reflects only severity-threshold compliance, which a zero-finding run always
satisfies. To turn execution / parse failures into a CI hard-fail, add
--fail-on-scanner-error:
argus scan --fail-on-scanner-error
Implementation: search for fail_on_scanner_error in
argus/cli.py. The flag treats both execution_failed
and parse_failed as hard failures.
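The effect of the flag on the exit code can be sketched as follows. This is an illustration, not the logic in argus/cli.py: the function name, the threshold_exceeded parameter, and the use of 1 as the failing exit code are all assumptions.

```python
from typing import Any, Iterable, Mapping


def scan_exit_code(
    threshold_exceeded: bool,
    results: Iterable[Mapping[str, Any]],
    fail_on_scanner_error: bool = False,
) -> int:
    """Return 0 for success, 1 for failure (exit code values are assumed)."""
    if threshold_exceeded:
        return 1
    if fail_on_scanner_error:
        for result in results:
            metadata = result.get("metadata", {})
            # Both failure flags count as hard failures when the flag is set.
            if metadata.get("execution_failed") or metadata.get("parse_failed"):
                return 1
    return 0
```

Without the flag, a run where every scanner failed to execute still returns 0 here, because zero findings cannot exceed a severity threshold.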
The execution_failed / parse_failed design is documented in
.ai/errors.yaml under
category: scanner-execution (patterns "scanner produced no output|execution_failed"
and "0 findings.known to be vulnerable|JSONDecodeError.results.json").
"Scanner produced output but parser failed"
Problem. A scanner exits cleanly and writes results.json, but Argus
warns Scanner 'X' produced output but parse failed: ... and the run shows
zero findings for that scanner. Most common cause: an upstream tool revved
its output schema (osv-scanner v1 → v2 is the canonical example), or the
scanner truncated output mid-write, or it interleaved text banners with
JSON.
How the engine handles this. PR #125 introduced a third state distinct
from "execution failed" and "ran clean":
metadata['parse_failed'] = True plus metadata['parse_failure_reason']
(the exception type and a 200-character clip of the output head). The
parser bug doesn't crash the rest of the scan — other scanners' results are
still useful — but the affected scanner is loudly surfaced.
Engine implementation: try/except around scanner.parse_results in
_run_in_container (argus/core/engine.py).
The same wrapping happens on the local-execution path in
scanner_template.run_subprocess_scan.
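The wrapping can be sketched as below. parse_with_guard is a hypothetical stand-in for the try/except in the engine, and json.loads stands in for a scanner's parse_results. The reason string combines the exception type with a 200-character clip of the output head, as described above; its exact format is an assumption.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def parse_with_guard(
    raw: str,
    parse: Callable[[str], Any] = json.loads,
) -> Tuple[List[Any], Dict[str, Any]]:
    """Parse scanner output; on failure, record why instead of raising."""
    metadata: Dict[str, Any] = {}
    try:
        findings = parse(raw)
    except Exception as exc:
        # Keep the scan alive: flag this scanner and return no findings for it.
        metadata["parse_failed"] = True
        metadata["parse_failure_reason"] = f"{type(exc).__name__}: {raw[:200]}"
        findings = []
    return findings, metadata
```

Output that starts with a text banner before the JSON fails in json.loads, so the caller gets an empty findings list plus the two metadata keys rather than a traceback.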
How to verify.
argus scan --verbose
jq '.results[] | select(.metadata.parse_failed == true)
| {scanner, reason: .metadata.parse_failure_reason}' \
./argus-results/argus-results.json
Also inspect the raw scanner output that Argus preserved (assuming
reporting.keep_raw: true, the default):
ls ./argus-results/raw/<scanner-name>/
cat ./argus-results/raw/<scanner-name>/results.json | head -40
Fix. Most parse failures are upstream-schema drifts — either upgrade
Argus to a release that handles the new schema, or pin the scanner's tool
version via the container image (the default for backend: auto). If you
hit a schema you believe Argus should handle, file an issue with the raw
output snippet from the raw/ directory.
--fail-on-scanner-error fires on parse_failed too, so a CI pipeline
that's strict about scan integrity stays strict.
Windows: AppLocker / SRP blocking executables
Problem. On Windows, yamllint (or any pip-installed scanner script)
launches with PermissionError: [WinError 5] Access is denied. The same
pip install works fine on Linux/macOS.
Root cause. Windows AppLocker or Software Restriction Policy blocks
executable launches from user AppData paths — exactly where pip --user
and virtualenv install scripts. The Python interpreter itself is
whitelisted, so loading the same package via python -m yamllint works on
the same machine.
How to verify.
where yamllint
yamllint --version
:: PermissionError: [WinError 5] Access is denied
python -m yamllint --version
:: 1.35.1 <- whitelisted via the interpreter
Fix. Already shipped — see commit 4760c1d and
argus/linters/yamllint.py
(_run_with_windows_fallback). On sys.platform == 'win32', a
PermissionError / OSError triggers a retry with
[sys.executable, '-m', 'yamllint'] + cmd[1:]. FileNotFoundError still
propagates so "yamllint not installed" renders cleanly. Linux/macOS bypass
the fallback — there, a PermissionError is a real bug, not a policy case.
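A sketch of the fallback, under stated assumptions: the function below is not the shipped _run_with_windows_fallback, and its run, platform, and module parameters exist only so the behavior can be exercised without Windows. The ordering of the except clauses matters because FileNotFoundError is a subclass of OSError.

```python
import subprocess
import sys
from typing import Any, Callable, List


def run_with_windows_fallback(
    cmd: List[str],
    run: Callable[..., Any] = subprocess.run,
    platform: str = sys.platform,
    module: str = "yamllint",
) -> Any:
    """Run cmd; on Windows, retry through the interpreter if the launch is blocked."""
    kwargs = dict(capture_output=True, text=True, encoding="utf-8", errors="replace")
    try:
        return run(cmd, **kwargs)
    except FileNotFoundError:
        # "Not installed" must stay distinguishable from "blocked by policy".
        raise
    except (PermissionError, OSError):
        if platform != "win32":
            # Elsewhere a permission error is a real bug, not a policy case.
            raise
        return run([sys.executable, "-m", module] + cmd[1:], **kwargs)
```

The retry reuses the original arguments and swaps only the launcher, so flags such as --version pass through unchanged.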
If you're packaging your own linter scanner and hit AppLocker, copy that
helper. The pattern is documented in
.ai/errors.yaml under
category: scanner-execution / pattern: PermissionError.*WinError 5.
Windows: UTF-8 encoding errors
Problem. Scans abort with UnicodeDecodeError: 'charmap' codec can't
decode byte 0x8f mid-run on Windows. Either the docker stdout/stderr
decode fails, or Path.read_text() against a scanner output file fails
when the file contains a non-ASCII byte (CVE descriptions, accented file
paths, scanner banners).
Root cause. Docker container output (and most CLI tool output) is
UTF-8. subprocess.run(text=True) and Path.read_text() fall back to
the platform default encoding when encoding= is omitted — that's
cp1252 on Windows.
Fix. Already shipped — see commit 4760c1d. Every relevant call site
now passes encoding='utf-8', errors='replace':
- _run_in_container subprocess (engine docker invocation)
- scanner_template.run_subprocess_scan (local execution path)
- All argus/scanners/*.py parse_results reads
- yamllint subprocess + Windows fallback
errors='replace' is preferred over 'strict': a security tool showing
� is better than crashing on otherwise-usable output. See
.ai/errors.yaml under
pattern: "UnicodeDecodeError.*charmap codec.*can't decode byte" for the
full design rationale.
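The difference between the two decoding strategies is easy to reproduce with plain bytes. The helper names below are hypothetical. The byte 0x8f is the one from the error message above: it is undefined in cp1252 and is not a valid start byte in UTF-8.

```python
def decodes_under_cp1252(raw: bytes) -> bool:
    """Report whether the Windows default codec can decode these bytes at all."""
    try:
        raw.decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True


def decode_scanner_output(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any byte that is not valid UTF-8."""
    return raw.decode("utf-8", errors="replace")
```

For the input b"CVE \x8f", decodes_under_cp1252 returns False, while decode_scanner_output returns "CVE " followed by a single U+FFFD replacement character. Valid UTF-8 such as "café" passes through unchanged.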
If you're upgrading from a pre-4760c1d release and you still see this
trace on Windows, you've found a missed call site — file an issue with
the scanner name and the line of the traceback in argus/.
GHES with private container registry
Problem. Your runners can't reach ghcr.io / docker.io, so every
scan dies on Failed to pull container image. You have an internal mirror
registry, but Argus is pinned to upstream image references.
Fix. Mirror the images, point Argus at the mirror, then authenticate and pre-warm the runner cache where needed.
Step 1 — mirror the images. From a machine with internet access, pull
every image Argus uses, retag, and push to your internal registry. The
canonical list is generated by argus list --images (or read from
argus/containers.py). For each image:
docker pull ghcr.io/aquasecurity/trivy:0.50.0
docker tag ghcr.io/aquasecurity/trivy:0.50.0 \
registry.internal.corp/argus/aquasecurity/trivy:0.50.0
docker push registry.internal.corp/argus/aquasecurity/trivy:0.50.0
Pin to @sha256:... digests for production mirrors so a re-pushed upstream
tag can't drift your scan results. Renovate / Dependabot can update digest
pins automatically.
Step 2 — point Argus at the mirror.
execution:
registry: registry.internal.corp/argus
pull_policy: if-not-present
The registry knob is consumed by _resolve_image in
argus/core/engine.py: Argus replaces the
upstream registry host with yours and preserves the rest of the image path
(<repo>/<image>:<tag>).
Step 3 — authenticate (if the mirror is private).
echo "$REGISTRY_TOKEN" | docker login registry.internal.corp \
-u "$REGISTRY_USER" --password-stdin
argus scan
For composite actions, pass registry_username / registry_password as
inputs (see the Image pull failures section above). Never put credentials
in the config file;
.ai/errors.yaml → "registry.*authentication.*failed" covers the
secrets-handling story.
Step 4 — pre-warm the cache on the runner (optional). If your runners
are ephemeral and pull_policy: if-not-present still re-pulls per job,
either bake the images into the runner AMI / image, or pair Argus with a
caching action keyed off ~/.docker.
Quick reference
| Symptom | Most likely cause | First command to run |
|---|---|---|
| No container runtime available | Docker not running / not on PATH | docker info |
| Failed to pull container image | Auth, rate limit, or network | docker pull <image> |
| produced no output files (exit=N) | uid mismatch or scanner crash | argus scan --verbose |
| exit 127 | Wrong entrypoint override | docker image inspect <image> |
| parse_failed in metadata | Schema drift in scanner output | cat ./argus-results/raw/<scanner>/results.json |
| PermissionError: [WinError 5] (Windows) | AppLocker on AppData binaries | python -m yamllint --version |
| UnicodeDecodeError: 'charmap' codec | cp1252 fallback on Windows | Upgrade to a release that includes commit 4760c1d |
| Scan looks clean but you don't trust it | Silent execution / parse failures | argus scan --fail-on-scanner-error |
For anything not covered here, run argus scan --verbose and grep the log
for the scanner name. The full container-execution path lives in
argus/core/engine.py (_run_in_container,
_pull_image, _detect_runtime) — those three functions account for
nearly every Docker-shaped failure mode the engine knows how to produce.