# Debugging Failed Builds

The single most important debugging tool on sourcehut is SSH access to the build VM. Use it; don't iterate blindly on the manifest.

## Reading sr.ht log output

The build log is plain text, with tasks separated by headers like:

```
[#1273143] 2025/01/15 10:23:01 Running task "build"
+ cd myproject
+ make
gcc -c foo.c
...
[#1273143] 2025/01/15 10:23:42 Task "build" failed (exit status 1)
```

Lines starting with `+` come from `set -x`: they show each command as it runs, with environment variables expanded. The lines after each one are that command's stdout/stderr. The trailing "failed" line gives the exit status.

When a task fails, **everything after that task is skipped**. The summary at the bottom of the log lists each task's status and exit code.

## SSH into the failed VM

On failure, the log prints:

```
[#1273143] 2025/01/15 10:23:42 Build failed.
[#1273143] 2025/01/15 10:23:42 The build environment will be kept alive for 10 minutes.
[#1273143] 2025/01/15 10:23:42
[#1273143] 2025/01/15 10:23:42 ssh -t builds@fra02.builds.sr.ht connect 1273143
[#1273143] 2025/01/15 10:23:42
[#1273143] 2025/01/15 10:23:42 After logging in, the deadline is increased to your remaining build time.
```

Run that SSH command. You'll be dropped into the VM as the `build` user, with everything exactly as the build left it.

If you don't log in, the VM lives for **10 minutes** by default. Once you log in, the deadline extends to your remaining build time: the `[builds.sr.ht::worker] timeout` setting minus the time already elapsed. The exact cap is instance-dependent; the log will say "Your VM will be terminated N hours from now".

What to do once inside:

- `cd ~`: `/home/build` is your home directory, where sources are cloned.
- Re-run the failing command manually to see the actual errors interactively.
- `which <command>`: verify that a package actually installed and is on `PATH`.
- `cat ~/.buildenv`: see exactly what `environment:` set.
- `env`: the full environment, including `$OAUTH2_TOKEN`, `$JOB_ID`, etc.
- `sudo` is passwordless: install missing packages, modify system config, whatever you need.
- `logout` (or Ctrl-D) when done. The VM is then torn down.

To SSH into the VM, your SSH public key must be added at `https://meta.sr.ht/keys`. The same key you use for git operations works.

## `shell: true` for always-on SSH

Add to the manifest:

```yaml
shell: true
```

The VM then stays alive after the tasks complete, even on success. Use this while iterating; remove it before committing for real.

You can also SSH in *while the build is running* to watch progress interactively, run `top`, inspect the filesystem mid-build, etc.

## `complete-build` for early exit

`complete-build` is a special in-VM command that ends the build successfully without running subsequent tasks:

```yaml
tasks:
  - check-branch: |
      if [ "$GIT_REF" != "refs/heads/master" ]; then
        complete-build
      fi
  - deploy: |
      # only runs on master
```

It exits the current *task* with status 0 and tells the runner to skip all subsequent tasks; the build is marked successful. Use it for "this push doesn't need a full build" cases. Don't use it for security gating: anyone who can edit the manifest can remove the `complete-build` call.

## Common errors and what they mean

### "No such image: foo/bar"

The `image:` value isn't a valid sourcehut image. Check the spelling against `https://man.sr.ht/builds.sr.ht/compatibility.md`. Common mistakes: `alpine/3.18` (real) vs `alpine/3.18.0` (not real); `debian/bookworm` (real) vs `debian/12` (not real).

### "Cannot find package: xyz"

The package isn't in the image's repositories under that name. Package names and splits differ across distros; Node.js is a typical example:

- Alpine: `nodejs` for Node, with `npm` as a separate package.
- Debian: `nodejs`, with `npm` as a separate package in Debian's own repositories.
- Arch: `nodejs` and `npm`, both needed.

When unsure on Alpine: set `image: alpine/edge` and `packages: [xyz]`, push, read the error, and look up the right name at `https://pkgs.alpinelinux.org/packages`.

### "Permission denied (publickey)"

You're trying to use SSH (or git over SSH) without a key, or with the wrong key.
- Secret SSH key not configured: verify that `secrets:` includes the right UUID and that the secret's type is "SSH key".
- Public key not added on the receiving end: for a GitHub mirror, add the build's public key (printed by `ssh-keygen -y -f ~/.ssh/id_*` inside the VM) as a deploy key on GitHub.
- Wrong or missing known_hosts entry: run `ssh-keyscan -H <host> >> ~/.ssh/known_hosts` before the SSH call.

### "401 Unauthorized" from a hut command or curl with `$OAUTH2_TOKEN`

- The `oauth:` directive is missing or insufficient. Check the scope: read operations need `:RO`, write operations need `:RW`.
- The OAuth grant is for a different service than the one you're calling.
- The build was submitted in a context that disables secrets/OAuth (e.g. a mailing-list patch test, `hut builds submit --no-secrets`, or the web form's "disable secrets" checkbox). When secrets are off, neither `~/.config/hut/config` nor `$OAUTH2_TOKEN` is provisioned.

### "missing access-token" from hut, even though `$OAUTH2_TOKEN` is set

`hut` does **not** read `$OAUTH2_TOKEN`; it reads `~/.config/hut/config`. The worker pre-writes that file only when `oauth:` is in the manifest **and** secrets are enabled. If the environment variable is set but `hut` fails, either something in your script removed or overwrote the config, or you're running `hut` as a user other than `build`. Inspect `~/.config/hut/config` to confirm. See `references/hut.md`.

### "Build failed with exit code 137"

An OOM kill: the VM ran out of memory. Exit code 137 is 128 + 9, meaning the process was killed by signal 9 (`SIGKILL`), which is what the kernel's OOM killer sends. The VM's memory size is an instance/operator setting (`builds.sr.ht::worker` config), not a per-manifest value; there's no manifest key to raise it. The hosted `builds.sr.ht` service gives each VM a fixed amount; self-hosted instances vary.

Workarounds, in order of effort:

1. Reduce parallelism inside the build (`make -j2` instead of `make -j$(nproc)`).
2. Make the toolchain use less memory (`go build -p 1`, `cargo build -j 1`, `cc -O1` instead of `-O3`, etc.).
3. Split the work across multiple jobs in `.builds/`.
4. Run a self-hosted runner on bigger hardware.
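
As a sketch of workaround 1, a task can cap the job count instead of hardcoding it. This assumes `nproc` is available in the image (GNU coreutils on Debian/Arch, BusyBox on Alpine); `capped_jobs` is a hypothetical helper for illustration, not a builds.sr.ht command:

```bash
# Hypothetical helper (not part of builds.sr.ht): print the number of CPUs,
# but never more than the given cap (default 2), so parallel compiles on a
# small VM are less likely to exhaust memory and get OOM-killed (exit 137).
capped_jobs() {
    local cap=${1:-2}
    local n
    n=$(nproc)
    if [ "$n" -gt "$cap" ]; then
        n=$cap
    fi
    echo "$n"
}

# Usage inside a task:
#   make -j"$(capped_jobs 2)"
```

Taking the minimum of the cap and `nproc` keeps a single-CPU VM at `-j1`, while a VM with many vCPUs still won't start more compiler processes than the cap allows.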
### Pages publish is accepted but the site is broken

pages.sr.ht silently discards invalid uploads. Verify the tarball:

```bash
tar -tzvf site.tar.gz | head -20
```

Regular files should show `-rw-r--r--` (mode 644), directories should not have unusual modes, there should be no `l` entries (symlinks), and the top-level entries should be the site's own files (`index.html`, etc.), not a wrapper directory like `public/`.

### "skip-ci doesn't seem to work"

Push options (`git push -o`) need git 2.10 or newer. They are sent over the regular push protocol and don't depend on `protocol.version=2`, so if you're stuck on a very old git, upgrade it. On any reasonably recent git this is enough:

```bash
git push -o skip-ci
```

Also, some middleboxes (mirroring services, certain proxies) strip push options entirely. If you push to a *mirror* that re-pushes to git.sr.ht, the options don't make it through; push directly to git.sr.ht instead.

If you want `skip-ci` to be the default for a repo (e.g. for an auto-changelog branch), set it in `git config`:

```bash
git config --add push.pushOption skip-ci
```

Remember to override it (`-o ''`, or remove the setting with `git config --unset-all push.pushOption`) when you do want a build.

### "Variable from previous task is undefined"

Tasks run in separate shell sessions, so variables `export`ed in one task don't persist into the next. Write them to `~/.buildenv` instead, which is sourced at the start of every task:

```yaml
tasks:
  - compute: |
      VERSION=$(...)
      echo "VERSION=$VERSION" >> ~/.buildenv
  - use: |
      echo "Version is $VERSION"
```

### "Source directory not found"

The `sources:` URL was wrong, the ref doesn't exist, or the clone succeeded but a task tried to `cd` into a differently named directory. The clone directory is named after the last URL component: `https://git.sr.ht/~user/myproject` → `myproject/`. `sources:` doesn't support custom directory names; use `git clone` in a task for that.

### Build hangs forever

A task that's waiting for user input hangs until the per-job timeout elapses (an instance setting; the upstream example config uses `45m`, and self-hosted instances may differ).
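
One defensive pattern against such hangs, sketched under the assumption that GNU coreutils `timeout` is installed (BusyBox's `timeout` on Alpine may accept different duration formats): close stdin so any prompt reads end-of-file and fails immediately, and give the step its own deadline well below the job timeout. `run_noninteractive` is a hypothetical helper name, not a builds.sr.ht command:

```bash
# Hypothetical helper (not part of builds.sr.ht): run a command with stdin
# redirected from /dev/null, so a prompt sees EOF instead of waiting forever,
# and kill it if it runs longer than the given limit (e.g. 300, 10m).
run_noninteractive() {
    local limit=$1
    shift
    timeout "$limit" "$@" </dev/null
}

# Usage inside a task (the step fails if it is still running after 10 minutes):
#   run_noninteractive 10m make check
```

When the limit is hit, GNU `timeout` exits with status 124, so the step fails quickly instead of sitting idle until the job-level timeout.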
Common culprits: `apt-get install` without `-y`, interactive `make menuconfig`, `gpg --gen-key` without `--batch`, `npm` prompting before installing a dependency, etc.

When the job times out, it ends with status `timeout` (treated as a failure by triggers), prints the SSH connect line, and gives you the standard 10-minute grace window to log in and look around.

## Iteration workflow

The slow way: edit `.build.yml`, commit, push, wait for the build, read the log, repeat. Each iteration takes minutes.

The fast way:

1. Go to `https://builds.sr.ht/submit`.
2. Paste your manifest.
3. Click submit and watch the log stream live.
4. On failure, SSH into the VM, fix things manually, and write down what worked.
5. Repeat in the web form with the corrected manifest.
6. Once it's green, commit the working manifest as `.build.yml`.

This keeps "fix CI try 7" commits out of your git history.

From the command line, `hut builds submit --follow .build.yml` does the same thing, streaming the log to your terminal.

## Reproducing locally

The image build scripts are public, so you can build an image locally and boot it under QEMU to reproduce a build environment exactly:

```bash
# Image scripts are in the builds.sr.ht repo
git clone https://git.sr.ht/~sircmpwn/builds.sr.ht
cd builds.sr.ht/images/
# Build the image with the genimg script (requires QEMU + the right tooling)
```

Most people don't go this far. To tell an environment issue from a code issue, a local container such as `docker run -it alpine sh`, with the build steps run by hand, catches most problems.

## When to ask for help

The sourcehut admins are helpful, but expect you to provide:

- **Push UUID** for push problems: `git push -o debug` prints it.
- **Job URL** for build problems: `https://builds.sr.ht/~user/job/N`.
- **The manifest** itself, in the message body.
- **What you've tried**: SSH'd in? Read the log? Tried locally?

The `sr.ht-discuss` mailing list is the right venue for general questions.
`sr.ht-support` is for account and billing issues.