---
name: sourcehut-refresh
description: Refresh the SourceHut documentation mirror by scraping the upstream source listing pages, cloning any new repos, and updating every existing repo to the latest tag. Use when the user says "refresh", "update", "sync", or "fetch new" sourcehut repos, or when answers seem stale.
---

# sourcehut-refresh

This workspace at `~/data/home/sourcehut` is a documentation mirror of the SourceHut project. Running this skill re-discovers the canonical list of subprojects from upstream and brings every clone up to its latest tag.

## What this skill does

The workspace is a **git superproject**: each SourceHut repo is a submodule pinned to a specific commit and listed in `.gitmodules`. A refresh therefore means: re-discover upstream, add submodules for new repos, move existing submodules to the latest tag, then stage the bumped gitlinks so the user can review and commit.

1. Scrape `https://sr.ht/~sircmpwn/sourcehut/sources` and any `?page=N` continuations until no `?page=N+1` link appears.
2. Extract every repository name linked under `~sircmpwn/`.
3. For each name:
   - **Not yet a submodule** → `git submodule add https://git.sr.ht/~sircmpwn/`, then check out the latest tag inside.
   - **Already a submodule** → enter it, `git fetch --tags --prune`, check out the latest tag.
4. Tag selection: the highest-versioned tag via `git tag --sort=-v:refname | head -n1`; if there are no tags, stay on the default branch and fast-forward.
5. `git add ` in the superproject for every submodule whose HEAD moved.
6. Regenerate `.claude/INDEX.md`.
7. Report `OK / NEW / UPDATED / NOTAG / FAIL` and **leave the staged changes uncommitted**; the user reviews and commits.

## How to run

Always run from `~/data/home/sourcehut`. Do all the work via Bash; do not invent intermediate scripts unless the inline pipeline below is genuinely insufficient.

### 1. Discover the repo list from upstream

```bash
cd ~/data/home/sourcehut

discover_repos() {
  local page=1 url repos="" got page_repos
  while :; do
    if [ "$page" -eq 1 ]; then
      url="https://sr.ht/~sircmpwn/sourcehut/sources"
    else
      url="https://sr.ht/~sircmpwn/sourcehut/sources?page=$page"
    fi
    got=$(curl -fsSL "$url" 2>/dev/null) || break

    # Match hrefs that start with /~sircmpwn/ and end in a single path
    # segment (no further slash, no '?page=' query string).
    # Exclude the hub itself ("sourcehut").
    page_repos=$(printf '%s\n' "$got" \
      | grep -oE 'href="/~sircmpwn/[A-Za-z0-9._+-]+"' \
      | sed -E 's:.*/~sircmpwn/([^"/]+)".*:\1:' \
      | grep -vE '^(sourcehut)$' \
      | sort -u)
    [ -z "$page_repos" ] && break
    repos="$repos"$'\n'"$page_repos"

    # Stop when there is no link to the next page.
    if ! printf '%s\n' "$got" | grep -q "sources?page=$((page+1))"; then
      break
    fi
    page=$((page+1))
  done
  printf '%s\n' "$repos" | sed '/^$/d' | sort -u
}

discover_repos > /tmp/srht-repos.txt
wc -l /tmp/srht-repos.txt
```

If the discovered list is empty or surprisingly short (< 20 entries), **stop and report to the user**, because the upstream HTML may have changed. Do not proceed to mutate clones.

### 2. Update each submodule (adds new, fetches existing, checks out latest tag)

Submodule operations are not safely parallel (they touch the shared superproject index), so this loop is serial. It still finishes in well under a minute for ~30 repos.

```bash
cd ~/data/home/sourcehut

while IFS= read -r name; do
  url="https://git.sr.ht/~sircmpwn/$name"
  status=""

  if ! git config -f .gitmodules --get submodule."$name".path >/dev/null 2>&1; then
    # New repo: register it as a submodule. `submodule add` clones into place.
    if git submodule add --quiet "$url" "$name" 2>/dev/null; then
      status="NEW"
    else
      printf 'FAIL %s (submodule add)\n' "$name"; continue
    fi
  else
    # Existing repo: make sure it is initialized, then fetch.
    git submodule update --init --quiet "$name" 2>/dev/null
    git -C "$name" fetch --quiet --tags --prune 2>/dev/null \
      || { printf 'FAIL %s (fetch)\n' "$name"; continue; }
    status="OK"
  fi

  # Highest-versioned tag, if the repo has any tags at all.
  tag=$(git -C "$name" tag --sort=-v:refname 2>/dev/null | head -n1)
  if [ -z "$tag" ]; then
    git -C "$name" pull --ff-only --quiet 2>/dev/null || true
    printf 'NOTAG %s (on %s)\n' "$name" "$(git -C "$name" rev-parse --abbrev-ref HEAD)"
    git add "$name" 2>/dev/null
    continue
  fi

  current=$(git -C "$name" describe --tags --exact-match 2>/dev/null || echo "")
  if [ "$current" = "$tag" ] && [ "$status" != "NEW" ]; then
    printf '%-7s %s @ %s\n' "$status" "$name" "$tag"
  else
    if git -C "$name" -c advice.detachedHead=false checkout --quiet "$tag" 2>/dev/null; then
      # Keep NEW for freshly added submodules so the report can name them;
      # everything else that moved is UPDATED.
      [ "$status" = "NEW" ] || status="UPDATED"
      printf '%-7s %s @ %s\n' "$status" "$name" "$tag"
      git add "$name"
    else
      printf 'FAIL %s (checkout %s)\n' "$name" "$tag"
    fi
  fi
done < /tmp/srht-repos.txt | tee /tmp/srht-refresh.log
```

After this loop, `git status` in the superproject will show new gitlinks for every submodule whose HEAD moved (and any `NEW` submodules will also have a `.gitmodules` change). **Do not commit automatically.** Show the user `git submodule status` and `git status --short`, summarize, and let them commit.

### 3. Rebuild the index

After all clones are up to date, regenerate `.claude/INDEX.md`:

```bash
bash ~/data/home/sourcehut/.claude/scripts/build-index.sh
```

This walks every repo and writes a per-service inventory of GraphQL types, SQL tables, Python blueprints, Go packages, and a cross-repo type map. The `sourcehut-lookup` skill reads it before doing anything else, so a stale index degrades every later lookup. Always run this, even if no clones changed, because the script also captures the tag of each repo. It takes ~5 seconds.

### 4. Stage `.claude/INDEX.md`

```bash
git add .claude/INDEX.md
```

### 5. Report

Summarize counts of NEW / UPDATED / OK / NOTAG / FAIL. Highlight any FAIL lines. If `NEW` repos appeared, mention them by name; they likely warrant a one-line addition to the layout section of `CLAUDE.md`. Show `git status --short` and suggest a commit message like `refresh: bump submodules to latest tags`, then let the user commit.

## Notes

- The workspace is a git superproject. Submodules are listed in `.gitmodules`, and their checkout state is the gitlink stored in the superproject's tree. **A refresh produces uncommitted submodule pointer bumps** that the user reviews and commits.
- This skill performs a **destructive checkout** on each submodule (detached HEAD at a tag). If a submodule has local uncommitted changes (it shouldn't, since this is a read-only mirror), `git checkout ` will fail and the repo will be reported as FAIL. Do not force.
- Submodule updates are serial because they share the superproject's index lockfile. Do not parallelize.
- Never remove submodules that disappear from upstream listings. Flag them in the report and let the user decide whether to `git submodule deinit && git rm `.
- `.clone-repos.sh` is now legacy/historical. Keep it as documentation of the original bootstrap; do not edit it as part of refresh.
- A few submodules live outside `~sircmpwn` and therefore won't appear in the discovered list; they must be refreshed manually. Currently: `hut` (from `~xenrox/hut`). The loop above already skips them because they're absent from `/tmp/srht-repos.txt`; do not delete them. To bump them, `cd `, `git fetch --tags --prune`, `git checkout `, then `git add ` from the superproject.
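
The report described in step 5 could be tallied directly from `/tmp/srht-refresh.log`. The following is a minimal sketch, not part of the skill itself: the helper name `summarize_refresh_log` is hypothetical, and it assumes every log line begins with one of the five status words followed by the repo name, as the `printf` calls in step 2 produce.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the original skill): print a count for
# each status, then list NEW repo names and the full FAIL lines.
# Assumes field 1 of each log line is the status and field 2 the repo name.
summarize_refresh_log() {
  local log="$1"
  awk '
    { count[$1]++ }
    $1 == "NEW"  { news  = news " " $2 }
    $1 == "FAIL" { fails = fails "  " $0 "\n" }
    END {
      # Fixed order so the summary reads the same on every run.
      n = split("NEW UPDATED OK NOTAG FAIL", order, " ")
      for (i = 1; i <= n; i++)
        printf "%-7s %d\n", order[i], count[order[i]] + 0
      if (news != "")  printf "New repos:%s\n", news
      if (fails != "") printf "Failures:\n%s", fails
    }
  ' "$log"
}

# Usage: summarize the log written by the refresh loop, if one exists.
if [ -f /tmp/srht-refresh.log ]; then
  summarize_refresh_log /tmp/srht-refresh.log
fi
```

Statuses that never occur are printed with a count of 0, which keeps the summary shape stable and makes a run with no failures easy to confirm at a glance.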