ref: 72581ff94e21863373e2c72a01eb75332cb22f32 sourcehut-root/.claude/skills/sourcehut-refresh/SKILL.md -rw-r--r-- 7.3 KiB
72581ff9 — Eugene Blikh docs(sourcehut-ci): fix factual errors found in full audit 2 days ago

name: sourcehut-refresh
description: Refresh the SourceHut documentation mirror: scrape the upstream source listing pages, clone any new repos, and update every existing repo to the latest tag. Use when the user says "refresh", "update", "sync", or "fetch new" sourcehut repos, or when answers seem stale.

#sourcehut-refresh

This workspace at ~/data/home/sourcehut is a documentation mirror of the SourceHut project. Running this skill re-discovers the canonical list of subprojects from upstream and brings every clone up to its latest tag.

#What this skill does

The workspace is a git superproject — each SourceHut repo is a submodule pinned to a specific commit, listed in .gitmodules. Refresh therefore means: re-discover upstream, add submodules for new repos, fast-forward existing submodules to the latest tag, then stage the bumped gitlinks so the user can review and commit.

  1. Scrape https://sr.ht/~sircmpwn/sourcehut/sources and any ?page=N continuations until no ?page=N+1 link appears.
  2. Extract every repository name linked under ~sircmpwn/.
  3. For each name:
    • Not yet a submodule → git submodule add https://git.sr.ht/~sircmpwn/<name>, then check out latest tag inside.
    • Already a submodule → enter it, git fetch --tags --prune, check out latest tag.
  4. Tag selection: highest-versioned tag via git tag --sort=-v:refname | head -n1; if no tags, stay on default branch and fast-forward.
  5. git add <name> in the superproject for every submodule whose HEAD moved.
  6. Regenerate .claude/INDEX.md.
  7. Report OK / NEW / UPDATED / NOTAG / FAIL and leave the staged changes uncommitted — the user reviews and commits.
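
The tag selection in step 4 can be tried in isolation. The sketch below wraps the same `git tag --sort=-v:refname | head -n1` pipeline in a helper and runs it against a throwaway repository. The helper name `latest_tag` and the demo tags are hypothetical, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): print the highest-versioned
# tag of the repository at $1, or nothing if the repository has no tags.
latest_tag() {
  git -C "$1" tag --sort=-v:refname 2>/dev/null | head -n1
}

# Demo in a throwaway repo: version sort ranks v0.10.0 above v0.9.0,
# where a plain lexical sort would pick v0.9.0.
demo=$(mktemp -d)
git -C "$demo" init --quiet
git -C "$demo" -c user.name=demo -c user.email=demo@example.invalid \
  commit --quiet --allow-empty -m "initial"
for t in v0.2.1 v0.9.0 v0.10.0; do
  git -C "$demo" tag "$t"
done
latest_tag "$demo"
```

One caveat: with git's default version sort, a pre-release tag such as 1.0-rc1 sorts after 1.0. If a repo ever used such tags, setting `versionsort.suffix` (for example `git -c versionsort.suffix=-rc tag --sort=-v:refname`) would rank the release first.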

#How to run

Always run from ~/data/home/sourcehut. Do all the work via Bash — do not invent intermediate scripts unless the inline pipeline below is genuinely insufficient.

#1. Discover the repo list from upstream

cd ~/data/home/sourcehut
discover_repos() {
  local page=1 url repos="" got page_repos
  while :; do
    if [ "$page" -eq 1 ]; then
      url="https://sr.ht/~sircmpwn/sourcehut/sources"
    else
      url="https://sr.ht/~sircmpwn/sourcehut/sources?page=$page"
    fi
    got=$(curl -fsSL "$url" 2>/dev/null) || break
    # Anchor: href="/~sircmpwn/<name>" where <name> contains no slash.
    # Exclude the hub itself ("sourcehut") and any '?page=' query strings.
    page_repos=$(printf '%s\n' "$got" \
      | grep -oE 'href="/~sircmpwn/[A-Za-z0-9._+-]+"' \
      | sed -E 's:.*/~sircmpwn/([^"/]+)".*:\1:' \
      | grep -vE '^(sourcehut)$' \
      | sort -u)
    [ -z "$page_repos" ] && break
    repos="$repos"$'\n'"$page_repos"
    # Stop when there is no link to the next page.
    if ! printf '%s\n' "$got" | grep -qF "sources?page=$((page+1))"; then
      break
    fi
    page=$((page+1))
  done
  printf '%s\n' "$repos" | sed '/^$/d' | sort -u
}
discover_repos > /tmp/srht-repos.txt
wc -l /tmp/srht-repos.txt
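
The extraction stages can be checked offline before pointing them at the network. This is a minimal sketch that reuses the grep/sed/sort stages from `discover_repos` on a made-up HTML fragment. The function name `extract_repos` and the sample markup are assumptions; the real listing page may be structured differently.

```shell
#!/usr/bin/env bash
# Same grep/sed/sort stages as discover_repos, reading HTML on stdin.
extract_repos() {
  grep -oE 'href="/~sircmpwn/[A-Za-z0-9._+-]+"' \
    | sed -E 's:.*/~sircmpwn/([^"/]+)".*:\1:' \
    | grep -vE '^(sourcehut)$' \
    | sort -u
}

# Hypothetical sample markup, for illustration only.
sample_html='<a href="/~sircmpwn/sourcehut">hub</a>
<a href="/~sircmpwn/core.sr.ht">core.sr.ht</a>
<a href="/~sircmpwn/core.sr.ht/tree">tree</a>
<a href="/~sircmpwn/meta.sr.ht">meta.sr.ht</a>
<a href="/~sircmpwn/sourcehut/sources?page=2">Next</a>
<a href="/~other/hut">hut</a>'

# Only links of the exact form /~sircmpwn/<name> survive; deeper paths,
# the hub itself, pagination links, and other owners are dropped.
printf '%s\n' "$sample_html" | extract_repos
```

On this sample the pipeline keeps `core.sr.ht` and `meta.sr.ht` only.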

If the discovered list is empty or surprisingly short (< 20 entries), stop and report to the user — the upstream HTML may have changed. Do not proceed to mutate clones.
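
The length check described above could be written as a small guard. This is a sketch only: the function name `repo_list_ok` is hypothetical, and the default threshold of 20 is taken from the paragraph above.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the list at $1 has at least $2
# non-empty lines (default 20).
repo_list_ok() {
  local file=$1 min=${2:-20} count
  count=$(grep -c . "$file" 2>/dev/null)
  [ "${count:-0}" -ge "$min" ]
}

# Demo with a deliberately short list.
tmp=$(mktemp)
printf '%s\n' core.sr.ht meta.sr.ht git.sr.ht > "$tmp"
if repo_list_ok "$tmp"; then
  echo "proceed"
else
  echo "stop: only $(grep -c . "$tmp") repos discovered"
fi
rm -f "$tmp"
```

In the real workflow the guard would run against /tmp/srht-repos.txt before the submodule loop starts.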

#2. Update each submodule (adds new, fetches existing, checks out latest tag)

Submodule operations are not safely parallel (they touch the shared superproject index), so this loop is serial. It still finishes in well under a minute for ~30 repos.

cd ~/data/home/sourcehut
while IFS= read -r name; do
  url="https://git.sr.ht/~sircmpwn/$name"
  status=""
  if ! git config -f .gitmodules --get submodule."$name".path >/dev/null 2>&1; then
    # New repo — register as submodule. submodule add clones into place.
    if git submodule add --quiet "$url" "$name" 2>/dev/null; then
      status="NEW"
    else
      printf 'FAIL    %s (submodule add)\n' "$name"; continue
    fi
  else
    # Existing — make sure it is initialized, then fetch.
    git submodule update --init --quiet "$name" 2>/dev/null
    git -C "$name" fetch --quiet --tags --prune 2>/dev/null \
      || { printf 'FAIL    %s (fetch)\n' "$name"; continue; }
    status="OK"
  fi
  tag=$(git -C "$name" tag --sort=-v:refname 2>/dev/null | head -n1)
  if [ -z "$tag" ]; then
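    # No tags: fast-forward the current branch. On a detached HEAD the pull
    # refuses and leaves the checkout unchanged, hence the tolerant "|| true".
    git -C "$name" pull --ff-only --quiet 2>/dev/null || true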
    printf 'NOTAG   %s (on %s)\n' "$name" "$(git -C "$name" rev-parse --abbrev-ref HEAD)"
    git add "$name" 2>/dev/null
    continue
  fi
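  # Compare commits rather than tag names: several tags can point at one commit.
  head=$(git -C "$name" rev-parse HEAD 2>/dev/null)
  want=$(git -C "$name" rev-parse "$tag^{commit}" 2>/dev/null)
  if [ "$head" = "$want" ] && [ "$status" != "NEW" ]; then
    printf '%-7s %s @ %s\n' "$status" "$name" "$tag"
  elif git -C "$name" -c advice.detachedHead=false checkout --quiet "$tag" 2>/dev/null; then
    # Report freshly added submodules as NEW so the summary can name them.
    [ "$status" = "NEW" ] || status="UPDATED"
    printf '%-7s %s @ %s\n' "$status" "$name" "$tag"
    git add "$name"
  else
    printf 'FAIL    %s (checkout %s)\n' "$name" "$tag"
  fi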
done < /tmp/srht-repos.txt | tee /tmp/srht-refresh.log

After this loop, git status in the superproject will show new gitlinks for every submodule whose HEAD moved (and any NEW submodules will also have a .gitmodules change). Do not commit automatically — show the user git submodule status and git status --short, summarize, and let them commit.

#3. Rebuild the index

After all clones are up-to-date, regenerate .claude/INDEX.md:

bash ~/data/home/sourcehut/.claude/scripts/build-index.sh

This walks every repo and writes a per-service inventory of GraphQL types, SQL tables, Python blueprints, Go packages, and a cross-repo type map. The sourcehut-lookup skill reads it before doing anything else, so a stale index degrades every later lookup. Always run this — even if no clones changed — because the script also captures the tag of each repo. It takes ~5 seconds.

#4. Stage .claude/INDEX.md

git add .claude/INDEX.md

#5. Report

Summarize counts of NEW / UPDATED / OK / NOTAG / FAIL. Highlight any FAIL lines. If NEW repos appeared, mention them by name — they likely warrant a one-line addition to the layout section of CLAUDE.md. Show git status --short and suggest a commit message like refresh: bump submodules to latest tags — let the user commit.
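
The counts can be pulled straight from the log that the update loop writes. This sketch assumes the log format produced by the loop's printf calls (status word first on each line). The helper name `summarize_log` and the sample log entries, including their version numbers, are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line of per-status counts for the log at $1,
# followed by any FAIL lines verbatim.
summarize_log() {
  local log=$1
  awk '
    $1 ~ /^(NEW|UPDATED|OK|NOTAG|FAIL)$/ { n[$1]++ }
    END {
      split("NEW UPDATED OK NOTAG FAIL", order, " ")
      for (i = 1; i <= 5; i++)
        printf "%s%s=%d", (i > 1 ? " " : ""), order[i], n[order[i]]
      print ""
    }
  ' "$log"
  grep '^FAIL' "$log" || true
}

# Demo on a made-up log.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
OK      core.sr.ht @ 0.1.0
UPDATED git.sr.ht @ 0.2.0
NOTAG   docs-example (on master)
FAIL    hg.sr.ht (fetch)
OK      meta.sr.ht @ 0.3.0
EOF
summarize_log "$sample_log"
rm -f "$sample_log"
```

Against the real run this would be `summarize_log /tmp/srht-refresh.log`.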

#Notes

  • The workspace is a git superproject. Submodules live in .gitmodules and their checkout state is the gitlink stored in the superproject's tree. A refresh produces uncommitted submodule pointer bumps that the user reviews and commits.
  • This skill checks out a tag in each submodule, leaving it on a detached HEAD. If a submodule has local uncommitted changes that conflict with the tag (it shouldn't have any; this is a read-only mirror), git checkout <tag> will fail and the repo will be reported as FAIL. Do not force.
  • Submodule updates are serial because they share the superproject's index lockfile. Do not parallelize.
  • Never remove submodules that disappear from upstream listings; flag them in the report and let the user decide whether to git submodule deinit <name> && git rm <name>.
  • .clone-repos.sh is now legacy/historical. Keep it as documentation of the original bootstrap; do not edit it as part of refresh.
  • A few submodules live outside ~sircmpwn and therefore won't appear in the discovered list — they must be refreshed manually. Currently: hut (from ~xenrox/hut). The loop above already skips them because they're absent from /tmp/srht-repos.txt; do not delete them. To bump them, cd <name>, git fetch --tags --prune, git checkout <latest tag>, then git add <name> from the superproject.
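
Flagging submodules that have vanished from upstream, while leaving the known out-of-namespace ones alone, could look like the sketch below. It is illustrative only: the helper name `missing_upstream`, the `GONE` label, and the `retired.example` entry are hypothetical. The sed expression reads only `path = ...` lines; `git config -f .gitmodules --get-regexp` would be the more robust parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list submodule paths from a .gitmodules file ($1) that
# are absent from the discovered repo list ($2) and are not among the known
# external submodules (remaining arguments, e.g. hut).
missing_upstream() {
  local gitmodules=$1 repos=$2 sub ext skip
  shift 2
  sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$gitmodules" | sort -u |
  while IFS= read -r sub; do
    if grep -qxF -- "$sub" "$repos"; then continue; fi
    skip=0
    for ext in "$@"; do
      if [ "$sub" = "$ext" ]; then skip=1; fi
    done
    if [ "$skip" -eq 0 ]; then printf 'GONE    %s\n' "$sub"; fi
  done
}

# Demo with made-up data (url lines omitted for brevity).
gm=$(mktemp); list=$(mktemp)
cat > "$gm" <<'EOF'
[submodule "core.sr.ht"]
	path = core.sr.ht
[submodule "hut"]
	path = hut
[submodule "retired.example"]
	path = retired.example
EOF
printf '%s\n' core.sr.ht > "$list"
missing_upstream "$gm" "$list" hut
rm -f "$gm" "$list"
```

The output is report material only; nothing is deinitialized or removed.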