From 9438b1d8dc90f0fba217e87ffe540ecc8d45fd8e Mon Sep 17 00:00:00 2001 From: Mika Kuns Date: Wed, 20 May 2026 14:22:52 +0200 Subject: [PATCH] fix(plugin): make mailbox-update skill robust to common failure modes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous skill assumed a happy-path environment and silently broke on real installs. Hardened the flow: - Always pass --@kuns:registry on every npm call, so the upgrade survives an unreadable user .npmrc (e.g. roaming HOME on a network share that npm can't access). - Treat /health as the ground truth, not `claude-mailbox status`. Status only reflects the autostart wrapper and silently lies when the inner process crashed at boot. - Capture state before changing anything (CLI version, autostart state, port, daemon /health version) so failures can roll back to "do nothing — old daemon keeps running". - Detect the "daemon reachable but autostart NotInstalled" case (manual foreground serve) and refuse to swap binaries from under it. - After restart, poll /health for up to 10s and verify the live daemon's version matches LATEST; on timeout, dump the platform- appropriate daemon log tail instead of just reporting "Stopped". - Reorder: install first, then stop+start. Downtime is paid only after the new binary is verified on disk. --- plugin/commands/mailbox-update.md | 102 +++++++++++++++++++++++------- 1 file changed, 78 insertions(+), 24 deletions(-) diff --git a/plugin/commands/mailbox-update.md b/plugin/commands/mailbox-update.md index f4f22cc..efaa0b8 100644 --- a/plugin/commands/mailbox-update.md +++ b/plugin/commands/mailbox-update.md @@ -5,52 +5,106 @@ allowed-tools: Bash You are running the **Claude-Mailbox update** command. Update the `@kuns/claude-mailbox` npm package and restart the daemon. Never run `sudo` automatically — if elevation is needed, stop and ask. -## Step 1 — current version +Throughout, treat the **daemon's `/health` endpoint** as the ground truth for "is the daemon running and on what version", not `claude-mailbox status` (which only reflects the autostart wrapper). Always use the registry override flag `--@kuns:registry=https://git.kuns.dev/api/packages/releases/npm/` on every `npm` call, so the upgrade works even when the user's `.npmrc` is unreachable (e.g. roaming `$HOME` on a network share). -Run: `claude-mailbox --version` +## Step 1 — current state (must run before anything is changed) -- Exit 0 → record the version string as `CURRENT`. -- Non-zero → tell the user the daemon binary is not installed yet. Suggest `/claude-mailbox:mailbox-doctor` to do a full setup and stop. +Run, in order, and remember each result: + +1. `claude-mailbox --version` + - Exit 0 → `CURRENT_CLI = `. + - Non-zero → stop. The binary isn't installed; suggest `/claude-mailbox:mailbox-doctor` and exit. +2. `claude-mailbox status` + - Record as `AUTOSTART_STATE ∈ { Running, Stopped, NotInstalled }`. +3. Read the configured port. Try `~/.claude-mailbox/mailbox.json`; if absent or no `port` field, use **37849**. Call this `PORT`. +4. Probe `curl -sf -m 2 http://127.0.0.1:$PORT/health`. + - On success, parse JSON → `CURRENT_HEALTH_VERSION = .version`. Set `DAEMON_REACHABLE = true`. + - On failure → `CURRENT_HEALTH_VERSION = null`, `DAEMON_REACHABLE = false`. + +Note any inconsistencies (e.g. `AUTOSTART_STATE = NotInstalled` but `DAEMON_REACHABLE = true` means a manually-started foreground daemon) — they affect Step 5. ## Step 2 — latest published version -Run: `npm view @kuns/claude-mailbox version` +Run: `npm view --@kuns:registry=https://git.kuns.dev/api/packages/releases/npm/ @kuns/claude-mailbox version` -If the npm registry config is missing, the call may fail with a 404. Fall back to: +If that fails for any reason (network, registry), fall back to: ``` npm view --registry=https://git.kuns.dev/api/packages/releases/npm/ @kuns/claude-mailbox version ``` -Record the result as `LATEST`. +Record the result as `LATEST`. If both calls fail, stop and report the network/registry error. ## Step 3 — compare -- If `CURRENT === LATEST`: print "Already up to date (vX.Y.Z)." and stop. Do not run any further steps. -- Otherwise: tell the user `CURRENT` → `LATEST` and ask for confirmation before proceeding. +- `CURRENT_CLI === LATEST` AND `CURRENT_HEALTH_VERSION === LATEST` (or `DAEMON_REACHABLE = false`) → print "Already up to date (vLATEST)." and stop. +- `CURRENT_CLI === LATEST` but `CURRENT_HEALTH_VERSION !== LATEST` → the CLI is fresh but the running daemon is the old binary. Tell the user "Binary already at LATEST but the running daemon is still v$CURRENT_HEALTH_VERSION — restart needed." Then jump to Step 5 (no npm install). +- Otherwise → tell the user `CURRENT_CLI` → `LATEST` and ask for confirmation before proceeding. -## Step 4 — perform the update +Also warn before confirmation if `AUTOSTART_STATE = NotInstalled` AND `DAEMON_REACHABLE = false`: -On user confirmation, run these in order. Stop on the first failure and report it: +> Heads-up: autostart is not installed and no daemon is reachable on port $PORT. After the upgrade I can install the new binary, but I won't be able to start the daemon automatically — you'd need `/claude-mailbox:mailbox-doctor` to wire up autostart, or run `claude-mailbox serve` manually. Proceed anyway? -1. `claude-mailbox stop` -2. `npm install -g @kuns/claude-mailbox@latest` - - On Linux/macOS this may fail with EACCES. **Do not run sudo automatically.** Ask the user how they want to proceed (e.g., `sudo npm install -g …`, or switch to a user-scoped Node setup with nvm/fnm). -3. `claude-mailbox start` -4. `claude-mailbox --version` to verify the upgrade landed. -5. `claude-mailbox status` to verify the daemon is `Running`. +## Step 4 — install the new package -## Step 5 — summary +On user confirmation: + +``` +npm install -g @kuns/claude-mailbox@latest --@kuns:registry=https://git.kuns.dev/api/packages/releases/npm/ +``` + +- The scope override is mandatory — do not omit it, even if `npm config get @kuns:registry` looks right. It costs nothing and protects against unreachable user-level `.npmrc`. +- On Linux/macOS the install may fail with EACCES. **Do not run sudo automatically.** Stop and ask how the user wants to proceed (e.g. `sudo`, switch to nvm/fnm). +- On any other failure, stop and report. Do **not** touch the daemon — leaving the old daemon running is the safe rollback. + +After install, run `claude-mailbox --version` and confirm it now reports `LATEST`. If not (PATH shadowing, stale `which`), stop and report — the daemon is still on the old version, which is fine to keep running. + +## Step 5 — restart the daemon + +Now swap the daemon over to the new binary. + +1. Stop the existing daemon if anything is running: + - If `AUTOSTART_STATE = Running` → `claude-mailbox stop` and wait up to 5s for `/health` on `PORT` to start failing (poll once per second). + - If `DAEMON_REACHABLE = true` but `AUTOSTART_STATE = NotInstalled` → a foreground/manual daemon is running. Tell the user: + > A daemon is reachable on port $PORT but autostart is not installed, so I can't stop it. Stop the manual `claude-mailbox serve` process yourself, then re-run this command to finish the restart. + Then stop here. + - Otherwise nothing to stop. +2. Start the daemon, picking the path that matches `AUTOSTART_STATE`: + - `Running` or `Stopped` (i.e. autostart is installed) → `claude-mailbox start`. + - `NotInstalled` → skip the start. After the loop below times out, tell the user to run `/claude-mailbox:mailbox-doctor` to install autostart, then exit. +3. Poll `curl -sf -m 1 http://127.0.0.1:$PORT/health` up to **10 times with 1s sleeps**. Stop polling as soon as one returns JSON with `"status":"ok"`. +4. Outcome: + - Health came up AND `version === LATEST` → ✓ proceed to Step 6. + - Health came up but `version !== LATEST` → the wrapper started an *old* binary somewhere on PATH. Dump `which claude-mailbox` / `where claude-mailbox` and stop with that info. + - Health did not come up → dump the most recent daemon log to help diagnose. Try, in order, the first one that exists: + - Windows Scheduled Task / fallback: `%LOCALAPPDATA%\ClaudeMailbox\logs\daemon.log` (tail the last 40 lines) + - Windows Service variant: `Get-WinEvent -ProviderName ClaudeMailbox -MaxEvents 20` via `powershell -NoProfile -Command "..."` + - macOS launchd: `~/Library/Logs/ClaudeMailbox/daemon.log` (last 40 lines) + - Linux systemd-user: `journalctl --user -u claude-mailbox -n 40 --no-pager` + If none exist or all are empty, just say "No daemon logs found; try `claude-mailbox serve` in a terminal to see the error directly." + Stop with that diagnostic; do not pretend the update succeeded. + +## Step 6 — summary Print exactly this block: ``` Claude-Mailbox update - previous version: - new version: - daemon: Running | Stopped | NotInstalled - pending messages survived: + previous version: + new version: + daemon health: ok (v) | unreachable + daemon autostart: Running | Stopped | NotInstalled + pending messages: ``` -If the new version matches `LATEST` and daemon is `Running`, end with: "Update complete." -Otherwise, end with the first thing that went wrong. +End with one of: + +- New CLI version matches `LATEST` AND `/health` returns `version === LATEST` → **"Update complete."** +- Anything else → **"Update incomplete: ."** + +## Operating notes + +- **Always use the scope override flag** (`--@kuns:registry=...`) on every npm call. The user's `.npmrc` may be on a network drive that npm can't read. +- **Never rely on `claude-mailbox status` alone** to decide "the daemon is fine". Always cross-check with a `/health` probe — the status command only reflects whether the autostart task is in a Running state and doesn't notice if the process inside crashed at boot. +- **Never run `npm install` without first locking in the current state.** If the install fails, the safe rollback is to do nothing — the old daemon is still running. +- **Never `claude-mailbox stop` before the install succeeds.** Downtime is paid only after we know the new binary is on disk.