Storage Fixes, Vault Cleanup, and the Reboot That Bit Back

Energy: 7/10. Mood: 7/10. A full day — server work through most of it, vault documentation in the afternoon. Ended up dealing with an unexpected Immich outage too, which turned into one of the more satisfying debug sessions in a while.

Storage Rename and Samba

The 2TB external drive had been mounted at /mnt/photos since Immich was first set up — a name that made sense at the time but was misleading given the drive was always meant to act as a local NAS placeholder until a proper NAS gets built. Renamed it to /mnt/tmpnas: created the new mount point, unmounted the drive, updated /etc/fstab, remounted, and updated Immich's .env to point UPLOAD_LOCATION to /mnt/tmpnas/immich. Restarted the Immich stack to confirm photos still loaded. They did.
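The rename sequence above can be sketched roughly as follows. The helper name retarget_fstab is hypothetical and only rewrites the mount-point column of an fstab-style file; the surrounding commands are listed as comments because they need root and a stopped Immich stack.

```shell
# Sketch only: rewrite the mount-point field (column 2) of an fstab-style file.
# retarget_fstab is a hypothetical helper, not part of any tool.
retarget_fstab() {
    local fstab="$1" old="$2" new="$3"
    # Keep a backup next to the file before touching it.
    cp -- "$fstab" "$fstab.bak" || return 1
    # Change only lines whose second field equals the old mount point.
    # Comment lines pass through unchanged. Note that awk re-joins the
    # fields of a changed line with single spaces.
    awk -v old="$old" -v new="$new" '
        /^[[:space:]]*#/ { print; next }
        $2 == old { $2 = new }
        { print }
    ' "$fstab.bak" > "$fstab"
}

# Rough order of operations on the server (as root):
#   mkdir -p /mnt/tmpnas
#   umount /mnt/photos
#   retarget_fstab /etc/fstab /mnt/photos /mnt/tmpnas
#   mount /mnt/tmpnas        # mounts using the updated fstab entry
#   findmnt /mnt/tmpnas      # confirm the mount before restarting Immich
# Then set UPLOAD_LOCATION=/mnt/tmpnas/immich in Immich's .env.
```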

With the drive properly named, set up two Samba shares on top of it. homeshare is guest-accessible, read/write, available to anything on the network — a general-purpose drop point. eykli is private, requires a Samba password, for personal files. Both share blocks added to /etc/samba/smb.conf, Samba restarted. The shares are reachable locally and through Tailscale using smb://100.78.149.70/homeshare (or /eykli).
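For reference, the two share blocks might look something like the output of the function below. This is a sketch, not the actual config: the directories under /mnt/tmpnas and the Samba user name eykli are assumptions, and guest access usually also depends on the map to guest setting in [global], which is not shown.

```shell
# Sketch only: print two share definitions in smb.conf syntax.
# Share paths and the user name "eykli" are assumptions.
print_share_blocks() {
    cat <<'EOF'
[homeshare]
   path = /mnt/tmpnas/homeshare
   browseable = yes
   read only = no
   guest ok = yes

[eykli]
   path = /mnt/tmpnas/eykli
   browseable = yes
   read only = no
   guest ok = no
   valid users = eykli
EOF
}

# Possible usage on the server (as root):
#   print_share_blocks >> /etc/samba/smb.conf
#   testparm -s              # syntax-check the resulting config
#   smbpasswd -a eykli       # set the Samba password for the private share
#   systemctl restart smbd   # service name as on Debian/Ubuntu
```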

One gotcha worth noting: lsblk briefly showed the drive as unmounted during the remount process. That was not an actual problem, just a timing gap between commands. Immich did fail to load photos after one restart, because the container had started before the drive was properly mounted; restarting the stack after confirming the mount resolved it.
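One way to avoid starting the stack ahead of the drive is to wait for the mount first. The sketch below is an illustration under assumptions, not what was run that day: the function names are hypothetical, and the optional table argument exists mainly so the check can be exercised against a sample file instead of /proc/mounts.

```shell
# Sketch only: succeed if a path appears as a mount point in a mounts table.
# The table defaults to /proc/mounts.
is_mounted() {
    local target="$1" table="${2:-/proc/mounts}"
    awk -v t="$target" '$2 == t { found = 1 } END { exit found ? 0 : 1 }' "$table"
}

# Poll once per second until the mount shows up or the timeout (seconds) runs out.
wait_for_mount() {
    local target="$1" timeout="${2:-30}" table="${3:-/proc/mounts}" waited=0
    until is_mounted "$target" "$table"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Possible usage, run from the directory holding Immich's compose file:
#   wait_for_mount /mnt/tmpnas 60 && docker compose up -d
```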

T7 Shield UAS Fix

Later in the day, opened Immich and photos weren't loading. Checked docker stats and saw immich_server sitting at 147% disk utilisation with a 25-second flush wait and 4.9-second write latency. That's not a software problem.

The Samsung T7 Shield (/dev/sda, mounted at /mnt/tmpnas) was throwing UAS timeout errors in the kernel log — uas_eh_abort_handler timeouts, reads and writes failing, the kernel aborting commands mid-operation. UAS (USB Attached SCSI) is the faster USB storage protocol but it can be unstable on certain cable and controller combinations. On this connection it was causing complete I/O degradation.
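A quick way to confirm this kind of failure is to count the abort messages in the kernel log and check which driver is bound to the drive. The helper below is a hypothetical sketch that only counts matching lines on standard input.

```shell
# Sketch only: count UAS abort messages in kernel log text read from stdin.
# grep exits non-zero when nothing matches, so "|| true" keeps the status clean
# while the printed count stays 0.
count_uas_aborts() {
    grep -c 'uas_eh_abort_handler' || true
}

# Possible usage on the server:
#   sudo dmesg | count_uas_aborts   # a non-zero count suggests UAS command timeouts
#   lsusb -t                        # shows which driver (uas or usb-storage) is bound
#   lsusb                           # lists vendor:product IDs such as 04e8:61fb
```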

Fixed it by disabling UAS for the T7 Shield via kernel quirk. The device's vendor/product ID is 04e8:61fb, so added usb-storage.quirks=04e8:61fb:u to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, ran sudo update-grub && sudo reboot. After reboot the drive uses the usb-storage driver instead of uas. Kernel confirmed the quirk applied, and the results were immediate — write latency dropped from 4,864ms to 37ms, flush wait back to normal, Immich back to responsive. All containers healthy.
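The GRUB edit can be scripted instead of done by hand. The following is a minimal sketch, assuming the GRUB_CMDLINE_LINUX_DEFAULT value is wrapped in double quotes and the parameter contains no characters special to sed; add_kernel_param is a hypothetical helper. The trailing u in the quirk is the kernel's flag for ignoring the uas driver for that device.

```shell
# Sketch only: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a
# grub defaults file, unless it is already present.
add_kernel_param() {
    local file="$1" param="$2"
    # Nothing to do if the parameter is already on the line.
    grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param" && return 0
    cp -- "$file" "$file.bak" || return 1
    # Insert the parameter just before the closing quote, then trim the
    # leading space that appears if the original value was empty.
    sed -i -E \
        -e "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" \
        -e 's|^(GRUB_CMDLINE_LINUX_DEFAULT=") |\1|' \
        "$file"
}

# Possible usage on the server (as root):
#   add_kernel_param /etc/default/grub 'usb-storage.quirks=04e8:61fb:u'
#   update-grub && reboot
# After the reboot:
#   cat /proc/cmdline    # the quirk should appear on the kernel command line
#   lsusb -t             # the drive should now be bound to usb-storage
```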

Minor side effect: the device node changed from /dev/sdb1 to /dev/sda after switching drivers. That is just the kernel reassigning names, not an error. The fstab entry references the filesystem by UUID, so it was unaffected.

Vault CLAUDE.md Cleanup

The vault's context files had accumulated prose where bullet points would do. Went through the root CLAUDE.md, the Homelab file, and the N5HQ Website Build file and converted everything to dot points. Added two new rules to the root: a Writing Standards section making dot-point-only format the default for all context files going forward, and a Journal Workflow rule that locks in using find -newer to scope only files changed since the last entry rather than pulling from the whole vault.
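The find -newer rule boils down to something like the sketch below. The function name and the example paths are hypothetical; the reference file stands in for the previous journal entry.

```shell
# Sketch only: list files under a directory that were modified more recently
# than a reference file, skipping anything inside a .git directory.
changed_since() {
    local reference="$1" root="$2"
    find "$root" -type f -newer "$reference" ! -path '*/.git/*' | sort
}

# Possible usage (paths are hypothetical):
#   changed_since ~/vault/Journal/last-entry.md ~/vault
```

Because -newer compares strictly, the reference file itself never shows up in the results.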

The subfolder CLAUDE.md files had duplicate rules that were exact copies of what lives in root. Removed the duplicates and replaced them with a single pointer back to root — keeps things DRY and means there's one place to update if anything changes.

Also rewrote the existing journal entries (June 7–10) from a templated section format into proper blog prose.

The Reboot Bites Back

The T7 fix required a reboot, and that reboot collected its toll later in the evening. First casualty: Home Assistant. ha.n5hq.me was returning 400: Bad Request on every request while docker ps insisted the container was healthy. Checking locally told a stranger story — HA answered fine on port 8123, but redirected straight to the onboarding screen. A brand-new instance. Where did mine go?

The timestamps pieced it together. At 17:12, the live /homeassisstant data directory got moved into /maybebin during a root-level tidy-up. Five minutes later the server rebooted for the GRUB change. When Docker came back up, it auto-created an empty folder at the old bind-mount path, and HA initialised a factory-fresh instance into it — stock config, no reverse-proxy settings, no users, nothing. Every request through the Cloudflare tunnel got rejected because the new config didn't trust the proxy. A stray second cloudflared container running on the wrong network mode was muddying the diagnosis on top of it.
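With the -v style of bind mount, Docker creates a missing host path as an empty directory, which matches what happened here. A small guard before starting the container would have caught it. This is a sketch under assumptions: dir_has_data is a hypothetical helper and the container name in the usage comment is a guess.

```shell
# Sketch only: succeed only if a directory exists and holds at least one entry.
dir_has_data() {
    local dir="$1"
    [ -d "$dir" ] && [ -n "$(ls -A -- "$dir" 2>/dev/null)" ]
}

# Possible usage before starting the container (container name is an assumption):
#   dir_has_data /homeassisstant && docker start homeassistant
```

The check is deliberately crude. It cannot tell real data from a freshly initialised instance, so it only helps if it runs before the container has had a chance to populate the empty folder.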

The real data — 3GB, the automations, HACS, history back to July 2025 — sat untouched in /maybebin the whole time. Stopped the container, swapped the directories back, started it again. One clean self-restart later (a database migration, most likely) and ha.n5hq.me was serving the proper login page with everything intact. The blank instance is parked in /maybebin as a fallback until it's clearly safe to delete.

The uncomfortable part of the post-mortem: nothing backs up Home Assistant on this server, and the backups/ folder was empty when it mattered. Today the data survived by luck — it was moved, not deleted. That gap gets closed soon.
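As a starting point for closing that gap, a timestamped archive of the config directory could look like the sketch below. It is an illustration only: backup_dir is a hypothetical helper, the destination path is an assumption, and archiving a live database may give an inconsistent copy, so stopping the container first (or using Home Assistant's own backup feature) is likely safer.

```shell
# Sketch only: write a timestamped .tar.gz of a directory and print its path.
backup_dir() {
    local src="$1" dest="$2" stamp archive
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename -- "$src")-$stamp.tar.gz"
    mkdir -p -- "$dest" || return 1
    # -C keeps the archive paths relative to the parent of the source directory.
    tar -czf "$archive" -C "$(dirname -- "$src")" "$(basename -- "$src")" || return 1
    printf '%s\n' "$archive"
}

# Possible usage (destination is an assumption; copy the result off the server afterwards):
#   backup_dir /homeassisstant /mnt/tmpnas/backups
```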

The Website Was Down Too

Same reboot, second casualty: the local website dev server on port 3001. serve.mjs was a manually started process, so nothing brought it back after the boot. The published site on Vercel never blinked, but the local one had been dead since 17:17 without anyone noticing. Restarted it and added a crontab @reboot entry so it survives the next one.
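The @reboot entry can be added without opening an editor. The sketch below filters an existing crontab and appends the line only if it is missing; ensure_cron_line is a hypothetical helper, and the repo path, node location, and log file in the usage comment are assumptions.

```shell
# Sketch only: read a crontab on stdin and print it with the given line
# appended, unless an identical line is already there.
ensure_cron_line() {
    local line="$1" current
    current=$(cat)
    if printf '%s\n' "$current" | grep -qxF -- "$line"; then
        printf '%s\n' "$current"
    else
        # Avoid emitting a blank first line when the crontab was empty.
        [ -n "$current" ] && printf '%s\n' "$current"
        printf '%s\n' "$line"
    fi
}

# Possible usage (paths are hypothetical):
#   entry='@reboot cd /home/user/website && /usr/bin/node serve.mjs >> serve.log 2>&1'
#   (crontab -l 2>/dev/null || true) | ensure_cron_line "$entry" | crontab -
```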

While in the repo, closed a documentation gap that surfaced during the confusion: the project's CLAUDE.md knew where the site was hosted but never spelled out how to push — repo location, remote, what a push to main triggers (Vercel deploy, GitHub Actions regenerating the blog manifest), and what must never be committed. That's written down now. Also gitignored the vault-side folders that were one careless git add -A away from landing in a public repo.

What's Next

Re-establish HomeKit pairing for Aqara doorbell
Install Canon G4600 Series driver, update printer port to 10.1.1.5
Fix Claudian session ID error on MacBook
Fix WAN uptime Grafana panel
Cloudflare Access in front of Immich
Monitor T7 Shield stability over the next few days
Find what moved /homeassisstant to /maybebin before the reboot
Set up automatic Home Assistant backups, stored off the server
Fix the AirTouch4 integration (dataResult error — pre-existing, not reboot damage)
Commit and push the pending website repo changes

Solid day with a sting in the tail. The T7 Shield issue was a good reminder that hardware problems don't announce themselves cleanly — 147% disk utilisation in a container stat is not where you'd start looking for a USB protocol mismatch. The reboot fallout taught the other lesson: a fix that needs a restart will find every process that was only ever started by hand, and every folder that was moved "just for now". Both services came back, but the HA recovery worked because the data happened to still exist. Backups stop that being a matter of luck.