pal-e-platform
Notes
Doc
-
7f-4 Alignment Audit Report
report-7f4-alignment-audit -
Alert State Report — 2026-05-01
alert-report-2026-05-01 -
Architecture: CI Pipeline (shared Woodpecker pattern)
arch-ci-pipeline -
ArgoCD Image Updater
argocd-image-updater -
BUG: kube-router ipset population broken — NetworkPolicies block all traffic
bug-kube-router-ipset-empty -
Bug: ArgoCD Image Updater registry auth mismatch (platform-wide)
bug-image-updater-registry-auth -
Bug: CNPG webhook drift — kubernetes_manifest provider incompatibility
bug-cnpg-webhook-drift-wal-timeout -
DORA Framework: Platform Axiom
dora-framework -
Host Inventory: Arch Box
host-inventory-archbox -
Incident: Woodpecker webhook signatures invalid — merge=deploy broken
incident-2026-03-14-woodpecker-webhook-signatures -
Incident: pal-e-docs CI migration-test failure (Alembic drift)
incident-paledocs-alembic-drift-2026-03-14 -
Incident: pal-e-streamlit public funnel exposed PII (2026-04-10)
incident-2026-04-10-pal-e-streamlit-public-funnel -
Lesson: Salt GPG Renderer + GPG Agent Configuration
lesson-salt-gpg-agent-config -
Milestone: Woodpecker Postgres Migration + DORA Pipeline Complete
milestone-2026-03-14-woodpecker-postgres-dora-pipeline -
Network Traffic Map — pal-e Cluster
doc-network-traffic-map -
Observability Audit: pal-e-platform
observability-audit-2026-02-25 -
Observability Baseline Audit (2026-03-13)
audit-observability-baseline-2026-03-13 -
Platform Architecture
platform-architecture -
Platform CI/CD
platform-ci-cd -
Platform Maturity Matrix
platform-maturity-matrix -
Platform Monitoring
platform-monitoring -
Post-Move Network Recovery — Archbox at New Location
todo-post-move-network-recovery -
Review: Add computed_fields to CNPG kubernetes_manifest resources
review-1107-2026-04-26 -
Review: Apply 5+ pending terraform changes (ArgoCD migrations)
review-521-2026-03-28 -
Review: Apply ruff standard to gmail-mcp (missed from #29 rollout)
review-640-2026-03-28 -
Review: ArgoCD CMP sidecar fails to render kustomize+SOPS overlays
review-525-2026-03-29 -
Review: ArgoCD CMP sidecar fails to render kustomize+SOPS overlays (re-review)
review-525-2026-03-29-v2 -
Review: Bug: 31 stale agent worktrees accumulating across repos
review-637-2026-03-28 -
Review: Bug: Woodpecker webhook not firing on Forgejo squash merge
review-273-2026-04-07 -
Review: Bug: Woodpecker webhook not firing on Forgejo squash merge (re-review)
review-881-2026-04-07 -
Review: Bug: stale agent worktrees accumulating across repos (re-review)
review-637-2026-03-28-v2 -
Review: Bug: update-kustomize-tag skipped when CI tests fail
review-882-2026-04-07 -
Review: Bug: update-kustomize-tag skipped when CI tests fail (re-review)
review-882-2026-04-07-v2 -
Review: Change Tailscale SSH ACL from "check" to "accept"
review-802-2026-04-04 -
Review: Create ldraney user on Forgejo for public-facing profile URL
review-621-2026-03-28 -
Review: Keycloak realm config via Terraform provider
review-142-2026-03-28 -
Review: Keycloak service account for programmatic admin API access
review-785-2026-04-03 -
Review: Python repo standards: ruff pre-commit hooks + repo setup template
review-55-2026-03-28 -
Review: Python repo standards: ruff pre-commit hooks + repo setup template
review-55-2026-03-28-r4 -
Review: Woodpecker server log noise: orphaned queue.Done / stream errors
review-631-2026-03-28 -
Review: fix: PlayMe2K kustomize overlay references wrong namespace (twitch-2k-wager vs playme2k)
review-828-2026-04-05 -
Review: pal-e-platform#306 — Add admin_app_db_password to Salt pillar
review-306-2026-04-25 -
SRE Debugging with kubectl
sre-kubectl-debugging -
Service Onboarding Port + Registry Validation
todo-service-onboarding-validation -
TF: Best Practices Comparison
tf-best-practices-comparison -
TF: CI/CD Pipeline Design + DORA Metrics
tf-pipeline-design -
TF: Current File Tree (Annotated)
tf-current-filetree -
TF: Environment Strategy (Dev/Prod)
tf-environment-strategy -
TF: Modularization Roadmap
tf-modularization-roadmap -
TF: PostgreSQL Strategy
tf-postgres-strategy -
TF: Rollback Strategy + Disaster Recovery
tf-rollback-strategy -
TF: Team Readiness Assessment
tf-team-readiness -
TODO: Bump ArgoCD Image Updater memory limit to 512Mi
todo-argocd-image-updater-oom -
TODO: Clean up dead kustomize base files
todo-cleanup-dead-kustomize-bases -
TODO: Delete orphan Woodpecker secret tf_var_slack_webhook_url
todo-delete-woodpecker-slack-secret -
TODO: Fix CNPG backup verification CronJob failure
todo-cnpg-backup-verify-failure -
TODO: Fix CNPG postgres metrics exporter (port 9187 not listening)
todo-cnpg-metrics-exporter -
TODO: Fix Harbor imagePullSecret drift across namespaces
todo-harbor-pull-secret-drift -
TODO: Remove MCP remote + basketball-api-dev from pal-e-services k3s.tfvars
todo-remove-stale-services-tfvars -
TODO: Wire ALL woodpecker secrets into terraform helm values (DB password + agent secret + API token + encryption key)
todo-woodpecker-secrets-terraform -
Terraform Architecture Assessment (2026-02-26)
tf-architecture-assessment-2026-02-26 -
Validation 71 — cnpg_scheduled_backup cron drift auto-resolved
validation-71-2026-04-26 -
Validation: #118 pal-e-app Mobile Responsive
validation-118-2026-05-06 -
Validation: #345 Harbor Mobile CSS
validation-345-2026-05-06 -
Validation: #346 MinIO Mobile CSS
validation-346-2026-05-06 -
Validation: #347 Forgejo Mobile CSS
validation-347-2026-05-06 -
Validation: #348 Harbor Mobile Proxy
validation-348-2026-05-06 -
Validation: Add computed_fields to CNPG kubernetes_manifest
validation-69-2026-04-26 -
Validation: Tailscale SSH ACL accept (#262)
validation-262-2026-04-04 -
Validation: pal-e-deployments #95
validation-95-2026-04-05 -
Validation: pal-e-deployments#146 — KEYCLOAK_CLIENT_SECRET landed
validation-146-2026-05-03 -
Validation: sop-postgres-restore dry-run drill — 2026-04-21
validation-postgres-restore-2026-04-21 -
Why DevOps Materializes at Team Onboarding
insight-devops-materializes-at-team-onboarding
Architecture
-
Architecture: Secrets Pipeline
arch-secrets-pipeline
Todo
-
Bug: 19 alerts from broken/undeployed services (noise floor)
bug-alert-noise-broken-services -
Bug: ArgoCD Image Updater cannot authenticate to Harbor
bug-argocd-image-updater-harbor-auth -
Bug: Grafana CrashLoopBackOff — Duplicate Default Datasource
bug-grafana-crashloop -
Bug: NodeClockNotSynchronising — NTP not configured
bug-node-clock-ntp -
Bug: nftables Salt state uses service.running for oneshot service
bug-nftables-service-running-oneshot -
Bug: tf-state-backup CronJob — bitnami/kubectl:1.31 image removed from Docker Hub
bug-tf-state-backup-image-dead -
TODO: Add paledocs_db_password to Salt pillar and Makefile
todo-paledocs-db-password-pillar -
TODO: Deployment Safety — Rollback, Migration Guards, Dev Environment
todo-deployment-safety -
TODO: Fix Grafana CrashLoopBackOff — duplicate default datasource
todo-fix-grafana-duplicate-default-datasource -
TODO: Fix Woodpecker TLS clone failure (use internal Forgejo URL)
todo-woodpecker-tls-clone-fix -
TODO: Fix Woodpecker webhook token signatures (post-Postgres migration)
todo-woodpecker-webhook-token-fix -
TODO: Forgejo PyPI Registry -- Migrate 22 Public Packages
todo-forgejo-pypi -
TODO: GPG Physical Backup
todo-gpg-physical-backup -
TODO: Monitoring Stack MCP API Surface
todo-monitoring-stack-mcp-api -
TODO: Non-heading block anchor_ids
todo-block-anchor-ids -
TODO: Remove per-repo clone URL overrides (Woodpecker TLS fix deployed)
todo-remove-clone-url-overrides -
TODO: Rename deployments repo to pal-e-deployments
todo-rename-deployments-repo -
TODO: Use -lock=false for CI tofu plan (prevent state lock contention)
todo-tofu-plan-lock-false -
TODO: Woodpecker MCP (swagger.json -> SDK -> MCP pipeline)
todo-woodpecker-mcp
Convention
-
Convention: Kustomize Overlay for Deployments
convention-kustomize-overlay -
Namespace Conventions
namespace-conventions
SOP
-
Deployment Lessons Learned
deployment-lessons -
SOP: Gmail OAuth Token Management
sop-gmail-oauth -
SOP: Harbor Robot Import Recovery
sop-harbor-robot-import -
SOP: Incident Response
sop-incident-response -
SOP: Keycloak Client Creation (Admin Console)
sop-keycloak-client-creation -
SOP: Network Security
sop-network-security -
SOP: Platform Terraform Changes
sop-platform-tf-changes -
SOP: Postgres Restore (CNPG + MinIO)
sop-postgres-restore -
SOP: Secrets Management
sop-secrets-management -
SOP: harbor-creds Migration (SOPS-Overlay → Terraform-Managed)
sop-harbor-creds-migration -
Service Onboarding SOP
service-onboarding-sop
Phase
-
Epilogue: Post-Plan Cleanup
phase-postgres-epilogue-cleanup -
Phase 14a: Webhook Token Fix + Probe URL Nits
phase-pal-e-platform-14a-webhook-fix -
Phase 14b: Observability Cleanup — Alert Noise Reduction
phase-pal-e-platform-14b-observability-cleanup -
Phase 16: Alert Tuning & Resource Right-Sizing
phase-platform-16-alert-tuning -
Phase 16: SLO Governance (Sloth)
phase-pal-e-platform-16-slo-error-budgets -
Phase 17a: Woodpecker Secrets Hardening
phase-platform-17a-woodpecker-secrets -
Phase 17b: Terraform State Governance
phase-platform-17b-tf-state-governance -
Phase 19: Policy-as-Code (Kyverno)
phase-platform-19-policy-kyverno -
Phase 1: TF Modularization — DEFERRED
phase-postgres-1-tf-modularize -
Phase 20: Security Deepening
phase-platform-20-security-deepening -
Phase 20a: Dependency Scanning (Renovate)
phase-platform-20a-dependency-scanning -
Phase 20b: Runtime Security (Falco)
phase-platform-20b-runtime-security -
Phase 20c: Supply Chain Signing (Cosign/Sigstore + Syft)
phase-platform-20c-supply-chain-signing -
Phase 20d: Web App Scanning (OWASP ZAP)
phase-platform-20d-webapp-scanning -
Phase 21: Progressive Delivery (Argo Rollouts)
phase-platform-21-progressive-delivery -
Phase 22: Load Testing (k6 Operator)
phase-platform-22-load-testing -
Phase 23: Chaos Engineering (LitmusChaos) — Capstone
phase-platform-23-chaos-engineering -
Phase 24: MinIO SDK — S3 Signature V4 + Core Operations
phase-pal-e-platform-24-minio-sdk -
Phase 25: MinIO API — FastAPI REST Service
phase-pal-e-platform-25-minio-api -
Phase 26: MinIO Playground — Mobile-First File Browser
phase-pal-e-platform-26-minio-playground -
Phase 27: MinIO SvelteKit — Production Mobile App
phase-pal-e-platform-27-minio-sveltekit -
Phase 28: Keycloak Declarative Onboarding
phase-platform-28-keycloak-declarative-onboarding -
Phase 28: Keycloak SMTP — Platform Email Foundation
phase-pal-e-platform-28-keycloak-smtp -
Phase 29: SvelteKit Convention + Capacitor SOP Stages 5-6
phase-pal-e-platform-29-sveltekit-convention -
Phase 2: Platform CNPG Foundation
phase-postgres-2-deploy-cnpg -
Phase 2b: Clean Up Platform TF — Remove App-Level CNPG Resources
phase-postgres-2b-cleanup-platform -
Phase 30: Mac CI Agent — iOS Build Infrastructure
phase-pal-e-platform-30-mac-ci-agent -
Phase 4: Postgres Backup Verification + Restore SOP
phase-postgres-4-backup-restore -
Phase 6.3: Plan-on-PR Pipeline
phase-pal-e-platform-ci-6-3-plan-on-pr -
Phase 6.4: Apply-on-Merge Pipeline
phase-pal-e-platform-ci-6-4-apply-on-merge -
Phase 7f-4: Note Attribute Augmentation
phase-postgres-7f-4-attribute-augmentation -
Phase 8: Network Security Hardening
phase-pal-e-platform-network-security -
Phase: CI Pipeline & Team Hardening
phase-pal-e-platform-ci-hardening -
Phase: CI — State Backup CronJob
phase-pal-e-platform-ci-6-1-state-backup -
Phase: CI — Validation Pipeline
phase-pal-e-platform-ci-6-2-validation-pipeline -
Phase: DORA Re-Baseline + Dashboard Verification
phase-pal-e-platform-15-dora-rebaseline -
Phase: Data-Driven Operations Dashboard
phase-pal-e-platform-18-operations-dashboard -
Phase: Database Backup Verification
phase-pal-e-platform-backup-verification -
Phase: Dependency Scanning (Renovate)
phase-pal-e-platform-dependency-scanning -
Phase: Distributed Tracing (OpenTelemetry + Tempo)
phase-pal-e-platform-17-distributed-tracing -
Phase: Environment Isolation & Secret Boundaries
phase-pal-e-platform-env-isolation -
Phase: Incident Management SOP
phase-pal-e-platform-incident-mgmt -
Phase: Kustomize Service Bases
phase-pal-e-platform-kustomize -
Phase: Observability — Alerting + Deployment Protection
phase-observability-3-alerting -
Phase: Observability — Architecture Review
phase-observability-5-architecture -
Phase: Observability — First Service Dashboard
phase-observability-4-dashboard -
Phase: Observability — Project Page Foundation
phase-observability-1-project-page -
Phase: Observability — Telegram Alerting
phase-observability-3a-telegram-alerting -
Phase: Observability — Verify Baseline
phase-observability-2-verify-baseline -
Phase: Synthetic Monitoring + Uptime Checks
phase-pal-e-platform-14-synthetic-monitoring -
Phase: Vulnerability Scanning
phase-pal-e-platform-vuln-scanning -
Subphase 4a: Barman Cloud Plugin Migration
phase-postgres-4a-barman-plugin-migration
Board
-
Pal E Platform Board
board-pal-e-platform
Plan
-
Plan: DORA Metrics Dashboard
plan-2026-03-01-dora-metrics-dashboard -
Plan: MinIO Object Storage
plan-2026-02-24-minio-object-storage -
Plan: Platform Hardening
plan-pal-e-platform -
Plan: Platform Observability Foundation
plan-2026-02-25-platform-observability -
Plan: Salt Host Configuration Management
plan-2026-02-26-salt-host-management -
Plan: Shared Postgres (CloudNativePG)
plan-2026-02-26-tf-modularize-postgres
Project Page
-
Project: pal-e-platform
project-pal-e-platform
Review
-
Review (R2): P0 pal-e-services tf state drifted — prod postgres in blast radius
review-1064-2026-04-20-r2 -
Review R2: P1: validate sop-postgres-restore via dry-run drill (blocks #297)
review-1065-2026-04-21-r2 -
Review: #111 Fix Keycloak probe (1 alert)
review-190-2026-03-18 -
Review: Add argocd namespace to Forgejo network policy
review-447-2026-03-26 -
Review: Apply Terraform state drift (3 alerts)
review-192-2026-03-18 -
Review: Apply ruff standard to gmail-mcp (re-review)
review-640-v2-2026-03-28 -
Review: ArgoCD apps point to wrong source repos + external Forgejo URLs
review-452-2026-03-26 -
Review: ArgoCD repo-server memory bump (1 alert)
review-191-2026-03-18 -
Review: ArgoCD repo-server memory bump (1 alert)
review-item-191-2026-03-18 -
Review: ArgoCD repo_url :80 port mismatch — blocks tofu apply
review-460-2026-03-27 -
Review: Automate Gmail OAuth re-auth lifecycle (7-day token expiry)
review-359-2026-03-27 -
Review: Automate Gmail OAuth re-auth lifecycle (7-day token expiry) v2
review-359-2026-03-27-v2 -
Review: Automate Gmail OAuth re-auth lifecycle v3
review-359-2026-03-27-v3 -
Review: Automate Gmail OAuth re-auth lifecycle v4
review-359-2026-03-27-v4 -
Review: Bug: ArgoCD stale app
review-452-2026-03-27 -
Review: Bug: Blackbox probe TLS failure (expanded to 4 probes)
review-385-2026-03-26c -
Review: Bug: Blackbox probe TLS failure on pal-e-app funnel
review-385-2026-03-26 -
Review: Bug: Blackbox probe TLS failure on pal-e-app funnel (expanded to 4 probes)
review-385-2026-03-26b -
Review: Bug: contract signatures publicly exposed via MinIO CDN
review-415-2026-03-27 -
Review: Bug: dead westsidekingsandqueens-funnel ingress
review-443-2026-03-26 -
Review: Bug: image tag automation not firing -- manual deploys required
review-453-2026-03-26 -
Review: Bug: merge_approved_pr has no approval gate hook
review-363-2026-03-27 -
Review: Bug: pal-e-mail ServiceMonitor scraping nonexistent /metrics
review-386-2026-03-26 -
Review: Bug: pal-e-mail ServiceMonitor scraping nonexistent /metrics
review-386-2026-03-26-v2 -
Review: Bug: pal-e-mail ServiceMonitor scraping nonexistent /metrics
review-386-2026-03-26-v3 -
Review: Bug: platform-validation OOMKilled at 64Mi + stale alert rule
review-388-2026-03-26-v3 -
Review: Bug: platform-validation OOMKilled at 64Mi + stale alert rule
review-388-2026-03-26-v2 -
Review: Bug: platform-validation OOMKilled at 64Mi + stale alert rule (v2)
review-388-2026-03-26 -
Review: Bump agent parallel workflows 1 to 4
review-432-2026-03-26 -
Review: CI clone broken — Forgejo internal URL unreachable
review-221-2026-03-21 -
Review: CI clone broken — Forgejo internal URL unreachable (v2)
review-221-2026-03-21-v2 -
Review: CI pipeline targeted apply (depends on #197)
review-437-2026-03-27 -
Review: Clean up pal-e-playground repo
review-402-2026-03-27 -
Review: Critical: Migrate basketball-api Postgres to CNPG
review-417-2026-03-26 -
Review: Critical: Migrate basketball-api Postgres to CNPG (re-review v2)
review-417-2026-03-26-v2 -
Review: Critical: Migrate basketball-api Postgres to CNPG (v3)
review-417-2026-03-26-v3 -
Review: Critical: Re-establish orphaned CNPG cluster manifest
review-423-2026-03-26 -
Review: Critical: Re-establish orphaned CNPG cluster manifest (re-review)
review-423-2026-03-26-v2 -
Review: CronJob stale failures causing persistent KubeJobFailed alerts
review-387-2026-03-26 -
Review: Fix westside-app Harbor auth (4 alerts)
review-189-2026-03-18 -
Review: Fix westside-app Harbor auth (4 alerts)
review-189-2026-03-18-westside-harbor -
Review: Fix westside-app Harbor auth (4 alerts) — v2
review-189-2026-03-18-v2 -
Review: Harbor connectivity timeout from Woodpecker CI agent
review-411-2026-03-26 -
Review: Harbor unreachable from CI pods
review-254-2026-03-22 -
Review: Import Keycloak realms/clients into Terraform
review-277-2026-03-22 -
Review: Init container resource limits + busybox tag pinning
review-283-2026-03-27 -
Review: Kaniko HTTPS probe timeout — insecure-registry fix
review-428-2026-03-26 -
Review: Keycloak SMTP (phase note)
review-285-2026-03-27 -
Review: Keycloak realm config via Terraform provider
review-270-2026-03-27 -
Review: Landing site rename
review-450-2026-03-27 -
Review: Migrate all apps to pal-e-deployments
review-448-2026-03-26 -
Review: Migrate all apps to pal-e-deployments
review-448-2026-03-27 -
Review: MinIO network policy: allow pal-e-mail ingress
review-284-2026-03-22 -
Review: P0 pal-e-services tf state drifted — prod postgres in blast radius
review-1064-2026-04-20 -
Review: P1: validate sop-postgres-restore via dry-run drill (blocks #297)
review-1065-2026-04-21 -
Review: P2 off-cluster postgres backup destination (DR, not just resilience)
review-1066-2026-04-21 -
Review: Phase 29 SvelteKit Convention
review-286-2026-03-27 -
Review: Phase 30 Mac CI Agent
review-287-2026-03-27 -
Review: Python repo standards: ruff pre-commit hooks + repo setup template
review-55-2026-03-27 -
Review: Remove SendGrid dependency -- Gmail OAuth covers all email
review-222-2026-03-22 -
Review: Remove capacitor-dev (3 alerts)
review-194-2026-03-18 -
Review: Remove non-functional gRPC funnel
review-401-2026-03-27 -
Review: Remove palworld (1 alert)
review-193-2026-03-18 -
Review: Rollout: wire update-kustomize-tag into all 9 repos
review-464-2026-03-27 -
Review: Rollout: wire update-kustomize-tag into remaining repos (re-review)
review-464-2026-03-28 -
Review: Rotate Woodpecker API token in Salt pillar + consumers
review-333-2026-03-27 -
Review: Spike -- CI bootstrap resilience
review-231-2026-03-22 -
Review: Tailscale funnel bug
review-443-2026-03-27 -
Review: Terraform state splitting -- modularize main.tf
review-436-2026-03-26 -
Review: Validate pal-e-deployments (k8s API unreachable)
review-515-2026-03-27 -
Review: Validate pal-e-platform (3 merged + #222 pending)
review-512-2026-03-27 -
Review: Woodpecker agent label routing — pipeline contract
review-425-2026-03-26 -
Review: Woodpecker agent secret drift -- blocks safe apply
review-256-2026-03-22 -
Review: fix basketball-api network policy missing self + westside-contracts ingress
review-843-2026-04-03 -
Review: gmail-mcp: SSH-compatible gmail_reauth tool
review-361-2026-03-25 -
Review: nftables reload-after-tailscale
review-400-2026-03-27 -
Review: tofu apply blocked by MinIO provider refresh
review-435-2026-03-27 -
Review: tofu-state backup CronJob failures (2 alerts)
review-224-2026-03-22
Untyped
-
Review: pal-e-services PR #65 — westside-admin Harbor onboarding
review-pr-65-2026-04-25
Validation
-
Validation: Remove :80 from ArgoCD repo URLs (pal-e-services#36)
validation-36-2026-03-27 -
Validation: Remove dead westsidekingsandqueens-funnel ingress (pal-e-services#35)
validation-35-2026-03-27 -
Validation: Remove non-functional Woodpecker gRPC funnel
validation-182-2026-03-27
Repos
-
pal-e-deployments (active)
-
pal-e-platform (active)
-
pal-e-services (active)