LaneAward Operations

Time On Tasks & Operations Runbook

This is the quick-reference HTML runbook for the recurring processes we use to keep the Time On Tasks contributor app current in development and staging. It is intentionally separate from the chronological setup log so routine work is easy to find.

Contributor App: timeontasks
Console Draft: ops_console
Shared Backend: workforce_app/backend
Canonical DBFs: pm_database

Environment Parity Rule

Keep development, AWS staging, and the public runtime aligned on Python version, SQLite version, schema level, refreshed database content, backend code, frontend code, and service configuration. If any of these drift, reports or workflows can appear broken even when the application logic itself is correct.

Staging Is Production-Like — Not A Sandbox

AWS staging is a production-like environment. Real contributors (Will, Alex) actively use it, and its database holds live work sessions and user records that cannot be recovered without a backup restore.

Treat every push, rsync, or database operation targeting staging with the same care you would give a production system. Before any staging operation, ask: does this preserve existing users and sessions? If the answer is not an obvious yes, stop and verify first.

Source Code Is Tracked With Local Git

The repository at /Users/donaldscott/Project-Code/laneaward/repo/ is under local Git version control. Deploy scripts read from the current working tree, so the branch that is checked out at deploy time determines what gets pushed to staging. Before running any deploy, confirm the intended branch with git branch --show-current. Full background is in the Version Control And Source Management section of the Project Reference document.
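The branch check can also be scripted as a guard in front of a deploy. The following is a minimal sketch, not part of the existing deploy scripts: the helper names and the branch name "main" are assumptions for illustration, and only the git branch --show-current call comes from the text above.

```python
import subprocess


def current_branch(repo_dir: str) -> str:
    """Return the checked-out branch name (empty string on a detached HEAD)."""
    result = subprocess.run(
        ["git", "branch", "--show-current"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def branch_is_expected(current: str, expected: str) -> bool:
    """True only when a branch is checked out and it matches the intended one."""
    return bool(current) and current == expected


# Hypothetical usage before a deploy ("main" is an assumed branch name):
# if not branch_is_expected(current_branch("/path/to/repo"), "main"):
#     raise SystemExit("Wrong branch checked out; aborting deploy.")
```

An empty result is treated as a failure because a detached HEAD gives no branch name to confirm.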

1	Publish Contributor App To AWS Staging	Deploys the full Time On Tasks bundle to staging (index.html, manifest.webmanifest, app.js, sw.js, version.json, sounds/, user-guide.html, and user-guide-images/), injecting the "- Staging" label into the app title at deploy time. Always bump the version token in index.html when app.js changes: tablets cache by token URL and will serve stale code if the token is not updated.
1B	Publish Operations Console To AWS Staging	Deploys the Operations Console bundle (index.html, manifest.webmanifest, app.js, version.json) to the staging console web root, injecting the "- Staging" label into the app title at deploy time. Always bump the version token in index.html when app.js changes.
1C	Deploy Console Documents To AWS Staging	Deploys all four console documents (runbook, user guide, project reference, topology) to staging. No app code is touched, so it is safe to run any time a document changes.
2	Backup and Recovery	The PWA BDR service runs automatic backups of the staging database every 12 h and of the source code weekly. Also covers the manual pre-operation snapshot: verify that a backup exists before any risky operation.
2B	Restore Staging Database From Backup	Full guided restore procedure using restore_staging_db.sh. Auto-selects the most recent backup or accepts a specific file. Requires two typed confirmations, takes an automatic safety backup, stops the service, restores the database, and verifies health. The service is never left stopped: recovery is attempted on failure.
3	Staging Status Check	Runs the vm_workforce_status.sh helper for a quick read of service health, HTTP routes, and database row counts in one pass. Run before and after any significant staging operation to confirm clean state.
4	Activity Reset for Staging	Clears work sessions, tasks, and material usage from staging while preserving all users and ProfitMaker reference data. Caution: irreversible without a restore. An automatic backup is taken unless explicitly skipped with --no-backup.
6	Backend Deploy	Deploys server.py and schema.sql to staging and restarts the backend service. Use only when backend code or schema changed; do not run for frontend-only updates. Schema migrations apply automatically on restart.
7	Update SQLite From ProfitMaker Files	Two-step process: refresh the local SQLite from the latest ProfitMaker export, then push only the reference tables (customers, orders) to the live staging database. Runs on weekdays only by default; use FORCE_REFRESH=1 to run on a weekend. Never touches users, sessions, or activity data.
8	Reliability Verification For Task Writes	Documents the Phase 1 database reliability features (WAL, busy timeout, synchronous FULL, BEGIN IMMEDIATE) that must be preserved on every backend update. Includes a manual checklist for verifying write behavior under concurrent load.
9	Service Worker Deploy and Cache Invalidation	Explains the cache strategy (network-first for HTML, cache-first for versioned assets, network-only for API). Covers when sw.js must be updated and what to avoid to prevent stale content being served to tablets after a deploy.
10	Concurrent Stress Test	Simulates up to 20 concurrent tablet operators against staging. Run before go-live milestones and after significant backend changes. Zero hard failures required to pass; retryable busy responses are acceptable.
11	Server Access And Security Group Management	SSH connection procedures via Twingate (any location) or direct public IP (office only). Also covers updating AWS Security Group rules when authorized IP addresses change. Never remove all SSH rules simultaneously.

Canonical Routes

Contributor App	`/` = Time On Tasks
Console	https://staging.console.laneaward.com/ = combined Admin + Reports frontend
API	`/api/` = shared LaneAward backend
Legacy Route	`/workforce/` redirects to `/`

Process 1: Local Contributor-App Publish To AWS Staging

Use this after changing the contributor frontend in timeontasks. The staging host uses the contributor app at the root route.

Critical Rule — Bump The Token And Deploy Both Files Together

Any time app.js changes, you must also bump the version token in index.html before deploying. The service worker caches app.js by its token URL — if the token does not change, tablets will keep serving the old cached version indefinitely.

Bump the token in index.html: e.g. app.js?v=20260410-signin1 → app.js?v=20260410-signin2.
Deploy both index.html and app.js in the same rsync run.
By default rsync skips any file whose size and modification time both match the copy on the server, even if the content differs. If you are unsure whether app.js was picked up, add --checksum to force a content comparison: rsync -avh --checksum …
After deploy, a single normal page reload on the tablet is sufficient — no hard refresh needed. The SW fetches fresh index.html (network-first), sees the new token URL, misses the cache, and fetches the new app.js automatically.
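The token bump is a plain text substitution in index.html, so it can be scripted. The sketch below is illustrative only: bump_token and its regular expression are hypothetical helpers, not part of the deploy script, and they assume the token appears in the form app.js?v=TOKEN as in the example above.

```python
import re

# Matches "app.js?v=" followed by the token (letters, digits, dot, underscore, hyphen).
TOKEN_RE = re.compile(r"(app\.js\?v=)([A-Za-z0-9._-]+)")


def bump_token(html: str, new_token: str) -> str:
    """Replace every app.js?v=<token> reference and refuse a no-op bump."""
    match = TOKEN_RE.search(html)
    if match is None:
        raise ValueError("no app.js?v= token found in the HTML")
    if match.group(2) == new_token:
        raise ValueError("new token equals the current token; nothing would change")
    # A function replacement avoids backreference ambiguity with digit-leading tokens.
    return TOKEN_RE.sub(lambda m: m.group(1) + new_token, html)


page = '<script src="app.js?v=20260410-signin1"></script>'
bumped = bump_token(page, "20260410-signin2")
```

Rejecting an unchanged token guards against the failure mode described above, where tablets keep serving the cached app.js.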

Deploy: Run The Staging Deploy Script

Run from the repo root on the Mac. The script injects the staging label into index.html and manifest.webmanifest at deploy time, then pushes index.html, manifest.webmanifest, app.js, sw.js, version.json, sounds/, user-guide.html, and user-guide-images/ to the staging web root. No VM login required.

[Mac-local]

bash scripts/deploy_timeontasks.sh

Process 1B: Local Console Publish To AWS Staging

Use this after changing the Operations Console frontend in ops_console. The staging console is available at https://staging.console.laneaward.com/.

Critical Rule — Bump The Token And Deploy Both Files Together

Any time app.js changes, you must also bump the version token in index.html before deploying. The service worker caches app.js by its token URL — if the token does not change, browsers will keep serving the old cached version.

Bump the token in index.html: e.g. app.js?v=20260410-console1 → app.js?v=20260410-console2.
Deploy both index.html and app.js in the same rsync run.
By default rsync skips any file whose size and modification time both match the copy on the server, even if the content differs. If you are unsure whether app.js was picked up, add --checksum to force a content comparison.
After deploy, a normal page reload in the browser is sufficient — the SW fetches fresh index.html (network-first), sees the new token URL, and fetches the new app.js automatically.

Deploy: Run The Staging Console Deploy Script

Run from the repo root on the Mac. The script injects the staging label into index.html and manifest.webmanifest at deploy time, then pushes index.html, manifest.webmanifest, app.js, and version.json directly to the staging console web root. No VM login required.

[Mac-local]

bash scripts/deploy_console.sh

Process 1C: Deploy Console Documents To AWS Staging

Use this after updating any of the shared console documents: the staging runbook, user guide, project reference, or environment topology. This script deploys documents only — it does not touch app.js, index.html, or any application code.

Documents deployed by this script:

_documents/staging/runbook.html → runbook.html
_documents/user-guide.html → user-guide.html
_documents/project-reference.html → project-reference.html
_documents/laneaward_environment_topology.html → topology.html

Deploy: Run The Console Docs Deploy Script

[Mac-local]

bash scripts/deploy_console_docs.sh

Process 2: Backup and Recovery

Both LaneAward environments are protected by the 🚀 PWA BDR service — a shared macOS menu bar LaunchAgent on the development Mac. Nothing needs to be installed on the VM. Look for the 🚀 icon in the top menu bar to open the separate Lensboard and LaneAward dropdowns, trigger manual runs, adjust schedules, or mark jobs as Include or Skip.

Active Backup Jobs

Job	What it protects	Default schedule
`VM·STAGING — Database`	`/var/lib/laneaward-staging/workforce.db` on AWS	Every 12 h
`LOCAL — Source Code`	Full project on Mac (archives excluded), hardlink snapshots	Weekly

Backup Locations

Staging DB → ~/projectbackups/laneaward/staging-database/
Source snapshots → ~/projectbackups/laneaward/source/
Unified log → ~/projectbackups/backup_logs/backup.log

Each VM backup is a safe online copy — the service SSHs into the VM, runs sqlite3 .backup (no downtime, no locking), then scps the result to the Mac and cleans up the temp file. Every copy is independently restorable.
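For readers unfamiliar with online backups, the same SQLite backup mechanism is exposed in Python as Connection.backup. The sketch below runs against a throwaway temporary database, not the staging paths. It illustrates the technique only and is not the code the PWA BDR service runs.

```python
import os
import sqlite3
import tempfile


def online_backup(source_path: str, dest_path: str) -> None:
    """Copy a SQLite database with the online backup API while it stays in use."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)  # same backup API that the sqlite3 shell's .backup uses
    finally:
        dst.close()
        src.close()


# Demo with temporary files (assumed names, unrelated to workforce.db).
workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "live.db")
copy = os.path.join(workdir, "copy.db")

conn = sqlite3.connect(live)
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO demo (note) VALUES ('committed row')")
conn.commit()

online_backup(live, copy)  # taken while `conn` is still open on the source

restored = sqlite3.connect(copy)
rows = restored.execute("SELECT note FROM demo").fetchall()
restored.close()
conn.close()
```

The copy is a complete, standalone database file, which is why each backup can be restored independently.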

Restoring Staging

Use the dedicated restore script — see Process 2B. The script handles confirmation, safety backup, service stop/start, and health verification in a single guided run. Do not restore manually.

Manual VM-Side Snapshot (one-off, before a risky change)

[VM — Staging]

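sudo sqlite3 /var/lib/laneaward-staging/workforce.db ".backup '/var/lib/laneaward-staging/workforce-$(date +%F-%H%M%S).db'"

Use sqlite3 .backup rather than a plain cp of workforce.db. The database runs in WAL mode, so a file copy taken while the service is writing can miss changes that are still held in the -wal file.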

Process 2B: Restore Staging Database From Backup

Use this process when staging data must be recovered from a Mac-local backup. The script is fully guided — it shows you the backup and current database file sizes, requires two explicit typed confirmations, takes an automatic safety backup before making any changes, and verifies health after the restore completes.

This Permanently Destroys Staging Data After The Backup Timestamp

Staging holds live work sessions and user records from real contributors. Before running this script, confirm:

The correct backup file has been identified (check the timestamp carefully).
The data loss window is understood and accepted.
No alternative — such as Process 4 (Activity Reset) — would resolve the issue more safely.
Twingate is running and connected on your Mac.

What The Script Does

Shows the selected backup file and the current live database side by side
Lists the five most recent backups available for reference
Requires you to type RESTORE then YES — two separate gates
Uploads the backup to the server before stopping anything
Takes an automatic timestamped safety backup of the current staging database
Stops laneaward-workforce-api-staging.service
Copies the backup into place and sets correct ownership
Restarts the service and runs a live health check
Restarts the service even if the restore fails — server is never left stopped

Backup Files

Location: ~/projectbackups/laneaward/staging-database/
Pattern: workforce_vm_staging_YYYYMMDDTHHMMSS.db
Schedule: every 12 hours via PWA BDR service
Retention: 14 most recent copies kept
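The filename pattern makes it straightforward to sort backups and see which copies fall outside the retention window. This is a minimal sketch under the assumption that the timestamp in the filename is authoritative. The helper names are hypothetical, and the sketch does not describe how the PWA BDR service actually prunes files.

```python
import re
from datetime import datetime

BACKUP_RE = re.compile(r"^workforce_vm_staging_(\d{8}T\d{6})\.db$")
RETENTION = 14  # copies kept, per the retention note above


def backup_time(filename: str):
    """Parse the YYYYMMDDTHHMMSS stamp; return None for non-backup files."""
    match = BACKUP_RE.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


def newest_first(filenames):
    """Backup filenames sorted newest to oldest; unrelated files are ignored."""
    dated = [(backup_time(name), name) for name in filenames]
    return [name for stamp, name in sorted(
        (pair for pair in dated if pair[0] is not None), reverse=True)]


def beyond_retention(filenames, keep: int = RETENTION):
    """Backups older than the newest `keep` copies."""
    return newest_first(filenames)[keep:]
```

newest_first(...)[0] corresponds to the file the restore script would auto-select when run with no argument, assuming it picks by timestamp.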

[Mac-local] List available backups

ls -lht ~/projectbackups/laneaward/staging-database/

Step 1 — Confirm Twingate is running

The restore script connects via laneaward-vm → 172.31.7.224. The Twingate client must be active before proceeding. Verify SSH access first:

[Mac-local]
ssh -i ~/.ssh/lane_webserver.pem laneaward-vm "echo SSH OK"

Step 2 — Run the restore script

Run with no argument to auto-select the most recent backup, or pass a specific backup file path:

[Mac-local — most recent backup]
bash /Users/donaldscott/Project-Code/laneaward/repo/scripts/restore_staging_db.sh

[Mac-local — specific backup file]
bash /Users/donaldscott/Project-Code/laneaward/repo/scripts/restore_staging_db.sh ~/projectbackups/laneaward/staging-database/workforce_vm_staging_TIMESTAMP.db

A clean restore finishes with:

  Restore complete. Staging is live and healthy.

  Restored from  : ~/projectbackups/laneaward/staging-database/workforce_vm_staging_TIMESTAMP.db
  Safety backup  : /var/lib/laneaward-staging/workforce-pre-restore-TIMESTAMP.db

Step 3 — Verify and clean up

Confirm staging is responding correctly before deleting the safety backup:

[Mac-local]
curl -sS https://staging.timeontasks.laneaward.com/api/health

Once confirmed, delete the server-side safety backup:

[VM]
sudo rm /var/lib/laneaward-staging/workforce-pre-restore-TIMESTAMP.db

If The Health Check Does Not Return Ok After Restore

Check the service status: ssh laneaward-vm 'sudo systemctl status laneaward-workforce-api-staging.service --no-pager'
Check the service log: ssh laneaward-vm 'sudo journalctl -u laneaward-workforce-api-staging.service -n 50 --no-pager'
The safety backup is on the server at /var/lib/laneaward-staging/workforce-pre-restore-TIMESTAMP.db — it can be used to roll back by running the restore script again with that file as the argument.

Process 3: Staging Status Check

Use the VM status helper when you want one quick read on service health, HTTP routes, and database row counts.

vm_workforce_status.sh

[VM]

SERVICE_NAME=laneaward-workforce-api-staging.service \
DB_DIR=/var/lib/laneaward-staging \
DIRECT_HEALTH_URL=http://127.0.0.1:9193/api/health \
NGINX_HEALTH_URL=https://staging.timeontasks.laneaward.com/api/health \
ROOT_URL=https://staging.timeontasks.laneaward.com/ \
CONSOLE_URL=https://staging.console.laneaward.com/ \
LEGACY_WORKFORCE_URL=https://staging.timeontasks.laneaward.com/workforce/ \
/opt/laneaward-staging/workforce_app/deploy/vm_workforce_status.sh

Process 4: Activity Reset for Staging

This is the verified safe reset for staging activity only. It preserves users, roles, teams, customer references, sales orders, and ProfitMaker import metadata.

vm_workforce_reset_activity.sh

[VM]

sudo env \
  SERVICE_NAME=laneaward-workforce-api-staging.service \
  LIVE_DB=/var/lib/laneaward-staging/workforce.db \
  DIRECT_HEALTH_URL=http://127.0.0.1:9193/api/health \
  NGINX_HEALTH_URL=https://staging.timeontasks.laneaward.com/api/health \
  STATUS_SCRIPT=/opt/laneaward-staging/workforce_app/deploy/vm_workforce_status.sh \
  /opt/laneaward-staging/workforce_app/deploy/vm_workforce_reset_activity.sh --yes

Deletes rows from order_task, work_session, and material_usage, and also clears work_session_correction_audit through the work_session delete cascade.
Preserves app_user and current PINs.
Backs up the live database unless --no-backup is used.

This does not remove seeded users. User cleanup still needs either the admin tool or a dedicated user-maintenance process.

Process 6: Backend Deploy (server.py or schema.sql changed)

Use this when backend code or schema changed and you need to push the update to staging without replacing the SQLite database. The script uploads server.py and schema.sql, promotes them to /opt/laneaward-staging/workforce_app/backend/, restarts the service, and verifies health. Schema migrations (new columns, indexes) are applied automatically by ensure_schema_upgrades() on startup — no manual SQL required.

[Mac-local]

bash /Users/donaldscott/Project-Code/laneaward/repo/scripts/deploy_backend_to_staging.sh

A clean run finishes with the service showing active and a live health response.

Service restart only (no file changes)

If you only need to restart the service without uploading new files:

vm_workforce_cutover.sh

[VM]

sudo env \
  SERVICE_NAME=laneaward-workforce-api-staging.service \
  DB_DIR=/var/lib/laneaward-staging \
  DIRECT_HEALTH_URL=http://127.0.0.1:9193/api/health \
  NGINX_HEALTH_URL=https://staging.timeontasks.laneaward.com/api/health \
  ROOT_URL=https://staging.timeontasks.laneaward.com/ \
  CONSOLE_URL=https://staging.console.laneaward.com/ \
  LEGACY_WORKFORCE_URL=https://staging.timeontasks.laneaward.com/workforce/ \
  STATUS_SCRIPT=/opt/laneaward-staging/workforce_app/deploy/vm_workforce_status.sh \
  /opt/laneaward-staging/workforce_app/deploy/vm_workforce_cutover.sh --restart-only

Process 7: Update The SQLite Database From New ProfitMaker Files

Use this section when a new asidta_file_* folder arrives from production and you need to refresh the shared SQLite database with the latest:

customer numbers
customer names
order numbers
order descriptions

Critical Rule — Never Replace The Staging Database File For A Reference Refresh

The old process (steps 2–4) copied the local workforce.db to staging and replaced the entire file. This permanently destroys any users, sessions, or activity that were added directly on staging and do not exist in the local development database. Do not use that approach for reference data updates. The two-step process below updates only the reference tables (customer_account, sales_order, profitmaker_import_manifest) and never touches app_user, work_session, order_task, or any other operational table.

Primary Refresh Script

refresh_workforce_reference_snapshot.sh

Auto-detects the newest asidta_file_* folder if no path is passed.
Promotes only changed DBF, FPT, and CDX files into pm_database.
Ignores junk duplicates like Copy of ....
Refreshes workforce.db with current customers and the rolling order reference window.

Staging Push Script

push_reference_to_staging.sh

Detects the reference window from the current pm_database automatically.
Copies only the six DBF files the importer reads to the VM via scp.
Runs import_profitmaker_reference.py directly on the staging VM against the live database.
Uses the manifest to skip unchanged data — safe to run any time, even if nothing changed.
Runs on weekdays only by default. Use FORCE_REFRESH=1 to override on a weekend.
Never touches users, sessions, tasks, or any activity table.
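The weekday rule with its FORCE_REFRESH=1 override amounts to a small gate. The sketch below models that decision in Python for clarity. The real check lives in push_reference_to_staging.sh, and the function name here is an assumption.

```python
from datetime import date


def should_run(today: date, force_refresh: str = "") -> bool:
    """Weekdays run by default; FORCE_REFRESH=1 overrides the weekend skip."""
    is_weekday = today.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday or force_refresh == "1"


# In a script, the override would typically come from the environment, e.g.:
# import os; should_run(date.today(), os.environ.get("FORCE_REFRESH", ""))
```

Only the exact value "1" is treated as an override here, matching how the flag is written in this runbook.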

Core Importer

import_profitmaker_reference.py

Reads the canonical DBF set from pm_database.
Updates only reference tables in the shared SQLite database.
Uses profitmaker_import_manifest to skip unchanged customer or order groups.
Prune logic never removes orders that have active tasks or sessions attached.
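One common way to implement a manifest-based skip is to store a fingerprint per data group and re-import only when it changes. The sketch below illustrates that general idea. It is an assumption about the approach, not a description of how import_profitmaker_reference.py or profitmaker_import_manifest actually work. The group names and sample rows are invented.

```python
import hashlib


def fingerprint(rows) -> str:
    """Stable hash of a group of reference rows, independent of row order."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def groups_to_import(manifest: dict, incoming: dict) -> list:
    """Names of groups whose fingerprint differs from the stored manifest."""
    return [name for name, rows in incoming.items()
            if manifest.get(name) != fingerprint(rows)]


# Invented sample data: customers are unchanged, orders have never been imported.
customers = [("C001", "Sample Customer")]
orders = [("107923", "Sample order description")]
manifest = {"customers": fingerprint(customers)}
changed = groups_to_import(manifest, {"customers": customers, "orders": orders})
```

With a scheme like this, a repeat run over unchanged data finds nothing to import, which is consistent with the push being safe to run at any time.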

Step 1: Refresh The Local SQLite Database

Run the refresh script locally. This promotes any changed ProfitMaker files into pm_database and updates the local workforce.db with current customers and the rolling order reference window.

[Mac-local]

/Users/donaldscott/Project-Code/laneaward/repo/scripts/refresh_workforce_reference_snapshot.sh

Step 2: Push The Reference Data To Staging

Run the staging push script. This copies only the six required DBF files to the VM and runs the importer directly against the live staging database. No file swap, no service restart, no data loss.

[Mac-local]

/Users/donaldscott/Project-Code/laneaward/repo/scripts/push_reference_to_staging.sh

The script prints the reference window, confirms each file transfer, shows the importer output, cleans up temp files, and finishes with a live health check. If the data has not changed since the last run, the importer reports skipped (unchanged since last successful import), which is the correct and expected behavior.

Step 3: Verify The Updated Data

Confirm that the API is healthy and that current order search results reflect the latest imported ProfitMaker data.

[Mac-local or VM]

curl -sS https://staging.timeontasks.laneaward.com/api/health

[Mac-local or VM]

curl -sS "https://staging.timeontasks.laneaward.com/api/orders/search?q=107923&limit=3"

Process 8: Reliability Verification For Task Writes

Phase 1 durability hardening is part of the backend and must be preserved whenever the API is updated. The Time On Tasks API opens SQLite in Write-Ahead Logging (WAL) mode, waits up to 10 seconds on lock contention, uses synchronous = FULL for safer commits, and wraps each mutating route in a short BEGIN IMMEDIATE write transaction.

Connection Settings

PRAGMA journal_mode = WAL, which enables Write-Ahead Logging
PRAGMA busy_timeout = 10000, which gives SQLite up to 10,000 milliseconds to wait on a short lock
PRAGMA synchronous = FULL, which favors safer disk writes over speed
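As a reference when reviewing backend changes, the three settings plus a BEGIN IMMEDIATE write can be sketched with Python's sqlite3 module as follows. This is a minimal illustration against a temporary database. The function and table names are assumptions and do not reproduce server.py.

```python
import os
import sqlite3
import tempfile


def open_connection(db_path: str) -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA busy_timeout = 10000")
    conn.execute("PRAGMA synchronous = FULL")
    return conn


def write_in_transaction(conn: sqlite3.Connection, sql: str, params=()) -> None:
    """Take the write lock up front, then commit or roll back as a unit."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(sql, params)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = open_connection(db_path)
conn.execute("CREATE TABLE demo_task (id INTEGER PRIMARY KEY, label TEXT)")
write_in_transaction(conn, "INSERT INTO demo_task (label) VALUES (?)", ("sample",))

# Read the settings back; synchronous reports FULL as the integer 2.
settings = {
    "journal_mode": conn.execute("PRAGMA journal_mode").fetchone()[0],
    "busy_timeout": conn.execute("PRAGMA busy_timeout").fetchone()[0],
    "synchronous": conn.execute("PRAGMA synchronous").fetchone()[0],
}
```

Reading the pragmas back after a backend update is a quick way to confirm that none of the settings was dropped.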

Protected Writes

add task
start, pause, complete, cancel
material usage logging
admin user maintenance routes

Contention Response

lock contention now returns HTTP 503 Service Unavailable
response includes retryable: true
treat this as a transient retry condition, not a data-loss event
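The contention response can be pictured as a small mapping from SQLite's busy/locked error to a retryable 503. The sketch below reproduces a lock collision on a temporary database. The busy_response helper and the "error" field are hypothetical, and only the 503 status and retryable: true come from the list above.

```python
import os
import sqlite3
import tempfile


def busy_response(exc: sqlite3.OperationalError):
    """Map SQLite busy/locked errors to a retryable 503; re-raise anything else."""
    message = str(exc).lower()
    if "locked" in message or "busy" in message:
        return 503, {"error": "database busy", "retryable": True}
    raise exc


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
holder = sqlite3.connect(db_path, isolation_level=None)
holder.execute("PRAGMA journal_mode = WAL")
holder.execute("CREATE TABLE demo_task (id INTEGER PRIMARY KEY)")
holder.execute("BEGIN IMMEDIATE")  # first writer holds the write lock

# timeout=0 makes the second writer give up at once instead of waiting.
blocked = sqlite3.connect(db_path, timeout=0, isolation_level=None)
try:
    blocked.execute("BEGIN IMMEDIATE")
    status, body = 200, {}
except sqlite3.OperationalError as exc:
    status, body = busy_response(exc)
finally:
    holder.execute("ROLLBACK")
    blocked.close()
    holder.close()
```

On staging the second writer would wait up to the 10-second busy_timeout before this path is reached. The zero timeout here only keeps the demonstration fast.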

Iteration test checklist:

Open two or more contributor-capable sessions on staging.
Rapidly add, start, pause, complete, and cancel tasks from separate browsers or tablets.
Confirm each UI action returns to a stable task state without duplicate sessions or duplicate task inserts.
If a write is delayed by contention, confirm the API reports a retryable busy condition instead of a generic server error.
After the test pass, verify the affected task lists and completed records still match the contributors who performed the actions.

Current Reliability Status

The current backend reliability work is best understood as safe but incomplete, not partially conflicting. The first five database-reliability features are fully implemented and should remain in place together:

Write-Ahead Logging (WAL)
busy_timeout = 10000
synchronous = FULL
short BEGIN IMMEDIATE write transactions
retryable HTTP 503 responses for SQLite busy/locked contention

In practical terms, these five changes make the shared SQLite backend safer under short write collisions, safer during commit, and clearer when contention happens. They do not depend on the unfinished Phase 2 client work in order to remain valid.

Phase 2 (client-side retry/backoff, temporary local storage, idempotent write keys) was evaluated and deferred. A concurrent stress test at 2× the expected user load passed cleanly with significant headroom — Phase 1 alone is sufficient at current scale. Phase 2 should be reconsidered only if load grows significantly beyond current projections.

Process 9: Service Worker — Deploy Model And Cache Invalidation

Time On Tasks includes a service worker at timeontasks/sw.js that improves load speed on shared tablets by caching static assets locally after the first visit. Understanding the cache strategy is important before deploying frontend changes.

Cache Strategy Summary

Request type	Strategy	Why
HTML documents (`index.html`, `user-guide.html`)	Network-first	Always fetches fresh HTML so deployed updates are visible on the next page load without any SW changes.
Versioned static assets (`app.js?v=…`, icons, manifest)	Cache-first	Version token in the URL acts as the cache key. New token = new URL = automatic cache miss = fresh fetch.
`/api/*` and all non-GET requests	Network-only	Task writes, session state, and PIN login must never be served from cache.
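The table reads as a three-way decision, which the Python model below makes explicit. It is a sketch for reasoning about which strategy a request falls under, not the code in sw.js. Treating "/" as an HTML document is an assumption.

```python
from urllib.parse import urlsplit

# HTML documents named in the table; "/" is assumed to serve index.html.
HTML_DOCUMENTS = {"/", "/index.html", "/user-guide.html"}


def cache_strategy(method: str, url: str) -> str:
    """Return the strategy the table assigns to a request."""
    path = urlsplit(url).path
    if method.upper() != "GET" or path.startswith("/api/"):
        return "network-only"   # writes and API calls never come from cache
    if path in HTML_DOCUMENTS:
        return "network-first"  # HTML is always fetched fresh when online
    return "cache-first"        # versioned assets: the token URL is the cache key
```

The API and non-GET check comes first so that no request carrying task or session state can fall through to a cached response.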

Deploying A Routine Frontend Update (app.js or assets)

No changes to sw.js are required. The version token does the work.

Make your changes to app.js or other assets.
Bump the version token in index.html (e.g. app.js?v=20260406-foreman1 → app.js?v=20260411-myfix1).
Deploy index.html and the updated asset file.
On next page load: SW fetches fresh index.html (network-first), browser sees the new token URL, cache misses, fetches new asset, caches it. Done.

When You DO Need To Update sw.js

Adding a new file type or path pattern that needs different caching behavior.
Changing the SW strategy itself (e.g. switching an asset from cache-first to network-first).
Forcing a full cache wipe on all tablets — bump CACHE_VERSION in sw.js. The new SW will delete all prior caches on activate.

When sw.js changes, the browser detects the update automatically (byte-for-byte comparison on every page load). The new SW installs in the background, then activates and claims all open tabs immediately via skipWaiting and clients.claim.

What To Avoid

Do not reuse a version token after changing the underlying file. The cache will serve the old content forever until the token changes.
Do not deploy only the asset without updating its token in index.html. The old token URL stays in cache and will be served.
Do not assume rsync transferred app.js. By default rsync skips files whose size and modification time both match the server copy, even when the content changed. Always verify that the deploy output lists app.js as transferred, or add --checksum to force a content comparison.
Do not add index.html to a cache-first rule. HTML must always be network-first or the stale-app-shell problem returns.
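The rsync pitfall is easier to see with a concrete case. By default rsync compares size and modification time, and only --checksum compares content. The sketch below models both comparisons on two temporary files. The function names are hypothetical and SHA-256 stands in for rsync's own checksum algorithm.

```python
import hashlib
import os
import tempfile


def quick_check_same(a: str, b: str) -> bool:
    """Model of rsync's default quick check: same size and same mtime."""
    stat_a, stat_b = os.stat(a), os.stat(b)
    return (stat_a.st_size == stat_b.st_size
            and int(stat_a.st_mtime) == int(stat_b.st_mtime))


def checksum_same(a: str, b: str) -> bool:
    """Content comparison, analogous to what --checksum forces."""
    def digest(path: str) -> str:
        with open(path, "rb") as handle:
            return hashlib.sha256(handle.read()).hexdigest()
    return digest(a) == digest(b)


workdir = tempfile.mkdtemp()
local = os.path.join(workdir, "app.local.js")
remote = os.path.join(workdir, "app.remote.js")
with open(local, "w") as handle:
    handle.write("const LIMIT = 10;\n")
with open(remote, "w") as handle:
    handle.write("const LIMIT = 20;\n")  # same length, different content

# Force identical modification times, as can happen after a restore or checkout.
os.utime(local, (1_700_000_000, 1_700_000_000))
os.utime(remote, (1_700_000_000, 1_700_000_000))

skipped_by_quick_check = quick_check_same(local, remote)
identical_content = checksum_same(local, remote)
```

Here the quick check treats the two files as identical while their content differs, which is the case where a changed app.js is not transferred.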

Process 10: Concurrent Stress Test

Run this test against staging to confirm that the Phase 1 reliability improvements (WAL, busy_timeout, synchronous = FULL, BEGIN IMMEDIATE) hold up under the expected concurrent load of up to 20 simultaneous tablet operators on the shop floor. This test was completed before go-live and passed cleanly. The procedure is preserved here as a reference for future validation runs (e.g. after significant backend changes or scale increases).

The test script is at workforce_app/backend/stress_test_concurrent.py and runs from your local Mac. It seeds its own test fixtures into the staging database via SSH, runs the load, then cleans up after itself.

What It Tests

Up to 20 simulated tablet users running simultaneously
Full lifecycle per user: login → add task → start → pause → resume → complete → read
Human-paced by default — 1.5–4 s between each action per user, matching real operator speed
All users start their shift together so the server handles concurrent sessions throughout
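The size of a run follows directly from the lifecycle above. The arithmetic below assumes each lifecycle step counts as one operation per cycle, which matches the Total ops: 280 figure in the sample report for 20 users and 2 iterations. The names are illustrative and not taken from the test script.

```python
# One entry per step in the lifecycle listed above.
LIFECYCLE = ["login", "add task", "start", "pause", "resume", "complete", "read"]


def total_operations(users: int, iterations: int) -> int:
    """Operations in a run: users x task cycles x steps per cycle."""
    return users * iterations * len(LIFECYCLE)


standard_run = total_operations(users=20, iterations=2)
```

The same arithmetic gives a quick sanity check on the report when --users or --iterations is changed.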

What To Look For

Hard failures: must be zero. These are non-retryable errors or request timeouts.
Retryable busy responses: acceptable as long as hard failures are also zero. They indicate that write contention was handled safely and reported as retryable. A non-zero count here opens the Phase 2 client-side retry work.
p95 latency: under 800 ms is good tablet UX, 800 ms–2 s is acceptable, and above 2 s warrants a review of server resources.
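The thresholds above can be applied mechanically to a list of latencies. The sketch below uses a nearest-rank percentile and verdict strings chosen for illustration. It is not the reporting logic in stress_test_concurrent.py, which may compute percentiles differently.

```python
def percentile(values, pct: int):
    """Nearest-rank percentile of a non-empty list (pct as an integer 1-100)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def verdict(hard_failures: int, p95_ms: float) -> str:
    """Apply the pass criteria: zero hard failures first, then latency bands."""
    if hard_failures > 0:
        return "FAIL"
    if p95_ms < 800:
        return "PASS: good tablet UX"
    if p95_ms <= 2000:
        return "PASS: acceptable latency"
    return "PASS: review server resources"
```

Hard failures are checked first because no latency result can make up for a failed write.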

Prerequisites

SSH key access to ubuntu@3.130.69.109 with no passphrase prompt (BatchMode).
Staging server running and reachable at https://staging.timeontasks.laneaward.com.
Python 3.10 or later on your local Mac — no third-party packages required.
Run from the repo root: cd ~/Project-Code/laneaward/repo

Step 1 — Verify SSH access before starting

The test seeds and cleans up via SSH. Confirm it works first:

ssh -i ~/.ssh/lane_webserver.pem ubuntu@3.130.69.109 echo "SSH OK"

You should see SSH OK with no password prompt. If you see Permission denied (publickey), the key path is wrong or the key is not present — check that ~/.ssh/lane_webserver.pem exists on your Mac.

Step 2 — Run the standard realistic test

This is the test to run before go-live. It simulates 20 human-paced operators for 2 task cycles each. Expected wall time is roughly 20–40 seconds.

python3 workforce_app/backend/stress_test_concurrent.py

The script will print its progress as it seeds, runs, and cleans up:

[seed]  Seeding 20 test user(s) on ubuntu@3.130.69.109 ...
[check] Verifying https://staging.timeontasks.laneaward.com/api/health ...
[run]   Releasing 20 threads simultaneously ...
[cleanup] Removing stress-test fixtures from remote DB ...

Step 3 — Read the report

At the end of the run, the script prints a report. A clean pass looks like this:

====================================================================
  LANEAWARD TIME-ON-TASKS  —  CONCURRENT STRESS TEST
====================================================================
  Mode:             REALISTIC (human-paced 1.5–4.0 s)
  Target:           https://staging.timeontasks.laneaward.com
  Concurrent users: 20
  Iterations/user:  2
  Total ops:        280
  Wall time:        31.4s

  Outcomes:
    Successes:        280  (100%)
    Hard failures:      0  (non-retryable errors or timeouts)
    Retryable busy:     0  (SQLite busy-wait — server queued OK)

  End-to-end latency (successful ops only):
    p50:    210 ms
    p75:    310 ms
    p95:    480 ms
    max:    740 ms

  VERDICT
  ------------------------------------------------------------------
  PASS  All operations completed cleanly under concurrent load.
        WAL + BEGIN IMMEDIATE handled 20-user contention with no busy errors.

  Latency looks good for tablet UX (p95 = 480 ms).

Step 4 — Optional: run the burst ceiling test

The burst test fires all writes near-simultaneously with no human delay. This scenario cannot occur with real operators, but it validates the SQLite busy-timeout safety net as an absolute ceiling. Run it after the realistic test passes.

python3 workforce_app/backend/stress_test_concurrent.py --burst

Expect more retryable busy responses in this mode — that is normal and expected. The important thing is still zero hard failures.

Adjusting the test

Common Options

--users N — number of concurrent users, 1–20 (default: 20)
--iterations N — task cycles per user (default: 2)
--burst — near-simultaneous writes, ceiling test only
--no-seed — skip seeding (test users already in DB from prior run)
--no-cleanup — leave test fixtures in DB for inspection
--local — spawn a local server with a temp DB (no SSH, for dev use)

Scaling Scenarios

Adding more people is the expected growth path — increase --users to match headcount before each go-live milestone
Increasing --iterations simulates a longer shift with more task cycles per person
Run both realistic and burst at each user count milestone to see where the ceiling is

If the test fails

Hard failures (non-zero)

Check the error detail in the report — it includes which user and which operation failed
A request timeout (25 s) means a write waited longer than SQLite's 10 s busy_timeout plus network time. This is very unlikely at human pace and indicates a server resource issue.
An HTTP 500 error means an unhandled exception on the server — check sudo journalctl -u laneaward-workforce-api-staging -n 100

High retryable busy count

In realistic mode this should be zero — if non-zero, writes are occasionally colliding despite human pacing
In burst mode this is expected and does not indicate a problem unless it accompanies hard failures
A persistent non-zero count in realistic mode opens Phase 2 work: client-side retry/backoff on HTTP 503

High p95 latency

p95 above 2 s on individual requests: check AWS instance CPU and disk I/O during the test run
High latency on login only may indicate the pin-throttle table is accumulating rows — run Process 4 (Activity Reset) to clear it on staging
Uniformly high latency across all ops usually means network, not SQLite — try ping staging.timeontasks.laneaward.com from the same machine

Test Fixture Identifiers

All stress test data is prefixed with stress_test_ so it is unambiguous in the database. If a run is interrupted before cleanup, remove the leftovers manually:

[Mac-local]

bash scripts/cleanup_stress_test_staging.sh

Process 11: Server Access And Security Group Management

The server is protected by two layers: an AWS Security Group that restricts SSH (port 22) to authorized IP addresses, and a Twingate connector that allows SSH from any location through the Twingate client. Use this process to connect to the server and to update security group rules when IP addresses change. This process applies to both the staging and production environments since both run on the same AWS server.

Security Group Reference

Group ID	`sg-0cc9719fa0e029c40` (launch-wizard-1)
COX Fiber — office	`98.175.1.150/32` · SSH allowed
COX Cable failover — office	`72.215.199.214/32` · SSH allowed
Home lab (pending removal)	`72.208.129.218/32` · SSH allowed
HTTP / HTTPS	Open to all — `0.0.0.0/0`
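As a quick way to reason about the table above, the sketch below checks whether a given source IP falls inside one of the listed SSH rules. The CIDRs are copied from this reference; the function name and the idea of keeping the list in a script are assumptions, and the live security group remains the source of truth.

```python
import ipaddress
from typing import Optional

# SSH rules as listed in the Security Group Reference above.
SSH_ALLOWED_CIDRS = {
    "COX Fiber (office)": "98.175.1.150/32",
    "COX Cable failover (office)": "72.215.199.214/32",
    "Home lab (pending removal)": "72.208.129.218/32",
}


def matching_ssh_rule(source_ip: str) -> Optional[str]:
    """Return the label of the first SSH rule covering source_ip, or None."""
    addr = ipaddress.ip_address(source_ip)
    for label, cidr in SSH_ALLOWED_CIDRS.items():
        if addr in ipaddress.ip_network(cidr):
            return label
    return None


print(matching_ssh_rule("98.175.1.150"))   # an office address from the table
print(matching_ssh_rule("203.0.113.7"))    # a documentation-range address, not listed
```

If the result is `None` for your current public IP, direct SSH to the public address will be blocked and Twingate (Option A) is the route to use.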

Twingate Reference

Remote Network	Lane Award PWA Server
Connector	`eggplant-okapi`
Resource address	`172.31.7.224` (server private IP)
SSH key	`~/.ssh/lane_webserver.pem`

Option A — SSH via Twingate (from any location)

Use this method when connecting from home or any location not on an authorized static IP. The Twingate client must be running and connected before opening SSH.

Step 1 — Open the Twingate client on your Mac and confirm it shows Connected

The Twingate icon lives in the Mac menu bar. Click it and verify the connection status is active.

Step 2 — SSH using the server private IP

[Mac-local]
ssh -i ~/.ssh/lane_webserver.pem ubuntu@172.31.7.224

Option B — SSH directly from an authorized office IP

Use this method when connecting from the office on either the fiber or cable connection. Twingate does not need to be running.

Step 1 — SSH using the server public IP

[Mac-local]
ssh -i ~/.ssh/lane_webserver.pem ubuntu@3.130.69.109

If Twingate Is Running While SSHing via Public IP

When the Twingate client is active, it intercepts connections to the server's public IP and routes them through the connector. Pause Twingate before using Option B, or keep Twingate active and use the private IP (172.31.7.224) instead.

Check Twingate Connector Status

Run this on the server to confirm the connector service is running. A healthy connector shows State: Online in the log output.

[VM]
sudo systemctl status twingate-connector --no-pager

Update Security Group — Replace a Changed IP

Run these steps when an authorized IP address changes. Requires the AWS CLI configured on the development Mac with IAM user donald.

Step 1 — Verify current rules before making changes

[Mac-local]
aws ec2 describe-security-groups --group-ids sg-0cc9719fa0e029c40 --query "SecurityGroups[0].IpPermissions" --output json --no-cli-pager

Step 2 — Remove the old IP

[Mac-local]
aws ec2 revoke-security-group-ingress --group-id sg-0cc9719fa0e029c40 --protocol tcp --port 22 --cidr OLD.IP.ADDRESS/32

Step 3 — Add the new IP

[Mac-local]
aws ec2 authorize-security-group-ingress --group-id sg-0cc9719fa0e029c40 --protocol tcp --port 22 --cidr NEW.IP.ADDRESS/32

Step 4 — Verify the updated rules

[Mac-local]
aws ec2 describe-security-groups --group-ids sg-0cc9719fa0e029c40 --query "SecurityGroups[0].IpPermissions" --output json --no-cli-pager

Never Lock Yourself Out

Do not remove the IP you are currently connected from unless you have first confirmed that Twingate SSH works or another authorized IP is still in place. If all SSH access is lost, recovery requires the AWS Console. Never remove all three SSH rules at once.
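The replace-an-IP steps and the lockout rule can be combined into a small planning helper. The sketch below only builds the AWS CLI argument lists from Steps 2 and 3 and refuses to plan a change that would leave no SSH rule in place after the revoke; it does not run anything. The function names are hypothetical, and the check does not know about Twingate, so treat it as a conservative guard rather than a full safety net.

```python
import ipaddress

GROUP_ID = "sg-0cc9719fa0e029c40"


def ssh_rule_command(action: str, cidr: str) -> list[str]:
    """Build the aws ec2 argument list for revoking or authorizing an SSH rule."""
    if action not in ("revoke", "authorize"):
        raise ValueError(f"unsupported action: {action}")
    ipaddress.ip_network(cidr)  # raises ValueError if the CIDR is malformed
    return [
        "aws", "ec2", f"{action}-security-group-ingress",
        "--group-id", GROUP_ID,
        "--protocol", "tcp", "--port", "22",
        "--cidr", cidr,
    ]


def plan_ip_replacement(
    current_cidrs: list[str], old_cidr: str, new_cidr: str
) -> list[list[str]]:
    """Return the revoke and authorize commands, in the order used by this runbook."""
    if old_cidr not in current_cidrs:
        raise ValueError(f"{old_cidr} is not in the current SSH rules")
    remaining = [c for c in current_cidrs if c != old_cidr]
    if not remaining:
        raise ValueError("revoking this rule would leave no SSH rule in place")
    return [ssh_rule_command("revoke", old_cidr), ssh_rule_command("authorize", new_cidr)]


current = ["98.175.1.150/32", "72.215.199.214/32", "72.208.129.218/32"]
# 203.0.113.7 is a documentation-range placeholder for the new address.
for argv in plan_ip_replacement(current, "72.208.129.218/32", "203.0.113.7/32"):
    print(" ".join(argv))
```

Printing the commands for review before pasting them into the terminal keeps a human in the loop, which fits the verify-first steps of this process.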

Remove the Home Lab IP (Pending Task)

Once Twingate is confirmed as the primary home access method, run this to remove the dynamic home lab IP. Do not run this until Twingate SSH has been verified working from the home location.

[Mac-local]
aws ec2 revoke-security-group-ingress --group-id sg-0cc9719fa0e029c40 --protocol tcp --port 22 --cidr 72.208.129.218/32

Salary Labor Cost — Calculation Methodology

Salaried contributors log time through the Time On Tasks app exactly like hourly workers. The Order Profitability report converts their annual salary to an effective hourly rate using the U.S. Bureau of Labor Statistics standard:

Formula

Effective Hourly Rate = Annual Salary ÷ 2,080

Session Labor Cost = (Annual Salary ÷ 2,080) × (Session Minutes ÷ 60)

2,080 = 52 weeks × 40 hours — the standard used by ADP, Paychex, QuickBooks, and the BLS.
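The two formulas above translate directly into code. This is a minimal sketch for checking figures by hand; the function names are illustrative and are not taken from the report's implementation.

```python
STANDARD_ANNUAL_HOURS = 52 * 40  # 2,080 hours: 52 weeks x 40 hours


def effective_hourly_rate(annual_salary: float) -> float:
    """Effective Hourly Rate = Annual Salary / 2,080."""
    return annual_salary / STANDARD_ANNUAL_HOURS


def session_labor_cost(annual_salary: float, session_minutes: float) -> float:
    """Session Labor Cost = (Annual Salary / 2,080) x (Session Minutes / 60)."""
    return effective_hourly_rate(annual_salary) * (session_minutes / 60)


# Worked example: a $52,000/yr salary and a 90-minute session.
print(effective_hourly_rate(52_000))    # 25.0 dollars per hour
print(session_labor_cost(52_000, 90))   # 37.5 dollars for the session
```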

What Is Included

Base wage equivalent only — the annual salary stored in the contributor's profile.
Applies to all roles where compensation is tracked (Contributor, Foreman, Team Lead).
Salary workers appear in the same labor cost columns as hourly workers on the Order Profitability report.

Not Yet Included — Open for Future Discussion

Burden rate — payroll taxes (FICA ~7.65%), benefits, PTO, workers' comp, and overhead typically add 25–35% on top of base wage. Example: a $52,000/yr salary worker costs about $31–$34/hr fully burdened vs. $25/hr at base. Applying a burden multiplier (e.g. 1.30×, or $32.50/hr in this example) would give a more accurate true cost per order.
Blended rate approach — some operations use a single blended rate for all salaried staff in a role tier rather than individual salaries, simplifying the calculation at the cost of per-person accuracy.
Part-time salary proration — the 2,080 divisor assumes full-time (40 hr/week). Part-time salaried workers would need a different annual-hours denominator.
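To support that future discussion, the sketch below shows how a burden multiplier and a part-time denominator would change the hourly rate. None of this is implemented in the Order Profitability report; the function, its parameters, and the default values are hypothetical.

```python
def burdened_hourly_rate(
    annual_salary: float,
    burden_multiplier: float = 1.0,
    hours_per_week: float = 40.0,
) -> float:
    """Hourly rate with an optional burden multiplier and part-time proration.

    With the defaults this reduces to Annual Salary / 2,080.
    """
    annual_hours = 52 * hours_per_week
    return annual_salary * burden_multiplier / annual_hours


# Base rate, then the same salary with a 1.30x burden multiplier.
print(f"{burdened_hourly_rate(52_000):.2f}")
print(f"{burdened_hourly_rate(52_000, burden_multiplier=1.30):.2f}")

# A part-time salaried worker: $26,000/yr at 20 hours per week (1,040 annual hours).
print(f"{burdened_hourly_rate(26_000, hours_per_week=20):.2f}")
```

Keeping the multiplier and weekly hours as separate inputs means either option could be adopted on its own, and a blended rate per role tier could be expressed by passing the tier's average salary.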

Staging Notes

The activity reset (Process 4) intentionally preserves all user accounts. Staging contains a mix of real contributor accounts and developer test accounts — both are valid and should be retained.
Both Operations Console reports (Contributor Task Activity and Order Profitability) are fully implemented and live on staging and production.
Network and concurrent stress testing completed — Phase 1 database hardening is sufficient at current scale.

Source References

Fast Links

LANEAWARD_SETUP_LOG.md
workforce_app/deploy/README.md
aws-server-setup-summary.html
LANEAWARD_ENVIRONMENT_TOPOLOGY.html

Current Working Frontends

timeontasks/index.html
timeontasks/app.js
timeontasks/sw.js — service worker
timeontasks/manifest.webmanifest
timeontasks/user-guide.html
ops_console/index.html
ops_console/runbook.html — deploy copy of this document; canonical source is _documents/staging/runbook.html

Stress Test

workforce_app/backend/stress_test_concurrent.py — concurrent pre-go-live validation script

