# AGENTS

## Mission & Audience

- This document lives at the repository root so every agentic helper knows how to build, run, and reason about the middleware.
- Refer back to `docs/workstation_plan.md` for the architectural story, expected flows, and the canonical payload contract before starting work on new features.
- Preserve the operational stability that the SQLite queue and delivery worker already provide; avoid accidental schema drift or config leaks.
- Tailor every change to the Node 20+ CommonJS ecosystem and the SQLite-backed persistence layer this repo already uses.

## Command Reference

### Install & Bootstrapping

- `npm install` populates `node_modules`; the committed `package-lock.json` is the only lockfile, so do not generate others.
- `npm start` is the go-to run command; it migrates the database, primes the instrument cache, spins up connectors, and starts the delivery worker plus the health/metrics services.
- `npm run migrate` runs `middleware/src/storage/migrate.js` on demand; use it to apply schema migrations in new environments or CI jobs.

### Maintenance & Database Care

- `npm run maintenance -- backup` copies `middleware/data/workstation.sqlite` to `workstation.sqlite.bak-<timestamp>`; leave the backup file in place and never commit or delete it.
- `npm run maintenance -- vacuum` runs SQLite's `VACUUM` via `middleware/src/scripts/maintenance.js` and logs success or failure to stdout/stderr.
- `npm run maintenance -- prune --days=<n>` deletes `delivery_log` entries older than `<n>` days; the default is 30 if `--days` is omitted.

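A minimal sketch of the argument handling described above, assuming `maintenance.js` receives the command and flags as plain strings. `parseMaintenanceArgs`, `pruneCutoff`, and the `created_at` column are hypothetical names, not the script's actual API.

```javascript
// Hypothetical argument parsing for the maintenance CLI; only the command names,
// the --days flag, and the 30-day default come from this document.
const COMMANDS = new Set(['backup', 'vacuum', 'prune']);
const DEFAULT_PRUNE_DAYS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

function parseMaintenanceArgs(argv) {
  const [command, ...rest] = argv;
  if (!COMMANDS.has(command)) {
    throw new Error(`Unknown maintenance command: ${command}`);
  }
  let days = DEFAULT_PRUNE_DAYS;
  for (const arg of rest) {
    if (arg.startsWith('--days=')) {
      days = Number(arg.slice('--days='.length));
      // Reject empty, fractional, negative, or non-numeric values.
      if (!Number.isInteger(days) || days <= 0) {
        throw new Error(`--days must be a positive integer, got "${arg}"`);
      }
    }
  }
  return { command, days };
}

// Rows older than this ISO timestamp are candidates for pruning.
function pruneCutoff(days, now = Date.now()) {
  return new Date(now - days * DAY_MS).toISOString();
}

// Assumed column name `created_at`; check the real delivery_log migration.
const PRUNE_SQL = 'DELETE FROM delivery_log WHERE created_at < ?';
```

In a CLI entry point this would receive `process.argv.slice(2)`, which for `npm run maintenance -- prune --days=7` is `['prune', '--days=7']`.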
### Testing & Single-Test Command

- `npm test` executes `node middleware/test/parsers.test.js` and serves as the accepted smoke check until a richer test harness exists.
- To rerun the parser suite on its own, run `node middleware/test/parsers.test.js` directly; it logs success via `console.log` and exits non-zero on failure.

## Environment & Secrets

- Node 20+ is assumed; the code relies on modern language features such as optional chaining and `String.raw`, so keep the same runtime for development and CI.
- All ports, DB paths, and CLQMS credentials are wired through `middleware/config/default.js` and its environment-variable overrides (e.g., `HTTP_JSON_PORT`, `CLQMS_TOKEN`, `WORKER_BATCH_SIZE`).
- Treat `CLQMS_TOKEN`, database files, and other secrets as environment-provided values; never embed them in checked-in files.
- `middleware/data/workstation.sqlite` is the runtime database. Don’t delete or reinitialize it from the repository tree unless it is part of an explicit migration or backup operation.

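The override pattern can be sketched as a function that reads `process.env` and falls back to defaults. This illustration rests on assumptions: apart from `worker.batchSize`, the nested key names are hypothetical, and every default value is a placeholder. None of it is copied from `middleware/config/default.js`.

```javascript
// Hypothetical sketch of environment overrides; defaults are placeholders.
function readInt(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env = process.env) {
  return {
    http: { jsonPort: readInt(env, 'HTTP_JSON_PORT', 3001) },
    clqms: {
      // Secrets come only from the environment; there is no committed fallback.
      token: env.CLQMS_TOKEN ?? null,
    },
    worker: { batchSize: readInt(env, 'WORKER_BATCH_SIZE', 20) },
  };
}
```

A config module would typically export `loadConfig()` once at load time. Accepting `env` as a parameter keeps the function testable without mutating `process.env`.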
## Observability Endpoints

- `/health` returns connector statuses plus pending, retrying, and dead-letter counts; the handler lives in `middleware/src/routes/health.js`.
- `/health/ready` pings the SQLite queue; on failure it should log an error and respond with `503`, per the existing route logic.
- `/metrics` exposes Prometheus-style gauges and counters read straight from `queue/sqliteQueue`; keep the plaintext format exactly as defined so Prometheus scrapers don't break.
- The health and metrics routers are mounted in `middleware/src/index.js` on the ports declared in the config; any new route should respect the existing Express middleware ordering.

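The readiness behavior can be sketched as a framework-agnostic helper. `readinessCheck` and `pingQueue` are hypothetical names. `pingQueue` stands in for whatever query the real route runs against the SQLite queue.

```javascript
// Hypothetical readiness helper: 200 when the queue answers, 503 otherwise.
async function readinessCheck(pingQueue, logger) {
  try {
    await pingQueue();
    return { statusCode: 200, body: { status: 'ready' } };
  } catch (err) {
    // Structured context, matching the logging style used elsewhere.
    logger.error({ err: err.message }, 'readiness check failed');
    return { statusCode: 503, body: { status: 'unavailable' } };
  }
}
```

An Express route could forward the result with `res.status(result.statusCode).json(result.body)`. Keeping the check outside the route lets it be tested without starting a server.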
## Delivery Runbook & Retry Behavior

- Backoff follows `config.retries.schedule`: `30s -> 2m -> 10m -> 30m -> 2h -> 6h`, with a maximum of 10 attempts. The worker calls `buildNextAttempt` in `deliveryWorker.js` to honor this array; attempts beyond the sixth reuse the final 6h delay.
- Retry transient failures (timeouts, DNS/connection errors, HTTP 5xx). Do not retry HTTP 400/422 responses or validation errors; move those payloads straight to `dead_letter` together with the response body.
- After the maximum number of attempts, move the canonical payload to `dead_letter` with the final error message so postmortem tooling can surface the failure.
- `queue.recordDeliveryAttempt` is called for every outbound delivery; keep latency, status, and response-code logging aligned with this helper.
- Duplicate detection relies on `utils/hash.dedupeKey`; keep `results` sorted and hashed consistently so deduplication stays stable.
- `deliveryWorker` sets `locked_at`/`locked_by` through `queue.claimPending` and always releases them via `queue.markOutboxStatus` to avoid worker starvation.

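Here is a sketch of why sorting matters for the dedupe key. The function assumes each result carries `test_code`, `value`, and optionally `unit`. Those field names are hypothetical, and the real `utils/hash.dedupeKey` may hash different fields.

```javascript
const crypto = require('node:crypto');

// Plain code-unit comparison keeps ordering independent of the host locale.
const compare = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// Hypothetical stable dedupe key; not the repository's implementation.
function dedupeKey(payload) {
  const results = [...payload.results]
    // Rebuild each result with a fixed key order so JSON.stringify is deterministic.
    .map((r) => ({ test_code: r.test_code, value: String(r.value), unit: r.unit ?? '' }))
    .sort((a, b) => compare(a.test_code, b.test_code) || compare(a.value, b.value));
  const canonical = JSON.stringify({
    instrument_id: payload.instrument_id,
    sample_id: payload.sample_id,
    result_time: payload.result_time,
    results,
  });
  return crypto.createHash('sha256').update(canonical).digest('hex');
}
```

Because the results are sorted before hashing, the key does not depend on the order in which an instrument reports them.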
## Instrument Configuration Cache

- Instrument configuration is cached in `instrumentConfig/service.js`; the cache reloads on init and periodically via `setInterval`, so mutate it through `service.upsert` rather than touching `store` directly.
- `service.reload` parses the JSON in the `config` column, logs parsing failures with `logger.warn`, and keeps only the rows that parse successfully.
- Service helpers expose `list`, `get`, and `byConnector` so connectors can fetch the subset they care about without iterating raw rows.
- Store interactions go through `middleware/src/storage/instrumentConfigStore.js`, which uses `DatabaseClient` and parameterized `ON CONFLICT` upserts; follow that pattern when extending tables.
- `instrumentService.init` must run before connectors start so `processMessage` can enforce instrument-enabled checks and connector matching.
- Drop payloads that have no enabled config or whose connector does not match, and mark the raw row as `dropped` so operators can trace why a message was ignored.

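A minimal in-memory model of the reload behavior above. `createInstrumentCache` and `loadRows` are hypothetical: `loadRows` stands in for the store query, and the row shape is an assumption.

```javascript
// Hypothetical cache illustrating reload/list/get/byConnector semantics.
function createInstrumentCache(loadRows, logger) {
  let byId = new Map();

  async function reload() {
    const rows = await loadRows();
    const next = new Map();
    for (const row of rows) {
      try {
        next.set(row.instrument_id, { ...row, config: JSON.parse(row.config) });
      } catch (err) {
        // Rows with unparseable JSON are logged and skipped.
        logger.warn(
          { err: err.message, instrument_id: row.instrument_id },
          'invalid instrument config JSON',
        );
      }
    }
    byId = next; // swap in one step so readers never see a half-built cache
  }

  return {
    reload,
    list: () => [...byId.values()],
    get: (instrumentId) => byId.get(instrumentId) ?? null,
    byConnector: (connector) => [...byId.values()].filter((entry) => entry.connector === connector),
  };
}
```

In the service, `reload` would run once during `init` and then again on each `setInterval` tick.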
## Metrics & Logging Enhancements

- `metrics.js` builds human-readable Prometheus strings via `formatMetric`; keep that helper intact when adding new metrics so the `# HELP` and `# TYPE` annotations stay correctly formatted.
- The metrics route reports pending, retrying, dead-letter, and delivery-attempt counts plus the last success timestamp and average latency; add new stats only when there is a clear operational need.
- Use the `queue` helpers (`pendingCount`, `retryingCount`, `deadLetterCount`, `getLastSuccessTimestamp`, `getAverageLatency`, `getDeliveryAttempts`) rather than running fresh queries in routes.
- Always set the response content type to `text/plain; version=0.0.4; charset=utf-8` before returning metrics so Prometheus scrapers accept the payload.
- Health logs should include both connector status and queue metrics so failure context is actionable and correlates with the operational dashboards referenced in `docs/workstation_plan.md`.
- Mask sensitive fields and avoid dumping raw payloads in logs; connectors and parsers attach context objects to errors instead of full payload dumps.

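This sketch shows what a `formatMetric`-style helper emits in the Prometheus text format. The signature and the metric names are hypothetical; the real helper in `metrics.js` may differ.

```javascript
// Hypothetical formatMetric: HELP line, TYPE line, then the sample.
function formatMetric(name, type, help, value) {
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} ${type}`,
    `${name} ${value}`,
  ].join('\n');
}

function renderMetrics(stats) {
  const lines = [
    formatMetric(
      'middleware_outbox_pending', 'gauge', 'Results waiting for delivery', stats.pending,
    ),
    formatMetric(
      'middleware_dead_letter_count', 'gauge', 'Payloads in dead_letter', stats.deadLetters,
    ),
  ];
  // The text exposition format expects the last line to end with a line feed.
  return `${lines.join('\n')}\n`;
}
```

In Express, the route would call `res.set('Content-Type', 'text/plain; version=0.0.4; charset=utf-8')` before `res.send(renderMetrics(stats))`.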
## Maintenance Checklist

- `middleware/src/scripts/maintenance.js` supports the commands `backup`, `vacuum`, and `prune --days=<n>` (default 30); call these from CI or ops scripts when the backlog grows.
- Run `backup` before migrations or schema updates so you can roll back quickly; it copies the SQLite file.
- `vacuum` rebuilds the database file to reclaim unused space; run it in a maintenance window because it locks the database while it runs.
- `prune` deletes old rows from `delivery_log`; use the threshold from `docs/workstation_plan.md` (default 30 days) unless stakeholders approve a different retention period.
- `maintenance.js` logs with `console.log`/`console.error` because it runs outside the Express app; keep those calls simple and exit with a non-zero code on failure to alert CI.
- Record every manual maintenance action in the repository README or a runbook so second-tier operators know what happened.

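A sketch of the `backup` step, assuming a plain file copy next to the original database. `backupDatabase` is a hypothetical name, and the `<timestamp>` format is an assumption. A plain copy also assumes no writer is active while it runs.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical backup helper; the timestamp is ISO time with ':' and '.'
// replaced so the result is a safe filename on every platform.
function backupDatabase(dbPath, now = new Date()) {
  const timestamp = now.toISOString().replace(/[:.]/g, '-');
  const target = path.join(path.dirname(dbPath), `${path.basename(dbPath)}.bak-${timestamp}`);
  // COPYFILE_EXCL makes the copy fail instead of overwriting an existing backup.
  fs.copyFileSync(dbPath, target, fs.constants.COPYFILE_EXCL);
  return target;
}
```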
## Data & Schema Source of Truth

- All schema statements live in `middleware/db/migrations/00*_*.sql`; the bootstrapper iterates over these files alphabetically via `fs.readdirSync` and `db.exec`, so keep new migrations in that folder and give them increasing numeric prefixes.
- Table definitions include `inbox_raw`, `outbox_result`, `delivery_log`, `instrument_config`, and `dead_letter`. An additional migration adds `locked_at` and `locked_by` to `outbox_result`.
- `middleware/src/storage/migrate.js` applies every `.sql` file in the migrations folder unconditionally on each run, so it stays idempotent only as long as every migration is safe to re-run. Avoid irreversible SQL (`DROP`, `ALTER` without a fallback) unless you also add compensating migrations.
- `DatabaseClient` in `middleware/src/storage/db.js` wraps sqlite3 callbacks in promises; reuse its `run`, `get`, and `all` helpers to keep SQL parameterization consistent and to centralize `busyTimeout` configuration.

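The ordering rule can be sketched as follows. The Node documentation does not promise any particular order from `fs.readdirSync`, so this version sorts explicitly. `orderMigrations` and `listMigrations` are hypothetical helper names.

```javascript
const fs = require('node:fs');

// Keep only .sql files and sort by name; zero-padded prefixes (001_, 002_, ...)
// make name order match numeric order.
function orderMigrations(fileNames) {
  return fileNames.filter((file) => file.endsWith('.sql')).sort();
}

// Hypothetical wrapper; a runner would then read and exec each file in order.
function listMigrations(dir) {
  return orderMigrations(fs.readdirSync(dir));
}
```

Without zero-padding, `10_x.sql` would sort before `2_x.sql`. Because every file runs on every migration pass, statements such as `CREATE TABLE IF NOT EXISTS` keep a rerun harmless.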
## Code Style Guidelines

### Modules, Imports, and Exports

- Prefer CommonJS `const ... = require(...)` at the top of each module; grouping local `require`s by directory depth (config, utils, domain) keeps files predictable.
- Export with `module.exports = { ... }` when a module exposes several helpers, or `module.exports = <function>` when it exposes a single one.
- When a file exposes a factory (connectors, queue), return named methods (`start`, `stop`, `onMessage`, `health`) so the bootstrapper can drive every factory the same way.

### Formatting & Layout

- Use two spaces for indentation and end statements with semicolons; this matches existing files such as `middleware/src/utils/logger.js` and `index.js`.
- Keep lines to a reasonable length (~100 characters) and build long strings with template literals (see the metric formatters) rather than concatenating with `+`.
- Prefer single quotes for strings unless interpolation or escaping makes backticks clearer.
- Put helper functions (splitters, builders) at the top of parser modules, followed by the main exported parse function.

### Naming Conventions

- Stick to camelCase for functions, methods, and variables (`processMessage`, `buildNextAttempt`, `messageHandler`).
- Use descriptive object properties that mirror domain terms (`instrument_id`, `result_time`, `connector`, `status`).
- Configuration constants such as retry schedules keep the casing already used in `config.retries.schedule` and stay grouped inside `config/default.js`.

### Async Flow & Error Handling

- Use `async/await` everywhere; existing code rarely uses raw promises, except for wrappers such as `new Promise((resolve) => ...)`.
- Wrap I/O boundaries in `try/catch` blocks and log failures with structured data via `logger.error({ err: err.message }, '...')` so Pino hooks can parse them.
- When rethrowing an error, make sure the calling context knows whether the failure is fatal (e.g., `processMessage` rethrows after queue logging).
- For connectors, propagate errors through `onError` hooks so the bootstrapper can log them consistently.

### Logging & Diagnostics

- Always prefer `middleware/src/utils/logger.js` over `console.log`/`console.error` inside core services; the exceptions are low-level scripts like `maintenance.js` and migration runners.
- Use structured objects for context (`{ err: err.message, connector: connector.name() }`), especially around delivery failures and config reloads.
- Log positive states (server listening, health server ready) along with port numbers so the runtime state can be traced during deployment.

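One way to honor the masking rule from Metrics & Logging Enhancements is to scrub context objects before passing them to the logger. `maskFields` and the key list are hypothetical. Pino also offers a `redact` option on the logger itself, which may be preferable if `logger.js` can be changed.

```javascript
// Hypothetical list of sensitive keys; extend it to match real payloads.
const SENSITIVE_KEYS = new Set(['token', 'authorization', 'password', 'clqms_token']);

// Recursively copies plain objects and arrays, replacing sensitive values.
function maskFields(value) {
  if (Array.isArray(value)) return value.map(maskFields);
  if (value === null || typeof value !== 'object') return value;
  const masked = {};
  for (const [key, inner] of Object.entries(value)) {
    masked[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : maskFields(inner);
  }
  return masked;
}
```

A call site might read `logger.error(maskFields({ err: err.message, connector: connector.name(), headers }), 'delivery failed')`.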
### Validation & Canonical Payloads

- Use `zod` for inbound schema checks; validators already live in `middleware/src/routes/instrumentConfig.js` and `middleware/src/normalizers/index.js`.
- Always normalize parser output via `normalize(parsed)` before queue insertion to guarantee that `instrument_id`, `sample_id`, `result_time`, and `results` conform to the canonical contract.
- If `normalize` throws, let the caller log the failure, mark the `inbox_raw` row as `failed`, and drop the payload without queueing it, so no partial writes occur.

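The status rules for `inbox_raw` can be sketched as a single pipeline step with injected dependencies. `processRawMessage` and all of its dependency names are hypothetical. Running normalization first and the config lookup second is also an assumption, not the documented order.

```javascript
// Hypothetical pipeline step; dependencies are injected so the sketch stays
// independent of the real queue, store, and logger modules.
async function processRawMessage(raw, deps) {
  const { parse, normalize, getConfig, enqueue, markInbox, logger } = deps;

  let canonical;
  try {
    canonical = normalize(parse(raw.payload));
  } catch (err) {
    logger.error({ err: err.message, inbox_id: raw.id }, 'parse/normalize failed');
    await markInbox(raw.id, 'failed'); // nothing was queued, so no partial write
    return 'failed';
  }

  const config = getConfig(canonical.instrument_id);
  if (!config || !config.enabled || config.connector !== raw.connector) {
    await markInbox(raw.id, 'dropped'); // lets operators see why it was ignored
    return 'dropped';
  }

  await enqueue(canonical);
  await markInbox(raw.id, 'processed');
  return 'processed';
}
```

Each branch ends with exactly one `markInbox` call, so every raw row ends up `processed`, `failed`, or `dropped`.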
### Database & Queue Best Practices

- Use `DatabaseClient` for all SQL interactions; it centralizes `busyTimeout` and promise conversion and prevents sqlite3 callback spaghetti.
- Parameterize every statement with `?` placeholders (see `queue/sqliteQueue.js` and `instrumentConfigStore.js`) to avoid SQL injection hazards.
- Always mark `inbox_raw` rows as `processed`, `failed`, or `dropped` after parsing so operators can see what happened to each message.
- When updating `outbox_result` statuses, clear `locked_at`/`locked_by` and update `attempts`/`next_attempt_at` in one statement so watchers can rely on atomic semantics.

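A sketch of the single-statement status update. The columns come from the schema notes above. The `id` primary key, the builder name, and its argument names are assumptions.

```javascript
// Hypothetical statement builder: one UPDATE sets status, attempts, and
// next_attempt_at and releases the lock, so no reader sees a partial change.
function buildMarkOutboxStatus(id, { status, attempts, nextAttemptAt = null }) {
  const sql = `UPDATE outbox_result
     SET status = ?, attempts = ?, next_attempt_at = ?,
         locked_at = NULL, locked_by = NULL
   WHERE id = ?`;
  return { sql, params: [status, attempts, nextAttemptAt, id] };
}
```

The pair would then go through `DatabaseClient`, e.g. `await db.run(sql, params)`. That call shape is assumed from the `run` helper named above.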
### Connectors & Pipeline Contracts

- Each connector must provide `name`, `type`, `start`, `stop`, `health`, `onMessage`, and `onError`, as the current implementations do; keep this contract if you add new protocols.
- Keep connector internals event-driven: invoke `messageHandler(payload)` and attach `.catch(errorHandler)` so downstream failures get logged.
- For TCP connectors, track open connections in a `Set` so `stop()` can destroy them before closing the server.
- Do not assume payload framing beyond what the current parser needs; let the parser module handle splitting and trimming text.

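A minimal TCP connector that follows the contract above, built only on `node:net`. `createTcpConnector` and the `{ connector, raw }` payload shape are hypothetical. Making `type` a function, like `name`, is also an assumption. Framing is deliberately left to the parser.

```javascript
const net = require('node:net');

// Hypothetical TCP connector sketch; not the repository's implementation.
function createTcpConnector({ name, port, host = '127.0.0.1' }) {
  const sockets = new Set(); // open connections, destroyed on stop()
  let messageHandler = async () => {};
  let errorHandler = () => {};
  let server = null;

  function handleSocket(socket) {
    sockets.add(socket);
    socket.setEncoding('utf8');
    socket.on('data', (chunk) => {
      // Promise.resolve() also routes synchronous throws to errorHandler.
      Promise.resolve()
        .then(() => messageHandler({ connector: name, raw: chunk }))
        .catch(errorHandler);
    });
    socket.on('error', errorHandler);
    socket.on('close', () => sockets.delete(socket));
  }

  return {
    name: () => name,
    type: () => 'tcp',
    onMessage(fn) { messageHandler = fn; },
    onError(fn) { errorHandler = fn; },
    health: () => ({
      listening: Boolean(server && server.listening),
      connections: sockets.size,
    }),
    start() {
      return new Promise((resolve, reject) => {
        server = net.createServer(handleSocket);
        server.once('error', reject);
        server.listen(port, host, () => resolve(server.address().port));
      });
    },
    stop() {
      return new Promise((resolve) => {
        // Destroy open sockets first; otherwise server.close() waits for them.
        for (const socket of sockets) socket.destroy();
        sockets.clear();
        if (!server) return resolve();
        server.close(() => resolve());
      });
    },
  };
}
```

`server.close()` only stops accepting new connections and waits for existing ones to end. Destroying the tracked sockets is what lets `stop()` finish promptly.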
### Worker & Delivery Guidelines

- The delivery worker polls the queue in batches of `config.worker.batchSize` and records every attempt via `queue.recordDeliveryAttempt`; follow the same pattern if you introduce new failure-handling logic.
- Respect the retry schedule defined in `config.retries.schedule`; `buildNextAttempt` caps the schedule index with `Math.min`, so attempts past the end of the array reuse its last delay. Add new delays only by appending them to `config.retries.schedule`.
- On HTTP 400/422 responses or after too many retries, move payloads to `dead_letter` and log the reason to keep operators informed.

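Here is a sketch of the worker's decision after a failed delivery, combining the documented backoff, the `Math.min` index cap, and the dead-letter rules. Every name is hypothetical, and this is not the real `buildNextAttempt`. The sketch treats 4xx statuses other than 400/422 as permanent, which is an assumption.

```javascript
// Documented backoff: 30s -> 2m -> 10m -> 30m -> 2h -> 6h, max 10 attempts.
const SCHEDULE_MS = [30e3, 2 * 60e3, 10 * 60e3, 30 * 60e3, 2 * 3600e3, 6 * 3600e3];
const MAX_ATTEMPTS = 10;
// Node error codes for timeouts, connection failures, and DNS lookups.
const TRANSIENT_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED', 'ENOTFOUND', 'EAI_AGAIN']);

function isRetryable(failure) {
  if (failure.httpStatus === 400 || failure.httpStatus === 422) return false;
  if (failure.validationError) return false;
  if (failure.httpStatus >= 500) return true;
  return TRANSIENT_CODES.has(failure.code); // other 4xx fall through as permanent
}

// `attempts` counts deliveries already made, including the one that just failed.
function computeNextAttemptAt(attempts, now = Date.now()) {
  const index = Math.min(attempts - 1, SCHEDULE_MS.length - 1); // cap at the last delay
  return new Date(now + SCHEDULE_MS[index]).toISOString();
}

function nextStep(attempts, failure, now = Date.now()) {
  if (!isRetryable(failure) || attempts >= MAX_ATTEMPTS) {
    return { status: 'dead_letter' };
  }
  return { status: 'retrying', nextAttemptAt: computeNextAttemptAt(attempts, now) };
}
```

Whichever branch is taken, the resulting status, attempt count, and next-attempt time would be written back in one statement with the lock columns cleared.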
### Testing & Coverage Expectations

- Parser tests live in `middleware/test/parsers.test.js`; they rely on `node:assert` and deliberately simple sample payloads to avoid external dependencies.
- Add new tests by mimicking that file’s style: plain `assert.strictEqual` checks, no test framework dependencies, and a `console.log` success acknowledgment.
- If you enhance the test surface, keep it runnable via `npm test` so agents and CI scripts can still rely on a single command.

### Documentation & Storytelling

- Keep `docs/workstation_plan.md` in sync with architectural changes; it surfaces connector flows, phases, retry policies, and maintenance checklists that agents rely on.
- When adding routes/features, document the endpoint, request payload, and expected responses in either `docs/` or inline comments near the route.

## Cursor & Copilot Rules

- No `.cursor/rules/` directory or `.cursorrules` file is present in this repo, so there are no Cursor-specific constraints to copy here.
- `.github/copilot-instructions.md` is absent as well, so there are no Copilot instructions to enforce or repeat.

## Final Notes for Agents

- Keep changes isolated to their area of responsibility; the middleware is intentionally minimal, so avoid introducing new bundlers or languages.
- Before opening PRs, rerun `npm run migrate` and `npm test` to verify that the schema and the app are coherent.
- Use the environment-variable overrides read by `middleware/config/default.js` when running in staging/production so the same config file can stay committed.

## Additional Notes

- Never revert existing changes you did not make unless explicitly asked; those changes were made by the user.
- If there are unrelated changes in the working tree, leave them untouched and focus on the files that matter for the ticket.
- Avoid destructive git commands (`git reset --hard`, `git checkout --`) unless the user explicitly requests them.
- If your change needs documentation updates, add them to `docs/workstation_plan.md` or explain why the doc already covers the behavior.
- When a connector or parser handles a new instrument, double-check the `instrument_config` rows to ensure the connector name matches the incoming protocol.
- The queue keeps `status`, `attempts`, `next_attempt_at`, and `locked_*` in sync; always update all relevant columns in a single SQL call to avoid race conditions.
- Keep the SQL schema in sync with `middleware/db/migrations`; add new migrations rather than editing existing ones when altering tables.