# Workstation Middleware Plan (Node.js + SQLite)
## Goal
Build a lightweight Node.js service that:
- receives laboratory messages from instruments
- normalizes every protocol into a single JSON payload
- queues payloads in SQLite for durability
- reliably delivers results to CLQMS with retries
## Responsibilities
- **Middleware:** connector protocols (HTTP JSON, HL7 TCP, ASTM serial/TCP), parsing/normalization, schema checks, durable queue, retries, dead-letter, logging, health endpoints.
- **CLQMS:** domain validation, mapping rules, result persistence, workflow/flags/audit.
## Flow
1. Connector captures raw message and writes to `inbox_raw`.
2. Parser turns protocol text into a structured object.
3. Normalizer maps the object to canonical JSON.
4. Payload lands in `outbox_result` as `pending`.
5. Delivery worker sends to CLQMS and logs attempts.
6. On success mark `processed`; on failure mark `retrying` or `dead_letter` after max retries.
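Step 6 boils down to a small status transition. The following is a minimal sketch, assuming a hypothetical `nextStatus` helper and an attempt counter stored with each outbox row (neither is named in this plan):

```javascript
// Hypothetical helper: decide the next outbox status after a delivery attempt.
// `attempts` counts attempts made so far, including the one that just finished.
function nextStatus({ delivered, attempts, maxAttempts }) {
  if (delivered) return "processed";
  // Once the attempt budget is spent, the payload goes to the dead-letter state.
  if (attempts >= maxAttempts) return "dead_letter";
  return "retrying";
}
```

For example, `nextStatus({ delivered: false, attempts: 3, maxAttempts: 10 })` yields `"retrying"`.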
## Tech Stack (JS only)
- Node.js 20+
- SQLite via `sqlite3`
- `zod` for validation
- `pino` for logs
- `undici` (or `axios`) for HTTP delivery
## Suggested Layout
```text
middleware/
  src/
    connectors/
    parsers/
    normalizers/
    pipeline/
    queue/
    client/
    storage/
    routes/
    utils/
    index.js
  db/migrations/
  config/
  logs/
```
## Connector Contract
Each connector exposes:
```js
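/**
 * Connector contract, written as a JSDoc type so the block stays valid JavaScript.
 * @typedef {Object} Connector
 * @property {() => string} name
 * @property {() => ("hl7-tcp" | "astm-serial" | "http-json")} type
 * @property {() => Promise<void>} start
 * @property {() => Promise<void>} stop
 * @property {() => ({ status: "up" | "down" | "degraded" })} health
 * @property {(msg: any) => Promise<void>} onMessage
 * @property {(handler: Function) => void} onError
 */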
```
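To make the contract concrete, here is a sketch of an in-memory connector that satisfies it. `createStubConnector` and its `emit` hook are hypothetical names for illustration; `emit` is not part of the contract and only simulates an incoming message.

```javascript
// Hypothetical in-memory connector that satisfies the connector contract.
// `emit` is a test hook (not part of the contract) that simulates an incoming message.
function createStubConnector({ name, onMessage }) {
  let running = false;
  let errorHandler = () => {};
  return {
    name: () => name,
    type: () => "http-json",
    start: async () => { running = true; },
    stop: async () => { running = false; },
    // Reports "up" only between start() and stop().
    health: () => ({ status: running ? "up" : "down" }),
    onMessage,
    // Stores the handler that receives errors raised while handling a message.
    onError: (handler) => { errorHandler = handler; },
    emit: async (msg) => {
      try {
        await onMessage(msg);
      } catch (err) {
        errorHandler(err);
      }
    },
  };
}
```

A real connector would replace `emit` with its socket, serial port, or HTTP listener and call `onMessage` from there.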
## Canonical Payload
Required fields: `instrument_id`, `sample_id`, `result_time`, `results[]` (each with `test_code` and `value`).
Optional: `unit`, `flag`, `patient_id`, `operator_id`, `meta`.
Example:
```json
{
  "instrument_id": "SYSMEX_XN1000",
  "sample_id": "SMP-20260326-001",
  "result_time": "2026-03-26T10:20:00Z",
  "results": [
    {
      "test_code": "WBC",
      "value": "8.2",
      "unit": "10^3/uL",
      "flag": "N"
    }
  ],
  "meta": {
    "source_protocol": "HL7",
    "message_id": "msg-123",
    "connector": "hl7-tcp"
  }
}
```
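The required-field rule can be checked with a few lines of plain JavaScript. This sketch is deliberately dependency-free; the plan names `zod` for validation, so treat `validatePayload` as a hypothetical stand-in rather than the project's schema. It also assumes that `results` must contain at least one entry and that `result_time` must be a parseable timestamp, which the plan does not state explicitly.

```javascript
// Dependency-free check of the required canonical fields.
// Returns a list of problems; an empty list means the payload passed.
// Assumes `payload` is a non-null object.
function validatePayload(payload) {
  const errors = [];
  for (const field of ["instrument_id", "sample_id", "result_time"]) {
    if (typeof payload[field] !== "string" || payload[field] === "") {
      errors.push(`${field} is required`);
    }
  }
  // Assumption: result_time should be parseable as a date when present.
  if (typeof payload.result_time === "string" && payload.result_time !== "" &&
      Number.isNaN(Date.parse(payload.result_time))) {
    errors.push("result_time is not a parseable timestamp");
  }
  // Assumption: an empty results array is treated as invalid.
  if (!Array.isArray(payload.results) || payload.results.length === 0) {
    errors.push("results must be a non-empty array");
    return errors;
  }
  payload.results.forEach((result, i) => {
    if (!result || typeof result.test_code !== "string" || result.test_code === "") {
      errors.push(`results[${i}].test_code is required`);
    }
    if (!result || result.value === undefined || result.value === null) {
      errors.push(`results[${i}].value is required`);
    }
  });
  return errors;
}
```

Returning a list instead of throwing lets the caller record every problem in one pass, which is convenient when the reason has to be stored with a rejected payload.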
## SQLite Tables
- `inbox_raw`: raw payloads, connector, parse status.
- `outbox_result`: canonical payload, delivery status, retry metadata, `dedupe_key`.
- `delivery_log`: attempt history, latency, responses.
- `instrument_config`: connector settings, enabled flag.
- `dead_letter`: failed payloads, reason, timestamp.
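As an illustration of how one of these tables might be declared, the sketch below holds a possible migration for `outbox_result` as a JavaScript string. Only the table name, the status values, and `dedupe_key` come from this plan; every other column name and the index name are assumptions.

```javascript
// Hypothetical migration for outbox_result. Column names other than
// `dedupe_key` are assumptions, not the project's actual schema.
const createOutboxSql = `
CREATE TABLE IF NOT EXISTS outbox_result (
  id              INTEGER PRIMARY KEY AUTOINCREMENT,
  payload         TEXT    NOT NULL,              -- canonical JSON, serialized
  status          TEXT    NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'retrying', 'processed', 'dead_letter')),
  attempts        INTEGER NOT NULL DEFAULT 0,    -- delivery attempts so far
  next_attempt_at TEXT,                          -- ISO timestamp of the next retry
  last_error      TEXT,
  dedupe_key      TEXT    NOT NULL,
  created_at      TEXT    NOT NULL DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_outbox_dedupe
  ON outbox_result (dedupe_key);
`;
```

With the `sqlite3` package, a multi-statement string like this would typically be run through `db.exec(createOutboxSql)`.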
## Retry + Idempotency
- Backoff: `30s -> 2m -> 10m -> 30m -> 2h -> 6h`, max 10 attempts.
- Retry transient failures (timeouts, DNS/connection errors, HTTP 5xx); do not retry HTTP 400/422 responses or validation errors.
- After max attempts move to `dead_letter` with payload + last error.
- `dedupe_key` = hash of `instrument_id + sample_id + test_code + result_time`; unique index on `outbox_result.dedupe_key` ensures idempotent replay.
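The backoff and retry rules above can be sketched as two small functions. The schedule lists six delays while the limit is ten attempts, so this sketch assumes that attempts past the sixth reuse the final 6h delay; that choice, the function names, and the `failure` object shape are all assumptions.

```javascript
// Delays from the plan, in milliseconds: 30s, 2m, 10m, 30m, 2h, 6h.
const BACKOFF_MS = [30e3, 120e3, 600e3, 1800e3, 7200e3, 21600e3];
const MAX_ATTEMPTS = 10;

// Delay before the next try, given how many attempts have failed so far (1-based).
// Assumption: attempts past the sixth reuse the final 6h delay.
// Returns null once the attempt budget is spent (time to dead-letter).
function nextDelayMs(failedAttempts) {
  if (failedAttempts >= MAX_ATTEMPTS) return null;
  const index = Math.min(Math.max(failedAttempts - 1, 0), BACKOFF_MS.length - 1);
  return BACKOFF_MS[index];
}

// Classify a failure as transient (retry) or permanent (do not retry).
// `failure` is a hypothetical shape: { httpStatus } for HTTP responses,
// { code } for Node.js network errors.
function isRetryable(failure) {
  if (failure.httpStatus !== undefined) {
    // Only 5xx responses are retried; 400/422 and other 4xx are not.
    return failure.httpStatus >= 500 && failure.httpStatus <= 599;
  }
  const transientCodes = ["ETIMEDOUT", "ECONNREFUSED", "ECONNRESET", "ENOTFOUND", "EAI_AGAIN"];
  return transientCodes.includes(failure.code);
}
```

HTTP clients may report timeouts under their own error codes, so the list of transient codes would need to match whichever client is chosen for delivery.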
## Security & Observability
- CLQMS auth token stored in env, never hardcoded.
- Optional IP allowlist, strict TLS + request timeouts, mask sensitive logs.
- Health: `GET /health`, `GET /health/ready` (DB + workers).
- Metrics: pending size, retrying count, dead letters, last successful delivery.
- Alerts: backlog limits, dead letters increase, stale success timestamp.
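Two of the security points, reading the token from the environment and masking sensitive log fields, can be sketched as follows. The variable names `CLQMS_TOKEN` and `CLQMS_TIMEOUT_MS`, the 10-second default, and the list of masked keys are assumptions made for illustration.

```javascript
// Read delivery settings from the environment and fail fast when the token is missing.
// Variable names and the default timeout are hypothetical.
function loadClqmsConfig(env = process.env) {
  const token = env.CLQMS_TOKEN;
  if (!token) throw new Error("CLQMS_TOKEN is not set");
  const timeoutMs = Number(env.CLQMS_TIMEOUT_MS ?? 10000);
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error("CLQMS_TIMEOUT_MS must be a positive number");
  }
  return { token, timeoutMs };
}

// Return a shallow copy of `record` with sensitive top-level keys replaced by "***".
// The default key list is an assumption.
function maskSensitive(record, keys = ["patient_id", "token"]) {
  const copy = { ...record };
  for (const key of keys) {
    if (key in copy) copy[key] = "***";
  }
  return copy;
}
```

Failing at startup when the token is absent surfaces a misconfiguration immediately instead of on the first delivery attempt.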
## Delivery Phases
1. **Phase 1 (MVP):** scaffold Node app, add SQLite + migrations, build `http-json` connector, parser/normalizer, outbox worker, retries, dead-letter, pilot with one instrument.
2. **Phase 2:** add HL7 TCP and ASTM connectors, parser/normalizer per connector, instrument-specific config.
3. **Phase 3:** richer metrics/dashboards, backup + maintenance scripts, integration/failure tests, runbook + incident checklist.
## MVP Done When
- One instrument end-to-end with no loss during CLQMS downtime.
- Retries, dead-letter, and duplicate protection verified.
- Health checks and logs available for operations.
## Phase 2 Completion Notes
- Instruments are provisioned via `instrument_config` rows (connector, enabled flag, JSON payload) and can be managed through `POST /instruments` plus the instrumentation console for quick updates.
- Each connector validates incoming messages against the configured instruments, so HL7/ASTM messages are accepted only from known, enabled equipment.
- Deduplication is now guarded by a SHA-256 `dedupe_key`, and instrument metadata is carried through the pipeline.
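A SHA-256 `dedupe_key` over the four identifying fields could be computed as in this sketch, which uses Node's built-in `crypto` module. The `|` separator and the function name are assumptions; the plan only says the fields are combined and hashed.

```javascript
const { createHash } = require("node:crypto");

// Hash the identifying fields in a fixed order.
// Assumption: fields are joined with "|" so that adjacent values cannot
// run together and produce the same input string for different results.
function dedupeKey({ instrument_id, sample_id, test_code, result_time }) {
  return createHash("sha256")
    .update([instrument_id, sample_id, test_code, result_time].join("|"))
    .digest("hex");
}
```

Because the key depends only on these four fields, replaying the same result yields the same key, and a unique index on the column can reject the duplicate insert.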
## Metrics & Observability
- Health router provides `/health` (status) and `/health/ready` (DB + worker) plus metrics per connector.
- Prometheus-friendly `/metrics` exports pending/retrying/dead-letter counts, delivery attempts, last success timestamp, and average latency.
- `pino` logs mask PII by design; connectors emit structured errors and handshake timings that alerts can be built on.
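For reference, the Prometheus text exposition format is simple enough to render by hand. The sketch below is one possible way to produce a `/metrics` body for some of the queue figures; the metric names and the `renderMetrics` function are hypothetical.

```javascript
// Render queue gauges in the Prometheus text exposition format.
// Metric names are hypothetical, not the names the service actually exports.
function renderMetrics({ pending, retrying, deadLetter, lastSuccessEpochSeconds }) {
  const gauges = [
    ["middleware_outbox_pending", "Payloads waiting for their first delivery attempt", pending],
    ["middleware_outbox_retrying", "Payloads scheduled for another attempt", retrying],
    ["middleware_dead_letter_total", "Payloads moved to the dead-letter table", deadLetter],
    ["middleware_last_success_timestamp_seconds", "Unix time of the last successful delivery", lastSuccessEpochSeconds],
  ];
  // Each metric gets a HELP line, a TYPE line, and one sample line.
  return gauges
    .map(([name, help, value]) => `# HELP ${name} ${help}\n# TYPE ${name} gauge\n${name} ${value}\n`)
    .join("");
}
```

A route handler would return this string with the content type `text/plain; version=0.0.4`.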
## Maintenance, Runbook & Automation
- SQLite maintenance script (`node middleware/src/scripts/maintenance.js`) supports `backup`, `vacuum`, and `prune --days=<n>` to keep the DB performant and reproducible.
- Routine checklist: run a backup before each deployment, vacuum monthly, and prune `delivery_log` rows older than 30 days (the age is configurable via the CLI).
- Incident checklist: 1) check `/health/ready`; 2) inspect `outbox_result` + `dead_letter`; 3) replay payloads with `pending` or `retrying` status; 4) rotate CLQMS token via env + restart; 5) escalate when dead letters spike or metrics show stale success timestamp.
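The maintenance script's command line (`backup`, `vacuum`, `prune --days=<n>`) could be parsed along these lines. `parseMaintenanceArgs` is a hypothetical helper, and the 30-day default for `prune` is taken from the checklist above rather than from the script itself.

```javascript
// Parse `backup`, `vacuum`, or `prune --days=<n>` from an argv slice,
// e.g. process.argv.slice(2). Hypothetical helper, not the script's actual code.
function parseMaintenanceArgs(argv) {
  const [command, ...rest] = argv;
  if (!["backup", "vacuum", "prune"].includes(command)) {
    throw new Error(`unknown command: ${command}`);
  }
  if (command !== "prune") return { command };
  // Assumption: prune falls back to 30 days when --days is omitted.
  const daysArg = rest.find((arg) => arg.startsWith("--days="));
  const days = daysArg ? Number(daysArg.slice("--days=".length)) : 30;
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error("--days must be a positive integer");
  }
  return { command, days };
}
```

Rejecting unknown commands and non-positive day counts up front keeps a typo from turning into an unintended prune.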
## Testing & Validation
- Parser smoke tests under `middleware/test/parsers.test.js` verify HL7/ASTM canonical output and keep `normalize()` coverage intact. Run via `npm test`.
- Future CI can run the same script plus `npm run migrate` on every pull request to confirm that the schema and queue logic still apply.