# Audit Logging Strategy (Implementation Ready)
## 1) Purpose, Scope, and Non-Goals
This document defines the production audit logging contract for CLQMS.
### Purpose
- Provide a single, normalized audit model for compliance, investigations, and operations.
- Ensure every protected workflow writes consistent, queryable audit records.
- Make behavior deterministic across API controllers, services, jobs, and integrations.
### Scope
This applies to four log tables:
- `logpatient` - patient identity, demographics, consent, insurance, and visit/ADT events.
- `logorder` - orders, specimen lifecycle, results lifecycle, and QC.
- `logmaster` - test/master configuration, value sets, role/permission updates, infrastructure configuration.
- `logsystem` - authentication, authorization, import/export, jobs, and system integrity operations.
### Non-goals
- This is not a replacement for metrics/tracing systems (Prometheus, APM, etc.).
- This is not a full immutable ledger; tamper evidence relies on the controls described in section 9.4.
## 2) Table Ownership
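Use this mapping to choose the target table and minimum event shape. The keys listed per event family are in addition to the keys required for all events (section 5.1).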
| Event family | Table | Minimum keys in `Context` | Example `EventID` |
| --- | --- | --- | --- |
| Patient create/update/merge | `logpatient` | `route`, `request_id`, `entity_version` | `PATIENT_REGISTERED` |
| Consent/insurance changes | `logpatient` | `consent_type` or `payer_id` | `PATIENT_CONSENT_UPDATED` |
| Visit ADT transitions | `logpatient` | `visit_id`, `from_status`, `to_status` | `VISIT_TRANSFERRED` |
| Order create/cancel/reopen | `logorder` | `order_id`, `priority`, `source` | `ORDER_CREATED` |
| Specimen lifecycle | `logorder` | `specimen_id`, `specimen_status` | `SPECIMEN_RECEIVED` |
| Result lifecycle | `logorder` | `result_id`, `verification_state` | `RESULT_AMENDED` |
| QC lifecycle | `logorder` | `qc_run_id`, `instrument_id` | `QC_RECORDED` |
| Value sets/test definitions | `logmaster` | `config_group`, `change_ticket` | `VALUESET_ITEM_RETIRED` |
| Roles/permissions/users | `logmaster` | `target_user_id`, `target_role` | `USER_ROLE_CHANGED` |
| Login/logout/token/auth failures | `logsystem` | `auth_flow`, `failure_reason` (on failure) | `AUTH_LOGIN_FAILED` |
| Import/export/jobs/integration | `logsystem` | `batch_id`, `record_count`, `job_name` | `IMPORT_JOB_FINISHED` |
| Purge/archive/legal hold | `logsystem` | `archive_id`, `policy_name`, `approved_by` | `AUDIT_PURGE_EXECUTED` |
## 3) Canonical Schema (All Four Tables)
All four tables MUST implement the same logical columns. Physical PK name may vary (`LogPatientID`, `LogOrderID`, etc.).
### 3.1 Column contract
| Column | Type | Required | Max length | Description | Example |
| --- | --- | --- | --- | --- | --- |
| `LogID` (or table-specific PK) | `BIGINT UNSIGNED AUTO_INCREMENT` | Yes | N/A | Surrogate key per table | `987654` |
| `TblName` | `VARCHAR(64)` | Yes | 64 | Source business table | `patient` |
| `RecID` | `VARCHAR(64)` | Yes | 64 | Primary identifier of affected entity | `PAT000123` |
| `FldName` | `VARCHAR(128)` | Conditional | 128 | Changed field name, null for multi-field/bulk | `NameLast` |
| `FldValuePrev` | `TEXT` | Conditional | 65535 | Previous value (string or JSON) | `{"status":"PENDING"}` |
| `FldValueNew` | `TEXT` | Conditional | 65535 | New value (string or JSON) | `{"status":"VERIFIED"}` |
| `UserID` | `VARCHAR(64)` | Yes | 64 | Actor user id, or `SYSTEM` for non-user actions | `USR001` |
| `SiteID` | `VARCHAR(32)` | Yes | 32 | Facility/site context | `SITE01` |
| `DIDType` | `VARCHAR(32)` | No | 32 | Device identifier type | `UUID` |
| `DID` | `VARCHAR(128)` | No | 128 | Device identifier value | `6b8f...` |
| `MachineID` | `VARCHAR(128)` | No | 128 | Host/workstation identifier | `WS-LAB-07` |
| `SessionID` | `VARCHAR(128)` | Yes | 128 | Auth or workflow session identifier | `sess_abc123` |
| `AppID` | `VARCHAR(64)` | Yes | 64 | Calling client/application id | `clqms-api` |
| `ProcessID` | `VARCHAR(128)` | No | 128 | Process/workflow/job id | `job_20260325_01` |
| `WebPageID` | `VARCHAR(128)` | No | 128 | UI route/page id if user-driven | `patient-detail` |
| `EventID` | `VARCHAR(80)` | Yes | 80 | Canonical event code | `RESULT_RELEASED` |
| `ActivityID` | `VARCHAR(24)` | Yes | 24 | Canonical action enum | `UPDATE` |
| `Reason` | `VARCHAR(512)` | No | 512 | User/system reason or ticket reference | `Critical value corrected` |
| `LogDate` | `DATETIME(3)` | Yes | N/A | Event time in UTC | `2026-03-25 04:45:12.551` |
| `Context` | `JSON` (preferred) or `LONGTEXT` | Yes | N/A | Structured metadata payload | See section 5 |
| `IpAddress` | `VARCHAR(45)` | No | 45 | IPv4/IPv6 remote address | `10.10.2.44` |
### 3.2 Required/conditional rules
- `FldName`, `FldValuePrev`, and `FldValueNew` are required for single-field changes.
- For multi-field changes, set `FldName = NULL` and store a compact JSON diff under `Context.diff`.
- For non-mutating events (`READ`, `LOGIN`, `EXPORT`, `IMPORT`), `FldValuePrev` and `FldValueNew` may be null.
- `Context` is required for all rows. At minimum include `request_id` and `route` (or `job_name` for non-HTTP jobs).
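The rules above can be checked mechanically before a row is inserted. The following is a minimal sketch in Python, used here only for illustration; `validate_row` and its error strings are hypothetical names, and treating a single-field change as valid when at least one of the two value columns is non-null is an interpretation of this sketch, not part of the contract.

```python
from typing import Any, Mapping

# ActivityID values that section 3.2 lists as non-mutating.
NON_MUTATING = {"READ", "LOGIN", "EXPORT", "IMPORT"}


def validate_row(row: Mapping[str, Any]) -> list:
    """Return rule violations for one audit row; an empty list means valid."""
    errors = []
    context = row.get("Context")

    # Context is required on every row, with request_id plus either
    # route (HTTP requests) or job_name (non-HTTP jobs).
    if not isinstance(context, dict) or not context:
        errors.append("Context is required")
        context = {}
    else:
        if "request_id" not in context:
            errors.append("Context.request_id is required")
        if "route" not in context and "job_name" not in context:
            errors.append("Context.route or Context.job_name is required")

    # Non-mutating events may leave the field columns null.
    if row.get("ActivityID") in NON_MUTATING:
        return errors

    if row.get("FldName") is None:
        # Multi-field change: the diff must live under Context.diff.
        if not context.get("diff"):
            errors.append("multi-field change needs Context.diff")
    elif row.get("FldValuePrev") is None and row.get("FldValueNew") is None:
        # Assumption of this sketch: at least one value must be non-null.
        errors.append("single-field change needs FldValuePrev or FldValueNew")
    return errors
```

A central audit writer could call a check like this and refuse to insert rows that return a non-empty list.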
## 4) DDL Template and Indexing
Use this template when creating a log table. Replace `${TABLE}` and `${PK}`.
```sql
CREATE TABLE `${TABLE}` (
  `${PK}` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `TblName` VARCHAR(64) NOT NULL,
  `RecID` VARCHAR(64) NOT NULL,
  `FldName` VARCHAR(128) NULL,
  `FldValuePrev` TEXT NULL,
  `FldValueNew` TEXT NULL,
  `UserID` VARCHAR(64) NOT NULL,
  `SiteID` VARCHAR(32) NOT NULL,
  `DIDType` VARCHAR(32) NULL,
  `DID` VARCHAR(128) NULL,
  `MachineID` VARCHAR(128) NULL,
  `SessionID` VARCHAR(128) NOT NULL,
  `AppID` VARCHAR(64) NOT NULL,
  `ProcessID` VARCHAR(128) NULL,
  `WebPageID` VARCHAR(128) NULL,
  `EventID` VARCHAR(80) NOT NULL,
  `ActivityID` VARCHAR(24) NOT NULL,
  `Reason` VARCHAR(512) NULL,
  `LogDate` DATETIME(3) NOT NULL,
  `Context` JSON NOT NULL,
  `IpAddress` VARCHAR(45) NULL,
  PRIMARY KEY (`${PK}`),
  INDEX `idx_${TABLE}_logdate` (`LogDate`),
  INDEX `idx_${TABLE}_recid_logdate` (`RecID`, `LogDate`),
  INDEX `idx_${TABLE}_userid_logdate` (`UserID`, `LogDate`),
  INDEX `idx_${TABLE}_eventid_logdate` (`EventID`, `LogDate`),
  INDEX `idx_${TABLE}_site_logdate` (`SiteID`, `LogDate`)
);
```
Optional JSON path indexes (syntax is DB-engine specific):
- `Context.request_id`
- `Context.batch_id`
- `Context.job_name`
## 5) Context JSON Contract
`Context` MUST be valid JSON. Keep payload compact and predictable.
### 5.1 Required keys for all events
```json
{
  "request_id": "a4f5b6c7",
  "route": "PATCH /api/patient/123",
  "timestamp_utc": "2026-03-25T04:45:12.551Z",
  "entity_type": "patient",
  "entity_version": 7
}
```
### 5.2 Additional keys by event class
- Patient/order/result mutation: `diff` (array of changed fields), `validation_profile`.
- Import/export/jobs: `batch_id`, `record_count`, `success_count`, `failure_count`, `job_name`.
- Auth/security events: `auth_flow`, `failure_reason`, `token_type` (never token value).
- Retention operations: `policy_name`, `archive_id`, `approved_by`, `window_start`, `window_end`.
### 5.3 Size and shape limits
- Maximum serialized `Context` size: 16 KB.
- `diff` array should include only audited fields, not entire entity snapshots.
- Store references (`file_id`, `blob_ref`) instead of large payloads.
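The size limit is easiest to enforce at serialization time. A minimal Python sketch follows, for illustration only; `serialize_context` is a hypothetical helper, and reading "16 KB" as 16 × 1024 bytes of UTF-8 is an assumption of this sketch.

```python
import json

# Assumption: "16 KB" means 16 * 1024 bytes of UTF-8 encoded JSON.
MAX_CONTEXT_BYTES = 16 * 1024


def serialize_context(context: dict) -> str:
    """Serialize Context compactly and reject payloads over the size limit."""
    payload = json.dumps(context, separators=(",", ":"), ensure_ascii=False)
    size = len(payload.encode("utf-8"))
    if size > MAX_CONTEXT_BYTES:
        raise ValueError(f"Context is {size} bytes; limit is {MAX_CONTEXT_BYTES}")
    return payload
```

Rejecting oversized payloads, rather than truncating them, keeps `Context` valid JSON and pushes callers toward storing references such as `file_id` or `blob_ref`.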
## 6) Activity and Event Catalog Governance
`EventID` values MUST come from the ValueSet library, not hardcoded inline strings.
- Source file: `app/Libraries/Data/event_id.json`
- Runtime access: `\App\Libraries\ValueSet::getRaw('event_id')`
- Optional label lookup for reporting: `\App\Libraries\ValueSet::getLabel('event_id', $eventId)`
### 6.1 Allowed `ActivityID`
`CREATE`, `UPDATE`, `DELETE`, `READ`, `MERGE`, `SPLIT`, `CANCEL`, `REOPEN`, `VERIFY`, `AMEND`, `RETRACT`, `RELEASE`, `IMPORT`, `EXPORT`, `LOGIN`, `LOGOUT`, `LOCK`, `UNLOCK`, `RESET`
### 6.2 `EventID` naming pattern
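- Format: `<DOMAIN>_<OBJECT>_<ACTION>`. The object segment may be omitted when the domain is itself the object (for example `ORDER_CANCELLED`).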
- Character set: uppercase A-Z, numbers, underscore.
- Max length: 80.
- Examples: `PATIENT_DEMOGRAPHICS_UPDATED`, `ORDER_CANCELLED`, `AUTH_LOGIN_FAILED`.
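The naming pattern can be linted with a regular expression. This Python sketch is illustrative only; `is_valid_event_id` is a hypothetical helper, and requiring at least two underscore-separated segments is an assumption drawn from the examples above. It checks syntax only; membership in the `event_id` catalog still has to be verified separately.

```python
import re

# Uppercase letters and digits in underscore-separated segments.
# Assumption: at least two segments, as in ORDER_CANCELLED.
EVENT_ID_RE = re.compile(r"[A-Z0-9]+(?:_[A-Z0-9]+)+")
MAX_EVENT_ID_LENGTH = 80


def is_valid_event_id(event_id: str) -> bool:
    """Return True when event_id matches the naming pattern and length limit."""
    if len(event_id) > MAX_EVENT_ID_LENGTH:
        return False
    return EVENT_ID_RE.fullmatch(event_id) is not None
```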
### 6.3 Catalog lifecycle
- New `EventID` requires docs update and test coverage.
- New `EventID` must be added to `app/Libraries/Data/event_id.json` and deployed with cache refresh (`ValueSet::clearCache()`).
- Never repurpose an existing `EventID` to mean something else.
- Deprecated `EventID` remains queryable and documented for historical data.
## 7) Minimum Event Coverage (Must Implement)
### 7.1 `logpatient`
- `PATIENT_REGISTERED`, `PATIENT_DEMOGRAPHICS_UPDATED`, `PATIENT_MERGED`, `PATIENT_UNMERGED`
- `PATIENT_IDENTIFIER_UPDATED`, `PATIENT_CONSENT_UPDATED`, `PATIENT_INSURANCE_UPDATED`
- `VISIT_ADMITTED`, `VISIT_TRANSFERRED`, `VISIT_DISCHARGED`, `VISIT_STATUS_UPDATED`
### 7.2 `logorder`
- `ORDER_CREATED`, `ORDER_CANCELLED`, `ORDER_REOPENED`, `ORDER_TEST_ADDED`, `ORDER_TEST_REMOVED`
- `SPECIMEN_COLLECTED`, `SPECIMEN_RECEIVED`, `SPECIMEN_REJECTED`, `SPECIMEN_ALIQUOTED`, `SPECIMEN_DISPOSED`
- `RESULT_ENTERED`, `RESULT_UPDATED`, `RESULT_VERIFIED`, `RESULT_AMENDED`, `RESULT_RELEASED`, `RESULT_RETRACTED`, `RESULT_CORRECTED`
- `QC_RECORDED`, `QC_FAILED`, `QC_OVERRIDE_APPLIED`
### 7.3 `logmaster`
- `VALUESET_ITEM_CREATED`, `VALUESET_ITEM_UPDATED`, `VALUESET_ITEM_RETIRED`
- `TEST_DEFINITION_UPDATED`, `REFERENCE_RANGE_UPDATED`, `TEST_PANEL_MEMBERSHIP_UPDATED`
- `ANALYZER_CONFIG_UPDATED`, `INTEGRATION_CONFIG_UPDATED`, `CODING_SYSTEM_UPDATED`
- `USER_CREATED`, `USER_DISABLED`, `USER_PASSWORD_RESET`, `USER_ROLE_CHANGED`, `USER_PERMISSION_CHANGED`
- `SITE_CREATED`, `SITE_UPDATED`, `WORKSTATION_UPDATED`
### 7.4 `logsystem`
- `AUTH_LOGIN_SUCCESS`, `AUTH_LOGOUT_SUCCESS`, `AUTH_LOGIN_FAILED`, `AUTH_LOCKOUT_TRIGGERED`
- `TOKEN_ISSUED`, `TOKEN_REFRESHED`, `TOKEN_REVOKED`, `AUTHORIZATION_FAILED`
- `IMPORT_JOB_STARTED`, `IMPORT_JOB_FINISHED`, `EXPORT_JOB_STARTED`, `EXPORT_JOB_FINISHED`
- `JOB_STARTED`, `JOB_FINISHED`, `INTEGRATION_SYNC_STARTED`, `INTEGRATION_SYNC_FINISHED`
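- `AUDIT_ARCHIVE_EXECUTED`, `AUDIT_PURGE_EXECUTED`, `LEGAL_HOLD_APPLIED`, `LEGAL_HOLD_RELEASED`
- `AUDIT_WRITE_FAILED`, `AUDIT_CHECKSUM_CREATED`, `AUDIT_CHECKSUM_FAILED` (required by sections 8.2 and 9.4)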
## 8) Capture Rules (Application Behavior)
### 8.1 Write timing
- For mutating transactions, write audit record in the same DB transaction where feasible.
- If asynchronous logging is required, enqueue within transaction and process with at-least-once delivery.
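To illustrate the same-transaction rule, the sketch below uses Python's built-in `sqlite3` module with a deliberately reduced schema. The tables, the `rename_patient` function, and the column subset are simplifications for this example and not the production DDL from section 4. The point it shows is that the business update and the audit insert commit or roll back together.

```python
import json
import sqlite3

# In-memory database with a reduced, illustrative schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (PatientID TEXT PRIMARY KEY, NameLast TEXT);
    CREATE TABLE logpatient (
        LogPatientID INTEGER PRIMARY KEY AUTOINCREMENT,
        RecID TEXT NOT NULL, FldName TEXT, FldValuePrev TEXT, FldValueNew TEXT,
        EventID TEXT NOT NULL, ActivityID TEXT NOT NULL, Context TEXT NOT NULL
    );
    INSERT INTO patient VALUES ('PAT000123', 'Doe');
    """
)


def rename_patient(conn, patient_id, new_name, context):
    """Update the patient and write the audit row in one transaction."""
    # "with conn" commits on success and rolls back if anything raises,
    # so a failed audit write also undoes the business update.
    with conn:
        (old_name,) = conn.execute(
            "SELECT NameLast FROM patient WHERE PatientID = ?", (patient_id,)
        ).fetchone()
        conn.execute(
            "UPDATE patient SET NameLast = ? WHERE PatientID = ?",
            (new_name, patient_id),
        )
        conn.execute(
            "INSERT INTO logpatient (RecID, FldName, FldValuePrev, FldValueNew,"
            " EventID, ActivityID, Context)"
            " VALUES (?, 'NameLast', ?, ?, 'PATIENT_DEMOGRAPHICS_UPDATED', 'UPDATE', ?)",
            (patient_id, old_name, new_name, json.dumps(context)),
        )
```

If serializing `context` or the audit insert raises, the `UPDATE` is rolled back and the exception propagates to the caller, which matches the fail-the-request policy for compliance-critical writes.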
### 8.2 Failure policy
- Compliance-critical writes (patient, order, result, role/permission): fail request if audit write fails.
- Operational-only writes (non-critical job checkpoints): continue request, emit error log, retry in background.
- All audit write failures must produce `logsystem` event `AUDIT_WRITE_FAILED` with sanitized details.
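The failure policy could be centralized in the audit writer roughly as follows. This is a Python sketch for illustration; `write_audit`, `AuditWriteError`, the `writer`/`system_writer` callables, the in-memory `retry_queue`, and the `failed_event`/`error_type` context keys are all hypothetical names, and the domain list is a direct reading of the first bullet above.

```python
import logging
from collections import deque

logger = logging.getLogger("audit")

# Domains treated as compliance-critical in section 8.2.
CRITICAL_DOMAINS = {"patient", "order", "result", "role", "permission"}

# Stand-in for a real background retry queue.
retry_queue = deque()


class AuditWriteError(RuntimeError):
    """Raised to fail the request when a critical audit write fails."""


def write_audit(domain, row, writer, system_writer):
    """Write one audit row via writer(row), applying the failure policy."""
    try:
        writer(row)
        return True
    except Exception as exc:
        # Every failure is recorded in logsystem with sanitized details:
        # only the event code and exception type, never the row values.
        failure = {
            "EventID": "AUDIT_WRITE_FAILED",
            "Context": {
                "failed_event": row.get("EventID"),
                "error_type": type(exc).__name__,
            },
        }
        try:
            system_writer(failure)
        except Exception:
            logger.exception("could not record AUDIT_WRITE_FAILED")
        if domain in CRITICAL_DOMAINS:
            raise AuditWriteError("critical audit write failed") from exc
        # Operational-only write: log the error and retry in the background.
        logger.error("audit write failed for %s; queued for retry", domain)
        retry_queue.append(row)
        return False
```

When the critical write runs inside the business transaction, the `AUDIT_WRITE_FAILED` row likely needs a separate connection or transaction so that it survives the rollback.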
### 8.3 Diff policy
- Single-field change: set `FldName`, `FldValuePrev`, `FldValueNew`.
- Multi-field change: set `FldName = NULL`, leave `FldValuePrev` and `FldValueNew` null (or store a compact summary), and place the canonical diff in `Context.diff`.
- Bulk operations: include `batch_id`, `record_count`, sample `affected_ids` (capped), and source.
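A diff builder for the single-field and multi-field cases might look like the Python sketch below. It is illustrative only: `build_diff` is a hypothetical helper, and the `field`/`prev`/`new` shape of each `Context.diff` entry is an assumption, since this document does not fix the element format.

```python
def build_diff(before, after, audited_fields):
    """Map a before/after pair onto the audit columns.

    Returns None when no audited field changed.
    """
    changes = [
        {"field": name, "prev": before.get(name), "new": after.get(name)}
        for name in audited_fields
        if before.get(name) != after.get(name)
    ]
    if not changes:
        return None
    if len(changes) == 1:
        # Single-field change: use the dedicated columns.
        change = changes[0]
        return {
            "FldName": change["field"],
            "FldValuePrev": change["prev"],
            "FldValueNew": change["new"],
            "Context": {},
        }
    # Multi-field change: FldName stays NULL and the diff goes in Context.
    return {
        "FldName": None,
        "FldValuePrev": None,
        "FldValueNew": None,
        "Context": {"diff": changes},
    }
```

Passing an explicit `audited_fields` list keeps unaudited columns and whole-entity snapshots out of the diff, in line with section 5.3.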
## 9) Security and Privacy Controls
### 9.1 Never log
- Passwords, raw JWTs, API secrets, private keys, OTP values.
- Full clinical free text unless explicitly required by policy.
### 9.2 Masking rules
- Highly sensitive identifiers should be masked in `FldValuePrev` and `FldValueNew` when the full value is not required.
- Token-like strings should be fully removed and replaced with `[REDACTED]`.
- Use deterministic masking where correlation is needed (e.g., hash + prefix).
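The two masking behaviours above can be sketched in Python as follows, for illustration only. `redact_tokens` and `mask_identifier` are hypothetical helpers; the regular expression only targets JWT-shaped strings, and using a keyed HMAC-SHA-256 (rather than a plain hash) with a three-character prefix is a choice of this sketch, not something this document prescribes. Key storage and rotation are out of scope here.

```python
import hashlib
import hmac
import re

# Three dot-separated base64url segments, the usual JWT shape.
# This may over-match other dotted strings, which errs toward redaction.
JWT_RE = re.compile(r"[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}")


def redact_tokens(text: str) -> str:
    """Replace token-like strings entirely with [REDACTED]."""
    return JWT_RE.sub("[REDACTED]", text)


def mask_identifier(value: str, key: bytes, prefix_len: int = 3) -> str:
    """Deterministically mask an identifier: short prefix plus keyed hash.

    The same value and key always give the same mask, so masked rows can
    still be correlated without exposing the full identifier.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value[:prefix_len]}***{digest[:12]}"
```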
### 9.3 Access control
- Insert permissions only for API/service accounts.
- No update/delete privileges for regular runtime users.
- Read access to logs is role-restricted and audited.
### 9.4 Tamper evidence
- Enable DB audit on DDL changes to log tables.
- Store periodic checksum snapshots of recent log ranges in secure storage.
- Record checksum run outcomes in `logsystem` (`AUDIT_CHECKSUM_CREATED`, `AUDIT_CHECKSUM_FAILED`).
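A checksum snapshot over a range of log rows could be computed as in the Python sketch below, which is illustrative only. The helper names, the snapshot fields, the use of SHA-256 over canonical JSON, and the generic `LogID` key are assumptions of this sketch; the document does not specify the checksum algorithm.

```python
import hashlib
import json


def range_checksum(rows):
    """SHA-256 over canonical JSON of the rows, ordered by primary key."""
    digest = hashlib.sha256()
    for row in sorted(rows, key=lambda r: r["LogID"]):
        # sort_keys makes the serialization independent of dict ordering.
        line = json.dumps(row, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


def make_snapshot(table, rows):
    """Build the snapshot record to store in secure storage."""
    ids = [row["LogID"] for row in rows]
    return {
        "table": table,
        "first_id": min(ids),
        "last_id": max(ids),
        "row_count": len(rows),
        "sha256": range_checksum(rows),
    }


def verify_snapshot(snapshot, rows):
    """Return True when the current rows still match the stored snapshot."""
    return (
        snapshot["row_count"] == len(rows)
        and snapshot["sha256"] == range_checksum(rows)
    )
```

A later verification run re-reads the same ID range and compares it to the stored snapshot; a mismatch indicates that rows were altered, added, or removed after the snapshot was taken.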
## 10) Retention, Archive, and Purge
### 10.1 Default retention
- `logpatient`: 7 years
- `logorder`: 7 years
- `logmaster`: 5 years
- `logsystem`: 2 years
If regional policy requires longer periods, policy overrides these defaults.
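As an illustration of how the defaults and overrides combine, the Python sketch below computes the date before which rows become eligible for archive and purge. `purge_cutoff` and the `overrides` mapping are hypothetical; taking the longer of the default and the override reflects the "longer periods" wording above.

```python
from datetime import datetime, timezone

# Default retention in years, from section 10.1.
RETENTION_YEARS = {"logpatient": 7, "logorder": 7, "logmaster": 5, "logsystem": 2}


def purge_cutoff(table, now, overrides=None):
    """Return the datetime before which rows exceed retention.

    overrides maps table name to a regional retention period in years;
    it only takes effect when it is longer than the default.
    """
    years = max(RETENTION_YEARS[table], (overrides or {}).get(table, 0))
    try:
        return now.replace(year=now.year - years)
    except ValueError:
        # 29 February mapped onto a non-leap year: fall back to 28 February.
        return now.replace(year=now.year - years, day=28)
```

For example, `purge_cutoff("logsystem", datetime(2026, 3, 25, tzinfo=timezone.utc))` yields 25 March 2024 UTC, and an override of 10 years for the same table moves the cutoff back to 2016.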
### 10.2 Archive workflow
1. Select eligible rows by `LogDate` and legal-hold status.
2. Export to immutable archive format (compressed JSONL or parquet).
3. Verify checksums and row counts.
4. Write `AUDIT_ARCHIVE_EXECUTED` entry in `logsystem`.
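Steps 1 to 3 of the workflow above can be sketched in Python as follows, for illustration only. The function names are hypothetical, legal hold is modelled as a set of held `RecID` values (an assumption, since this document does not define how hold status is stored), and gzip-compressed JSONL stands in for the archive format.

```python
import gzip
import hashlib
import json
from datetime import datetime


def select_eligible(rows, cutoff, held_rec_ids):
    """Step 1: rows older than cutoff whose RecID is not under legal hold."""
    return [
        row
        for row in rows
        if datetime.fromisoformat(row["LogDate"]) < cutoff
        and row["RecID"] not in held_rec_ids
    ]


def archive_rows(rows, path):
    """Step 2: write rows as gzip-compressed JSONL.

    Returns (row_count, sha256 of the uncompressed lines).
    """
    digest = hashlib.sha256()
    count = 0
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as fh:
        for row in rows:
            line = json.dumps(row, sort_keys=True) + "\n"
            fh.write(line)
            digest.update(line.encode("utf-8"))
            count += 1
    return count, digest.hexdigest()


def verify_archive(path, expected_count, expected_sha256):
    """Step 3: re-read the archive and compare row count and checksum."""
    digest = hashlib.sha256()
    count = 0
    with gzip.open(path, "rt", encoding="utf-8", newline="\n") as fh:
        for line in fh:
            digest.update(line.encode("utf-8"))
            count += 1
    return count == expected_count and digest.hexdigest() == expected_sha256
```

Only after `verify_archive` returns `True` would the job write the `AUDIT_ARCHIVE_EXECUTED` entry; the returned count and checksum are natural candidates for its `Context`.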
### 10.3 Purge workflow
1. Require approval reference (`approved_by`, `change_ticket`).
2. Purge archived rows only.
3. Write `AUDIT_PURGE_EXECUTED` entry with table, date window, count, and archive reference.
## 11) Operational Monitoring
Track these SLIs/SLOs:
- Audit write success rate >= 99.9% for critical domains.
- P95 audit insert latency < 50 ms.
- Queue backlog age < 5 minutes (if async path is used).
- Zero unreviewed `AUDIT_WRITE_FAILED` older than 24 hours.
Alert on:
- Sustained write failures.
- Sudden drop in expected event volume.
- Purge/archive jobs without corresponding `logsystem` records.
## 12) Migration Strategy for Existing Logs
1. Inventory current columns and event vocabulary in all four tables.
2. Add missing canonical columns with nullable defaults.
3. Backfill required values (`AppID`, `SessionID`, `Context` minimum keys) where derivable.
4. Introduce canonical `EventID` mapping table for legacy names.
5. Enforce NOT NULL constraints only after backfill validation succeeds.
## 13) Testing Requirements
### 13.1 Automated tests
- Feature tests for representative endpoints must assert audit row creation.
- Assert table target, `ActivityID`, `EventID`, `RecID`, and required `Context` keys.
- Assert `EventID` exists in `\App\Libraries\ValueSet::getRaw('event_id')`.
- Add negative tests for audit failure policy (critical path blocks, non-critical path retries).
### 13.2 Test matrix minimum
- One success and one failure scenario per major domain (`patient`, `order`, `master`, `system`).
- One bulk operation scenario validating `batch_id` and counts.
- One security scenario validating redaction of sensitive fields.
## 14) Implementation Checklist (Phased)
### Phase 1 - Schema and constants
1. Create/align all four log tables to canonical schema.
2. Add shared enums/constants for `ActivityID` and `EventID`.
3. Add and maintain `app/Libraries/Data/event_id.json` as the `EventID` source of truth.
4. Add DB indexes listed in section 4.
### Phase 2 - Audit service
1. Implement centralized audit writer service.
2. Add helpers to normalize actor/device/session/context.
3. Add diff builder utility for single and multi-field changes.
### Phase 3 - Instrumentation
1. Instrument patient and order flows first (compliance-critical).
2. Instrument master and system flows.
3. Add fallback/retry path and `AUDIT_WRITE_FAILED` emission.
### Phase 4 - Validation and rollout
1. Add feature tests and failure-path tests.
2. Validate dashboards/queries for each table.
3. Release with runbook updates and retention job schedule.
## 15) Acceptance Criteria
The implementation is complete when all statements below are true:
- Every protected endpoint emits at least one canonical audit row.
- Each row has valid `ActivityID`, `EventID` (present in ValueSet `event_id`), `LogDate` (UTC), and non-empty `Context` with required keys.
- Sensitive values are redacted/masked per section 9.
- Archive and purge operations are fully traceable in `logsystem`.
- Tests cover critical success/failure paths and pass in CI.