clqms-be/docs/20251212002-database_design_review_opus.md

13 KiB

CLQMS Database Design Review Report

Prepared by: Claude OPUS
Date: December 12, 2025
Subject: Technical Assessment of Current Database Schema


Executive Summary

This report presents a technical review of the CLQMS (Clinical Laboratory Quality Management System) database schema based on analysis of 16 migration files containing approximately 45+ tables. While the current design is functional, several critical issues have been identified that impact data integrity, development velocity, and long-term maintainability.

Overall Assessment: The application will function, but the design causes significant developer friction and will create increasing difficulties as the system scales.


Critical Issues

1. Missing Foreign Key Constraints

Severity: 🔴 Critical

The database schema defines zero foreign key constraints. All relationships are implemented as integer columns without referential integrity.

Impact Description
Data Integrity Orphaned records when parent records are deleted
Data Corruption Invalid references can be inserted without validation
Performance Relationship logic must be enforced in application code
Debugging Difficult to trace data lineage across tables

Example: A patient can be deleted while their visits, orders, and results still reference the deleted InternalPID.


2. Test Definition Tables: Broken Relationships

Severity: 🔴 Critical — Impacts API Development

This issue directly blocks backend development. The test definition system spans 6 tables with unclear and broken relationships:

testdef         → Master test catalog (company-wide definitions)
testdefsite     → Site-specific test configurations  
testdeftech     → Technical settings (units, decimals, methods)
testdefcal      → Calculated test formulas
testgrp         → Test panel/profile groupings
testmap         → Host/Client analyzer code mappings

testdef table structure:

TestID (PK), Parent, TestCode, TestName, Description, DisciplineID, Method, ...

testdefsite table structure:

TestSiteID (PK), SiteID, TestSiteCode, TestSiteName, TestType, Description, ...

Caution

There is NO TestID column in testdefsite!
The relationship between master tests and site-specific configurations is undefined.

The assumed relationship appears to be matching TestCode = TestSiteCode, which is:

  • Fragile — codes can change or differ
  • Non-performant — string matching vs integer FK lookup
  • Undocumented — developers must guess

Developer Impact

Cannot create sample JSON payloads for API development.

To return a complete test with all configurations, we need to JOIN:

testdef 
  → testdefsite (HOW? No FK exists!)
    → testdeftech (via TestSiteID)
    → testdefcal (via TestSiteID)  
  → testgrp (via TestSiteID)
  → testmap (via TestSiteID)
  → refnum/refthold/refvset/reftxt (via TestSiteID)

What a Complete Test JSON Should Look Like

{
  "test": {
    "id": 1,
    "code": "GLU",
    "name": "Glucose",
    "discipline": "Chemistry",
    "method": "Hexokinase",
    "sites": [
      {
        "siteId": 1,
        "siteName": "Main Lab",
        "unit": "mg/dL",
        "decimalPlaces": 0,
        "referenceRange": { "low": 70, "high": 100 },
        "equipment": [
          { "name": "Cobas 6000", "hostCode": "GLU" }
        ]
      }
    ],
    "panelMemberships": ["BMP", "CMP"]
  }
}

What We're Forced to Create Instead

{
  "testdef": { "TestID": 1, "TestCode": "GLU", "TestName": "Glucose" },
  "testdefsite": { "TestSiteID": 1, "SiteID": 1, "TestSiteCode": "GLU" },
  "testdeftech": { "TestTechID": 1, "TestSiteID": 1, "Unit1": "mg/dL" },
  "refnum": { "RefNumID": 1, "TestSiteID": 1, "Low": 70, "High": 100 }
}

Problem: How does the API consumer know testdef.TestID=1 connects to testdefsite.TestSiteID=1? The relationship is implicit and undocumented.

Add TestID foreign key to testdefsite:

ALTER TABLE testdefsite ADD COLUMN TestID INT NOT NULL;
ALTER TABLE testdefsite ADD CONSTRAINT fk_testdefsite_testdef 
    FOREIGN KEY (TestID) REFERENCES testdef(TestID);

Deeper Problem: Over-Engineered Architecture

Warning

Even with TestID added, the test table design remains excessively complex and confusing.

Adding the missing foreign key fixes the broken link, but does not address the fundamental over-engineering. To retrieve ONE complete test for ONE site, developers must JOIN across 10 tables:

testdef                  ← "What is this test?"
  └── testdefsite        ← "Is it available at site X?"
       └── testdeftech   ← "What units/decimals at site X?"
       └── testdefcal    ← "Is it calculated at site X?"
       └── testgrp       ← "What panels is it in at site X?"
       └── testmap       ← "What analyzer codes at site X?"
       └── refnum        ← "Numeric reference ranges"
       └── refthold      ← "Threshold reference ranges"  
       └── refvset       ← "Value set references"
       └── reftxt        ← "Text references"

10 tables for one test at one site.

This design assumes maximum flexibility (every site configures everything differently), but creates:

  • Excessive query complexity — Simple lookups require 5+ JOINs
  • Developer confusion — Which table holds which data?
  • Maintenance burden — Changes ripple across multiple tables
  • API design friction — Difficult to create clean, intuitive endpoints

What a Simpler Design Would Look Like

Current (10 tables) Proposed (4 tables)
testdef tests
testdefsite + testdeftech + testdefcal test_configurations
refnum + refthold + refvset + reftxt test_reference_ranges (with type column)
testgrp test_panel_members
testmap (merged into test_configurations)

Recommendation

For long-term maintainability, consider a phased refactoring:

  1. Phase 1: Add TestID FK (immediate unblock)
  2. Phase 2: Create database VIEWs that flatten the structure for API consumption
  3. Phase 3: Evaluate consolidation of testdefsite/testdeftech/testdefcal into single table
  4. Phase 4: Consolidate 4 reference range tables into one with discriminator column

3. Data Type Mismatches Across Tables

Severity: 🔴 Critical

The same logical field uses different data types in different tables, making JOINs impossible.

Field Table A Type Table B Type
SiteID ordertest VARCHAR(15) site INT
OccupationID contactdetail VARCHAR(50) occupation INT
SpcType testdeftech INT refnum VARCHAR(10)
Country patient INT account VARCHAR(50)
City locationaddress INT account VARCHAR(150)

High-Priority Issues

4. Inconsistent Naming Conventions

Issue Examples
Mixed case styles InternalPID, CreateDate vs AreaCode, Parent
Cryptic abbreviations patatt, patcom, patidt, patvisitadt
Inconsistent ID naming InternalPID, PatientID, PatIdtID, PatComID
Unclear field names VSet, VValue, AspCnt, ME, DIDType

5. Inconsistent Soft-Delete Strategy

Multiple date fields used inconsistently:

Table Fields Used
patient CreateDate, DelDate
patvisit CreateDate, EndDate, ArchivedDate, DelDate
patcom CreateDate, EndDate
testdef CreateDate, EndDate

No documented standard for determining record state (active/deleted/archived/ended).


6. Duplicate Log Table Design

Three nearly identical audit tables exist:

  • patreglog
  • patvisitlog
  • specimenlog

Recommendation: Consolidate into single audit_log table.


Medium Priority Issues

7. Redundant Data Storage

Table Redundancy
patres Stores both InternalSID AND SID
patres Stores both TestSiteID AND TestSiteCode
patrestatus Duplicates SID from parent table

8. Incomplete Table Designs

patrelation table: Missing RelatedPatientID, RelationType
users table: Missing email, created_at, updated_at, status, last_login

9. Migration Script Bugs

File Issue
Specimen.php Creates specimen, drops specimens
CRMOrganizations.php Creates account/site, drops accounts/sites
PatRes.php Drops non-existent patrestech table

Recommendations

Immediate (Sprint 1-2)

  1. Add TestID to testdefsite — Unblocks API development
  2. Fix migration script bugs — Correct table names in down() methods
  3. Document existing relationships — Create ERD with assumed relationships

Short-Term (Sprint 3-6)

  1. Add foreign key constraints — Prioritize patient → visit → order → result chain
  2. Fix data type mismatches — Create migration scripts for type alignment
  3. Standardize soft-delete — Use deleted_at only, everywhere

Medium-Term (Sprint 7-12)

  1. Consolidate audit logs — Single polymorphic audit table
  2. Normalize addresses — Single addresses table
  3. Rename cryptic columns — Document and rename for clarity

Appendix: Tables by Migration

Migration Tables
PatientReg patient, patatt, patcom, patidt, patreglog, patrelation
PatVisit patvisit, patdiag, patvisitadt, patvisitlog
Location location, locationaddress
Users users
Contact contact, contactdetail, occupation, medicalspecialty
ValueSet valueset, valuesetdef
Counter counter
Specimen containerdef, specimen, specimenstatus, specimencollection, specimenprep, specimenlog
OrderTest ordertest, ordercom, orderatt, orderstatus
Test testdef, testdefsite, testdeftech, testdefcal, testgrp, testmap
RefRange refnum, refthold, refvset, reftxt
CRMOrganizations account, site
Organization discipline, department, workstation
Equipment equipmentlist, comparameters, devicelist
AreaGeo areageo
PatRes patres, patresflag, patrestatus, flagdef

Process Improvement: Database Design Ownership

Current Challenge

The issues identified in this report share a common theme: disconnect between database structure and API consumption patterns. Many design decisions optimize for theoretical flexibility rather than practical developer workflow.

This is not a critique of intent — the design shows careful thought about multi-site configurability. However, when database schemas are designed in isolation from the developers who build APIs on top of them, friction inevitably occurs.

Industry Best Practice

Modern software development teams typically follow this ownership model:

Role Responsibility
Product/Business Define what data needs to exist (requirements)
Backend Developers Design how data is structured (schema design)
Backend Developers Implement APIs that consume the schema
DBA (if applicable) Optimize performance, manage infrastructure

The rationale is simple: those who consume the schema daily are best positioned to design it.

Benefits of Developer-Owned Schema Design

Benefit Description
API-First Thinking Tables designed with JSON output in mind
Faster Iterations Schema changes driven by real implementation needs
Reduced Friction No translation layer between "what was designed" and "what we need"
Better Documentation Developers document what they build
Ownership & Accountability Single team owns the full stack

Recommendation

Consider transitioning database schema design ownership to the backend development team for future modules. This would involve:

  1. Requirements Gathering — Business/product defines data needs
  2. Schema Proposal — Backend team designs tables based on API requirements
  3. Review — Technical review with stakeholders before implementation
  4. Implementation — Backend team executes migrations and builds APIs

This approach aligns with how most modern development teams operate and would prevent the types of issues found in this review.

Note

This recommendation is not about past decisions, but about optimizing future development velocity. The backend team's daily work with queries, JOINs, and API responses gives them unique insight into practical schema design.


Conclusion

The test definition table structure is the most immediate blocker for development. Without a clear relationship between testdef and testdefsite, creating coherent API responses is not feasible. This should be prioritized in Sprint 1.

The broader issues (missing FKs, type mismatches) represent significant technical debt that will compound over time. Investment in database refactoring now prevents costly incidents later.


Report generated from migration file analysis in app/Database/Migrations/