Understanding the System
A comprehensive technical deep dive into PerchIQX's architecture, design decisions, and implementation patterns.
This guide helps developers, contributors, and curious engineers understand why PerchIQX is built the way it is. We'll explore the architecture, examine key technical decisions, review implementation details, and walk through real challenges we solved.
🎯 What You'll Learn
- Architecture patterns - Hexagonal architecture, DDD, and dependency injection
- Design rationale - Why we chose each approach and what trade-offs we made
- Technical implementation - Real code examples with explanations
- Testing philosophy - How we achieved 407 comprehensive tests
- Problem-solving - Real bugs and how we debugged them
- Semantic intent patterns - Observable anchoring and intent preservation
Project Overview
What is PerchIQX?
PerchIQX is an MCP (Model Context Protocol) server that provides AI-assisted database intelligence for Cloudflare D1 databases. It's a reference implementation demonstrating semantic intent patterns and hexagonal architecture.
Core capabilities:
- AI agents (like Claude) can understand database schemas through natural language
- 407 passing tests demonstrate enterprise-grade engineering rigor
- Showcases advanced patterns: DDD, hexagonal architecture, semantic anchoring
- Production-ready with multi-environment support (dev/staging/prod)
Value proposition:
- Accelerates database development by giving AI context about your schema
- Prevents common database issues through automated validation
- Provides intelligent optimization recommendations
- Maintains semantic integrity across environments
Tech stack: TypeScript, Node.js 20+, Cloudflare D1, MCP SDK, Vitest
Technical Architecture
Hexagonal Architecture (Ports & Adapters)
┌────────────────────────────────────────────┐
│  Presentation Layer (MCP)                  │
│  • 5 MCP Tools (analyze, validate, etc.)   │
│  • Protocol handling (stdio transport)     │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│  Application Layer (Use Cases)             │
│  • AnalyzeSchemaUseCase                    │
│  • GetRelationshipsUseCase                 │
│  • ValidateSchemaUseCase                   │
│  • SuggestOptimizationsUseCase             │
│  • CompareSchemasUseCase                   │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│  Domain Layer (Business Logic)             │
│  • Entities: DatabaseSchema, TableInfo     │
│  • Services: SchemaAnalyzer, etc.          │
│  • Value Objects: Environment enum         │
│  • Ports: ICloudflareD1Repository          │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│  Infrastructure Layer (Adapters)           │
│  • CloudflareD1Repository                  │
│  • CloudflareAPIClient (HTTP)              │
│  • InMemoryCacheProvider                   │
│  • DatabaseConfig (env routing)            │
└────────────────────────────────────────────┘

Why Hexagonal?
- Testability - Domain logic tested without HTTP/DB dependencies
- Flexibility - Swap D1 for PostgreSQL by changing only Infrastructure layer
- Maintainability - Clear boundaries, changes localized to specific layers
- AI-Friendly - Clear structure helps AI understand intent
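To make the flexibility point concrete, here is a minimal sketch of the port/adapter split. `ICloudflareD1Repository` and `fetchTableDetails` come from the repository examples later in this guide; the PostgreSQL adapter (and the import path) is hypothetical.

```typescript
import { TableInfo } from './domain/entities/TableInfo'; // path assumed from the directory layout below

// The Domain layer owns the port (the real interface likely declares more methods).
export interface ICloudflareD1Repository {
  fetchTableDetails(databaseId: string): Promise<TableInfo[]>;
}

// Swapping Cloudflare D1 for PostgreSQL means writing one new Infrastructure
// adapter against the same port; Domain and Application code stay untouched.
export class PostgresSchemaRepository implements ICloudflareD1Repository {
  async fetchTableDetails(databaseId: string): Promise<TableInfo[]> {
    // Hypothetical: query information_schema instead of sqlite_master
    return [];
  }
}
```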
Directory Structure
src/
├── domain/              # Pure business logic (212 tests)
│   ├── entities/        # DatabaseSchema, TableInfo, Column, etc.
│   ├── services/        # SchemaAnalyzer, RelationshipAnalyzer
│   ├── repositories/    # ICloudflareD1Repository (port interface)
│   └── value-objects/   # Environment enum
├── application/         # Use case orchestration (35 tests)
│   ├── use-cases/       # 5 use cases mapping to MCP tools
│   └── ports/           # ICacheProvider interface
├── infrastructure/      # External adapters (64 tests)
│   ├── adapters/        # CloudflareD1Repository, Cache
│   ├── config/          # CloudflareConfig, DatabaseConfig
│   └── http/            # CloudflareAPIClient
├── presentation/        # MCP protocol layer (13 tests)
│   └── mcp/             # D1DatabaseMCPServer
└── index.ts             # Composition root (DI wiring)

Dependency Flow (Dependency Inversion)
Presentation ──► Application ──► Domain
      │               │            ▲
      └───────────────┼────────────┘
                      │
               Infrastructure
             (implements ports)

Key principle: All layers depend on Domain abstractions, never on Infrastructure details.
Key Design Decisions & Trade-offs
Why Hexagonal Architecture over MVC/3-Tier?
Decision: Implemented full hexagonal architecture with DDD patterns
Rationale:
- Testability - Can test domain logic without mocking HTTP/database
- Flexibility - Infrastructure (Cloudflare D1) is swappable
- Clear boundaries - Business logic isolated from technical concerns
- Reference implementation - Demonstrates advanced patterns
Trade-off:
- ✅ More maintainable, easier to extend
- ✅ Safer refactoring with comprehensive tests
- ❌ More files/complexity upfront
Code reference: src/index.ts - Composition root showing DI
Semantic Anchoring Pattern
Decision: Use semantic markers (observable properties) over technical metrics
Example:
// ✅ SEMANTIC: Based on observable schema properties
const needsIndex = table.hasForeignKey() && !table.hasIndexOnForeignKey()

// ❌ STRUCTURAL: Based on runtime metrics
const needsIndex = table.rowCount > 10000 && table.queryCount > 100

Rationale:
- Predictable - Recommendations based on schema structure, not runtime behavior
- Reproducible - Same schema = same recommendations
- Intent preservation - Environment semantics maintained through transformations
Code reference: OptimizationService.ts
Multi-Environment Support
Decision: Environment enum (development, staging, production) as value object
Rationale:
- Type safety - Compiler catches typos ("prod" vs "production")
- Semantic anchor - Environment drives database selection, never overridden
- Intent preservation - User-specified environment maintained through pipeline
Implementation:
// src/domain/value-objects/Environment.ts
export enum Environment {
DEVELOPMENT = 'development',
STAGING = 'staging',
PRODUCTION = 'production'
}
// Parsing with validation
export function parseEnvironment(value: string): Environment {
// Throws error for invalid values
}

Caching Strategy
Decision: In-memory cache with 10-minute TTL for schema analysis
Rationale:
- Performance - Avoid repeated D1 API calls for same environment
- Cost optimization - D1 API has rate limits
- Simple - No external Redis dependency
Trade-off:
- ✅ Fast, simple, no dependencies
- ✅ Sufficient for single MCP server
- ❌ Cache not shared across processes (acceptable for this use case)
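A rough sketch of the cache-aside flow this decision implies. `cache`, `repository`, `schemaAnalyzer`, and `databaseConfig` are the instances wired in the composition root shown below; the cache key format and the `getDatabaseId` accessor are illustrative, not the project's actual names.

```typescript
const CACHE_TTL_SECONDS = 10 * 60; // the 10-minute TTL from the decision above

async function getCachedSchemaAnalysis(environment: Environment) {
  const cacheKey = `schema-analysis:${environment}`; // illustrative key format

  const cached = await cache.get<SchemaAnalysisResult>(cacheKey);
  if (cached) return cached; // skip the repeated D1 API round trip

  const databaseId = databaseConfig.getDatabaseId(environment); // hypothetical accessor
  const tables = await repository.fetchTableDetails(databaseId);
  const result = schemaAnalyzer.analyzeSchema(tables);

  await cache.set(cacheKey, result, CACHE_TTL_SECONDS);
  return result;
}
```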
Implementation Highlights
Composition Root (Dependency Injection)
Location: src/index.ts
What it does: Wires all dependencies in a single place
// Infrastructure layer
const apiClient = new CloudflareAPIClient(cloudflareConfig);
const repository = new CloudflareD1Repository(apiClient, databaseConfig);
const cache = new InMemoryCacheProvider();
// Domain layer
const schemaAnalyzer = new SchemaAnalyzer();
const relationshipAnalyzer = new RelationshipAnalyzer();
// Application layer
const analyzeSchemaUseCase = new AnalyzeSchemaUseCase(
repository,
schemaAnalyzer,
databaseConfig,
cache
);
// Presentation layer
const mcpServer = new D1DatabaseMCPServer(
analyzeSchemaUseCase,
getRelationshipsUseCase,
// ... other use cases
);

Benefits:
- Single source of truth - All dependencies wired in one place
- Explicit - Easy to see what depends on what
- Testable - Can inject mocks for testing
Pure Domain Logic (No Infrastructure Dependencies)
Location: src/domain/services/SchemaAnalyzer.ts
export class SchemaAnalyzer {
/**
* Analyze overall schema statistics
*
* Semantic: High-level schema health metrics
* NO infrastructure dependencies - pure business logic
*/
analyzeSchema(tables: TableInfo[]): SchemaAnalysisResult {
return {
totalTables: tables.length,
tablesWithPrimaryKeys: tables.filter(t => t.hasPrimaryKey()).length,
tablesWithForeignKeys: tables.filter(t => t.hasForeignKeys()).length,
// ... pure calculations on domain entities
};
}
}

Why this matters:
- Testable - No mocking HTTP/DB needed
- Fast - Tests run in milliseconds
- Maintainable - Business rules centralized
MCP Tool Registration
Location: src/presentation/mcp/MCPServer.ts
Exposes 5 domain capabilities as MCP tools for AI agents:
{
name: 'analyze_database_schema',
description: 'Analyze D1 database schema structure...',
inputSchema: {
type: 'object',
properties: {
environment: {
type: 'string',
enum: ['development', 'staging', 'production']
},
includeSamples: { type: 'boolean', default: true },
maxSampleRows: { type: 'number', default: 5 }
},
required: ['environment']
}
}

Benefits:
- AI-friendly - Claude can call these tools naturally
- Type-safe - JSON schema validates inputs
- Self-documenting - Clear descriptions for AI
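Behind the registration above, each tool call is routed to its use case. A hedged sketch of that dispatch, following the MCP SDK's request-handler pattern; the actual server's wiring may differ, and `server` stands for the SDK `Server` instance inside D1DatabaseMCPServer.

```typescript
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'analyze_database_schema') {
    const args = request.params.arguments as {
      environment: string;
      includeSamples?: boolean;
      maxSampleRows?: number;
    };

    // Parse the semantic anchor first so invalid environments fail fast.
    const environment = parseEnvironment(args.environment);
    const result = await analyzeSchemaUseCase.execute({ environment });

    return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
  }

  throw new Error(`Unknown tool: ${request.params.name}`);
});
```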
Observable Property Pattern
Location: src/domain/entities/TableInfo.ts
export class TableInfo {
hasPrimaryKey(): boolean {
// Observable: Primary key columns directly visible
return this.columns.some(col => col.isPrimaryKey);
}
hasIndexOnColumn(columnName: string): boolean {
// Observable: Index presence is directly observable
return this.indexes.some(idx =>
idx.columns.includes(columnName)
);
}
}

Why this matters:
- Semantic anchoring - Decisions based on observable schema, not inferred behavior
- Predictable - Same schema always produces same results
- Testable - Easy to verify with unit tests
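A short unit test in that spirit; the TableInfo and Column constructor arguments mirror the domain test shown later, and the assumption that the third Column argument is the primary-key flag is ours.

```typescript
import { describe, it, expect } from 'vitest';

describe('TableInfo observable properties', () => {
  it('reports a primary key when a column is flagged as one', () => {
    const table = new TableInfo(
      'users',
      'table',
      [new Column('id', 'INTEGER', true)], // assumption: third argument is the primary-key flag
      [],                                  // indexes
      []                                   // remaining constructor args as in the test below
    );

    expect(table.hasPrimaryKey()).toBe(true);
    expect(table.hasIndexOnColumn('id')).toBe(false); // no indexes were supplied
  });
});
```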
Testing Strategy
Test Distribution
Total: 407 tests (all passing ✅)
| Layer | Tests | Strategy |
|---|---|---|
| Domain | 212 | Pure unit tests, no mocks |
| Infrastructure | 64 | Mock HTTP responses |
| Application | 35 | Mock repository/services |
| Presentation | 13 | MCP protocol tests |
| Integration | 15 | End-to-end flows |
| Value Objects | 59 | Enum validation |
| Other | 9 | Utility/config tests |
Domain Layer Tests (No Mocks Needed)
Pure business logic with no infrastructure dependencies:
describe('SchemaAnalyzer', () => {
it('should detect tables without primary keys', () => {
const table = new TableInfo(
'users',
'table',
[new Column('name', 'TEXT', false)], // no primary key
[],
[]
);
const analyzer = new SchemaAnalyzer();
const result = analyzer.analyzeSchema([table]);
expect(result.tablesWithoutPrimaryKeys).toBe(1);
});
});

Why this works:
- Pure business logic - no infrastructure dependencies
- Fast - no HTTP/DB calls
- Deterministic - same inputs = same outputs
Infrastructure Layer Tests (Mock HTTP)
Test adapters with mocked external services:
describe('CloudflareD1Repository', () => {
it('should fetch and parse table schema', async () => {
const mockClient = {
query: vi.fn().mockResolvedValue({
results: [{
name: 'users',
type: 'table',
sql: 'CREATE TABLE users (id INTEGER PRIMARY KEY)'
}]
})
};
const repository = new CloudflareD1Repository(
mockClient,
mockConfig
);
const tables = await repository.fetchTableDetails('db-id');
expect(tables).toHaveLength(1);
expect(tables[0].name).toBe('users');
});
});

Testing Commands
npm test # Run all 407 tests
npm run test:watch # TDD mode
npm run test:coverage # Coverage report (>90% target)
npm run test:ui # Visual test runner

Technical Deep Dive: Common Questions
Q: How does the composition root implement dependency injection?
A: The composition root in src/index.ts manually wires all dependencies following the dependency inversion principle.
Pattern:
// 1. Infrastructure (outer layer) - knows about external services
const apiClient = new CloudflareAPIClient(config);
const repository = new CloudflareD1Repository(apiClient, dbConfig);
// 2. Domain (inner layer) - no dependencies on infrastructure
const schemaAnalyzer = new SchemaAnalyzer();
// 3. Application (middle) - orchestrates domain + infrastructure
const analyzeSchemaUseCase = new AnalyzeSchemaUseCase(
repository, // Implements ICloudflareD1Repository port
schemaAnalyzer, // Domain service
dbConfig,
cache
);
// 4. Presentation (outer) - depends on application
const mcpServer = new D1DatabaseMCPServer(analyzeSchemaUseCase);

Benefits:
- Single source of truth for dependency graph
- Constructor injection makes dependencies explicit
- Easy to swap implementations for testing
- Compiler catches missing dependencies
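For instance, a test can construct the use case with a fake repository instead of the real adapter. The `execute()` argument shape follows the MCP tool schema shown earlier; `fakeDatabaseConfig` is a hypothetical test fixture, so treat this as a sketch.

```typescript
import { it, expect, vi } from 'vitest';

it('runs the use case against a fake repository', async () => {
  const fakeRepository = {
    fetchTableDetails: vi.fn().mockResolvedValue([]),
  } as unknown as ICloudflareD1Repository;

  const useCase = new AnalyzeSchemaUseCase(
    fakeRepository,            // stands in for CloudflareD1Repository
    new SchemaAnalyzer(),
    fakeDatabaseConfig,        // hypothetical test fixture
    new InMemoryCacheProvider()
  );

  const result = await useCase.execute({ environment: Environment.DEVELOPMENT });
  expect(result).toBeDefined();
});
```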
Q: How do you parse SQLite CREATE TABLE statements?
A: We extract the schema structure from the sql column of sqlite_master using progressive regex parsing.
Challenge: D1 returns raw SQL:
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
FOREIGN KEY (org_id) REFERENCES orgs(id)
)

Solution - Multi-step parsing:
1. Extract column definitions
private parseColumns(sql: string): Column[] {
const regex = /(\w+)\s+(TEXT|INTEGER|REAL|BLOB)\s*(PRIMARY KEY|NOT NULL)*/gi;
// Returns: [Column('id', 'INTEGER', primaryKey=true), ...]
}

2. Extract foreign keys
private parseForeignKeys(sql: string): ForeignKey[] {
const fkRegex = /FOREIGN KEY\s*\((\w+)\)\s*REFERENCES\s*(\w+)\s*\((\w+)\)/gi;
// Returns: [ForeignKey('org_id', 'orgs', 'id'), ...]
}

Robustness:
- Handles inline vs constraint syntax
- Supports composite primary keys
- Handles CASCADE/SET NULL actions
- 24 tests verify parsing accuracy
Code reference: CloudflareD1Repository.ts
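To see how the two steps compose, here is a runnable sketch using the same regexes. The real repository builds Column/ForeignKey entities rather than plain objects and covers more edge cases (composite keys, constraint syntax), so this is illustrative only.

```typescript
const sql = `CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  org_id INTEGER,
  FOREIGN KEY (org_id) REFERENCES orgs(id)
)`;

// Step 1: column definitions
const columnRegex = /(\w+)\s+(TEXT|INTEGER|REAL|BLOB)\s*(PRIMARY KEY|NOT NULL)*/gi;
const columns = [...sql.matchAll(columnRegex)].map(m => ({
  name: m[1],
  type: m[2].toUpperCase(),
  primaryKey: /PRIMARY KEY/i.test(m[3] ?? ''),
}));
// -> [{ name: 'id', type: 'INTEGER', primaryKey: true }, { name: 'name', ... }, { name: 'org_id', ... }]

// Step 2: foreign keys
const fkRegex = /FOREIGN KEY\s*\((\w+)\)\s*REFERENCES\s*(\w+)\s*\((\w+)\)/gi;
const foreignKeys = [...sql.matchAll(fkRegex)].map(m => ({
  fromColumn: m[1],
  toTable: m[2],
  toColumn: m[3],
}));
// -> [{ fromColumn: 'org_id', toTable: 'orgs', toColumn: 'id' }]
```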
Q: Why in-memory cache instead of Redis?
A: In-memory caching is simpler and sufficient for this use case.
Implementation:
export class InMemoryCacheProvider implements ICacheProvider {
private cache = new Map<string, CacheEntry>();
async set(key: string, value: unknown, ttl: number): Promise<void> {
this.cache.set(key, {
value,
expiresAt: Date.now() + (ttl * 1000)
});
}
async get<T>(key: string): Promise<T | null> {
const entry = this.cache.get(key);
if (!entry || Date.now() > entry.expiresAt) {
return null;
}
return entry.value as T;
}
}

Trade-off analysis:
In-memory cache:
- ✅ Simple - no external dependencies
- ✅ Fast - no network latency
- ✅ Sufficient for single MCP server
- ❌ Not shared across processes
- ❌ Lost on restart
Redis cache:
- ✅ Shared across processes
- ✅ Persistent
- ❌ Additional ops complexity
- ❌ Network latency
- ❌ Overkill for this use case
Decision: For single MCP server with infrequent schema changes, in-memory is optimal.
Q: How do you prevent semantic violations?
A: Multi-layer protection ensures environment semantics are never overridden.
The problem:
// ❌ SEMANTIC VIOLATION
async analyzeSchema(environment: Environment) {
if (quickMode) {
environment = Environment.DEVELOPMENT; // Overrides user intent!
}
}

Protection layers:
1. Immutability
export class DatabaseConfig {
constructor(private readonly databases: Map<Environment, DatabaseInfo>) {
Object.freeze(this); // Prevent mutations
}
}

2. Type system
analyzeSchemaUseCase.execute({
environment: Environment.PRODUCTION // ✅ Type-safe
});
analyzeSchemaUseCase.execute({
environment: "prod" // โ Compiler error!
}

3. Runtime validation
export function parseEnvironment(value: string): Environment {
if (!Object.values(Environment).includes(value as Environment)) {
throw new Error(`Invalid environment: ${value}`);
}
return value as Environment;
}

4. Governance rules - SEMANTIC_ANCHORING_GOVERNANCE.md
Q: What's the ICE Score pattern?
A: ICE (Impact, Confidence, Ease) helps prioritize schema differences when comparing environments.
Problem: When comparing dev → prod, you might find 50+ differences. Which are most important?
Solution:
export class ICEScore {
constructor(
public readonly impact: number, // 1-10: Business impact
public readonly confidence: number, // 1-10: How sure are we?
public readonly ease: number // 1-10: How easy to fix?
) {}
getScore(): number {
return (this.impact * this.confidence * this.ease) / 100;
}
}

Example:
| Difference | Impact | Confidence | Ease | Score | Priority |
|---|---|---|---|---|---|
| Missing PK | 10 | 10 | 3 | 3.0 | HIGH |
| Missing FK index | 8 | 9 | 7 | 5.04 | HIGH |
| New table | 7 | 5 | 4 | 1.4 | MEDIUM |
Benefits:
- Prioritizes critical changes over nice-to-haves
- Helps plan migrations
- Data-driven decision making
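Plugging the table's rows into getScore() confirms the prioritization:

```typescript
const missingPrimaryKey = new ICEScore(10, 10, 3);
console.log(missingPrimaryKey.getScore()); // (10 * 10 * 3) / 100 = 3.0

const missingFkIndex = new ICEScore(8, 9, 7);
console.log(missingFkIndex.getScore());    // (8 * 9 * 7) / 100 = 5.04
```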
Real-World Challenges & Solutions
Challenge: Duplicate Optimization Recommendations
Symptom: suggest_schema_optimizations returning duplicate recommendations
Root cause: Checking foreign keys twice - once via relationships array, once via table.foreignKeys
The bug:
// BUGGY CODE
for (const rel of relationships) {
// Check via relationships
if (!table.hasIndexOnColumn(rel.fromColumn)) {
optimizations.push(/* recommendation */);
}
}
for (const table of tables) {
for (const fk of table.foreignKeys) {
// Check same FKs again! DUPLICATE!
if (!table.hasIndexOnColumn(fk.fromColumn)) {
optimizations.push(/* recommendation */);
}
}
}

The fix:
// FIXED CODE
const seenColumns = new Set<string>(); // Deduplication
for (const rel of relationships) {
const key = `${rel.fromTable}.${rel.fromColumn}`;
if (!table.hasIndexOnColumn(rel.fromColumn) && !seenColumns.has(key)) {
optimizations.push(/* recommendation */);
seenColumns.add(key);
}
}

Lesson learned: Domain modeling matters - Relationship is derived from ForeignKey, and that should have been clearer in the design.
Prevention: Added test:
it('should not generate duplicate recommendations', () => {
const optimizations = service.analyzeIndexNeeds(tables, relationships);
const unique = new Set(optimizations.map(o => `${o.tableName}.${o.columnName}`));
expect(optimizations.length).toBe(unique.size);
});

Challenge: Parsing Different SQL Syntax Variations
Problem: SQLite CREATE TABLE has many syntax variations:
-- Inline primary key
CREATE TABLE users (id INTEGER PRIMARY KEY);
-- Constraint primary key
CREATE TABLE users (id INTEGER, PRIMARY KEY (id));
-- Composite primary key
CREATE TABLE user_roles (user_id INTEGER, role_id INTEGER, PRIMARY KEY (user_id, role_id));
-- Foreign key variations
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE

Solution: Progressive regex parsing with comprehensive test coverage
Testing strategy:
describe('SQL Parsing', () => {
it('should parse inline primary key', () => { /* ... */ });
it('should parse constraint primary key', () => { /* ... */ });
it('should parse composite primary key', () => { /* ... */ });
it('should parse foreign key with CASCADE', () => { /* ... */ });
});

Trade-off:
- ✅ Handles 90% of common SQLite syntax
- ✅ Fast - regex is performant
- ❌ Fragile to unusual syntax
- Could improve with proper SQL parser library
Semantic Intent Patterns
Semantic Over Structural
Principle: Base behavior on observable meaning, not technical characteristics
// ✅ SEMANTIC
const needsIndex = table.hasForeignKey() && !table.hasIndexOnForeignKey();

// ❌ STRUCTURAL
const needsIndex = table.rowCount > 10000;

Intent Preservation
Principle: Maintain semantic contracts through transformations
// ✅ Environment preserved
const schema = await repository.fetchSchema(environment);

// ❌ Intent override
if (quickMode) environment = Environment.DEVELOPMENT;

Observable Anchoring
Principle: Decisions based on directly observable properties
// ✅ Observable
const relationships = parseForeignKeys(sqliteMaster);

// ❌ Inferred
const relationships = inferFromQueryPatterns(logs);

Engineering Principles Demonstrated
1. Domain-Driven Design
Rich domain entities with behavior:
export class TableInfo {
hasPrimaryKey(): boolean { /* logic */ }
hasIndexOnColumn(name: string): boolean { /* logic */ }
}

Clear bounded contexts:
- Schema Context (tables, columns, constraints)
- Relationship Context (foreign keys, references)
- Optimization Context (recommendations, improvements)
2. SOLID Principles
Single Responsibility:
- SchemaAnalyzer - schema analysis only
- RelationshipAnalyzer - relationship extraction only
- OptimizationService - optimization recommendations only
Dependency Inversion:
- All layers depend on domain abstractions
- ICloudflareD1Repository interface
- ICacheProvider interface
Interface Segregation:
- Small, focused interfaces
- Clients depend only on what they need
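For example, the cache port implied by the InMemoryCacheProvider shown earlier is exactly such a small, focused interface (signatures inferred from that adapter, so treat this as a sketch):

```typescript
export interface ICacheProvider {
  set(key: string, value: unknown, ttl: number): Promise<void>;
  get<T>(key: string): Promise<T | null>;
}
```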
3. Testing Best Practices
- 407 comprehensive tests across all layers
- Fast test suite (3.6 seconds)
- No mocks in domain layer
- Integration tests verify end-to-end flows
4. Documentation Quality
- Extensive README with examples
- Architecture documentation
- Semantic anchoring governance
- Comments explain "why" not "what"
Learning Resources
Key Files to Study
Architecture:
- src/index.ts - Composition root
- ARCHITECTURE.md - Design documentation
Domain Layer:
- src/domain/services/SchemaAnalyzer.ts - Pure business logic
- src/domain/entities/TableInfo.ts - Rich domain entity
Infrastructure:
- src/infrastructure/adapters/CloudflareD1Repository.ts - SQL parsing
Governance:
- SEMANTIC_ANCHORING_GOVERNANCE.md - Pattern rules
Related Projects
Part of Semantic Intent ecosystem:
- Semantic Wake Intelligence - Temporal reasoning
- Semantic Foragecast Engine - Multimedia pipeline
All demonstrate similar patterns applied to different domains.
Quick Reference Stats
- 407 passing tests (100% pass rate)
- 4 architectural layers (hexagonal)
- 5 MCP tools for AI agents
- 10-minute cache TTL for schema analysis
- 3 environments supported (dev/staging/prod)
- Node.js 20+ minimum
- TypeScript 5.9 for type safety
- 3.6 seconds to run full test suite
Contributing
Understanding the system architecture and design decisions is the first step to contributing effectively.
Next steps:
- Read ARCHITECTURE.md
- Review SEMANTIC_ANCHORING_GOVERNANCE.md
- Study the composition root in src/index.ts
- Run tests: npm test
- Check CONTRIBUTING.md
This guide helps you understand not just what PerchIQX does, but why it's built this way and how the engineering decisions were made.
