Overview
Effective monitoring is crucial for a real-time trading platform like Opinix Trade. This guide covers monitoring strategies, tools, and best practices.
Monitoring helps detect issues before they impact users and provides insights for optimization.
Key Metrics to Monitor
Application Metrics
Request rate & latency
Error rates
WebSocket connections
Queue depth & processing time
Business Metrics
Active orders
Trade volume
User activity
Order matching latency
Infrastructure Metrics
CPU & memory usage
Database connections
Redis memory
Network throughput
User Experience
Page load time
Time to interactive
WebSocket latency
Error rates by endpoint
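Before these metrics reach a backend like Prometheus or Datadog, they can be accumulated in-process. A minimal sketch of a latency/error recorder, with illustrative names that are not part of the Opinix Trade codebase:

```typescript
// Tracks request latency and error rate so the metrics above can be
// logged or exported on an interval.
class MetricsRecorder {
  private latencies: number[] = [];
  private errors = 0;
  private total = 0;

  // Record one request: latency in ms and whether it failed
  record(latencyMs: number, isError: boolean): void {
    this.latencies.push(latencyMs);
    this.total += 1;
    if (isError) this.errors += 1;
  }

  // Error rate as a fraction of all recorded requests
  errorRate(): number {
    return this.total === 0 ? 0 : this.errors / this.total;
  }

  // p-th percentile latency (nearest-rank method)
  percentile(p: number): number {
    if (this.latencies.length === 0) return 0;
    const sorted = [...this.latencies].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.max(0, rank)];
  }
}
```

One recorder per endpoint (or per label set) is enough to report request rate, p95 latency, and error rate per endpoint.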
Logging Strategy
Opinix Trade includes a shared @opinix/logger package used across services.
Logger Configuration
Based on its dependencies, the logger package appears to be built on Winston:
```typescript
import { logger } from '@opinix/logger';

// Log levels: error, warn, info, http, verbose, debug
logger.info('Order placed', {
  orderId: '123',
  userId: 'user456',
  eventId: 'event789'
});

logger.error('Failed to process order', {
  error: err.message,
  stack: err.stack,
  orderId: '123'
});
```
Best Practices

Structured Logging
Always use structured logging with context:

```typescript
// Good - Structured
logger.info('Order matched', {
  orderId: order.id,
  price: order.price,
  quantity: order.quantity,
  side: order.side,
  eventId: order.eventId,
  userId: order.userId,
  matchTime: Date.now()
});

// Bad - Unstructured
logger.info(`Order ${order.id} matched at ${order.price}`);
```
Benefits:
Easier to search and filter
Better for log aggregation
Machine-readable
Enables analytics
Log Levels

Use appropriate log levels:

```typescript
// ERROR - Critical issues requiring immediate attention
logger.error('Database connection failed', { error });

// WARN - Issues that should be investigated
logger.warn('Queue depth exceeds threshold', { depth: 1000 });

// INFO - Important business events
logger.info('Order placed', { orderId, eventId });

// HTTP - HTTP requests (logged automatically by the morgan middleware)

// DEBUG - Detailed diagnostic information
logger.debug('Processing order', { order });
```
Context Injection

Add request context to all logs:

```typescript
// Express middleware
app.use((req, res, next) => {
  req.logger = logger.child({
    requestId: req.id,
    userId: req.user?.id,
    ip: req.ip
  });
  next();
});

// Use in routes
app.post('/order', (req, res) => {
  req.logger.info('Order received', { order: req.body });
});
```
Sentry - Error Tracking

Sentry provides real-time error tracking and performance monitoring.

Installation:

```shell
npm install @sentry/node @sentry/nextjs
```
Server Setup (Express):

```typescript
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 1.0,
  integrations: [
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Express({ app })
  ]
});

// Request handler must be first
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

// Routes...

// Error handler must be last
app.use(Sentry.Handlers.errorHandler());
```
Client Setup (Next.js):

```typescript
// sentry.client.config.js
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 1.0,
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0
});
```
New Relic - APM

Comprehensive monitoring for Node.js applications.

Installation:

```shell
npm install newrelic
```

Configuration:

```typescript
// newrelic.js
exports.config = {
  app_name: ['Opinix Trade Server'],
  license_key: process.env.NEW_RELIC_LICENSE_KEY,
  distributed_tracing: { enabled: true },
  logging: { level: 'info' }
};
```
Usage:

```typescript
// Must be the first import
import 'newrelic';
import express from 'express';
// ... rest of imports
```
Datadog - Infrastructure & APM
Full-stack monitoring with infrastructure metrics.

Installation:

```shell
npm install dd-trace
```

Setup:

```typescript
// Must be imported first
import tracer from 'dd-trace';

tracer.init({
  service: 'opinix-trade-server',
  env: process.env.NODE_ENV,
  logInjection: true
});
```
Infrastructure Monitoring
Database Monitoring

PostgreSQL Metrics
Monitor key PostgreSQL metrics:

```sql
-- Active connections
SELECT count(*) FROM pg_stat_activity;

-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '1 minute';

-- Database size
SELECT pg_size_pretty(pg_database_size('repo'));

-- Table sizes
SELECT tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

-- Cache hit ratio (should be > 90%)
SELECT sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_ratio
FROM pg_statio_user_tables;
```
Key Metrics:
Connection pool usage
Query execution time
Cache hit ratio
Slow queries
Lock contention
Replication lag
Prisma Monitoring

Add Prisma query logging:

```typescript
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient({
  log: [
    { emit: 'event', level: 'query' },
    { emit: 'event', level: 'error' },
    { emit: 'event', level: 'warn' }
  ]
});

prisma.$on('query', (e) => {
  logger.debug('Query executed', {
    query: e.query,
    params: e.params,
    duration: e.duration,
    target: e.target
  });

  // Alert on slow queries
  if (e.duration > 1000) {
    logger.warn('Slow query detected', {
      query: e.query,
      duration: e.duration
    });
  }
});

prisma.$on('error', (e) => {
  logger.error('Prisma error', { error: e });
});
```
Redis Monitoring

Redis Metrics
Monitor Redis with the INFO command:

```shell
# Connect to Redis
redis-cli INFO

# Key metrics
redis-cli INFO stats
redis-cli INFO memory
redis-cli INFO clients

# Monitor commands in real time
redis-cli MONITOR

# Check queue depth (BullMQ)
redis-cli LLEN bull:order-queue:wait
redis-cli LLEN bull:order-queue:active
# Failed jobs are stored in a sorted set, so use ZCARD
redis-cli ZCARD bull:order-queue:failed
```
Key Metrics:
Memory usage
Connected clients
Operations per second
Cache hit ratio
Evicted keys
Queue depths
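Several of these metrics can also be read programmatically by parsing the INFO reply (for example, the string returned by ioredis's `info()` command). A minimal sketch, with illustrative helper names, deriving the cache hit ratio from the `keyspace_hits` and `keyspace_misses` fields Redis reports:

```typescript
// Parse `redis-cli INFO` output into a field map. Section headers
// (lines starting with '#') and blank lines are skipped.
function parseRedisInfo(info: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const line of info.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const idx = trimmed.indexOf(':');
    if (idx > 0) {
      fields.set(trimmed.slice(0, idx), trimmed.slice(idx + 1));
    }
  }
  return fields;
}

// Cache hit ratio = hits / (hits + misses)
function cacheHitRatio(fields: Map<string, string>): number {
  const hits = Number(fields.get('keyspace_hits') ?? 0);
  const misses = Number(fields.get('keyspace_misses') ?? 0);
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```

The same parsed map gives `used_memory`, `connected_clients`, and `evicted_keys` for the other metrics listed above.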
BullMQ Monitoring

Monitor BullMQ queues:

```typescript
import { Queue } from 'bullmq';

const queue = new Queue('order-queue', {
  connection: { host: 'localhost', port: 6379 }
});

// Get queue metrics
async function getQueueMetrics() {
  const [waiting, active, completed, failed, delayed] = await Promise.all([
    queue.getWaitingCount(),
    queue.getActiveCount(),
    queue.getCompletedCount(),
    queue.getFailedCount(),
    queue.getDelayedCount()
  ]);

  logger.info('Queue metrics', { waiting, active, completed, failed, delayed });

  // Alert on high queue depth
  if (waiting > 100) {
    logger.warn('High queue depth', { waiting });
  }

  return { waiting, active, completed, failed, delayed };
}

// Run every minute
setInterval(getQueueMetrics, 60000);
```
Real-time Monitoring Dashboard
Grafana Setup
Install Grafana
```shell
docker run -d -p 3001:3000 --name=grafana grafana/grafana
```
Add Data Sources
Configure Prometheus, PostgreSQL, and Redis as data sources in Grafana.
Import Dashboards
Use pre-built dashboards:
Node.js Application Metrics
PostgreSQL Database
Redis
NGINX (if using as reverse proxy)
Create Custom Panels
Monitor Opinix Trade specific metrics:
Active orders by event
Order matching latency
WebSocket connections
Trade volume
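If Prometheus is one of the Grafana data sources, these custom metrics need to be exposed in the Prometheus text exposition format from a `/metrics` endpoint. In practice a library such as prom-client would generate this; the hand-rolled formatter below is only a sketch of the format, and the metric names are illustrative:

```typescript
// Render gauge metrics in the Prometheus text exposition format.
interface Gauge {
  name: string;
  help: string;
  value: number;
  labels?: Record<string, string>;
}

function renderPrometheus(gauges: Gauge[]): string {
  const lines: string[] = [];
  for (const g of gauges) {
    lines.push(`# HELP ${g.name} ${g.help}`);
    lines.push(`# TYPE ${g.name} gauge`);
    // Labels become {key="value",...} after the metric name
    const labels = g.labels
      ? '{' + Object.entries(g.labels).map(([k, v]) => `${k}="${v}"`).join(',') + '}'
      : '';
    lines.push(`${g.name}${labels} ${g.value}`);
  }
  return lines.join('\n') + '\n';
}
```

Serving this string with content type `text/plain` from an Express route is enough for Prometheus to scrape it.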
Health Check Endpoints
Implement health check endpoints for all services:
Server Health Check
```typescript
// apps/server/src/routes/health.ts
import { Router } from 'express';
import { PrismaClient } from '@prisma/client';
import Redis from 'ioredis';

const router = Router();
const prisma = new PrismaClient();
const redis = new Redis(process.env.REDIS_URI);

router.get('/health', async (req, res) => {
  const checks = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {
      database: 'unknown',
      redis: 'unknown'
    }
  };

  // Check database
  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.checks.database = 'ok';
  } catch (error) {
    checks.checks.database = 'error';
    checks.status = 'degraded';
  }

  // Check Redis
  try {
    await redis.ping();
    checks.checks.redis = 'ok';
  } catch (error) {
    checks.checks.redis = 'error';
    checks.status = 'degraded';
  }

  const statusCode = checks.status === 'ok' ? 200 : 503;
  res.status(statusCode).json(checks);
});

export default router;
```
Liveness & Readiness

```typescript
// Kubernetes-style probes

// Liveness - Is the service running?
router.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'alive' });
});

// Readiness - Can the service accept traffic?
router.get('/health/ready', async (req, res) => {
  try {
    // Check critical dependencies
    await Promise.all([
      prisma.$queryRaw`SELECT 1`,
      redis.ping()
    ]);
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    res.status(503).json({ status: 'not ready', error: error.message });
  }
});
```
Alerting
Alert Rules
Immediate attention required:
Service down (health check failing)
Database connection pool exhausted
Error rate > 5%
Redis out of memory
Queue depth > 1000 orders
Order matching latency > 1s
Should be investigated:
CPU usage > 80%
Memory usage > 85%
Slow queries (> 1s)
Queue depth > 500
WebSocket connection errors
Failed background jobs
Informational:
Deployment completed
High trade volume
New event created
Scheduled maintenance
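Thresholds like the ones above can be encoded as data and evaluated against current metric readings, so the routing logic lives in one place. A minimal sketch; the rule set is an illustrative subset and the names are not from the codebase:

```typescript
type Severity = 'critical' | 'warning' | 'info';

interface AlertRule {
  metric: string;
  threshold: number;
  severity: Severity;
}

// A few of the thresholds above, encoded as rules
const rules: AlertRule[] = [
  { metric: 'error_rate_pct', threshold: 5, severity: 'critical' },
  { metric: 'queue_depth', threshold: 1000, severity: 'critical' },
  { metric: 'queue_depth', threshold: 500, severity: 'warning' },
  { metric: 'cpu_pct', threshold: 80, severity: 'warning' }
];

// Return fired rules, most severe first, at most one per metric
function evaluate(metrics: Record<string, number>, ruleset: AlertRule[]): AlertRule[] {
  const order: Severity[] = ['critical', 'warning', 'info'];
  const fired = ruleset
    .filter((r) => (metrics[r.metric] ?? 0) > r.threshold)
    .sort((a, b) => order.indexOf(a.severity) - order.indexOf(b.severity));

  // Keep only the most severe fired rule for each metric
  const seen = new Set<string>();
  const result: AlertRule[] = [];
  for (const r of fired) {
    if (!seen.has(r.metric)) {
      seen.add(r.metric);
      result.push(r);
    }
  }
  return result;
}
```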
Alert Channels
Slack: Send alerts to Slack channels:
#alerts-critical
#alerts-warning
#monitoring
Email: Notifications for critical issues to the on-call team.
PagerDuty: Incident management and on-call scheduling.
Discord: Community notifications for status updates.
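A common pattern is posting alerts to a Slack incoming webhook, routing by severity to the channels above. The payload builder below is a sketch (the `Alert` shape is hypothetical); sending it is a plain HTTP POST of the JSON body to a webhook URL taken from configuration:

```typescript
interface Alert {
  severity: 'critical' | 'warning' | 'info';
  title: string;
  details: Record<string, unknown>;
}

// Build a Slack incoming-webhook payload for an alert
function slackPayload(alert: Alert): { channel: string; text: string } {
  // Route by severity to the channels listed above
  const channel =
    alert.severity === 'critical' ? '#alerts-critical'
    : alert.severity === 'warning' ? '#alerts-warning'
    : '#monitoring';

  const detailLines = Object.entries(alert.details)
    .map(([k, v]) => `- ${k}: ${JSON.stringify(v)}`)
    .join('\n');

  return {
    channel,
    text: `[${alert.severity.toUpperCase()}] ${alert.title}\n${detailLines}`
  };
}
```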
Performance Optimization

Database
Query Optimization:

```typescript
// Bad - N+1 query
const events = await prisma.event.findMany();
for (const event of events) {
  const participants = await prisma.user.findMany({
    where: { events: { some: { id: event.id } } }
  });
}

// Good - Single query with include
const events = await prisma.event.findMany({
  include: { participants: true }
});
```
Indexing:

```sql
-- Add indexes for frequently queried fields
CREATE INDEX idx_event_status ON "Event" (status);
CREATE INDEX idx_event_slug ON "Event" (slug);
CREATE INDEX idx_user_phone ON "User" ("phoneNumber");
```
Connection Pooling:

```typescript
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL + '?connection_limit=10'
    }
  }
});
```
Caching

Implement a caching strategy:

```typescript
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URI);
const CACHE_TTL = 60; // seconds

async function getEventWithCache(eventId: string) {
  // Check cache first
  const cached = await redis.get(`event:${eventId}`);
  if (cached) {
    return JSON.parse(cached);
  }

  // Query database
  const event = await prisma.event.findUnique({
    where: { id: eventId },
    include: { participants: true }
  });

  // Store in cache
  await redis.setex(`event:${eventId}`, CACHE_TTL, JSON.stringify(event));

  return event;
}

// Invalidate cache on update
async function updateEvent(eventId: string, data: any) {
  const event = await prisma.event.update({
    where: { id: eventId },
    data
  });

  // Invalidate cache
  await redis.del(`event:${eventId}`);

  return event;
}
```
Load Testing

Test performance with load-testing tools:

```shell
# Install k6
brew install k6  # macOS
# or download from k6.io

# Create load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },
    { duration: '1m', target: 50 },
    { duration: '30s', target: 0 },
  ],
};

export default function () {
  const res = http.get('http://localhost:3001/api/events');
  check(res, { 'status was 200': (r) => r.status === 200 });
  sleep(1);
}
EOF

# Run load test
k6 run load-test.js
```
Next Steps