Overview

Effective monitoring is crucial for a real-time trading platform like Opinix Trade. This guide covers monitoring strategies, tools, and best practices.
Monitoring helps detect issues before they impact users and provides insights for optimization.

Key Metrics to Monitor

Application Metrics

  • Request rate & latency
  • Error rates
  • WebSocket connections
  • Queue depth & processing time

Business Metrics

  • Active orders
  • Trade volume
  • User activity
  • Order matching latency

Infrastructure Metrics

  • CPU & memory usage
  • Database connections
  • Redis memory
  • Network throughput

User Experience

  • Page load time
  • Time to interactive
  • WebSocket latency
  • Error rates by endpoint
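Several of the metrics above (request latency, order matching latency, WebSocket latency) are duration distributions, so percentiles matter more than averages. As a minimal sketch, a latency tracker could record samples in-process and report percentiles; the `LatencyTracker` class here is illustrative, not part of Opinix Trade:

```typescript
// Minimal in-process latency tracker (illustrative helper, not an Opinix Trade API).
// Records durations in milliseconds and reports percentiles for dashboards or logs.
class LatencyTracker {
  private samples: number[] = [];

  record(ms: number): void {
    this.samples.push(ms);
  }

  // Nearest-rank percentile over all recorded samples.
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const idx = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
  }
}

const tracker = new LatencyTracker();
[12, 8, 95, 20, 15].forEach((ms) => tracker.record(ms));
console.log(tracker.percentile(50)); // median request latency
console.log(tracker.percentile(99)); // tail latency
```

In production this role is usually filled by a metrics library (e.g. a Prometheus histogram), but the principle is the same: record every sample, alert on the tail.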

Logging Strategy

Opinix Trade includes a shared @opinix/logger package used across services.

Logger Configuration

Based on its dependencies, the logger package appears to be built on Winston:
import { logger } from '@opinix/logger';

// Log levels: error, warn, info, http, verbose, debug
logger.info('Order placed', { 
  orderId: '123', 
  userId: 'user456',
  eventId: 'event789'
});

logger.error('Failed to process order', { 
  error: err.message,
  stack: err.stack,
  orderId: '123'
});

Structured Logging

Always use structured logging with context:
// Good - Structured
logger.info('Order matched', {
  orderId: order.id,
  price: order.price,
  quantity: order.quantity,
  side: order.side,
  eventId: order.eventId,
  userId: order.userId,
  matchTime: Date.now()
});

// Bad - Unstructured
logger.info(`Order ${order.id} matched at ${order.price}`);
Benefits:
  • Easier to search and filter
  • Better for log aggregation
  • Machine-readable
  • Enables analytics
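A dependency-free sketch makes the structured-logging contract concrete: every log entry becomes a single JSON line with a level, message, timestamp, and arbitrary context fields. This mimics the @opinix/logger call shape shown above but stands alone (the real package is assumed to wrap Winston):

```typescript
// Minimal structured logger sketch: one JSON object per line.
// Illustrative only; the real @opinix/logger is assumed to wrap Winston.
type LogContext = Record<string, unknown>;

function logLine(level: string, message: string, context: LogContext = {}): string {
  return JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...context, // structured fields stay machine-readable
  });
}

console.log(logLine('info', 'Order matched', { orderId: 'o-1', price: 42 }));
```

Because every field is a top-level JSON key, log aggregators can index and filter on `orderId` or `price` directly instead of regex-parsing a message string.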

Application Performance Monitoring (APM)

Sentry

Sentry provides real-time error tracking and performance monitoring.
Installation:
npm install @sentry/node @sentry/nextjs
Server Setup (Express):
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 1.0,
  integrations: [
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Express({ app })
  ]
});

// Request handler must be first
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

// Routes...

// Error handler must be last
app.use(Sentry.Handlers.errorHandler());
Client Setup (Next.js):
// sentry.client.config.js
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 1.0,
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0
});
New Relic

New Relic offers comprehensive monitoring for Node.js applications.
Installation:
npm install newrelic
Configuration:
// newrelic.js
exports.config = {
  app_name: ['Opinix Trade Server'],
  license_key: process.env.NEW_RELIC_LICENSE_KEY,
  distributed_tracing: { enabled: true },
  logging: { level: 'info' }
};
Usage:
// Must be first import
import 'newrelic';
import express from 'express';
// ... rest of imports
Datadog

Datadog provides full-stack monitoring with infrastructure metrics.
Installation:
npm install dd-trace
Setup:
// Must be imported first
import tracer from 'dd-trace';
tracer.init({
  service: 'opinix-trade-server',
  env: process.env.NODE_ENV,
  logInjection: true
});

Infrastructure Monitoring

Database Monitoring

Monitor key PostgreSQL metrics:
-- Active connections
SELECT count(*) FROM pg_stat_activity;

-- Long running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '1 minute';

-- Database size
SELECT pg_size_pretty(pg_database_size('repo'));

-- Table sizes
SELECT tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Cache hit ratio (should be > 90%)
SELECT sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_ratio
FROM pg_statio_user_tables;
Key Metrics:
  • Connection pool usage
  • Query execution time
  • Cache hit ratio
  • Slow queries
  • Lock contention
  • Replication lag
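The cache hit ratio query above returns a single fraction; a monitoring job can evaluate it against the 90% guideline. This sketch takes the raw `heap_blks_hit` / `heap_blks_read` counters as plain numbers (the function name and shape are illustrative):

```typescript
// Sketch: flag the PostgreSQL buffer-cache hit ratio against the 90% guideline.
// Inputs are the raw counters from pg_statio_user_tables; names are illustrative.
function cacheHitStatus(heapBlksHit: number, heapBlksRead: number): 'ok' | 'low' {
  const total = heapBlksHit + heapBlksRead;
  if (total === 0) return 'ok'; // no traffic yet, nothing to flag
  return heapBlksHit / total >= 0.9 ? 'ok' : 'low';
}

console.log(cacheHitStatus(950, 50));  // 95% hit ratio -> ok
console.log(cacheHitStatus(700, 300)); // 70% hit ratio -> low
```

A persistently low ratio usually means the working set no longer fits in `shared_buffers`, or that sequential scans are churning the cache.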

Redis Monitoring

Monitor Redis with INFO command:
# Connect to Redis
redis-cli INFO

# Key metrics
redis-cli INFO stats
redis-cli INFO memory
redis-cli INFO clients

# Monitor commands in real-time
redis-cli MONITOR

# Check queue depth (BullMQ)
redis-cli LLEN bull:order-queue:wait
redis-cli LLEN bull:order-queue:active
redis-cli LLEN bull:order-queue:failed
Key Metrics:
  • Memory usage
  • Connected clients
  • Operations per second
  • Cache hit ratio
  • Evicted keys
  • Queue depths
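Queue depth lends itself to threshold-based checks: a monitoring job reads `LLEN` for each BullMQ list (as above) and classifies the result. The thresholds here mirror the alert tiers used later in this guide but are illustrative defaults:

```typescript
// Sketch: classify a BullMQ queue depth (as read via redis-cli LLEN above)
// against warning/critical thresholds. Threshold values are illustrative.
type QueueSeverity = 'ok' | 'warning' | 'critical';

function queueSeverity(depth: number, warnAt = 500, critAt = 1000): QueueSeverity {
  if (depth > critAt) return 'critical';
  if (depth > warnAt) return 'warning';
  return 'ok';
}

console.log(queueSeverity(120));  // ok
console.log(queueSeverity(750));  // warning
console.log(queueSeverity(1500)); // critical
```

A real check would run on an interval, read the `wait`, `active`, and `failed` lists separately, and emit an alert when severity changes.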

Real-time Monitoring Dashboard

Grafana Setup

Step 1: Install Grafana

docker run -d -p 3001:3000 --name=grafana grafana/grafana
Step 2: Add Data Sources

Configure Prometheus, PostgreSQL, and Redis as data sources in Grafana.
Step 3: Import Dashboards

Use pre-built dashboards:
  • Node.js Application Metrics
  • PostgreSQL Database
  • Redis
  • NGINX (if using as reverse proxy)
Step 4: Create Custom Panels

Monitor Opinix Trade specific metrics:
  • Active orders by event
  • Order matching latency
  • WebSocket connections
  • Trade volume
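For Grafana to chart custom metrics like these, the server needs to expose them somewhere a data source can scrape, typically a `/metrics` endpoint in the Prometheus text exposition format. This sketch renders a gauge by hand to show what that format looks like; the metric names are illustrative, and a real setup would normally use a library such as prom-client:

```typescript
// Sketch: render a gauge in the Prometheus text exposition format.
// Metric names are illustrative; production code would use prom-client.
function renderGauge(
  name: string,
  help: string,
  value: number,
  labels: Record<string, string> = {}
): string {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(',');
  const series = labelStr ? `${name}{${labelStr}}` : name;
  return `# HELP ${name} ${help}\n# TYPE ${name} gauge\n${series} ${value}`;
}

console.log(renderGauge('opinix_active_orders', 'Active orders by event', 42, { event: 'event789' }));
```

Serving the concatenated output of such gauges from an Express route is enough for Prometheus to scrape, after which each series can be panelled in Grafana.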

Health Check Endpoints

Implement health check endpoints for all services:
// apps/server/src/routes/health.ts
import { Router } from 'express';
import { PrismaClient } from '@prisma/client';
import Redis from 'ioredis';

const router = Router();
const prisma = new PrismaClient();
const redis = new Redis(process.env.REDIS_URI);

router.get('/health', async (req, res) => {
  const checks = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {
      database: 'unknown',
      redis: 'unknown'
    }
  };

  // Check database
  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.checks.database = 'ok';
  } catch (error) {
    checks.checks.database = 'error';
    checks.status = 'degraded';
  }

  // Check Redis
  try {
    await redis.ping();
    checks.checks.redis = 'ok';
  } catch (error) {
    checks.checks.redis = 'error';
    checks.status = 'degraded';
  }

  const statusCode = checks.status === 'ok' ? 200 : 503;
  res.status(statusCode).json(checks);
});

export default router;

Alerting

Alert Rules

Critical (immediate attention required):
  • Service down (health check failing)
  • Database connection pool exhausted
  • Error rate > 5%
  • Redis out of memory
  • Queue depth > 1000 orders
  • Order matching latency > 1s
Warning (should be investigated):
  • CPU usage > 80%
  • Memory usage > 85%
  • Slow queries (> 1s)
  • Queue depth > 500
  • WebSocket connection errors
  • Failed background jobs
Informational:
  • Deployment completed
  • High trade volume
  • New event created
  • Scheduled maintenance

Alert Channels

Slack

Send alerts to Slack channels:
  • #alerts-critical
  • #alerts-warning
  • #monitoring
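The alert tiers above map naturally onto the Slack channels. As a sketch, a router could pick the destination channel from the severity; the mapping itself is an illustrative assumption:

```typescript
// Sketch: route an alert to a Slack channel by severity tier.
// Channel names follow this guide; the mapping is an illustrative assumption.
type Severity = 'critical' | 'warning' | 'info';

function slackChannelFor(severity: Severity): string {
  switch (severity) {
    case 'critical':
      return '#alerts-critical';
    case 'warning':
      return '#alerts-warning';
    case 'info':
      return '#monitoring';
  }
}

console.log(slackChannelFor('critical')); // #alerts-critical
```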

Email

Email notifications for critical issues to on-call team.

PagerDuty

Incident management and on-call scheduling.

Discord

Community notifications for status updates.

Performance Optimization

Query Optimization:
// Bad - N+1 query
const events = await prisma.event.findMany();
for (const event of events) {
  const participants = await prisma.user.findMany({
    where: { events: { some: { id: event.id } } }
  });
}

// Good - Single query with include
const events = await prisma.event.findMany({
  include: { participants: true }
});
Indexing:
-- Add indexes for frequently queried fields
CREATE INDEX idx_event_status ON "Event"(status);
CREATE INDEX idx_event_slug ON "Event"(slug);
CREATE INDEX idx_user_phone ON "User"("phoneNumber");
Connection Pooling:
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL + '?connection_limit=10'
    }
  }
});

Next Steps