Metrics Collection
Overview
Metrics Collection is a fundamental observability pattern in enterprise integration architectures that systematically gathers, aggregates, and exposes quantitative measurements of system behavior, performance characteristics, and business outcomes. Like a sophisticated dashboard in a modern vehicle that provides real-time insights into engine performance, fuel consumption, speed, and various operational parameters, metrics collection provides continuous visibility into the health, performance, and operational state of distributed systems. This pattern is essential for understanding system behavior patterns, identifying performance trends, enabling proactive issue detection, capacity planning, and making data-driven decisions about system optimization and scaling in complex enterprise environments.
Theoretical Foundation
Metrics Collection is grounded in measurement theory, statistical analysis, time-series data management, and operational intelligence principles. It incorporates concepts from performance monitoring, statistical sampling, data aggregation, and anomaly detection to provide a comprehensive framework for quantitative system observability. The pattern addresses the fundamental need for objective, measurable insights into system behavior that enable data-driven operational decisions, performance optimization, and proactive issue detection in complex distributed environments.
Core Principles
1. Systematic Measurement and Collection
Comprehensive gathering of quantitative system data: - Metric definition - clearly defined measurements with consistent semantics and units - Data collection - systematic gathering of metric data from all system components - Temporal consistency - regular, predictable metric collection intervals and timestamps - Multi-dimensional data - metrics with rich metadata and dimensional attributes for analysis
2. Aggregation and Statistical Analysis
Processing raw metric data for meaningful insights: - Data aggregation - combining metric data across time periods, systems, and dimensions - Statistical computation - calculating averages, percentiles, rates, and other statistical measures - Trend analysis - identifying patterns, trends, and anomalies in metric data over time - Comparative analysis - comparing metrics across different system components and time periods
3. Real-Time Processing and Storage
Efficient handling of high-volume metric data: - Stream processing - real-time processing of metric data as it's collected - Time-series storage - specialized storage optimized for time-stamped metric data - Data retention - intelligent data lifecycle management with appropriate retention policies - Query optimization - efficient querying and retrieval of metric data for analysis
4. Alerting and Anomaly Detection
Proactive identification of operational issues and anomalies: - Threshold monitoring - detecting when metrics exceed predefined operational thresholds - Anomaly detection - identifying unusual patterns and behaviors in metric data - Predictive alerting - forecasting potential issues based on metric trends - Alert correlation - connecting related metric anomalies for comprehensive issue detection
Why Metrics Collection is Essential in Integration Architecture
1. Operational Visibility and Control
In complex integration environments, metrics collection provides: - System health monitoring - continuous assessment of system operational state - Performance tracking - quantitative measurement of system performance characteristics - Resource utilization monitoring - tracking computational, memory, and network resource usage - Business process metrics - measurement of business-relevant operational outcomes
2. Capacity Planning and Scaling
Supporting infrastructure and capacity management: - Resource trend analysis - understanding resource consumption patterns over time - Capacity forecasting - predicting future resource needs based on historical trends - Scaling decisions - data-driven decisions about horizontal and vertical scaling - Cost optimization - optimizing infrastructure costs through usage-based insights
3. Performance Optimization
Enabling systematic performance improvement: - Bottleneck identification - pinpointing performance constraints and limitations - Optimization measurement - quantifying the impact of performance improvements - Baseline establishment - establishing performance baselines for comparison - Regression detection - identifying performance regressions through continuous monitoring
4. Service Level Management
Supporting service level agreement (SLA) management and compliance: - SLA monitoring - tracking compliance with service level agreements - Quality metrics - measuring service quality attributes like availability and reliability - Customer impact assessment - understanding how system performance affects customers - Compliance reporting - generating reports for regulatory and business compliance
Benefits in Integration Contexts
1. Technical Advantages
- Data-driven decisions - objective, quantitative basis for operational and architectural decisions
- Proactive management - early detection of issues before they impact users
- Performance insights - detailed understanding of system performance characteristics
- Automation enablement - enabling automated scaling, alerting, and optimization
2. Business Value
- Operational efficiency - improved operational efficiency through better visibility
- Cost management - optimized infrastructure costs through usage insights
- Risk mitigation - reduced operational risk through proactive monitoring
- Business intelligence - insights into business process performance and outcomes
3. Integration Enablement
- Integration monitoring - comprehensive monitoring of integration flows and processes
- Message throughput tracking - monitoring message processing rates and volumes
- Error rate monitoring - tracking integration error rates and failure patterns
- Latency measurement - measuring processing delays and response times
4. Operational Excellence
- Continuous improvement - data-driven continuous improvement of system operations
- Root cause analysis - supporting incident investigation through historical metric data
- Trend identification - identifying long-term trends for strategic planning
- Benchmarking - establishing benchmarks for performance comparison
Integration Architecture Applications
1. Microservices Metrics Collection
Comprehensive metrics for microservices architecture:
// Metrics Configuration
@Configuration
@EnableConfigurationProperties(MetricsProperties.class)
public class MetricsConfiguration {
@Bean
public MeterRegistry meterRegistry() {
return new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
}
@Bean
public TimedAspect timedAspect(MeterRegistry registry) {
return new TimedAspect(registry);
}
@Bean
public CountedAspect countedAspect(MeterRegistry registry) {
return new CountedAspect(registry);
}
@Bean
public MetricsCollector metricsCollector(MeterRegistry registry) {
return new MetricsCollector(registry);
}
}
// Custom Metrics Collector
@Component
public class MetricsCollector {
private final MeterRegistry meterRegistry;
private final Counter requestCounter;
private final Timer requestTimer;
private final Gauge activeConnectionsGauge;
private final DistributionSummary messageSizeSummary;
public MetricsCollector(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.requestCounter = Counter.builder("http.requests.total")
.description("Total number of HTTP requests")
.register(meterRegistry);
this.requestTimer = Timer.builder("http.request.duration")
.description("HTTP request processing time")
.register(meterRegistry);
this.messageSizeSummary = DistributionSummary.builder("message.size")
.description("Message size distribution")
.baseUnit("bytes")
.register(meterRegistry);
// Register JVM and system metrics
new JvmMetrics().bindTo(meterRegistry);
new ProcessorMetrics().bindTo(meterRegistry);
new UptimeMetrics().bindTo(meterRegistry);
new DiskSpaceMetrics(new File("/")).bindTo(meterRegistry);
}
public void recordHttpRequest(String method, String endpoint, int statusCode, long duration) {
requestCounter.increment(
Tags.of(
Tag.of("method", method),
Tag.of("endpoint", endpoint),
Tag.of("status", String.valueOf(statusCode))
)
);
requestTimer.record(duration, TimeUnit.MILLISECONDS,
Tags.of(
Tag.of("method", method),
Tag.of("endpoint", endpoint)
)
);
}
public void recordMessageProcessing(String messageType, int messageSize, long processingTime, boolean success) {
Counter.builder("messages.processed.total")
.description("Total number of processed messages")
.tags(
Tag.of("type", messageType),
Tag.of("status", success ? "success" : "error")
)
.register(meterRegistry)
.increment();
Timer.builder("message.processing.duration")
.description("Message processing time")
.tags(Tag.of("type", messageType))
.register(meterRegistry)
.record(processingTime, TimeUnit.MILLISECONDS);
messageSizeSummary.record(messageSize,
Tags.of(Tag.of("type", messageType)));
}
public void recordBusinessMetric(String metricName, double value, Tags tags) {
Gauge.builder("business." + metricName)
.description("Business metric: " + metricName)
.tags(tags)
.register(meterRegistry, this, obj -> value);
}
public void recordCustomCounter(String counterName, String description, Tags tags) {
Counter.builder(counterName)
.description(description)
.tags(tags)
.register(meterRegistry)
.increment();
}
}
// Order Service with Metrics
@RestController
@RequestMapping("/orders")
public class OrderController {
@Autowired
private OrderService orderService;
@Autowired
private MetricsCollector metricsCollector;
@PostMapping
@Timed(name = "order.creation.time", description = "Order creation processing time")
@Counted(name = "order.creation.count", description = "Order creation count")
public ResponseEntity<OrderResponse> createOrder(@RequestBody CreateOrderRequest request) {
long startTime = System.currentTimeMillis();
try {
// Record request metrics
metricsCollector.recordCustomCounter("order.requests.total",
"Total order requests",
Tags.of(Tag.of("operation", "create")));
OrderResult result = orderService.processOrder(request);
long duration = System.currentTimeMillis() - startTime;
if (result.isSuccess()) {
metricsCollector.recordHttpRequest("POST", "/orders", 200, duration);
metricsCollector.recordBusinessMetric("order.value",
result.getOrder().getTotalAmount().doubleValue(),
Tags.of(Tag.of("customer_tier", result.getOrder().getCustomerTier())));
return ResponseEntity.ok(OrderResponse.from(result));
} else {
metricsCollector.recordHttpRequest("POST", "/orders", 400, duration);
return ResponseEntity.badRequest().body(OrderResponse.error(result.getErrorMessage()));
}
} catch (Exception e) {
long duration = System.currentTimeMillis() - startTime;
metricsCollector.recordHttpRequest("POST", "/orders", 500, duration);
metricsCollector.recordCustomCounter("order.errors.total",
"Total order processing errors",
Tags.of(Tag.of("error_type", e.getClass().getSimpleName())));
throw e;
}
}
@GetMapping("/{orderId}")
@Timed(name = "order.retrieval.time", description = "Order retrieval time")
public ResponseEntity<OrderResponse> getOrder(@PathVariable String orderId) {
long startTime = System.currentTimeMillis();
try {
Optional<Order> order = orderService.getOrder(orderId);
long duration = System.currentTimeMillis() - startTime;
if (order.isPresent()) {
metricsCollector.recordHttpRequest("GET", "/orders/{id}", 200, duration);
return ResponseEntity.ok(OrderResponse.from(order.get()));
} else {
metricsCollector.recordHttpRequest("GET", "/orders/{id}", 404, duration);
return ResponseEntity.notFound().build();
}
} catch (Exception e) {
long duration = System.currentTimeMillis() - startTime;
metricsCollector.recordHttpRequest("GET", "/orders/{id}", 500, duration);
throw e;
}
}
}
// Message Processing Metrics
@Service
public class MessageProcessingService {
@Autowired
private MetricsCollector metricsCollector;
@EventListener
public void handleOrderEvent(OrderEvent event) {
long startTime = System.currentTimeMillis();
String messageType = event.getEventType();
int messageSize = calculateMessageSize(event);
try {
processOrderEvent(event);
long duration = System.currentTimeMillis() - startTime;
metricsCollector.recordMessageProcessing(messageType, messageSize, duration, true);
// Record business metrics
if ("ORDER_PLACED".equals(messageType)) {
metricsCollector.recordBusinessMetric("orders.placed.rate", 1.0,
Tags.of(Tag.of("customer_segment", event.getCustomerSegment())));
}
} catch (Exception e) {
long duration = System.currentTimeMillis() - startTime;
metricsCollector.recordMessageProcessing(messageType, messageSize, duration, false);
metricsCollector.recordCustomCounter("message.processing.errors",
"Message processing errors",
Tags.of(
Tag.of("message_type", messageType),
Tag.of("error_type", e.getClass().getSimpleName())
));
log.error("Error processing message of type: {}", messageType, e);
}
}
private void processOrderEvent(OrderEvent event) {
// Process the event
switch (event.getEventType()) {
case "ORDER_PLACED":
handleOrderPlaced(event);
break;
case "ORDER_UPDATED":
handleOrderUpdated(event);
break;
case "ORDER_CANCELLED":
handleOrderCancelled(event);
break;
default:
throw new IllegalArgumentException("Unknown event type: " + event.getEventType());
}
}
private int calculateMessageSize(OrderEvent event) {
try {
ObjectMapper mapper = new ObjectMapper();
return mapper.writeValueAsString(event).getBytes().length;
} catch (Exception e) {
return 0;
}
}
}
2. Apache Camel Integration Metrics
Comprehensive metrics collection for Camel routes:
@Component
public class CamelMetricsRoute extends RouteBuilder {
@Autowired
private MetricsCollector metricsCollector;
@Override
public void configure() throws Exception {
// Enable Camel metrics
getContext().addRoutePolicyFactory(new MicrometerRoutePolicyFactory());
getContext().setMessageHistoryFactory(new MicrometerMessageHistoryFactory());
// Order processing route with metrics
from("kafka:order-events?groupId=order-processor")
.routeId("order-processing-route")
.process(exchange -> {
// Record route entry metrics
exchange.setProperty("route.start.time", System.currentTimeMillis());
exchange.setProperty("route.id", "order-processing-route");
metricsCollector.recordCustomCounter("camel.route.entries",
"Route entry count",
Tags.of(Tag.of("route", "order-processing-route")));
})
.unmarshal().json(JsonLibrary.Jackson, OrderEvent.class)
.process(exchange -> {
OrderEvent event = exchange.getIn().getBody(OrderEvent.class);
// Record message metrics
int messageSize = exchange.getIn().getBody(String.class).getBytes().length;
metricsCollector.recordMessageProcessing("order_event", messageSize, 0, true);
// Record business metrics
if ("ORDER_PLACED".equals(event.getEventType())) {
metricsCollector.recordBusinessMetric("daily.orders", 1.0,
Tags.of(Tag.of("date", LocalDate.now().toString())));
}
})
.choice()
.when(simple("${body.eventType} == 'ORDER_PLACED'"))
.to("direct:processNewOrder")
.when(simple("${body.eventType} == 'ORDER_UPDATED'"))
.to("direct:processOrderUpdate")
.when(simple("${body.eventType} == 'ORDER_CANCELLED'"))
.to("direct:processOrderCancellation")
.otherwise()
.process(exchange -> {
metricsCollector.recordCustomCounter("camel.unknown.events",
"Unknown event types",
Tags.of(Tag.of("route", "order-processing-route")));
})
.log("Unknown event type: ${body.eventType}")
.end()
.process(exchange -> {
// Record route completion metrics
long startTime = exchange.getProperty("route.start.time", Long.class);
long duration = System.currentTimeMillis() - startTime;
String routeId = exchange.getProperty("route.id", String.class);
Timer.builder("camel.route.duration")
.description("Route processing duration")
.tags(Tag.of("route", routeId))
.register(meterRegistry)
.record(duration, TimeUnit.MILLISECONDS);
});
from("direct:processNewOrder")
.routeId("process-new-order")
.process(exchange -> {
exchange.setProperty("processing.start.time", System.currentTimeMillis());
metricsCollector.recordCustomCounter("order.processing.started",
"Order processing started",
Tags.of(Tag.of("type", "new_order")));
})
.to("direct:validateCustomer")
.to("direct:checkInventory")
.to("direct:calculatePricing")
.multicast().parallelProcessing()
.to("direct:updateInventory")
.to("direct:sendNotifications")
.end()
.process(exchange -> {
long startTime = exchange.getProperty("processing.start.time", Long.class);
long duration = System.currentTimeMillis() - startTime;
metricsCollector.recordCustomCounter("order.processing.completed",
"Order processing completed",
Tags.of(Tag.of("type", "new_order")));
Timer.builder("order.processing.duration")
.description("New order processing duration")
.register(meterRegistry)
.record(duration, TimeUnit.MILLISECONDS);
});
from("direct:validateCustomer")
.routeId("validate-customer")
.process(exchange -> {
exchange.setProperty("validation.start.time", System.currentTimeMillis());
metricsCollector.recordCustomCounter("customer.validations.started",
"Customer validations started",
Tags.empty());
})
.to("http://customer-service/validate?httpMethod=POST&connectTimeout=5000&socketTimeout=10000")
.process(exchange -> {
long startTime = exchange.getProperty("validation.start.time", Long.class);
long duration = System.currentTimeMillis() - startTime;
int statusCode = exchange.getIn().getHeader(Exchange.HTTP_RESPONSE_CODE, Integer.class);
boolean success = statusCode >= 200 && statusCode < 300;
metricsCollector.recordCustomCounter("customer.validations.completed",
"Customer validations completed",
Tags.of(Tag.of("status", success ? "success" : "failure")));
Timer.builder("customer.validation.duration")
.description("Customer validation duration")
.tags(Tag.of("status", success ? "success" : "failure"))
.register(meterRegistry)
.record(duration, TimeUnit.MILLISECONDS);
if (!success) {
metricsCollector.recordCustomCounter("customer.validation.errors",
"Customer validation errors",
Tags.of(Tag.of("http_status", String.valueOf(statusCode))));
}
});
from("direct:checkInventory")
.routeId("check-inventory")
.process(exchange -> {
exchange.setProperty("inventory.start.time", System.currentTimeMillis());
OrderEvent event = exchange.getIn().getBody(OrderEvent.class);
int itemCount = event.getItems().size();
metricsCollector.recordCustomCounter("inventory.checks.started",
"Inventory checks started",
Tags.of(Tag.of("item_count", String.valueOf(itemCount))));
DistributionSummary.builder("inventory.check.items")
.description("Number of items in inventory check")
.register(meterRegistry)
.record(itemCount);
})
.to("http://inventory-service/check?httpMethod=POST&connectTimeout=5000&socketTimeout=15000")
.process(exchange -> {
long startTime = exchange.getProperty("inventory.start.time", Long.class);
long duration = System.currentTimeMillis() - startTime;
int statusCode = exchange.getIn().getHeader(Exchange.HTTP_RESPONSE_CODE, Integer.class);
boolean success = statusCode >= 200 && statusCode < 300;
metricsCollector.recordCustomCounter("inventory.checks.completed",
"Inventory checks completed",
Tags.of(Tag.of("status", success ? "success" : "failure")));
Timer.builder("inventory.check.duration")
.description("Inventory check duration")
.tags(Tag.of("status", success ? "success" : "failure"))
.register(meterRegistry)
.record(duration, TimeUnit.MILLISECONDS);
});
// Error handling with metrics
onException(Exception.class)
.handled(true)
.process(exchange -> {
Exception exception = exchange.getException();
String routeId = exchange.getFromRouteId();
metricsCollector.recordCustomCounter("camel.route.errors",
"Route processing errors",
Tags.of(
Tag.of("route", routeId),
Tag.of("exception", exception.getClass().getSimpleName())
));
log.error("Error in route {}: {}", routeId, exception.getMessage());
});
// Metrics publishing route
from("timer:metrics-publisher?period=60000")
.routeId("metrics-publisher")
.process(exchange -> {
// Collect custom business metrics
BusinessMetrics metrics = new BusinessMetrics();
metrics.setTimestamp(Instant.now());
metrics.setDailyOrderCount(getDailyOrderCount());
metrics.setAverageOrderValue(getAverageOrderValue());
metrics.setCustomerSatisfactionScore(getCustomerSatisfactionScore());
metrics.setInventoryTurnover(getInventoryTurnover());
exchange.getIn().setBody(metrics);
})
.marshal().json(JsonLibrary.Jackson)
.to("kafka:business-metrics")
.log("Business metrics published: Daily orders ${body.dailyOrderCount}");
// System health metrics
from("timer:system-health?period=30000")
.routeId("system-health-metrics")
.process(exchange -> {
SystemHealthMetrics health = new SystemHealthMetrics();
health.setTimestamp(Instant.now());
health.setCpuUsage(getCpuUsage());
health.setMemoryUsage(getMemoryUsage());
health.setDiskUsage(getDiskUsage());
health.setActiveConnections(getActiveConnections());
health.setThreadPoolUtilization(getThreadPoolUtilization());
// Record as metrics
metricsCollector.recordBusinessMetric("system.cpu.usage", health.getCpuUsage(), Tags.empty());
metricsCollector.recordBusinessMetric("system.memory.usage", health.getMemoryUsage(), Tags.empty());
metricsCollector.recordBusinessMetric("system.disk.usage", health.getDiskUsage(), Tags.empty());
exchange.getIn().setBody(health);
})
.marshal().json(JsonLibrary.Jackson)
.to("kafka:system-health-metrics")
.log("System health metrics published");
}
private int getDailyOrderCount() {
// Implementation to get daily order count
return 0;
}
private double getAverageOrderValue() {
// Implementation to calculate average order value
return 0.0;
}
private double getCustomerSatisfactionScore() {
// Implementation to get customer satisfaction score
return 0.0;
}
private double getInventoryTurnover() {
// Implementation to calculate inventory turnover
return 0.0;
}
private double getCpuUsage() {
return ManagementFactory.getOperatingSystemMXBean().getProcessCpuLoad() * 100;
}
private double getMemoryUsage() {
MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
long used = memoryBean.getHeapMemoryUsage().getUsed();
long max = memoryBean.getHeapMemoryUsage().getMax();
return ((double) used / max) * 100;
}
private double getDiskUsage() {
File disk = new File("/");
long total = disk.getTotalSpace();
long free = disk.getFreeSpace();
return ((double) (total - free) / total) * 100;
}
}
3. Custom Business Metrics and Dashboards
Advanced metrics for business intelligence:
// Business Metrics Service
@Service
public class BusinessMetricsService {
@Autowired
private MetricsCollector metricsCollector;
@Autowired
private OrderRepository orderRepository;
@Autowired
private CustomerRepository customerRepository;
@Scheduled(fixedRate = 300000) // Every 5 minutes
public void collectBusinessMetrics() {
try {
// Revenue metrics
BigDecimal hourlyRevenue = calculateHourlyRevenue();
metricsCollector.recordBusinessMetric("revenue.hourly",
hourlyRevenue.doubleValue(),
Tags.of(Tag.of("currency", "USD")));
// Customer metrics
int activeCustomers = getActiveCustomerCount();
metricsCollector.recordBusinessMetric("customers.active",
activeCustomers,
Tags.of(Tag.of("period", "current_hour")));
// Order metrics
int completedOrders = getCompletedOrdersCount();
metricsCollector.recordBusinessMetric("orders.completed.hourly",
completedOrders,
Tags.empty());
double averageOrderProcessingTime = getAverageOrderProcessingTime();
metricsCollector.recordBusinessMetric("orders.processing.time.average",
averageOrderProcessingTime,
Tags.of(Tag.of("unit", "seconds")));
// Inventory metrics
int lowStockItems = getLowStockItemsCount();
metricsCollector.recordBusinessMetric("inventory.low_stock_items",
lowStockItems,
Tags.empty());
double inventoryTurnover = calculateInventoryTurnover();
metricsCollector.recordBusinessMetric("inventory.turnover.rate",
inventoryTurnover,
Tags.of(Tag.of("period", "monthly")));
// Customer satisfaction metrics
double satisfactionScore = getCustomerSatisfactionScore();
metricsCollector.recordBusinessMetric("customer.satisfaction.score",
satisfactionScore,
Tags.of(Tag.of("scale", "1-10")));
// Performance metrics
double systemResponseTime = getAverageSystemResponseTime();
metricsCollector.recordBusinessMetric("system.response.time.average",
systemResponseTime,
Tags.of(Tag.of("unit", "milliseconds")));
log.info("Business metrics collected successfully");
} catch (Exception e) {
log.error("Error collecting business metrics", e);
metricsCollector.recordCustomCounter("business.metrics.errors",
"Business metrics collection errors",
Tags.of(Tag.of("error_type", e.getClass().getSimpleName())));
}
}
@Scheduled(fixedRate = 60000) // Every minute
public void collectPerformanceMetrics() {
try {
// Database performance
long dbConnectionPoolSize = getDatabaseConnectionPoolSize();
metricsCollector.recordBusinessMetric("database.connection_pool.size",
dbConnectionPoolSize,
Tags.empty());
long dbActiveConnections = getDatabaseActiveConnections();
metricsCollector.recordBusinessMetric("database.connections.active",
dbActiveConnections,
Tags.empty());
double dbQueryTime = getAverageDatabaseQueryTime();
metricsCollector.recordBusinessMetric("database.query.time.average",
dbQueryTime,
Tags.of(Tag.of("unit", "milliseconds")));
// Cache performance
double cacheHitRate = getCacheHitRate();
metricsCollector.recordBusinessMetric("cache.hit.rate",
cacheHitRate,
Tags.of(Tag.of("unit", "percentage")));
int cacheSize = getCurrentCacheSize();
metricsCollector.recordBusinessMetric("cache.size",
cacheSize,
Tags.of(Tag.of("unit", "entries")));
// Queue metrics
int messageQueueDepth = getMessageQueueDepth();
metricsCollector.recordBusinessMetric("message_queue.depth",
messageQueueDepth,
Tags.of(Tag.of("queue", "order_processing")));
double messageProcessingRate = getMessageProcessingRate();
metricsCollector.recordBusinessMetric("message_queue.processing.rate",
messageProcessingRate,
Tags.of(Tag.of("unit", "messages_per_second")));
log.debug("Performance metrics collected successfully");
} catch (Exception e) {
log.error("Error collecting performance metrics", e);
metricsCollector.recordCustomCounter("performance.metrics.errors",
"Performance metrics collection errors",
Tags.of(Tag.of("error_type", e.getClass().getSimpleName())));
}
}
// Metric calculation methods
private BigDecimal calculateHourlyRevenue() {
LocalDateTime oneHourAgo = LocalDateTime.now().minusHours(1);
return orderRepository.getTotalRevenueAfter(oneHourAgo);
}
private int getActiveCustomerCount() {
LocalDateTime oneHourAgo = LocalDateTime.now().minusHours(1);
return customerRepository.getActiveCustomerCountAfter(oneHourAgo);
}
private int getCompletedOrdersCount() {
LocalDateTime oneHourAgo = LocalDateTime.now().minusHours(1);
return orderRepository.getCompletedOrderCountAfter(oneHourAgo);
}
private double getAverageOrderProcessingTime() {
return orderRepository.getAverageProcessingTimeInLastHour();
}
private int getLowStockItemsCount() {
return inventoryRepository.getLowStockItemsCount();
}
private double calculateInventoryTurnover() {
// Calculate monthly inventory turnover
return inventoryRepository.calculateMonthlyTurnover();
}
private double getCustomerSatisfactionScore() {
// Get recent customer satisfaction score
return customerFeedbackRepository.getAverageSatisfactionScore();
}
private double getAverageSystemResponseTime() {
// Calculate from recent response time metrics
return performanceRepository.getAverageResponseTime();
}
}
// Metrics Dashboard Controller
@RestController
@RequestMapping("/metrics")
public class MetricsDashboardController {
@Autowired
private MeterRegistry meterRegistry;
@Autowired
private BusinessMetricsService businessMetricsService;
@GetMapping("/prometheus")
public String prometheusMetrics() {
return ((PrometheusMeterRegistry) meterRegistry).scrape();
}
@GetMapping("/dashboard")
public ResponseEntity<DashboardMetrics> getDashboardMetrics() {
DashboardMetrics dashboard = new DashboardMetrics();
// System metrics
dashboard.setSystemMetrics(getSystemMetrics());
// Business metrics
dashboard.setBusinessMetrics(getBusinessMetrics());
// Performance metrics
dashboard.setPerformanceMetrics(getPerformanceMetrics());
// Alert metrics
dashboard.setAlertMetrics(getAlertMetrics());
return ResponseEntity.ok(dashboard);
}
@GetMapping("/alerts")
public ResponseEntity<List<MetricAlert>> getMetricAlerts() {
List<MetricAlert> alerts = new ArrayList<>();
// Check for performance alerts
double avgResponseTime = getMetricValue("http.request.duration");
if (avgResponseTime > 1000) { // More than 1 second
alerts.add(new MetricAlert("HIGH_RESPONSE_TIME",
"Average response time is " + avgResponseTime + "ms",
AlertSeverity.WARNING));
}
// Check for error rate alerts
double errorRate = getMetricValue("http.requests.errors.rate");
if (errorRate > 0.05) { // More than 5% error rate
alerts.add(new MetricAlert("HIGH_ERROR_RATE",
"Error rate is " + (errorRate * 100) + "%",
AlertSeverity.CRITICAL));
}
// Check for resource utilization alerts
double memoryUsage = getMetricValue("jvm.memory.usage");
if (memoryUsage > 0.9) { // More than 90% memory usage
alerts.add(new MetricAlert("HIGH_MEMORY_USAGE",
"Memory usage is " + (memoryUsage * 100) + "%",
AlertSeverity.WARNING));
}
return ResponseEntity.ok(alerts);
}
private SystemMetrics getSystemMetrics() {
SystemMetrics metrics = new SystemMetrics();
metrics.setCpuUsage(getMetricValue("system.cpu.usage"));
metrics.setMemoryUsage(getMetricValue("jvm.memory.used"));
metrics.setDiskUsage(getMetricValue("disk.usage"));
metrics.setActiveThreads(getMetricValue("jvm.threads.live"));
return metrics;
}
private BusinessMetrics getBusinessMetrics() {
BusinessMetrics metrics = new BusinessMetrics();
metrics.setHourlyRevenue(getMetricValue("business.revenue.hourly"));
metrics.setActiveCustomers(getMetricValue("business.customers.active"));
metrics.setCompletedOrders(getMetricValue("business.orders.completed.hourly"));
metrics.setCustomerSatisfaction(getMetricValue("business.customer.satisfaction.score"));
return metrics;
}
private PerformanceMetrics getPerformanceMetrics() {
PerformanceMetrics metrics = new PerformanceMetrics();
metrics.setAverageResponseTime(getMetricValue("http.request.duration"));
metrics.setThroughput(getMetricValue("http.requests.rate"));
metrics.setErrorRate(getMetricValue("http.requests.errors.rate"));
metrics.setDatabaseQueryTime(getMetricValue("database.query.time.average"));
return metrics;
}
private AlertMetrics getAlertMetrics() {
AlertMetrics metrics = new AlertMetrics();
metrics.setActiveAlerts(getActiveAlertsCount());
metrics.setCriticalAlerts(getCriticalAlertsCount());
metrics.setResolvedAlerts(getResolvedAlertsCount());
return metrics;
}
private double getMetricValue(String metricName) {
try {
return meterRegistry.get(metricName).gauge().value();
} catch (Exception e) {
log.warn("Could not retrieve metric value for: {}", metricName);
return 0.0;
}
}
}
Best Practices
1. Metric Design and Definition
- Use clear, consistent naming conventions for all metrics
- Include appropriate units and descriptions in metric definitions
- Design metrics with proper cardinality to avoid high-cardinality issues
- Use meaningful tags and labels for dimensional analysis
- Implement metric schemas and documentation for consistency
2. Collection and Storage Strategy
- Implement efficient metric collection with minimal performance impact
- Use appropriate sampling strategies for high-volume metrics
- Design retention policies based on metric importance and usage patterns
- Implement metric aggregation and pre-computation for frequently accessed data
- Use specialized time-series databases for optimal storage and query performance
3. Performance and Scalability
- Monitor the performance impact of metrics collection on application performance
- Implement asynchronous metric collection where possible
- Use metric batching and buffering for improved efficiency
- Design for horizontal scaling of metrics collection infrastructure
- Implement circuit breakers for metrics collection failures
4. Alerting and Monitoring
- Define meaningful thresholds and alerting rules for critical metrics
- Implement escalation policies for different alert severities
- Use statistical analysis and machine learning for anomaly detection
- Correlate multiple metrics for comprehensive alerting
- Implement alert fatigue prevention through intelligent filtering
5. Analysis and Visualization
- Create meaningful dashboards for different stakeholder audiences
- Implement drill-down capabilities for detailed analysis
- Use appropriate visualization types for different metric characteristics
- Implement real-time and historical views for metrics analysis
- Design for mobile and cross-platform dashboard access
6. Governance and Compliance
- Implement access controls for sensitive metric data
- Maintain audit trails for metric data access and modifications
- Ensure compliance with data protection regulations for metric data
- Implement data lifecycle management for metric retention
- Document metric definitions and business impact for governance
Metrics Collection is essential for maintaining operational visibility, performance optimization, and data-driven decision making in complex distributed enterprise integration architectures, providing the quantitative foundation for understanding system behavior and business outcomes.
← Back to All Patterns