
Health Checks

Overview

The Health Checks pattern is a proactive monitoring approach for enterprise integration architectures. It uses automated diagnostic procedures to verify the operational status, availability, and functional capability of system components, services, and dependencies. Much as a medical examination covers everything from vital signs to organ function, health checks continuously evaluate indicators of system health, performance, and readiness to serve requests. The pattern supports system reliability, automated failure detection, load balancing decisions, and graceful degradation in complex distributed environments where manual monitoring is impractical.

Theoretical Foundation

Health Checks is grounded in systems monitoring theory, fault detection principles, availability engineering, and proactive maintenance strategies. It incorporates concepts from heartbeat monitoring, synthetic transaction testing, dependency verification, and operational readiness assessment to provide a comprehensive framework for automated system health assessment. The pattern addresses the fundamental need for continuous, automated verification of system operational capability and the early detection of issues that could impact system availability, performance, or functionality.

Core Principles

1. Multi-Level Health Assessment

Comprehensive evaluation of system health at different layers and granularities:

- Service-level health: overall service operational status and readiness
- Component-level health: individual component functionality and performance
- Dependency health: external service and infrastructure dependency status
- Business function health: end-to-end business capability verification
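The layered assessment described above can be sketched as a small tree in which each level rolls up the worst status found beneath it. This is a minimal illustration, not part of any framework: `LayerStatus`, `HealthNode`, and the component names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical status scale, declared from best to worst so ordinal order reflects severity. */
enum LayerStatus { UP, DEGRADED, DOWN }

/** Hypothetical node in a health tree: a leaf reports its own status, a group rolls up its children. */
class HealthNode {
    private final String name;
    private final LayerStatus ownStatus;
    private final List<HealthNode> children = new ArrayList<>();

    HealthNode(String name, LayerStatus ownStatus) {
        this.name = name;
        this.ownStatus = ownStatus;
    }

    /** Adds a child node and returns this node so calls can be chained. */
    HealthNode add(HealthNode child) {
        children.add(child);
        return this;
    }

    String name() {
        return name;
    }

    /** Effective status is the worst of this node's own status and all of its descendants. */
    LayerStatus status() {
        LayerStatus worst = ownStatus;
        for (HealthNode child : children) {
            LayerStatus childStatus = child.status();
            if (childStatus.compareTo(worst) > 0) {
                worst = childStatus;
            }
        }
        return worst;
    }
}
```

With this shape, a degraded dependency surfaces at the service level even when every internal component reports UP. Rolling up by worst status is one possible policy; a real system might weight optional dependencies differently.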

2. Automated Diagnostic Procedures

Systematic execution of health verification procedures:

- Synthetic transactions: automated execution of representative business transactions
- Connectivity testing: verification of network connectivity and communication paths
- Resource availability: assessment of critical system resources and capacity
- Data integrity verification: validation of data consistency and accessibility
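Of the procedures listed above, connectivity testing is the simplest to illustrate. The sketch below uses only the JDK and checks whether a TCP connection can be opened within a timeout. The class name `TcpConnectivityCheck` is hypothetical, and a successful connect shows only that the port is reachable, not that the service behind it is working.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper that probes a network path by opening a TCP connection. */
class TcpConnectivityCheck {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused connections, timeouts, and unresolved hosts all count as "not reachable"
            return false;
        }
    }
}
```

A call such as `TcpConnectivityCheck.canConnect("postgres", 5432, 2000)` would then form the lowest tier of a check, with a query or synthetic transaction layered on top to verify actual functionality.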

3. Continuous Monitoring and Assessment

Regular, ongoing health evaluation and status reporting:

- Periodic health checks: scheduled execution of health verification procedures
- Real-time health monitoring: continuous assessment of system health indicators
- Health status aggregation: consolidation of multiple health indicators into overall status
- Health history tracking: maintenance of health status history for trend analysis
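Health history tracking can be sketched as a bounded window of recent outcomes from which a simple trend, here the failure rate, is derived. The `HealthHistory` class and its window size are hypothetical; a production system would more likely export these outcomes to a metrics store.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bounded history of check outcomes with a simple failure-rate trend. */
class HealthHistory {
    private final int capacity;
    private final Deque<Boolean> outcomes = new ArrayDeque<>();

    HealthHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one check outcome, evicting the oldest entry when the window is full. */
    synchronized void record(boolean healthy) {
        if (outcomes.size() == capacity) {
            outcomes.removeFirst();
        }
        outcomes.addLast(healthy);
    }

    /** Fraction of failed checks in the current window; 0.0 when nothing has been recorded. */
    synchronized double failureRate() {
        if (outcomes.isEmpty()) {
            return 0.0;
        }
        long failures = outcomes.stream().filter(ok -> !ok).count();
        return (double) failures / outcomes.size();
    }

    synchronized int size() {
        return outcomes.size();
    }
}
```

A scheduler that runs a check every 30 seconds could call `record(...)` after each run and raise an alert when `failureRate()` crosses a chosen threshold, which is less noisy than alerting on a single failed probe.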

4. Actionable Health Information

Provision of meaningful, actionable health status information:

- Health status reporting: clear communication of current system health state
- Failure diagnosis: detailed information about detected health issues
- Recovery guidance: recommendations for addressing identified health problems
- Impact assessment: evaluation of health issues' impact on system operations
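One way to make a status report actionable is to carry diagnosis, recovery guidance, and impact next to the status itself. The sketch below is an assumption about what such a report could look like; `HealthReport`, its fields, and the summary format are hypothetical.

```java
/** Hypothetical report pairing a status with diagnosis, recovery guidance, and impact. */
final class HealthReport {
    private final String component;
    private final boolean healthy;
    private final String diagnosis;        // what was detected
    private final String recoveryGuidance; // what an operator could try next
    private final String impact;           // which capability is affected

    private HealthReport(String component, boolean healthy, String diagnosis,
                         String recoveryGuidance, String impact) {
        this.component = component;
        this.healthy = healthy;
        this.diagnosis = diagnosis;
        this.recoveryGuidance = recoveryGuidance;
        this.impact = impact;
    }

    static HealthReport up(String component) {
        return new HealthReport(component, true, "", "", "");
    }

    static HealthReport down(String component, String diagnosis,
                             String recoveryGuidance, String impact) {
        return new HealthReport(component, false, diagnosis, recoveryGuidance, impact);
    }

    boolean isHealthy() {
        return healthy;
    }

    /** One-line summary suitable for a log line or an alert title. */
    String summary() {
        if (healthy) {
            return component + ": UP";
        }
        return component + ": DOWN | diagnosis: " + diagnosis
                + " | action: " + recoveryGuidance
                + " | impact: " + impact;
    }
}
```

The point of the extra fields is that whoever receives the alert does not have to reconstruct the cause and the affected capability from raw logs.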

Why Health Checks are Essential in Integration Architecture

1. Proactive Issue Detection

In complex distributed systems, health checks provide:

- Early failure detection: identification of issues before they impact users
- Cascade failure prevention: detection of dependency failures before they spread
- Performance degradation alerts: early warning of performance issues
- Capacity threshold monitoring: alerting when resource limits are approached
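Capacity threshold monitoring amounts to comparing a usage ratio against warning and critical limits. The following sketch assumes two thresholds expressed as fractions of the resource limit; `CapacityLevel`, `CapacityThresholdCheck`, and the 80%/95% values in the usage note are illustrative only.

```java
/** Hypothetical severity scale for resource usage. */
enum CapacityLevel { OK, WARNING, CRITICAL }

/** Hypothetical check that classifies usage against warning and critical ratios. */
class CapacityThresholdCheck {
    private final double warningRatio;
    private final double criticalRatio;

    CapacityThresholdCheck(double warningRatio, double criticalRatio) {
        if (warningRatio <= 0 || warningRatio >= criticalRatio || criticalRatio > 1.0) {
            throw new IllegalArgumentException("expected 0 < warning < critical <= 1");
        }
        this.warningRatio = warningRatio;
        this.criticalRatio = criticalRatio;
    }

    /** Classifies used/limit, for example active connections versus pool size. */
    CapacityLevel evaluate(long used, long limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        double ratio = (double) used / limit;
        if (ratio >= criticalRatio) {
            return CapacityLevel.CRITICAL;
        }
        if (ratio >= warningRatio) {
            return CapacityLevel.WARNING;
        }
        return CapacityLevel.OK;
    }
}
```

For example, `new CapacityThresholdCheck(0.80, 0.95).evaluate(85, 100)` classifies a pool with 85 of 100 connections in use as WARNING, which gives operators time to act before the limit is reached.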

2. Operational Automation Support

Supporting automated operational procedures and decisions:

- Load balancer integration: informing load balancing decisions about service availability
- Auto-scaling triggers: providing data for automated scaling decisions
- Circuit breaker coordination: supporting circuit breaker pattern implementation
- Service mesh integration: enabling intelligent traffic routing and failure handling
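Load balancers typically probe an HTTP endpoint and route traffic only to instances that answer with a success status. The sketch below shows that contract using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The `/health/ready` path, the JSON body, and the `ReadinessEndpoint` class are assumptions for illustration, not a prescribed interface.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical readiness endpoint: 200 when ready to serve traffic, 503 otherwise. */
class ReadinessEndpoint {
    private final AtomicBoolean ready = new AtomicBoolean(true);
    private final HttpServer server;

    /** Binds to the given port; pass 0 to let the OS pick a free port. */
    ReadinessEndpoint(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health/ready", exchange -> {
            boolean isReady = ready.get();
            byte[] body = (isReady ? "{\"status\":\"UP\"}" : "{\"status\":\"DOWN\"}")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            // The status code, not the body, is what a load balancer usually acts on
            exchange.sendResponseHeaders(isReady ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    void start() {
        server.start();
    }

    void stop() {
        server.stop(0);
    }

    int port() {
        return server.getAddress().getPort();
    }

    /** Flip to false during shutdown or while a critical dependency is unavailable. */
    void setReady(boolean value) {
        ready.set(value);
    }
}
```

Setting the flag to false before shutdown lets the load balancer drain the instance, since subsequent probes receive 503 while in-flight requests complete.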

3. Service Level Agreement (SLA) Management

Ensuring compliance with service level commitments:

- Availability monitoring: tracking service availability against SLA commitments
- Performance verification: ensuring response times meet SLA requirements
- Quality assurance: monitoring service quality characteristics
- Compliance reporting: generating reports for SLA compliance verification
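Availability tracking against an SLA reduces to simple arithmetic over check results. The sketch below approximates availability as the share of successful checks and derives the downtime budget implied by a target. `SlaAvailability` and its method names are hypothetical, and real SLAs may define availability differently, for instance by excluding maintenance windows.

```java
/** Hypothetical helpers for relating health check results to an availability target. */
class SlaAvailability {

    /** Availability as the percentage of successful checks. */
    static double availabilityPercent(long successfulChecks, long totalChecks) {
        if (totalChecks <= 0) {
            throw new IllegalArgumentException("totalChecks must be positive");
        }
        return 100.0 * successfulChecks / totalChecks;
    }

    /** True if the measured availability is at or above the target percentage. */
    static boolean meetsTarget(long successfulChecks, long totalChecks, double targetPercent) {
        return availabilityPercent(successfulChecks, totalChecks) >= targetPercent;
    }

    /** Downtime budget in minutes implied by a target over a reporting period. */
    static double allowedDowntimeMinutes(double targetPercent, long periodMinutes) {
        return periodMinutes * (100.0 - targetPercent) / 100.0;
    }
}
```

With checks every 30 seconds there are 2,880 checks per day, so a single failed check still leaves roughly 99.97% availability, while ten failed checks (about 99.65%) would miss a 99.9% target. Over a 30-day period (43,200 minutes), a 99.9% target allows about 43.2 minutes of downtime.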

4. Development and Operations Integration

Supporting DevOps practices and continuous delivery:

- Deployment verification: validating successful deployments through health checks
- Rollback triggers: automatically triggering rollbacks when health checks fail
- Environment validation: verifying environment readiness before deployments
- Testing automation: incorporating health checks into automated testing pipelines
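Deployment verification and rollback triggers can be combined into a small gate: poll the health check after a deployment and promote only once it has passed several times in a row. The sketch below is one possible policy under stated assumptions; `DeploymentVerifier`, `DeploymentDecision`, and the parameters are hypothetical, and the health check is passed in as a `BooleanSupplier` so that any probe can be plugged in.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical outcome of a post-deployment verification. */
enum DeploymentDecision { PROMOTE, ROLLBACK }

/** Hypothetical gate that requires a run of consecutive healthy probes before promoting. */
class DeploymentVerifier {

    static DeploymentDecision verify(BooleanSupplier healthCheck, int maxAttempts,
                                     int requiredConsecutiveSuccesses, long delayMillis) {
        int consecutive = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean healthy;
            try {
                healthy = healthCheck.getAsBoolean();
            } catch (RuntimeException e) {
                healthy = false; // a probe that throws counts as a failed probe
            }
            // Any failure resets the streak, so a flapping service is not promoted
            consecutive = healthy ? consecutive + 1 : 0;
            if (consecutive >= requiredConsecutiveSuccesses) {
                return DeploymentDecision.PROMOTE;
            }
            if (attempt < maxAttempts && delayMillis > 0) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return DeploymentDecision.ROLLBACK;
                }
            }
        }
        return DeploymentDecision.ROLLBACK;
    }
}
```

Requiring consecutive successes, rather than a single pass, guards against promoting a service that is still warming up or intermittently failing. The cost is a slightly longer verification window.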

Benefits in Integration Contexts

1. Technical Advantages

2. Operational Benefits

3. Integration Enablement

4. Business Value

Integration Architecture Applications

1. Microservices Health Monitoring

Comprehensive health checks for microservices architecture:

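// Health Check Configuration
// Note: HealthIndicatorRegistry, HealthAggregator, and CompositeHealthIndicator were
// deprecated in Spring Boot 2.2 in favor of HealthContributorRegistry, StatusAggregator,
// and CompositeHealthContributor, and are absent from later releases. This listing
// targets the older Actuator API.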
@Configuration
@EnableConfigurationProperties(HealthCheckProperties.class)
public class HealthCheckConfiguration {

    @Bean
    public HealthIndicatorRegistry healthIndicatorRegistry() {
        return new DefaultHealthIndicatorRegistry();
    }

    @Bean
    public HealthAggregator healthAggregator() {
        return new OrderedHealthAggregator();
    }

    @Bean
    public CompositeHealthIndicator compositeHealthIndicator(
            HealthAggregator healthAggregator,
            HealthIndicatorRegistry healthIndicatorRegistry) {
        return new CompositeHealthIndicator(healthAggregator, healthIndicatorRegistry);
    }

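    // HealthCheckManager is annotated with @Service (see section 3), so component
    // scanning already registers it. Declaring a second @Bean with the same name
    // here would clash with that definition.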
}

// Custom Health Indicators
@Component
public class DatabaseHealthIndicator implements HealthIndicator {

    @Autowired
    private DataSource dataSource;

    @Override
    public Health health() {
        try {
            // Test database connectivity
            try (Connection connection = dataSource.getConnection()) {
                // Execute a simple query to verify database functionality
                try (PreparedStatement statement = connection.prepareStatement("SELECT 1")) {
                    ResultSet resultSet = statement.executeQuery();

                    if (resultSet.next()) {
                        long startTime = System.currentTimeMillis();

                        // Test database performance
                        try (PreparedStatement perfStatement = connection.prepareStatement(
                                "SELECT COUNT(*) FROM orders WHERE created_date > ?")) {
                            perfStatement.setTimestamp(1, Timestamp.valueOf(LocalDateTime.now().minusMinutes(5)));
                            ResultSet perfResult = perfStatement.executeQuery();

                            long queryTime = System.currentTimeMillis() - startTime;

                            if (perfResult.next()) {
                                int recentOrders = perfResult.getInt(1);

                                return Health.up()
                                    .withDetail("database", "PostgreSQL")
                                    .withDetail("connection_pool_active", getActiveConnections())
                                    .withDetail("connection_pool_idle", getIdleConnections())
                                    .withDetail("query_time_ms", queryTime)
                                    .withDetail("recent_orders", recentOrders)
                                    .withDetail("last_check", LocalDateTime.now())
                                    .build();
                            }
                        }
                    }
                }
            }

            return Health.down()
                .withDetail("error", "Database query failed")
                .withDetail("last_check", LocalDateTime.now())
                .build();

        } catch (Exception e) {
            return Health.down()
                .withDetail("error", e.getMessage())
                .withDetail("exception", e.getClass().getSimpleName())
                .withDetail("last_check", LocalDateTime.now())
                .build();
        }
    }

    private int getActiveConnections() {
        // Implementation to get active connection count
        try {
            if (dataSource instanceof HikariDataSource) {
                return ((HikariDataSource) dataSource).getHikariPoolMXBean().getActiveConnections();
            }
            return -1;
        } catch (Exception e) {
            return -1;
        }
    }

    private int getIdleConnections() {
        // Implementation to get idle connection count
        try {
            if (dataSource instanceof HikariDataSource) {
                return ((HikariDataSource) dataSource).getHikariPoolMXBean().getIdleConnections();
            }
            return -1;
        } catch (Exception e) {
            return -1;
        }
    }
}

@Component
public class ExternalServiceHealthIndicator implements HealthIndicator {

    @Autowired
    private RestTemplate restTemplate;

    @Value("${external.inventory.service.url}")
    private String inventoryServiceUrl;

    @Value("${external.payment.service.url}")
    private String paymentServiceUrl;

    @Override
    public Health health() {
        Health.Builder health = Health.up();

        // Check inventory service
        ServiceHealthStatus inventoryHealth = checkService("inventory", inventoryServiceUrl + "/health");
        health.withDetail("inventory_service", inventoryHealth);

        // Check payment service
        ServiceHealthStatus paymentHealth = checkService("payment", paymentServiceUrl + "/health");
        health.withDetail("payment_service", paymentHealth);

        // Check overall external service health
        boolean allServicesHealthy = inventoryHealth.isHealthy() && paymentHealth.isHealthy();

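        if (allServicesHealthy) {
            health.withDetail("external_services_status", "ALL_HEALTHY");
        } else {
            // Flip the existing builder to DOWN; creating a new builder here
            // would discard the per-service details added above.
            health.down();
            health.withDetail("external_services_status", "SOME_UNHEALTHY");
        }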

        health.withDetail("last_check", LocalDateTime.now());

        return health.build();
    }

    private ServiceHealthStatus checkService(String serviceName, String healthUrl) {
        try {
            long startTime = System.currentTimeMillis();

            ResponseEntity<Map> response = restTemplate.getForEntity(healthUrl, Map.class);

            long responseTime = System.currentTimeMillis() - startTime;

            boolean isHealthy = response.getStatusCode().is2xxSuccessful();

            ServiceHealthStatus status = new ServiceHealthStatus();
            status.setServiceName(serviceName);
            status.setHealthy(isHealthy);
            status.setResponseTime(responseTime);
            status.setStatusCode(response.getStatusCode().value());
            status.setLastCheck(LocalDateTime.now());

            if (response.getBody() != null) {
                status.setDetails(response.getBody());
            }

            return status;

        } catch (Exception e) {
            ServiceHealthStatus status = new ServiceHealthStatus();
            status.setServiceName(serviceName);
            status.setHealthy(false);
            status.setError(e.getMessage());
            status.setLastCheck(LocalDateTime.now());

            return status;
        }
    }
}

@Component
public class CacheHealthIndicator implements HealthIndicator {

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @Override
    public Health health() {
        try {
            // Test Redis connectivity and performance
            long startTime = System.currentTimeMillis();

            String testKey = "health-check-" + System.currentTimeMillis();
            String testValue = "test-value";

            // Test SET operation
            redisTemplate.opsForValue().set(testKey, testValue, Duration.ofMinutes(1));

            // Test GET operation
            String retrievedValue = (String) redisTemplate.opsForValue().get(testKey);

            // Test DELETE operation
            redisTemplate.delete(testKey);

            long operationTime = System.currentTimeMillis() - startTime;

            if (testValue.equals(retrievedValue)) {
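                // Get Redis info; execute() borrows a connection and releases it afterwards,
                // which avoids leaking a connection on every health check
                Properties redisInfo = redisTemplate.execute(
                        (RedisCallback<Properties>) connection -> connection.info());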

                return Health.up()
                    .withDetail("cache_type", "Redis")
                    .withDetail("operation_time_ms", operationTime)
                    .withDetail("redis_version", redisInfo.getProperty("redis_version"))
                    .withDetail("connected_clients", redisInfo.getProperty("connected_clients"))
                    .withDetail("used_memory_human", redisInfo.getProperty("used_memory_human"))
                    .withDetail("keyspace_hits", redisInfo.getProperty("keyspace_hits"))
                    .withDetail("keyspace_misses", redisInfo.getProperty("keyspace_misses"))
                    .withDetail("last_check", LocalDateTime.now())
                    .build();
            } else {
                return Health.down()
                    .withDetail("error", "Cache operation verification failed")
                    .withDetail("expected", testValue)
                    .withDetail("actual", retrievedValue)
                    .withDetail("last_check", LocalDateTime.now())
                    .build();
            }

        } catch (Exception e) {
            return Health.down()
                .withDetail("error", e.getMessage())
                .withDetail("exception", e.getClass().getSimpleName())
                .withDetail("last_check", LocalDateTime.now())
                .build();
        }
    }
}

@Component
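public class BusinessFunctionHealthIndicator implements HealthIndicator {

    // SLF4J logger used by the test* methods below
    private static final Logger log = LoggerFactory.getLogger(BusinessFunctionHealthIndicator.class);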

    @Autowired
    private OrderService orderService;

    @Autowired
    private InventoryService inventoryService;

    @Autowired
    private PaymentService paymentService;

    @Override
    public Health health() {
        Health.Builder health = Health.up();
        Map<String, Object> healthDetails = new HashMap<>();

        // Test order creation functionality
        boolean orderCreationHealthy = testOrderCreationFunction();
        healthDetails.put("order_creation", orderCreationHealthy ? "HEALTHY" : "UNHEALTHY");

        // Test inventory check functionality
        boolean inventoryHealthy = testInventoryFunction();
        healthDetails.put("inventory_check", inventoryHealthy ? "HEALTHY" : "UNHEALTHY");

        // Test payment processing functionality
        boolean paymentHealthy = testPaymentFunction();
        healthDetails.put("payment_processing", paymentHealthy ? "HEALTHY" : "UNHEALTHY");

        // Overall business function health
        boolean allFunctionsHealthy = orderCreationHealthy && inventoryHealthy && paymentHealthy;

        if (allFunctionsHealthy) {
            healthDetails.put("business_functions_status", "ALL_HEALTHY");
        } else {
            health = Health.down();
            healthDetails.put("business_functions_status", "SOME_UNHEALTHY");
        }

        healthDetails.put("last_check", LocalDateTime.now());

        return health.withDetails(healthDetails).build();
    }

    private boolean testOrderCreationFunction() {
        try {
            // Create a test order validation request
            CreateOrderRequest testRequest = createTestOrderRequest();

            // Validate the order creation logic (without actually creating the order)
            OrderValidationResult result = orderService.validateOrder(testRequest);

            return result.isValid();

        } catch (Exception e) {
            log.warn("Order creation health check failed", e);
            return false;
        }
    }

    private boolean testInventoryFunction() {
        try {
            // Test inventory availability check
            List<OrderItem> testItems = createTestOrderItems();
            InventoryCheckResult result = inventoryService.checkAvailability(testItems);

            return result.isSuccessful();

        } catch (Exception e) {
            log.warn("Inventory health check failed", e);
            return false;
        }
    }

    private boolean testPaymentFunction() {
        try {
            // Test payment method validation
            PaymentMethod testPaymentMethod = createTestPaymentMethod();
            PaymentValidationResult result = paymentService.validatePaymentMethod(testPaymentMethod);

            return result.isValid();

        } catch (Exception e) {
            log.warn("Payment health check failed", e);
            return false;
        }
    }

    private CreateOrderRequest createTestOrderRequest() {
        CreateOrderRequest request = new CreateOrderRequest();
        request.setCustomerId("health-check-customer");
        request.setItems(createTestOrderItems());
        request.setShippingAddress(createTestShippingAddress());
        request.setPaymentMethod(createTestPaymentMethod());
        return request;
    }

    private List<OrderItem> createTestOrderItems() {
        OrderItem item = new OrderItem();
        item.setProductId("health-check-product");
        item.setQuantity(1);
        return Arrays.asList(item);
    }

    private ShippingAddress createTestShippingAddress() {
        ShippingAddress address = new ShippingAddress();
        address.setStreet("123 Health Check St");
        address.setCity("Test City");
        address.setZipCode("12345");
        return address;
    }

    private PaymentMethod createTestPaymentMethod() {
        PaymentMethod method = new PaymentMethod();
        method.setType("CREDIT_CARD");
        method.setCardNumber("****-****-****-1234");
        return method;
    }
}

2. Apache Camel Route Health Monitoring

Health checks for Camel integration routes:

@Component
public class CamelHealthCheckRoute extends RouteBuilder {

    @Autowired
    private HealthCheckManager healthCheckManager;

    @Override
    public void configure() throws Exception {

        // Enable MDC logging and Micrometer-backed message history for diagnostics
        getContext().setUseMDCLogging(true);
        getContext().setMessageHistoryFactory(new MicrometerMessageHistoryFactory());

        // Health check endpoint
        from("timer://camel-health-check?period=30000")
            .routeId("camel-health-check")
            .process(exchange -> {
                CamelHealthStatus healthStatus = new CamelHealthStatus();
                healthStatus.setTimestamp(Instant.now());

                // Check route status
                List<Route> routes = getContext().getRoutes();
                int totalRoutes = routes.size();
                int startedRoutes = 0;
                List<String> stoppedRoutes = new ArrayList<>();

                for (Route route : routes) {
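                    // RouteContext exposes no status; ask the route controller instead (Camel 3+).
                    // On Camel 2.x the equivalent is getContext().getRouteStatus(route.getId()).
                    if (getContext().getRouteController().getRouteStatus(route.getId()).isStarted()) {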
                        startedRoutes++;
                    } else {
                        stoppedRoutes.add(route.getId());
                    }
                }

                healthStatus.setTotalRoutes(totalRoutes);
                healthStatus.setStartedRoutes(startedRoutes);
                healthStatus.setStoppedRoutes(stoppedRoutes);

                // Check component status
                Map<String, Component> components = getContext().getComponentMap();
                int totalComponents = components.size();
                int activeComponents = 0;
                List<String> inactiveComponents = new ArrayList<>();

                for (Map.Entry<String, Component> entry : components.entrySet()) {
                    try {
                        ServiceStatus status = entry.getValue().getStatus();
                        if (status.isStarted()) {
                            activeComponents++;
                        } else {
                            inactiveComponents.add(entry.getKey());
                        }
                    } catch (Exception e) {
                        inactiveComponents.add(entry.getKey());
                    }
                }

                healthStatus.setTotalComponents(totalComponents);
                healthStatus.setActiveComponents(activeComponents);
                healthStatus.setInactiveComponents(inactiveComponents);

                // Check endpoint connectivity
                List<EndpointHealthStatus> endpointStatuses = checkEndpoints();
                healthStatus.setEndpointStatuses(endpointStatuses);

                // Overall health determination
                boolean isHealthy = startedRoutes == totalRoutes && 
                                  activeComponents == totalComponents &&
                                  endpointStatuses.stream().allMatch(EndpointHealthStatus::isHealthy);

                healthStatus.setOverallHealthy(isHealthy);

                exchange.getIn().setBody(healthStatus);

                log.info("Camel health check completed - Healthy: {}, Routes: {}/{}, Components: {}/{}", 
                        isHealthy, startedRoutes, totalRoutes, activeComponents, totalComponents);
            })
            .choice()
                .when(simple("${body.overallHealthy} == false"))
                    .to("direct:handleCamelHealthIssues")
                .otherwise()
                    .log("All Camel components healthy")
            .end()
            .marshal().json(JsonLibrary.Jackson)
            .to("kafka:camel-health-status");

        from("direct:handleCamelHealthIssues")
            .routeId("handle-camel-health-issues")
            .log("Camel health issues detected")
            .process(exchange -> {
                CamelHealthStatus healthStatus = exchange.getIn().getBody(CamelHealthStatus.class);

                CamelHealthAlert alert = new CamelHealthAlert();
                alert.setTimestamp(Instant.now());
                alert.setSeverity(AlertSeverity.WARNING);
                alert.setMessage("Camel health issues detected");

                List<String> issues = new ArrayList<>();

                if (!healthStatus.getStoppedRoutes().isEmpty()) {
                    issues.add("Stopped routes: " + String.join(", ", healthStatus.getStoppedRoutes()));
                }

                if (!healthStatus.getInactiveComponents().isEmpty()) {
                    issues.add("Inactive components: " + String.join(", ", healthStatus.getInactiveComponents()));
                }

                List<EndpointHealthStatus> unhealthyEndpoints = healthStatus.getEndpointStatuses()
                    .stream()
                    .filter(status -> !status.isHealthy())
                    .collect(Collectors.toList());

                if (!unhealthyEndpoints.isEmpty()) {
                    issues.add("Unhealthy endpoints: " + unhealthyEndpoints.stream()
                        .map(EndpointHealthStatus::getEndpointUri)
                        .collect(Collectors.joining(", ")));
                }

                alert.setIssues(issues);

                exchange.getIn().setBody(alert);
            })
            .marshal().json(JsonLibrary.Jackson)
            .to("kafka:camel-health-alerts")
            .log("Camel health alert sent: ${body}");

        // Route-specific health checks
        from("timer://route-health-check?period=60000")
            .routeId("route-health-check")
            .process(exchange -> {
                List<RouteHealthStatus> routeStatuses = new ArrayList<>();

                for (Route route : getContext().getRoutes()) {
                    RouteHealthStatus status = new RouteHealthStatus();
                    status.setRouteId(route.getId());
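                    // Route status comes from the route controller (Camel 3+);
                    // on Camel 2.x use getContext().getRouteStatus(route.getId()).
                    status.setStatus(getContext().getRouteController().getRouteStatus(route.getId()).name());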

                    try {
                        // Get route statistics
                        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                        ObjectName objectName = new ObjectName(
                            "org.apache.camel:context=" + getContext().getManagementName() + 
                            ",type=routes,name=\"" + route.getId() + "\"");

                        if (server.isRegistered(objectName)) {
                            Long exchangesTotal = (Long) server.getAttribute(objectName, "ExchangesTotal");
                            Long exchangesCompleted = (Long) server.getAttribute(objectName, "ExchangesCompleted");
                            Long exchangesFailed = (Long) server.getAttribute(objectName, "ExchangesFailed");
                            Long meanProcessingTime = (Long) server.getAttribute(objectName, "MeanProcessingTime");

                            status.setExchangesTotal(exchangesTotal);
                            status.setExchangesCompleted(exchangesCompleted);
                            status.setExchangesFailed(exchangesFailed);
                            status.setMeanProcessingTime(meanProcessingTime);

                            // Calculate error rate
                            double errorRate = exchangesTotal > 0 ? 
                                (double) exchangesFailed / exchangesTotal : 0.0;
                            status.setErrorRate(errorRate);

                            // Determine health based on error rate and processing time
                            boolean isHealthy = errorRate < 0.05 && meanProcessingTime < 5000;
                            status.setHealthy(isHealthy);
                        } else {
                            status.setHealthy(false);
                            status.setError("Route statistics not available");
                        }

                    } catch (Exception e) {
                        status.setHealthy(false);
                        status.setError("Error retrieving route statistics: " + e.getMessage());
                    }

                    routeStatuses.add(status);
                }

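                exchange.getIn().setBody(routeStatuses);
                // Record the count before marshalling; afterwards the body is JSON, not a List
                exchange.setProperty("routeCount", routeStatuses.size());
            })
            .marshal().json(JsonLibrary.Jackson)
            .to("kafka:route-health-status")
            .log("Route health status published for ${exchangeProperty.routeCount} routes");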

        // Message queue health check
        from("timer://queue-health-check?period=45000")
            .routeId("queue-health-check")
            .process(exchange -> {
                List<QueueHealthStatus> queueStatuses = new ArrayList<>();

                // Check Kafka topic health
                queueStatuses.add(checkKafkaTopicHealth("order-events"));
                queueStatuses.add(checkKafkaTopicHealth("inventory-updates"));
                queueStatuses.add(checkKafkaTopicHealth("payment-notifications"));

                exchange.getIn().setBody(queueStatuses);
            })
            .marshal().json(JsonLibrary.Jackson)
            .to("kafka:queue-health-status");

        // Endpoint connectivity health check
        from("timer://endpoint-health-check?period=120000")
            .routeId("endpoint-health-check")
            .process(exchange -> {
                List<EndpointHealthStatus> endpointStatuses = checkEndpoints();
                exchange.getIn().setBody(endpointStatuses);
            })
            .marshal().json(JsonLibrary.Jackson)
            .to("kafka:endpoint-health-status");
    }

    private List<EndpointHealthStatus> checkEndpoints() {
        List<EndpointHealthStatus> statuses = new ArrayList<>();

        // Check HTTP endpoints
        statuses.add(checkHttpEndpoint("customer-service", "http://customer-service:8080/health"));
        statuses.add(checkHttpEndpoint("inventory-service", "http://inventory-service:8080/health"));
        statuses.add(checkHttpEndpoint("payment-service", "http://payment-service:8080/health"));

        // Check database endpoints
        statuses.add(checkDatabaseEndpoint("postgresql", "jdbc:postgresql://postgres:5432/orders"));

        // Check cache endpoints
        statuses.add(checkCacheEndpoint("redis", "redis://redis:6379"));

        return statuses;
    }

    private EndpointHealthStatus checkHttpEndpoint(String serviceName, String url) {
        EndpointHealthStatus status = new EndpointHealthStatus();
        status.setEndpointName(serviceName);
        status.setEndpointUri(url);
        status.setEndpointType("HTTP");

        try {
            long startTime = System.currentTimeMillis();

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .build();

            HttpResponse<String> response = client.send(request, 
                HttpResponse.BodyHandlers.ofString());

            long responseTime = System.currentTimeMillis() - startTime;

            status.setHealthy(response.statusCode() >= 200 && response.statusCode() < 300);
            status.setResponseTime(responseTime);
            status.setStatusCode(response.statusCode());
            status.setLastCheck(LocalDateTime.now());

            if (!status.isHealthy()) {
                status.setError("HTTP " + response.statusCode());
            }

        } catch (Exception e) {
            status.setHealthy(false);
            status.setError(e.getMessage());
            status.setLastCheck(LocalDateTime.now());
        }

        return status;
    }

    private EndpointHealthStatus checkDatabaseEndpoint(String dbName, String jdbcUrl) {
        EndpointHealthStatus status = new EndpointHealthStatus();
        status.setEndpointName(dbName);
        status.setEndpointUri(jdbcUrl);
        status.setEndpointType("DATABASE");

        try {
            long startTime = System.currentTimeMillis();

            try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
                try (PreparedStatement statement = connection.prepareStatement("SELECT 1")) {
                    statement.executeQuery();
                }
            }

            long responseTime = System.currentTimeMillis() - startTime;

            status.setHealthy(true);
            status.setResponseTime(responseTime);
            status.setLastCheck(LocalDateTime.now());

        } catch (Exception e) {
            status.setHealthy(false);
            status.setError(e.getMessage());
            status.setLastCheck(LocalDateTime.now());
        }

        return status;
    }

    private EndpointHealthStatus checkCacheEndpoint(String cacheName, String redisUrl) {
        EndpointHealthStatus status = new EndpointHealthStatus();
        status.setEndpointName(cacheName);
        status.setEndpointUri(redisUrl);
        status.setEndpointType("CACHE");

        try {
            long startTime = System.currentTimeMillis();

            // Simple Redis connectivity check; try-with-resources closes the
            // connection even if ping() throws
            String response;
            try (Jedis jedis = new Jedis(URI.create(redisUrl))) {
                response = jedis.ping();
            }

            long responseTime = System.currentTimeMillis() - startTime;

            status.setHealthy("PONG".equals(response));
            status.setResponseTime(responseTime);
            status.setLastCheck(LocalDateTime.now());

            if (!status.isHealthy()) {
                status.setError("Unexpected ping response: " + response);
            }

        } catch (Exception e) {
            status.setHealthy(false);
            status.setError(e.getMessage());
            status.setLastCheck(LocalDateTime.now());
        }

        return status;
    }

    private QueueHealthStatus checkKafkaTopicHealth(String topicName) {
        QueueHealthStatus status = new QueueHealthStatus();
        status.setQueueName(topicName);
        status.setQueueType("KAFKA_TOPIC");

        try {
            // Check that the topic exists and read its partition information.
            // The bootstrap address is hard-coded here; in practice it would come from configuration.
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

            try (AdminClient adminClient = AdminClient.create(props)) {
                DescribeTopicsResult topicsResult = adminClient.describeTopics(Arrays.asList(topicName));
                // On Kafka clients 3.1 and later, values() is deprecated in favor of topicNameValues()
                TopicDescription description = topicsResult.values().get(topicName).get(5, TimeUnit.SECONDS);

                status.setHealthy(true);
                status.setPartitionCount(description.partitions().size());
                status.setLastCheck(LocalDateTime.now());
            }

        } catch (Exception e) {
            status.setHealthy(false);
            status.setError(e.getMessage());
            status.setLastCheck(LocalDateTime.now());
        }

        return status;
    }
}
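
The database, cache, and Kafka checks above all follow the same shape: start a timer, run a probe, then record either the response time or the error. The sketch below factors that shape into one helper and adds a hard time limit, so a hung dependency cannot stall the check. `TimedProbe`, `Result`, and `run` are hypothetical names introduced for illustration and are not part of the listing above. A real service would likely reuse a shared executor rather than creating one per call.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs one probe with a time limit and records the outcome. */
final class TimedProbe {

    /** Immutable outcome of a single probe (field names are illustrative). */
    static final class Result {
        final String name;
        final boolean healthy;
        final long responseTimeMs;
        final String error; // null when the probe succeeded

        Result(String name, boolean healthy, long responseTimeMs, String error) {
            this.name = name;
            this.healthy = healthy;
            this.responseTimeMs = responseTimeMs;
            this.error = error;
        }
    }

    /** Runs the probe on a worker thread and gives up once the timeout elapses. */
    static Result run(String name, Callable<Boolean> probe, Duration timeout) {
        // One executor per call keeps the sketch simple; share one in real code.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        Future<Boolean> future = executor.submit(probe);
        try {
            boolean ok = Boolean.TRUE.equals(future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
            return new Result(name, ok, elapsedMs(start), ok ? null : "probe returned false");
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck probe
            return new Result(name, false, elapsedMs(start),
                    "timed out after " + timeout.toMillis() + " ms");
        } catch (ExecutionException e) {
            // The probe itself threw; report the underlying cause
            Throwable cause = e.getCause();
            String message = cause.getMessage() != null ? cause.getMessage() : cause.toString();
            return new Result(name, false, elapsedMs(start), message);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return new Result(name, false, elapsedMs(start), "interrupted");
        } finally {
            executor.shutdownNow();
        }
    }

    private static long elapsedMs(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }
}
```

Under these assumptions, a cache check could be written as `TimedProbe.run("cache", () -> "PONG".equals(jedis.ping()), Duration.ofSeconds(2))`, with the result copied into an `EndpointHealthStatus`.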

3. Health Check Management and Aggregation

Centralized health check management and reporting:

// Health Check Manager
@Service
public class HealthCheckManager {

    @Autowired
    private HealthIndicatorRegistry healthIndicatorRegistry;

    @Autowired
    private HealthAggregator healthAggregator;

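    // SLF4J logger used by the scheduled tasks below
    private static final Logger log = LoggerFactory.getLogger(HealthCheckManager.class);

    private final Map<String, HealthCheckResult> healthCheckHistory = new ConcurrentHashMap<>();
    private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(5);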

    @PostConstruct
    public void initializeHealthChecks() {
        // Schedule periodic health checks
        executorService.scheduleAtFixedRate(this::performHealthChecks, 0, 30, TimeUnit.SECONDS);
        executorService.scheduleAtFixedRate(this::publishHealthStatus, 10, 60, TimeUnit.SECONDS);
        executorService.scheduleAtFixedRate(this::cleanupHealthHistory, 0, 1, TimeUnit.HOURS);
    }

    @PreDestroy
    public void shutdownHealthChecks() {
        // Stop the scheduler so its threads do not outlive the application context
        executorService.shutdownNow();
    }

    public SystemHealthStatus getSystemHealth() {
        SystemHealthStatus systemHealth = new SystemHealthStatus();
        systemHealth.setTimestamp(Instant.now());

        Map<String, Health> individualHealths = new HashMap<>();

        // Collect health from all registered indicators
        for (Map.Entry<String, HealthIndicator> entry : healthIndicatorRegistry.getAll().entrySet()) {
            try {
                Health health = entry.getValue().health();
                individualHealths.put(entry.getKey(), health);
            } catch (Exception e) {
                // getMessage() may be null, and withDetail rejects null values
                Health errorHealth = Health.down()
                    .withDetail("error", String.valueOf(e.getMessage()))
                    .withDetail("exception", e.getClass().getSimpleName())
                    .build();
                individualHealths.put(entry.getKey(), errorHealth);
            }
        }

        // Aggregate overall health
        Health overallHealth = healthAggregator.aggregate(individualHealths);

        systemHealth.setOverallStatus(overallHealth.getStatus());
        systemHealth.setIndividualHealths(individualHealths);
        systemHealth.setHealthScore(calculateHealthScore(individualHealths));

        // Add trending information
        systemHealth.setHealthTrend(calculateHealthTrend());
        systemHealth.setIssuesSummary(summarizeHealthIssues(individualHealths));

        return systemHealth;
    }

    public List<HealthAlert> getHealthAlerts() {
        List<HealthAlert> alerts = new ArrayList<>();
        SystemHealthStatus currentHealth = getSystemHealth();

        // Check for critical health issues
        for (Map.Entry<String, Health> entry : currentHealth.getIndividualHealths().entrySet()) {
            // Status is a value class, so compare with equals() rather than ==
            if (Status.DOWN.equals(entry.getValue().getStatus())) {
                HealthAlert alert = new HealthAlert();
                alert.setAlertId(UUID.randomUUID().toString());
                alert.setComponent(entry.getKey());
                alert.setSeverity(AlertSeverity.CRITICAL);
                alert.setTitle("Component Health Check Failed");
                alert.setMessage("Health check failed for component: " + entry.getKey());
                alert.setTimestamp(Instant.now());
                alert.setDetails(entry.getValue().getDetails());

                alerts.add(alert);
            }
        }

        // Check for performance degradation
        if (currentHealth.getHealthScore() < 0.8) {
            HealthAlert alert = new HealthAlert();
            alert.setAlertId(UUID.randomUUID().toString());
            alert.setComponent("SYSTEM");
            alert.setSeverity(AlertSeverity.WARNING);
            alert.setTitle("System Health Degradation");
            alert.setMessage("Overall system health score has degraded to " + 
                           String.format("%.2f", currentHealth.getHealthScore()));
            alert.setTimestamp(Instant.now());

            alerts.add(alert);
        }

        // Check for health trend issues
        if ("DECLINING".equals(currentHealth.getHealthTrend())) {
            HealthAlert alert = new HealthAlert();
            alert.setAlertId(UUID.randomUUID().toString());
            alert.setComponent("SYSTEM");
            alert.setSeverity(AlertSeverity.WARNING);
            alert.setTitle("Declining Health Trend");
            alert.setMessage("System health trend is declining over recent checks");
            alert.setTimestamp(Instant.now());

            alerts.add(alert);
        }

        return alerts;
    }

    public HealthCheckReport generateHealthReport(Duration period) {
        HealthCheckReport report = new HealthCheckReport();
        report.setReportPeriod(period);
        report.setGeneratedAt(Instant.now());

        Instant cutoff = Instant.now().minus(period);

        // Collect health check results from history
        Map<String, List<HealthCheckResult>> componentHistory = new HashMap<>();

        for (HealthCheckResult result : healthCheckHistory.values()) {
            if (result.getTimestamp().isAfter(cutoff)) {
                // Read the component from the result itself; splitting the history key
                // on "-" would truncate component names that contain a hyphen
                componentHistory.computeIfAbsent(result.getComponent(), k -> new ArrayList<>()).add(result);
            }
        }

        // Calculate availability statistics
        Map<String, AvailabilityStatistics> availabilityStats = new HashMap<>();

        for (Map.Entry<String, List<HealthCheckResult>> entry : componentHistory.entrySet()) {
            AvailabilityStatistics stats = calculateAvailabilityStatistics(entry.getValue());
            availabilityStats.put(entry.getKey(), stats);
        }

        report.setAvailabilityStatistics(availabilityStats);

        // Calculate overall system availability
        double overallAvailability = availabilityStats.values().stream()
            .mapToDouble(AvailabilityStatistics::getAvailabilityPercentage)
            .average()
            .orElse(0.0);

        report.setOverallAvailability(overallAvailability);

        // Identify top issues
        List<HealthIssue> topIssues = identifyTopHealthIssues(componentHistory);
        report.setTopIssues(topIssues);

        return report;
    }

    private void performHealthChecks() {
        try {
            SystemHealthStatus healthStatus = getSystemHealth();

            // Store health check results in history
            for (Map.Entry<String, Health> entry : healthStatus.getIndividualHealths().entrySet()) {
                HealthCheckResult result = new HealthCheckResult();
                result.setComponent(entry.getKey());
                result.setStatus(entry.getValue().getStatus());
                result.setDetails(entry.getValue().getDetails());
                result.setTimestamp(Instant.now());

                String historyKey = entry.getKey() + "-" + System.currentTimeMillis();
                healthCheckHistory.put(historyKey, result);
            }

            log.debug("Health checks completed - Overall status: {}, Score: {}", 
                     healthStatus.getOverallStatus(), healthStatus.getHealthScore());

        } catch (Exception e) {
            log.error("Error performing health checks", e);
        }
    }

    private void publishHealthStatus() {
        try {
            SystemHealthStatus healthStatus = getSystemHealth();

            // Publish to monitoring system
            publishToMonitoring(healthStatus);

            // Check for alerts (note: getHealthAlerts() runs the health indicators a second time)
            List<HealthAlert> alerts = getHealthAlerts();
            if (!alerts.isEmpty()) {
                publishHealthAlerts(alerts);
            }

            log.info("Health status published - Status: {}, Alerts: {}", 
                    healthStatus.getOverallStatus(), alerts.size());

        } catch (Exception e) {
            log.error("Error publishing health status", e);
        }
    }

    private double calculateHealthScore(Map<String, Health> healthMap) {
        if (healthMap.isEmpty()) {
            return 0.0;
        }

        // Fraction of components reporting UP; Status is compared with equals(), not ==
        long healthyComponents = healthMap.values().stream()
            .filter(health -> Status.UP.equals(health.getStatus()))
            .count();

        return (double) healthyComponents / healthMap.size();
    }

    private String calculateHealthTrend() {
        // Analyze recent health scores to determine trend
        List<Double> recentScores = getRecentHealthScores(Duration.ofMinutes(30));

        if (recentScores.size() < 3) {
            return "INSUFFICIENT_DATA";
        }

        // Simple trend analysis
        double firstHalf = recentScores.subList(0, recentScores.size() / 2).stream()
            .mapToDouble(Double::doubleValue)
            .average()
            .orElse(0.0);

        double secondHalf = recentScores.subList(recentScores.size() / 2, recentScores.size()).stream()
            .mapToDouble(Double::doubleValue)
            .average()
            .orElse(0.0);

        if (secondHalf > firstHalf + 0.1) {
            return "IMPROVING";
        } else if (secondHalf < firstHalf - 0.1) {
            return "DECLINING";
        } else {
            return "STABLE";
        }
    }

    private List<String> summarizeHealthIssues(Map<String, Health> healthMap) {
        return healthMap.entrySet().stream()
            .filter(entry -> !Status.UP.equals(entry.getValue().getStatus()))
            .map(entry -> entry.getKey() + ": " + entry.getValue().getStatus())
            .collect(Collectors.toList());
    }
}

// Health Check REST Controller
@RestController
@RequestMapping("/health")
public class HealthCheckController {

    @Autowired
    private HealthCheckManager healthCheckManager;

    @GetMapping
    public ResponseEntity<SystemHealthStatus> getSystemHealth() {
        SystemHealthStatus health = healthCheckManager.getSystemHealth();

        HttpStatus httpStatus = Status.UP.equals(health.getOverallStatus()) ?
            HttpStatus.OK : HttpStatus.SERVICE_UNAVAILABLE;

        return ResponseEntity.status(httpStatus).body(health);
    }

    @GetMapping("/detailed")
    public ResponseEntity<SystemHealthStatus> getDetailedHealth() {
        SystemHealthStatus health = healthCheckManager.getSystemHealth();
        return ResponseEntity.ok(health);
    }

    @GetMapping("/alerts")
    public ResponseEntity<List<HealthAlert>> getHealthAlerts() {
        List<HealthAlert> alerts = healthCheckManager.getHealthAlerts();
        return ResponseEntity.ok(alerts);
    }

    @GetMapping("/report")
    public ResponseEntity<HealthCheckReport> getHealthReport(
            @RequestParam(defaultValue = "PT24H") String period) {

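        Duration reportPeriod;
        try {
            reportPeriod = Duration.parse(period);
        } catch (DateTimeParseException e) {
            // A malformed ISO-8601 duration is a client error, not a server fault
            return ResponseEntity.badRequest().build();
        }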
        HealthCheckReport report = healthCheckManager.generateHealthReport(reportPeriod);

        return ResponseEntity.ok(report);
    }

    @GetMapping("/readiness")
    public ResponseEntity<Map<String, String>> readinessCheck() {
        SystemHealthStatus health = healthCheckManager.getSystemHealth();

        // Evaluate readiness once; Status is compared with equals(), not ==
        boolean ready = Status.UP.equals(health.getOverallStatus());

        Map<String, String> response = new HashMap<>();
        response.put("status", health.getOverallStatus().toString());
        response.put("ready", String.valueOf(ready));
        response.put("timestamp", health.getTimestamp().toString());

        HttpStatus httpStatus = ready ? HttpStatus.OK : HttpStatus.SERVICE_UNAVAILABLE;

        return ResponseEntity.status(httpStatus).body(response);
    }

    @GetMapping("/liveness")
    public ResponseEntity<Map<String, String>> livenessCheck() {
        // Liveness check should be simpler - just verify basic application health
        Map<String, String> response = new HashMap<>();
        response.put("status", "UP");
        response.put("alive", "true");
        response.put("timestamp", Instant.now().toString());

        return ResponseEntity.ok(response);
    }
}
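
The manager above calls helpers that the listing does not show, such as `calculateAvailabilityStatistics` and `getRecentHealthScores`. The sketch below illustrates the arithmetic they might perform. It is an assumption about their intent, not the original implementation. `HealthHistoryMath`, `availabilityPercentage`, and `trend` are hypothetical names. The trend logic mirrors the half-versus-half comparison in `calculateHealthTrend`, with the 0.1 threshold passed in as a parameter.

```java
import java.util.List;

/** Hypothetical stand-ins for helpers the manager calls but the listing omits. */
final class HealthHistoryMath {

    /** Percentage (0 to 100) of checks that were healthy; 0.0 for an empty history. */
    static double availabilityPercentage(List<Boolean> checkOutcomes) {
        if (checkOutcomes.isEmpty()) {
            return 0.0;
        }
        long up = checkOutcomes.stream().filter(Boolean::booleanValue).count();
        return 100.0 * up / checkOutcomes.size();
    }

    /**
     * Compares the average of the older half of the scores with the average of
     * the newer half. Scores are expected in chronological order, oldest first.
     */
    static String trend(List<Double> scores, double threshold) {
        if (scores.size() < 3) {
            return "INSUFFICIENT_DATA";
        }
        int mid = scores.size() / 2;
        double older = average(scores.subList(0, mid));
        double newer = average(scores.subList(mid, scores.size()));

        if (newer > older + threshold) {
            return "IMPROVING";
        }
        if (newer < older - threshold) {
            return "DECLINING";
        }
        return "STABLE";
    }

    private static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```

With a 30-second check interval, three healthy checks followed by one failure give an availability of 75 percent. Scores that fall from 1.0 to 0.5 across the window classify as `DECLINING`.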

Best Practices

1. Health Check Design and Implementation

2. Health Check Coverage and Scope

3. Performance and Resource Management

4. Alerting and Response

5. Integration and Automation

6. Security and Compliance

Health checks are essential for maintaining system reliability, detecting issues proactively, and supporting automated operational procedures in complex distributed enterprise integration architectures.
