
Error Tracking

Overview

Error Tracking is an observability pattern for enterprise integration architectures that captures, categorizes, analyzes, and manages application errors, exceptions, and failures across distributed systems. Its goals are fast issue identification, effective debugging, proactive resolution, and steady improvement in reliability. Beyond detecting that something went wrong, an error tracker records what happened, in what context, and how often, so that engineers can work out why it happened and how to fix it, and can see how failures affect business operations. The pattern is essential for keeping complex enterprise systems reliable: it reduces mean time to resolution (MTTR), helps prevent recurrence, supports root cause analysis, and protects the user experience in environments where fast detection and resolution are critical to business continuity.

Theoretical Foundation

Error Tracking draws on fault tolerance, error propagation analysis, incident management, and reliability engineering, together with exception handling patterns, failure analysis, observability, and continuous improvement practices. In distributed enterprise systems it addresses four needs: capturing errors systematically, analyzing them intelligently, communicating them to the right people, and using the resulting data to improve reliability.

Core Principles

1. Comprehensive Error Capture and Classification

Systematic capture and categorization of all types of system errors and failures:
- Exception tracking: detailed capture of application exceptions with full stack traces and context
- System error monitoring: monitoring of system-level errors and infrastructure failures
- Business logic errors: tracking of business rule violations and process failures
- Integration failures: monitoring of external service failures and communication errors
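The four categories can be assigned at the point of capture. The sketch below is a minimal illustration, assuming each capture site passes a source tag such as the "BUSINESS_LOGIC" and "INTEGRATION" values used by the capture service later in this section. `ErrorCategory` and `ErrorClassifier` are hypothetical names, and the type-based fallback is a heuristic: a file-system `IOException`, for instance, would be misfiled as an integration failure.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical categories matching the four kinds of errors listed above
enum ErrorCategory { APPLICATION_EXCEPTION, SYSTEM_ERROR, BUSINESS_LOGIC, INTEGRATION_FAILURE }

// Hypothetical classifier: an explicit source tag set at the capture point wins,
// otherwise the throwable's type decides
final class ErrorClassifier {
    private ErrorClassifier() {}

    static ErrorCategory classify(Throwable t, String source) {
        if ("BUSINESS_LOGIC".equals(source)) {
            return ErrorCategory.BUSINESS_LOGIC;
        }
        if ("INTEGRATION".equals(source)) {
            return ErrorCategory.INTEGRATION_FAILURE;
        }
        // JVM-level failures such as OutOfMemoryError or StackOverflowError
        if (t instanceof VirtualMachineError) {
            return ErrorCategory.SYSTEM_ERROR;
        }
        // Heuristic: timeouts and I/O failures usually come from calls to other systems
        if (t instanceof TimeoutException || t instanceof IOException) {
            return ErrorCategory.INTEGRATION_FAILURE;
        }
        return ErrorCategory.APPLICATION_EXCEPTION;
    }
}
```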

2. Contextual Error Information and Analysis

Rich contextual information to support effective error analysis and resolution:
- Environmental context: system state, configuration, and environmental conditions at the time of the error
- User context: user information, session details, and the user's journey leading up to the error
- Technical context: detailed technical information including stack traces, request details, and system metrics
- Business context: the business process and transaction involved, and an assessment of business impact
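One way to keep the four kinds of context together is an immutable snapshot taken on the thread where the error occurs. The sketch below assumes the caller supplies the user ID, session ID, and business fields; `ErrorContextSnapshot` and its map keys are hypothetical, and the environmental section records only a few JVM values as examples. `Map.copyOf` rejects null keys and values, so pass an empty map rather than null when there is no business context.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical snapshot grouping environmental, user, technical and business context
record ErrorContextSnapshot(
        Instant capturedAt,
        Map<String, String> environment,
        Map<String, String> user,
        Map<String, String> technical,
        Map<String, Object> business) {

    // Copy every map so later changes by the caller cannot alter the snapshot
    ErrorContextSnapshot {
        environment = Map.copyOf(environment);
        user = Map.copyOf(user);
        technical = Map.copyOf(technical);
        business = Map.copyOf(business);
    }

    static ErrorContextSnapshot capture(Throwable t, String userId, String sessionId,
                                        Map<String, Object> businessContext) {
        Runtime rt = Runtime.getRuntime();
        Map<String, String> env = new LinkedHashMap<>();
        env.put("java.version", System.getProperty("java.version"));
        env.put("availableProcessors", String.valueOf(rt.availableProcessors()));
        env.put("freeMemoryBytes", String.valueOf(rt.freeMemory()));

        // Only record user fields that are actually known
        Map<String, String> userInfo = new LinkedHashMap<>();
        if (userId != null) userInfo.put("userId", userId);
        if (sessionId != null) userInfo.put("sessionId", sessionId);

        Map<String, String> tech = new LinkedHashMap<>();
        tech.put("thread", Thread.currentThread().getName());
        tech.put("exceptionType", t.getClass().getName());
        StackTraceElement[] frames = t.getStackTrace();
        if (frames.length > 0) tech.put("topFrame", frames[0].toString());

        return new ErrorContextSnapshot(Instant.now(), env, userInfo, tech, businessContext);
    }
}
```

Capturing on the calling thread matters when the rest of the pipeline runs asynchronously: request- and user-scoped data kept in ThreadLocals is not visible from an executor thread.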

3. Intelligent Error Aggregation and Deduplication

Smart grouping and management of related errors to reduce noise and improve efficiency:
- Error fingerprinting: intelligent grouping of similar errors to reduce duplicate noise
- Error correlation: identification of related errors and failure cascades across services
- Impact assessment: evaluation of error frequency, severity, and business impact
- Trend analysis: identification of error patterns and trends over time
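Fingerprinting can be sketched as hashing a normalized key built from the exception type, the message with variable parts replaced by placeholders, and the top few application frames. This is an illustrative sketch rather than any specific product's algorithm; `ErrorFingerprinter`, the placeholder tokens, and the package-prefix filter are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical fingerprinter: same type, same message shape and same top frames => same group
final class ErrorFingerprinter {
    private ErrorFingerprinter() {}

    // Replace variable parts; UUIDs go first so the digit rule cannot break them apart
    static String normalize(String message) {
        if (message == null) return "";
        return message
                .replaceAll("[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "{uuid}")
                .replaceAll("\\d+", "{number}");
    }

    static String fingerprint(Throwable t, String appPackagePrefix, int maxFrames) {
        String frames = Arrays.stream(t.getStackTrace())
                .filter(f -> f.getClassName().startsWith(appPackagePrefix))
                .limit(maxFrames)
                // Class and method only: line numbers shift between releases
                .map(f -> f.getClassName() + "." + f.getMethodName())
                .collect(Collectors.joining(","));
        String key = t.getClass().getName() + "|" + normalize(t.getMessage()) + "|" + frames;
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

Leaving line numbers out of the key keeps a group stable across releases that only move code around; the trade-off is that two different failures with the same type and message shape in the same method will share a group.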

4. Proactive Error Management and Resolution

Automated and guided approaches to error resolution and prevention:
- Automated alerting: intelligent alerting based on error severity, frequency, and business impact
- Resolution tracking: systematic tracking of error resolution status and progress
- Root cause analysis: guided analysis to identify underlying causes of errors
- Prevention strategies: implementation of measures to prevent error recurrence
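Frequency-based alerting is often paired with a cooldown so that a burst of identical errors produces one alert rather than hundreds. A minimal in-memory sketch, assuming a single process and timestamps that arrive in non-decreasing order; `FrequencyAlerter` and its parameters are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical alerter: fires when a fingerprint reaches `threshold` occurrences within
// `window`, then stays quiet for `cooldown` so one incident yields one alert
final class FrequencyAlerter {
    private final int threshold;
    private final Duration window;
    private final Duration cooldown;
    private final Map<String, Deque<Instant>> occurrences = new HashMap<>();
    private final Map<String, Instant> lastAlert = new HashMap<>();

    FrequencyAlerter(int threshold, Duration window, Duration cooldown) {
        this.threshold = threshold;
        this.window = window;
        this.cooldown = cooldown;
    }

    // Records one occurrence at `now`; returns true if an alert should be sent
    synchronized boolean record(String fingerprint, Instant now) {
        Deque<Instant> times = occurrences.computeIfAbsent(fingerprint, k -> new ArrayDeque<>());
        times.addLast(now);
        // Drop occurrences that have slid out of the window
        Instant windowStart = now.minus(window);
        while (!times.isEmpty() && times.peekFirst().isBefore(windowStart)) {
            times.removeFirst();
        }
        if (times.size() < threshold) return false;

        // Suppress repeat alerts for the same fingerprint during the cooldown
        Instant previous = lastAlert.get(fingerprint);
        if (previous != null && now.isBefore(previous.plus(cooldown))) return false;
        lastAlert.put(fingerprint, now);
        return true;
    }
}
```

A multi-instance deployment would need these counts in shared storage, and the per-fingerprint maps here are never pruned.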

Why Error Tracking is Essential in Integration Architecture

1. Rapid Issue Detection and Response

In complex distributed systems, error tracking provides:
- Real-time error detection: immediate notification when critical errors occur
- Error impact assessment: understanding of error impact on business operations and users
- Prioritized response: intelligent prioritization of errors based on severity and business impact
- Coordinated incident response: support for coordinated incident management and resolution
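Prioritized response can be made concrete as an ordering rule. The sketch assumes a severity enum declared in ascending order and breaks ties by the number of affected users, then by error count; `Severity`, `Incident`, `IncidentQueue`, and the tie-breaking order are illustrative choices, not a fixed standard.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Declared in ascending order, so natural ordering puts CRITICAL last
enum Severity { LOW, MEDIUM, HIGH, CRITICAL }

record Incident(String id, Severity severity, int affectedUsers, int errorCount) {}

final class IncidentQueue {
    // Highest severity first; ties broken by user impact, then by volume
    static final Comparator<Incident> PRIORITY =
            Comparator.comparing(Incident::severity).reversed()
                    .thenComparing(Comparator.comparingInt(Incident::affectedUsers).reversed())
                    .thenComparing(Comparator.comparingInt(Incident::errorCount).reversed());

    private final PriorityQueue<Incident> queue = new PriorityQueue<>(PRIORITY);

    void add(Incident incident) {
        queue.add(incident);
    }

    // Next incident to work on, or null if the queue is empty
    Incident next() {
        return queue.poll();
    }
}
```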

2. Effective Debugging and Troubleshooting

Supporting rapid problem diagnosis and resolution:
- Detailed error context: comprehensive information for understanding error causes and conditions
- Error reproduction: sufficient context to reproduce and debug errors in development environments
- Cross-service correlation: understanding of error propagation across distributed services
- Historical analysis: access to historical error patterns for comparative analysis
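Cross-service correlation usually relies on a correlation or trace ID that every service propagates and attaches to the errors it reports. Assuming such an ID is present, the sketch groups errors by ID and treats the earliest failure in each group as the first candidate for the origin. `ServiceError` and `ErrorCorrelator` are hypothetical, and clock skew between hosts can make timestamp ordering unreliable.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// One error as reported by a service; correlationId is assumed to be propagated on every call
record ServiceError(String correlationId, String service, Instant timestamp, String exceptionType) {}

final class ErrorCorrelator {
    private ErrorCorrelator() {}

    // Groups errors by correlation ID; within each group, oldest first
    static Map<String, List<ServiceError>> byCorrelationId(List<ServiceError> errors) {
        return errors.stream()
                .sorted(Comparator.comparing(ServiceError::timestamp))
                .collect(Collectors.groupingBy(ServiceError::correlationId));
    }

    // The earliest failing service in a chain is the first candidate for the origin
    static Optional<String> likelyOrigin(List<ServiceError> chain) {
        return chain.stream()
                .min(Comparator.comparing(ServiceError::timestamp))
                .map(ServiceError::service);
    }
}
```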

3. System Reliability Improvement

Using error data to continuously improve system reliability:
- Reliability metrics: measurement of system reliability and error rates over time
- Failure pattern analysis: identification of common failure patterns and root causes
- Preventive measures: implementation of measures to prevent known error patterns
- Quality assurance: support for quality assurance processes and reliability testing
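Two common reliability metrics are the error rate (failed requests divided by total requests) and mean time to resolution (the average time from detection to resolution). A minimal sketch, assuming a hypothetical `Resolution` record that stores both timestamps; `ReliabilityMetrics` is likewise an illustrative name.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical record: when an error group was detected and when it was resolved
record Resolution(Instant detectedAt, Instant resolvedAt) {}

final class ReliabilityMetrics {
    private ReliabilityMetrics() {}

    // Share of requests that failed, between 0 and 1; 0 when there was no traffic
    static double errorRate(long failedRequests, long totalRequests) {
        return totalRequests == 0 ? 0.0 : (double) failedRequests / totalRequests;
    }

    // MTTR: average of (resolvedAt - detectedAt) over resolved items
    static Duration meanTimeToResolution(List<Resolution> resolutions) {
        if (resolutions.isEmpty()) return Duration.ZERO;
        Duration total = Duration.ZERO;
        for (Resolution r : resolutions) {
            total = total.plus(Duration.between(r.detectedAt(), r.resolvedAt()));
        }
        return total.dividedBy(resolutions.size());
    }
}
```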

4. Business Continuity and Customer Experience

Minimizing business impact through effective error management:
- Customer impact minimization: rapid resolution of errors that affect customer experience
- Business process continuity: ensuring business processes continue despite technical errors
- SLA compliance: support for meeting service level agreement commitments
- Reputation protection: preventing error-related damage to business reputation
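SLA compliance is often tracked with an error budget: an availability objective of 99.9% over 1,000,000 requests tolerates (1 - 0.999) * 1,000,000 = 1,000 failures. The sketch computes the budget and the share still unspent. `ErrorBudget` is a hypothetical helper; it uses `BigDecimal` because plain double arithmetic gives 1 - 0.9 = 0.09999999999999998, which would round a 100-failure budget down to 99.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical error-budget helper for a request-based availability SLO such as 0.999
final class ErrorBudget {
    private ErrorBudget() {}

    // Failures the SLO tolerates in the period: (1 - slo) * totalRequests, rounded down.
    // BigDecimal.valueOf uses the double's decimal form, so 1 - 0.9 is exactly 0.1 here.
    static long allowedFailures(double slo, long totalRequests) {
        if (slo <= 0.0 || slo >= 1.0) {
            throw new IllegalArgumentException("slo must be between 0 and 1 (exclusive)");
        }
        return BigDecimal.ONE.subtract(BigDecimal.valueOf(slo))
                .multiply(BigDecimal.valueOf(totalRequests))
                .setScale(0, RoundingMode.FLOOR)
                .longValueExact();
    }

    // Share of the budget left: 1.0 when nothing failed, 0 when exhausted, negative once breached
    static double remainingFraction(double slo, long totalRequests, long failedRequests) {
        long allowed = allowedFailures(slo, totalRequests);
        if (allowed == 0) {
            return failedRequests == 0 ? 1.0 : -1.0;
        }
        return (double) (allowed - failedRequests) / allowed;
    }
}
```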

Benefits in Integration Contexts

1. Technical Advantages

2. Operational Benefits

3. Integration Enablement

4. Business Value

Integration Architecture Applications

1. Comprehensive Error Tracking System

Enterprise-grade error tracking with intelligent analysis and management:

// Error Tracking Configuration
// ErrorCaptureService, ErrorAnalysisService, ErrorAggregationService and ErrorAlertService
// are annotated with @Service and are registered by component scanning. Declaring them
// again here with `new` would create a second definition for each bean name, which
// Spring Boot rejects by default (bean definition overriding is disabled), so this
// class only enables typed properties and async execution for error capture.
@Configuration
@EnableAsync
@EnableConfigurationProperties(ErrorTrackingProperties.class)
public class ErrorTrackingConfiguration {

    // Declare beans here only for classes that are not annotated with @Service/@Component
    @Bean
    public ErrorReportingService errorReportingService() {
        return new ErrorReportingService();
    }

    @Bean
    public ErrorResolutionTracker errorResolutionTracker() {
        return new ErrorResolutionTracker();
    }
}

// Error Capture Service
@Service
public class ErrorCaptureService {

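    @Autowired
    private ErrorAnalysisService errorAnalysisService;

    @Autowired
    private ErrorAggregationService errorAggregationService;

    @Autowired
    private ErrorAlertService errorAlertService;

    @Autowired
    private ContextCollectorService contextCollectorService;

    private static final Logger logger = LoggerFactory.getLogger(ErrorCaptureService.class);

    // @Async requires @EnableAsync and only applies when captureError() is called through
    // the Spring proxy, i.e. from another bean. The calls from handleUncaughtException(),
    // captureBusinessError() and captureIntegrationError() are self-invocations and run
    // synchronously. Request and user context held in ThreadLocals (for example
    // RequestContextHolder or SecurityContextHolder) is not visible on the async executor
    // thread, so collect it on the calling thread and pass it in ErrorContext.
    @Async
    public void captureError(Throwable throwable, ErrorContext context) {
        try {
            // Create error entry
            ErrorEntry errorEntry = createErrorEntry(throwable, context);

            // Enrich with additional context
            enrichErrorContext(errorEntry);

            // Calculate error fingerprint
            String fingerprint = calculateErrorFingerprint(errorEntry);
            errorEntry.setFingerprint(fingerprint);

            // Determine error severity
            ErrorSeverity severity = determineErrorSeverity(errorEntry);
            errorEntry.setSeverity(severity);

            // Analyze error
            ErrorAnalysisResult analysis = errorAnalysisService.analyzeError(errorEntry);
            errorEntry.setAnalysisResult(analysis);

            // Aggregate with similar errors
            ErrorGroup errorGroup = errorAggregationService.aggregateError(errorEntry);

            // Store error
            storeError(errorEntry, errorGroup);

            // Send alerts if necessary (checkAndSendAlerts is defined in ErrorAlertService)
            errorAlertService.checkAndSendAlerts(errorEntry, errorGroup);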

            logger.info("Error captured and processed - ErrorId: {}, Type: {}, Severity: {}, Fingerprint: {}", 
                       errorEntry.getId(), errorEntry.getExceptionType(), 
                       errorEntry.getSeverity(), fingerprint);

        } catch (Exception e) {
            logger.error("Failed to capture error", e);
            // Fall back to a minimal capture path so the original error is not lost
            captureErrorCaptureFallback(throwable, context, e);
        }
    }

    @EventListener
    public void handleUncaughtException(UncaughtExceptionEvent event) {
        ErrorContext context = ErrorContext.builder()
            .source("UNCAUGHT_EXCEPTION")
            .timestamp(Instant.now())
            .threadName(Thread.currentThread().getName())
            .build();

        captureError(event.getThrowable(), context);
    }

    public void captureBusinessError(String errorCode, String errorMessage, 
                                   Map<String, Object> businessContext) {
        BusinessError businessError = new BusinessError(errorCode, errorMessage);

        ErrorContext context = ErrorContext.builder()
            .source("BUSINESS_LOGIC")
            .businessContext(businessContext)
            .timestamp(Instant.now())
            .build();

        captureError(businessError, context);
    }

    public void captureIntegrationError(String serviceName, String endpoint, 
                                      Throwable throwable, IntegrationContext integrationContext) {
        ErrorContext context = ErrorContext.builder()
            .source("INTEGRATION")
            .serviceName(serviceName)
            .endpoint(endpoint)
            .integrationContext(integrationContext)
            .timestamp(Instant.now())
            .build();

        captureError(throwable, context);
    }

    private ErrorEntry createErrorEntry(Throwable throwable, ErrorContext context) {
        ErrorEntry entry = new ErrorEntry();
        entry.setId(UUID.randomUUID().toString());
        entry.setTimestamp(context.getTimestamp());
        entry.setExceptionType(throwable.getClass().getName());
        entry.setExceptionMessage(throwable.getMessage());
        entry.setStackTrace(getStackTraceString(throwable));
        entry.setSource(context.getSource());
        entry.setServiceName(context.getServiceName());
        entry.setEndpoint(context.getEndpoint());

        // Add cause chain
        if (throwable.getCause() != null) {
            entry.setCauseChain(buildCauseChain(throwable));
        }

        // Add suppressed exceptions
        Throwable[] suppressed = throwable.getSuppressed();
        if (suppressed.length > 0) {
            entry.setSuppressedExceptions(Arrays.stream(suppressed)
                .map(this::createSuppressedException)
                .collect(Collectors.toList()));
        }

        return entry;
    }

    private void enrichErrorContext(ErrorEntry errorEntry) {
        try {
            // Add system context
            SystemContext systemContext = contextCollectorService.collectSystemContext();
            errorEntry.setSystemContext(systemContext);

            // Add request context if available
            RequestContext requestContext = contextCollectorService.collectRequestContext();
            if (requestContext != null) {
                errorEntry.setRequestContext(requestContext);
            }

            // Add user context if available
            UserContext userContext = contextCollectorService.collectUserContext();
            if (userContext != null) {
                errorEntry.setUserContext(userContext);
            }

            // Add application context
            ApplicationContext applicationContext = contextCollectorService.collectApplicationContext();
            errorEntry.setApplicationContext(applicationContext);

            // Add performance context
            PerformanceContext performanceContext = contextCollectorService.collectPerformanceContext();
            errorEntry.setPerformanceContext(performanceContext);

        } catch (Exception e) {
            logger.warn("Failed to enrich error context", e);
        }
    }

    private String calculateErrorFingerprint(ErrorEntry errorEntry) {
        try {
            // Create fingerprint based on exception type, message pattern, and stack trace
            StringBuilder fingerprintData = new StringBuilder();

            // Add exception type
            fingerprintData.append(errorEntry.getExceptionType());

            // Add normalized error message (remove dynamic values)
            String normalizedMessage = normalizeErrorMessage(errorEntry.getExceptionMessage());
            fingerprintData.append("|").append(normalizedMessage);

            // Add key stack trace elements (up to 5 frames from application code)
            List<String> keyStackFrames = extractKeyStackFrames(errorEntry.getStackTrace());
            fingerprintData.append("|").append(String.join(",", keyStackFrames));

            // Add source/endpoint information
            if (errorEntry.getEndpoint() != null) {
                fingerprintData.append("|").append(errorEntry.getEndpoint());
            }

            // Calculate hash
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(fingerprintData.toString().getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);

        } catch (Exception e) {
            logger.warn("Failed to calculate error fingerprint, using fallback", e);
            // Deterministic fallback: no timestamp, so identical errors still share a group
            return "fallback-" + errorEntry.getExceptionType();
        }
    }

    private ErrorSeverity determineErrorSeverity(ErrorEntry errorEntry) {
        // Determine severity based on exception type, context, and impact

        // Critical errors
        if (isSecurityException(errorEntry)) {
            return ErrorSeverity.CRITICAL;
        }

        if (isDataCorruptionException(errorEntry)) {
            return ErrorSeverity.CRITICAL;
        }

        if (isSystemFailureException(errorEntry)) {
            return ErrorSeverity.CRITICAL;
        }

        // High severity errors
        if (isPaymentRelatedError(errorEntry)) {
            return ErrorSeverity.HIGH;
        }

        if (isCustomerImpactingError(errorEntry)) {
            return ErrorSeverity.HIGH;
        }

        if (isDatabaseException(errorEntry)) {
            return ErrorSeverity.HIGH;
        }

        // Medium severity errors
        if (isIntegrationException(errorEntry)) {
            return ErrorSeverity.MEDIUM;
        }

        if (isValidationException(errorEntry)) {
            return ErrorSeverity.MEDIUM;
        }

        // Low severity errors (default)
        return ErrorSeverity.LOW;
    }

    private String normalizeErrorMessage(String message) {
        if (message == null) return "";

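        // Replace dynamic values with placeholders. UUIDs and timestamps must be replaced
        // before bare numbers; otherwise the digit rule breaks them up and they never match.
        return message
            .replaceAll("[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "{uuid}")
            .replaceAll("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}", "{timestamp}")
            .replaceAll("'[^']*'", "'{string}'")
            .replaceAll("\"[^\"]*\"", "\"{string}\"")
            .replaceAll("\\d+", "{number}");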
    }

    private List<String> extractKeyStackFrames(String stackTrace) {
        List<String> keyFrames = new ArrayList<>();
        if (stackTrace == null) {
            return keyFrames;
        }

        String[] lines = stackTrace.split("\n");
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("at ") && isApplicationCode(trimmed)) {
                // Keep class and method: drop the leading "at " and the "(File.java:123)"
                // suffix so that line-number changes do not split a group
                String frame = trimmed.substring(3).replaceAll("\\(.*\\)", "()");
                keyFrames.add(frame);

                // Limit to top 5 application frames
                if (keyFrames.size() >= 5) {
                    break;
                }
            }
        }

        return keyFrames;
    }

    private boolean isApplicationCode(String stackFrame) {
        // Identify application code vs framework/library code
        return stackFrame.contains("org.kallio") || // Application package
               stackFrame.contains("com.yourcompany"); // Additional application packages
    }
}

// Error Analysis Service
@Service
public class ErrorAnalysisService {

    @Autowired
    private ErrorPatternMatcher errorPatternMatcher;

    @Autowired
    private ErrorImpactAnalyzer errorImpactAnalyzer;

    @Autowired
    private ErrorCorrelationService errorCorrelationService;

    public ErrorAnalysisResult analyzeError(ErrorEntry errorEntry) {
        ErrorAnalysisResult result = new ErrorAnalysisResult();
        result.setErrorId(errorEntry.getId());
        result.setAnalysisTimestamp(Instant.now());

        // Pattern matching
        List<ErrorPattern> matchedPatterns = errorPatternMatcher.matchPatterns(errorEntry);
        result.setMatchedPatterns(matchedPatterns);

        // Root cause analysis
        RootCauseAnalysis rootCause = performRootCauseAnalysis(errorEntry, matchedPatterns);
        result.setRootCauseAnalysis(rootCause);

        // Impact analysis
        ErrorImpactAssessment impact = errorImpactAnalyzer.assessImpact(errorEntry);
        result.setImpactAssessment(impact);

        // Correlation analysis
        ErrorCorrelationResult correlation = errorCorrelationService.findCorrelatedErrors(errorEntry);
        result.setCorrelationResult(correlation);

        // Resolution suggestions
        List<ResolutionSuggestion> suggestions = generateResolutionSuggestions(errorEntry, matchedPatterns);
        result.setResolutionSuggestions(suggestions);

        // Classification
        ErrorClassification classification = classifyError(errorEntry, result);
        result.setClassification(classification);

        return result;
    }

    private RootCauseAnalysis performRootCauseAnalysis(ErrorEntry errorEntry, 
                                                     List<ErrorPattern> matchedPatterns) {
        RootCauseAnalysis analysis = new RootCauseAnalysis();
        analysis.setAnalysisMethod("AUTOMATED_PATTERN_MATCHING");

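        // Analyze based on matched patterns (assumes matchPatterns() returns them
        // ordered by confidence, highest first, so get(0) is the best match)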
        if (!matchedPatterns.isEmpty()) {
            ErrorPattern primaryPattern = matchedPatterns.get(0);
            analysis.setPossibleCauses(primaryPattern.getKnownCauses());
            analysis.setConfidenceLevel(primaryPattern.getConfidenceLevel());
            analysis.setRecommendedActions(primaryPattern.getRecommendedActions());
        }

        // Additional analysis based on context
        analyzeContextualCauses(errorEntry, analysis);

        // Analyze error timing and frequency
        analyzeErrorTiming(errorEntry, analysis);

        return analysis;
    }

    private void analyzeContextualCauses(ErrorEntry errorEntry, RootCauseAnalysis analysis) {
        List<String> contextualCauses = new ArrayList<>();

        // Analyze system context
        if (errorEntry.getSystemContext() != null) {
            SystemContext systemContext = errorEntry.getSystemContext();

            if (systemContext.getMemoryUtilization() > 85) {
                contextualCauses.add("High memory utilization (" + 
                                   systemContext.getMemoryUtilization() + "%)");
            }

            if (systemContext.getCpuUtilization() > 80) {
                contextualCauses.add("High CPU utilization (" + 
                                   systemContext.getCpuUtilization() + "%)");
            }

            if (systemContext.getDiskUtilization() > 90) {
                contextualCauses.add("High disk utilization (" + 
                                   systemContext.getDiskUtilization() + "%)");
            }
        }

        // Analyze request context
        if (errorEntry.getRequestContext() != null) {
            RequestContext requestContext = errorEntry.getRequestContext();

            if (requestContext.getRequestSize() > 50 * 1024 * 1024) { // 50MB
                contextualCauses.add("Large request size (" + 
                                   formatBytes(requestContext.getRequestSize()) + ")");
            }

            if (requestContext.getProcessingTime() > 30000) { // 30 seconds
                contextualCauses.add("Long processing time (" + 
                                   requestContext.getProcessingTime() + "ms)");
            }
        }

        analysis.setContextualCauses(contextualCauses);
    }

    private List<ResolutionSuggestion> generateResolutionSuggestions(ErrorEntry errorEntry, 
                                                                   List<ErrorPattern> patterns) {
        List<ResolutionSuggestion> suggestions = new ArrayList<>();

        // Suggestions based on patterns
        for (ErrorPattern pattern : patterns) {
            for (String action : pattern.getRecommendedActions()) {
                ResolutionSuggestion suggestion = new ResolutionSuggestion();
                suggestion.setType(ResolutionSuggestionType.PATTERN_BASED);
                suggestion.setAction(action);
                suggestion.setConfidence(pattern.getConfidenceLevel());
                suggestion.setDescription("Based on error pattern: " + pattern.getName());
                suggestions.add(suggestion);
            }
        }

        // Suggestions based on error type
        suggestions.addAll(generateTypeBasedSuggestions(errorEntry));

        // Suggestions based on context
        suggestions.addAll(generateContextBasedSuggestions(errorEntry));

        return suggestions.stream()
            .sorted((a, b) -> Double.compare(b.getConfidence(), a.getConfidence()))
            .limit(5) // Top 5 suggestions
            .collect(Collectors.toList());
    }

    private List<ResolutionSuggestion> generateTypeBasedSuggestions(ErrorEntry errorEntry) {
        List<ResolutionSuggestion> suggestions = new ArrayList<>();

        String exceptionType = errorEntry.getExceptionType();

        if (exceptionType.contains("OutOfMemoryError")) {
            suggestions.add(createSuggestion(
                ResolutionSuggestionType.INFRASTRUCTURE,
                "Increase JVM heap size",
                "Add -Xmx parameter to increase maximum heap memory",
                0.8
            ));

            suggestions.add(createSuggestion(
                ResolutionSuggestionType.CODE,
                "Analyze memory usage patterns",
                "Review code for memory leaks and optimize data structures",
                0.7
            ));
        }

        if (exceptionType.contains("TimeoutException")) {
            suggestions.add(createSuggestion(
                ResolutionSuggestionType.CONFIGURATION,
                "Increase timeout configuration",
                "Review and increase timeout values for affected operations",
                0.7
            ));

            suggestions.add(createSuggestion(
                ResolutionSuggestionType.MONITORING,
                "Monitor external service performance",
                "Check performance of external dependencies",
                0.6
            ));
        }

        if (exceptionType.contains("SQLException")) {
            suggestions.add(createSuggestion(
                ResolutionSuggestionType.DATABASE,
                "Check database connection pool",
                "Review database connection pool configuration and health",
                0.8
            ));

            suggestions.add(createSuggestion(
                ResolutionSuggestionType.DATABASE,
                "Analyze SQL query performance",
                "Review SQL query execution plans and optimize if necessary",
                0.7
            ));
        }

        return suggestions;
    }
}

// Error Aggregation Service
@Service
public class ErrorAggregationService {

    @Autowired
    private ErrorGroupRepository errorGroupRepository;

    @Autowired
    private ErrorEntryRepository errorEntryRepository;

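    // In-memory cache of groups by fingerprint. Entries are never evicted here; in a
    // long-running service a bounded cache (size- or time-based eviction) avoids
    // unbounded growth.
    private final Map<String, ErrorGroup> activeGroups = new ConcurrentHashMap<>();

    public ErrorGroup aggregateError(ErrorEntry errorEntry) {
        String fingerprint = errorEntry.getFingerprint();

        // Find or create error group
        ErrorGroup errorGroup = activeGroups.computeIfAbsent(fingerprint, fp -> {
            // Check if group exists in database
            Optional<ErrorGroup> existingGroup = errorGroupRepository.findByFingerprint(fp);
            return existingGroup.orElseGet(() -> createNewErrorGroup(errorEntry));
        });

        // Captures of the same fingerprint can run concurrently on the async executor,
        // and ErrorGroup's counters and collections are not thread-safe
        synchronized (errorGroup) {
            // Update group with new error
            updateErrorGroup(errorGroup, errorEntry);

            // Check if group status needs updating
            updateGroupStatus(errorGroup);

            // Persist updates
            errorGroupRepository.save(errorGroup);
        }

        return errorGroup;
    }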

    private ErrorGroup createNewErrorGroup(ErrorEntry errorEntry) {
        ErrorGroup group = new ErrorGroup();
        group.setId(UUID.randomUUID().toString());
        group.setFingerprint(errorEntry.getFingerprint());
        group.setTitle(generateGroupTitle(errorEntry));
        group.setDescription(generateGroupDescription(errorEntry));
        group.setFirstSeen(errorEntry.getTimestamp());
        group.setLastSeen(errorEntry.getTimestamp());
        group.setErrorCount(0);
        group.setAffectedUsers(new HashSet<>());
        group.setStatus(ErrorGroupStatus.OPEN);
        group.setSeverity(errorEntry.getSeverity());
        group.setSource(errorEntry.getSource());
        group.setServiceName(errorEntry.getServiceName());
        group.setEnvironment(getEnvironment());

        return group;
    }

    private void updateErrorGroup(ErrorGroup group, ErrorEntry errorEntry) {
        // Update occurrence information
        group.setLastSeen(errorEntry.getTimestamp());
        group.setErrorCount(group.getErrorCount() + 1);

        // Update severity if new error is more severe (assumes ErrorSeverity is declared
        // in ascending order: LOW, MEDIUM, HIGH, CRITICAL)
        if (errorEntry.getSeverity().ordinal() > group.getSeverity().ordinal()) {
            group.setSeverity(errorEntry.getSeverity());
        }

        // Track affected users
        if (errorEntry.getUserContext() != null && 
            errorEntry.getUserContext().getUserId() != null) {
            group.getAffectedUsers().add(errorEntry.getUserContext().getUserId());
        }

        // Update frequency metrics
        updateFrequencyMetrics(group, errorEntry);

        // Update trend information
        updateTrendInformation(group, errorEntry);
    }

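    private void updateGroupStatus(ErrorGroup group) {
        // A new occurrence in a resolved group is a regression: reopen the group
        if (group.getStatus() == ErrorGroupStatus.RESOLVED ||
            group.getStatus() == ErrorGroupStatus.AUTO_RESOLVED) {
            group.setStatus(ErrorGroupStatus.OPEN);
            group.setResolvedAt(null);
        }

        // Escalate if error frequency increases significantly
        if (group.getStatus() == ErrorGroupStatus.OPEN && 
            hasFrequencySpike(group)) {
            group.setStatus(ErrorGroupStatus.ESCALATED);
            group.setEscalatedAt(Instant.now());
        }
    }

    // Auto-resolve quiet, low-volume groups. This must run from a periodic sweep over open
    // groups (for example a @Scheduled job), not from aggregateError(): there, lastSeen has
    // just been set to the new error's timestamp, so the 30-day check could never succeed.
    public void autoResolveIfStale(ErrorGroup group) {
        Duration timeSinceLastSeen = Duration.between(group.getLastSeen(), Instant.now());

        if (group.getStatus() == ErrorGroupStatus.OPEN && 
            timeSinceLastSeen.toDays() > 30 && 
            group.getErrorCount() < 5) {
            group.setStatus(ErrorGroupStatus.AUTO_RESOLVED);
            group.setResolvedAt(Instant.now());
            group.setResolutionNote("Auto-resolved: No new occurrences for 30 days");
            errorGroupRepository.save(group);
        }
    }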

    private void updateFrequencyMetrics(ErrorGroup group, ErrorEntry errorEntry) {
        // Update hourly frequency
        LocalDateTime errorHour = errorEntry.getTimestamp().atZone(ZoneOffset.UTC)
            .truncatedTo(ChronoUnit.HOURS).toLocalDateTime();

        Map<LocalDateTime, Integer> hourlyFrequency = group.getHourlyFrequency();
        if (hourlyFrequency == null) {
            hourlyFrequency = new HashMap<>();
            group.setHourlyFrequency(hourlyFrequency);
        }

        hourlyFrequency.merge(errorHour, 1, Integer::sum);

        // Keep only last 48 hours of data
        Instant cutoff = Instant.now().minus(Duration.ofHours(48));
        hourlyFrequency.entrySet().removeIf(entry -> 
            entry.getKey().toInstant(ZoneOffset.UTC).isBefore(cutoff));
    }

    public ErrorGroupSummary getErrorGroupSummary(Duration period) {
        Instant startTime = Instant.now().minus(period);

        List<ErrorGroup> groups = errorGroupRepository.findByLastSeenAfter(startTime);

        ErrorGroupSummary summary = new ErrorGroupSummary();
        summary.setPeriod(period);
        summary.setTotalGroups(groups.size());

        // Calculate statistics
        summary.setOpenGroups((int) groups.stream()
            .filter(g -> g.getStatus() == ErrorGroupStatus.OPEN)
            .count());

        summary.setResolvedGroups((int) groups.stream()
            .filter(g -> g.getStatus() == ErrorGroupStatus.RESOLVED)
            .count());

        summary.setEscalatedGroups((int) groups.stream()
            .filter(g -> g.getStatus() == ErrorGroupStatus.ESCALATED)
            .count());

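        // Calculate total errors. This sums each group's lifetime count, so it includes
        // occurrences from before the period for groups that were still active within it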
        int totalErrors = groups.stream()
            .mapToInt(ErrorGroup::getErrorCount)
            .sum();
        summary.setTotalErrors(totalErrors);

        // Calculate affected users
        Set<String> allAffectedUsers = groups.stream()
            .flatMap(g -> g.getAffectedUsers().stream())
            .collect(Collectors.toSet());
        summary.setAffectedUsers(allAffectedUsers.size());

        // Top error groups
        List<ErrorGroup> topGroups = groups.stream()
            .sorted((a, b) -> Integer.compare(b.getErrorCount(), a.getErrorCount()))
            .limit(10)
            .collect(Collectors.toList());
        summary.setTopErrorGroups(topGroups);

        return summary;
    }
}

// Error Alert Service
@Service
public class ErrorAlertService {

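    private static final Logger logger = LoggerFactory.getLogger(ErrorAlertService.class);

    @Autowired
    private NotificationService notificationService;

    @Autowired
    private ErrorAlertConfiguration alertConfiguration;

    // Used by shouldAlertOnFrequency() to count recent occurrences
    @Autowired
    private ErrorEntryRepository errorEntryRepository;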

    @Value("${error.alerts.slack.webhook-url}")
    private String slackWebhookUrl;

    @Value("${error.alerts.email.recipients}")
    private List<String> emailRecipients;

    public void checkAndSendAlerts(ErrorEntry errorEntry, ErrorGroup errorGroup) {
        List<ErrorAlert> alerts = new ArrayList<>();

        // Check severity-based alerts
        if (shouldAlertOnSeverity(errorEntry.getSeverity())) {
            alerts.add(createSeverityAlert(errorEntry, errorGroup));
        }

        // Check frequency-based alerts
        if (shouldAlertOnFrequency(errorGroup)) {
            alerts.add(createFrequencyAlert(errorGroup));
        }

        // Check new error type alerts
        if (isNewErrorType(errorEntry, errorGroup)) {
            alerts.add(createNewErrorTypeAlert(errorEntry, errorGroup));
        }

        // Check user impact alerts
        if (shouldAlertOnUserImpact(errorGroup)) {
            alerts.add(createUserImpactAlert(errorGroup));
        }

        // Send alerts
        for (ErrorAlert alert : alerts) {
            sendAlert(alert);
        }
    }

    private boolean shouldAlertOnSeverity(ErrorSeverity severity) {
        return alertConfiguration.getSeverityAlertThresholds().contains(severity);
    }

    private boolean shouldAlertOnFrequency(ErrorGroup errorGroup) {
        // Alert if error count exceeds threshold within time window
        int threshold = alertConfiguration.getFrequencyThreshold();
        Duration timeWindow = alertConfiguration.getFrequencyTimeWindow();

        Instant cutoff = Instant.now().minus(timeWindow);

        // Count recent errors (Spring Data derived count queries return long)
        long recentCount = errorEntryRepository.countByGroupFingerprintAndTimestampAfter(
            errorGroup.getFingerprint(), cutoff);

        return recentCount >= threshold;
    }

    private boolean isNewErrorType(ErrorEntry errorEntry, ErrorGroup errorGroup) {
        return errorGroup.getErrorCount() == 1; // First occurrence
    }

    private boolean shouldAlertOnUserImpact(ErrorGroup errorGroup) {
        int userThreshold = alertConfiguration.getUserImpactThreshold();
        return errorGroup.getAffectedUsers().size() >= userThreshold;
    }

    private void sendAlert(ErrorAlert alert) {
        try {
            NotificationMessage message = createNotificationMessage(alert);

            // Send based on severity
            switch (alert.getSeverity()) {
                case CRITICAL:
                    notificationService.sendEmail(emailRecipients, message);
                    notificationService.sendSlack(slackWebhookUrl, message);
                    notificationService.sendSms(getOnCallContacts(), message);
                    break;

                case HIGH:
                    notificationService.sendEmail(emailRecipients, message);
                    notificationService.sendSlack(slackWebhookUrl, message);
                    break;

                case MEDIUM:
                    notificationService.sendSlack(slackWebhookUrl, message);
                    break;

                case LOW:
                    // Only log for low severity
                    logger.info("Error alert: {}", alert.getMessage());
                    break;
            }

            // Store alert for tracking
            storeAlert(alert);

            logger.info("Error alert sent - Type: {}, Severity: {}, ErrorGroup: {}", 
                       alert.getType(), alert.getSeverity(), alert.getErrorGroupId());

        } catch (Exception e) {
            logger.error("Failed to send error alert", e);
        }
    }
}
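As written, `shouldAlertOnFrequency` keeps returning true for every new error once the count in the window reaches the threshold, so a burst of identical errors produces a burst of identical frequency alerts. A common mitigation is a per-fingerprint cooldown. The sketch below is a hypothetical `FrequencyAlertGate` (not part of the service above) that keeps an in-memory sliding window of timestamps per fingerprint and suppresses repeat alerts until the cooldown has passed. It assumes a single application instance and calls arriving in time order; the class name, method, and parameters are illustrative.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of ErrorAlertService): an in-memory, single-instance
// sliding-window counter per error fingerprint with a per-fingerprint alert cooldown.
class FrequencyAlertGate {

    private final int threshold;      // occurrences within the window that trigger an alert
    private final Duration window;    // sliding window length
    private final Duration cooldown;  // minimum gap between two alerts for the same fingerprint
    private final Map<String, Deque<Instant>> occurrences = new HashMap<>();
    private final Map<String, Instant> lastAlertedAt = new HashMap<>();

    FrequencyAlertGate(int threshold, Duration window, Duration cooldown) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
        this.window = window;
        this.cooldown = cooldown;
    }

    /**
     * Records one occurrence at {@code now} and returns true only when the threshold is met
     * inside the window and no alert for this fingerprint was sent during the cooldown.
     */
    synchronized boolean recordAndCheck(String fingerprint, Instant now) {
        Deque<Instant> times = occurrences.computeIfAbsent(fingerprint, k -> new ArrayDeque<>());
        times.addLast(now);

        // Drop occurrences older than the window (assumes calls arrive in time order)
        Instant cutoff = now.minus(window);
        while (!times.isEmpty() && times.peekFirst().isBefore(cutoff)) {
            times.pollFirst();
        }
        if (times.size() < threshold) {
            return false;
        }

        // Threshold met: suppress the alert if one was already sent within the cooldown
        Instant previous = lastAlertedAt.get(fingerprint);
        if (previous != null && now.isBefore(previous.plus(cooldown))) {
            return false;
        }
        lastAlertedAt.put(fingerprint, now);
        return true;
    }
}
```

In `ErrorAlertService`, a gate like this could wrap the frequency check, for example `gate.recordAndCheck(errorGroup.getFingerprint(), Instant.now())`. Because the state lives in one JVM's memory and fingerprints are never evicted, a clustered or long-running deployment would need shared storage (such as the existing repository count plus a persisted last-alert timestamp) and some form of expiry.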

2. Error Resolution Tracking System

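Systematic tracking of each error group through its resolution lifecycle: ticket creation and assignment, progress updates, resolution, and verification (or reopening if the fix does not hold):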

// Error Resolution Tracker
@Service
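public class ErrorResolutionTracker {

    private static final Logger logger = LoggerFactory.getLogger(ErrorResolutionTracker.class);

    @Autowired
    private ErrorGroupRepository errorGroupRepository;

    @Autowired
    private ResolutionTicketRepository resolutionTicketRepository;

    @Autowired
    private ResolutionActivityRepository resolutionActivityRepository;

    @Autowired
    private NotificationService notificationService;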

    public ResolutionTicket createResolutionTicket(ErrorGroup errorGroup, String assignee, 
                                                  ResolutionPriority priority) {
        ResolutionTicket ticket = new ResolutionTicket();
        ticket.setId(UUID.randomUUID().toString());
        ticket.setErrorGroupId(errorGroup.getId());
        ticket.setTitle("Resolve: " + errorGroup.getTitle());
        ticket.setDescription(generateResolutionDescription(errorGroup));
        ticket.setAssignee(assignee);
        ticket.setPriority(priority);
        ticket.setStatus(ResolutionStatus.OPEN);
        ticket.setCreatedAt(Instant.now());
        ticket.setDueDate(calculateDueDate(priority));

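        // Add resolution suggestions
        List<ResolutionSuggestion> suggestions = getResolutionSuggestions(errorGroup);
        ticket.setSuggestions(suggestions);

        // Persist the ticket so later lookups by ID (getResolutionTicket) can find it
        resolutionTicketRepository.save(ticket);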

        // Create initial activity
        ResolutionActivity activity = new ResolutionActivity();
        activity.setTicketId(ticket.getId());
        activity.setType(ResolutionActivityType.CREATED);
        activity.setDescription("Resolution ticket created");
        activity.setUserId("SYSTEM");
        activity.setTimestamp(Instant.now());

        resolutionActivityRepository.save(activity);

        // Update error group status
        errorGroup.setStatus(ErrorGroupStatus.IN_PROGRESS);
        errorGroup.setAssignee(assignee);
        errorGroupRepository.save(errorGroup);

        // Send notification
        notifyAssignment(ticket, assignee);

        logger.info("Resolution ticket created - TicketId: {}, ErrorGroup: {}, Assignee: {}", 
                   ticket.getId(), errorGroup.getId(), assignee);

        return ticket;
    }

    public void updateResolutionProgress(String ticketId, String userId, String update, 
                                       ResolutionProgressType progressType) {
        ResolutionTicket ticket = getResolutionTicket(ticketId);

        // Create activity record
        ResolutionActivity activity = new ResolutionActivity();
        activity.setTicketId(ticketId);
        activity.setType(ResolutionActivityType.PROGRESS_UPDATE);
        activity.setDescription(update);
        activity.setUserId(userId);
        activity.setTimestamp(Instant.now());
        activity.setProgressType(progressType);

        resolutionActivityRepository.save(activity);

        // Update ticket status if needed
        updateTicketStatus(ticket, progressType);

        logger.info("Resolution progress updated - TicketId: {}, UserId: {}, Type: {}", 
                   ticketId, userId, progressType);
    }

    public void markResolved(String ticketId, String userId, String resolutionNote, 
                           ResolutionType resolutionType) {
        ResolutionTicket ticket = getResolutionTicket(ticketId);

        // Update and persist ticket
        ticket.setStatus(ResolutionStatus.RESOLVED);
        ticket.setResolvedBy(userId);
        ticket.setResolvedAt(Instant.now());
        ticket.setResolutionNote(resolutionNote);
        ticket.setResolutionType(resolutionType);
        resolutionTicketRepository.save(ticket);

        // Create resolution activity
        ResolutionActivity activity = new ResolutionActivity();
        activity.setTicketId(ticketId);
        activity.setType(ResolutionActivityType.RESOLVED);
        activity.setDescription("Ticket resolved: " + resolutionNote);
        activity.setUserId(userId);
        activity.setTimestamp(Instant.now());

        resolutionActivityRepository.save(activity);

        // Update error group
        ErrorGroup errorGroup = errorGroupRepository.findById(ticket.getErrorGroupId()).orElse(null);
        if (errorGroup != null) {
            errorGroup.setStatus(ErrorGroupStatus.RESOLVED);
            errorGroup.setResolvedAt(Instant.now());
            errorGroup.setResolutionNote(resolutionNote);
            errorGroupRepository.save(errorGroup);
        }

        // Calculate resolution metrics
        updateResolutionMetrics(ticket);

        // Send notifications
        notifyResolution(ticket, resolutionType);

        logger.info("Resolution ticket marked resolved - TicketId: {}, ResolvedBy: {}, Type: {}", 
                   ticketId, userId, resolutionType);
    }

    public void markVerified(String ticketId, String userId, boolean verificationSuccess, 
                           String verificationNote) {
        ResolutionTicket ticket = getResolutionTicket(ticketId);

        if (verificationSuccess) {
            ticket.setStatus(ResolutionStatus.VERIFIED);
            ticket.setVerifiedBy(userId);
            ticket.setVerifiedAt(Instant.now());
            ticket.setVerificationNote(verificationNote);

            // Create verification activity
            ResolutionActivity activity = new ResolutionActivity();
            activity.setTicketId(ticketId);
            activity.setType(ResolutionActivityType.VERIFIED);
            activity.setDescription("Resolution verified: " + verificationNote);
            activity.setUserId(userId);
            activity.setTimestamp(Instant.now());

            resolutionActivityRepository.save(activity);

            // Update error group status to closed
            ErrorGroup errorGroup = errorGroupRepository.findById(ticket.getErrorGroupId()).orElse(null);
            if (errorGroup != null) {
                errorGroup.setStatus(ErrorGroupStatus.CLOSED);
                errorGroupRepository.save(errorGroup);
            }

        } else {
            // Reopen ticket
            ticket.setStatus(ResolutionStatus.REOPENED);

            ResolutionActivity activity = new ResolutionActivity();
            activity.setTicketId(ticketId);
            activity.setType(ResolutionActivityType.REOPENED);
            activity.setDescription("Resolution verification failed: " + verificationNote);
            activity.setUserId(userId);
            activity.setTimestamp(Instant.now());

            resolutionActivityRepository.save(activity);

            // Update error group status back to in progress
            ErrorGroup errorGroup = errorGroupRepository.findById(ticket.getErrorGroupId()).orElse(null);
            if (errorGroup != null) {
                errorGroup.setStatus(ErrorGroupStatus.IN_PROGRESS);
                errorGroupRepository.save(errorGroup);
            }
        }

        // Persist the verified or reopened ticket
        resolutionTicketRepository.save(ticket);

        logger.info("Resolution verification completed - TicketId: {}, Success: {}, VerifiedBy: {}", 
                   ticketId, verificationSuccess, userId);
    }

    public ResolutionReport generateResolutionReport(Duration period) {
        Instant endTime = Instant.now();
        Instant startTime = endTime.minus(period);

        ResolutionReport report = new ResolutionReport();
        report.setPeriod(period);
        report.setStartTime(startTime);
        report.setEndTime(endTime);
        report.setGeneratedAt(Instant.now());

        // Get resolution tickets in period
        List<ResolutionTicket> tickets = resolutionTicketRepository.findByCreatedAtBetween(startTime, endTime);

        // Calculate metrics
        ResolutionMetrics metrics = calculateResolutionMetrics(tickets);
        report.setMetrics(metrics);

        // Resolution time analysis
        ResolutionTimeAnalysis timeAnalysis = analyzeResolutionTimes(tickets);
        report.setTimeAnalysis(timeAnalysis);

        // Assignee performance
        List<AssigneePerformance> assigneePerformance = analyzeAssigneePerformance(tickets);
        report.setAssigneePerformance(assigneePerformance);

        // Resolution type breakdown
        Map<ResolutionType, Integer> resolutionTypeBreakdown = tickets.stream()
            .filter(t -> t.getResolutionType() != null)
            .collect(Collectors.groupingBy(
                ResolutionTicket::getResolutionType,
                Collectors.collectingAndThen(Collectors.counting(), Long::intValue)
            ));
        report.setResolutionTypeBreakdown(resolutionTypeBreakdown);

        return report;
    }
}
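`markResolved` and `markVerified` overwrite the ticket status without checking the current one, so, for example, a ticket that was never resolved can still be marked VERIFIED. An explicit transition table makes the lifecycle enforceable. The sketch below uses a hypothetical `TicketStatus` enum that mirrors the OPEN, RESOLVED, VERIFIED and REOPENED states used by the tracker; the extra IN_PROGRESS state and the decision to treat VERIFIED as terminal are assumptions for illustration, not part of the tracker above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical status set: OPEN, RESOLVED, VERIFIED and REOPENED mirror the tracker above;
// IN_PROGRESS is an assumption added for illustration.
enum TicketStatus { OPEN, IN_PROGRESS, RESOLVED, VERIFIED, REOPENED }

final class TicketTransitions {

    private static final Map<TicketStatus, Set<TicketStatus>> ALLOWED = new EnumMap<>(TicketStatus.class);

    static {
        ALLOWED.put(TicketStatus.OPEN, EnumSet.of(TicketStatus.IN_PROGRESS, TicketStatus.RESOLVED));
        ALLOWED.put(TicketStatus.IN_PROGRESS, EnumSet.of(TicketStatus.RESOLVED));
        // Verification either confirms the fix or sends the ticket back for more work
        ALLOWED.put(TicketStatus.RESOLVED, EnumSet.of(TicketStatus.VERIFIED, TicketStatus.REOPENED));
        ALLOWED.put(TicketStatus.REOPENED, EnumSet.of(TicketStatus.IN_PROGRESS, TicketStatus.RESOLVED));
        // In this sketch VERIFIED is terminal; a recurrence would be tracked with a new ticket
        ALLOWED.put(TicketStatus.VERIFIED, EnumSet.noneOf(TicketStatus.class));
    }

    private TicketTransitions() {
    }

    static boolean isAllowed(TicketStatus from, TicketStatus to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(TicketStatus.class)).contains(to);
    }

    /** Returns the target status, or throws if the lifecycle does not permit the move. */
    static TicketStatus transition(TicketStatus from, TicketStatus to) {
        if (!isAllowed(from, to)) {
            throw new IllegalStateException("Illegal ticket transition: " + from + " -> " + to);
        }
        return to;
    }
}
```

With an equivalent table keyed on `ResolutionStatus`, `markVerified` could call the check before mutating the ticket, so an out-of-order request fails fast with an `IllegalStateException` instead of silently corrupting the ticket history.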

// Error REST Controller
@RestController
@RequestMapping("/api/errors")
public class ErrorTrackingController {

    @Autowired
    private ErrorCaptureService errorCaptureService;

    @Autowired
    private ErrorAggregationService errorAggregationService;

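    @Autowired
    private ErrorResolutionTracker resolutionTracker;

    @Autowired
    private ErrorGroupRepository errorGroupRepository;

    @Autowired
    private ErrorEntryRepository errorEntryRepository;

    @Autowired
    private ResolutionTicketRepository resolutionTicketRepository;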

    @PostMapping("/capture")
    public ResponseEntity<Map<String, String>> captureError(@RequestBody ErrorCaptureRequest request) {
        try {
            ErrorContext context = ErrorContext.builder()
                .source(request.getSource())
                .serviceName(request.getServiceName())
                .endpoint(request.getEndpoint())
                .businessContext(request.getBusinessContext())
                .timestamp(Instant.now())
                .build();

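            // The client's original exception type is not preserved: the error is re-created
            // as a generic Exception carrying the reported message and parsed stack trace
            Exception exception = new Exception(request.getMessage());
            exception.setStackTrace(parseStackTrace(request.getStackTrace()));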

            errorCaptureService.captureError(exception, context);

            return ResponseEntity.ok(Map.of("status", "captured"));

        } catch (Exception e) {
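            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                // Map.of rejects null values, and many exceptions have a null message
                .body(Map.of("error", Optional.ofNullable(e.getMessage()).orElse("Unexpected error")));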
        }
    }

    @GetMapping("/groups")
    public ResponseEntity<PagedResponse<ErrorGroup>> getErrorGroups(
            @RequestParam(defaultValue = "0") int page,
            @RequestParam(defaultValue = "20") int size,
            @RequestParam(required = false) ErrorGroupStatus status,
            @RequestParam(required = false) ErrorSeverity severity,
            @RequestParam(defaultValue = "lastSeen") String sortBy,
            @RequestParam(defaultValue = "desc") String sortDirection) {

        Pageable pageable = PageRequest.of(page, size, 
            Sort.Direction.fromString(sortDirection), sortBy);

        Page<ErrorGroup> groups = errorGroupRepository.findWithFilters(
            status, severity, pageable);

        PagedResponse<ErrorGroup> response = new PagedResponse<>();
        response.setContent(groups.getContent());
        response.setPageNumber(groups.getNumber());
        response.setPageSize(groups.getSize());
        response.setTotalElements(groups.getTotalElements());
        response.setTotalPages(groups.getTotalPages());

        return ResponseEntity.ok(response);
    }

    @GetMapping("/groups/{groupId}")
    public ResponseEntity<ErrorGroupDetail> getErrorGroupDetail(@PathVariable String groupId) {
        Optional<ErrorGroup> group = errorGroupRepository.findById(groupId);

        if (group.isEmpty()) {
            return ResponseEntity.notFound().build();
        }

        ErrorGroupDetail detail = new ErrorGroupDetail();
        detail.setGroup(group.get());

        // Get recent errors
        List<ErrorEntry> recentErrors = errorEntryRepository
            .findByGroupFingerprintOrderByTimestampDesc(group.get().getFingerprint(), 
                PageRequest.of(0, 10));
        detail.setRecentErrors(recentErrors);

        // Get resolution ticket if exists
        Optional<ResolutionTicket> ticket = resolutionTicketRepository
            .findByErrorGroupId(groupId);
        detail.setResolutionTicket(ticket.orElse(null));

        return ResponseEntity.ok(detail);
    }

    @GetMapping("/summary")
    public ResponseEntity<ErrorSummary> getErrorSummary(
            @RequestParam(defaultValue = "P1D") String period) {

        Duration summaryPeriod = Duration.parse(period);

        ErrorGroupSummary groupSummary = errorAggregationService.getErrorGroupSummary(summaryPeriod);
        ResolutionReport resolutionReport = resolutionTracker.generateResolutionReport(summaryPeriod);

        ErrorSummary summary = new ErrorSummary();
        summary.setGroupSummary(groupSummary);
        summary.setResolutionReport(resolutionReport);
        summary.setPeriod(summaryPeriod);
        summary.setGeneratedAt(Instant.now());

        return ResponseEntity.ok(summary);
    }

    @PostMapping("/groups/{groupId}/resolve")
    public ResponseEntity<Map<String, String>> createResolutionTicket(
            @PathVariable String groupId,
            @RequestBody ResolutionTicketRequest request) {

        Optional<ErrorGroup> group = errorGroupRepository.findById(groupId);
        if (group.isEmpty()) {
            // An unknown group is a client error, not a server failure
            return ResponseEntity.status(HttpStatus.NOT_FOUND)
                .body(Map.of("error", "Error group not found"));
        }

        try {
            ResolutionTicket ticket = resolutionTracker.createResolutionTicket(
                group.get(), request.getAssignee(), request.getPriority());

            return ResponseEntity.ok(Map.of("ticketId", ticket.getId()));

        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(Map.of("error", Optional.ofNullable(e.getMessage()).orElse("Unexpected error")));
        }
    }

    @PostMapping("/tickets/{ticketId}/progress")
    public ResponseEntity<Map<String, String>> updateProgress(
            @PathVariable String ticketId,
            @RequestBody ProgressUpdateRequest request) {

        try {
            resolutionTracker.updateResolutionProgress(
                ticketId, request.getUserId(), request.getUpdate(), request.getProgressType());

            return ResponseEntity.ok(Map.of("status", "updated"));

        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(Map.of("error", Optional.ofNullable(e.getMessage()).orElse("Unexpected error")));
        }
    }

    @PostMapping("/tickets/{ticketId}/resolve")
    public ResponseEntity<Map<String, String>> markResolved(
            @PathVariable String ticketId,
            @RequestBody ResolveRequest request) {

        try {
            resolutionTracker.markResolved(
                ticketId, request.getUserId(), request.getNote(), request.getResolutionType());

            return ResponseEntity.ok(Map.of("status", "resolved"));

        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(Map.of("error", Optional.ofNullable(e.getMessage()).orElse("Unexpected error")));
        }
    }
}
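The `captureError` endpoint relies on a `parseStackTrace` helper that is not shown. Assuming the client posts the stack trace as plain text in the format printed by `Throwable.printStackTrace()` (lines such as `at com.example.Foo.bar(Foo.java:42)`), one possible implementation is the hypothetical `StackTraceParser` below. It uses only the JDK: `java.util.regex` and the four-argument `StackTraceElement` constructor, where a line number of -2 marks a native method and a negative value means the line is unknown. As a simplification, frames from `Caused by:` sections are flattened into the same array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses text in Throwable.printStackTrace() format into frames.
final class StackTraceParser {

    // Optional module/class-loader prefix ending in '/', then "class.method", then "(location)"
    private static final Pattern FRAME =
        Pattern.compile("^\\s*at\\s+(?:\\S*/)?([^\\s(/]+)\\(([^)]*)\\)\\s*$");

    private StackTraceParser() {
    }

    static StackTraceElement[] parse(String text) {
        if (text == null) {
            return new StackTraceElement[0];
        }
        List<StackTraceElement> frames = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher m = FRAME.matcher(line);
            if (!m.matches()) {
                continue; // skip the exception header, "Caused by:" and "... N more" lines
            }
            String qualified = m.group(1);
            int dot = qualified.lastIndexOf('.');
            if (dot <= 0) {
                continue; // no class/method separator, so it cannot be represented
            }
            frames.add(toElement(qualified.substring(0, dot), qualified.substring(dot + 1), m.group(2)));
        }
        return frames.toArray(new StackTraceElement[0]);
    }

    private static StackTraceElement toElement(String className, String methodName, String location) {
        if (location.equals("Native Method")) {
            return new StackTraceElement(className, methodName, null, -2); // -2 marks a native method
        }
        int colon = location.lastIndexOf(':');
        if (colon > 0) {
            try {
                int lineNumber = Integer.parseInt(location.substring(colon + 1));
                return new StackTraceElement(className, methodName, location.substring(0, colon), lineNumber);
            } catch (NumberFormatException ignored) {
                // fall through and treat the whole location as a file name
            }
        }
        String fileName = location.equals("Unknown Source") ? null : location;
        return new StackTraceElement(className, methodName, fileName, -1); // -1: line unknown
    }
}
```

Lines that do not look like frames are skipped rather than rejected, so a partially malformed trace still yields whatever frames can be recovered.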

Best Practices

1. Comprehensive Error Capture Strategy

2. Intelligent Error Analysis and Classification

3. Effective Error Aggregation and Management

4. Proactive Error Alerting and Communication

5. Systematic Error Resolution and Prevention

6. Integration and Automation

Error Tracking underpins reliability in complex enterprise integration architectures: capturing and grouping errors with their context shortens debugging, severity-, frequency-, and impact-based alerting surfaces problems before users report them, and a tracked resolution lifecycle (assignment, resolution, verification) turns recurring failures into measurable, continuous improvement.
