Distributed Tracing
Overview
Distributed Tracing is a comprehensive observability pattern in enterprise integration architectures that enables tracking, monitoring, and analyzing requests as they flow through multiple services, systems, and infrastructure components in distributed environments. Like a detective following clues through a complex investigation that spans multiple jurisdictions and involves numerous witnesses, distributed tracing provides complete visibility into the journey of requests across microservices, databases, message queues, and external systems. This pattern is essential for understanding system behavior, diagnosing performance issues, identifying bottlenecks, and maintaining operational excellence in complex distributed architectures where traditional debugging and monitoring approaches are insufficient.
Theoretical Foundation
Distributed Tracing is grounded in observability theory, distributed systems monitoring, and performance analysis principles. It incorporates concepts from request correlation, span-based monitoring, dependency mapping, and causal relationship tracking to provide a comprehensive framework for understanding complex distributed system interactions. The pattern addresses the fundamental challenges of distributed system observability where traditional logging and metrics alone cannot provide sufficient insight into request flows, inter-service dependencies, and performance characteristics across multiple system boundaries.
Core Principles
1. Request Correlation and Context Propagation
Tracking requests across multiple services and system boundaries: - Trace correlation - unique identifiers that follow requests through entire distributed call chains - Context propagation - automatic transmission of tracing context across service boundaries - Span relationships - hierarchical representation of operations within a request trace - Baggage transmission - carrying business context and metadata throughout request traces
2. Span-Based Operation Tracking
Detailed monitoring of individual operations within distributed requests: - Span creation - capturing detailed information about specific operations and their execution - Timing measurements - precise measurement of operation durations and latencies - Metadata collection - gathering relevant context, tags, and annotations for each operation - Error tracking - capturing and correlating errors within the context of their trace spans
3. Service Dependency Mapping
Understanding relationships and dependencies between services: - Topology discovery - automatic mapping of service interactions and communication patterns - Dependency analysis - identifying critical paths and service interdependencies - Impact assessment - understanding how service failures affect downstream operations - Architecture visualization - providing visual representations of system architecture and data flow
4. Performance Analysis and Optimization
Enabling comprehensive performance monitoring and optimization: - Latency analysis - identifying performance bottlenecks and slow operations - Throughput monitoring - tracking request volumes and processing rates across services - Resource utilization - correlating performance with infrastructure resource consumption - Optimization identification - discovering opportunities for performance improvements
Why Distributed Tracing is Essential in Integration Architecture
1. Complex System Observability
In modern distributed architectures, distributed tracing provides: - End-to-end visibility - complete understanding of request flows across multiple systems - Cross-service debugging - ability to debug issues that span multiple microservices - Performance bottleneck identification - pinpointing specific operations causing performance issues - System behavior understanding - comprehensive insights into how distributed systems operate
2. Microservices Architecture Support
For microservices environments requiring sophisticated monitoring: - Service interaction mapping - understanding how microservices communicate and depend on each other - Request flow visualization - tracking requests as they traverse multiple microservice boundaries - Failure correlation - understanding how failures in one service affect others - Performance optimization - identifying optimization opportunities across service boundaries
3. Enterprise Integration Monitoring
Supporting complex enterprise integration scenarios: - Integration flow tracking - monitoring data flows through enterprise integration platforms - Message correlation - tracking messages through complex routing and transformation scenarios - External system monitoring - understanding interactions with third-party and legacy systems - Business process visibility - correlating technical operations with business process execution
4. Operational Excellence
Enhancing operational capabilities and system reliability: - Root cause analysis - rapidly identifying the source of issues in complex distributed systems - SLA monitoring - tracking performance against service level agreements - Capacity planning - understanding system behavior patterns for capacity planning - Incident response - providing detailed context for faster incident resolution
Benefits in Integration Contexts
1. Technical Advantages
- Complete visibility - comprehensive understanding of distributed system behavior
- Performance insights - detailed performance analysis across all system components
- Debugging capabilities - advanced debugging for complex distributed scenarios
- Dependency management - clear understanding of system dependencies and relationships
2. Business Value
- Faster issue resolution - reduced mean time to resolution for system issues
- Improved reliability - better understanding leads to more reliable system operations
- Performance optimization - data-driven performance improvements and optimizations
- Business impact analysis - understanding how technical issues affect business operations
3. Integration Enablement
- Integration monitoring - comprehensive monitoring of complex integration flows
- Data flow visibility - tracking data transformations and routing through integration pipelines
- Protocol correlation - correlating operations across different communication protocols
- External system insights - understanding performance and reliability of external dependencies
4. Development and Operations
- Development productivity - faster development through better system understanding
- Testing efficiency - comprehensive testing insights for complex distributed scenarios
- Documentation automation - automatic generation of system interaction documentation
- Knowledge transfer - better system understanding for team knowledge transfer
Integration Architecture Applications
1. E-commerce Platform Tracing
Comprehensive tracing for e-commerce order processing:
// Order Service with Distributed Tracing
@RestController
@RequestMapping("/orders")
public class OrderController {
@Autowired
private OrderService orderService;
@Autowired
private Tracer tracer;
@PostMapping
public ResponseEntity<OrderResponse> createOrder(@RequestBody CreateOrderRequest request) {
Span span = tracer.nextSpan()
.name("order-creation")
.tag("operation", "create-order")
.tag("customer-id", request.getCustomerId())
.tag("order-total", request.getTotalAmount().toString())
.start();
try (Tracer.SpanInScope ws = tracer.withSpanInScope(span)) {
log.info("Processing order creation for customer: {}", request.getCustomerId());
OrderResult result = orderService.processOrder(request);
span.tag("order-id", result.getOrderId());
span.tag("order-status", result.getStatus().toString());
if (result.isSuccess()) {
span.tag("success", "true");
return ResponseEntity.ok(OrderResponse.from(result));
} else {
span.tag("success", "false");
span.tag("failure-reason", result.getErrorMessage());
return ResponseEntity.badRequest().body(OrderResponse.error(result.getErrorMessage()));
}
} catch (Exception e) {
span.tag("error", true);
span.tag("error.object", e.getClass().getSimpleName());
span.tag("error.message", e.getMessage());
log.error("Error processing order creation", e);
throw e;
} finally {
span.end();
}
}
}
@Service
public class OrderService {
@Autowired
private InventoryService inventoryService;
@Autowired
private PaymentService paymentService;
@Autowired
private ShippingService shippingService;
@Autowired
private Tracer tracer;
public OrderResult processOrder(CreateOrderRequest request) {
Span span = tracer.nextSpan()
.name("order-processing")
.tag("operation", "process-order")
.start();
try (Tracer.SpanInScope ws = tracer.withSpanInScope(span)) {
String orderId = generateOrderId();
span.tag("order-id", orderId);
// Step 1: Validate inventory
Span inventorySpan = tracer.nextSpan()
.name("inventory-validation")
.tag("operation", "validate-inventory")
.start();
try (Tracer.SpanInScope inventoryScope = tracer.withSpanInScope(inventorySpan)) {
InventoryValidationResult inventoryResult = inventoryService.validateInventory(request.getItems());
inventorySpan.tag("items-count", String.valueOf(request.getItems().size()));
inventorySpan.tag("validation-result", inventoryResult.isValid() ? "valid" : "invalid");
if (!inventoryResult.isValid()) {
inventorySpan.tag("validation-errors", inventoryResult.getErrors().toString());
return OrderResult.failure("Inventory validation failed: " + inventoryResult.getErrors());
}
} finally {
inventorySpan.end();
}
// Step 2: Process payment
Span paymentSpan = tracer.nextSpan()
.name("payment-processing")
.tag("operation", "process-payment")
.start();
try (Tracer.SpanInScope paymentScope = tracer.withSpanInScope(paymentSpan)) {
PaymentRequest paymentRequest = new PaymentRequest();
paymentRequest.setOrderId(orderId);
paymentRequest.setAmount(request.getTotalAmount());
paymentRequest.setPaymentMethod(request.getPaymentMethod());
PaymentResult paymentResult = paymentService.processPayment(paymentRequest);
paymentSpan.tag("payment-amount", request.getTotalAmount().toString());
paymentSpan.tag("payment-method", request.getPaymentMethod());
paymentSpan.tag("payment-result", paymentResult.isSuccess() ? "success" : "failed");
if (!paymentResult.isSuccess()) {
paymentSpan.tag("payment-error", paymentResult.getErrorMessage());
return OrderResult.failure("Payment failed: " + paymentResult.getErrorMessage());
}
span.tag("payment-id", paymentResult.getPaymentId());
} finally {
paymentSpan.end();
}
// Step 3: Initiate shipping
Span shippingSpan = tracer.nextSpan()
.name("shipping-initiation")
.tag("operation", "initiate-shipping")
.start();
try (Tracer.SpanInScope shippingScope = tracer.withSpanInScope(shippingSpan)) {
ShippingRequest shippingRequest = new ShippingRequest();
shippingRequest.setOrderId(orderId);
shippingRequest.setItems(request.getItems());
shippingRequest.setShippingAddress(request.getShippingAddress());
ShippingResult shippingResult = shippingService.initiateShipping(shippingRequest);
shippingSpan.tag("shipping-address", request.getShippingAddress().getCity());
shippingSpan.tag("shipping-result", shippingResult.isSuccess() ? "success" : "failed");
if (!shippingResult.isSuccess()) {
shippingSpan.tag("shipping-error", shippingResult.getErrorMessage());
return OrderResult.failure("Shipping failed: " + shippingResult.getErrorMessage());
}
span.tag("tracking-number", shippingResult.getTrackingNumber());
} finally {
shippingSpan.end();
}
span.tag("order-success", "true");
return OrderResult.success(orderId);
} catch (Exception e) {
span.tag("error", true);
span.tag("error.object", e.getClass().getSimpleName());
span.tag("error.message", e.getMessage());
throw e;
} finally {
span.end();
}
}
}
// Inventory Service with External System Tracing
@Service
public class InventoryService {
@Autowired
private ExternalInventoryClient inventoryClient;
@Autowired
private Tracer tracer;
public InventoryValidationResult validateInventory(List<OrderItem> items) {
Span span = tracer.nextSpan()
.name("external-inventory-check")
.tag("operation", "validate-inventory")
.tag("external-system", "inventory-service")
.start();
try (Tracer.SpanInScope ws = tracer.withSpanInScope(span)) {
List<String> errors = new ArrayList<>();
for (OrderItem item : items) {
Span itemSpan = tracer.nextSpan()
.name("inventory-item-check")
.tag("product-id", item.getProductId())
.tag("quantity", String.valueOf(item.getQuantity()))
.start();
try (Tracer.SpanInScope itemScope = tracer.withSpanInScope(itemSpan)) {
InventoryItem inventoryItem = inventoryClient.getInventoryItem(item.getProductId());
itemSpan.tag("available-quantity", String.valueOf(inventoryItem.getAvailableQuantity()));
itemSpan.tag("reserved-quantity", String.valueOf(inventoryItem.getReservedQuantity()));
if (inventoryItem.getAvailableQuantity() < item.getQuantity()) {
String error = String.format("Insufficient inventory for product %s: requested %d, available %d",
item.getProductId(), item.getQuantity(), inventoryItem.getAvailableQuantity());
errors.add(error);
itemSpan.tag("validation-result", "insufficient");
} else {
itemSpan.tag("validation-result", "sufficient");
}
} catch (Exception e) {
itemSpan.tag("error", true);
itemSpan.tag("error.message", e.getMessage());
errors.add("Error checking inventory for product " + item.getProductId() + ": " + e.getMessage());
} finally {
itemSpan.end();
}
}
span.tag("validation-errors-count", String.valueOf(errors.size()));
span.tag("validation-result", errors.isEmpty() ? "valid" : "invalid");
return new InventoryValidationResult(errors.isEmpty(), errors);
} finally {
span.end();
}
}
}
2. Apache Camel Integration Tracing
Distributed tracing for complex Camel integration flows:
@Component
public class CamelTracingRoute extends RouteBuilder {
@Override
public void configure() throws Exception {
// Enable Camel tracing
getContext().setTracing(true);
// Order processing integration flow with tracing
from("kafka:order-events?groupId=order-integration")
.routeId("order-integration-flow")
.setHeader("integration-start-time", simple("${date:now:yyyy-MM-dd HH:mm:ss.SSS}"))
.log("Starting order integration flow for order: ${body.orderId}")
.unmarshal().json(JsonLibrary.Jackson, OrderEvent.class)
.process(exchange -> {
// Add tracing headers
OrderEvent orderEvent = exchange.getIn().getBody(OrderEvent.class);
exchange.getIn().setHeader("order-id", orderEvent.getOrderId());
exchange.getIn().setHeader("customer-id", orderEvent.getCustomerId());
exchange.getIn().setHeader("trace-operation", "order-integration");
})
.choice()
.when(simple("${body.eventType} == 'ORDER_PLACED'"))
.to("direct:processNewOrder")
.when(simple("${body.eventType} == 'ORDER_UPDATED'"))
.to("direct:processOrderUpdate")
.when(simple("${body.eventType} == 'ORDER_CANCELLED'"))
.to("direct:processOrderCancellation")
.otherwise()
.log("Unknown order event type: ${body.eventType}")
.setHeader("trace-error", simple("Unknown event type: ${body.eventType}"))
.to("direct:handleUnknownEvent")
.end()
.log("Completed order integration flow for order: ${header.order-id}");
from("direct:processNewOrder")
.routeId("process-new-order")
.log("Processing new order: ${header.order-id}")
.setHeader("trace-step", constant("customer-validation"))
.to("direct:validateCustomer")
.setHeader("trace-step", constant("inventory-check"))
.to("direct:checkInventory")
.setHeader("trace-step", constant("pricing-calculation"))
.to("direct:calculatePricing")
.setHeader("trace-step", constant("external-notification"))
.multicast().parallelProcessing()
.to("direct:notifyWarehouse")
.to("direct:updateCRM")
.to("direct:sendCustomerNotification")
.end()
.setHeader("trace-step", constant("order-confirmation"))
.to("direct:confirmOrder")
.log("New order processing completed for order: ${header.order-id}");
from("direct:validateCustomer")
.routeId("validate-customer")
.log("Validating customer: ${header.customer-id}")
.setHeader("trace-operation", constant("customer-validation"))
.process(exchange -> {
String customerId = exchange.getIn().getHeader("customer-id", String.class);
// Add tracing context
exchange.getIn().setHeader("validation-start-time", System.currentTimeMillis());
exchange.getIn().setHeader("trace-customer-id", customerId);
})
.to("http://customer-service/customers/${header.customer-id}?httpMethod=GET&connectTimeout=5000&socketTimeout=10000")
.process(exchange -> {
long startTime = exchange.getIn().getHeader("validation-start-time", Long.class);
long duration = System.currentTimeMillis() - startTime;
exchange.getIn().setHeader("customer-validation-duration", duration);
// Parse customer response
String customerResponse = exchange.getIn().getBody(String.class);
ObjectMapper mapper = new ObjectMapper();
CustomerInfo customer = mapper.readValue(customerResponse, CustomerInfo.class);
exchange.getIn().setHeader("customer-status", customer.getStatus());
exchange.getIn().setHeader("customer-tier", customer.getTier());
log.info("Customer validation completed in {}ms for customer: {}",
duration, exchange.getIn().getHeader("customer-id"));
})
.choice()
.when(simple("${header.customer-status} != 'ACTIVE'"))
.log("Customer validation failed - inactive customer: ${header.customer-id}")
.setHeader("trace-error", simple("Inactive customer: ${header.customer-id}"))
.throwException(new IllegalStateException("Customer is not active"))
.otherwise()
.log("Customer validation successful for customer: ${header.customer-id}")
.end();
from("direct:checkInventory")
.routeId("check-inventory")
.log("Checking inventory for order: ${header.order-id}")
.setHeader("trace-operation", constant("inventory-check"))
.process(exchange -> {
OrderEvent orderEvent = exchange.getIn().getBody(OrderEvent.class);
InventoryCheckRequest request = new InventoryCheckRequest();
request.setOrderId(orderEvent.getOrderId());
request.setItems(orderEvent.getItems());
exchange.getIn().setBody(request);
exchange.getIn().setHeader("inventory-check-start-time", System.currentTimeMillis());
exchange.getIn().setHeader("items-count", orderEvent.getItems().size());
})
.marshal().json(JsonLibrary.Jackson)
.to("http://inventory-service/inventory/check?httpMethod=POST&connectTimeout=5000&socketTimeout=15000")
.process(exchange -> {
long startTime = exchange.getIn().getHeader("inventory-check-start-time", Long.class);
long duration = System.currentTimeMillis() - startTime;
exchange.getIn().setHeader("inventory-check-duration", duration);
String response = exchange.getIn().getBody(String.class);
ObjectMapper mapper = new ObjectMapper();
InventoryCheckResponse inventoryResponse = mapper.readValue(response, InventoryCheckResponse.class);
exchange.getIn().setHeader("inventory-available", inventoryResponse.isAvailable());
exchange.getIn().setHeader("inventory-warnings", inventoryResponse.getWarnings().size());
log.info("Inventory check completed in {}ms for order: {} - Available: {}",
duration, exchange.getIn().getHeader("order-id"), inventoryResponse.isAvailable());
})
.choice()
.when(simple("${header.inventory-available} != true"))
.log("Inventory check failed for order: ${header.order-id}")
.setHeader("trace-error", simple("Insufficient inventory for order: ${header.order-id}"))
.throwException(new IllegalStateException("Insufficient inventory"))
.otherwise()
.log("Inventory check successful for order: ${header.order-id}")
.end();
from("direct:calculatePricing")
.routeId("calculate-pricing")
.log("Calculating pricing for order: ${header.order-id}")
.setHeader("trace-operation", constant("pricing-calculation"))
.process(exchange -> {
OrderEvent orderEvent = exchange.getIn().getBody(OrderEvent.class);
PricingRequest request = new PricingRequest();
request.setOrderId(orderEvent.getOrderId());
request.setCustomerId(orderEvent.getCustomerId());
request.setCustomerTier(exchange.getIn().getHeader("customer-tier", String.class));
request.setItems(orderEvent.getItems());
exchange.getIn().setBody(request);
exchange.getIn().setHeader("pricing-start-time", System.currentTimeMillis());
})
.marshal().json(JsonLibrary.Jackson)
.to("http://pricing-service/pricing/calculate?httpMethod=POST&connectTimeout=5000&socketTimeout=10000")
.process(exchange -> {
long startTime = exchange.getIn().getHeader("pricing-start-time", Long.class);
long duration = System.currentTimeMillis() - startTime;
exchange.getIn().setHeader("pricing-duration", duration);
String response = exchange.getIn().getBody(String.class);
ObjectMapper mapper = new ObjectMapper();
PricingResponse pricingResponse = mapper.readValue(response, PricingResponse.class);
exchange.getIn().setHeader("total-amount", pricingResponse.getTotalAmount());
exchange.getIn().setHeader("discounts-applied", pricingResponse.getDiscounts().size());
log.info("Pricing calculation completed in {}ms for order: {} - Total: {}",
duration, exchange.getIn().getHeader("order-id"), pricingResponse.getTotalAmount());
});
from("direct:notifyWarehouse")
.routeId("notify-warehouse")
.log("Notifying warehouse for order: ${header.order-id}")
.setHeader("trace-operation", constant("warehouse-notification"))
.process(exchange -> {
OrderEvent orderEvent = exchange.getIn().getBody(OrderEvent.class);
WarehouseNotification notification = new WarehouseNotification();
notification.setOrderId(orderEvent.getOrderId());
notification.setItems(orderEvent.getItems());
notification.setShippingAddress(orderEvent.getShippingAddress());
notification.setPriority(determineShippingPriority(orderEvent));
exchange.getIn().setBody(notification);
exchange.getIn().setHeader("notification-type", "warehouse");
exchange.getIn().setHeader("notification-start-time", System.currentTimeMillis());
})
.marshal().json(JsonLibrary.Jackson)
.to("kafka:warehouse-notifications?requestTimeout=10000")
.process(exchange -> {
long startTime = exchange.getIn().getHeader("notification-start-time", Long.class);
long duration = System.currentTimeMillis() - startTime;
exchange.getIn().setHeader("warehouse-notification-duration", duration);
log.info("Warehouse notification sent in {}ms for order: {}",
duration, exchange.getIn().getHeader("order-id"));
});
from("direct:updateCRM")
.routeId("update-crm")
.log("Updating CRM for order: ${header.order-id}")
.setHeader("trace-operation", constant("crm-update"))
.process(exchange -> {
OrderEvent orderEvent = exchange.getIn().getBody(OrderEvent.class);
CRMUpdate crmUpdate = new CRMUpdate();
crmUpdate.setCustomerId(orderEvent.getCustomerId());
crmUpdate.setOrderId(orderEvent.getOrderId());
crmUpdate.setOrderValue(exchange.getIn().getHeader("total-amount", BigDecimal.class));
crmUpdate.setActivity("ORDER_PLACED");
exchange.getIn().setBody(crmUpdate);
exchange.getIn().setHeader("crm-update-start-time", System.currentTimeMillis());
})
.marshal().json(JsonLibrary.Jackson)
.to("http://crm-service/customers/${header.customer-id}/activities?httpMethod=POST&connectTimeout=5000&socketTimeout=10000")
.process(exchange -> {
long startTime = exchange.getIn().getHeader("crm-update-start-time", Long.class);
long duration = System.currentTimeMillis() - startTime;
exchange.getIn().setHeader("crm-update-duration", duration);
log.info("CRM update completed in {}ms for customer: {}",
duration, exchange.getIn().getHeader("customer-id"));
});
// Tracing error handler
onException(Exception.class)
.handled(true)
.log("Error in integration flow: ${exception.message}")
.process(exchange -> {
Exception exception = exchange.getException();
ErrorTrace errorTrace = new ErrorTrace();
errorTrace.setTraceId(exchange.getIn().getHeader("trace-id", String.class));
errorTrace.setOperationId(exchange.getIn().getHeader("trace-operation", String.class));
errorTrace.setErrorMessage(exception.getMessage());
errorTrace.setErrorType(exception.getClass().getSimpleName());
errorTrace.setTimestamp(Instant.now());
errorTrace.setOrderId(exchange.getIn().getHeader("order-id", String.class));
errorTrace.setCustomerId(exchange.getIn().getHeader("customer-id", String.class));
exchange.getIn().setBody(errorTrace);
})
.marshal().json(JsonLibrary.Jackson)
.to("kafka:error-traces")
.log("Error trace published for order: ${header.order-id}");
// Performance monitoring route
from("timer:tracing-metrics?period=60000")
.routeId("tracing-metrics-collector")
.process(exchange -> {
TracingMetrics metrics = new TracingMetrics();
metrics.setTimestamp(Instant.now());
metrics.setActiveTraces(tracingManager.getActiveTraceCount());
metrics.setCompletedTraces(tracingManager.getCompletedTraceCount());
metrics.setAverageTraceLatency(tracingManager.getAverageTraceLatency());
metrics.setErrorRate(tracingManager.getErrorRate());
exchange.getIn().setBody(metrics);
})
.marshal().json(JsonLibrary.Jackson)
.to("kafka:tracing-metrics")
.log("Tracing metrics published - Active traces: ${body.activeTraces}");
}
}
3. Tracing Configuration and Management
Comprehensive tracing setup with Zipkin and Jaeger:
// Tracing Configuration
@Configuration
@EnableZipkinServer // For local development
public class TracingConfiguration {
@Bean
public Sender sender() {
return OkHttpSender.create("http://zipkin-server:9411/api/v2/spans");
}
@Bean
public AsyncReporter<Span> spanReporter() {
return AsyncReporter.create(sender());
}
@Bean
public Tracing tracing() {
return Tracing.newBuilder()
.localServiceName("integration-service")
.spanReporter(spanReporter())
.sampler(Sampler.create(1.0f)) // 100% sampling for development
.build();
}
@Bean
public Tracer tracer() {
return tracing().tracer();
}
@Bean
public SpanCustomizer spanCustomizer() {
return CurrentSpanCustomizer.create();
}
@Bean
public TracingManager tracingManager() {
return new TracingManager(tracer());
}
}
// Custom Tracing Manager
@Component
public class TracingManager {
private final Tracer tracer;
private final AtomicLong activeTraceCount = new AtomicLong(0);
private final AtomicLong completedTraceCount = new AtomicLong(0);
private final AtomicLong totalLatency = new AtomicLong(0);
private final AtomicLong errorCount = new AtomicLong(0);
public TracingManager(Tracer tracer) {
this.tracer = tracer;
}
public Span createSpan(String operationName, Map<String, String> tags) {
Span span = tracer.nextSpan().name(operationName);
if (tags != null) {
tags.forEach(span::tag);
}
span.tag("service.name", "integration-service");
span.tag("service.version", getServiceVersion());
span.tag("environment", getEnvironment());
activeTraceCount.incrementAndGet();
return span.start();
}
public void finishSpan(Span span, boolean success) {
if (span != null) {
if (!success) {
span.tag("error", true);
errorCount.incrementAndGet();
}
span.end();
activeTraceCount.decrementAndGet();
completedTraceCount.incrementAndGet();
}
}
public void addSpanEvent(Span span, String event, Map<String, String> attributes) {
if (span != null) {
span.annotate(event);
if (attributes != null) {
attributes.forEach(span::tag);
}
}
}
public long getActiveTraceCount() {
return activeTraceCount.get();
}
public long getCompletedTraceCount() {
return completedTraceCount.get();
}
public double getAverageTraceLatency() {
long completed = completedTraceCount.get();
return completed > 0 ? (double) totalLatency.get() / completed : 0.0;
}
public double getErrorRate() {
long completed = completedTraceCount.get();
return completed > 0 ? (double) errorCount.get() / completed : 0.0;
}
private String getServiceVersion() {
return getClass().getPackage().getImplementationVersion();
}
private String getEnvironment() {
return System.getProperty("spring.profiles.active", "development");
}
}
// Tracing Interceptor for REST calls
@Component
public class TracingInterceptor implements ClientHttpRequestInterceptor {
private final Tracer tracer;
public TracingInterceptor(Tracer tracer) {
this.tracer = tracer;
}
@Override
public ClientHttpResponse intercept(
HttpRequest request,
byte[] body,
ClientHttpRequestExecution execution) throws IOException {
Span span = tracer.nextSpan()
.name("http-client-request")
.tag("http.method", request.getMethod().name())
.tag("http.url", request.getURI().toString())
.tag("component", "http-client")
.start();
try (Tracer.SpanInScope ws = tracer.withSpanInScope(span)) {
// Inject tracing headers
TraceContext.Injector<HttpHeaders> injector =
tracing().propagation().injector(HttpHeaders::set);
injector.inject(span.context(), request.getHeaders());
ClientHttpResponse response = execution.execute(request, body);
span.tag("http.status_code", String.valueOf(response.getStatusCode().value()));
if (!response.getStatusCode().is2xxSuccessful()) {
span.tag("error", true);
span.tag("error.status", response.getStatusCode().toString());
}
return response;
} catch (Exception e) {
span.tag("error", true);
span.tag("error.object", e.getClass().getSimpleName());
span.tag("error.message", e.getMessage());
throw e;
} finally {
span.end();
}
}
}
Best Practices
1. Trace Design and Instrumentation
- Use meaningful span names that describe the operation being performed
- Add relevant tags and metadata to spans for better searchability and context
- Implement consistent naming conventions across all services
- Use span events to mark important milestones within operations
- Implement proper parent-child relationships for nested operations
2. Context Propagation and Correlation
- Ensure trace context is properly propagated across all service boundaries
- Use automatic instrumentation where available to reduce manual effort
- Implement custom propagation for non-standard protocols and communications
- Maintain trace context through asynchronous operations and message queues
- Use baggage sparingly to avoid performance impact
3. Sampling and Performance
- Implement intelligent sampling strategies to balance observability and performance
- Use adaptive sampling based on request characteristics and system load
- Monitor the performance impact of tracing on application throughput
- Implement efficient trace collection and transmission mechanisms
- Use asynchronous reporting to minimize application latency impact
4. Error Handling and Debugging
- Capture comprehensive error information in trace spans
- Include stack traces and error context for debugging purposes
- Implement trace-based alerting for critical errors and performance issues
- Use distributed tracing for root cause analysis of complex failures
- Correlate traces with logs and metrics for comprehensive debugging
5. Storage and Analysis
- Implement efficient trace storage strategies with appropriate retention policies
- Use trace aggregation and analysis tools for performance insights
- Create dashboards and visualizations for trace data analysis
- Implement trace search and filtering capabilities for operational use
- Use trace data for service dependency mapping and architecture analysis
6. Security and Compliance
- Implement proper security for trace data collection and storage
- Avoid including sensitive information in trace spans and tags
- Implement access controls for trace data viewing and analysis
- Ensure compliance with data protection regulations for trace data
- Use encryption for trace data transmission and storage
Distributed Tracing is essential for maintaining observability and operational excellence in complex distributed enterprise integration architectures, providing comprehensive insights into system behavior, performance characteristics, and failure patterns across all system components and boundaries.
← Back to All Patterns