APM & Distributed Tracing

Trace every request across your distributed systems.

Service-centric APM built from tracing data and RED metrics. Compare latency, throughput, error rate, and dependencies, then drill into the trace and logs behind any slowdown.

POST /api/checkout (KloudMate · Traces)
Duration 1.84s · Spans 27 · Services 5 · Errors 1

  Span                      Duration
  frontend-proxy            1.84s
  checkout-api handle       1.79s
  redis GET cart            11ms
  inventory-api getStock    1.45s
  postgres SELECT items     1.18s
  payments-api charge       168ms
  kafka publish order       31ms

APM breaks down when teams can see an unhealthy service but can’t quickly reach the dependency, trace, or span that caused it.

KloudMate keeps service-level health, dependency analysis, trace detail, and request logs in one connected workflow so engineers can move from symptom to evidence without switching tools.

What teams can do with KloudMate APM

Ground service health in tracing data, then move from an outlier to the exact request path behind it.

Compare service health in one APM view

See requests, throughput, error rate, and p50, p95, and p99 latency for every instrumented service, so regressions stand out quickly.

Monitor every API endpoint

Track rate, errors, and latency for each HTTP and RPC endpoint, then drill from a failing endpoint straight into its traces.

Search traces by span attribute

Filter traces by method, status, database, or any span attribute you tag yourself, and reuse saved queries to isolate a single request path.

Inspect dependencies with a live service map

Understand topology and traffic flow from tracing data, then inspect nodes and edges to find high-latency or erroring dependencies.

Pivot from APM to traces and logs

Open a service dashboard, inspect recent traces, and move into request logs without copying trace IDs between disconnected tools.

Instrument the way your stack needs

Start with eBPF-based observability, use the KloudMate Agent in Kubernetes, or connect manual OpenTelemetry instrumentation for deeper spans.

From service health to request evidence in one investigation path.

Start in APM Views when you need service health and dependency analysis. Move to Trace Explorer when you need request-level detail.

01. Spot the unhealthy service

Start in APM Views to compare request volume, error rate, and latency percentiles across services in the selected time range.

02. Narrow to the failing endpoint

Open API Monitoring to compare rate, errors, and latency for every HTTP and RPC endpoint on that service, and sort to the one that is actually failing.

03. Check the dependency graph

Use Service Map to see which node or edge is carrying the traffic path and whether any dependency is surfacing errors.

04. Drill into a representative trace

Move into Trace Explorer, choose the trace that matches the incident, and inspect span timing, attributes, and the request waterfall.

05. Read the request logs in context

Open the request logs linked to that trace to confirm the failure mode and gather the evidence needed for the next investigation step.

Zero-code instrumentation

Get traces without changing your application code

Pick the option that fits your stack: eBPF tracing or the KloudMate Agent. Both run alongside your services and require no code changes.

eBPF tracing

Kernel-level visibility. Zero SDK.

Captures service calls, network I/O, and system calls at the kernel level. No library to add, no application restart, no code changes. Works across any language or runtime on Linux.

  • No SDK to install or maintain
  • Any language: Go, Java, Python, Node, Ruby, PHP
  • Works on bare metal, VMs, and containers

KloudMate Agent · Kubernetes

Auto-instrument every workload in your cluster

Deploy the KloudMate Agent once as a DaemonSet. It auto-instruments every pod in the cluster and generates OpenTelemetry-compatible spans, with no annotations, no sidecars, and no per-service work.

  • Deploy once, covers every workload
  • Spans appear in APM and Trace Explorer immediately
  • Add manual OTel instrumentation for custom spans and attributes

Already using OpenTelemetry SDKs? Manual instrumentation works alongside both. Add custom spans and attributes for business logic the auto-instrumentation can't see.

Compare service health (KloudMate · APM · APM Views)
Services 5 · Requests 412/s · Erroring 2

  Service         Version                    Env    p99 latency   Error rate
  inventory-api   v8.4.0 · just deployed     prod   1.62s         5.1%
  checkout-api    v2026.05.12                prod   1.45s         3.2%
  payments-api    v3.1.2                     prod   180ms         0.2%
  cart-api        v1.9.0                     prod   120ms         0.0%
  notification    v2.0.1                     prod   95ms          0.1%

Spot the slow or failing service

The Services overview puts requests, error rate, and latency percentiles for every instrumented service on one screen, so a slow or failing service stands out instead of hiding in an average.

  • Compare requests, throughput, and error rate across all services in the selected time range
  • Use p99, p95, and p50 latency to spot outliers instead of relying on averages alone
  • Correlate a performance shift with the service version that introduced it
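
To illustrate why percentiles expose outliers that an average hides, here is a minimal Python sketch that rolls request spans up into per-service RED metrics. The span fields (`service`, `duration_ms`, `error`), the nearest-rank percentile method, and the sample data are assumptions made for illustration; they are not KloudMate's data model or its aggregation method.

```python
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1-100."""
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def service_health(spans, window_seconds):
    """Aggregate request spans into per-service rate, error, and duration metrics."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span["service"]].append(span)

    report = {}
    for service, items in by_service.items():
        durations = sorted(s["duration_ms"] for s in items)
        errors = sum(1 for s in items if s["error"])
        report[service] = {
            "requests": len(items),
            "throughput_per_s": len(items) / window_seconds,
            "error_rate": errors / len(items),
            "p50_ms": percentile(durations, 50),
            "p95_ms": percentile(durations, 95),
            "p99_ms": percentile(durations, 99),
            "mean_ms": sum(durations) / len(durations),
        }
    return report


# Hypothetical sample: 98 fast requests and 2 slow, failing ones in a 60 s window.
spans = [{"service": "checkout-api", "duration_ms": 100, "error": False} for _ in range(98)]
spans += [{"service": "checkout-api", "duration_ms": 1800, "error": True} for _ in range(2)]

report = service_health(spans, window_seconds=60)
print(report["checkout-api"])
```

In this sample the mean is 134 ms and p50 and p95 are both 100 ms, while p99 is 1800 ms: only the tail percentile shows that some requests are slow.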

RED metrics for every endpoint (KloudMate · API Monitoring · rate, errors, duration)
Endpoints 38 · Throughput 412/min · Failing 2

  Endpoint                    Service         Protocol   p95     Error rate
  POST /api/checkout          checkout-api    HTTP       1.45s   5xx · 3.2%
  GET /api/inventory/{id}     inventory-api   HTTP       1.20s   5xx · 1.1%
  GET /api/cart               cart-api        HTTP       160ms   healthy
  cart.CartService/AddItem    cart-api        gRPC       110ms   0.2%
  POST /api/payments/charge   payments-api    HTTP       180ms   healthy

See which endpoints are slow, failing, or busy

API Monitoring breaks each service down to its individual HTTP and RPC endpoints and reports rate, errors, and latency for every route, built entirely from the spans your services already send.

  • Sort endpoints by throughput, error rate, or p95 to surface the worst offenders first
  • Covers HTTP and RPC alike: gRPC and Connect endpoints get an error rate even without HTTP status codes
  • Open an endpoint for its request, error, and latency trends, then jump to its recent or failed traces
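
One way an error rate could be derived for both HTTP and RPC endpoints is sketched below in Python. It assumes spans carry the OpenTelemetry attributes `http.status_code` or `rpc.grpc.status_code`, and it assumes a simple rule (HTTP 5xx, or any non-zero gRPC status, counts as an error). The span layout and the rule are illustrative assumptions, not KloudMate's documented classification.

```python
from collections import defaultdict


def is_error(span):
    """Assumed rule: HTTP spans fail on 5xx; RPC spans fail on a non-zero gRPC status."""
    attrs = span["attributes"]
    if "http.status_code" in attrs:
        return attrs["http.status_code"] >= 500
    return attrs.get("rpc.grpc.status_code", 0) != 0


def endpoint_error_rates(spans):
    """Return {(service, endpoint): error_rate} for every endpoint seen in the spans."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for span in spans:
        key = (span["service"], span["name"])
        totals[key] += 1
        if is_error(span):
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


# Hypothetical spans: one HTTP endpoint and one gRPC endpoint.
spans = [
    {"service": "checkout-api", "name": "POST /api/checkout", "attributes": {"http.status_code": 500}},
    {"service": "checkout-api", "name": "POST /api/checkout", "attributes": {"http.status_code": 200}},
    {"service": "cart-api", "name": "cart.CartService/AddItem", "attributes": {"rpc.grpc.status_code": 0}},
    {"service": "cart-api", "name": "cart.CartService/AddItem", "attributes": {"rpc.grpc.status_code": 14}},
    {"service": "cart-api", "name": "cart.CartService/AddItem", "attributes": {"rpc.grpc.status_code": 0}},
    {"service": "cart-api", "name": "cart.CartService/AddItem", "attributes": {"rpc.grpc.status_code": 0}},
]

rates = endpoint_error_rates(spans)
# Sort so the endpoint with the highest error rate comes first.
worst_first = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for (service, endpoint), rate in worst_first:
    print(f"{service:14} {endpoint:28} {rate:.0%}")
```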

Checkout request path (KloudMate · Service Map · topology built from traces)
Nodes: frontend-proxy (HTTP); checkout-api, cart-api, inventory-api, payments-api (services); postgres (database)
Edge labels: 210/s, 90/s, 1.4s, 180ms, 1.2s, 40ms

Inspect dependency flow with the live Service Map

Service Map is generated from tracing data, so topology stays tied to real traffic. Inspect a service node, inspect an edge, and follow the path that is carrying the incident.

  • See request rate and average latency on service nodes and dependency edges
  • Highlight erroring services and dependencies instead of reading a static architecture diagram
  • Open the source or target service page directly from the dependency you are investigating
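
As a rough illustration of how a topology can be derived from tracing data, the Python sketch below turns parent/child span links into service-to-service edges with call count, error count, and average latency. The span fields and the parent/child relationships in the sample are hypothetical; the durations reuse the example checkout trace. This is not KloudMate's implementation.

```python
from collections import defaultdict


def build_service_map(spans):
    """Derive dependency edges from spans whose parent belongs to a different service."""
    by_id = {span["span_id"]: span for span in spans}
    raw = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})

    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        # Skip root spans and calls that stay inside one service.
        if parent is None or parent["service"] == span["service"]:
            continue
        edge = raw[(parent["service"], span["service"])]
        edge["calls"] += 1
        edge["errors"] += 1 if span["error"] else 0
        edge["total_ms"] += span["duration_ms"]

    return {
        key: {
            "calls": value["calls"],
            "errors": value["errors"],
            "avg_ms": value["total_ms"] / value["calls"],
        }
        for key, value in raw.items()
    }


# Hypothetical span tree for one checkout request.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend-proxy", "duration_ms": 1840, "error": False},
    {"span_id": "b", "parent_id": "a", "service": "checkout-api", "duration_ms": 1790, "error": False},
    {"span_id": "c", "parent_id": "b", "service": "inventory-api", "duration_ms": 1450, "error": False},
    {"span_id": "d", "parent_id": "c", "service": "postgres", "duration_ms": 1180, "error": True},
    {"span_id": "e", "parent_id": "b", "service": "payments-api", "duration_ms": 168, "error": False},
]

service_map = build_service_map(spans)
for (source, target), stats in service_map.items():
    print(f"{source} -> {target}: {stats}")
```

Because every edge comes from an observed span pair, a dependency that receives no traffic simply does not appear, which is the practical difference from a static architecture diagram.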

Traces matching your query (KloudMate · Trace Explorer · trace search by span attributes)
Matches 1,284 · Errors 38 · p99 1.7s

  Request                   Trace    Status   Duration   Spans
  POST /api/checkout        4f9c2a   500      1.84s      27
  POST /api/checkout        a90c44   200      1.51s      26
  GET /api/cart             2d77f1   200      210ms      9
  GET /api/inventory/{id}   7b3e10   200      180ms      8
  GET /api/products         51bb20   200      96ms       6

Search traces by any span attribute

When you need one specific request, Trace Explorer searches every trace by span attributes like http.status_code, db.statement, or your own tags, then opens the full waterfall behind it.

  • Filter on any span attribute, not just service or status, and save the queries you rerun
  • Open the request waterfall to read span timing, errors, and the slowest path
  • See the entry point, services involved, span count, and total duration for a trace in one place
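
Conceptually, searching by span attribute means keeping every trace that has at least one span matching all of the requested key/value pairs. The Python sketch below shows that idea over an in-memory list. The span layout, the `tenant` tag, and the attribute values are hypothetical, and this is not Trace Explorer's query syntax.

```python
def span_matches(span, filters):
    """True when the span carries every requested attribute with the requested value."""
    attrs = span["attributes"]
    return all(attrs.get(key) == value for key, value in filters.items())


def search_traces(spans, filters):
    """Return IDs of traces with at least one matching span, in first-seen order."""
    found = []
    for span in spans:
        if span["trace_id"] not in found and span_matches(span, filters):
            found.append(span["trace_id"])
    return found


# Hypothetical spans; "tenant" stands in for a custom tag you add yourself.
spans = [
    {"trace_id": "4f9c2a", "attributes": {"http.method": "POST", "http.status_code": 500, "http.route": "/api/checkout"}},
    {"trace_id": "4f9c2a", "attributes": {"db.system": "postgresql", "db.statement": "SELECT items"}},
    {"trace_id": "a90c44", "attributes": {"http.method": "POST", "http.status_code": 200, "http.route": "/api/checkout"}},
    {"trace_id": "2d77f1", "attributes": {"http.method": "GET", "http.status_code": 200, "http.route": "/api/cart", "tenant": "acme"}},
]

failed_posts = search_traces(spans, {"http.method": "POST", "http.status_code": 500})
checkout_traces = search_traces(spans, {"http.route": "/api/checkout"})
acme_traces = search_traces(spans, {"tenant": "acme"})
print(failed_posts, checkout_traces, acme_traces)
```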

Scoped to the checkout trace (KloudMate · Logs · request logs for trace 4f9c2a)
Filters: service: checkout-api · severity ≥ warn · trace 4f9c2a

  12:04:00.812  INFO   frontend       POST /api/checkout started
  12:04:00.991  INFO   checkout-api   calling inventory-api.getStock
  12:04:01.224  WARN   inventory-api  slow query: SELECT items > 1000ms
  12:04:01.882  ERROR  inventory-api  timeout calling postgres after 1200ms
  12:04:01.903  WARN   checkout-api   inventory degraded, using cache fallback
  12:04:02.010  INFO   payments-api   charge authorized: $128.40

Move from a trace to request logs without losing context

When a slow request or error span needs more evidence, open the request logs already scoped to that trace. Filter by service, severity, or text and confirm the failure mode without copying IDs across tools.

  • Keep log evidence tied to the exact trace under investigation
  • Use service and severity filters to isolate noisy request streams quickly
  • Validate what happened with structured log metadata next to span and trace identifiers
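
The scoping described above amounts to filtering log records by trace identifier first, then by severity, service, or text. Here is a minimal Python sketch of that filter, assuming each log record carries `trace_id`, `severity`, `service`, and `message` fields. The field names, the severity ordering, and the record from trace a90c44 are assumptions for illustration; the other records reuse the example log lines.

```python
SEVERITY_ORDER = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def logs_for_trace(records, trace_id, min_severity="INFO", service=None, text=None):
    """Return records for one trace, optionally narrowed by severity, service, or text."""
    floor = SEVERITY_ORDER[min_severity]
    selected = []
    for record in records:
        if record["trace_id"] != trace_id:
            continue
        if SEVERITY_ORDER[record["severity"]] < floor:
            continue
        if service is not None and record["service"] != service:
            continue
        if text is not None and text.lower() not in record["message"].lower():
            continue
        selected.append(record)
    return selected


records = [
    {"trace_id": "4f9c2a", "severity": "INFO", "service": "frontend", "message": "POST /api/checkout started"},
    {"trace_id": "4f9c2a", "severity": "INFO", "service": "checkout-api", "message": "calling inventory-api.getStock"},
    {"trace_id": "4f9c2a", "severity": "WARN", "service": "inventory-api", "message": "slow query: SELECT items > 1000ms"},
    {"trace_id": "4f9c2a", "severity": "ERROR", "service": "inventory-api", "message": "timeout calling postgres after 1200ms"},
    {"trace_id": "4f9c2a", "severity": "WARN", "service": "checkout-api", "message": "inventory degraded, using cache fallback"},
    # Hypothetical record from another trace, to show that scoping excludes it.
    {"trace_id": "a90c44", "severity": "ERROR", "service": "checkout-api", "message": "unrelated failure"},
]

warnings = logs_for_trace(records, "4f9c2a", min_severity="WARN")
timeouts = logs_for_trace(records, "4f9c2a", text="timeout")
for record in warnings:
    print(record["severity"], record["service"], record["message"])
```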

KloudMate AI

Use KloudMate Assistant to shorten the path to the right trace

KloudMate Assistant works across metrics, logs, traces, profiles, and connected data sources. In APM workflows, it can help teams summarize a regression, surface likely bottlenecks, and point engineers toward the next useful trace or log search.

  • Summarize: Explain a service-level regression before engineers read every chart
  • Highlight: Call out the spans or dependencies most likely contributing to latency
  • Correlate: Connect traces, logs, and related telemetry with less manual query work

Open the product demo

Checkout latency regression (KloudMate Auto-RCA · Assistant · APM)

Q: Why did p99 latency on checkout jump after 12:00?

Assistant · likely cause
  • p99 rose from 240ms to 1.8s right after the inventory-api v8.4 deploy.
  • The added latency sits on postgres SELECT spans inside getStock.
  • Open the /checkout trace cluster and compare db.statement across versions.

Likely cause: inventory-api v8.4.0 deployed 12:01 · slow DB spans
Slowest span: postgres SELECT items 1.18s · 64% of trace time
Next step: Open trace 4f9c2a, filter spans by db.statement

Related Features

APM works best when traces, logs, alerts, and incident workflows stay connected.

Get started

From telemetry to root cause, in one platform.

Connect your OpenTelemetry pipeline, AWS integrations, or eBPF agent. Distributed tracing, log management, alerting, and AI-assisted investigation: unified, with predictable pricing.