Monitoring CloudQuery

Overview

Monitoring CloudQuery can be done in a number of ways:

Logging

CloudQuery uses structured logging (in plain-text and JSON formats). The logs can be analyzed with local tools such as jq and grep, or shipped to remote log aggregation tools such as Loki, Datadog, or any other log aggregator that supports structured logging.
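As a rough illustration of the local analysis described above, the sketch below filters error-level entries out of JSON-formatted log lines. The sample lines and their field names (`level`, `module`, `table`, `message`) are assumptions made for the example, not a guaranteed CloudQuery log schema; inspect your own log output before relying on them.

```shell
# Sample JSON log lines; the field names here are illustrative assumptions.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
{"level":"info","module":"cli","message":"sync started"}
{"level":"error","module":"aws-src","table":"aws_s3_buckets","message":"access denied"}
{"level":"info","module":"cli","message":"sync finished"}
EOF

# Text match with grep: keep only error-level entries.
errors=$(grep '"level":"error"' "$log_file")
error_count=$(grep -c '"level":"error"' "$log_file")
echo "errors found: $error_count"
echo "$errors"

# With jq installed, the same filter can parse the JSON instead of matching text:
#   jq -c 'select(.level == "error")' "$log_file"
```

The grep version depends on the exact spacing of the serialized JSON, so the jq form is the more robust choice when jq is available.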

OpenTelemetry (Preview)

ELT workloads can be long-running, and it is sometimes necessary to understand which calls take the most time so that you can optimize them on the integration side, ignore them, or split them into a separate workload. CloudQuery supports OpenTelemetry traces, metrics, and logs out of the box, and they can be enabled via configuration.

To collect OpenTelemetry data, you need a backend that supports the OpenTelemetry protocol. For example, you can use Jaeger to visualize and analyze traces.

To start Jaeger locally you can use Docker:

docker run -d \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:1.58

Then set the OpenTelemetry endpoint in the source spec:

kind: source
spec:
  name: "aws"
  path: "cloudquery/aws"
  registry: "cloudquery"
  version: "v30.1.0"
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"]
  otel_endpoint: "localhost:4318"
  otel_endpoint_insecure: true # development only, e.g. when running Jaeger locally
  spec:

After that, open http://localhost:16686 to see the traces in the Jaeger UI.

In production, it is common to use an OpenTelemetry collector that runs locally or as a gateway to batch the traces and forward them to the final backend. This helps with performance and fault tolerance, and it decouples CloudQuery from the tracing backend in case that backend changes.
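The collector setup described above might look roughly like the following configuration. This is a minimal sketch, not a recommended production setup: it assumes the collector receives OTLP over HTTP on port 4318 (the port used in the source spec earlier) and forwards traces to a hypothetical backend at `otel-backend.example.com`. Replace the exporter with whatever your backend requires.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318" # CloudQuery's otel_endpoint would point here

processors:
  batch: {} # batch telemetry before export, using default settings

exporters:
  otlphttp:
    endpoint: "https://otel-backend.example.com:4318" # hypothetical backend URL

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

With a layout like this, switching tracing backends only requires changing the `exporters` section; the CloudQuery source spec keeps pointing at the collector.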