Managing Incremental Tables

Incremental tables are tables that fetch only the data that changed since the last sync. Tables that support this mode are marked as “incremental” in integration table documentation. Learn more about sync modes and schema migrations.

When a sync runs on an incremental table, the table will first fetch the last known cursor state from the state backend, then resume syncing from that point. Incremental tables guarantee at-least-once delivery, which means that there should never be gaps in the data as a result of the cursor state being used, but there may be duplicates. If the destination uses overwrite or overwrite-delete-stale write mode, these duplicates will be handled automatically. But if the destination uses append mode, care will need to be taken to either exclude the duplicates at query time or to run a deduplication process on the destination.

In order to resume from a previous position, incremental tables store some state, known as the cursor. When using the CloudQuery CLI, the cursor state is stored in a backend. Any destination integration can be used as a backend. This state destination is often the same destination data is being written to for the sync, but it doesn’t need to be. In some destinations that support only append mode, each state update will generate a new row (with a new version) in the state table and may need to be cleaned up periodically. Destinations supporting overwrite mode don’t have this issue.

Example Configuration

Add a backend_options property to the source configuration to enable the use of a state backend for incremental tables. It accepts the following properties:

table_name (string, required): The name of the table to store the cursor state in.
connection (string, required): The connection string to the destination integration. Note that this is gRPC connection string, not the connection string to the destination database itself. See below for more details.

For example, the following configuration shows how to enable the use of a PostgreSQL destination as the state backend for the AWS integration:

kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v33.8.1"
  tables: ["aws_cloudtrail_events"]
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_state_aws"
    connection: "@@plugins.postgresql.connection"
  spec:
    # AWS integration specific configuration
    # ...
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.14.1"
  write_mode: "overwrite-delete-stale"
  spec:
    connection_string: "${CONNECTION_STRING}"

A special @@plugins.<plugin-name>.<property-name> syntax is used to reference a property from another integration. In this case, the connection property from the PostgreSQL integration is being referenced. Note that here the connection refers to the gRPC connection to the destination integration, automatically inferred after the destination integration is started, not the connection string to the destination database itself.

Sometimes it may be useful to use a different state backend than the destination. For example, if the destination is a data warehouse that only supports append mode, a separate database can be used as the state backend. For example, the following configuration writes AWS data to BigQuery, but stores state in a local SQLite database:

kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v33.8.1"
  tables: ["aws_cloudtrail_events"]
  destinations: ["bigquery"]
  backend_options:
    table_name: "cq_state_aws"
    connection: "@@plugins.sqlite.connection"
  spec:
    # AWS integration specific configuration
    # ...
---
kind: destination
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v4.7.2"
  write_mode: "append"
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
---
kind: destination
spec:
  name: sqlite
  path: cloudquery/sqlite
  registry: cloudquery
  version: "v2.14.1"
  spec:
    connection_string: ./db.sqlite

Performance Tuning Managing Versions