
Transformer Integrations

Transformer integrations allow you to modify, filter, and transform data as it flows from source integrations to destination integrations. They provide powerful data manipulation capabilities without requiring custom code.

How Transformer Integrations Work

  1. Data Interception: Receive data from source integrations
  2. Transformation: Apply configured transformations to the data
  3. Data Delivery: Send transformed data to destination integrations
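
The three steps above can be sketched as a single configuration file. This is a minimal sketch, not a verified setup: the source (cloudquery/aws), the destination (cloudquery/postgresql), the table name, the connection string variable, and the source and destination version placeholders are illustrative assumptions. Attaching the transformer through a transformers list on the destination is also an assumption here, so check the destination spec reference for the exact field:

kind: source
spec:
  name: aws                          # assumed source, for illustration only
  path: cloudquery/aws
  registry: cloudquery
  version: "VERSION_SOURCE_AWS"      # placeholder, not a real version
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"]
---
kind: transformer
spec:
  name: basic
  path: cloudquery/basic
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "cq_{{.OldName}}"
---
kind: destination
spec:
  name: postgresql                   # assumed destination, for illustration only
  path: cloudquery/postgresql
  registry: cloudquery
  version: "VERSION_DESTINATION_POSTGRESQL"   # placeholder, not a real version
  transformers: ["basic"]            # assumed field: names of transformers to apply
  spec:
    connection_string: "${PG_CONNECTION_STRING}"   # hypothetical environment variable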

Available Transformer Integrations

For a complete list of available transformer integrations with detailed documentation, visit the CloudQuery Hub.

Basic Transformations

  • Basic: Rename tables, add prefixes, modify column names
  • Filter: Filter rows based on conditions
  • Column: Add, remove, or modify columns

Advanced Transformations

  • Data Quality: Validate and clean data
  • Aggregation: Summarize and aggregate data
  • Custom Logic: Apply custom business rules

Configuration

Transformer integrations are configured in your CloudQuery configuration file. Each transformer requires:

  • Name: Unique identifier for the transformer
  • Path: Plugin path (e.g., cloudquery/basic)
  • Version: Plugin version to use
  • Transformations: List of transformations to apply, set under the integration's nested spec

Example configuration:

kind: transformer
spec:
  name: basic
  path: cloudquery/basic
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "cq_{{.OldName}}"

Transformer Spec Reference

This section describes all available options for the transformer integration spec object.

name

(string, required)

Name of the integration. If you have multiple transformer integrations, this must be unique.

The name field may be used to uniquely identify a particular transformer configuration. For example, if you have two configurations of the basic integration that transform a source table differently for two destination databases, you can name one basic-1 and the other basic-2. In this case, the path option below must be used to specify the download path for the integration.
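
As a sketch of the case described above, two configurations can point at the same integration through path while carrying distinct names. The db1_ and db2_ prefixes are made-up values for illustration:

kind: transformer
spec:
  name: basic-1                      # unique name for the first configuration
  path: cloudquery/basic             # path identifies the integration to download
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "db1_{{.OldName}}"   # hypothetical prefix
---
kind: transformer
spec:
  name: basic-2                      # unique name for the second configuration
  path: cloudquery/basic             # same integration, different settings
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "db2_{{.OldName}}"   # hypothetical prefix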

registry

(string, optional, default: cloudquery, available: github, cloudquery, local, grpc, docker)

  • cloudquery: CloudQuery will look for and download the integration from the official CloudQuery registry, and then execute it.
  • local: CloudQuery will execute the integration from a local path.
  • grpc: Mostly useful in debug mode, when the integration is already running in a different terminal. CloudQuery will connect to the gRPC integration server directly without spawning the process.

path

(string, required)

Configures how to retrieve the integration. The contents depend on the value of registry (cloudquery by default).

  • For integrations hosted on GitHub, path should be of the form "<org>/<repository>". For official integrations, it should be cloudquery/<integration-name>.
  • For integrations that are located in the local filesystem, path should be a filesystem path to the integration binary.
  • To connect to a running integration via grpc (mostly useful for debugging), path should be the host and port of the integration (e.g. localhost:7777).
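
The local and grpc cases above might look like the following two alternatives. The transformer names and the binary path are hypothetical; localhost:7777 is the example address from the list above. The version placeholder is kept only because this reference lists version as required:

# Alternative 1: run the integration from a local binary
kind: transformer
spec:
  name: basic-local                  # hypothetical name
  registry: local
  path: ./bin/cq-transformer-basic   # hypothetical path to the integration binary
  version: "VERSION_TRANSFORMER_BASIC"
---
# Alternative 2: connect to an integration that is already running
kind: transformer
spec:
  name: basic-debug                  # hypothetical name
  registry: grpc
  path: localhost:7777               # host and port of the running integration
  version: "VERSION_TRANSFORMER_BASIC"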

version

(string, required)

version must be a valid SemVer, e.g. vMajor.Minor.Patch. You can find all official integration versions on our GitHub releases page. For community integrations, you can find them in the relevant community repository.

spec

(object, optional)

Plugin-specific configuration. Visit the transformers documentation for more information.

Common Use Cases

Table Naming

Rename tables to follow your organization’s naming conventions:

transformations:
  - kind: change_table_names
    tables: ["aws_*"]
    new_table_name_template: "cloud_aws_{{.OldName}}"

Column Modifications

Rename columns, for example to add a descriptive prefix:

transformations:
  - kind: change_column_names
    tables: ["aws_s3_buckets"]
    columns:
      - old_name: "name"
        new_name: "bucket_name"

Data Filtering

Filter out sensitive or unnecessary data:

transformations:
  - kind: filter_rows
    tables: ["aws_s3_buckets"]
    conditions:
      - column: "name"
        operator: "not_contains"
        value: "sensitive"

Performance Considerations

  • Transformation Order: Order matters when applying multiple transformations
  • Memory Usage: Complex transformations may increase memory usage
  • Processing Time: Transformations add processing overhead to syncs
  • Batch Size: Consider adjusting batch sizes for transformed data
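
To illustrate why order can matter, the sketch below chains two transformations. It assumes that transformations run in the order listed and that later entries see the names produced by earlier ones. Neither assumption is confirmed here, so verify both against the documentation of the transformer you use. The kinds and fields are the ones used in the examples on this page:

transformations:
  # 1. Rename the column while the table still has its original name.
  - kind: change_column_names
    tables: ["aws_s3_buckets"]
    columns:
      - old_name: "name"
        new_name: "bucket_name"
  # 2. Then prefix the table name. Under the stated assumptions, swapping the
  #    two entries would require step 1 to match the already renamed table.
  - kind: change_table_names
    tables: ["aws_*"]
    new_table_name_template: "cloud_aws_{{.OldName}}"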

Creating Custom Transformers

Need a transformation that doesn’t exist? Learn how to create your own transformer integration.