Transformer Integrations
Transformer integrations allow you to modify, filter, and transform data as it flows from source integrations to destination integrations. They provide powerful data manipulation capabilities without requiring custom code.
How Transformer Integrations Work
- Data Interception: Receive data from source integrations
- Transformation: Apply configured transformations to the data
- Data Delivery: Send transformed data to destination integrations
Available Transformer Integrations
For a complete list of available transformer integrations with detailed documentation, visit the CloudQuery Hub.
Popular Transformer Categories
Basic Transformations
- Basic: Rename tables, add prefixes, modify column names
- Filter: Filter rows based on conditions
- Column: Add, remove, or modify columns
Advanced Transformations
- Data Quality: Validate and clean data
- Aggregation: Summarize and aggregate data
- Custom Logic: Apply custom business rules
Configuration
Transformer integrations are configured in your CloudQuery configuration file. Each transformer requires:
- Name: Unique identifier for the transformer
- Path: Plugin path (e.g., cloudquery/basic)
- Version: Plugin version to use
- Transformations: List of transformations to apply
Example configuration:
kind: transformer
spec:
  name: basic
  path: cloudquery/basic
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "cq_{{.OldName}}"
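The example above defines the transformer but does not show how it is attached to a destination. The sketch below assumes that a destination spec accepts a transformers list holding transformer names; that field, the postgresql destination, and the version placeholder are assumptions for illustration and are not confirmed by this page, so check the destination spec reference before using it.

```yaml
kind: destination
spec:
  name: postgresql                            # hypothetical destination, for illustration only
  path: cloudquery/postgresql                 # assumed plugin path
  registry: cloudquery
  version: "VERSION_DESTINATION_POSTGRESQL"   # placeholder, not a real version
  # Assumption: names listed here must match the `name` of a transformer spec.
  transformers: ["basic"]
```

Destination-specific settings (such as connection details) are omitted here.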
Transformer Spec Reference
This section describes all the available options for the transformer integration spec object.
name
(string, required)
Name of the integration. If you have multiple transformer integrations, this must be unique.
The name field may be used to uniquely identify a particular transformer configuration. For example, if you have two configs for the basic integration for transforming a source table differently in each of two different destination databases, one may be named basic-1 and the other basic-2. In this case, the path option below must be used to specify the download path for the integration.
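A minimal sketch of the two-configuration case described above: both specs point at the same integration through path, and only name and the transformation options differ. The db1_ and db2_ prefixes are hypothetical values chosen for illustration.

```yaml
kind: transformer
spec:
  name: basic-1                # unique name for the first configuration
  path: cloudquery/basic       # same integration, so the same path
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "db1_{{.OldName}}"   # hypothetical prefix
---
kind: transformer
spec:
  name: basic-2                # unique name for the second configuration
  path: cloudquery/basic
  registry: cloudquery
  version: "VERSION_TRANSFORMER_BASIC"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "db2_{{.OldName}}"   # hypothetical prefix
```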
registry
(string, optional, default: cloudquery, available: github, cloudquery, local, grpc, docker)
- cloudquery: CloudQuery will look for and download the integration from the official CloudQuery registry, and then execute it.
- local: CloudQuery will execute the integration from a local path.
- grpc: Mostly useful in debug mode, when the integration is already running in a different terminal. CloudQuery will connect to the gRPC integration server directly without spawning the process.
path
(string, required)
Configures how to retrieve the integration. The contents depend on the value of registry (cloudquery by default).
- For integrations hosted on GitHub, path should be of the form "<org>/<repository>". For official integrations, it should be cloudquery/<integration-name>.
- For integrations that are located in the local filesystem, path should be a filesystem path to the integration binary.
- To connect to a running integration via grpc (mostly useful for debugging), path should be the host-port of the integration (e.g. localhost:7777).
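The registry and path combinations for local and gRPC use could look like the sketch below. The names and the binary path are hypothetical. This page does not say whether version is consulted for these registries; it is included only because the reference lists it as required.

```yaml
# Run a transformer binary from the local filesystem.
kind: transformer
spec:
  name: basic-local                      # hypothetical name
  registry: local
  path: ./bin/transformer-basic          # hypothetical path to the integration binary
  version: "VERSION_TRANSFORMER_BASIC"   # placeholder; listed as required in this reference
---
# Attach to a transformer that is already running, e.g. in another terminal.
kind: transformer
spec:
  name: basic-debug                      # hypothetical name
  registry: grpc
  path: localhost:7777                   # host-port of the running integration
  version: "VERSION_TRANSFORMER_BASIC"   # placeholder
```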
version
(string, required)
version must be a valid SemVer, e.g. vMajor.Minor.Patch. You can find all official integration versions under our GitHub releases page, and for community integrations you can find them in the relevant community repository.
spec
(object, optional)
Plugin-specific configuration. Visit the transformers documentation for more information.
Common Use Cases
Table Naming
Rename tables to follow your organization’s naming conventions:
transformations:
  - kind: change_table_names
    tables: ["aws_*"]
    new_table_name_template: "cloud_aws_{{.OldName}}"
Column Modifications
Rename columns, for example to add a prefix that makes a generic name more specific:
transformations:
  - kind: change_column_names
    tables: ["aws_s3_buckets"]
    columns:
      - old_name: "name"
        new_name: "bucket_name"
Data Filtering
Filter out sensitive or unnecessary data:
transformations:
  - kind: filter_rows
    tables: ["aws_s3_buckets"]
    conditions:
      - column: "name"
        operator: "not_contains"
        value: "sensitive"
Performance Considerations
- Transformation Order: Order matters when applying multiple transformations
- Memory Usage: Complex transformations may increase memory usage
- Processing Time: Transformations add processing overhead to syncs
- Batch Size: Consider adjusting batch sizes for transformed data
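To illustrate why transformation order matters, the sketch below chains two of the transformations from the use cases above. It assumes that transformations run in the order listed and that each one sees the output of the previous one; this page does not confirm either point, so verify it against the transformer's documentation. Under that assumption, the filter has to refer to the column by its new name, because the rename has already been applied.

```yaml
transformations:
  # Step 1: rename the column (same options as the Column Modifications example).
  - kind: change_column_names
    tables: ["aws_s3_buckets"]
    columns:
      - old_name: "name"
        new_name: "bucket_name"
  # Step 2: filter on the renamed column. Assumes step 1 has already run,
  # so the column is now called bucket_name rather than name.
  - kind: filter_rows
    tables: ["aws_s3_buckets"]
    conditions:
      - column: "bucket_name"
        operator: "not_contains"
        value: "sensitive"
```

If the two entries were swapped, the filter would have to use the original column name instead.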
Creating Custom Transformers
Need a transformation that doesn’t exist? Learn how to create your own transformer integration.