Skip to main content

Configuration

Setup#

Make sure you read our Getting Started section first.

Installation#

CloudQuery History needs a PostgreSQL (version >= 11) instance with TimescaleDB extension (version >= 2.6) installed or a TimescaleDB instance (version >= 12). You can either spawn a local one (usually good for development and local testing) or connect to an existing one. For more install options of TimescaleDB see their guide.

For local, you can use the following docker command:

docker run -d --name localTimeScaleDB -p 5432:5432 -e POSTGRES_PASSWORD=pass timescale/timescaledb:latest-pg12

Configuration#

DSN Format#

Using TimescaleDB, the type parameter in connection block of config.hcl should be set to tsdb.

cloudquery {    connection {        type = "tsdb"        username = "user"        password = "pass"        host = "localhost"        port = 5432        database = "localTimeScaleDB"        sslmode = "disable"    }}

Alternatively if using the URI DSN format, you can start it with tsdb://, as such:

cloudquery {    connection {        dsn = "tsdb://user:[email protected]:5432/localTimeScaleDB?sslmode=disable"    }}

This way CloudQuery knows you're pointing to a TimescaleDB instance in the connection block.

History Settings#

The history block in our config.hcl allows us to configure our history preferences. Currently we have the following options:

  • Retention [Optional]: defines the retention policy of your data and how long it should exist in days. The default is 7 days.
  • Interval [Optional]: Defines how history chunks are split by time, defaults to one chunk per 24 hours. See TimescaleDB docs for more info on this behavior.
  • Truncation [Optional]: truncates fetch time by hour, for example if we fetch with TimeTruncation = 1 at 11:25 the fetch date will truncate to 11:00, defaults to 24 hours, which means one set of fetch data per day. Data fetched on the same truncation date will be replaced.
cloudquery {  # history block configuration makes cloudquery run in history mode  history {    // Save data retention for 7 days    retention = 7    // Truncate our fetch by 6 hours per fetch    truncation = 6    // Tell Timescale to split our chunks in a daily interval (24 hours)    interval = 24  }  # ... continuation of cloudquery block (connection, providers, etc')}

Querying#

CloudQuery will create a new schema called history which holds full historical data for each table. In our public schema you will have views that fetch the latest fetch data of each table. This allows the previous queries before enabling history mode to work out-of-the-box.


-- Will select all historical fetches of aws_iam_usersSELECT * FROM history.aws_iam_users
-- Will select only the latest fetch of aws_iam_usersSELECT * from aws_iam_users

Examples#

Select all S3 buckets that were fetched yesterday and don't exist today, i.e find deleted buckets.

SELECT * FROM history.aws_s3_buckets WHERE cq_fetch_date BETWEEN now() - interval '2 day' and now() - interval '1 day'EXCEPTSELECT * FROM aws_s3_buckets