Understanding Platform Views

Learn how the normalization layer of CloudQuery Platform works under the hood.

Cloud Assets View

When a sync runs on CloudQuery Platform, the sync is configured (through a transformer) to prefix every output table name with `raw_`. As the sync progresses, a post-load transformer detects when each table has finished syncing and starts copying the latest data into two tables: `cloud_assets_historical` and `cloud_assets_incremental`.
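
The naming and routing steps above can be sketched in a few lines of Python. This is a hypothetical model, not CloudQuery code: the function names and the `is_incremental` flag are illustrative assumptions, and the choice of destination follows the split between non-incremental and incremental tables described in this section.

```python
# Hypothetical sketch of the naming and routing step (not Platform code).
# RAW_PREFIX and the two destination tables come from the text; the function
# names and the is_incremental flag are illustrative assumptions.
from typing import Tuple

RAW_PREFIX = "raw_"


def raw_table_name(table: str) -> str:
    """Name the sync writes to, e.g. aws_ec2_instances -> raw_aws_ec2_instances."""
    return RAW_PREFIX + table


def target_table(is_incremental: bool) -> str:
    """Destination for a table's data once it has finished syncing."""
    return "cloud_assets_incremental" if is_incremental else "cloud_assets_historical"


def on_table_complete(table: str, is_incremental: bool) -> Tuple[str, str]:
    """Return (source, destination) for the post-load copy of one table."""
    return raw_table_name(table), target_table(is_incremental)


print(on_table_complete("aws_ec2_instances", is_incremental=False))
# ('raw_aws_ec2_instances', 'cloud_assets_historical')
```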

`cloud_assets_historical` stores data for all non-incremental tables. It uses a ClickHouse `ORDER BY` clause that allows stale records to be removed after a period of time, which it achieves by including `_cq_sync_group_id` in the clause.

`cloud_assets_incremental` stores data for all incremental tables. Because incremental tables only ever add new data, its `ORDER BY` clause does not include `_cq_sync_group_id`.
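
To see why including `_cq_sync_group_id` in the sorting key matters, the sketch below models deduplication as "keep the most recently inserted row for each distinct `ORDER BY` key", which is how a ClickHouse `ReplacingMergeTree` table behaves when read with `FINAL` and no version column. Whether the Platform tables use that engine is an assumption, and the `table_name` and `resource_id` columns are hypothetical.

```python
# Model of deduplication on an ORDER BY key, assuming a ReplacingMergeTree-like
# table read with FINAL (an assumption): rows with identical sorting keys
# collapse to the most recently inserted one. table_name and resource_id are
# hypothetical columns; _cq_sync_group_id comes from the text.
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def dedup_final(rows: Iterable[Row], order_by: Tuple[str, ...]) -> List[Row]:
    """Keep the last inserted row for each distinct ORDER BY key."""
    latest: Dict[tuple, Row] = {}
    for row in rows:  # rows arrive in insertion order
        latest[tuple(row[col] for col in order_by)] = row
    return list(latest.values())


HISTORICAL_KEY = ("table_name", "resource_id", "_cq_sync_group_id")
INCREMENTAL_KEY = ("table_name", "resource_id")

rows = [
    {"table_name": "t", "resource_id": "a", "_cq_sync_group_id": 1, "state": "running"},
    {"table_name": "t", "resource_id": "a", "_cq_sync_group_id": 2, "state": "stopped"},
]

# Historical key: each sync group keeps its own copy of "a", so rows from an
# older, stale sync group stay distinguishable from the current snapshot.
print(len(dedup_final(rows, HISTORICAL_KEY)))  # 2
# Incremental key: the newer row replaces the older one.
print(dedup_final(rows, INCREMENTAL_KEY)[0]["state"])  # stopped
```

In this model, the sync group in the key is what keeps each snapshot's rows separate, so rows belonging to a stale sync group can be identified and removed as a unit.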

Finally, a view named `cloud_assets` provides a single unified view over these two underlying tables, creating a cross-cloud asset inventory. The view defines how to identify the latest snapshot and ensures deduplication by adding the `FINAL` modifier to all queries. The view is updated only once a table's data for a given `_cq_sync_group_id` is complete, which guarantees consistency at the level of individual tables.

Diagram illustrating how the cloud assets view is built.
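
The behavior of the `cloud_assets` view can be modeled in memory as a union of the two tables in which historical rows are visible only for the newest sync group whose table has completed. This is a sketch under stated assumptions: the "highest completed group id wins" rule, the `completed` registry, and the column names other than `_cq_sync_group_id` are hypothetical, and `FINAL` is approximated by keeping the last row per key.

```python
# Hypothetical in-memory model of the cloud_assets view (not Platform code).
# Assumptions: the latest snapshot is the highest completed _cq_sync_group_id
# per table, and FINAL is approximated by keeping the last row per key.
# table_name and resource_id are illustrative column names.
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
KEY = ("table_name", "resource_id")


def final_dedup(rows: List[Row]) -> List[Row]:
    """Keep the last inserted row for each key, like reading with FINAL."""
    latest: Dict[tuple, Row] = {}
    for row in rows:
        latest[tuple(row[col] for col in KEY)] = row
    return list(latest.values())


def latest_complete_groups(completed: Set[Tuple[str, int]]) -> Dict[str, int]:
    """Map each table to its newest completed _cq_sync_group_id."""
    latest: Dict[str, int] = {}
    for table, group in completed:
        latest[table] = max(group, latest.get(table, group))
    return latest


def cloud_assets(historical: List[Row], incremental: List[Row],
                 completed: Set[Tuple[str, int]]) -> List[Row]:
    """Union both tables, exposing only complete historical snapshots."""
    latest = latest_complete_groups(completed)
    snapshot = [r for r in historical
                if latest.get(r["table_name"]) == r["_cq_sync_group_id"]]
    return final_dedup(snapshot) + final_dedup(incremental)


historical = [
    {"table_name": "t", "resource_id": "a", "_cq_sync_group_id": 1},
    {"table_name": "t", "resource_id": "b", "_cq_sync_group_id": 2},  # group 2 still syncing
]
incremental = [{"table_name": "u", "resource_id": "c"}]
result = cloud_assets(historical, incremental, completed={("t", 1)})
print([r["resource_id"] for r in result])  # ['a', 'c']
```

Rows from a sync group that is still in progress (`b` above) stay invisible until that group is recorded as complete, which mirrors the per-table consistency guarantee described above.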

Table Views

Table views are simpler. As mentioned above, syncs first write to tables prefixed with `raw_`. A view with the original table name is created for every table, and it is defined so that it always points to the latest complete snapshot of the data. This keeps the data consistent during syncs and lets each view switch to a new snapshot atomically, while still allowing records to be appended efficiently into ClickHouse.

Diagram illustrating how table views are defined.
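
A table view can be modeled as a pointer over an append-only raw table: rows keep being appended, and the view only moves to a new `_cq_sync_group_id` once that snapshot is complete. The `TableView` class and its methods below are hypothetical and stand in for a ClickHouse view definition; they only illustrate the switching behavior.

```python
# Hypothetical model of a table view (not Platform code): raw rows are only
# appended, and the view exposes just the latest complete _cq_sync_group_id.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableView:
    name: str  # original table name; raw_rows stands in for raw_<name>
    raw_rows: List[Dict[str, object]] = field(default_factory=list)
    current_group: Optional[int] = None  # latest complete sync group

    def append(self, row: Dict[str, object]) -> None:
        """A sync appends rows; readers do not see them yet."""
        self.raw_rows.append(row)

    def mark_complete(self, group: int) -> None:
        """Switch the view to the new snapshot with a single assignment."""
        self.current_group = group

    def select(self) -> List[Dict[str, object]]:
        """What a query against the view returns."""
        return [r for r in self.raw_rows
                if r["_cq_sync_group_id"] == self.current_group]


view = TableView("aws_ec2_instances")
view.append({"_cq_sync_group_id": 1, "id": "i-1"})
view.mark_complete(1)
view.append({"_cq_sync_group_id": 2, "id": "i-2"})  # sync 2 in progress
print([r["id"] for r in view.select()])  # ['i-1']
view.mark_complete(2)
print([r["id"] for r in view.select()])  # ['i-2']
```

In this sketch, completing a snapshot only moves the pointer, so nothing already written has to be rewritten; readers see either the old snapshot or the new one, never a mix.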
