Docker
It is possible to use CloudQuery in an isolated container. You can pull the relevant image via docker pull ghcr.io/cloudquery/cloudquery:latest
.
Configuration
Create a Config File on the Host Machine
CloudQuery uses YAML files as the primary means of configuration. A simple example config.yml
file would look like:
kind: source
spec:
name: aws
path: cloudquery/aws
registry: cloudquery
version: "v31.1.0"
tables: ["aws_ec2_*"]
destinations: ["postgresql"]
---
kind: destination
spec:
name: postgresql
path: cloudquery/postgresql
registry: cloudquery
version: "v8.7.7"
spec:
connection_string: "postgres://postgres:pass@host.docker.internal:5432/postgres?sslmode=disable"
Run the Container
Downloading integrations requires users to be authenticated, normally this means running cloudquery login
but that is not doable in a CI environment or inside of a docker build process. The recommended way to handle this is to use an API key. More information on generating an API Key can be found here
For the CloudQuery docker container to use this configuration file you will need to mount the volume to the container like so:
docker run --pull=always \
-v <ABSOLUTE_PATH_TO_CONFIG_FILE>:/config.yml \
# set any env variable with -e <ENV_VAR_NAME>=<ENV_VAR_VALUE>
-e CLOUDQUERY_API_KEY=<REPLACE_WITH_CQ_API_KEY>
ghcr.io/cloudquery/cloudquery:latest \
sync /config.yml
As with running any cloudquery
command on your CLI you can override the configuration with the optional flags with the docker container. You will also need to make sure you load any ENV variables for source and destination integrations, such as your AWS_*
keys etc.
If you split the configuration between multiple files, you can mount the directory containing them, instead of just the config.yml
file.
If you are running Docker on an ARM Apple device and you see a segmentation fault when running the container like so qemu: uncaught target signal 11 (Segmentation fault) - core dumped
; please make sure you are running the latest Docker for Mac release.
Caching
Due to the way cloudquery
is designed it downloads all the components to interact with source and destination integrations. This means that with a docker container it runs the download step each time, as the local cache is lost between executions. To avoid this we recommend mounting a volume to cache the data and configuring cloudquery
to use this via the --cq-dir
optional flag. An example of this would be:
docker run --pull=always \
-v <ABSOLUTE_PATH_TO_CONFIG_FILE>:/config.yml \
-v <ABSOLUTE_PATH_TO_CACHE_DIRECTORY>:/cache/.cq \
# set any env variable with -e <ENV_VAR_NAME>=<ENV_VAR_VALUE>
ghcr.io/cloudquery/cloudquery:latest \
sync /config.yml --cq-dir /cache/.cq
Depending on your operating system, the built components maybe different between your local system and the container. To avoid the different please use a separate cache directory for the container than a local instance of cloudquery
.