Deploying CloudQuery using Kubernetes CronJobs
In this tutorial we will set up a Kubernetes CronJob to run CloudQuery Sync on a regular schedule. Due to its flexibility and standardization across cloud providers, Kubernetes is commonly used by DevOps & Platform Engineers when deploying workloads and microservices.
Prerequisites
- A CloudQuery API key. More information on generating an API Key can be found here
- A Kubernetes Cluster
- A Postgres Database server
- Credentials (with Read privileges) for a supported API - in the example we'll use a DigitalOcean account
Step 1: Create a Kubernetes Secret for your API Keys
To keep our credentials separate from our manifests, we need to create a secret that we'll use to supply our environment variables We can do this in a single command:
kubectl create secret generic cloudquery-secret \
--from-literal=CLOUDQUERY_API_KEY=<your_cloudquery_api_key> \
--from-literal=DIGITALOCEAN_TOKEN=<your_token> \
--from-literal=SPACES_ACCESS_KEY_ID=<your_access_key> \
--from-literal=SPACES_SECRET_ACCESS_KEY=<your_secret_key> \
--from-literal=PG_CONNECTION_STR=<your_postgres_connection_string>
Step 2: Create the CronJob Manifest
A CronJob is a Kubernetes object that allows you to run a job on a schedule. It's similar to a Kubernetes Workload, except that it is intended for Jobs (short-lived containers) that conduct a task and then shutdown, instead of Pods (long-lived containers intended for services).
The CronJob manifest has two key elements, the jobTemplate
, and the schedule
.
apiVersion: batch/v1
kind: CronJob
metadata:
name: cloudquery
labels:
app: cloudquery
spec:
schedule: "0 0 * * *"
jobTemplate: {}
The Schedule expression follows the same format as crontab; that is a space separated list as follows minute hour day(of month) month day(of week)
.
Which in the above example would be midnight (i.e. 00:00) every day.
To learn more about cron schedule expressions, check out https://crontab.guru/ (opens in a new tab) or the Kubernetes documentation https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/ (opens in a new tab)
The jobTemplate
describes what to run, how to run it, and any volumes that it needs.
As this follows the same structure as a Kubernetes Workload, we won't go too deep into the details here.
apiVersion: batch/v1
kind: CronJob
metadata:
name: cloudquery
labels:
app: cloudquery
spec:
schedule: "0 0 * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: cloudquery
image: ghcr.io/cloudquery/cloudquery:latest
imagePullPolicy: IfNotPresent
args: ["sync", "/config/config.yml", "--log-console", "--log-format", "json"]
envFrom:
- secretRef:
name: cloudquery-secret
volumeMounts:
- name: config
mountPath: "/config"
readOnly: true
restartPolicy: Never
volumes:
- name: config
configMap:
name: cloudquery-config
items:
- key: "config"
path: "config.yml"
Looking at this completed manifest, there are three key elements to pay attention to: args
, envFrom
, volumes
In the args
, you'll see the arguments to be passed to CloudQuery, in envFrom
you're instructing Kubernetes to use the secret you created earlier, and in volumes
you're telling Kubernetes where to find the configuration file.
Step 3: Defining the ConfigMap to hold the CloudQuery Config
When using Kubernetes configuration files are commonly stored on a special type of object called a ConfigMap. ConfigMaps enable you to define collections of text files that can be mounted as directories by Pods and Jobs.
A ConfigMap definition is relatively simple, containing a data
object where files are defined as key-value pairs.
apiVersion: v1
kind: ConfigMap
metadata:
name: cloudquery-config
data: {}
In this case, you want to define a ConfigMap to store a CloudQuery configuration file, that defines a Source integration and a Destination integration.
apiVersion: v1
kind: ConfigMap
metadata:
name: cloudquery-config
data:
config: |
kind: source
spec:
# Source spec section
name: digitalocean
path: cloudquery/digitalocean
registry: cloudquery
version: "v6.5.1"
tables:
- "digitalocean_droplets"
- "digitalocean_databases"
- "digitalocean_accounts"
- "digitalocean_storage_volumes"
- "digitalocean_floating_ips"
- "digitalocean_firewalls"
- "digitalocean_load_balancers"
- "digitalocean_billing_history"
destinations: ["postgresql"]
---
kind: destination
spec:
name: "postgresql"
path: "cloudquery/postgresql"
registry: "cloudquery"
version: "v8.6.5"
spec:
connection_string: ${PG_CONNECTION_STR}
In this example we're using the DigitalOcean source integration and the Postgres destination integration. But you can find out more about building configuration files and integrations here.
Step 4: Apply the manifest
Apply the configuration from a terminal using kubectl apply
kubectl apply -f cronjob.yaml -f configmap.yaml
Step 5: Query the data
You can manually trigger the cronjob to run early using:
kubectl create job --from=cronjob/<name of cronjob> <name of job>
After which you can query your SQL database to see the results.
Summary
In this tutorial, we have seen how to set up CloudQuery on Kubernetes and sync data to a PostgreSQL database. If you have any questions, check out the video above or join our Community (opens in a new tab) to chat with us!