Skip to Content
CLIManaging CloudQueryDeploymentsKubernetes CronJob 🎥

Deploying CloudQuery using Kubernetes CronJobs

In this tutorial we will set up a Kubernetes CronJob to run CloudQuery Sync on a regular schedule. Due to its flexibility and standardization across cloud providers, Kubernetes is commonly used by DevOps & Platform Engineers when deploying workloads and microservices.

Prerequisites

  • A CloudQuery API key. More information on generating an API Key can be found here
  • A Kubernetes Cluster
  • A Postgres Database server
  • Credentials (with Read privileges) for a supported API - in the example we’ll use a DigitalOcean account

Step 1: Create a Kubernetes Secret for your API Keys

To keep our credentials separate from our manifests, we need to create a secret that we’ll use to supply our environment variables We can do this in a single command:

kubectl create secret generic cloudquery-secret \ --from-literal=CLOUDQUERY_API_KEY=<your_cloudquery_api_key> \ --from-literal=DIGITALOCEAN_TOKEN=<your_token> \ --from-literal=SPACES_ACCESS_KEY_ID=<your_access_key> \ --from-literal=SPACES_SECRET_ACCESS_KEY=<your_secret_key> \ --from-literal=PG_CONNECTION_STR=<your_postgres_connection_string>

Step 2: Create the CronJob Manifest

A CronJob is a Kubernetes object that allows you to run a job on a schedule. It’s similar to a Kubernetes Workload, except that it is intended for Jobs (short-lived containers) that conduct a task and then shutdown, instead of Pods (long-lived containers intended for services).

The CronJob manifest has two key elements, the jobTemplate, and the schedule.

apiVersion: batch/v1 kind: CronJob metadata: name: cloudquery labels: app: cloudquery spec: schedule: "0 0 * * *" jobTemplate: {}

The Schedule expression follows the same format as crontab; that is a space separated list as follows minute hour day(of month) month day(of week). Which in the above example would be midnight (i.e. 00:00) every day.

To learn more about cron schedule expressions, check out https://crontab.guru/ or the Kubernetes documentation https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/

The jobTemplate describes what to run, how to run it, and any volumes that it needs.

As this follows the same structure as a Kubernetes Workload, we won’t go too deep into the details here.

apiVersion: batch/v1 kind: CronJob metadata: name: cloudquery labels: app: cloudquery spec: schedule: "0 0 * * *" jobTemplate: spec: template: spec: containers: - name: cloudquery image: ghcr.io/cloudquery/cloudquery:latest imagePullPolicy: IfNotPresent args: ["sync", "/config/config.yml", "--log-console", "--log-format", "json"] envFrom: - secretRef: name: cloudquery-secret volumeMounts: - name: config mountPath: "/config" readOnly: true restartPolicy: Never volumes: - name: config configMap: name: cloudquery-config items: - key: "config" path: "config.yml"

Looking at this completed manifest, there are three key elements to pay attention to: args, envFrom, volumes

In the args, you’ll see the arguments to be passed to CloudQuery, in envFrom you’re instructing Kubernetes to use the secret you created earlier, and in volumes you’re telling Kubernetes where to find the configuration file.

Step 3: Defining the ConfigMap to hold the CloudQuery Config

When using Kubernetes configuration files are commonly stored on a special type of object called a ConfigMap. ConfigMaps enable you to define collections of text files that can be mounted as directories by Pods and Jobs.

A ConfigMap definition is relatively simple, containing a data object where files are defined as key-value pairs.

apiVersion: v1 kind: ConfigMap metadata: name: cloudquery-config data: {}

In this case, you want to define a ConfigMap to store a CloudQuery configuration file, that defines a Source integration and a Destination integration.

apiVersion: v1 kind: ConfigMap metadata: name: cloudquery-config data: config: | kind: source spec: # Source spec section name: digitalocean path: cloudquery/digitalocean registry: cloudquery version: "v6.8.4" tables: - "digitalocean_droplets" - "digitalocean_databases" - "digitalocean_accounts" - "digitalocean_storage_volumes" - "digitalocean_floating_ips" - "digitalocean_firewalls" - "digitalocean_load_balancers" - "digitalocean_billing_history" destinations: ["postgresql"] --- kind: destination spec: name: "postgresql" path: "cloudquery/postgresql" registry: "cloudquery" version: "v8.14.3" spec: connection_string: ${PG_CONNECTION_STR}

In this example we’re using the DigitalOcean source integration and the Postgres destination integration. But you can find out more about building configuration files and integrations here.

Step 4: Apply the manifest

Apply the configuration from a terminal using kubectl apply

kubectl apply -f cronjob.yaml -f configmap.yaml

Step 5: Query the data

You can manually trigger the cronjob to run early using:

kubectl create job --from=cronjob/<name of cronjob> <name of job>

After which you can query your SQL database to see the results.

Summary

In this tutorial, we have seen how to set up CloudQuery on Kubernetes and sync data to a PostgreSQL database. If you have any questions, check out the video above or join our Community to chat with us!

Last updated on