Search Through Your Cloud Infrastructure with CloudQuery and Elasticsearch
February 10, 2023
Have you ever wished you could enter the ID of an EC2 instance into the search bar and get a list of all the related resources? Or search by tag across all your accounts and regions? Or search for resources by keyword? With the new CloudQuery Elasticsearch destination, you can do all this, and more.
Using the Elasticsearch destination, you can search through your cloud infrastructure, create visualizations and dashboards, and generally explore your infrastructure data in a way that wasn't possible before. In this article we will show you how to get started and a few examples of the things you can do.
Elasticsearch is a popular open source search engine built on top of Lucene. It is a distributed, RESTful search and analytics engine that is capable of storing and searching terabytes of data. As datasets grow over time, managing a SQL database can become more and more difficult, but Elasticsearch offers a great alternative here. It is schemaless and has built-in support for scaling, sharding and replication. Its full text search capabilities also make it a great fit for searching through unstructured data, especially JSON columns.
First, you will need to have CloudQuery installed. If you haven't already, you can do so by following the instructions in the Quickstart guide.
You will also need a running Elasticsearch instance. If you don't have one, you can either create one locally using Docker, or use the Elastic Cloud service, which offers a free trial. This docker-compose file will get you started with local Elasticsearch and Kibana services (download the file, then run docker compose up). Once it's up, you should be able to visit localhost:5601 and see the Kibana interface.
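For reference, a minimal docker-compose sketch along these lines might look as follows. The image versions, ports, and disabled security are assumptions suitable for local testing only, not production:

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.6.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # local testing only
    ports:
      - "9200:9200"
  kibana:
    image: docker.elastic.co/kibana/kibana:8.6.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```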
Now that you have CloudQuery and Elasticsearch running, you can configure the Elasticsearch destination. To do this, follow the instructions in the Elasticsearch destination plugin docs to create a configuration file called elasticsearch.yml.
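The plugin docs are the source of truth for the exact fields, but as a sketch, elasticsearch.yml might look something like this (the version string and address are assumptions for a local setup):

```yaml
kind: destination
spec:
  name: elasticsearch
  path: cloudquery/elasticsearch
  version: "v1.0.0"  # assumption — check the docs for the latest version
  write_mode: overwrite-delete-stale
  spec:
    addresses:
      - http://localhost:9200
```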
Assuming you want to browse AWS infrastructure data, you also need to create a configuration file for the AWS provider. Again, follow the instructions in the AWS Source guide to create a configuration file called aws.yml.
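As a sketch under the same caveats, aws.yml might look like this; the version string is an assumption, and the tables list is just an example (you can sync more tables, or all of them):

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v1.0.0"  # assumption — check the docs for the latest version
  tables: ["aws_ec2_instances"]
  destinations: ["elasticsearch"]
```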
Finally, you can run the CloudQuery command to sync your AWS infrastructure data to Elasticsearch:
cloudquery sync aws.yml elasticsearch.yml
When the sync completes, head over to the "Stack Management" section of Kibana and click on "Data Views" under "Kibana". Click "Create data view" and enter aws* as the index pattern. Select _cq_sync_time as the time field (you can change this later, if you want). Click "Save data view to Kibana" to finish.
This article is using Elasticsearch 8.6.0. In previous versions of Elasticsearch, "Data Views" were called "Index Patterns".
Now that you have your data in Elasticsearch, you can start searching through it. To do this, head over to the "Discover" section of Kibana and enter a search query. For example, we can search by IP address:
Or by tag:
Or we can use the native JSON object support to search through nested JSON objects. This query will find EC2 instances that have their monitoring state set to "disabled":
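As a rough sketch, the KQL queries behind searches like these might look as follows. The field names are assumptions based on the aws_ec2_instances table schema, the values are placeholders, and the comment lines are just annotations (enter only the query itself in the search bar):

```
# Search by IP address:
public_ip_address: "203.0.113.10"

# Search by tag (assuming a tag key "Environment"):
tags.Environment: "production"

# Search nested JSON for instances with monitoring disabled:
monitoring.state: "disabled"
```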
We can also build visualizations and dashboards using the data in Elasticsearch. For example, we can create a pie chart that shows the distribution of EC2 instance types:
(Note: To create this particular visualization with the current version of the AWS and Elasticsearch plugins, we had to change the type of the instance_type field to keyword. This can be done by first doing a migration, which creates the index templates, then editing the template, and then running a new sync.)
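Under the hood, Kibana builds a pie chart like this from a terms aggregation. As a sketch, the equivalent raw Elasticsearch request body (assuming an aws_ec2_instances index with instance_type mapped as keyword) could look like this:

```python
import json

# Sketch of the aggregation behind an "EC2 instances by type" pie chart.
# Assumes documents from the aws_ec2_instances table with an instance_type
# keyword field; POST this body to /aws_ec2_instances/_search.
request_body = {
    "size": 0,  # we only want the aggregation buckets, not the documents
    "aggs": {
        "by_instance_type": {
            "terms": {"field": "instance_type", "size": 20}
        }
    },
}

print(json.dumps(request_body, indent=2))
```

Setting "size": 0 skips returning the matching documents themselves, which keeps the response small when all you need is the bucket counts.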
One of the great things about Elasticsearch is that it is designed to be used with time series data, so a final thing we wanted to note in this post is the ability to view historical snapshots of your infrastructure from Elasticsearch. The CloudQuery Elasticsearch destination currently supports three write modes: overwrite, overwrite-delete-stale (the default), and append. You can read more about these in the plugin documentation, but in short, when using append mode, CloudQuery will never delete data, so you can build a historical view of your infrastructure over time. For example, by running a sync in append mode every day, you can build a visualization that shows the number of EC2 instances over time, or more generally track how (or whether!) your organization's security posture is improving over time.
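A time-series view like this can be built on a date_histogram aggregation over the _cq_sync_time field that CloudQuery stamps on every record. As a sketch (the index name and daily interval are assumptions):

```python
import json

# Sketch: count aws_ec2_instances documents per daily append-mode sync,
# bucketing on the _cq_sync_time timestamp CloudQuery adds to each record.
# POST this body to /aws_ec2_instances/_search.
request_body = {
    "size": 0,
    "aggs": {
        "instances_over_time": {
            "date_histogram": {
                "field": "_cq_sync_time",
                "calendar_interval": "day",
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```

Each bucket in the response then corresponds to one day's sync, and the document count per bucket is the instance count at that point in time.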
In this article we showed you how to get started with the new CloudQuery Elasticsearch destination, now available in Preview. We also showed you a few examples of the things you can do with the data in Elasticsearch in combination with the AWS source plugin, including searching through your cloud infrastructure, building visualizations and dashboards, and building a historical view of your infrastructure. But don't think that it's limited to AWS! The Elasticsearch destination works with all CloudQuery source plugins, so you can search through your Azure, GCP, Kubernetes, or even Marketing and FinOps data as well. We hope you find this new destination useful, and we look forward to seeing what you build with it! If you have any feedback on the Elasticsearch destination, or any other part of CloudQuery, please let us know on Discord or by opening an issue on GitHub. 🚀