CloudQuery is an open-source data integration platform that allows you to export data from any source to any destination.
The CloudQuery GitHub plugin allows you to sync data from GitHub to any destination, including Firehose. It's free, open source, requires no account, and takes only minutes to get started.
Ready? Let's dive right in!
Step 1. Install the CloudQuery CLI
The CloudQuery CLI is a command-line tool that runs the sync. It supports macOS, Linux and Windows.
Step 2. Configure the GitHub source plugin
Create a configuration file for the GitHub plugin and set up authentication.
Create a file called github.yaml and add your source configuration. A complete example appears in the GitHub configuration reference further below.
Fine-tune this configuration to match your needs. For more information, see the GitHub Plugin page in the docs.
Step 3. Configure the Firehose destination plugin
Create a configuration file for the Firehose plugin and set up authentication.
Create a file called firehose.yaml and add the following contents:
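As a starting point, here is a minimal sketch of firehose.yaml. It assumes the destination's stream_arn option and append-only write mode. The version placeholder and the ARN (account 111122223333, stream example-stream) are hypothetical, so check the Firehose Plugin page for the exact option names before relying on it. The name field must match the destination listed under destinations in the GitHub source spec.

```yaml
kind: destination
spec:
  name: "firehose" # must match destinations: ["firehose"] in the source spec
  path: "cloudquery/firehose"
  registry: "cloudquery"
  version: "<FIREHOSE_PLUGIN_VERSION>" # placeholder: use the current release from the plugin page
  write_mode: "append" # assumption: Firehose only accepts appended records
  spec:
    # Assumed option name; the ARN below is a made-up example value.
    stream_arn: "arn:aws:firehose:us-east-1:111122223333:deliverystream/example-stream"
```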
Fine-tune this configuration to match your needs. For more information, see the Firehose Plugin page in the docs.
Step 4. Start the Sync
Run the following command in your terminal to start the sync: cloudquery sync github.yaml firehose.yaml
And away we go! 🚀 The sync will run until completion, fetching all selected tables from GitHub. Any errors will be logged to a file called cloudquery.log.
Now that you've seen the basics of syncing GitHub to Firehose, you should know that there's a lot more you can do. Check out the CloudQuery Documentation, Source Code and How-to Guides for more details.
The GitHub source plugin supports two authentication methods: Personal Access Token and App authentication. Which one you use depends on your preferences and the security requirements of your organization.
CloudQuery requires only read permissions (we will never make any changes to your GitHub account or organizations). Following the principle of least privilege, we recommend granting it read-only permissions to all the resources you wish to sync.
For App authentication, create a GitHub App by following this guide, then install the App into your organization(s). Grant it the permissions you need (read-only is recommended).
Every organization has a unique installation ID. To find it, go to the organization's settings page, open the "Installed GitHub Apps" tab, and click Configure next to your App. The installation ID is the number at the end of that page's URL.
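Because each organization has its own installation ID, syncing several organizations with one App means adding one app_auth entry per organization, each with its own installation_id and the same app_id. This sketch shows only the inner spec section of the GitHub source. The organization names example-org-1 and example-org-2 are hypothetical, and all IDs and paths are placeholders.

```yaml
spec:
  # Hypothetical organization names; replace with your own.
  orgs: ["example-org-1", "example-org-2"]
  app_auth:
    - org: example-org-1
      private_key_path: <PATH_TO_PRIVATE_KEY>
      app_id: <YOUR_APP_ID> # the same App ID for every organization
      installation_id: <ORG_1_INSTALLATION_ID> # unique to example-org-1
    - org: example-org-2
      private_key_path: <PATH_TO_PRIVATE_KEY>
      app_id: <YOUR_APP_ID>
      installation_id: <ORG_2_INSTALLATION_ID> # unique to example-org-2
```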
Passing private_key as plaintext
You can use the YAML literal block scalar indicator | to pass the multi-line private key as plaintext.
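Here is a sketch of an app_auth entry with the key pasted inline. The key body and IDs are placeholders. The literal block scalar (|) preserves every line break, so the PEM content reaches CloudQuery with the same lines as the key file.

```yaml
app_auth:
  - org: cloudquery
    # The | indicator keeps each line of the PEM key on its own line.
    private_key: |
      -----BEGIN RSA PRIVATE KEY-----
      <BASE64_KEY_LINES>
      -----END RSA PRIVATE KEY-----
    app_id: <YOUR_APP_ID>
    installation_id: <ORG_INSTALLATION_ID>
```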
When you pass private_key as a string from an environment variable, replace every newline in your PEM file with \n. Otherwise, the newlines and indentation will prevent CloudQuery from reading the variable correctly.
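For example, suppose the key is stored in an environment variable named GITHUB_APP_PRIVATE_KEY (a hypothetical name). The variable holds the whole PEM on a single line with \n separators, and the config references it through ${...} expansion. The reference is double-quoted because YAML reads \n inside a double-quoted string as a newline. This assumes the variable is substituted into the file text before the YAML is parsed.

```yaml
# Hypothetical variable; its value must be a single line with \n instead of real newlines:
# GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\n<BASE64_KEY_LINES>\n-----END RSA PRIVATE KEY-----"
app_auth:
  - org: cloudquery
    private_key: "${GITHUB_APP_PRIVATE_KEY}"
    app_id: <YOUR_APP_ID>
    installation_id: <ORG_INSTALLATION_ID>
```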
The Firehose destination authenticates with AWS through the AWS SDK, which can read credentials from sources including the following:
Shared configuration files. By default, the SDK reads the credentials file and the config file from the .aws folder in your home directory (~/.aws/credentials and ~/.aws/config).
IAM role for tasks, if your application uses an ECS task definition or the RunTask API operation.
IAM role for Amazon EC2, if your application is running on an Amazon EC2 instance.
To configure CloudQuery to extract from GitHub, create a .yml file in your CloudQuery configuration directory.
The following configuration will extract all issues from the cloudquery/cloudquery repository:
```yaml
kind: source
spec:
  # Source spec section
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "v7.5.1"
  tables: ["github_issues"]
  destinations: ["firehose"]
  spec:
    access_token: "<YOUR_ACCESS_TOKEN_HERE>" # Personal Access Token, required if not using App Authentication.
    ## App Authentication (one per org):
    # app_auth:
    #   - org: cloudquery
    #     private_key: <PRIVATE_KEY> # Private key as a string (set this or private_key_path, not both)
    #     private_key_path: <PATH_TO_PRIVATE_KEY> # Path to private key file
    #     app_id: <YOUR_APP_ID> # App ID, required for App Authentication.
    #     installation_id: <ORG_INSTALLATION_ID> # Installation ID for this org
    orgs: [] # Optional. List of organizations to sync from
    repos: ["cloudquery/cloudquery"] # Optional. List of repositories to sync from
    ## GitHub Enterprise
    # To enable GHE, provide two URLs: the base URL of the server and the upload URL.
    # Quote from GitHub's client:
    # "If the base URL does not have the suffix "/api/v3/", it will be added automatically.
    # If the upload URL does not have the suffix "/api/uploads", it will be added automatically.
    # Another important thing is that by default, the GitHub Enterprise URL format should be
    # http(s)://[hostname]/api/v3/ or you will always receive the 406 status code.
    # The upload URL format should be http(s)://[hostname]/api/uploads/."
    # If you are not configuring against an enterprise server, omit the enterprise stanza below
    # (it is commented out here because this example syncs from github.com).
    # enterprise:
    #   base_url: "http(s)://[your-ghe-hostname]/api/v3/"
    #   upload_url: "http(s)://[your-ghe-hostname]/api/uploads/"
    # Optional parameters
    # concurrency: 10000 # Optional. Number of concurrent requests to GitHub API. Default is 10000.
    # discovery_concurrency: 1 # Optional. Number of concurrent requests to GitHub API during discovery phase. Default 1.
```
You must specify either orgs or repos in the configuration. If an organization is listed in orgs and one of its repositories is also listed in repos, that repository is extracted only once and the organization's other repositories are ignored.
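As an illustration of that rule, consider this inner spec fragment of the GitHub source. Only cloudquery/cloudquery is synced, and the cloudquery organization's other repositories are skipped.

```yaml
spec:
  orgs: ["cloudquery"]
  # Because this repo belongs to an org listed above, only it is synced for that org.
  repos: ["cloudquery/cloudquery"]
```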
You can define either private_key or private_key_path in the configuration, but not both.
It is recommended that you use environment variable expansion for the access token in production. For example, if the access token is stored in an environment variable called GITHUB_ACCESS_TOKEN:
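In the GitHub source's inner spec section, the token is then referenced with CloudQuery's ${VARIABLE_NAME} substitution instead of being written into the file:

```yaml
spec:
  # Replaced with the value of GITHUB_ACCESS_TOKEN when the configuration is loaded.
  access_token: "${GITHUB_ACCESS_TOKEN}"
```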