Whilst many Infracost CLI users connect to our hosted Cloud Pricing API (since no cloud credentials or secrets are sent to it), large enterprises that have restrictive security policies require self-hosting. Thus in July we focused on improving the self-hosting experience of the Cloud Pricing API. This is a GraphQL-based API that includes all public prices from AWS, Azure and Google; it contains over 3 million prices just now!
This is actually the third Cloud Pricing API our team has built; the complexity in cloud pricing has increased significantly in the last 10 years:
- In 2010 (as part of PlanForCloud), we developed scrapers to fetch cloud prices as vendors didn't offer APIs then. There were over 10,000 prices at the time.
- In 2015 (as part of RightScale), we developed a RESTful API. There were over 100,000 prices at the time.
- In 2021 (as part of Infracost), we developed an open source GraphQL API for use with the Infracost CLI. AS previously mentioned, it containers over 3,000,000 prices just now.
We made the following main improvements in July:
- The datastore was switched from MongoDB to PostgreSQL since all the major cloud vendors have hosted options for this. Previously some of our users ran into compatibility issues with cloud vendor's hosted MongoDB versions.
- An official Helm chart was developed; the existing Docker compose file was also updated. This will deploy the Cloud Pricing API and a weekly cronjob to to update prices. By default it will also deploy a PostgreSQL pod/container but it can also be configured to use an external PostgreSQL instance for high availability, e.g. AWS RDS or Azure Database for PostgreSQL.
- The time it takes to update prices was reduced from around 1.5 hours to 2 minutes.
The pricing DB dump is downloaded from Infracost's API as that simplifies the task of keeping prices up-to-date. We have created one job that you can run once a week to download the latest prices. This provides you with:
- Fast updates: our aim is to enable you to deploy this service in less than 15mins. Some cloud vendors paginate API calls to 100 resources at a time, and making too many requests result in errors; fetching prices directly from them takes more than an hour.
- Complete updates: we run integration tests to ensure that the CLI is using the correct prices. In the past, there have been cases when cloud vendors have tweaked their pricing API data that caused direct downloads to fail. With this method, we check the pricing data passes our integration tests before publishing them, and everyone automatically gets the entire up-to-date data. The aim is reduce the risk of failed or partial updates.