Cost is on everyone’s mind when building a product. At a startup especially, you want to build the product with the resources you have and without overspending. As you develop products on multiple clouds, costs can quickly pile up. The following are some of the ways we optimized our cloud usage and reduced costs.
Each of the cloud providers has excellent cost explorer tools. They provide a detailed view of which resources are being used and how much you are paying for them. Each platform also offers its own ways to save, such as purchasing resources in advance or using spot instances. These help bring costs under control to a degree; however, we had to do more work to optimize our multi-cloud usage.
Since we develop on multiple clouds, we learned how each resource is charged. For instance, shutting down a virtual machine doesn’t mean you stop paying for it; you still pay for the other resources that go with it, such as volumes and networking.
As an organization, these are the steps we took to bring cloud costs under control, and they work across clouds.
– The first step is to identify and segregate cloud usage by department and environment. For instance, we segregated the cloud accounts for the presales team, product management, production and engineering. This helps identify how much each team is spending. Within engineering we segregated the usage further, with separate accounts for development activities, regressions, developer smoke tests and staging tests. This gave us a good picture of the costs of various activities. For instance, we know exactly how much our regressions cost, and we are able to optimize them so that we get the most out of each penny we pay. This exercise also helped us identify what the cloud costs would be for a new hire in Engineering. The product management and presales teams are also able to see the costs of doing a demo or trialing a feature with customers. Overall, segregating accounts across the clouds was the right first step towards optimizing our cloud resources.
– Once we had segregated the accounts, we wanted to identify which resources we were spinning up and what we were paying for. Every resource that we spin up is tagged. Tags work across the clouds, and you can build various monitoring tools around them. Tags can be as granular as you want. We tag resources in such a way that we can trace each resource back to an account and a person.
– Spinning up and tagging every resource by hand is tedious, especially across different clouds, so we built several tools to spin up resources across clouds. Most of the resources needed for day-to-day activities are spun up by Jenkins jobs and scripts. These scripts spin up the needed resources and tag them appropriately. Several other monitoring scripts run constantly and keep track of which resources have been spun up and how long they have been running.
– Finally, ruthless shutdown and termination. We implemented a leasing system: every time a resource is spun up, a lease timer starts. Once the lease expires, the resource is either shut down or terminated. This significantly reduces waste. Even with a lot of automation and monitoring, there may sometimes be leakages. So as a team we decided to have a flag day, usually one Friday evening a month, to completely clean up the dev accounts and rebuild the resources. This ensures that leaked resources do not linger for long, and it also pushes our automation further, since we have to tear down all the resources and rebuild them.
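The tagging and leasing steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual tooling: the `Resource` record, the `account` and `owner` tag keys, and the `sweep` function are all hypothetical names, and a real version would read its inventory from each cloud provider's API rather than from an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical tag keys: enough to trace a resource to an account and a person.
REQUIRED_TAGS = ("account", "owner")


@dataclass
class Resource:
    """Cloud-agnostic inventory record (illustrative, not a provider API type)."""
    resource_id: str
    cloud: str
    launched_at: datetime
    lease: timedelta
    tags: dict = field(default_factory=dict)


def missing_tags(resource):
    """Return the required tag keys that are absent or empty."""
    return [key for key in REQUIRED_TAGS if not resource.tags.get(key)]


def lease_expired(resource, now):
    """A lease starts at launch and expires once its duration has elapsed."""
    return now >= resource.launched_at + resource.lease


def sweep(resources, now):
    """Report resources whose lease has expired and resources that are untagged."""
    report = {"expired": [], "untagged": []}
    for resource in resources:
        if lease_expired(resource, now):
            report["expired"].append(resource)
        if missing_tags(resource):
            report["untagged"].append(resource)
    return report


if __name__ == "__main__":
    now = datetime(2024, 1, 5, 18, 0, tzinfo=timezone.utc)
    inventory = [
        Resource("vm-1", "cloud-a", now - timedelta(hours=1), timedelta(hours=8),
                 {"account": "eng-dev", "owner": "alice"}),
        Resource("vm-2", "cloud-b", now - timedelta(hours=10), timedelta(hours=8),
                 {"account": "eng-dev", "owner": "bob"}),
        Resource("vm-3", "cloud-a", now - timedelta(hours=1), timedelta(hours=8),
                 {"account": "eng-dev"}),
    ]
    report = sweep(inventory, now)
    for reason, found in report.items():
        print(reason, [r.resource_id for r in found])
```

A monitoring job could run a sweep like this on a schedule, shutting down or terminating whatever lands in the expired list and flagging untagged resources for follow-up. What to do with untagged resources is a policy choice that this sketch leaves open.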
When you are developing or working in the cloud, getting costs under control is not a one-time exercise. It’s a constant drill. You need to clearly identify the resources you need, track their usage diligently and clean up leakages aggressively.