How to pick an infrastructure as code language
You can use several languages today to define cloud infrastructure as code. Sometimes, you have a range of languages to choose from, even for a single tool.
So what language is the right choice? My intention with this article is to give you some guidance on how to pick the right language for you.
This text intentionally focuses on language choice, and not on tool features. You will still need to consider tool features, and I will briefly touch on some of these considerations, but I will not do a full-blown evaluation, since that would be too large a scope.
With that in mind, here are some aspects we will look at in terms of language choice:
Existing infrastructure code
Consider your target audience
Reduce cognitive load
Speed of feedback loop
Compliance and regulation requirements
Existing infrastructure code
If you inherit a lot of infrastructure already defined with some language and tool, it may be difficult to find business value in changing this to something entirely new. This needs to be weighed in when choosing a language, and even more so when choosing a new tool.
Do not just jump in headfirst to convert the code to something new. It is vital to understand the reasons behind the choices for the current language and tools. This is true for any non-trivial amount of infrastructure code.
I have myself been tempted on multiple occasions to switch to something I thought would be better. However, sometimes, I backed down from that after considering several points outlined below. In some other cases, we gradually changed, and that turned out to be a pretty good choice then.
If you have infrastructure that was set up manually, aim to import it so you can manage these resources with infrastructure as code. You need to evaluate your tool choices and see which languages they can generate code for when importing.
For retaining existing infrastructure as code, you will also need to consider the ability to integrate with other tool solutions:
CloudFormation and AWS CDK are in the same group of tools and integrate reasonably well. They do not integrate well with other tool groups (Terraform/CDKTF and Pulumi).
Terraform and CDKTF are in the same group of tools and have some (read) integration with CloudFormation, and thus indirectly with AWS CDK.
Pulumi has some (read) integrations with both Terraform and CloudFormation, and thus indirectly with AWS CDK and CDKTF.
This is for integration with existing provisioned infrastructure, e.g. referencing resources in existing stacks or state storage. A more tool-agnostic approach is possible as well. For example, if you share resource references via AWS Systems Manager Parameter Store, then the underlying tool that provisioned a resource does not matter.
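To make the Parameter Store idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under a few assumptions: the /environment/component/key naming convention, the function names and the InMemoryParameterStore class are all made up for this example. The stand-in class mimics only the two calls used here, so the sketch runs without an AWS account. With a real account, you would pass in a boto3 SSM client instead.

```python
"""Sketch: sharing resource references through a parameter store."""


class InMemoryParameterStore:
    """Hypothetical stand-in for an SSM client, so the sketch runs offline.

    It mimics only the two boto3 SSM client calls used below, and raises a
    plain ValueError where the real service would return its own error.
    """

    def __init__(self):
        self._values = {}

    def put_parameter(self, Name, Value, Type="String", Overwrite=False):
        if Name in self._values and not Overwrite:
            raise ValueError(f"parameter already exists: {Name}")
        self._values[Name] = Value
        return {}

    def get_parameter(self, Name):
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


def reference_name(environment, component, key):
    """Build a parameter name from an assumed /env/component/key convention."""
    return f"/{environment}/{component}/{key}"


def publish_reference(ssm, environment, component, key, value):
    """Called by whichever tool provisioned the resource."""
    ssm.put_parameter(
        Name=reference_name(environment, component, key),
        Value=value,
        Type="String",
        Overwrite=True,
    )


def resolve_reference(ssm, environment, component, key):
    """Called by any consumer, regardless of its own tool or language."""
    response = ssm.get_parameter(Name=reference_name(environment, component, key))
    return response["Parameter"]["Value"]


if __name__ == "__main__":
    store = InMemoryParameterStore()  # swap for boto3.client("ssm") on AWS
    publish_reference(store, "test", "network", "vpc-id", "vpc-0example")
    print(resolve_reference(store, "test", "network", "vpc-id"))
```

The point of the design is that the producer and the consumer only agree on a naming convention, not on a tool. The team that publishes the value could use CloudFormation, and the team that reads it could use Terraform or Pulumi.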
Consider your target audience
Who will define infrastructure? Are the application developers also responsible for the application infrastructure? Or is there a platform or operations team that handles the infrastructure?
Are there multiple groups, e.g. application developers who handle “just enough infrastructure”, while a platform team handles the more complicated parts?
There may not be a single choice here, as the needs for each group to work efficiently with the infrastructure definitions may be different.
An application development team may work better with a different language than a platform team, since the way of working and tools used may be different.
Reduce cognitive load
Do you have an application development team that works with Java, and that will also be responsible for application infrastructure? Then it may make sense to pick Java to define the infrastructure, as that may reduce the cognitive load - the amount of things you need to keep track of and handle to do the work.
If the application team does very little or no infrastructure, and a separate team that does not work with Java regularly handles the infrastructure, then a different choice may be better.
If an application development team handles some of the infrastructure, and another team the rest, then each team might benefit from different language choices.
When we talk about a cognitive load for different languages, it is not just the complexity of the language itself, but also the runtime environment and the surrounding tooling.
Which tools should we use for Python package management and virtual environments? What linters and test tools should we use with Typescript, and what are the configuration settings for each of those? Are we using Gradle or Maven with Java? Should we run Terragrunt with Terraform, and are we setting up tflint, tfsec or some other tools? Do we use Sceptre with CloudFormation?
If a language is not used daily, then the cognitive load may be significant each time you need to do something. In these cases, it is important to consider the languages themselves, the runtimes and the surrounding tools. Also consider not just the happy path scenarios, but what happens when things go wrong as well.
If you work with a certain language daily, then the additional cognitive load will not be that much, even if the language and tooling setup may be complex. You already manage that.
Note that reducing cognitive load does not mean you should always pick languages and ecosystems that a team already knows. If you expect the team to work with a language daily or with high frequency, it could be ok to use a new language and ecosystem. It is ok to learn something new, as long as you keep using it regularly.
Cognitive load and complexity of infrastructure
The cognitive load aspect may also come into play for the actual infrastructure definitions.
If you work on infrastructure that changes with about the same frequency as the application itself, then the added cognitive load may not be that high.
There may be infrastructure that does not change much, like virtual networks and VPN connections. Each time you actually need to change something here, there may be some additional cognitive load if you do not work with the networking daily. Thus, the life cycles of the infrastructure may affect the cognitive load.
What does this mean? It means static infrastructure that seldom changes benefits from languages that are easy to read and get into, even if you are not that familiar with the code. It also means that the language does not need to support complex logic, and a strictly declarative language may be just fine.
Infrastructure definitions that change more often, and that may be more dynamic, benefit from more expressive languages. The underlying representation can still be declarative.
In fact, many of the tools that support languages that are not strictly declarative, like AWS CDK, CDKTF, Pulumi and CDK8s, generate declarative definitions under the hood.
But for reading and understanding the infrastructure definitions, a declarative language may be better for static infrastructure, which does not need any complex logic to be defined.
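As a rough illustration of "expressive language on top, declarative definition underneath", here is a small Python sketch. It is not how AWS CDK, CDKTF or Pulumi work internally. The function name bucket_template and the bucket identifiers are made up for this example. It only shows the general idea: a loop and a condition in code, and a plain CloudFormation-style document as the result.

```python
import json


def bucket_template(bucket_ids, versioned=True):
    """Return a CloudFormation-style template as a plain dictionary."""
    resources = {}
    for bucket_id in bucket_ids:
        properties = {}
        if versioned:
            properties["VersioningConfiguration"] = {"Status": "Enabled"}
        resources[bucket_id] = {
            "Type": "AWS::S3::Bucket",
            "Properties": properties,
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": resources,
    }


if __name__ == "__main__":
    # The logic lives in the code above; the output is purely declarative.
    template = bucket_template(["LogsBucket", "AssetsBucket"])
    print(json.dumps(template, indent=2))
```

For two buckets, writing the declarative document by hand is likely easier to read. The code version starts to pay off when the number of resources or variations grows, which matches the static versus dynamic distinction above.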
Of course, supporting multiple languages may add complexity as well.
Cognitive load and package/module management
Most infrastructure-as-code tools have some sort of module/package management support, to handle re-use and defining suitable abstraction layers for the users of a module/package.
Using those is part of the expected work for each language choice. However, if you decide to support multiple languages with packages, then you will add cognitive load for the package maintainers.
For example, if you want to support Typescript, you typically need to deal with npm. With Python you deal with PyPI, with Java you use Maven, with C# you use NuGet, etc.
Cognitive load and (idiomatic) language use
If a tool supports multiple languages, that means there are some restrictions on packages and/or code, so that they work with all supported languages.
That can be even more apparent if you pick a language that may not be officially supported or have first-class support, but that runs on a supported runtime, such as the JVM or .NET runtimes.
Any such deviations might add to the cognitive load as well. Of course, the extent to which a language is used with the specific tool choice is a consideration as well. How easy is it to get help and find examples for a particular language?
Cognitive load across tools
Mixing multiple tools and tool families is certainly possible, but may add cognitive load as well, regardless of language. The boundaries between tool families are somewhat fuzzy at times:
CloudFormation and AWS CDK are in the same family in that the underlying representation used is CloudFormation, although AWS CDK introduces a few new concepts in the mix.
Terraform and CDKTF are in the same family in that the underlying representation used is Terraform, although there are a few new concepts with CDKTF also.
All CDK tools (AWS CDK, CDKTF, CDK8s) share some common elements as well.
There are some similarities across all tools that use programming languages as well, including most of the supported languages.
Both CDKTF and Pulumi have some support for using AWS CDK components in the code.
Speed of feedback loop
Provisioning infrastructure is slow, most times much slower than just starting an application. This is something to keep in mind.
Mostly, the speed of the language execution itself is not a big factor. It is often more about the speed of the tool itself rather than the language used.
I did some experiments with AWS CDK, and the time to generate the underlying CloudFormation was about the same regardless of whether I used Go, Typescript or Python, for example.
Since there is an extra step to generate CloudFormation, or Terraform with CDKTF, there is some extra time added compared to using CloudFormation or Terraform directly. The extra time here may not be a good enough reason to avoid it.
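If you want to check the generation time for your own setup, a small timing helper may be enough. The sketch below is an assumption about how such a measurement could be done, not a description of the experiments mentioned above. The function name time_command is made up, and the command in the example is only a placeholder that you would replace with your own generation step, for example cdk synth.

```python
import statistics
import subprocess
import sys
import time


def time_command(command, runs=3):
    """Run a command several times and return the median wall-clock seconds."""
    durations = []
    for _ in range(runs):
        started = time.perf_counter()
        # check=True makes a failing command raise instead of being timed.
        subprocess.run(command, check=True, capture_output=True)
        durations.append(time.perf_counter() - started)
    return statistics.median(durations)


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; replace it with the
    # generation step of your tool, for example ["cdk", "synth"].
    command = [sys.executable, "-c", "pass"]
    print(f"median: {time_command(command):.3f} s")
```

Using the median of a few runs reduces the effect of a slow first run, for example when dependencies or caches are warmed up.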
Good type checking and tooling support in IDEs or editors can shorten the feedback loop, if that exists for a language. This is typically an advantage for many regular programming languages.
Is the use of a particular language, its runtime and associated tooling secure? More versatile languages and tooling come with additional risks of exposing the infrastructure definitions to troublesome pieces of code.
How trustworthy are any imported packages/modules and are they locked to a specific release? Do you require fetching data from outside to import when you build your infrastructure?
What packages should be safe to import and use? Should there be guidelines and checks for how a specific language is used and what can be imported?
Again, the language runtime and the surrounding ecosystem are important to review and decide on safe usage patterns.
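One of the checks mentioned above, whether imported packages are locked to a specific release, can be automated fairly easily. Here is a minimal sketch for the Python ecosystem. It assumes dependencies are listed in requirements.txt style, and the function name unpinned_requirements is made up for this example. It only looks for exact == pins and does not verify hashes or the trustworthiness of a package. The version numbers in the example are illustrative.

```python
import re

# An exact pin in requirements.txt style: name, optional extras, ==version,
# and an optional environment marker after a semicolon. Wildcards such as
# "1.*" do not count as an exact pin.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;*]+\s*(;.*)?$")


def unpinned_requirements(lines):
    """Return requirement lines that are not locked to one exact version."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    example = [
        "aws-cdk-lib==2.100.0",
        "constructs>=10.0.0",
        "boto3",
        "# a comment",
    ]
    print(unpinned_requirements(example))
```

Other ecosystems have their own equivalents, such as lock files for npm or go.sum for Go, so a check like this is part of the tooling to decide on for each language you support.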
Compliance and regulation requirements
Your line of business may have compliance requirements that will affect language and language ecosystem choices. This is something to look at, in particular if you are introducing an entirely new language and ecosystem to the business.
Any issues are more likely to concern the ecosystem, the tooling and the practices around them than the language itself.
This also includes any potential licensing issues, although this is more likely to be a potential issue for any specific tooling or 3rd party packages used than for the language itself. Again, this may require some extra effort in particular if you are introducing a new language and ecosystem to the business.
Let us take a fictional example. The company myexample.com has about a dozen development teams and a platform team. There are multiple solutions and services these teams work on. Some services are built with Typescript, others are built with Go.
Each development team handles infrastructure close to the application solution, e.g. services such as AWS Lambda, DynamoDB, ECS Fargate, etc. They have separate accounts for development, test and production in most cases, although some environments have shared networking as well.
Networking setup, IAM, backups and baseline security setup are handled by the platform team. That team has a mix of people who work with the infrastructure setup daily and people who do so more seldom.
Teams use a mix of Serverless framework, CloudFormation and a bit of AWS CDK for their infrastructure and application deployment. There are also resources that have been set up manually. For AWS CDK, there are infrastructure resources set up in Typescript and Python.
Some teams have automated deployments, some have manual deployments.
Since development teams work daily with Typescript and Go, it makes sense that those that use AWS CDK with Typescript can continue with that. It may make sense to use Go also for team infrastructure, if there are teams that use Go daily, but not Typescript.
Right now, AWS CDK with Python may be an outlier, so the teams could potentially drop it and look at converting that code to Go or Typescript.
The platform team may benefit from having relatively static infrastructure in a more declarative format, e.g. CloudFormation YAML, Terraform HCL, or Pulumi YAML (or CUE). If a programming language were to be used, Typescript or Go would be the likely candidates. Go’s ecosystem and tools may be simpler to grasp and keep control of, and may need fewer 3rd party dependencies. But this would also depend on other tasks that the platform team is doing.
Technically, AWS CDK, CDKTF and Pulumi all support Go and Typescript. If the platform team were to build re-usable components or modules for the other teams, they would need to support multiple languages. For AWS CDK, that would in practice mean writing these components/modules in Typescript, or in CloudFormation with some Typescript. For CDKTF, it could be in HCL with some Typescript, or all in Typescript. For Pulumi, it should be possible with any of the languages.
I have avoided picking any specific tools here, since there would be a need to go more in depth on the situation at myexample.com.
In this article I have tried to point to some considerations when picking a language for infrastructure as code, without going into much specifics for each tool.
Most of the text deals with cognitive load, which I think is often not fully appreciated when considering the intended target audience of the infrastructure definition work.
One tool or language does not fit all, and sometimes it may be better to have a limited selection rather than a single choice, or a total free-for-all.
For example, I have worked on a few projects where AWS CDK using Typescript has been the tool and the language of choice for everyone. In some projects that worked reasonably well, in others it did not work out. The threshold to work with the language and ecosystem, as well as the underlying tools, was too high for some people, and the maintenance suffered because of that.
Do you have any experiences yourself with language choices for infrastructure-as-code? What are your thoughts?