The on-demand nature of Amazon Web Services has changed how technology companies think about infrastructure. It lets early-stage startups light up infrastructure at the same pace as they grow, and lets established businesses transition between infrastructure configurations with a few API commands. The downside is that if an attacker gains access to credentials with sufficient privileges, they can run up huge bills for nefarious purposes (like bitcoin mining), or simply delete everything, as Code Spaces found out last year. Whilst one could argue that much of this would be mitigated with good security practice, or even just good backups, the daunting failure rate of startups makes it tempting to defer these efforts until the dollars flow in.
As we’ve grown 99designs from a small startup to over 150 people, we’ve gone through several of these inflection points where we have to rethink how we’ve approached engineering, and more recently how we approach security and our AWS infrastructure. We’ve been using AWS since its early betas in 2007, and we’ve regularly tried to apply best practices as they evolved, but we still had fairly rudimentary practices around how we handled credentials. We set out to figure out what the risks were and how we might mitigate them without introducing too much extra complexity.
In our estimation, the most likely attack vectors were focused around four themes:
- Stolen or misplaced engineers’ laptops with credentials on them
- Malware on engineers’ laptops that steals credentials
- Accidentally published AWS credentials on GitHub or elsewhere
- Ex-employees who haven’t had credentials revoked
I’m certain this isn’t an exhaustive list and we’ll be working to update this over time as new vectors are discovered. We arrived at a number of high-level mitigations that we wanted to apply:
- Each team should have their own AWS account, with limited cross-account privileges. Principle of least privilege should be applied where practical.
- User creation/removal should be centrally managed and audited regularly, with keys regularly rotated.
- A second factor that isn’t stored on the laptop should be required for security sensitive operations.
- AWS Credentials should never be stored at rest in plaintext, or ideally even in memory.
These mitigations largely address the first two vectors (lost laptops and malware), which we felt were the largest risks, but they also reduce the scope of what is possible with the last two (published credentials and ex-employees).
AWS Building Blocks
Amazon provides a variety of tools for managing users and permissions, chiefly Identity and Access Management (IAM). An overview of what IAM offers is beyond the scope of this post, but at a high level it allows multiple users to be created within groups, each with granular access policies. It also offers IAM Roles, which users can assume to access certain things, much as one would use sudo on a Linux system. Roles can be given access to other Amazon accounts via cross-account role delegation. IAM users can have Multi-Factor Authentication (MFA) devices associated with them, and permissions (including the ability to assume a Role) can be made conditional on the entry of a time-based code.
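To make the cross-account and MFA pieces concrete, here is a minimal sketch of a Role trust policy that only allows the Role to be assumed from another account when the caller authenticated with MFA. This is an illustration, not our actual policy: the helper name and the 12-digit account ID are hypothetical, while the `sts:AssumeRole` action and the `aws:MultiFactorAuthPresent` condition key are standard IAM policy elements.

```python
import json


def mfa_trust_policy(trusted_account_id: str) -> dict:
    """Build a trust policy letting principals in another account assume
    this role, but only when they signed in with an MFA code."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust the other account as a whole; that account's own
                # IAM policies decide which of its users may assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Deny the assumption unless MFA was used for the session.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }


# Hypothetical account ID, used only for illustration.
policy = mfa_trust_policy("111111111111")
print(json.dumps(policy, indent=2))
```

A policy like this would be attached to the Role in the target account, so the MFA requirement is enforced by AWS rather than by client tooling.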
The other key building block is Amazon’s Security Token Service (STS), which generates temporary credentials. The general workflow with STS is to create a time-limited authentication “session”, which is then used for subsequent operations such as AssumeRole, or any other AWS API calls that the original user has permission to make.
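As a rough sketch of what an AssumeRole request carries, the snippet below assembles the request parameters without calling AWS. The parameter names (`RoleArn`, `RoleSessionName`, `DurationSeconds`, `SerialNumber`, `TokenCode`) are those of the STS AssumeRole API; the helper function, the session name, and the one-hour default are assumptions made for illustration, and the ARNs are the placeholder ones used elsewhere in this post.

```python
from typing import Optional


def assume_role_params(
    role_arn: str,
    session_name: str,
    duration_seconds: int = 3600,
    mfa_serial: Optional[str] = None,
    token_code: Optional[str] = None,
) -> dict:
    """Assemble the parameters for an STS AssumeRole request."""
    # An MFA device serial is only useful together with a current code.
    if (mfa_serial is None) != (token_code is None):
        raise ValueError("mfa_serial and token_code must be supplied together")

    params = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        # The temporary credentials expire after this many seconds.
        "DurationSeconds": duration_seconds,
    }
    if mfa_serial is not None:
        params["SerialNumber"] = mfa_serial
        params["TokenCode"] = token_code
    return params


params = assume_role_params(
    "arn:aws:iam::123456789:role/Administrator",
    "example-session",  # hypothetical session name
    mfa_serial="arn:aws:iam::123456789:mfa/lachlan",
    token_code="123456",
)
print(params)
```

With an SDK such as boto3, a dictionary like this could be passed as keyword arguments to the STS client's `assume_role` call, which returns a temporary access key, secret key, session token, and expiry time.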
Separating concerns into different AWS accounts
Our first step was to address our single monolithic AWS account by splitting it into multiple accounts, with a Bastion account containing all our developers and role-based delegation into all the other accounts. The excellent post “Your single AWS account is a serious risk” explains the thought process leading up to this. The advantage of this approach is that we can still give teams admin access to their entire sub-account, but limit access to other teams’ accounts on a more granular basis. Immediately, this strategy limits the amount of damage one set of admin credentials can do, either by accident or if leaked.
After the split, we have 11 different AWS accounts, split across 30 engineers. We looked at a variety of ways of automating configuration management for setting up the trust relationships and permissions. We ended up building a tool called IAMy that allows for all our AWS permissions and IAM configuration to be represented as a git-managed repository of YAML files, which suits our philosophy of “Infrastructure as code”. We use Bitium for SAML-based delegation elsewhere, but until we are at a larger scale, the benefits of the more granular control (particularly around security assertions based on MFA tokens) that hand-rolled IAM config provided outweighed the ease of management of a more sophisticated Identity Provider.
We’ll address the above in more detail in a subsequent post, once we’ve had some time to observe it “in the wild”.
Securing credentials on developer machines
Our biggest concern with the above changes was the complexity this introduced for developers. Thankfully, AWS’s CLI tools and SDKs have improved in leaps and bounds over the years and offer a lot of options to simplify exactly this architecture. Developers can use their IAM credentials to assume roles on each of the different accounts, either in CLI tools or via the AWS Web Console.
Physical access to credentials is somewhat mitigated for us as we required disk encryption for all laptops, which means that provided a strong password is used and the computer has locked before it’s accessed, it’s harder to lift credentials from a stolen laptop. We were more concerned about malware stealing credentials from disk.
We looked at a variety of existing solutions, including REA’s credulous (which I’m told is no longer in use) and AdRoll’s hologram, but none of the tools provided the right balance of security we wanted.
The key requirements from our point of view (and the AWS Security Credentials Best Practices) were:
- IAM credentials should never be exposed to third-party code
- Temporary security credentials should be generated for all operations
- MFA should be required as frequently as is pragmatic
- The user experience should be as close to the previous approach of storing creds in ~/.aws/credentials.
Introducing AWS Vault
As most of our engineers use OS X as their primary operating system, we looked to OS X’s Keychain to store credentials at rest. Many of us had been using pda’s aws-keychain, so we wanted something that was just as easy to use.
The tool we ended up creating is called aws-vault and is written in Go (which we are huge fans of at 99designs), with native bindings to OS X’s Keychain and Linux’s KWallet (Windows support coming soon). It reads the same configuration file that the aws-cli tool does, so it allows for nice progressive enhancement over the standard features that aws-cli offers around role switching and MFA.
    $ cat ~/.aws/config
    [profile 99designs]
    region = us-east-1

    [profile contests]
    region = us-east-1
    source_profile = 99designs
    role_arn = arn:aws:iam::123456789:role/ReadOnly

    [profile contests-admin]
    region = us-east-1
    source_profile = 99designs
    role_arn = arn:aws:iam::123456789:role/Administrator
    mfa_serial = arn:aws:iam::123456789:mfa/lachlan

    $ aws-vault add 99designs
    Enter Access Key Id: ABDCDEFDASDASF
    Enter Secret Key: %

    # Assume a read-only role
    $ aws-vault exec contests -- aws s3 ls
    bucket_1
    bucket_2

    # Assume an admin role for writes
    $ aws-vault exec contests-admin -- aws s3 cp llamas.jpg s3://testbucket
    Enter token for arn:aws:iam::123456789:mfa/lachlan: 123456
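To show how the profiles in that config file relate to each other, here is a small Python sketch that parses the same file contents and reports, for a given profile, where the long-lived credentials come from, which role is assumed, and whether MFA is needed. It is only an illustration of how the `source_profile`, `role_arn`, and `mfa_serial` keys fit together, not how aws-vault or aws-cli actually read the file; the function names are made up for this example.

```python
import configparser

CONFIG_TEXT = """\
[profile 99designs]
region = us-east-1

[profile contests]
region = us-east-1
source_profile = 99designs
role_arn = arn:aws:iam::123456789:role/ReadOnly

[profile contests-admin]
region = us-east-1
source_profile = 99designs
role_arn = arn:aws:iam::123456789:role/Administrator
mfa_serial = arn:aws:iam::123456789:mfa/lachlan
"""


def load_profiles(text: str) -> dict:
    """Parse config text into {profile_name: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    profiles = {}
    for section in parser.sections():
        # Sections are written as "[profile NAME]"; strip the prefix.
        prefix = "profile "
        name = section[len(prefix):] if section.startswith(prefix) else section
        profiles[name] = dict(parser[section])
    return profiles


def describe(profiles: dict, name: str) -> dict:
    """Summarise how credentials for a profile would be obtained."""
    profile = profiles[name]
    return {
        # Profiles without source_profile hold the credentials themselves.
        "credentials_from": profile.get("source_profile", name),
        "role_arn": profile.get("role_arn"),
        "needs_mfa": "mfa_serial" in profile,
    }


profiles = load_profiles(CONFIG_TEXT)
summary = describe(profiles, "contests-admin")
print(summary)
```

Only the `99designs` profile needs stored credentials; the other two borrow them and differ only in the role they assume and whether a token is prompted for.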
By default, there is a dedicated Keychain for AWS credentials, and Keychain prompts you when credentials are accessed.
Beyond the strong storage-at-rest, aws-vault generates short-lived session-based credentials to expose to sub-processes, and it encourages you to use the tool to run other tools rather than exporting credentials to your environment. This means that rogue Node.js packages have a harder time obtaining your credentials, and when they do, what they obtain is only valid for the lifetime of the session.
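The idea of exposing credentials only to a sub-process can be sketched in a few lines of Python. This is not aws-vault's implementation (which is written in Go); it assumes a session dictionary shaped like the credentials STS returns, uses obviously fake placeholder values, and relies on the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables that the AWS CLI and SDKs read. The `run_with_session` helper is hypothetical.

```python
import os
import subprocess
import sys


def run_with_session(command: list, session: dict) -> subprocess.CompletedProcess:
    """Run a command with temporary credentials set only in the child's
    environment, leaving the parent process environment untouched."""
    env = dict(os.environ)  # copy, so the parent's environment is not modified
    env.update(
        {
            "AWS_ACCESS_KEY_ID": session["AccessKeyId"],
            "AWS_SECRET_ACCESS_KEY": session["SecretAccessKey"],
            "AWS_SESSION_TOKEN": session["SessionToken"],
        }
    )
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Placeholder values standing in for a short-lived STS session.
fake_session = {
    "AccessKeyId": "placeholder-access-key",
    "SecretAccessKey": "placeholder-secret-key",
    "SessionToken": "placeholder-session-token",
}

# The child process stands in for a tool such as `aws s3 ls`.
child = [sys.executable, "-c", "import os; print(os.environ['AWS_SESSION_TOKEN'])"]
result = run_with_session(child, fake_session)
print(result.stdout.strip())
```

Because the credentials live only in the child's environment, they disappear when the command exits, and nothing is written to the shell's environment or to disk.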