The market for APIs has grown explosively in recent years, yet providers still struggle with protecting and hardening the APIs they expose to users. This is particularly difficult when you expose APIs from a cloud-based platform, given the constraints that cloud providers impose. To achieve it, you need a solution that provides hardening capabilities out of the box but still permits customization of granular settings to meet the nuances of a specific environment. If that is what you are after, this article walks through the main areas such a solution should cover.
Identify sensitive data and sensitivity of your API.
The first step in protecting sensitive data is identifying it as such. This could be PII (Personally Identifiable Information), PHI (Protected or Personal Health Information), or PCI (Payment Card Industry) data. Perform a complete analysis of the data flowing into and out of your API, including all parameters, to figure this out.
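As a rough illustration of that analysis, the sketch below scans the fields of a request or response payload for values that look like sensitive data. The pattern set, the field names, and the `find_sensitive_fields` helper are all hypothetical; a real classification effort would need far broader patterns and human review.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to cut down false positives on card-like numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def find_sensitive_fields(payload: dict) -> dict:
    """Map each field name to the kinds of sensitive data its value resembles."""
    findings = {}
    for field, value in payload.items():
        text = str(value)
        hits = set()
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                if label == "card_number" and not luhn_valid(match.group()):
                    continue
                hits.add(label)
        if hits:
            findings[field] = sorted(hits)
    return findings
```

Running this over sample inbound and outbound messages gives a first inventory of which parameters carry PII or PCI data; it does not replace a proper data classification exercise.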
Once identified, make sure only authorized people can access the data.
This requires solid identity, authentication, and authorization systems to be in place. These can all be provided by the same system. Your API should be able to identify multiple types and classes of identities. For an effective identity strategy, your system has to accept identities in older formats such as X.509, SAML, and WS-Security as well as the newer breed such as OAuth and OpenID. In addition, your identity system must mediate between these identities, acting as an identity broker, so it can securely and efficiently pass the credentials on to your API for consumption.
You should implement identity-based governance policies. These policies need to be enforced globally, not just locally. In practice, this means you must get predictable, reproducible results regardless of where you deploy your policies. Once the user is identified and authenticated, you can authorize the user based not only on that credential, but also on the location the invocation came from, the time of day, the day of the week, and so on. Furthermore, for highly sensitive systems, the data and the users can be classified as well, so that top secret data can be accessed only with top-level credentials. To build effective policies and govern them at run time, you need to integrate with a mature policy decision engine. It can be either standards-based, such as XACML, or integrated with an existing legacy system provider.
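To make the idea of context-aware authorization concrete, here is a minimal hand-rolled sketch. It is not XACML and not a real policy engine; the clearance levels, the `AccessRequest` fields, the allowed zones, and the business-hours rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical classification ladder; higher numbers mean more sensitive.
CLEARANCE_LEVELS = {"public": 0, "confidential": 1, "secret": 2, "top_secret": 3}


@dataclass(frozen=True)
class AccessRequest:
    user_clearance: str       # clearance attached to the authenticated identity
    data_classification: str  # classification of the data being requested
    network_zone: str         # where the invocation came from, e.g. "intranet"
    timestamp: datetime       # when the invocation happened


def is_authorized(request: AccessRequest,
                  allowed_zones=frozenset({"intranet"}),
                  business_hours=(8, 18)) -> bool:
    """Allow access only if clearance, location, weekday, and hour all pass."""
    if CLEARANCE_LEVELS[request.user_clearance] < CLEARANCE_LEVELS[request.data_classification]:
        return False
    if request.network_zone not in allowed_zones:
        return False
    if request.timestamp.weekday() >= 5:  # Saturday or Sunday
        return False
    start, end = business_hours
    return start <= request.timestamp.hour < end
```

Because the decision depends only on the request attributes, the same inputs produce the same answer wherever the function is deployed, which is the reproducibility property described above.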
Protect your data as if your business depends on it, because it often does. Make sure that sensitive data, whether in transit or at rest (in storage), is never left in its unprotected original format. There are multiple ways to protect data; the most common are encryption and tokenization.
With encryption, only authorized systems can decrypt the data back to its original form. The data circulates encrypted and is decrypted only where necessary, by secured steps along the way. This is a good solution for many companies, but you need to be careful about the encryption standard you choose and about your key management and key rotation policies.
The other approach, tokenization, is based on the fact that you can’t steal what is not there. You can tokenize almost anything, whether PCI, PII, or PHI information. The original data is stored in a secure vault, and a token (a pointer representing the data) is sent downstream in its place. If an unauthorized party gets hold of the token, they will not know where to go to get the original data, let alone have access to it. Even if they do know where the vault is located, they are not whitelisted, so the original data is not available to them. The greatest advantage of tokenization is that it reduces the exposure scope throughout your enterprise: by removing sensitive and critical data from the stream, you can concentrate your security effort on the stationary token vault rather than on active, dynamic data streams.
While you’re at it, consider a mechanism such as DLP (data loss prevention), which is highly effective at monitoring for sensitive data leakage and can automatically tokenize or encrypt sensitive data on its way out. You might also want to consider policy-based information traffic control. While certain groups of people may be allowed to access certain information (such as company financials, in the case of an auditor), they may not be allowed to send that information on. You can also enforce this based on where the invocation comes from (for example, intranet users and mobile users may be allowed to get different information).
I wrote a series of Context Aware Data Protection articles on this recently.
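The tokenization approach described above can be sketched in a few lines. This is an in-memory toy under stated assumptions: the `TokenVault` class, the `tok_` prefix, and the caller names are hypothetical, and a real vault would be a hardened, persistent, audited service.

```python
import secrets


class TokenVault:
    """Toy token vault: stores originals and hands out random tokens."""

    def __init__(self, allowed_callers):
        self._store = {}                      # token -> original value
        self._allowed = set(allowed_callers)  # whitelist of callers that may detokenize

    def tokenize(self, value: str) -> str:
        # The token is random, so it reveals nothing about the original value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._store[token] = value
        return token

    def detokenize(self, token: str, caller: str) -> str:
        # Only whitelisted callers can exchange a token for the original data.
        if caller not in self._allowed:
            raise PermissionError(f"{caller} is not allowed to detokenize")
        return self._store[token]
```

Downstream systems carry only the token; a service such as the hypothetical `billing-service` would be the only one whitelisted to call `detokenize`.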
While APIs exposed in the cloud let you lean on the platform’s scalability to absorb an expansion or a burst during peak hours, it is still a good architectural design principle to limit or rate-control access to your API. This is especially valuable if you are offering an open API that is exposed to anyone. There are two sides to this: a business side and a technical side. The technical side allows your APIs to be consumed in a controlled way, and the business side lets you negotiate better SLA contracts based on the usage data you have on hand.
You also need to have a flexible throttling mechanism. It should give you the following options: just notify, throttle the excessive traffic, or shape the traffic by holding the messages until the next sampling period starts. In addition, there should be a mechanism to monitor and manage traffic over both the long term and the short term, which can be governed by two different policies.
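The three options above (notify, throttle, shape) can be illustrated with a simple fixed-window counter. This is a sketch, not a production limiter: the `WindowLimiter` class and its `Mode` names are hypothetical, and the caller is assumed to pass in a monotonically increasing time in seconds.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    NOTIFY = "notify"      # forward everything, but record excess traffic
    THROTTLE = "throttle"  # drop traffic above the limit
    SHAPE = "shape"        # hold excess traffic until the next sampling period


class WindowLimiter:
    def __init__(self, limit: int, period_seconds: float, mode: Mode):
        self.limit = limit
        self.period = period_seconds
        self.mode = mode
        self.window_start = 0.0
        self.count = 0
        self.held = deque()  # messages waiting for the next period (SHAPE)
        self.alerts = []     # messages that exceeded the limit (NOTIFY)

    def _roll_window(self, now: float) -> list:
        """Start a new sampling period if due, releasing held messages first."""
        released = []
        elapsed = now - self.window_start
        if elapsed >= self.period:
            self.window_start += self.period * int(elapsed // self.period)
            self.count = 0
            while self.held and self.count < self.limit:
                released.append(self.held.popleft())
                self.count += 1
        return released

    def submit(self, message, now: float) -> list:
        """Return the messages that should be forwarded to the API right now."""
        forwarded = self._roll_window(now)
        if self.count < self.limit:
            self.count += 1
            forwarded.append(message)
        elif self.mode is Mode.NOTIFY:
            self.alerts.append(message)
            forwarded.append(message)
        elif self.mode is Mode.SHAPE:
            self.held.append(message)
        # Mode.THROTTLE: the excess message is simply dropped.
        return forwarded
```

Running two such limiters side by side, one with a short period and one with a long period, is one way to apply separate short-term and long-term policies.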
Protect your API.
Attacks on, or misuse of, your publicly exposed API can be intentional or accidental. Either way, you can’t afford for anyone to bring your API down. You need application-aware firewalls that can look into the application-level messages and prevent attacks. Generally, application attacks tend to fall under injection attacks (SQL injection, XPath injection, etc.), script attacks, or attacks on the infrastructure itself.
You must also provide both transport-level and message-level security features. While transport security features, such as SSL and TLS, provide some data privacy, you also need an option to encrypt and sign the message traffic itself, so that it reaches the end systems safely and securely and so that those systems can authenticate the end user who sent the message.
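As a small illustration of message-level protection that is independent of the transport, the sketch below signs a JSON message with an HMAC and verifies it on receipt. It uses a shared secret key as a stand-in for whatever signing scheme you actually adopt (for example, certificate-based signatures); the envelope layout and function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_message(payload: dict, key: bytes) -> dict:
    """Serialize the payload deterministically and attach an HMAC-SHA256 signature."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_message(envelope: dict, key: bytes) -> bool:
    """Recompute the signature over the received body and compare in constant time."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

An HMAC only proves that the sender holds the shared key and that the body was not altered; it does not encrypt the body, so confidentiality still depends on TLS or on separate message encryption.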
If you don’t collect metrics on the usage of your APIs by monitoring, you will be shooting blind. Unless you understand who is using the API, when, how they are using it, and the patterns of usage, it is going to be very hard to protect it. All of the above actions are built proactively based on certain assumptions. You need to monitor your traffic not only to validate those assumptions, but also to make sure you are ready to take reactive measures based on what is actually happening. This becomes critical in mitigating the risk for cloud-based API deployments.
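A minimal sketch of that kind of usage monitoring follows. It assumes each logged event is a dictionary with hypothetical `client_id`, `endpoint`, and `timestamp` keys, and it flags clients whose hourly volume exceeds an assumed expected maximum.

```python
from collections import Counter, defaultdict
from datetime import datetime


def summarize_usage(events, expected_max_per_hour: int) -> dict:
    """Aggregate who called what and when, and flag clients above the expected volume."""
    per_client_hour = Counter()
    endpoints_by_client = defaultdict(set)
    for event in events:
        # Bucket each call into the hour in which it happened.
        hour = event["timestamp"].replace(minute=0, second=0, microsecond=0)
        per_client_hour[(event["client_id"], hour)] += 1
        endpoints_by_client[event["client_id"]].add(event["endpoint"])
    over_limit = sorted(
        {client for (client, _), calls in per_client_hour.items() if calls > expected_max_per_hour}
    )
    return {
        "per_client_hour": dict(per_client_hour),
        "endpoints_by_client": {c: sorted(e) for c, e in endpoints_by_client.items()},
        "over_limit": over_limit,
    }
```

Comparing the `over_limit` list against the assumptions behind your throttling and authorization policies is one way to check whether those assumptions still hold.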
Andy is the Chief Architect & Group CTO for the Intel unit responsible for Cloud/Application security, API, Big Data, SOA and Mobile middleware solutions, where he is responsible for architecting API, SOA, Cloud, Governance, Security, and Identity solutions for their major corporate customers. In his role, he is responsible for helping Intel/McAfee field sales, technical teams and customer executives. Prior to this role, he held technology architecture leadership and executive positions with L-1 Identity Solutions, IBM (Datapower), BMC, CSC, and Nortel. His interests and expertise include Cloud, SOA, identity management, security, governance, and SaaS. He holds a degree in Electrical and Electronics engineering and has more than 25 years of IT experience.
He blogs regularly at www.thurai.net/securityblog