In the previous post, I walked you through the foundational steps to secure your Apigee proxies, covering essentials like encryption, input validation, and rate limiting. However, Apigee is a powerful platform, and securing it effectively requires more than just the basics.
In this follow-up, I’ll introduce five more advanced security techniques to help protect your APIs from more complex threats. Whether you’re managing sensitive data, integrating with external systems, or scaling your APIs, these steps will fortify your security posture.
Step 6: Monitor and Log Everything
Maintaining secure APIs requires constant monitoring of all activities within your system. While logging is essential, it’s crucial to be selective about what you log. Capturing every request, response, and event could lead to privacy concerns and potential data leakage, especially if sensitive information like API keys, tokens, or PII is included.
External Logging Systems
Since Apigee itself doesn’t persist logs, it relies on external systems for log storage and management. While Apigee does offer the Trace (Debug) Tool, it's primarily for runtime debugging purposes and not long-term log storage. For persistent logging, Apigee offers the MessageLogging policy.
With MessageLogging, you can log API activity to Google Cloud Logging or other external services like Splunk, Datadog, Dynatrace or Loggly. This allows you to track API performance and security incidents, ensuring that your logs are stored securely and offloaded to reliable third-party systems.
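As a minimal sketch, on Apigee X and hybrid a MessageLogging policy can write structured entries to Cloud Logging via the CloudLogging element (the log ID apiproxy-activity and the payload fields here are arbitrary examples; adjust them to your needs):
<MessageLogging name="ML-CloudLogging">
<CloudLogging>
<!-- LogName is a message template; "apiproxy-activity" is an example log ID. -->
<LogName>projects/{organization.name}/logs/apiproxy-activity</LogName>
<Message contentType="application/json">{"method": "{request.verb}", "path": "{proxy.pathsuffix}", "status": "{response.status.code}"}</Message>
</CloudLogging>
</MessageLogging>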
You can find detailed examples of MessageLogging policies here, but it’s essential to highlight one of its key characteristics: MessageLogging is asynchronous and not tied to the flow execution. Even if it fails (e.g., due to a connection or validation issue), Apigee Trace (Debug) will still show a successful execution, which makes troubleshooting difficult.
Using ServiceCallout for Reliable Logging
To ensure more control over the logging process, I recommend switching to a ServiceCallout for sending logs to external services via HTTPS. This approach offers synchronous logging, meaning you'll know if the logging failed and can react accordingly.
When using ServiceCallout, it’s crucial to secure the communication between your proxy and logging service. Enabling TLS ensures that logs are transmitted securely, preventing man-in-the-middle attacks.
<ServiceCallout name="SC-LogRequest">
<Request variable="logRequest">
<Set>
<Verb>POST</Verb>
<Headers>
<Header name="Content-Type">application/json</Header>
</Headers>
<Payload contentType="application/json">{...}</Payload>
</Set>
</Request>
<HTTPTargetConnection>
<URL>https://logging-service.example.com/logs</URL>
<SSLInfo>
<Enabled>true</Enabled>
<TrustStore>ref://myTrustStore</TrustStore>
<IgnoreValidationErrors>false</IgnoreValidationErrors>
<Protocols>
<Protocol>TLSv1.2</Protocol>
<Protocol>TLSv1.3</Protocol>
</Protocols>
</SSLInfo>
</HTTPTargetConnection>
</ServiceCallout>
This configuration ensures that all logs are transmitted securely over HTTPS, and the TrustStore guarantees that only certificates from trusted Certificate Authorities (CAs) are accepted. Restricting connections to TLSv1.2 and TLSv1.3 avoids outdated and vulnerable protocol versions.
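Because ServiceCallout is synchronous, a failed logging call is visible to the flow. If the policy runs with continueOnError="true", you can check its failure variable right after the callout and react, for example by raising a fault (the RaiseFault policy name below is hypothetical):
<Step>
<Name>RaiseFault.LoggingUnavailable</Name>
<Condition>servicecallout.SC-LogRequest.failed == true</Condition>
</Step>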
Real-World Example: How a Logging Misconfiguration Can Conceal Malicious Activity
During one of my assessments, I came across a particularly interesting vulnerability in an Apigee API. The issue was related to the way the MessageLogging policy was configured to send logs to an external logging service like Datadog.
The MessageLogging policy was set up to generate a JSON payload containing various metadata, such as timestamps, request methods, and more, which were then sent to Datadog for monitoring. One of the fields included in the JSON was the User-Agent header, which was directly fetched from the incoming request using the {request.header.user-agent} variable.
While this seems harmless, it created a significant vulnerability. An attacker could craft a malicious request with a specially formatted User-Agent value containing a double-quote ("), which would break the JSON structure. Here’s a simplified example of what the log entry would look like:
{
"timestamp": "2024-09-21T14:20:00Z",
"method": "GET",
"user-agent": "malicious-user-agent""
}
In this case, the double-quote at the end of the user-agent field disrupts the JSON format, causing Datadog to reject the log entry because it cannot be parsed correctly. This means that no logs would be recorded for this request, effectively allowing the attacker to hide their presence and activities from the logging system.
To prevent this kind of log tampering, the User-Agent header should be sanitized before being included in the log message. Apigee provides a function called escapeJSON(string), which safely escapes special characters like double-quotes in a JSON string.
Here’s how the MessageLogging policy should be modified to safely include the User-Agent header:
<MessageLogging name="Log-Request">
<Syslog>
<Message>
{private.datadogKey} {"timestamp": "{system.timestamp}", "method": "{request.verb}", "user-agent": "{escapeJSON(request.header.user-agent)}"}
</Message>
<Host><site_intake_endpoint></Host>
<Port><site_port></Port>
<Protocol>TCP</Protocol>
</Syslog>
</MessageLogging>
By wrapping request.header.user-agent in the escapeJSON function, the JSON payload is protected from breaking due to any malicious input. This ensures that all logs are correctly formatted and successfully received by Datadog, leaving no room for an attacker to hide their tracks.
Be cautious with your logging configuration
Attackers can exploit logging misconfigurations to evade detection, as seen in the example above. Always sanitize and validate any user input that is included in logs, especially when using Message Templates to dynamically generate log messages.
Step 7: Implementing Secure Caching
Apigee provides several policies for managing cache: PopulateCache for writing data to the cache, LookupCache for reading cached data, and InvalidateCache for removing entries from the cache. There is also the ResponseCache policy for caching entire responses. Proper use of caching can significantly enhance the performance and scalability of your APIs, but to use it securely, it’s essential to configure these policies correctly and avoid common pitfalls.
Sensitive Data as a Cache Key
Security frameworks and best practices emphasize the importance of protecting sensitive data in all forms, whether at rest or in transit. According to OWASP, PCI DSS, and other security standards, sensitive information must not be stored or transmitted in clear text or in places where it could be inadvertently exposed, such as cache keys or logs.
In Apigee, cache keys are stored "as is" in Cassandra, which means any sensitive data used as part of the cache key is directly stored without encryption or obfuscation. This is a critical security concern because if cache keys contain confidential information, such as PII or API tokens, they can easily be extracted and misused by anyone with access to the database.
Although the risk of data leakage from the Apigee cache stored in Cassandra is relatively low, I always adhere to the principle of "defense in depth" to ensure comprehensive protection. This means implementing multiple layers of security, even when the likelihood of an incident is minimal, to mitigate potential threats and provide additional safety nets.
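For example, rather than using a raw token as a key fragment, you can hash it first in a JavaScript policy. A minimal sketch using the crypto object from Apigee’s JavaScript object model (variable names are illustrative):
var token = context.getVariable("request.header.Authorization");
// Hash the raw token so the cache key never contains the secret itself.
var sha = crypto.getSHA256();
sha.update(token);
// Expose the digest as a flow variable to reference in a KeyFragment.
context.setVariable("cache.tokenHash", sha.digest());
Your PopulateCache and LookupCache policies can then reference cache.tokenHash in a KeyFragment instead of the raw value.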
Why Cache Access Should Follow Authentication and Authorization
When managing cache in Apigee, it’s vital to remember that all cache interactions should occur after the user has been properly authenticated and authorized. This ensures that only those who are permitted can access or manipulate cached data, preventing unauthorized access to potentially sensitive information.
For instance, let's say you're caching user-specific details, like account details or personalized settings. If the cache is accessed without verifying the user’s identity first, there’s a risk that unauthorized users could retrieve cached data intended for someone else. This not only compromises the security of your API but also goes against fundamental security principles of ensuring that data is only accessible to those who have the right to view it.
Let’s dive into a more detailed, real-world scenario to illustrate the risks of not properly handling authentication and authorization in conjunction with caching.
Scenario Breakdown
A user makes a request to an API proxy, providing an order number as a header (e.g., X-Order-ID: 123456789). The proxy performs all necessary authentication and authorization checks to verify the user’s identity and ensure they have the right to access the order information. After successfully authenticating the user, the proxy forwards the request to the backend system, retrieves the order details, and responds to the user.
To improve performance for subsequent requests, the proxy uses the PopulateCache policy to cache part of the response. The cache key is set to the order number (123456789) with a specific TTL (Time to Live), say 10 minutes. The caching process saves this data so that if the same user requests this information again, the proxy can respond faster by using the cached data.
<PopulateCache async="false" continueOnError="false" enabled="true" name="Populate-Order-Cache">
<CacheKey>
<KeyFragment>order</KeyFragment>
<KeyFragment ref="request.header.X-Order-ID"/>
</CacheKey>
<Scope>Exclusive</Scope>
<ExpirySettings>
<TimeoutInSeconds>600</TimeoutInSeconds> <!-- Cache expires in 10 minutes -->
</ExpirySettings>
<Source>flow.addressDetails</Source>
</PopulateCache>
A short time later, the same user or an attacker makes another request to the proxy, again providing the order number as a header (X-Order-ID: 123456789). However, this time, the LookupCache policy is executed before authentication to check whether the order information is already cached.
<LookupCache async="false" continueOnError="false" enabled="true" name="Lookup-Order-Cache">
<CacheKey>
<KeyFragment>order</KeyFragment>
<KeyFragment ref="request.header.X-Order-ID"/>
</CacheKey>
<Scope>Exclusive</Scope>
<AssignTo>flow.addressDetails</AssignTo>
</LookupCache>
The cached data is found, and the proxy returns the cached response to the requester without verifying their identity or permissions. This effectively exposes the cached order information to any requester who knows or guesses the order number.
Because the cache entry is valid for 10 minutes, an attacker has a 10-minute window to exploit this vulnerability. During this time, the attacker can send multiple requests with different order numbers to access various cached order details. If the order numbers are predictable or follow a sequential pattern, this becomes even easier for the attacker to enumerate and exploit.
To avoid this risk, it’s critical to implement caching with proper security measures:
Always Perform Authentication and Authorization Before Cache Access
Ensure that every request to the proxy is authenticated and authorized before any interaction with the cache. This means using policies like OAuthV2 or VerifyJWT before LookupCache or PopulateCache to confirm that the requester has the right to access the data.
<Flow name="GetOrderDetails">
<Description>Retrieve order details from cache or backend.</Description>
<Condition>(proxy.pathsuffix == "/orders") and (request.verb == "GET")</Condition>
<!-- Request handling steps -->
<Request>
<Step>
<Name>Decode-User-JWT</Name>
</Step>
<Step>
<Name>Verify-User-JWT</Name>
</Step>
<Step>
<Name>Lookup-Order-Cache</Name>
</Step>
</Request>
<!-- Response handling steps -->
<Response>
<Step>
<Name>Populate-Order-Cache</Name>
<Condition>lookupcache.Lookup-Order-Cache.cachehit == false</Condition>
</Step>
</Response>
</Flow>
Design Cache Keys Carefully
Use non-sensitive, user-specific identifiers or secure hashes that are unique to the authenticated user session. This helps prevent attackers from guessing cache keys based on order numbers or other predictable patterns.
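For instance, assuming the Verify-User-JWT policy from the flow above exposes the subject claim as jwt.Verify-User-JWT.decoded.claim.sub (the exact variable name depends on your JWT policy), the cache key can be bound to the authenticated user as well as the order number:
<PopulateCache async="false" continueOnError="false" enabled="true" name="Populate-Order-Cache">
<CacheKey>
<KeyFragment>order</KeyFragment>
<!-- Tie the entry to the authenticated user, not just the order number. -->
<KeyFragment ref="jwt.Verify-User-JWT.decoded.claim.sub"/>
<KeyFragment ref="request.header.X-Order-ID"/>
</CacheKey>
<Scope>Exclusive</Scope>
<ExpirySettings>
<TimeoutInSeconds>600</TimeoutInSeconds>
</ExpirySettings>
<Source>flow.addressDetails</Source>
</PopulateCache>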
Invalidate Cache on Logout or Session Expiry
Invalidate the cache entries related to a user session when the user logs out or their session expires. This limits the time window in which the data can be accessed without proper authentication.
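A sketch of the corresponding InvalidateCache policy, attached to your logout flow (same hypothetical key fragments as above):
<InvalidateCache async="false" continueOnError="false" enabled="true" name="Invalidate-Order-Cache">
<CacheKey>
<KeyFragment>order</KeyFragment>
<KeyFragment ref="jwt.Verify-User-JWT.decoded.claim.sub"/>
</CacheKey>
<Scope>Exclusive</Scope>
<!-- Purge every entry that shares this user's key prefix. -->
<PurgeChildEntries>true</PurgeChildEntries>
</InvalidateCache>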
CodeSent has you covered!
Using confidential data as cache keys or allowing access to the cache without proper authentication can expose your system to serious threats. CodeSent automatically identifies these vulnerabilities, ensuring that sensitive data like user PII or tokens isn’t misused and that your cache is protected from unauthorized access. Learn how CodeSent can help secure your APIs and protect your data from hidden threats.
Step 8: Implementing Robust Error Handling
When securing your Apigee API proxies, it's crucial to think about how errors are handled and communicated. A well-designed error handling strategy ensures that your APIs remain resilient, user-friendly, and secure, even when something goes wrong.
Why Error Handling Matters
Things don't always go as planned - network failures, misconfigurations, or even malformed requests can lead to errors. When this happens, it’s important not only to return a clear message to the user but also to avoid exposing sensitive internal information that could be used by attackers. Proper error handling provides a graceful way to deal with these situations, ensuring that your API remains reliable and secure.
Errors can arise in various ways within an API proxy. A client might send an invalid API key, a request could exceed the allowed rate limit, or a backend service could be down. In Apigee, such conditions are automatically caught by policies like VerifyAPIKey, Quota, or SpikeArrest. These policies trigger errors, which you can then handle with custom logic using FaultRules.
Other times, you might need to raise a custom error manually, for example, if a certain condition in the response from the backend signals a failure. This is where the RaiseFault policy comes into play, allowing you to trigger and manage custom errors based on specific conditions.
Designing Custom Error Messages
One of the most important aspects of error handling is the message that gets returned to the client. By default, Apigee may return messages that are too cryptic or too revealing - neither of which is ideal. A cryptic message doesn’t provide enough context to the user, leading to confusion and frustration. On the other hand, messages that are too revealing can expose internal system details, potentially aiding an attacker in crafting more sophisticated attacks.
For example, if a client sends an invalid API key, instead of simply returning a generic “401 Unauthorized” response, you could enrich it with additional context: “Your API key is invalid. Please ensure that it is correct and has not expired.” This kind of message is informative without being overly technical, providing guidance to the user while keeping your system's internal workings hidden.
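A RaiseFault policy returning exactly that kind of message might look like this (a sketch; the policy name is illustrative):
<RaiseFault name="RF-InvalidApiKey">
<FaultResponse>
<Set>
<StatusCode>401</StatusCode>
<Payload contentType="application/json">{"error": "Your API key is invalid. Please ensure that it is correct and has not expired."}</Payload>
</Set>
</FaultResponse>
<IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</RaiseFault>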
Using FaultRules and Default Fault Handling
In Apigee, the FaultRules configuration allows you to manage how different types of errors are handled. You can create specific rules for common errors, such as quota violations or invalid API keys. When these rules are defined, Apigee will use them to handle errors gracefully, returning custom messages and triggering any additional logic you've specified.
For example, if a Quota policy throws a QuotaViolation error, you might want to return a specific message explaining that the user has exceeded their usage limit. This can be done using an AssignMessage policy attached to a FaultRule:
<FaultRule name="QuotaViolationHandler">
<Condition>(fault.name = "QuotaViolation")</Condition>
<Step>
<Name>Assign-QuotaExceeded-Message</Name>
</Step>
</FaultRule>
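The Assign-QuotaExceeded-Message policy itself could be as simple as the following sketch (HTTP 429 is the conventional status code for rate limiting):
<AssignMessage name="Assign-QuotaExceeded-Message">
<Set>
<StatusCode>429</StatusCode>
<Payload contentType="application/json">{"error": "You have exceeded your usage limit. Please try again later."}</Payload>
</Set>
<AssignTo createNew="false" type="response"/>
<IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</AssignMessage>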
But what happens if an error occurs that you haven’t explicitly planned for? That’s where the DefaultFaultRule comes into play. This rule acts as a catch-all for any unhandled errors, allowing you to provide a generic response like: “An unexpected error occurred. Please contact support.”
<DefaultFaultRule name="DefaultFault">
<Step>
<Name>Assign-Message-503</Name>
</Step>
<AlwaysEnforce>true</AlwaysEnforce>
</DefaultFaultRule>
Handling Error Flow Variables in JavaScript Policies
In Apigee proxies, it’s common to use JavaScript policies to manipulate flow variables, perform validations, or execute complex logic. Often, these scripts set error variables to indicate an issue that occurred during execution. However, if these error variables are not checked later in the same flow phase, they may go unnoticed, and the proxy could continue processing the request as if everything is fine. This can lead to unexpected behavior, incorrect responses, or even security vulnerabilities.
Imagine a JavaScript policy sets an error variable like this:
context.setVariable("errorState", true);
If this error variable isn’t checked immediately after, the proxy might continue with subsequent steps, oblivious to the fact that something went wrong. Proper error handling requires you to catch this error state and handle it appropriately - either by raising a custom fault, logging the error, or both.
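For context, here’s a minimal sketch of such a JavaScript policy body (variable names are illustrative), recording the error state instead of discarding it:
try {
// Risky logic - e.g., parsing an incoming payload.
var payload = JSON.parse(context.getVariable("request.content"));
context.setVariable("parsedOrderId", payload.orderId);
context.setVariable("errorState", false);
} catch (e) {
// Signal the failure so a later step can raise a fault.
context.setVariable("errorState", true);
context.setVariable("errorMessage", e.toString());
}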
A simple way to address this is by adding a RaiseFault step after the JavaScript policy to check the error variable:
<Step>
<Name>RaiseFault.HandleJSError</Name>
<Condition>errorState == true</Condition>
</Step>
This ensures that any errors set by the JavaScript policy are immediately acted upon, preventing the proxy from processing faulty requests further.
Dealing with the continueOnError Attribute
In some scenarios, policies in Apigee might have the continueOnError attribute set to true. This attribute allows the flow to continue executing even if the policy encounters an error. While this can be useful for specific use cases, it also comes with risks. If the errors encountered by these policies are not properly handled later in the flow, they can lead to unintended consequences, such as incorrect responses or unhandled security issues.
For example, if a ServiceCallout policy fails because the backend service is unavailable and continueOnError is set to true, the proxy will continue to execute as if everything is normal. Since the ServiceCallout policy sets a failure variable (servicecallout.{policy_name}.failed), this variable should be checked in subsequent steps to ensure that the error is addressed:
<Step>
<Name>RaiseFault.HandleServiceCalloutFailure</Name>
<Condition>servicecallout.ServiceCallout-policy.failed == true</Condition>
</Step>
Don't let policy errors slip through the cracks!
When policies fail in Apigee, it’s crucial to catch and handle these errors before they lead to bigger issues. CodeSent ensures that any policy configured with continueOnError has proper error handling in place, so your proxies never continue under faulty conditions. Ensure your APIs fail securely and gracefully, every time. Check out the rules for catching unhandled policy errors and validating error flow variables to see how it can keep your proxies secure and reliable.
Step 9: Enforcing Code Quality and Managing Flow Variables
High-quality code in your Apigee proxies ensures better maintainability, readability, and performance. It reduces the chances of introducing vulnerabilities through misconfigurations or unhandled scenarios. Clear and well-documented code also simplifies the onboarding process for new team members and minimizes the potential for errors during updates or audits.
A particularly important area of code quality is the use and management of flow variables within Apigee. Flow variables are essential for passing data between policies, steps, and different parts of your proxy. However, if not handled correctly, they can introduce a range of issues - from minor bugs to critical security vulnerabilities.
Undefined flow variables, for example, can lead to unpredictable behavior when they are referenced before being properly initialized. This can disrupt the proxy flow and cause errors that are hard to trace. On the other hand, unused variables clutter the code, making it difficult to maintain and increasing the likelihood of overlooking potential issues.
Imagine a scenario where you have the following configuration in your proxy:
A SpikeArrest policy:
<SpikeArrest continueOnError="false" enabled="true" name="SpikeArrest.ArrestByClientID">
<DisplayName>SpikeArrest.ArrestByClientID</DisplayName>
<Identifier ref="apigee.client_id"/>
<Rate ref="verifyapikey.VerifyAPIKey.spike.arrest.value">500ps</Rate>
</SpikeArrest>
attached to a PreFlow after a VerifyAPIKey policy:
<PreFlow name="PreFlow">
<Request>
<Step>
<Name>VerifyApiKey.VerifyClientID</Name>
</Step>
<Step>
<Name>SpikeArrest.ArrestByClientID</Name>
</Step>
...
</Request>
...
</PreFlow>
At first glance, this configuration seems fine. But it contains a subtle error that can lead to a significant issue. According to the Apigee documentation, the flow variables populated by the VerifyAPIKey policy follow the pattern verifyapikey.{policy_name}.* - and the reference in the SpikeArrest policy above does not match the actual policy name.
In this case, the correct flow variable should be something like verifyapikey.VerifyClientID.spike.arrest.value because the VerifyApiKey policy populates variables under the pattern verifyapikey.VerifyClientID.*.
This discrepancy means that the SpikeArrest policy might not work as intended, since the variable verifyapikey.VerifyAPIKey.spike.arrest.value is never populated. Instead, the variable verifyapikey.VerifyClientID.spike.arrest.value should be used.
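With the reference corrected to match the actual policy name, the SpikeArrest policy becomes:
<SpikeArrest continueOnError="false" enabled="true" name="SpikeArrest.ArrestByClientID">
<DisplayName>SpikeArrest.ArrestByClientID</DisplayName>
<Identifier ref="apigee.client_id"/>
<Rate ref="verifyapikey.VerifyClientID.spike.arrest.value">500ps</Rate>
</SpikeArrest>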
This is not just a hypothetical example - I’ve seen it happen in real-world scenarios. A small mistake like this can lead to a proxy failing silently, with the SpikeArrest policy not throttling as expected. The proxy continues to receive requests without the intended protection, exposing backends to potential Denial-of-Service (DoS) attacks.
So, always verify that the flow variables you're using would be defined during the proxy runtime. A mismatch, as shown here, could lead to unexpected behaviors and potentially severe security vulnerabilities.
Don't let undefined flow variables disrupt your proxy!
Undefined flow variables can lead to unpredictable behavior and security vulnerabilities in your Apigee proxies. CodeSent helps you identify and address these issues, ensuring that your proxy flows are clean, efficient, and secure. Prevent misconfigurations before they happen and maintain optimal proxy performance.
Step 10: Leverage Automated Tools
When it comes to securing and maintaining Apigee proxies, relying solely on manual reviews and best practices is not enough. Automated tools are invaluable in detecting issues that may be overlooked, reducing the time spent on code reviews, and ensuring consistent application of security and quality standards.
Why Automated Tools Matter
Automated tools ensure that every proxy is reviewed with the same level of scrutiny, applying your security and quality rules uniformly across all projects. They can analyze thousands of lines of code in seconds, highlighting issues that would take a manual reviewer much longer to find. And by identifying potential issues early in the development cycle, you reduce the risk of security breaches and ensure your proxies are resilient against attacks!
Start with the Basics, but Aim for More
One popular tool is apigeelint, which helps enforce best practices and avoid anti-patterns in Apigee configurations. It’s a great starting point for maintaining proxy quality and ensuring your configurations follow the recommended guidelines. However, apigeelint focuses primarily on best practices, and while it’s excellent for keeping your configurations clean and avoiding common pitfalls, it doesn’t delve deeply into security issues or complex misconfigurations that can put your APIs at risk. So, if you want to go beyond basics and ensure comprehensive security and vulnerability detection, you’ll need a more robust solution like CodeSent.
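For reference, apigeelint is distributed via npm and runs against a proxy bundle from the command line (see the project’s README for the full flag list):
npm install -g apigeelint
apigeelint -s ./apiproxy -f table.js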
CodeSent: The Next Level of Apigee API Security
CodeSent is designed to address the limitations of traditional tools, offering a deeper and more comprehensive analysis of your Apigee proxies. Here’s why it stands out:
- Flow-Sensitive Static Analysis: This unique feature enables CodeSent to understand the execution flow within your proxies, catching vulnerabilities that may be deeply embedded and overlooked by other tools.
- Comprehensive Flow Variable Analysis: CodeSent provides an in-depth view of how data is manipulated throughout the proxy, helping identify potential data leaks and security gaps related to improper variable handling.
- Data Tagging and Contextual Analysis: By tagging sensitive data and analyzing its use across different contexts, CodeSent ensures that your proxies adhere to strict security policies.
- SharedFlow Context Awareness: CodeSent can analyze how shared flows interact with your proxies, giving you a complete picture of your API security posture.
- Automatic CVSS Scoring and CWE Mapping: CodeSent offers automated vulnerability scoring and mapping to CWE identifiers, helping you prioritize issues and align with industry standards.
These features make CodeSent an essential tool for teams that want to ensure their Apigee proxies are not only following best practices but are also secure and resilient against potential threats.
Ready to take your Apigee security to the next level? Get Started with CodeSent!