18 February 2022

Using Semgrep to find security issues and misconfigurations in AWS Cloud Development Kit projects

Learn how to find security issues and misconfigurations in AWS Cloud Development Kit projects with Semgrep.

Dakota Riley, Principal Security Engineer
  • The AWS CDK provides a helpful infrastructure as code abstraction layer for developers. However, nearly all current IaC security scanning tools scan the generated CloudFormation output, which adds an extra step to fixing issues: mapping each finding back to the CDK code that produced it.

  • In this post, I walk through how to write Semgrep rules to find issues directly in AWS CDK code, using some open source rules I’ve contributed as examples. I’ll also show how Semgrep can enforce usage of company-specific custom constructs, enabling cloud security teams to define secure by default primitives that developers can use.

Intro

Many teams today are taking advantage of Infrastructure-As-Code (IaC) and its numerous automation and security benefits. Due to the declarative nature of IaC tools like CloudFormation, Terraform, or Kubernetes manifests, Static Code Analysis is a reliable technique to automatically identify security misconfigurations in IaC.

A recent trend in the IaC landscape is the emergence of frameworks that use general-purpose programming languages, like Python and JavaScript/TypeScript, to define infrastructure, rather than data expression languages like JSON or YAML. Two examples of this are the AWS Cloud Development Kit (CDK) and Pulumi. These frameworks provide all the features of a Turing-complete programming language while still offering declarative-style resource definitions.

In this blog, we will explore applying static code analysis with Semgrep to the AWS Cloud Development Kit to find security misconfigurations.

What is the AWS CDK?

The AWS Cloud Development Kit (CDK) is an IaC framework that allows you to define your cloud resources with familiar programming languages and an object-oriented approach.

The CDK uses “constructs” that represent AWS resources. AWS CDK constructs come in two flavors: L1 constructs that map 1:1 to AWS CloudFormation resources, and L2 constructs that provide higher-level, intent-based interfaces. The following example shows how we can create an S3 bucket using the AWS CDK L2 Bucket construct:

import * as s3 from '@aws-cdk/aws-s3';
import * as cdk from '@aws-cdk/core';

export class CdkStarterStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bucket = new s3.Bucket(this, 's3-bucket', {
      encryption: s3.BucketEncryption.S3_MANAGED,
      enforceSSL: true
    })
  }
}
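To make the L1/L2 distinction concrete, the sketch below shows roughly what the single enforceSSL: true property expands to at the CloudFormation level: a deny statement in an AWS::S3::BucketPolicy for any request that is not sent over TLS. This is a hand-written approximation based on our understanding of the CDK v1 Bucket construct, not output copied from cdk synth. The PolicyStatement interface and the sslOnlyStatement helper are hypothetical, and the real template references the bucket ARN through CloudFormation intrinsic functions rather than a literal string.

```typescript
// Hypothetical shape of a single bucket policy statement (simplified).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Principal: { AWS: string };
  Action: string;
  Resource: string[];
  Condition: Record<string, Record<string, string>>;
}

// Approximation of the statement the L2 Bucket adds when enforceSSL is true:
// deny every S3 action on the bucket and its objects unless the request uses TLS.
function sslOnlyStatement(bucketArn: string): PolicyStatement {
  return {
    Effect: "Deny",
    Principal: { AWS: "*" },
    Action: "s3:*",
    Resource: [bucketArn, `${bucketArn}/*`],
    Condition: { Bool: { "aws:SecureTransport": "false" } },
  };
}

// The bucket ARN here is a made-up example value.
const stmt = sslOnlyStatement("arn:aws:s3:::example-bucket");
console.log(JSON.stringify(stmt, null, 2));
```

One boolean on the L2 construct stands in for a separate policy resource and statement that would otherwise have to be written by hand.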

L2 constructs are one of the biggest value propositions of the AWS CDK. They often take something that would require coordinating multiple CloudFormation resources and condense it into an easy-to-reason-about package. Another great example of this is the AWS CDK's L2 Vpc construct. If you are familiar with AWS, you likely know that a working VPC involves several moving parts, such as subnets, route tables, and gateways. In raw CloudFormation JSON, each of these would have to be declared explicitly. With the AWS CDK, the VPC is a single entity that is still configurable via parameters and methods:

import * as ec2 from '@aws-cdk/aws-ec2';
import * as cdk from '@aws-cdk/core';

export class CdkStarterStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'VPC', {
        cidr: '10.0.0.0/21',
        maxAzs: 1
    });
  }
}

The code above produces a VPC with a public and a private subnet (maxAzs: 1 limits it to a single Availability Zone), along with the routing and gateway resources those subnets need, all of which can be further configured. For more information, check out the Vpc construct's entry in the AWS CDK API Reference. The AWS CDK construct library is designed to reduce the “cognitive lift” required for a developer to build AWS infrastructure. There are numerous other benefits, such as static typing and IntelliSense/autocomplete, but those go beyond the scope of this article.
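To put a rough number on those moving parts, the sketch below lists the CloudFormation resource types we would expect the Vpc example above to synthesize, assuming CDK v1 defaults (one public and one private subnet per Availability Zone, and one NAT gateway per AZ). The list is our approximation, not captured cdk synth output, so run cdk synth against your own CDK version to see the exact template.

```typescript
// Approximate CloudFormation resource types behind `new ec2.Vpc(...)` with maxAzs: 1,
// assuming CDK v1 default subnet configuration (hypothetical, not synth output).
const vpcResources: string[] = [
  "AWS::EC2::VPC",
  "AWS::EC2::InternetGateway",
  "AWS::EC2::VPCGatewayAttachment",
  // Public subnet: its own route table, a default route to the internet gateway,
  // plus the Elastic IP and NAT gateway that the private subnet routes through.
  "AWS::EC2::Subnet",
  "AWS::EC2::RouteTable",
  "AWS::EC2::SubnetRouteTableAssociation",
  "AWS::EC2::Route",
  "AWS::EC2::EIP",
  "AWS::EC2::NatGateway",
  // Private subnet: its own route table with a default route to the NAT gateway.
  "AWS::EC2::Subnet",
  "AWS::EC2::RouteTable",
  "AWS::EC2::SubnetRouteTableAssociation",
  "AWS::EC2::Route",
];

// Count how many resources of each type the template would contain.
function countByType(types: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  types.forEach((t) => counts.set(t, (counts.get(t) ?? 0) + 1));
  return counts;
}

const counts = countByType(vpcResources);
counts.forEach((n, type) => console.log(`${n} x ${type}`));
console.log(`total: ${vpcResources.length}`);
```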

What is Semgrep?

Semgrep (short for Semantic Grep) is a fast, lightweight, and open-source static analysis tool for finding bugs and automating code reviews. Semgrep can be run against codebases locally via an IDE, in CI systems, or externally via the Semgrep GitHub app. One of the biggest benefits of Semgrep is its highly accessible custom rule system. Most other tools that offer custom rules require either a deep working knowledge of how Abstract Syntax Trees work, learning a custom DSL, or both! With Semgrep, rules look almost exactly like the code you want to find, so if you can code, you can write rules!

For a quick example of how Semgrep works, take the following slightly fun, hypothetical Python code snippet:

from doomsday import start_skynet

def main():
    print('Simulating what would happen if we started Skynet')
    start_skynet(simulation=True)

    print('Actually starting Skynet')
    start_skynet()

In case the reference didn’t land - Skynet is very bad

In the above example, we want to enforce that engineers don’t call the start_skynet() function without the simulation=True parameter, for obvious reasons. Because we know Python and know what the bad code in question looks like, we can write the following rule:

# rule.yml
rules:
- id: skynet_started
  patterns:
    - pattern: start_skynet(...)
    - pattern-not: start_skynet(..., simulation=True, ...)
  message: You are starting skynet! Do you actually want to end the world? Add simulation=True parameter
  languages: [python]
  severity: ERROR

The rule above would find the bad implementations we are looking for:

$ scratch % semgrep --config rule.yml
Running 1 rules...
skynet.py
rule:skynet_started: You are starting skynet! Do you actually want to end the world? Add simulation=True parameter
8:    start_skynet()
ran 1 rules on 1 files: 1 findings

This example only scratches the surface of Semgrep’s capabilities. For those already familiar with static analysis, Semgrep is “code aware”: it goes beyond textual pattern matching and supports features like constant propagation, import aliases, and metavariables (think regex capture groups). See the Semgrep documentation for a full list of features.

Why apply static analysis to the AWS CDK?

Now, why would we apply static analysis directly to the AWS CDK? Because the AWS CDK ultimately synthesizes AWS CloudFormation, a common pattern is to synthesize the templates and then pass them to a tool that understands CloudFormation (CfnGuard, Checkov, cfn-nag, and Snyk IaC are among the many tools available for this). A few thoughts on this point:

  • Speaking the language of the developers. In a past life, we worked with many development teams whose first IaC experience was with the AWS CDK. As a result, most didn’t know or care about the underlying CloudFormation it produced. We (the security team) found we had much more traction in resolving issues at the IaC level when we explained how to fix them in the AWS CDK project itself, as opposed to yelling about a CloudFormation-level issue. Anything we can do to make security issues easier to fix is a huge plus. We can liken the CDK to a compiler that outputs CloudFormation. We wouldn’t yell at developers about assembly-level issues, right? We would address them at the code level. The same concept applies to the CDK.

  • Taking advantage of the CDK’s powerful abstractions. The AWS CDK’s powerful but readable abstractions over AWS resources create cases where we may be able to identify security issues more easily. In the earlier Bucket example, encryption at rest and encryption in transit are each a single, readable property (encryption and enforceSSL), so a rule can check for them directly.

  • Lastly, to experiment and see if we can produce better security outcomes instead of sticking with the “tried and true”. This blog is about exploring whether this is a feasible approach.

At the time of writing, we were unable to find any tooling that performs static code analysis directly against the AWS CDK. CDK-Nag is another awesome security tool for the AWS CDK, but it operates more as a “runtime” check: it requires you to run cdk synth to get results, which in turn requires a buildable environment. In addition, CDK-Nag “walks” the construct tree of the stack, reading the underlying CloudFormation-level settings of each construct rather than looking directly at the code of the L2 construct itself, so it is a different approach than the one we are taking here.

Writing Semgrep rules to find AWS CDK Security Issues

The first step of writing a Semgrep rule is to know what bad things we want to look for. We will cover two specific classes of issues related to the CDK:

  • Specific misconfigurations with the AWS Construct Library L2 Constructs
  • Enforcing that your team utilizes a custom construct with secure defaults, rather than the out-of-the-box configurations.

CDK high level construct misconfigurations

Below are 5 common security issues I wanted to hunt for with L2 Constructs. These can be found in the Semgrep Registry.

  • CodeBuild Project constructs with badge: true (Will make the project public)
  • S3 Bucket constructs lacking an encryption property with a valid setting (Will create a bucket without default encryption enabled)
  • S3 Bucket constructs lacking the enforceSSL: true property (Will create the bucket without adding a bucket policy statement that enforces encryption in transit)
  • Calling the grantPublicAccess() method on Bucket constructs (Will make the bucket publicly accessible)
  • SQS Queue constructs lacking an encryption property with a valid setting (Will create an SQS queue without encryption at rest enabled)
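Setting the source-code angle aside for a moment, the property-level logic behind the first, second, third, and fifth checks can be sketched over plain objects. This is only a hypothetical model: the types ending in Like and the finding names are made up for illustration, the enum values are written as strings, and the real Semgrep rules match on CDK source code rather than on runtime values. The grantPublicAccess() check is left out because it concerns a method call rather than a property.

```typescript
// Simplified, hypothetical stand-ins for a few CDK v1 props.
type BucketEncryptionLike = "UNENCRYPTED" | "S3_MANAGED" | "KMS_MANAGED" | "KMS";
type QueueEncryptionLike = "UNENCRYPTED" | "KMS_MANAGED" | "KMS";

interface BucketPropsLike { encryption?: BucketEncryptionLike; enforceSSL?: boolean; }
interface QueuePropsLike { encryption?: QueueEncryptionLike; }
interface ProjectPropsLike { badge?: boolean; }

// A missing encryption property, or an explicit UNENCRYPTED value, is not a "valid setting".
function lacksEncryption(value: string | undefined): boolean {
  return value === undefined || value === "UNENCRYPTED";
}

function checkBucket(props: BucketPropsLike): string[] {
  const findings: string[] = [];
  if (lacksEncryption(props.encryption)) findings.push("bucket-missing-encryption");
  if (props.enforceSSL !== true) findings.push("bucket-missing-enforce-ssl");
  return findings;
}

function checkQueue(props: QueuePropsLike): string[] {
  return lacksEncryption(props.encryption) ? ["queue-missing-encryption"] : [];
}

function checkProject(props: ProjectPropsLike): string[] {
  return props.badge === true ? ["codebuild-badge-enabled"] : [];
}

console.log(checkBucket({}));                                            // two findings
console.log(checkBucket({ encryption: "S3_MANAGED", enforceSSL: true })); // no findings
console.log(checkQueue({ encryption: "UNENCRYPTED" }));                   // one finding
console.log(checkProject({ badge: true }));                               // one finding
```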

NOTE: If you want to follow along, you can either download the Semgrep CLI and run it locally, or use the Semgrep Playground, which provides everything you need in the browser to build and test Semgrep rules.

Let’s look at the process of developing a rule for the “Calling the grantPublicAccess() method on Bucket constructs” issue. We first need to know what good code and bad code look like in this case. For those familiar with Test Driven Development, we will write a test file before the rule logic. First, let’s create a rule template with basic things like the rule id and metadata:

# bucketpublic.yaml
rules:
 - id: awscdk-bucket-grantpublicaccessmethod
   patterns: #purposely blank - for now
   message: Using the GrantPublicAccess method on bucket construct $X will make the objects in the bucket world accessible. Verify if this is intentional.
   languages: [ts]
   severity: WARNING
   metadata:
     cwe: 'CWE-306: Missing Authentication for Critical Function'
     category: security
     technology:
     - AWS-CDK
     references:
     - https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-overview.html

Semgrep rules are defined in YAML. Each rule contains an id, one or more pattern fields, the message presented with a finding, and optional metadata fields, which are super helpful for the people responsible for fixing the finding. You will notice that the patterns field is empty; we will come back to this later. Now, we can write the code for a true positive and a true negative occurrence of this issue:

import * as cdk from '@aws-cdk/core';
import * as s3 from '@aws-cdk/aws-s3';

export class CdkStarterStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // ruleid:awscdk-bucket-grantpublicaccessmethod
    const publicBucket1 = new s3.Bucket(this, 'public-bucket')
    console.log('something unrelated')
    publicBucket1.grantPublicAccess()

    // ok:awscdk-bucket-grantpublicaccessmethod
    // (construct IDs must be unique within a scope, so each bucket gets its own ID)
    const nonPublicBucket = new s3.Bucket(this, 'non-public-bucket')
  }
}

In the code snippet above, we create two buckets, publicBucket1 and nonPublicBucket, and then call the grantPublicAccess() method on the first one, making it publicly accessible. Note the comments above each snippet: // ruleid:... and // ok:.... Semgrep's rule-testing functionality lets us write what are effectively unit tests for our rules: code that should produce a finding gets a // ruleid:YOURRULEID comment on the line above it, and code that should not gets a // ok:YOURRULEID comment. We can then use either the Semgrep CLI's --test flag or the Semgrep Playground to make sure our rule behaves as expected while we develop it. Check out the docs for testing rules in Semgrep for more information. Now that we have both a true positive and a true negative scenario, we can finally begin to develop our rule!

The patterns key we left blank earlier is where we will write our rule. First, a quick primer on patterns:

  • A “pattern” in Semgrep is the code you want to find, with some added capabilities:
    • The ellipsis operator ... matches any sequence of arguments, statements, parameters, etc. This is great when we don’t know or care what a particular piece of code will look like; for example, matching a function call whether or not any arguments are passed.
    • Semgrep supports metavariables, which bind to a piece of code (such as a variable name) so the rule can check whether the same value appears again later. Metavariables are written with a $ prefix, like $X or $VALUE. A typical use is capturing the variable assigned from a certain function call and then checking whether a specific method is called on it. Hint: We will definitely need this for our rule!
  • Semgrep rules support different types of pattern operators, which let us combine multiple patterns with boolean logic to achieve the desired result. A few examples:
    • pattern - matches a single pattern
    • patterns - acts as a logical AND: all contained patterns must match to produce a finding
    • pattern-either - acts as a logical OR: at least one contained pattern must match to produce a finding
    • pattern-not - acts as a logical NOT: removes any match of the given pattern from the results
  • Pattern operators can be nested, and each rule must contain at least one pattern operator to be valid. Keep in mind that while nesting lets you express incredibly complex logic, it can affect the rule’s performance.
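The boolean behavior of these operators can be sketched as a toy model in TypeScript, reusing the Skynet rule from earlier. Here each pattern is simply a predicate over a candidate snippet; real Semgrep matches against the program’s syntax tree and tracks source ranges, so treat the all, either, and not helpers (all hypothetical) as an illustration of the logic only.

```typescript
// Toy model: a "pattern" is a predicate over a candidate code snippet.
type Pattern = (snippet: string) => boolean;

const all = (...ps: Pattern[]): Pattern => (s) => ps.every((p) => p(s));    // like `patterns`
const either = (...ps: Pattern[]): Pattern => (s) => ps.some((p) => p(s));  // like `pattern-either`
const not = (p: Pattern): Pattern => (s) => !p(s);                           // like `pattern-not`

// Hypothetical stand-ins for `start_skynet(...)` and `start_skynet(..., simulation=True, ...)`.
const callsStartSkynet: Pattern = (s) => /^start_skynet\(.*\)$/.test(s);
const passesSimulation: Pattern = (s) => /\bsimulation=True\b/.test(s);

// The skynet_started rule: calls start_skynet AND does NOT pass simulation=True.
const skynetRule = all(callsStartSkynet, not(passesSimulation));

const candidates = ["start_skynet(simulation=True)", "start_skynet()", "print('hi')"];
const findings = candidates.filter(skynetRule);
console.log(findings); // only the bare start_skynet() call

// pattern-either: either condition is enough, so both start_skynet calls match.
console.log(candidates.filter(either(callsStartSkynet, passesSimulation)));
```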

This blog only scratches the surface of Semgrep’s capabilities; check out both the pattern syntax and rule syntax for deeper information.

To start writing our rule, we can break it into a few pieces:

  1. The code imports the @aws-cdk/aws-s3 module
  2. We instantiate a Bucket construct
  3. We call the grantPublicAccess() method on that construct later in the code

All three conditions are necessary for a match (logical AND), so we can list them as separate patterns under a patterns key. The pattern-inside operator restricts matches to code that sits inside the given pattern, so we can use it to require that the file imports the @aws-cdk/aws-s3 module:

# bucketpublic.yaml
patterns:
- pattern-inside: |
    import * as $Y from '@aws-cdk/aws-s3'
    ...

The above pattern is a workaround. For other languages, matching the import statement isn’t necessary if you specify the full path of the module, but at the time of writing this feature is a work in progress for JavaScript/TypeScript. As a stopgap, we use the $Y metavariable to capture the import alias of the @aws-cdk/aws-s3 module so we can reference it in our pattern. The ellipsis operator on the following line then matches any code after the import statement.

Now, we can utilize a single pattern to catch the bucket instantiation and method call:

# bucketpublic.yaml
patterns:
- pattern-inside: |
    import * as $Y from '@aws-cdk/aws-s3'
    ...
- pattern: |
    const $X = new $Y.Bucket(...)
    ...
    $X.grantPublicAccess(...)

This pattern uses two metavariables: $X captures the variable name of the Bucket, and $Y must match the import alias we captured earlier. The ellipsis operator between the first and last lines matches any code the developer may, or may not, have inserted in between. Finally, we reference the captured $X metavariable to match a call to the grantPublicAccess() method on that same bucket, with the ellipsis operator in place of the arguments so the pattern matches whether or not arguments are passed. Adding the above to our rule template, we get the following:

# bucketpublic.yaml
rules:
 - id: awscdk-bucket-grantpublicaccessmethod
   patterns:
   - pattern-inside: |
      import * as $Y from '@aws-cdk/aws-s3'
      ...
   - pattern: |
       const $X = new $Y.Bucket(...)
       ...
       $X.grantPublicAccess(...)
   message: Using the GrantPublicAccess method on bucket construct $X will make the objects in the bucket world accessible.
     Verify if this is intentional.
   languages: [ts]
   severity: WARNING
   metadata:
     cwe: 'CWE-306: Missing Authentication for Critical Function'
     category: security
     technology:
     - AWS-CDK
     references:
     - https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-overview.html
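Because metavariables behave a lot like regex capture groups, a rough regex approximation of this rule can help show what $X and $Y are doing: group 1 stands in for $Y, group 2 for $X, and the backreferences \1 and \2 require the same alias and variable name to appear again later. This is a hypothetical comparison sketch (findPublicBucket is not part of Semgrep), not how Semgrep works internally, and the test inputs show why a purely textual match is fragile.

```typescript
// Group 1 ~ $Y (import alias), group 2 ~ $X (bucket variable).
// \1 and \2 are backreferences, like reusing a metavariable later in a pattern.
const grantPublicAccessRegex =
  /import \* as (\w+) from '@aws-cdk\/aws-s3'[\s\S]*?const (\w+) = new \1\.Bucket\([\s\S]*?\b\2\.grantPublicAccess\(/;

// Returns the captured bucket variable name, or null when there is no match.
function findPublicBucket(source: string): string | null {
  const match = grantPublicAccessRegex.exec(source);
  return match ? match[2] : null;
}

const header = "import * as s3 from '@aws-cdk/aws-s3';";

const vulnerable = [header,
  "const publicBucket1 = new s3.Bucket(this, 'public-bucket')",
  "console.log('something unrelated')",
  "publicBucket1.grantPublicAccess()"].join("\n");

const safe = [header,
  "const nonPublicBucket = new s3.Bucket(this, 'non-public-bucket')"].join("\n");

// Textual matching pitfalls: a commented-out call still matches (false positive)...
const commentedOut = [header,
  "const publicBucket1 = new s3.Bucket(this, 'public-bucket')",
  "// publicBucket1.grantPublicAccess()"].join("\n");

// ...and different spacing around `=` is missed (false negative).
const respaced = [header,
  "const publicBucket1=new s3.Bucket(this, 'public-bucket')",
  "publicBucket1.grantPublicAccess()"].join("\n");

console.log(findPublicBucket(vulnerable), findPublicBucket(safe));
console.log(findPublicBucket(commentedOut), findPublicBucket(respaced));
```

Both pitfalls come from treating code as plain text. Semgrep matches on parsed code, where comments are not part of the program and whitespace is not significant.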

The finished rule is also available in the Semgrep Playground (linked at the end of this post). Now, test the rule using either the Semgrep Playground or the Semgrep CLI:

$ semgrep % semgrep --quiet --test
1 yaml files tested
check id scoring:
--------------------------------------------------------------------------------
(TODO: 0) bucketpublic.yaml
	✔ awscdk-bucket-grantpublicaccessmethod                        TP: 1 TN: 1 FP: 0 FN: 0
--------------------------------------------------------------------------------
final confusion matrix: TP: 1 TN: 1 FP: 0 FN: 0
--------------------------------------------------------------------------------
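The confusion matrix in the test output comes from comparing the annotations in the test file with the findings the rule actually produced. The sketch below models that bookkeeping in a simplified way: parseAnnotations and score are hypothetical helpers, not Semgrep’s implementation, and we assume each annotation applies to the line directly below it and that the rule reported a single finding starting on the const publicBucket1 line.

```typescript
type Expectation = "ruleid" | "ok";

interface Score { TP: number; TN: number; FP: number; FN: number; }

// Collect `// ruleid:<id>` and `// ok:<id>` annotations. Each annotation
// describes the line immediately after it (stored as a 1-based line number).
function parseAnnotations(source: string, ruleId: string): Map<number, Expectation> {
  const expected = new Map<number, Expectation>();
  source.split("\n").forEach((line, index) => {
    const match = line.trim().match(/^\/\/\s*(ruleid|ok):\s*(\S+)$/);
    if (match !== null && match[2] === ruleId) {
      // index is 0-based, so the annotated (next) line is index + 2 in 1-based terms.
      expected.set(index + 2, match[1] as Expectation);
    }
  });
  return expected;
}

// Compare expected outcomes with the lines where the rule reported findings.
function score(expected: Map<number, Expectation>, findingLines: Set<number>): Score {
  const result: Score = { TP: 0, TN: 0, FP: 0, FN: 0 };
  expected.forEach((kind, line) => {
    const found = findingLines.has(line);
    if (kind === "ruleid") {
      if (found) result.TP++; else result.FN++;
    } else {
      if (found) result.FP++; else result.TN++;
    }
  });
  // A finding on a line with no annotation at all is also unexpected.
  findingLines.forEach((line) => {
    if (!expected.has(line)) result.FP++;
  });
  return result;
}

const sample = [
  "// ruleid:awscdk-bucket-grantpublicaccessmethod",
  "const publicBucket1 = new s3.Bucket(this, 'public-bucket')",
  "publicBucket1.grantPublicAccess()",
  "",
  "// ok:awscdk-bucket-grantpublicaccessmethod",
  "const nonPublicBucket = new s3.Bucket(this, 'non-public-bucket')",
].join("\n");

// Assume the rule reported one finding starting on line 2.
const matrix = score(parseAnnotations(sample, "awscdk-bucket-grantpublicaccessmethod"), new Set([2]));
console.log(matrix); // { TP: 1, TN: 1, FP: 0, FN: 0 }
```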

Secure defaults by enforcing usage of custom constructs

The second “category” of issues mentioned earlier in this blog is enforcing best practices or team standards via custom constructs, and writing Semgrep rules to ensure those standards are followed. The AWS CDK allows you to author custom constructs, which are analogous to modules in other IaC frameworks. It is common for companies or teams to create “secure by default” modules for IaC. Semgrep encourages teams to “… create and enforce code guardrails”. To give a practical example, let’s say that the team’s lead software engineer has developed a construct called SecureBucket, which encapsulates a number of secure defaults over the base Bucket construct:

import * as cdk from '@aws-cdk/core'
import * as s3 from '@aws-cdk/aws-s3'

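export class SecureBucket extends s3.Bucket {
    constructor(scope: cdk.Construct, id: string, props?: s3.BucketProps){
        super(scope, id, {
            // Keep any other options the caller passes (e.g. versioned)...
            ...props,
            // ...but apply the secure defaults last so callers cannot override them.
            encryption: s3.BucketEncryption.S3_MANAGED,
            enforceSSL: true,
            // Derive the construct ID from `id` so several SecureBuckets can share a scope.
            serverAccessLogsBucket: s3.Bucket.fromBucketName(scope, `${id}LoggingBucket`, 'AccountLoggingbucket'),
            blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL
        })
    }
}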
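When a secure-by-default construct also accepts caller props, the order in which the two are merged matters: spreading the caller’s props first and the secure settings last keeps any other options the caller passes (such as versioned) while making the secure settings impossible to override. The sketch below shows that merge in isolation using plain objects; BucketPropsLike, SECURE_DEFAULTS, and withSecureDefaults are hypothetical stand-ins, not CDK types, and the enum values are simplified to strings.

```typescript
// Hypothetical, simplified stand-in for s3.BucketProps.
interface BucketPropsLike {
  encryption?: "UNENCRYPTED" | "S3_MANAGED" | "KMS_MANAGED" | "KMS";
  enforceSSL?: boolean;
  versioned?: boolean;
  blockPublicAccess?: "BLOCK_ALL" | "BLOCK_ACLS";
}

// Settings every bucket in the codebase must have.
const SECURE_DEFAULTS: Required<Pick<BucketPropsLike, "encryption" | "enforceSSL" | "blockPublicAccess">> = {
  encryption: "S3_MANAGED",
  enforceSSL: true,
  blockPublicAccess: "BLOCK_ALL",
};

// Caller props are spread first, so unrelated options survive;
// SECURE_DEFAULTS is spread last, so its keys always win.
function withSecureDefaults(props: BucketPropsLike = {}): BucketPropsLike {
  return { ...props, ...SECURE_DEFAULTS };
}

// A caller trying to turn enforceSSL off keeps versioning but not the override.
const merged = withSecureDefaults({ versioned: true, enforceSSL: false });
console.log(merged);
```

Spreading SECURE_DEFAULTS first instead would turn these settings into overridable defaults rather than guardrails; which behavior a team wants is a design choice.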

The team now wants to ensure that the SecureBucket construct is used in their codebase instead of the Bucket construct, so that bucket encryption at rest, encryption in transit, access logging, and public access blocking are all enabled by default. This can be done with a Semgrep rule, since we really only care about finding all usages of the Bucket construct. In the spirit of working backwards from the known bad code, let’s write some code describing what we do, and don’t, want our rule to find:

import * as s3 from '@aws-cdk/aws-s3';
import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as secureConstructs from './secureConstructs';

export class CdkStarterStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // ruleid:awscdk-use-secure-bucket
    const bucket = new s3.Bucket(this, 's3-bucket')

    // ok:awscdk-use-secure-bucket
    const secureBucket = new secureConstructs.SecureBucket(this, 'secure-bucket')

    // ok:awscdk-use-secure-bucket
    const vpc = new ec2.Vpc(this, 'Vpc')

  }
}

We need our rule to detect usage of the s3.Bucket construct without alerting on the true negative cases, such as using the SecureBucket construct or instantiating a different resource altogether. Since we are looking only for direct usage of the Bucket construct, we can adapt our previous rule by removing the grantPublicAccess() lines from the pattern:

# awscdk-use-secure-bucket.yml

rules:
- id: awscdk-use-secure-bucket
  patterns:
    - pattern: const $X = new $Y.Bucket(...)
    - pattern-inside: |
        import * as $Y from '@aws-cdk/aws-s3'
        ...
  message: |
      Construct $X is using the standard Bucket construct - use the SecureBucket wrapper construct from ./secureConstructs instead
  languages: [ts]
  severity: WARNING

Check out the live example in the Semgrep Playground (Second Rule Example in the links below)!

Be sure to check out the AWS-CDK Semgrep rules we contributed! https://semgrep.dev/r?q=aws.cdk

Thanks to Dustin Whited, Casey Douglas, and Jono Sosulska for their editor contributions!

Helpful Links and Additional Reading

AWS Cloud Development Kit: https://aws.amazon.com/cdk/
AWS CDK API Reference: https://docs.aws.amazon.com/cdk/api/v1/docs/aws-construct-library.html
AWS CDK Workshop: https://cdkworkshop.com
Pulumi: https://www.pulumi.com/
Semgrep: https://semgrep.dev
First Rule Example - Semgrep Playground: https://semgrep.dev/s/AyyL
Second Rule Example - Semgrep Playground: https://semgrep.dev/s/7nne

If you have any questions, or would like to discuss this topic in more detail, feel free to contact us and we would be happy to schedule some time to chat about how Aquia can help you and your organization.
