Published: 14-11-2022
Using domains, certificates and GH Actions CI/CD in multi-account AWS.
AWS is pretty relaxed until multiple accounts start showing up. Anything vaguely production requires multiple accounts!
There are probably endless available permutations when setting up account structures and CI/CD, depending on the application's size and complexity. Instead of trying to cover all bases I tend to focus on the mappings between application(s), services, CDK stacks and accounts, making sure that the CDK code is flexible enough to deploy to shared or isolated accounts.
Note: this is me operating from a serverless point of view; there are no VPCs providing isolation.
So that the rest of the rambling makes sense, it's best to briefly touch on account structures. There is an AWS whitepaper on organising accounts, which this draws from. Apart from helping to scale throughput, multiple accounts provide good security isolation, bounded context/team separation and compliance.
When starting projects I go for an application 'monoaccount' structure, where the entire application runs in a single account. There are still multiple accounts, but they are for different, isolated environments - it has more in common with a monorepo than a monolith. When an application starts getting bigger and the team grows, it can gradually be split out into an account-per-service structure, but there's no need to start with a gazillion accounts.
All the AWS examples include something that resembles a shared infrastructure account. I tend to stick with the management account at the beginning for things like domains/email etc - they can always be split out.
That looks something like: a management account providing the shared infrastructure (domains, email, etc.), with isolated workload accounts for each environment. When not using GitHub Actions, a group of 'deployment' accounts is also useful for running CI/CD pipelines. Sandbox developer accounts are also used whenever working in a team.
Domains and certificates can cause a few cross account headaches. Certificates require DNS validation, and services/applications often need alias records. Stack deployments with the CDK are single account/region only, and handing out root domain hosted zone permissions to sub accounts isn't a good idea.
A good solution for subdomains is to delegate DNS to sub accounts, so all records needed by the account's service(s)/application(s) can be applied internally.
Using root domains is a little trickier. Most of the projects I work on need to use a root domain certificate in the production account, which makes automated DNS validation less straightforward, as the root domain hosted zone is hiding in an account elsewhere.
One solution is to create and validate the certificate manually - not a bad option, considering how infrequently it needs to be done in basic scenarios.
I've seen a few people suggest just putting the root domain hosted zone in the production account - probably not the best idea from a security and subdomain delegation perspective.
The best solution seems to be a locked down shared infrastructure account (or management account) holding the root domain hosted zone. Subdomains can be delegated to other accounts as needed. When the root domain needs to be accessed, a certificate can be created and verified manually, or with a custom resource when bootstrapping (shown later).
To delegate a subdomain, first a hosted zone needs to be set up in the designated account. Then an NS (nameserver) record needs to be created in the shared infrastructure account, using the NS details from the designated account's zone. Now the designated account can create certificates and subdomains by itself.
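As a minimal CDK sketch of those two steps (the account IDs and zone names are made up, and the NS values have to be copied over from the sub account's zone once it exists - or wired up automatically by a tool like Org Formation):

import { App, Stack } from 'aws-cdk-lib';
import * as route53 from 'aws-cdk-lib/aws-route53';

const app = new App();

// In the designated (sub) account: the delegated subdomain's hosted zone.
const subStack = new Stack(app, 'subZoneStack', {
  env: { account: '222222222222', region: 'eu-west-2' },
});
new route53.PublicHostedZone(subStack, 'subZone', {
  zoneName: 'dev.mydomain.com',
});

// In the shared infrastructure account: an NS record on the root zone,
// pointing at the nameservers AWS assigned to the subdomain zone above.
const rootStack = new Stack(app, 'delegationStack', {
  env: { account: '111111111111', region: 'eu-west-2' },
});
const rootZone = route53.HostedZone.fromLookup(rootStack, 'rootZone', {
  domainName: 'mydomain.com',
});
new route53.NsRecord(rootStack, 'devDelegation', {
  zone: rootZone,
  recordName: 'dev',
  values: [
    // copied from the sub account's hosted zone after creation
    'ns-1111.awsdns-11.org',
    'ns-2222.awsdns-22.co.uk',
  ],
});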
The subdomains can of course be anything. I use Org Formation to sort this out for me in a nice automated way - it's pretty much built in.
This is all good if we need a subdomain in an account, but using the root domain from a workload account still requires cross account permissions. For example, when using CloudFront, a domain alias needs to point at the distribution and the workload account must have a valid certificate. Both require records to be created in the root domain hosted zone. Giving permission for a sub account to do that is a no go, security wise.
Unfortunately, the AWS::CertificateManager::Certificate CloudFormation resource hangs 'in progress' until the certificate is validated, which never happens when the hosted zone is in another account. We also don't know the CNAME resource record details, because nothing is returned during the process. The only way to get this working seems to be with a custom resource.
Using a custom resource, the certificate can be created with an ACM API call and returned without waiting for validation. The CNAME record name and value can also be returned, and used with the standard AWS::Route53::RecordSet resource to create the validation record.
As this is general infrastructure rather than part of an application stack, it doesn't matter too much that the validation is asynchronous. If synchronous behaviour were required, I guess another 'abstract' custom resource could be created to poll and wait for validation.
An Org Formation snippet with custom certificate resource is shown below:
RootCertificate:
  Type: update-stacks
  Template: ./templates/cross-account-certificate.yml
  StackName: AppRootCertificate
  Parameters:
    rootHostedZoneName: mydomain.com
    certificateDomainName: mydomain.com
  OrganizationBindings:
    SubAccountBinding:
      Account: !Ref WorkloadsProdAppAccount
      Region: us-east-1
# ./templates/cross-account-certificate.yml
AWSTemplateFormatVersion: '2010-09-09-OC'

# Include file that contains Organization Section.
# The Organization Section describes Accounts, Organizational Units, etc.
Organization: !Include ../organization.yml

# Any Binding that does not explicitly specify a region will default to this.
# Value can be either string or list.
DefaultOrganizationBindingRegion: eu-west-2

Parameters:
  rootHostedZoneName:
    Type: String
  certificateDomainName:
    Type: String

# Section contains a named set of Bindings.
# Bindings determine what resources are deployed where.
# These bindings can be !Ref'd from the Resources in the resource section.
OrganizationBindings:
  SubAccountBinding:
    Account: !Ref WorkloadsProdAppAccount
    Region: us-east-1
  RootAccountBinding:
    IncludeMasterAccount: true
Resources:
  SubAccountCertificateValidationResource:
    Type: 'Custom::SubAccountCertificateValidation'
    OrganizationBinding: !Ref SubAccountBinding
    Properties:
      ServiceToken: !GetAtt CertificateResourceLambdaFunction.Arn
      certificateDomainName: !Sub '${certificateDomainName}'
  CertificateResourceLambdaFunctionRole:
    Type: 'AWS::IAM::Role'
    OrganizationBinding: !Ref SubAccountBinding
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'acm:RequestCertificate'
                  - 'acm:DescribeCertificate'
                  - 'acm:DeleteCertificate'
                Resource: '*'
  CertificateResourceLambdaFunction:
    Type: 'AWS::Lambda::Function'
    OrganizationBinding: !Ref SubAccountBinding
    Properties:
      Runtime: nodejs16.x
      Timeout: 300
      Handler: index.handler
      Role: !GetAtt CertificateResourceLambdaFunctionRole.Arn
      Code:
        ZipFile: |
          const AWS = require('aws-sdk')
          const response = require('cfn-response')
          exports.handler = async function(event, context) {
            console.log("REQUEST RECEIVED:\n" + JSON.stringify(event))
            const acm = new AWS.ACM({apiVersion: '2015-12-08'});
            // cfn-response ends the invocation itself, so this promise never
            // resolves - awaiting it keeps the handler alive until the
            // response has actually been sent to CloudFormation
            const send = (evt, ctx, status, data, physicalResourceId = null) => {
              return new Promise(() => { response.send(evt, ctx, status, data, physicalResourceId) });
            }
            const createCert = async () => {
              const created = await acm.requestCertificate({
                DomainName: event.ResourceProperties.certificateDomainName,
                ValidationMethod: 'DNS'
              }).promise();
              // note: DomainValidationOptions can take a few seconds to
              // populate, so a retry around this call may be needed
              const described = await acm.describeCertificate({
                CertificateArn: created.CertificateArn
              }).promise();
              // the certificate ARN doubles as the PhysicalResourceId
              await send(event, context, 'SUCCESS', {
                CertificateArn: described.Certificate.CertificateArn,
                ResourceRecordName: described.Certificate.DomainValidationOptions[0].ResourceRecord.Name,
                ResourceRecordValue: described.Certificate.DomainValidationOptions[0].ResourceRecord.Value
              }, described.Certificate.CertificateArn);
            }
            if (event.RequestType === "Create") {
              try {
                await createCert()
              } catch (e) {
                const responseData = { Error: "Certificate create error" }
                console.log(responseData.Error + ":\n", e)
                await send(event, context, 'FAILED', responseData);
              }
            }
            if (event.RequestType === "Update") {
              try {
                await createCert()
              } catch (e) {
                const responseData = { Error: "Certificate update error" }
                console.log(responseData.Error + ":\n", e)
                await send(event, context, 'FAILED', responseData);
              }
            }
            if (event.RequestType === "Delete") {
              try {
                const deleted = await acm.deleteCertificate({
                  CertificateArn: event.PhysicalResourceId
                }).promise();
                await send(event, context, "SUCCESS", deleted, event.PhysicalResourceId)
              } catch (e) {
                const responseData = { Error: "Certificate delete error" }
                console.log(responseData.Error + ":\n", e)
                await send(event, context, 'FAILED', responseData);
              }
            }
          }
  RootAccountCertificateCnameRecord:
    DependsOn: SubAccountCertificateValidationResource
    Type: AWS::Route53::RecordSet
    OrganizationBinding: !Ref RootAccountBinding
    Properties:
      HostedZoneName: !Sub '${rootHostedZoneName}.'
      Type: CNAME
      Name: !GetAtt SubAccountCertificateValidationResource.ResourceRecordName
      TTL: '60'
      ResourceRecords:
        - !GetAtt SubAccountCertificateValidationResource.ResourceRecordValue

Outputs:
  CertificateArn:
    Value: !GetAtt SubAccountCertificateValidationResource.CertificateArn
  CertificateResourceRecordName:
    Value: !GetAtt SubAccountCertificateValidationResource.ResourceRecordName
  CertificateResourceRecordValue:
    Value: !GetAtt SubAccountCertificateValidationResource.ResourceRecordValue
Note, this is Org Formation annotated CloudFormation (it has the Organization bindings), which is why it works cross account.
The other important part to note is using the certificate ARN as the PhysicalResourceId - this is what makes update and delete events behave correctly. There are lots of articles around explaining the importance of the PhysicalResourceId in custom resources.
Once in place and attached to a service such as CloudFront, the certificate will be renewed automatically, which is nice. This works as the CNAME record doesn't change for renewals, and is already in place.
Firstly, why GH Actions and not CDK Pipelines or CodePipeline? The whole GitHub integrated developer experience works well and keeps everything developer related in one place. Actions also seem more popular, making it easier for a team to adopt straight away. CodePipeline is still plenty good though - more on that in another rambling.
The GitHub Actions CI/CD workflow file structure is quite dependent on the version control process used, so there are a whole bunch of different ways to integrate CI/CD! Here is the simple process I've started using.
Workflow files are created with the following structure: .github/workflows/<process>-<application>-<service?>-<environment?>.yml, where service and environment are optional but can be added. As an example, workflow files for deploying a simple app with GitHub flow style version control look like:
ci_test(c)-app.yml
e2e_test(c)-app.yml
open_merge_request-app.yml
close_merge_request-app.yml
deploy-app.yml
(c) means callable - a reusable workflow. Combining this with the account structure from earlier:
In this example, versioned applications are created in the shared dev account when a pull request is made, allowing changes to be viewed/tested (if needed). After the pull request is rejected or merged, the application version is removed from the dev account, and the deploy process starts. Deploying consists of spinning up the application in a dedicated staging account where further testing can take place. Given manual approval, the application is moved into production. Reusable workflows are used to run integration tests on individual commits and e2e tests on deployed applications.
There are other version control flows like GitFlow, GitLab Flow and trunk-based development, but fundamentally the same process can be used. For example, adding a release branch, as sometimes used in GitLab and trunk-based flows, to enable the timing of releases, might look like:
ci_test(c)-app.yml
e2e_test(c)-app.yml
open_merge_request-app.yml
close_merge_request-app.yml
deploy-app-dev.yml
deploy-app-prod.yml
A release can be made at any point along the main branch, allowing releases to be deployed independently from development work.
Below is a workflow example that runs the integration tests, deploys to staging, runs e2e tests, waits for manual approval, then deploys to prod and runs the e2e tests again:
name: deploy app prod

on:
  push:
    branches:
      - main

jobs:
  test:
    name: Run integration tests
    uses: <github-org>/<github-repo>/.github/workflows/ci_test(c)-app.yml@main

  deploy_to_staging:
    name: Deploy to staging
    environment: staging
    needs: test
    runs-on: ubuntu-latest
    timeout-minutes: 15
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
      id-token: write
      contents: read
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - uses: actions/setup-node@v3
        with:
          node-version: '16'
          cache: 'yarn'
      - name: yarn install
        run: yarn install --immutable --immutable-cache --check-cache
      - name: Build frontend
        run: |
          yarn build:prod
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: <staging-github-role-arn>
          aws-region: <aws-region>
      - name: deploy to staging
        run: |
          yarn cdk deploy appApi app --require-approval never -c stage=staging

  # Reusable workflows can only be called at job level, not as a step
  e2e_test_staging:
    name: Run e2e tests on staging
    needs: deploy_to_staging
    uses: <github-org>/<github-repo>/.github/workflows/e2e_test(c)-app.yml@main
    with:
      stage: staging

  approve:
    name: Manual approval
    needs: e2e_test_staging
    runs-on: ubuntu-latest
    # The manual approval action opens a GH issue, so needs issues: write
    permissions:
      issues: write
    steps:
      - uses: trstringer/manual-approval@v1
        with:
          secret: ${{ secrets.GITHUB_TOKEN }}
          approvers: <github-username(s)-for-approval>
          minimum-approvals: 1
          issue-title: 'Deploy ${{ github.sha }}'

  deploy_to_prod:
    name: Deploy to prod
    environment: prod
    needs: approve
    runs-on: ubuntu-latest
    timeout-minutes: 15
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
      id-token: write
      contents: read
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - uses: actions/setup-node@v3
        with:
          node-version: '16'
          cache: 'yarn'
      - name: yarn install
        run: yarn install --immutable --immutable-cache --check-cache
      - name: Build frontend
        run: |
          yarn build:prod
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: <prod-github-role-arn>
          aws-region: <aws-region>
      - name: deploy to prod
        run: |
          yarn cdk deploy appApi app --require-approval never -c stage=prod

  e2e_test_prod:
    name: Run e2e tests on prod
    needs: deploy_to_prod
    uses: <github-org>/<github-repo>/.github/workflows/e2e_test(c)-app.yml@main
    with:
      stage: prod
This is a fairly straightforward deployment of a single application to a single account; additions could be made for a whole range of use cases - a few ideas below.
Manual approval is a bit of a pain without the enterprise GitHub plan, where it is natively supported, but there are some workarounds. There is a public action that emulates manual approval by creating a GH issue requiring approval from selectable team members, only after which the job will continue. This action is used in the example workflow above. Another option could be to fire off an async message to somewhere like Slack, with a hook that can run a callable workflow, passing in a commit hash.
Or, just don't use a manual approval between staging and prod. Creating a pool of isolated staging accounts that can be used during the pull request stage might inspire more confidence to go straight to production.
It's handy to build some flexibility into the CDK stacks, allowing them to be deployed to isolated and/or shared accounts - something like:
import { App } from 'aws-cdk-lib';
import { AppStack } from '../lib/appStack';

const app = new App();

// prefix the stack name with the branch, when passed in as context
const branch = app.node.tryGetContext('branch')
  ? app.node.tryGetContext('branch') + '-'
  : '';

new AppStack(app, `${branch}appStack`, {});
This means that in a shared dev account the stack name can be prefixed with a dynamic branch name, so the stack is deployed alongside other pull requests in the same account:
# Deploy with branch name
cdk deploy ${{ github.ref_name }}-appStack -c branch=${{ github.ref_name }}

# Deploy without branch name
cdk deploy appStack
This only works if the stacks avoid explicitly named resources, which is a best practice anyway. Care also needs to be taken over how certain resources, like databases and S3 buckets, are removed/cleaned up when destroying CDK stacks. Orphans are not welcome in the dev account, but production databases must never be accidentally deleted. Passing in the environment/stage as context makes it possible to apply the right removal policy per environment.
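A minimal sketch of that idea (the stage context value and the bucket are assumptions, not the real app):

import { App, RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

class AppStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const stage = this.node.tryGetContext('stage') ?? 'dev';

    // Destroy (and empty) buckets outside production so branch stacks
    // clean up fully, but retain them in prod to avoid data loss.
    new s3.Bucket(this, 'appBucket', {
      removalPolicy: stage === 'prod' ? RemovalPolicy.RETAIN : RemovalPolicy.DESTROY,
      autoDeleteObjects: stage !== 'prod',
    });
  }
}

new AppStack(new App(), 'appStack');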
What about when a single service needs to be deployed across multiple accounts (think isolated storage compliance)?
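A sketch of what that looks like in CDK (hypothetical account IDs and a stand-in storage stack):

import { App, Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

// stand-in for the real service stack
class StorageStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    new s3.Bucket(this, 'storageBucket');
  }
}

const app = new App();

// the same service stack, bound to different accounts/regions -
// each cdk deploy needs credentials for its own target account
new StorageStack(app, 'euStorage', {
  env: { account: '111111111111', region: 'eu-west-2' },
});
new StorageStack(app, 'usStorage', {
  env: { account: '222222222222', region: 'us-east-1' },
});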
The point being that even though multiple account IDs and regions can be specified in a CDK app, the OIDC role ARN used by the workflow is account specific. Typically, CodePipeline or CDK Pipelines handle this in a more automated way, by setting up trusted roles between accounts. That is one of the trade-offs when using GH Actions for CI/CD.
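Each workload account therefore ends up with its own GitHub OIDC provider and deploy role - something like this sketch (the role name and the repo filter are assumptions):

import { Stack, StackProps } from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

// deployed once per workload account
export class GithubOidcStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // trust GitHub's OIDC token endpoint
    const provider = new iam.OpenIdConnectProvider(this, 'githubProvider', {
      url: 'https://token.actions.githubusercontent.com',
      clientIds: ['sts.amazonaws.com'],
    });

    // the role the configure-aws-credentials step assumes; the sub
    // condition restricts it to workflows from a single repository
    new iam.Role(this, 'githubDeployRole', {
      roleName: 'github-deploy-role',
      assumedBy: new iam.WebIdentityPrincipal(
        provider.openIdConnectProviderArn,
        {
          StringLike: {
            'token.actions.githubusercontent.com:sub':
              'repo:<github-org>/<github-repo>:*',
          },
        }
      ),
    });
  }
}

Deploying a role like that to every workload account, and wiring the matching ARN into each workflow, is exactly the plumbing CodePipeline would otherwise handle.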