Published: 14-11-2022
Using domains, certificates and GH Actions CI/CD in multi-account AWS.
AWS is pretty relaxed until multiple accounts start showing up. Anything vaguely production requires multiple accounts!
There are probably endless available permutations when setting up account structures and CI/CD, depending on the application's size and complexity. Instead of trying to cover all bases I tend to focus on the mappings between application(s), services, CDK stacks and accounts, making sure that the CDK code is flexible enough to deploy to shared or isolated accounts.
Note: this is me operating from a serverless point of view; there are no VPCs providing isolation.
So that the rest of the rambling makes sense, it's best to briefly touch on account structures. There is an AWS whitepaper on organising accounts, which this draws from. Apart from helping to scale throughput, multiple accounts provide good security isolation, bounded context/team separation and compliance.
When starting projects I go for an application 'monoaccount' structure, where the entire application runs in a single account. There are still multiple accounts, but they are for different, isolated environments - it has more in common with a monorepo than a monolith. When an application starts getting bigger and the team grows, it can gradually be split out into an account-per-service structure, but there's no need to start with a gazillion accounts.
All the AWS examples include something that resembles a shared infrastructure account. I tend to stick with the management account at the beginning for things like domains/email etc - they can always be split out.
That looks something like: a management account providing the shared infrastructure (domains, email, etc.), with isolated workload accounts for each environment. When not using GitHub Actions, a group of 'deployment' accounts is also useful for running CI/CD pipelines. Sandbox developer accounts are also used whenever working in a team.
Domains and certificates can cause a few cross account headaches. Certificates require DNS validation, and services/applications often need alias records. Stack deployments with the CDK are single account/region only, and handing out root domain hosted zone permissions to sub accounts isn't a good idea.
A good solution for subdomains is to delegate DNS to sub accounts, so all records needed by the account's service(s)/application(s) can be applied internally.
Using root domains is a little trickier. Most of the projects I work on need to use a root domain certificate in the production account, which makes automated DNS validation less straightforward, as the root domain hosted zone is hiding in an account elsewhere.
One solution is to create and validate the certificate manually - not a bad option, considering how infrequently it needs to be done in basic scenarios.
I've seen a few people suggest just putting the root domain hosted zone in the production account - probably not the best idea from a security and subdomain delegation perspective.
The best solution seems to be a locked down shared infrastructure account (or management account) holding the root domain hosted zone. Subdomains can be delegated to other accounts as needed. When the root domain needs to be accessed, a certificate can be created and verified manually, or with a custom resource when bootstrapping (shown later).
To delegate a subdomain, first a hosted zone needs to be set up in the designated account. Then an NS (nameserver) record needs to be created in the shared infrastructure account, using the NS details from the designated account's zone. Now the designated account can create certificates and subdomains by itself.
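As a minimal CDK sketch of those two steps (the account IDs and zone names are made up, and the NS values have to be copied over from the sub account's zone once it exists - or wired up automatically by a tool like Org Formation):

import { App, Stack } from 'aws-cdk-lib';
import * as route53 from 'aws-cdk-lib/aws-route53';

const app = new App();

// In the designated (sub) account: the delegated subdomain's hosted zone.
const subStack = new Stack(app, 'subZoneStack', {
  env: { account: '222222222222', region: 'eu-west-2' },
});
new route53.PublicHostedZone(subStack, 'subZone', {
  zoneName: 'dev.mydomain.com',
});

// In the shared infrastructure account: an NS record on the root zone,
// pointing at the nameservers AWS assigned to the subdomain zone above.
const rootStack = new Stack(app, 'delegationStack', {
  env: { account: '111111111111', region: 'eu-west-2' },
});
const rootZone = route53.HostedZone.fromLookup(rootStack, 'rootZone', {
  domainName: 'mydomain.com',
});
new route53.NsRecord(rootStack, 'devDelegation', {
  zone: rootZone,
  recordName: 'dev',
  values: [
    // copied from the sub account's hosted zone after creation
    'ns-1111.awsdns-11.org',
    'ns-2222.awsdns-22.co.uk',
  ],
});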
The subdomains can of course be anything. I use Org Formation to sort this out for me in a nice automated way - it's pretty much built in.
This is all good if we need a subdomain in an account, but using the root domain from a workload account still requires cross account permissions. For example, when using CloudFront, a domain alias needs to point at the distribution and the workload account must have a valid certificate. Both require records to be created in the root domain hosted zone. Giving permission for a sub account to do that is a no go, security wise.
Unfortunately, the AWS::CertificateManager::Certificate CloudFormation resource hangs 'in progress' until the certificate is validated, which never happens when the hosted zone is in another account. We also don't know the CNAME resource record details, because nothing is returned during the process. The only way to get this working seems to be with a custom resource.
Using a custom resource, the certificate can be created with an ACM API call and returned without waiting for validation. The CNAME record name and value can also be returned, and used with the standard AWS::Route53::RecordSet resource to create the validation record.
As this is general infrastructure rather than part of an application stack, it doesn't matter too much that the validation is asynchronous. If synchronous behaviour were required, I guess another 'abstract' custom resource could be created to poll and wait for validation.
An Org Formation snippet with custom certificate resource is shown below:
RootCertificate:
  Type: update-stacks
  Template: ./templates/cross-account-certificate.yml
  StackName: AppRootCertificate
  Parameters:
    rootHostedZoneName: mydomain.com
    certificateDomainName: mydomain.com
  OrganizationBindings:
    SubAccountBinding:
      Account: !Ref WorkloadsProdAppAccount
      Region: us-east-1
# ./templates/cross-account-certificate.yml
AWSTemplateFormatVersion: '2010-09-09-OC'

# Include file that contains Organization Section.
# The Organization Section describes Accounts, Organizational Units, etc.
Organization: !Include ../organization.yml

# Any Binding that does not explicitly specify a region will default to this.
# Value can be either string or list.
DefaultOrganizationBindingRegion: eu-west-2

Parameters:
  rootHostedZoneName:
    Type: String
  certificateDomainName:
    Type: String

# Section contains a named set of Bindings.
# Bindings determine what resources are deployed where.
# These bindings can be !Ref'd from the Resources in the resource section.
OrganizationBindings:
  SubAccountBinding:
    Account: !Ref WorkloadsProdAppAccount
    Region: us-east-1
  RootAccountBinding:
    IncludeMasterAccount: true
Resources:
  SubAccountCertificateValidationResource:
    Type: 'Custom::SubAccountCertificateValidation'
    OrganizationBinding: !Ref SubAccountBinding
    Properties:
      ServiceToken: !GetAtt CertificateResourceLambdaFunction.Arn
      certificateDomainName: !Sub '${certificateDomainName}'
  CertificateResourceLambdaFunctionRole:
    Type: 'AWS::IAM::Role'
    OrganizationBinding: !Ref SubAccountBinding
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'acm:RequestCertificate'
                  - 'acm:DescribeCertificate'
                  - 'acm:DeleteCertificate'
                Resource: '*'
  CertificateResourceLambdaFunction:
    Type: 'AWS::Lambda::Function'
    OrganizationBinding: !Ref SubAccountBinding
    Properties:
      Runtime: nodejs16.x
      Timeout: 300
      Handler: index.handler
      Role: !GetAtt CertificateResourceLambdaFunctionRole.Arn
      Code:
        ZipFile: |
          const AWS = require('aws-sdk')
          const response = require('cfn-response')
          exports.handler = async function(event, context) {
            console.log("REQUEST RECEIVED:\n" + JSON.stringify(event))
            const acm = new AWS.ACM({apiVersion: '2015-12-08'});
            // cfn-response ends the invocation itself, so this promise never
            // resolves - awaiting it keeps the handler alive until the
            // response has actually been sent to CloudFormation
            const send = (evt, ctx, status, data, physicalResourceId = null) => {
              return new Promise(() => { response.send(evt, ctx, status, data, physicalResourceId) });
            }
            const createCert = async () => {
              const created = await acm.requestCertificate({
                DomainName: event.ResourceProperties.certificateDomainName,
                ValidationMethod: 'DNS'
              }).promise();
              // note: DomainValidationOptions can take a few seconds to
              // populate, so a retry around this call may be needed
              const described = await acm.describeCertificate({
                CertificateArn: created.CertificateArn
              }).promise();
              // the certificate ARN doubles as the PhysicalResourceId
              await send(event, context, 'SUCCESS', {
                CertificateArn: described.Certificate.CertificateArn,
                ResourceRecordName: described.Certificate.DomainValidationOptions[0].ResourceRecord.Name,
                ResourceRecordValue: described.Certificate.DomainValidationOptions[0].ResourceRecord.Value
              }, described.Certificate.CertificateArn);
            }
            if (event.RequestType === "Create") {
              try {
                await createCert()
              } catch (e) {
                const responseData = { Error: "Certificate create error" }
                console.log(responseData.Error + ":\n", e)
                await send(event, context, 'FAILED', responseData);
              }
            }
            if (event.RequestType === "Update") {
              try {
                await createCert()
              } catch (e) {
                const responseData = { Error: "Certificate update error" }
                console.log(responseData.Error + ":\n", e)
                await send(event, context, 'FAILED', responseData);
              }
            }
            if (event.RequestType === "Delete") {
              try {
                const deleted = await acm.deleteCertificate({
                  CertificateArn: event.PhysicalResourceId
                }).promise();
                await send(event, context, "SUCCESS", deleted, event.PhysicalResourceId)
              } catch (e) {
                const responseData = { Error: "Certificate delete error" }
                console.log(responseData.Error + ":\n", e)
                await send(event, context, 'FAILED', responseData);
              }
            }
          }
  RootAccountCertificateCnameRecord:
    DependsOn: SubAccountCertificateValidationResource
    Type: AWS::Route53::RecordSet
    OrganizationBinding: !Ref RootAccountBinding
    Properties:
      HostedZoneName: !Sub '${rootHostedZoneName}.'
      Type: CNAME
      Name: !GetAtt SubAccountCertificateValidationResource.ResourceRecordName
      TTL: '60'
      ResourceRecords:
        - !GetAtt SubAccountCertificateValidationResource.ResourceRecordValue

Outputs:
  CertificateArn:
    Value: !GetAtt SubAccountCertificateValidationResource.CertificateArn
  CertificateResourceRecordName:
    Value: !GetAtt SubAccountCertificateValidationResource.ResourceRecordName
  CertificateResourceRecordValue:
    Value: !GetAtt SubAccountCertificateValidationResource.ResourceRecordValue
Note, this is Org Formation annotated CloudFormation (it has the Organization bindings), which is why it works cross account.
The other important part to note is using the certificate ARN as the PhysicalResourceId - this is what makes update and delete events behave correctly. There are lots of articles around explaining the importance of the PhysicalResourceId in custom resources.
Once in place and attached to a service such as CloudFront, the certificate will be renewed automatically, which is nice. This works as the CNAME record doesn't change for renewals, and is already in place.
Firstly, why GH Actions and not CDK Pipelines or CodePipeline? The whole GitHub integrated developer experience works well and keeps everything developer related in one place. Actions also seem more popular, making it easier for a team to adopt straight away. CodePipeline is still plenty good though - more on that in another rambling.
The GitHub Actions CI/CD workflow file structure is quite dependent on the version control process used, so there are a whole bunch of different ways to integrate CI/CD! Here is the simple process I've started using.
Workflow files are created with the following structure: .github/workflows/<process>-<application>-<service?>-<environment?>.yml, where service and environment are optional but can be added. As an example, workflow files for deploying a simple app with GitHub flow style version control look like:
ci_test(c)-app.yml
e2e_test(c)-app.yml
open_merge_request-app.yml
close_merge_request-app.yml
deploy-app.yml
(c) means callable - a reusable workflow. Combining this with the account structure from earlier:
In this example, versioned applications are created in the shared dev account when a pull request is made, allowing changes to be viewed/tested (if needed). After the pull request is rejected or merged, the application version is removed from the dev account, and the deploy process starts. Deploying consists of spinning up the application in a dedicated staging account where further testing can take place. Given manual approval, the application is moved into production. Reusable workflows are used to run integration tests on individual commits and e2e tests on deployed applications.
There are other version control flows like GitFlow, GitLab Flow and trunk-based development, but fundamentally the same process can be used. For example, adding a release branch, as sometimes used in GitLab and trunk-based flows, to enable the timing of releases, might look like:
ci_test(c)-app.yml
e2e_test(c)-app.yml
open_merge_request-app.yml
close_merge_request-app.yml
deploy-app-dev.yml
deploy-app-prod.yml
A release can be made at any point along the main branch, allowing releases to be deployed independently from development work.
Below is a workflow example that runs the integration tests, deploys to staging, runs e2e tests, waits for manual approval, then deploys to prod and runs the e2e tests again:
name: deploy app prod

on:
  push:
    branches:
      - main

jobs:
  test:
    name: Run integration tests
    uses: <github-org>/<github-repo>/.github/workflows/ci_test(c)-app.yml@main

  deploy_to_staging:
    name: Deploy to staging
    environment: staging
    needs: test
    runs-on: ubuntu-latest
    timeout-minutes: 15
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
      id-token: write
      contents: read
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - uses: actions/setup-node@v3
        with:
          node-version: '16'
          cache: 'yarn'
      - name: yarn install
        run: yarn install --immutable --immutable-cache --check-cache
      - name: Build frontend
        run: |
          yarn build:prod
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: <staging-github-role-arn>
          aws-region: <aws-region>
      - name: deploy to staging
        run: |
          yarn cdk deploy appApi app --require-approval never -c stage=staging

  # Reusable workflows can only be called at job level, not as a step
  e2e_test_staging:
    name: Run e2e tests on staging
    needs: deploy_to_staging
    uses: <github-org>/<github-repo>/.github/workflows/e2e_test(c)-app.yml@main
    with:
      stage: staging

  approve:
    name: Manual approval
    needs: e2e_test_staging
    runs-on: ubuntu-latest
    # The manual approval action opens a GH issue, so needs issues: write
    permissions:
      issues: write
    steps:
      - uses: trstringer/manual-approval@v1
        with:
          secret: ${{ secrets.GITHUB_TOKEN }}
          approvers: <github-username(s)-for-approval>
          minimum-approvals: 1
          issue-title: 'Deploy ${{ github.sha }}'

  deploy_to_prod:
    name: Deploy to prod
    environment: prod
    needs: approve
    runs-on: ubuntu-latest
    timeout-minutes: 15
    # These permissions are needed to interact with GitHub's OIDC Token endpoint.
    permissions:
      id-token: write
      contents: read
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - uses: actions/setup-node@v3
        with:
          node-version: '16'
          cache: 'yarn'
      - name: yarn install
        run: yarn install --immutable --immutable-cache --check-cache
      - name: Build frontend
        run: |
          yarn build:prod
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: <prod-github-role-arn>
          aws-region: <aws-region>
      - name: deploy to prod
        run: |
          yarn cdk deploy appApi app --require-approval never -c stage=prod

  e2e_test_prod:
    name: Run e2e tests on prod
    needs: deploy_to_prod
    uses: <github-org>/<github-repo>/.github/workflows/e2e_test(c)-app.yml@main
    with:
      stage: prod
This is a fairly straightforward deployment of a single application to a single account; additions could be made for a whole range of use cases - a few ideas below.
Manual approval is a bit of a pain without the enterprise GitHub plan, where it is natively supported, but there are some workarounds. There is a public action that emulates manual approval by creating a GH issue requiring approval from selectable team members, only after which the job will continue. This action is used in the example workflow above. Another option could be to fire off an async message to somewhere like Slack, with a hook that can run a callable workflow, passing in a commit hash.
Or, just don't use a manual approval between staging and prod. Creating a pool of isolated staging accounts that can be used during the pull request stage might inspire more confidence to go straight to production.
It's handy to build some flexibility into the CDK stacks, allowing them to be deployed to isolated and/or shared accounts - something like:
import { App } from 'aws-cdk-lib';
import { AppStack } from '../lib/appStack';

const app = new App();

// prefix the stack name with the branch, when passed in as context
const branch = app.node.tryGetContext('branch')
  ? app.node.tryGetContext('branch') + '-'
  : '';

new AppStack(app, `${branch}appStack`, {});
This means that in a shared dev account the stack name can be prefixed with a dynamic branch name, so the stack is deployed alongside other pull requests in the same account:
# Deploy with branch name
cdk deploy ${{ github.ref_name }}-appStack -c branch=${{ github.ref_name }}

# Deploy without branch name
cdk deploy appStack
This only works if the stacks avoid explicitly named resources, which is a best practice anyway. Care also needs to be taken over how certain resources, like databases and S3 buckets, are removed/cleaned up when destroying CDK stacks. Orphans are not welcome in the dev account, but production databases must never be accidentally deleted. Passing in the environment/stage as context makes it possible to apply the right removal policy per environment.
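A minimal sketch of that idea (the stage context value and the bucket are assumptions, not the real app):

import { App, RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

class AppStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const stage = this.node.tryGetContext('stage') ?? 'dev';

    // Destroy (and empty) buckets outside production so branch stacks
    // clean up fully, but retain them in prod to avoid data loss.
    new s3.Bucket(this, 'appBucket', {
      removalPolicy: stage === 'prod' ? RemovalPolicy.RETAIN : RemovalPolicy.DESTROY,
      autoDeleteObjects: stage !== 'prod',
    });
  }
}

new AppStack(new App(), 'appStack');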
What about when a single service needs to be deployed across multiple accounts (think isolated storage compliance)?
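A sketch of what that looks like in CDK (hypothetical account IDs and a stand-in storage stack):

import { App, Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

// stand-in for the real service stack
class StorageStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    new s3.Bucket(this, 'storageBucket');
  }
}

const app = new App();

// the same service stack, bound to different accounts/regions -
// each cdk deploy needs credentials for its own target account
new StorageStack(app, 'euStorage', {
  env: { account: '111111111111', region: 'eu-west-2' },
});
new StorageStack(app, 'usStorage', {
  env: { account: '222222222222', region: 'us-east-1' },
});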
The point being that even though multiple account IDs and regions can be specified in a CDK app, the OIDC role ARN used by the workflow is account specific. Typically, CodePipeline or CDK Pipelines handle this in a more automated way, by setting up trusted roles between accounts. That is one of the trade-offs when using GH Actions for CI/CD.
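Each workload account therefore ends up with its own GitHub OIDC provider and deploy role - something like this sketch (the role name and the repo filter are assumptions):

import { Stack, StackProps } from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

// deployed once per workload account
export class GithubOidcStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // trust GitHub's OIDC token endpoint
    const provider = new iam.OpenIdConnectProvider(this, 'githubProvider', {
      url: 'https://token.actions.githubusercontent.com',
      clientIds: ['sts.amazonaws.com'],
    });

    // the role the configure-aws-credentials step assumes; the sub
    // condition restricts it to workflows from a single repository
    new iam.Role(this, 'githubDeployRole', {
      roleName: 'github-deploy-role',
      assumedBy: new iam.WebIdentityPrincipal(
        provider.openIdConnectProviderArn,
        {
          StringLike: {
            'token.actions.githubusercontent.com:sub':
              'repo:<github-org>/<github-repo>:*',
          },
        }
      ),
    });
  }
}

Deploying a role like that to every workload account, and wiring the matching ARN into each workflow, is exactly the plumbing CodePipeline would otherwise handle.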