About MCaaS
Continuous Deployment
Overview
MCaaS uses Helm to package application deployments in EKS. A Helm chart contains the necessary boilerplate to create an instance of a Kubernetes application. A Helm release is an instance of a chart running in a Kubernetes cluster.
The typical structure of a Helm Chart is:
<chart-name>/
  Chart.yaml          # A YAML file containing information about the chart
  values.yaml         # The default configuration values for this chart
  charts/             # A directory containing any charts upon which this chart depends
  templates/          # A directory of templates that, when combined with values,
                      # will generate valid Kubernetes manifest files
  templates/NOTES.txt # A plain text file containing short usage notes
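As an illustration, Chart.yaml usually carries only a few identifying fields. The sketch below is hypothetical (the chart name matches the example used later in this page, but the description and version are placeholders):

```yaml
# Hypothetical Chart.yaml for a tenant application chart
apiVersion: v2                      # Helm 3 chart API version
name: tenant-stateless-application  # chart name referenced by HelmRelease spec.chart.spec.chart
description: A chart for stateless web applications
version: 1.0.0                      # chart version, bumped when the chart changes
```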
Tenants are strongly encouraged to leverage the Helm charts provided by MCaaS. Tenants have the option to bring and manage their own charts (BYOC); however, please note that MCaaS is not obligated to provide support for chart-specific issues, though the team will attempt to do so when time permits.
The MCaaS team uses FluxCD for continuous deployment, following the GitOps approach. Flux leverages the Helm SDK. Continuous deployment is handled by multiple runtime components:
- Source Controller monitors source control repositories and produces artifacts that are used by the other components. For most tenants, it watches the <tenant>-flux-config and mcaas-tenant-charts repos.
- Kustomize Controller monitors, validates and deploys Kubernetes manifest (YAML) files
- Helm Controller monitors and validates Kubernetes manifests of kind: HelmRelease and uses Helm to create/upgrade chart resources
- Image Reflector Controller scans container image repositories for image changes
- Image Automation Controller automates updates to YAML files when new container images are detected by the Image Reflector Controller
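For reference, the Source Controller's watch on a repository is declared with a GitRepository resource. The sketch below is illustrative only; MCaaS manages these resources, and the apiVersion and URL shown here are assumptions:

```yaml
# Illustrative GitRepository watched by the Source Controller (URL is a placeholder)
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: mcaas-tenant-charts
  namespace: tenant-flux-ns
spec:
  interval: 1m               # how often to poll the repository
  ref:
    branch: tenant-dev-test  # environment-specific branch
  url: https://github.example/org/mcaas-tenant-charts
```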
A typical high-level flow from build to deploy:
- Tenant pushes/merges code changes to the app repo
- Jenkins picks up the GitHub push/merge event and triggers a pipeline build
- App container image is built, retagged and pushed to ECR
- Image Reflector Controller detects the new image tag in ECR
- Image Automation Controller pushes a commit to <tenant>-flux-config that updates spec.values.image.tag in the app HelmRelease with the new image tag
- Source Controller picks up the commit and stores a new artifact with this revision
- Kustomize Controller lints and deploys the HelmRelease yaml
- Helm Controller validates the HelmRelease yaml and runs helm upgrade -i
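Steps 4 and 5 of the flow are driven by Flux's image automation resources. Below is a sketch of an ImagePolicy that selects the newest tag by its trailing unix timestamp; the resource names and tag pattern are assumptions for illustration, not the actual MCaaS configuration:

```yaml
# Illustrative ImagePolicy: picks the tag with the largest trailing unix timestamp
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: module1-robotshop-web
  namespace: tenant-flux-ns
spec:
  imageRepositoryRef:
    name: module1-robotshop-web  # ImageRepository that scans the ECR repo
  filterTags:
    pattern: '^dev-[a-f0-9]+-(?P<ts>[0-9]+)$'  # e.g. dev-<git sha>-<unix timestamp>
    extract: '$ts'
  policy:
    numerical:
      order: asc  # highest timestamp wins
```

The {"$imagepolicy": ...} marker comment shown in the HelmRelease example on this page is what tells the Image Automation Controller which YAML field to rewrite with the selected tag.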
Tenant’s Flux Repository
Each tenant will have their own Flux configuration repository. Below is the Flux repository structure.
<module>/ <-- this is a sample module folder for the tenant
<module>/base <-- resources here will be deployed to all environments' clusters
<module>/development <-- resources here will be deployed to development environment cluster
<module>/test <-- resources here will be deployed to test environment cluster
<module>/staging <-- resources here will be deployed to staging environment cluster
<module>/production <-- resources here will be deployed to production environment cluster
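Each environment folder is typically wired together with a kustomization.yaml that lists the shared base plus environment-specific resources. A hypothetical example (file names are placeholders):

```yaml
# Hypothetical <module>/development/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base               # resources shared across all environments
  - web-helmrelease.yaml  # development-only HelmRelease
```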
Deep Dive on HelmRelease
A HelmRelease is a custom resource for declaratively managing Helm releases. A HelmRelease yaml file is required for each app and environment. For each revision of a HelmRelease yaml, Flux's Helm Controller runs the appropriate Helm action (e.g. helm [install|upgrade|test|uninstall|rollback]) and increments the release version (REVISION in helm list). Example HelmRelease files can be found in mcaas-tenant-charts.
The branches for this repository correspond to your environments. Please use the right branch in your HelmRelease file when referencing this repository:
- tenant-dev-test - development & test environments
- tenant-staging - staging environment
- tenant-prod - production environment
Below is an example of what a HelmRelease file contains. Let’s go over this file line by line.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: web
  namespace: tenant-flux-ns
spec:
  releaseName: web
  targetNamespace: fake-module1-robotshop
  serviceAccountName: flux2-tenant
  interval: 1m
  timeout: 5m
  chart:
    spec:
      chart: tenant-stateless-application
      sourceRef:
        kind: GitRepository
        name: mcaas-tenant-charts
  values:
    # required labels
    mcaasLabels:
      tenantShortCode: fake
      moduleShortCode: module1
      applicationShortCode: robotshop
      serviceShortCode: web
      environment: development
    # required annotations
    mcaasAnnotations:
      POC: "Tom Huang"
      POCEmail: "xicheng.huang@gsa.gov"
      launchedBy: "Tom Huang"
    service:
      create: true # true when exposing it as a service
      port: 80
      containerPort: 8080 # port inside the container
    image:
      repository: 882271373783.dkr.ecr.us-east-1.amazonaws.com/module1-robotshop-web
      pullPolicy: Always
      tag: dev-77f8a3e58b79a964b06aa562cb00adc719c87aa7-1684975531745 # {"$imagepolicy": "tenant-flux-ns:module1-robotshop-web:tag"}
- apiVersion defines the yaml template. The value will occasionally be changed by MCaaS to reflect the HelmRelease apiVersion of the latest upstream release.
- kind is the type of this Kubernetes object. This always stays the same: HelmRelease.
- metadata:
  - name is the name of the release. It must be unique and must adhere to the constraints defined in this section. Please use your application service name.
  - namespace is where the HelmRelease object will be deployed to. This will always be tenant-flux-ns.
- spec:
  - releaseName will be used as a common name for the Kubernetes objects created for this release. Please use your application service name.
  - targetNamespace is the namespace for the deployment. Please use the custom namespace provided by MCaaS.
  - serviceAccountName is the Kubernetes service account used for reconciliations. This is always flux2-tenant.
  - interval is the interval at which to check this HelmRelease for changes. Please set this to 1m (MCaaS may adjust later as needed).
  - timeout is the time to wait for any individual Kubernetes operation during a Helm upgrade action. The default 5m is sufficient for most workloads. The value should always be greater than spec.readinessProbe.initialDelaySeconds, with additional buffer for the kubelet to pull your application's container image from ECR at pod launch (set to 10m or longer for extremely large images).
  - chart.spec:
    - chart defines the HelmChart resource; for most tenants it will be a chart in mcaas-tenant-charts.
    - sourceRef:
      - kind is a *Repository resource type of the Source API. This will always be GitRepository for current tenants.
      - name is the name of the GitRepository resource; for most tenants this will be mcaas-tenant-charts.
  - values is important as it defines various components of the application deployment. These values override the corresponding values in the chart's values.yaml.
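Note that the override is a deep merge, not a wholesale replacement: keys you set win, and keys you omit keep their chart defaults. A hypothetical illustration:

```yaml
# Chart's values.yaml (hypothetical defaults):
service:
  create: false
  port: 80

# HelmRelease spec.values (override):
values:
  service:
    create: true

# Effective values: service.create is true, while service.port keeps its default of 80
```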
Important
The MCaaS team recommends starting with a minimal HelmRelease file like the example shown above and only adding/customizing the values that the application requires. Only the parts marked as required need to be added to the HelmRelease file. If you are unsure of any values, please do not add them to the values section, as they will override the Helm chart's default values. Please review the comments for the appropriate section in values.yaml. If you have any questions or need assistance with your deployments, please submit a JSM ticket and include relevant links such as your HelmRelease yaml and Jenkins build console logs.
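As a starting point, a minimal HelmRelease carrying only the required parts might look like the sketch below. All values are placeholders; confirm the required fields against the chart's values.yaml before use:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: myservice                    # placeholder: your application service name
  namespace: tenant-flux-ns
spec:
  releaseName: myservice
  targetNamespace: mytenant-mymodule # placeholder: use the namespace provided by MCaaS
  serviceAccountName: flux2-tenant
  interval: 1m
  chart:
    spec:
      chart: tenant-stateless-application
      sourceRef:
        kind: GitRepository
        name: mcaas-tenant-charts
  values:
    mcaasLabels: {}       # fill in the required labels
    mcaasAnnotations: {}  # fill in the required annotations
    image:
      repository: <ecr-repo-uri> # placeholder
      tag: <image-tag>           # placeholder
```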
Troubleshooting
Missing deployments:
- Check the Jenkins build log and ensure the container image was built, retagged and pushed to ECR
- mcaas_retag() in Jenkinsfile must be invoked because it appends the unix timestamp that's needed by Flux's Image Reflector Controller
- If the Jenkins build didn't clone the application git repository, ensure that the head and/or base branch name matches the keywords {} regexes in pipeline_config.groovy
- Check container logs for kustomize-controller and helm-controller in Datadog for reconciliation errors
  - Datadog > Logs > Search (set <helmrelease_spec_releaseName> appropriately, or use * to search across all HelmReleases): kube_namespace:flux-system @error:*failed* service:(dockerfiles_flux2-kustomize-controller OR dockerfiles_flux2-helm-controller) (@Kustomization.name:tenant-flux2-sync-* OR @HelmRelease.name:<helmrelease_spec_releaseName>). Check the appropriate kube_cluster in the left panel and adjust the time range filter at the top right corner.
  - Drill down to one of the log events and check the error attribute
  - Some error messages can be confusing when deducing the cause. See below for commonly reported errors.
Important
Kustomization errors can block all HelmRelease reconciliations for an environment. This means that any subsequent updates of yaml files for that environment will not be reflected in the corresponding cluster until Kustomization errors are resolved.
Common Errors:
kustomize-controller:
- Error: kustomize build failed: accumulating resources: accumulation err='merging resources from './core-demo-helmrelease.yaml': may not add resource with an already registered id: HelmRelease.v2beta1.helm.toolkit.fluxcd.io/demo.tenant-flux-ns': must build at directory: '/tmp/kustomization-3490659544/core/development/core-demo-helmrelease.yaml': file is not directory
  Cause: metadata.name is being used by another app HelmRelease.
  Fix: See the note in the Deep Dive section above.
- Error: failed to decode Kubernetes YAML from /tmp/kustomization-123456/core/development/myapp-helmrelease.yaml: MalformedYAMLError: yaml: line 26: did not find expected key <nil>
  Cause: Bad indentation, for example (the last value is not aligned under its name):
    spec:
      values:
        extraEnv:
          - name: DB_USER
            value: admin
          - name: LOG_LEVEL
          value: DEBUG
  Fix: Review the HelmRelease yaml or use a yaml linter to ensure proper indentation.
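For comparison, a correctly indented version of that fragment, with each value aligned under its name:

```yaml
spec:
  values:
    extraEnv:
      - name: DB_USER
        value: admin
      - name: LOG_LEVEL
        value: DEBUG
```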
helm-controller:
- Error: Upgrade \"myApp\" failed: cannot patch \"myApp\" with kind Deployment: The order in patch list:* doesn't match $setElementOrder list:*
  Cause: The spec.extraEnv or spec.envFromConfigMap array has a name that's already defined in the yaml
  Fix: Remove the element with the duplicate environment variable name
- Error: Helm upgrade failed: could not get information about the resource: serviceaccounts "myApp" is forbidden: User "system:serviceaccount:tenant-flux-ns:flux2-tenant" cannot get resource "serviceaccounts" in API group "" in the namespace "myTargetNamespace"
  Cause: spec.targetNamespace is invalid or does not exist
  Fix: Use the namespace that was provided at onboarding
- Error: HelmChart 'tenant-flux-ns/tenant-flux-ns-web' is not ready
  Cause: spec.chart.spec.chart or spec.chart.spec.sourceRef.name references a Helm chart that's not recognized by Flux
  Fix: If using mcaas-tenant-charts, ensure spec.chart.spec.chart is set to one of the directory names and spec.chart.spec.sourceRef.name is set to mcaas-tenant-charts. If using BYOC or public HTTPS/OCI charts, contact MCaaS for assistance with configuring the required *Repository resources.
- Error: Helm upgrade failed: error validating "": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].env[10]): unknown field "names" in io.k8s.api.core.v1.EnvVar, ValidationError(Deployment.spec.template.spec.containers[0].env[10]): missing required field "name" in io.k8s.api.core.v1.EnvVar]
  Cause: Invalid key name specified. extraEnv is injected into the Deployment template, and the Deployment resource expects the array of key-value pairs to use the key names name and value. See DeploymentSpec > PodTemplateSpec > PodSpec > Container > EnvVar.
  Fix: Follow the commented examples in the chart's values.yaml. Replace the invalid key with name or value.
- Error: Upgrade "myApp" failed: cannot patch "myApp" with kind Deployment: "" is invalid: patch: Invalid value: "{...}": json: cannot unmarshal [bool||number] into Go struct field EnvVar.spec.template.spec.containers.env.value of type string
  Cause: Unquoted boolean or numeric environment variable value
  Fix: Ensure spec.values.extraEnv[]?.value boolean or numeric values are enclosed in double quotes
- Error: timed out waiting for the condition
  Cause: Helm could not create or update resources
  Fix:
  - See the note for spec.timeout in the example HelmRelease above
  - Go to Datadog > Infrastructure > Kubernetes > Explorer. Check if the new pod is deployed with Ready status. If the status is CrashLoopBackOff, the issue is likely within your application code or Dockerfile.
  - Go to Datadog > Service Mgmt > Event Management. Filter by Event Type: kubernetes_apiserver (@evt.type:kubernetes_apiserver), check the appropriate Kubernetes Cluster, adjust the time range if needed, enter the application HelmRelease metadata.name/spec.releaseName in the search box and review any results.
  - Submit a JSM ticket
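The unquoted-value error listed above (cannot unmarshal bool/number into EnvVar value) can be avoided by quoting every extraEnv value, since the Kubernetes EnvVar value field must be a string. For example:

```yaml
spec:
  values:
    extraEnv:
      - name: ENABLE_CACHE
        value: "true"  # quoted: unquoted true would be parsed as a YAML boolean
      - name: MAX_RETRIES
        value: "5"     # quoted: unquoted 5 would be parsed as a YAML number
```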