About MCaaS
Continuous Deployment
Overview
MCaaS uses Helm to package application deployments in EKS. A Helm chart contains the necessary boilerplate to create an instance of a Kubernetes application. A Helm release is an instance of a chart running in a Kubernetes cluster.
The typical structure of a Helm Chart is:
<chart-name>/
  Chart.yaml          # A YAML file containing information about the chart
  values.yaml         # The default configuration values for this chart
  charts/             # A directory containing any charts upon which this chart depends
  templates/          # A directory of templates that, when combined with values,
                      # will generate valid Kubernetes manifest files
  templates/NOTES.txt # A plain text file containing short usage notes
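As an illustration, Chart.yaml usually carries only a few identifying fields. The sketch below is hypothetical (the chart name matches the example used later in this page, but the description and version are placeholders):

```yaml
# Hypothetical Chart.yaml for a tenant application chart
apiVersion: v2                      # Helm 3 chart API version
name: tenant-stateless-application  # chart name referenced by HelmRelease spec.chart.spec.chart
description: A chart for stateless web applications
version: 1.0.0                      # chart version, bumped when the chart changes
```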
Tenants are strongly encouraged to leverage the Helm charts provided by MCaaS. Tenants have the option to bring and manage their own charts (BYOC); however, please note that MCaaS is not obligated to provide support for chart-specific issues, though the team will attempt to do so when time permits.
The MCaaS team uses FluxCD for continuous deployment, following the GitOps approach. Flux leverages the Helm SDK. Continuous deployment is handled by multiple runtime components:
- Source Controller monitors source control repositories and produces artifacts that are used by the other components. For most tenants, it watches the <tenant>-flux-config and mcaas-tenant-charts repos.
- Kustomize Controller monitors, validates and deploys Kubernetes manifest (YAML) files
- Helm Controller monitors and validates Kubernetes manifests of kind: HelmRelease and uses Helm to create/upgrade chart resources
- Image Reflector Controller scans container image repositories for image changes
- Image Automation Controller automates updates to YAML files when new container images are detected by the Image Reflector Controller
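For reference, the Source Controller's watch on a repository is declared with a GitRepository resource. The sketch below is illustrative only; MCaaS manages these resources, and the apiVersion and URL shown here are assumptions:

```yaml
# Illustrative GitRepository watched by the Source Controller (URL is a placeholder)
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: mcaas-tenant-charts
  namespace: tenant-flux-ns
spec:
  interval: 1m               # how often to poll the repository
  ref:
    branch: tenant-dev-test  # environment-specific branch
  url: https://github.example/org/mcaas-tenant-charts
```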
A typical high-level flow from build to deploy:
- Tenant pushes/merges code changes to the app repo
- Jenkins picks up the GitHub push/merge event and triggers a pipeline build
- App container image is built, retagged and pushed to ECR
- Image Reflector Controller detects the new image tag in ECR
- Image Automation Controller pushes a commit to <tenant>-flux-config that updates spec.values.image.tag in the app HelmRelease with the new image tag
- Source Controller picks up the commit and stores a new artifact with this revision
- Kustomize Controller lints and deploys the HelmRelease yaml
- Helm Controller validates the HelmRelease yaml and runs helm upgrade -i
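Steps 4 and 5 of the flow are driven by Flux's image automation resources. Below is a sketch of an ImagePolicy that selects the newest tag by its trailing unix timestamp; the resource names and tag pattern are assumptions for illustration, not the actual MCaaS configuration:

```yaml
# Illustrative ImagePolicy: picks the tag with the largest trailing unix timestamp
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: module1-robotshop-web
  namespace: tenant-flux-ns
spec:
  imageRepositoryRef:
    name: module1-robotshop-web  # ImageRepository that scans the ECR repo
  filterTags:
    pattern: '^dev-[a-f0-9]+-(?P<ts>[0-9]+)$'  # e.g. dev-<git sha>-<unix timestamp>
    extract: '$ts'
  policy:
    numerical:
      order: asc  # highest timestamp wins
```

The {"$imagepolicy": ...} marker comment shown in the HelmRelease example on this page is what tells the Image Automation Controller which YAML field to rewrite with the selected tag.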
Tenant’s Flux Repository
Each tenant will have their own Flux configuration repository. Below is the Flux repository structure.
<module>/ <-- this is a sample module folder for the tenant
<module>/base <-- resources here will be deployed to all environments' clusters
<module>/development <-- resources here will be deployed to development environment cluster
<module>/test <-- resources here will be deployed to test environment cluster
<module>/staging <-- resources here will be deployed to staging environment cluster
<module>/production <-- resources here will be deployed to production environment cluster
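Each environment folder is typically wired together with a kustomization.yaml that lists the shared base plus environment-specific resources. A hypothetical example (file names are placeholders):

```yaml
# Hypothetical <module>/development/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base               # resources shared across all environments
  - web-helmrelease.yaml  # development-only HelmRelease
```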
Deep Dive on HelmRelease
A HelmRelease is a custom resource for declaratively managing Helm releases. A HelmRelease yaml file is required for each app and environment. For each revision of a HelmRelease yaml, Flux's Helm Controller runs the appropriate Helm action (e.g. helm [install|upgrade|test|uninstall|rollback]) and increments the release version (REVISION in helm list). Example HelmRelease files can be found in mcaas-tenant-charts.
The branches for this repository correspond to your environments. Please use the right branch in your HelmRelease file when referencing this repository:
- tenant-dev-test - development & test environments
- tenant-staging - staging environment
- tenant-prod - production environment
Below is an example of what a HelmRelease file contains. Let’s go over this file line by line.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: web
  namespace: tenant-flux-ns
spec:
  releaseName: web
  targetNamespace: fake-module1-robotshop
  serviceAccountName: flux2-tenant
  interval: 1m
  timeout: 5m
  chart:
    spec:
      chart: tenant-stateless-application
      sourceRef:
        kind: GitRepository
        name: mcaas-tenant-charts
  values:
    # required labels
    mcaasLabels:
      tenantShortCode: fake
      moduleShortCode: module1
      applicationShortCode: robotshop
      serviceShortCode: web
      environment: development
    # required annotations
    mcaasAnnotations:
      POC: "Tom Huang"
      POCEmail: "xicheng.huang@gsa.gov"
      launchedBy: "Tom Huang"
    service:
      create: true # true when exposing it as a service
      port: 80
      containerPort: 8080 # port inside the container
    image:
      repository: 882271373783.dkr.ecr.us-east-1.amazonaws.com/module1-robotshop-web
      pullPolicy: Always
      tag: dev-77f8a3e58b79a964b06aa562cb00adc719c87aa7-1684975531745 # {"$imagepolicy": "tenant-flux-ns:module1-robotshop-web:tag"}
- apiVersion defines the yaml template. The value will occasionally be changed by MCaaS to reflect the HelmRelease apiVersion of the latest upstream release.
- kind is the type of this Kubernetes object. This always stays the same: HelmRelease.
- metadata:
  - name is the name of the release. It must be unique and must adhere to the constraints defined in this section. Please use your application service name.
  - namespace is where the HelmRelease object will be deployed to. This will always be tenant-flux-ns.
- spec:
  - releaseName will be used as a common name for the Kubernetes objects created for this release. Please use your application service name.
  - targetNamespace is the namespace for the deployment. Please use the custom namespace provided by MCaaS.
  - serviceAccountName is the Kubernetes service account used for reconciliations. This is always flux2-tenant.
  - interval is the interval at which to check this HelmRelease for changes. Please set this to 1m (MCaaS may adjust later as needed).
  - timeout is the time to wait for any individual Kubernetes operation during a Helm upgrade action. The default 5m is sufficient for most workloads. The value should always be greater than spec.readinessProbe.initialDelaySeconds, with additional buffer for the kubelet to pull your application's container image from ECR at pod launch (set to 10m or longer for extremely large images).
  - chart.spec:
    - chart defines the HelmChart resource; for most tenants it will be a chart in mcaas-tenant-charts.
    - sourceRef:
      - kind is a *Repository resource type of the Source API. This will always be GitRepository for current tenants.
      - name is the name of the GitRepository resource; for most tenants this will be mcaas-tenant-charts.
  - values is important as it defines various components of the application deployment. These values override the corresponding values in the chart's values.yaml.
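Note that the override is a deep merge, not a wholesale replacement: keys you set win, and keys you omit keep their chart defaults. A hypothetical illustration:

```yaml
# Chart's values.yaml (hypothetical defaults):
service:
  create: false
  port: 80

# HelmRelease spec.values (override):
values:
  service:
    create: true

# Effective values: service.create is true, while service.port keeps its default of 80
```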
Important
The MCaaS team recommends starting with a minimal HelmRelease file like the example shown above and only adding/customizing the values that the application requires. Only the parts marked as required need to be added to the HelmRelease file. If you are unsure of any values, please do not add them to the values section, as they will override the Helm chart's default values. Please review the comments for the appropriate section in values.yaml. If you have any questions or need assistance with your deployments, please submit a JSM ticket and include relevant links such as your HelmRelease yaml and Jenkins build console logs.
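As a starting point, a minimal HelmRelease carrying only the required parts might look like the sketch below. All values are placeholders; confirm the required fields against the chart's values.yaml before use:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: myservice                    # placeholder: your application service name
  namespace: tenant-flux-ns
spec:
  releaseName: myservice
  targetNamespace: mytenant-mymodule # placeholder: use the namespace provided by MCaaS
  serviceAccountName: flux2-tenant
  interval: 1m
  chart:
    spec:
      chart: tenant-stateless-application
      sourceRef:
        kind: GitRepository
        name: mcaas-tenant-charts
  values:
    mcaasLabels: {}       # fill in the required labels
    mcaasAnnotations: {}  # fill in the required annotations
    image:
      repository: <ecr-repo-uri> # placeholder
      tag: <image-tag>           # placeholder
```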
Troubleshooting
Missing deployments:
- Check the Jenkins build log and ensure the container image was built, retagged and pushed to ECR
- mcaas_retag() in Jenkinsfile must be invoked because it appends the unix timestamp that's needed by Flux's Image Reflector Controller
- If the Jenkins build didn't clone the application git repository, ensure that the head and/or base branch name matches the keywords {} regexes in pipeline_config.groovy
- Check container logs for kustomize-controller and helm-controller in Datadog for reconciliation errors
  - Datadog > Logs > Search (set <helmrelease_spec_releaseName> appropriately, or use * to search across all HelmReleases): kube_namespace:flux-system @error:*failed* service:(dockerfiles_flux2-kustomize-controller OR dockerfiles_flux2-helm-controller) (@Kustomization.name:tenant-flux2-sync-* OR @HelmRelease.name:<helmrelease_spec_releaseName>). Check the appropriate kube_cluster in the left panel and adjust the time range filter at the top right corner.
  - Drill down to one of the log events and check the error attribute
  - Some error messages can be confusing when deducing the cause. See below for commonly reported errors.
Important
Kustomization errors can block all HelmRelease reconciliations for an environment. This means that any subsequent updates of yaml files for that environment will not be reflected in the corresponding cluster until Kustomization errors are resolved.
Common Errors:
kustomize-controller:
- Error: kustomize build failed: accumulating resources: accumulation err='merging resources from './core-demo-helmrelease.yaml': may not add resource with an already registered id: HelmRelease.v2beta1.helm.toolkit.fluxcd.io/demo.tenant-flux-ns': must build at directory: '/tmp/kustomization-3490659544/core/development/core-demo-helmrelease.yaml': file is not directory
  Cause: metadata.name is being used by another app HelmRelease.
  Fix: See the note in the Deep Dive section above.
- Error: failed to decode Kubernetes YAML from /tmp/kustomization-123456/core/development/myapp-helmrelease.yaml: MalformedYAMLError: yaml: line 26: did not find expected key <nil>
  Cause: Bad indentation, for example (the last value is not aligned under its name):
    spec:
      values:
        extraEnv:
          - name: DB_USER
            value: admin
          - name: LOG_LEVEL
          value: DEBUG
  Fix: Review the HelmRelease yaml or use a yaml linter to ensure proper indentation.
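For comparison, a correctly indented version of that fragment, with each value aligned under its name:

```yaml
spec:
  values:
    extraEnv:
      - name: DB_USER
        value: admin
      - name: LOG_LEVEL
        value: DEBUG
```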
helm-controller:
- Error: Upgrade \"myApp\" failed: cannot patch \"myApp\" with kind Deployment: The order in patch list:* doesn't match $setElementOrder list:*
  Cause: The spec.extraEnv or spec.envFromConfigMap array has a name that's already defined in the yaml
  Fix: Remove the element with the duplicate environment variable name
- Error: Helm upgrade failed: could not get information about the resource: serviceaccounts "myApp" is forbidden: User "system:serviceaccount:tenant-flux-ns:flux2-tenant" cannot get resource "serviceaccounts" in API group "" in the namespace "myTargetNamespace"
  Cause: spec.targetNamespace is invalid or does not exist
  Fix: Use the namespace that was provided at onboarding
- Error: HelmChart 'tenant-flux-ns/tenant-flux-ns-web' is not ready
  Cause: spec.chart.spec.chart or spec.chart.spec.sourceRef.name references a Helm chart that's not recognized by Flux
  Fix: If using mcaas-tenant-charts, ensure spec.chart.spec.chart is set to one of the directory names and spec.chart.spec.sourceRef.name is set to mcaas-tenant-charts. If using BYOC or public HTTPS/OCI charts, contact MCaaS for assistance with configuring the required *Repository resources.
- Error: Helm upgrade failed: error validating "": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].env[10]): unknown field "names" in io.k8s.api.core.v1.EnvVar, ValidationError(Deployment.spec.template.spec.containers[0].env[10]): missing required field "name" in io.k8s.api.core.v1.EnvVar]
  Cause: Invalid key name specified. extraEnv is injected into the Deployment template, and the Deployment resource expects the array of key-value pairs to use the key names name and value. See DeploymentSpec > PodTemplateSpec > PodSpec > Container > EnvVar.
  Fix: Follow the commented examples in the chart's values.yaml. Replace the invalid key with name or value.
- Error: Upgrade "myApp" failed: cannot patch "myApp" with kind Deployment: "" is invalid: patch: Invalid value: "{...}": json: cannot unmarshal [bool||number] into Go struct field EnvVar.spec.template.spec.containers.env.value of type string
  Cause: Unquoted boolean or numeric environment variable value
  Fix: Ensure spec.values.extraEnv[]?.value boolean or numeric values are enclosed in double quotes
- Error: timed out waiting for the condition
  Cause: Helm could not create or update resources
  Fix:
  - See the note for spec.timeout in the example HelmRelease above
  - Go to Datadog > Infrastructure > Kubernetes > Explorer. Check if the new pod is deployed with Ready status. If the status is CrashLoopBackOff, the issue is likely within your application code or Dockerfile.
  - Go to Datadog > Service Mgmt > Event Management. Filter by Event Type: kubernetes_apiserver (@evt.type:kubernetes_apiserver), check the appropriate Kubernetes Cluster, adjust the time range if needed, enter the application HelmRelease metadata.name/spec.releaseName in the search box and review any results.
  - Submit a JSM ticket
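The unquoted-value error listed above (cannot unmarshal bool/number into EnvVar value) can be avoided by quoting every extraEnv value, since the Kubernetes EnvVar value field must be a string. For example:

```yaml
spec:
  values:
    extraEnv:
      - name: ENABLE_CACHE
        value: "true"  # quoted: unquoted true would be parsed as a YAML boolean
      - name: MAX_RETRIES
        value: "5"     # quoted: unquoted 5 would be parsed as a YAML number
```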