# ADR-010: Shared Container Registry Across Environments

## Status

Accepted

## Context

The project deploys the same application to three environments (dev,
staging, prd). Each environment runs a container image pulled from an
Azure Container Registry (ACR). The question is whether to use one
shared ACR for all environments or a separate ACR per environment.

A core design goal of the deployment pipeline is **promote-by-digest**:
build a container image once, push it once, and deploy the exact same
bytes (identified by their immutable SHA-256 digest) through dev →
staging → prd. This guarantees bit-identical artifacts across
environments, eliminating "works in dev but not in prod" failures caused
by non-deterministic builds.

## Decision

Use a single shared ACR, hosted in a dedicated platform/shared resource
group (via `TF_VAR_acr_resource_group_name`), referenced by all
environments as a Terraform `data` source.

## Options Considered

### Option A: One ACR per environment (rejected)

Each environment (dev, staging, prd) gets its own ACR in its own
resource group.

- **Pro**: Clean blast radius — compromising dev's ACR credentials cannot
  affect prod's images. A `terraform destroy` on dev deletes only dev's
  images.
- **Pro**: No cross-RG IAM needed — each ACR lives in its own environment
  RG, so the managed identity only needs roles on local resources.
- **Con**: **Breaks promote-by-digest.** The same image built on one ACR
  must be copied to each subsequent ACR (via `az acr import` or similar).
  Even with digest-preserving copies, you now have N copies of the same
  bytes, N security scans, N retention policies, and the promotion
  pipeline needs explicit copy steps. This adds complexity and latency
  to every promotion.
- **Con**: Images with the same tag on different registries are not
  guaranteed to be bit-identical unless you copy by digest. This defeats
  the primary benefit of containerization (immutable, promotable
  artifacts).
- **Con**: Higher cost — storage is duplicated across registries.
- **Con**: The blast-radius benefit is largely theoretical at this
  project's scale: same subscription, same tenant, same team, same CI
  system. True isolation requires separate subscriptions, not just
  separate ACRs.

### Option B: One shared ACR, all environments pull from it (chosen)

A single ACR lives in a shared resource group. Dev CI pushes images.
All environments pull from the same registry by digest.

- **Pro**: **Build once, promote by digest.** The image pushed during
  `release.yaml` is the exact same bytes that run in dev, staging, and
  prd. No copies, no rebuilds, no divergence.
- **Pro**: One place for security scanning, SBOMs, retention policies,
  and geo-replication.
- **Pro**: Promotion is a pointer change (update the App Service's image
  reference to a different digest), not a data movement operation.
  Rollback is the same operation in reverse.
- **Pro**: Cheaper — one copy of each image, one storage bill.
- **Con**: The ACR becomes a cross-environment dependency. If it has an
  outage, no environment can cold-start new instances. Mitigation: ACR
  Premium supports geo-replication (a future upgrade path, not needed at
  current scale).
- **Con**: Broader IAM surface — dev's CI identity needs `AcrPush`,
  staging and prd's App Service identities need `AcrPull`, all on the
  same registry. This is manageable via resource-scoped role assignments
  (already in place in `cicd.tf` and `app_service.tf`).

### Option C: Per-environment ACRs with digest-preserving promotion copies (rejected)

Each environment has its own ACR. Promotion copies the image by digest
via `az acr import`, preserving bit-identity.

- **Pro**: Bit-identical artifacts (same as B) plus environment isolation
  (same as A).
- **Pro**: Each ACR's IAM is minimal and environment-scoped.
- **Pro**: Promotion is an explicit, auditable event.
- **Con**: More moving parts — requires a promotion pipeline (or manual
  `az acr import`) for each environment transition. Storage duplicated.
- **Con**: Overkill at this project's scale (small team, single
  subscription, three environments). The operational overhead of managing
  promotion copies outweighs the isolation benefit until the team grows
  or compliance requirements mandate it.

## Consequences

- The ACR is created by `bootstrap.sh` in
  `$TF_VAR_acr_resource_group_name` and referenced in Terraform via
  `data "azurerm_container_registry"` in `container_registry.tf`.
- `release.yaml` builds and pushes images to the shared ACR, tagged with
  the release version. The registry-assigned digest is captured and
  written into the GitHub Release body.
- Promotion workflows (`promote-to-staging.yaml`, `promote-to-prd.yaml`)
  read the digest from the release body and update the target App
  Service's image reference. No image copy or rebuild occurs.
- Rollback is symmetric with promotion: the same `az webapp config
  container set` command, pointing at a previous release's digest.
- The App Service's system-assigned managed identity gets
  `Container Registry Repository Reader` on the shared ACR
  (in `app_service.tf`). The GitHub Actions identity gets
  `Container Registry Repository Writer` (in `cicd.tf`).
- Cross-RG RBAC works identically to same-RG RBAC — Azure role
  assignments are scoped to the target resource ID, not to the
  principal's resource group.

## When to revisit

- If a second team with a different release cadence onboards and needs
  its own promotion pipeline.
- If a security incident involving dev credentials leaks makes ACR
  isolation a requirement.
- If a funder or auditor requires physical environment separation in
  writing.

In any of these cases, upgrade to Option C (per-environment ACRs with
digest-preserving copies). The promote-by-digest pipeline is already
in place; the change is adding a copy step and a per-environment
registry, not redesigning the promotion model.

## Participants

Marius