APIM v2 Stopped Trusting My Internal CA and the Real Fix Is Not Where Most Teams Look
We moved an APIM instance from classic Premium to StandardV2 overnight. The new gateway came up, APIs imported, developer portal rendered, the network path was correct all the way to the backend VNet. Then the first real request to an internal API returned a 500. Every internal backend returned a 500.
The backend hostnames are on an internal CA (our corporate root plus an issuing sub-CA we have been running for years). The classic APIM never had a problem with this because we had uploaded the root and the sub-CA to the classic certificate store and everything downstream of that Just Worked. StandardV2 did not care.
The APIM gateway log on v2 said:
```
The remote certificate was rejected by the provided RemoteCertificateValidationCallback
```
That log line tells you exactly what happened, if you read it right. The backend presented its certificate. APIM received it. APIM rejected it during the validation callback. It is a trust error, not a network error, and debugging it on the network side (which is what we did first) gets you nowhere.
The Global Certificate Store Is a Ghost Town on v2
On classic APIM, the global certificate store is how you make the gateway trust an internal CA. You upload the root and any intermediates under Microsoft.ApiManagement/service/{name}/certificates, the classic runtime merges them into its backend TLS trust set, and all your backends under that CA work.
On v2, that store is still there. You can still upload to it. You can still list the uploads. None of it does anything for backend validation. The v2 runtime does not consult the global store for backend trust.
Microsoft does document this, but it is one line buried in the v2 tiers overview under “Classic feature availability”:
CA certificates managed in the global certificate list → CA certificates referenced in backend entity
If you do not notice that line (and we did not, before we started debugging), you will spend an afternoon verifying network routes, private DNS zones, and NSGs that have nothing to do with the real problem.
Worse, the global store is still populated on our v2 instance. The entries we had uploaded on the classic instance got carried forward somehow (probably by our Terraform still managing them). They look valid, they show expected expiration dates, they are just dead weight. Nothing on v2 uses them.
The Workaround Everyone Ships Before Reading the Docs
Under time pressure, the first fix is always the same. Disable TLS validation on the affected backend:
```hcl
resource "azurerm_api_management_backend" "tic_insecure" {
  name                = "tic-backend-insecure"
  resource_group_name = var.resource_group_name
  api_management_name = var.apim_name
  protocol            = "http"
  url                 = var.backend_url

  tls {
    validate_certificate_chain = false
    validate_certificate_name  = false
  }
}
```
Reference that backend from the API policy with <set-backend-service backend-id="tic-backend-insecure" /> and requests start going through. That is what we did at 2am. That is what you will probably do if you hit this under the same pressure.
It is not a fix. Disabling validate_certificate_chain means APIM accepts any certificate regardless of who signed it. Disabling validate_certificate_name means APIM does not check that the hostname on the certificate matches the one you are calling. The posture is roughly equivalent to running curl -k against your internal APIs from a production gateway.
If you are reading this with the insecure backend already in production, the absolute minimum hardening step is to flip validate_certificate_name back to true. That restores hostname pinning, which catches the obvious class of attack where someone terminates TLS with a cert issued for a different host. It does not restore chain validation, but it closes half the gap with a one-line change per backend and no new dependencies.
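In Terraform terms, the interim hardening is a one-line flip in the existing tls block. A sketch, reusing the resource and variable names from the workaround snippet above:

```hcl
resource "azurerm_api_management_backend" "tic_insecure" {
  name                = "tic-backend-insecure"
  resource_group_name = var.resource_group_name
  api_management_name = var.apim_name
  protocol            = "http"
  url                 = var.backend_url

  tls {
    # Chain validation is still off: this is the gap pinning closes later.
    validate_certificate_chain = false
    # Hostname check restored: a cert issued for another host is rejected.
    validate_certificate_name  = true
  }
}
```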
It is still not the real answer. It is a hardening of a workaround, not a fix.
What Microsoft Actually Built for This
The backends documentation spells out the v2 replacement, again in one line:
Currently, you can only configure CA certificate details in a backend entity in the v2 tiers.
Translation: CA trust for backends on v2 lives on the backend entity itself, not in a service-level collection. The mechanism is a field on the BackendTlsProperties schema called serverX509Names. It takes a list of {name, issuerCertificateThumbprint} tuples and tells APIM: “when you call this backend, accept its certificate only if the CN or SAN matches this hostname and the issuer’s thumbprint matches this value.”
That is not general CA trust. That is explicit pinning per backend per expected issuer. It is narrower than the classic model and more secure as a result. Classic uploaded-root trust meant anything signed by that CA chain was accepted. v2 pinning means only the specific hostname issued by the specific CA is accepted. One cert rotation on one backend that introduces a different intermediate will break the pin, which is usually what you want for a production trust boundary.
Also documented, and worth knowing before you spend time second-guessing: once you configure serverX509Names, APIM always validates the name and chain regardless of the validateCertificateChain and validateCertificateName flags. The pinning takes precedence over the flags. You cannot accidentally leave validation disabled once the pins are in place.
Why azurerm Alone Is Not Enough
Here is where the Terraform story gets annoying. The hashicorp/azurerm provider’s azurerm_api_management_backend resource has a tls block, but the only fields exposed in that block are validate_certificate_chain and validate_certificate_name. server_x509_names and server_certificate_thumbprints are not there. They exist in the provider, but only inside the service_fabric_cluster block, which is unrelated and not what you want.
This is true as of the current provider version at the time of writing, and as of the provider’s main branch. The ARM REST API has supported these fields on regular backends for years; the Terraform surface just never caught up.
Pragmatic options, in order of increasing effort:
Use the azapi provider to overlay the TLS properties. azapi writes arbitrary ARM payloads and plays nicely alongside azurerm on the same resource as long as the two do not fight over the same subtree. This is the path we took.
Replace azurerm with azapi entirely for the backend resource. Slightly more boilerplate, one provider per backend, clean state. Reasonable if you prefer not to mix.
Submit a PR to hashicorp/terraform-provider-azurerm adding the fields to the tls block. A well-scoped enhancement, a decent first-time contribution. Until it lands, azapi is the answer.
The Terraform Shape
azurerm creates the backend with the basic properties. azapi overlays the TLS pinning. The two coexist because azapi_update_resource targets only the properties.tls subtree:
```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.36"
    }
    azapi = {
      source  = "Azure/azapi"
      version = "~> 2.0"
    }
  }
}

provider "azapi" {}

resource "azurerm_api_management_backend" "tic" {
  name                = "tic-backend"
  resource_group_name = var.resource_group_name
  api_management_name = var.apim_name
  protocol            = "http"
  url                 = var.backend_url
  # no tls{} block here, azapi owns it
}

resource "azapi_update_resource" "tic_backend_tls" {
  type        = "Microsoft.ApiManagement/service/backends@2025-03-01-preview"
  resource_id = azurerm_api_management_backend.tic.id

  body = {
    properties = {
      tls = {
        validateCertificateChain = true
        validateCertificateName  = true
        serverX509Names = [{
          name                        = "tic.contoso.com"
          issuerCertificateThumbprint = "CBB1D641C07D753D903D4D73E194B0D0CEAEF021"
        }]
      }
    }
  }
}

module "apim_api" {
  # ...
  depends_on = [azapi_update_resource.tic_backend_tls]
}
```
Three details in that snippet that matter and are easy to miss.
The resource_id reference on the azapi resource creates an implicit Terraform dependency on the azurerm backend. You do not need an explicit depends_on between those two; the graph sorts itself.
The module.apim_api depends_on shifts from the azurerm backend (where it probably lives today, if you followed the workaround pattern) to the azapi overlay. The API’s policy references the backend by name through <set-backend-service backend-id="..." />, and you want the API to land only after the TLS pinning is active. Otherwise there is a race window on first apply where the API exists but the backend it references has default TLS config that may not match your pins.
The API version. 2025-03-01-preview supports serverX509Names. 2024-05-01 is the current stable version and also supports it. Choose based on your internal policy on preview API versions. The examples on Microsoft Learn use the preview version because it is the current one they publish; production templates should probably pin stable.
Getting the Thumbprint
You need the thumbprint of the CA that issues your backend certificates, specifically whatever directly signs the leaf. If your PKI has a root and one sub-CA and the sub-CA issues server certs, the sub-CA thumbprint is what you want. Not the root.
Two ways to extract it. If you have access to the issuing CA’s certificate file (.cer or .pem), the OpenSSL way:
```shell
openssl x509 -in subca.cer -fingerprint -sha1 -noout
```
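One formatting detail worth a comment: openssl prints the fingerprint with a `Fingerprint=` label and colon separators, while the ARM `issuerCertificateThumbprint` field expects the bare hex string. A small pipeline normalizes it (`subca.cer` stands in for your issuing CA's certificate file):

```shell
# Strip the "Fingerprint=" label and the colons so the output can be
# pasted straight into issuerCertificateThumbprint.
openssl x509 -in subca.cer -fingerprint -sha1 -noout \
  | cut -d'=' -f2 | tr -d ':'
```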
If you migrated from classic and you uploaded the chain to the classic global store, the thumbprints are already visible in the certificate store of your v2 instance (still uploaded, still dead weight, but the thumbprints are the same):
```shell
az rest --method get \
  --url "https://management.azure.com/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ApiManagement/service/<apim>/certificates?api-version=2024-05-01" \
  --query "value[].{name:name, subject:properties.subject, thumbprint:properties.thumbprint}" -o table
```
Grab the thumbprint of the issuing sub-CA from that listing and put it in the Terraform config.
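If neither the CA file nor the classic store is available, a third route is to pull the chain off the live endpoint. In a typical two-tier PKI the issuing sub-CA is the second certificate `-showcerts` prints, but verify that against your actual chain before trusting the index; the hostname below is a placeholder:

```shell
host=tic.contoso.com   # placeholder; your backend hostname
# -showcerts dumps the presented chain; the awk filter keeps only the
# second PEM block (usually the issuing sub-CA; adjust n==2 if your
# chain is deeper), then its thumbprint is printed in ARM format.
openssl s_client -connect "$host:443" -servername "$host" -showcerts </dev/null 2>/dev/null \
  | awk '/BEGIN CERTIFICATE/{n++} n==2' \
  | openssl x509 -fingerprint -sha1 -noout \
  | cut -d'=' -f2 | tr -d ':'
```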
When Pinning Is Not the Right Answer
serverX509Names is the correct v2 mechanism for our case, but it is not universally correct.
If your backend certificates rotate frequently or on uncoordinated schedules, pinning breaks every rotation. Put something in front of the backend that you control (an Application Gateway, or a reverse proxy on App Service) that terminates TLS with a stable public cert. APIM then talks to that proxy over public PKI and never has to trust your internal chain.
If the backend can be moved to a public CA, move it. DigiCert, Sectigo, Let’s Encrypt, depending on your organization’s policy. The v2 managed trust store accepts all of them with zero configuration. This is Microsoft’s recommended path for new deployments; where it fails, the reason is usually organizational: some backends sit on internal-only DNS, or policy bars them from getting public certificates. For everything else, it is the cleanest answer.
If the backend is disposable (staging, a short-lived test target), the insecure workaround is fine as long as it has an expiration date in your ticket system.
What We Cleaned Up Afterward
Once serverX509Names was in place and verified, we removed four categories of dead or dangerous code.
The classic /certificates uploads on the v2 instance. They do nothing on v2. They were carried forward in the Terraform from classic days. Deleted, no functional impact.
The _insecure backend names. Renamed to plain tic-backend, b2c-backend, etc. The <set-backend-service> references had to move in the same commit. Sequence matters: rename resource, redeploy API with updated policy, verify, then merge.
The validate_certificate_chain = false and validate_certificate_name = false lines still sitting in Terraform. Removed entirely. Defaults are true, and with pinning in place APIM enforces validation regardless, so the flags are noise.
Any code that referenced the classic uploaded certs by name (we had none, but worth checking if you have downstream automation assuming those certs mean something).
If You Are Reading This With the Workaround Already in Production
You are not the first team here. The insecure backend pattern is the natural first response to the 500, and plenty of production APIM v2 instances are running it right now.
A staged approach to move off it without downtime:
Step one, cheap and immediate. Flip validate_certificate_name = true on every _insecure backend. azurerm handles this natively; it is a one-line change per resource and closes the hostname side of the validation gap. Do this today if you can.
Step two, the inventory. For each backend, capture the exact CN or SAN on its certificate, the issuing CA, and the issuing CA’s thumbprint. openssl s_client -connect host:443 -showcerts from a host that can reach the backend, or your PKI team’s export, or the thumbprints still sitting in the classic /certificates store if you migrated from classic.
Step three, the pilot. Pick one low-risk backend. Implement the serverX509Names overlay with azapi on that backend. Verify end to end: requests succeed, gateway logs are clean, and removing the insecure flags does not regress anything. This is the step that proves the pattern works in your environment.
Step four, the rollout. Apply the pattern to the remaining backends in small batches. Do not rename the resource and flip the TLS config in the same change; that is two risks stacked. Do the TLS fix first, verify, then rename in a separate commit.
Step five, the cleanup described in the previous section.
The One Line That Would Have Saved Us an Afternoon
On APIM v2, CA trust for backends is per-backend, not service-wide, and the azurerm Terraform provider does not surface the fields you need. Microsoft documented it. We did not read it carefully enough.
For the broader comparison of what else changes between classic APIM and v2 (networking, Terraform resource shape, feature parity gaps), the genioct.be writeup on APIM v2 vs classic covers the migration scope. This post is the TLS trust story specifically, because that was the part that surprised us.
Azure docs: v2 tier overview · Backend CA certificates · Backend REST API (2025-03-01-preview) · azapi_update_resource