blog.genicot.eu

The CORS Error That Was Really APIM v2's 2 KB Query-String Limit: A Debugging Story

10 min read
Tags: Azure · APIM · CORS · Debugging · Security
Also published on genicot.be

A P1 landed in the ticket queue about 30 hours after a classic-to-StandardV2 APIM cutover. The browser console showed the same message on every file upload:

Access to XMLHttpRequest at 'https://api.contoso.be/b2c/api/customer/uploadFile?captchav3=...'
from origin 'https://www.contoso.be' has been blocked by CORS policy:
Response to preflight request doesn't pass access control check:
No 'Access-Control-Allow-Origin' header is present on the requested resource.

Postman worked. curl worked. Only browsers failed. And only on the uploadFile endpoint. Every other operation on the same API was fine for the same users.

Over the next few hours I chased five different “smoking guns,” each plausible at the time and each wrong. The root cause turned out to be a documented APIM v2 limit that sits in a footnote most teams never notice; I hadn’t noticed it either until this ticket. I’m writing it up so the next person who sees this exact pattern doesn’t lose a day.

The false trails, briefly

Every one of these looked promising for an hour.

buffer-request-body="true" on the API-level backend policy. Standard v2 is documented as having tighter body-buffering limits. File uploads seemed like the perfect fit. Gateway logs didn’t back it up: not a single large-body failure. Dropped it.

240-second LB idle timeout on v2-family tiers. There’s a Microsoft Q&A thread reporting large-upload failures around that timeout. Logs showed no slow requests. Dropped it.

Origin not in the CORS allowed-origins list. Mobile app user-agents might send Origin: null or capacitor://... that wouldn’t match. A HAR from an actual failing browser showed Origin: https://www.contoso.be, which was in the list. Dropped it.

CORS policy position in inbound. Microsoft docs say CORS should be the first policy in <inbound>. Ours was at the bottom, below <validate-jwt> and <rate-limit-by-key>. Plausible: v2 is stricter than v1 about this. Tested with curl: preflight OPTIONS returned 200 with full CORS headers regardless of policy order. Dropped it.

Missing Ocp-Apim-Subscription-Key on the POST. APIM logs showed 401 “missing subscription key” on some upload attempts. Maybe the frontend upload code path just forgot the header. Plausible. A real browser HAR showed Ocp-Apim-Subscription-Key: <correct V2 key> sent on the POST. Dropped it.

Each of these cost time. They weren’t stupid hypotheses; given the evidence at each point, they were all reasonable. The problem was that I kept pattern-matching the browser’s CORS complaint to known CORS failure modes. The browser was reporting a symptom two layers downstream of the actual cause.

The HAR that changed the investigation

A product owner captured a HAR file from the session of a real user who’d just failed an upload. The two relevant entries:

OPTIONS /b2c/api/customer/uploadFile?captchav3=0cAFcWeA5...  → 404 Not Found  (preflight)
POST    /b2c/api/customer/uploadFile?captchav3=0cAFcWeA5...  → status=0        (blocked)

Two things stood out.

First: the preflight returned 404 Not Found. If the preflight fails, the browser never sends the actual POST. Status=0 on the POST in the HAR confirms it: the browser aborted it because of the preflight failure. Whatever was happening at the POST layer (subscription key present, JWT valid, form data correct) was irrelevant. The POST never left the browser.

Second: the 404 response didn’t look like a normal APIM gateway-runtime error. APIM typically returns JSON with a Request-Context header pointing at an Application Insights app ID. This 404 had:

  • Content-Type: text/html
  • Content-Length: 103
  • No Request-Context header

That shape hinted the request was being rejected before the policy/runtime layer.

Tracing the 404 through the layers

The AGW access log, for that exact request timestamp, had the answer hiding in a single field:

httpStatus_d:      404
serverStatus_s:    404
serverRouted_s:    10.100.247.36:443     ← APIM v2 Private Endpoint
timeTaken_d:       0.011                 ← 11 milliseconds
ruleName_s:        pub-api-apim-prd-https-rule

Application Gateway forwarded the request to APIM v2’s Private Endpoint. APIM responded with 404 in 11 milliseconds. AGW returned that 404 to the client. The serverRouted_s + serverStatus_s combination is what confirms it: AGW routed the request to APIM and APIM is what returned the 404. AGW didn’t invent it.

But APIM v2’s ApiManagementGatewayLogs had zero entries for the request. Not a 404 entry, not a rejected entry, nothing.

That narrows it: APIM v2 received the request, rejected it with 404 at the HTTP frontend, and never invoked the gateway runtime that writes GatewayLogs. The 404 is emitted by a layer in front of the policy engine: the layer that parses the HTTP request line, matches the API path, and decides whether to hand off to the gateway runtime.

The URL-length correlation

Hypothesis: the request-URI was too long. The failing URL had a reCAPTCHA v3 token in the query string:

/b2c/api/customer/uploadFile?captchav3=0cAFcWeA5wb-i1C7-Z7PIfG6...  → 2516 characters

My curl tests that worked used ?captchav3=test, 12 characters. Same path, same AGW, same APIM. The only differentiator I hadn’t isolated.

I tested the threshold empirically:

# captchav3 value length → request-URI length → HTTP status
for len in 2030 2032 2034 2036 2038 2040; do
  URL=".../customer/uploadFile?captchav3=$(python3 -c "print('a' * $len)")"
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X OPTIONS "$URL" \
    -H "Origin: https://www.contoso.be" \
    -H "Access-Control-Request-Method: POST")
  echo "captcha_val_len=$len  request_uri_len=${#URL}  http=$STATUS"
done

# Results:
# captcha_val_len=2030  request_uri_len=2069  http=200
# captcha_val_len=2036  request_uri_len=2075  http=200
# captcha_val_len=2038  request_uri_len=2077  http=200
# captcha_val_len=2040  request_uri_len=2079  http=404   ← flips here

The cap flipped between request-URI lengths of 2077 and 2079 bytes. The arithmetic lines up exactly: the query string captchav3= plus a 2038-character token is 2048 bytes; two more characters push it to 2050.
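That arithmetic is easy to sanity-check in code. A minimal sketch, assuming Node (the constant and helper names are my own, not an Azure API; the cap value is the one from Microsoft’s limits table):

```javascript
// Documented APIM v2 query-string cap, in bytes.
const APIM_V2_QUERY_STRING_CAP = 2048;

// Bytes of the query string (everything after '?', excluding the '?').
function queryStringBytes(url) {
  const i = url.indexOf('?');
  return i === -1 ? 0 : Buffer.byteLength(url.slice(i + 1), 'utf8');
}

function fitsApimV2(url) {
  return queryStringBytes(url) <= APIM_V2_QUERY_STRING_CAP;
}

// 'captchav3=' (10 bytes) + a 2038-char token = exactly 2048 bytes: allowed.
console.log(fitsApimV2('/b2c/api/x?captchav3=' + 'a'.repeat(2038))); // true
// Two more characters (2050 bytes) cross the cap: rejected.
console.log(fitsApimV2('/b2c/api/x?captchav3=' + 'a'.repeat(2040))); // false
```

A check like this in a pre-deploy test would have flagged the reCAPTCHA URLs long before the cutover.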

What Microsoft actually documents

Microsoft’s API gateway overview has a runtime-limits table that reads:

Limit                        Classic     V2             Consumption
Request URL size             Unlimited   16,384 bytes   16,384 bytes
Length of URL path segment   1,024 chars 1,024 chars    1,024 chars
Request payload size         Unlimited   1 GiB          1 GiB

At first glance, 16 KB for the URL looks generous. The trap is in the footnote on the Request URL row: “Includes an up to 2048-bytes long query string.” Both numbers are true at the same time. The overall URL is capped at 16 KB, and the query string inside that URL is capped at 2048 bytes. A request with a 2 KB query string and a short path hits the query-string cap long before the 16 KB envelope matters.

That matches the empirical threshold. The failing reCAPTCHA tokens pushed the query string past 2048 bytes; everything else fit inside the 16 KB URL envelope. Classic has no URL limit at all, which is why the same client worked before the cutover.

What is not documented is the failure mode. The rejection happens in the HTTP frontend, before the gateway runtime runs. The response is a 404 with no CORS headers and no GatewayLogs entry, which makes it look like a CORS or routing problem instead of a query-string length problem.

The query-string cap is what ties the symptoms together:

  • The 404 has no CORS headers because APIM v2 emits it before the gateway runtime runs, and only the gateway runtime applies the <cors> policy.
  • The browser sees a 404 without Access-Control-Allow-Origin and reports “CORS Missing Allow Origin”, which is the visible symptom.
  • Postman and curl with short tokens work, because their test URLs don’t exceed the 2048-byte query string cap.
  • Classic worked because classic has no URL limit.

Why it surfaced exactly at the cutover

The timing is what makes it confusing. The query-string cap is a property of the APIM v2 runtime, not of the Application Gateway. The AGW is unchanged through the cutover: same config, same WAF policy, same rewrite rules. Only its backend pool changed: the IP the AGW forwards requests to moved from the classic VNet-injected private IP to the v2 Private Endpoint IP.

Before the cutover:

  • Browser → AGW → Classic Premium (no URL limit) → success

After the cutover:

  • Browser → AGW → StandardV2 (2048-byte query string cap) → 404 for long query strings

Same request, same path, same browser. Different APIM tier behind the same AGW frontend. The failure “appeared at cutover” because the cutover is what introduced the query-string cap. The reCAPTCHA tokens were always that long; classic just didn’t care.

That explains every other red herring. The CORS ordering looked relevant because this is a CORS error. The subscription key theory fit because APIM logs showed 401s. But both were downstream of requests that had short query strings and were getting through to the policy engine. The failing real-user requests with long tokens weren’t in those logs at all. They never reached the policy engine.

The fix

There’s no server-side remediation inside APIM configuration. The cap is a property of the v2 HTTP frontend. You can’t turn it off with a policy, you can’t exempt an API, you can’t raise it via REST API.

The client-side fix is straightforward and also happens to be correct independent of this bug: move the reCAPTCHA token out of the URL query string and into the POST body. Multipart form field, JSON field, header: any of the three. Query strings aren’t meant for 2–3 KB opaque tokens. Browsers, proxies, WAFs, load balancers, and logging pipelines all have opinions about long URLs.

For the upload endpoint the shape becomes:

// Before
const url = `/b2c/api/customer/uploadFile?captchav3=${captchaToken}`;
await fetch(url, { method: 'POST', body: formData, headers: { ... } });

// After
formData.append('captchav3', captchaToken);  // token in the body
const url = `/b2c/api/customer/uploadFile`;   // short URL
await fetch(url, { method: 'POST', body: formData, headers: { ... } });

Backend-side change: read the captcha from the form field instead of the query parameter. One backend commit, one frontend commit, deployed together. Problem resolved.
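The backend side of that change is trivially small. A framework-agnostic sketch, with hedges: `formFields` and `queryParams` stand in for whatever the real stack exposes (req.body / req.query in Express, Request.Form in ASP.NET Core, and so on), and the fallback is my own suggestion, not something the incident required:

```javascript
// Read the captcha from the multipart form field instead of the query
// parameter. The query-string fallback lets an old frontend build keep
// working if the two commits don't land at exactly the same moment.
function readCaptchaToken(formFields, queryParams) {
  return formFields.captchav3 ?? queryParams.captchav3 ?? null;
}

// New client: token arrives in the body, the query string stays short.
readCaptchaToken({ captchav3: 'tok' }, {});  // → 'tok'
// Old client during rollout: token still in the query string.
readCaptchaToken({}, { captchav3: 'tok' });  // → 'tok'
```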

The CORS-on-error patch that didn’t fix this

Early in the investigation, when I still thought the CORS message was load-bearing, I shipped a patch that added Access-Control-Allow-Origin headers to the <on-error> / <return-response> block of every B2C policy. The goal: if APIM ever returned an error response, the browser should at least be able to read the body.

That patch is correct in isolation. It makes future auth failures, rate-limit rejections, and policy errors readable in the browser console instead of showing a misleading CORS message. It is not what fixed this P1, because the 404 in this P1 was emitted before the gateway runtime ran, so the on-error block never executed.

Worth keeping. Wrong layer for this specific ticket.
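For reference, the shape of that patch in APIM policy XML is roughly the following; the origin value is illustrative, and a real policy might echo the request’s Origin or use a <return-response> instead:

```xml
<on-error>
    <base />
    <!-- Make error bodies readable in the browser instead of a CORS wall. -->
    <set-header name="Access-Control-Allow-Origin" exists-action="override">
        <value>https://www.contoso.be</value>
    </set-header>
</on-error>
```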

Signals in the logs that would have saved hours

If I’d recognised these earlier the investigation would have been much shorter:

serverStatus_s == 404 in AGW access log + no matching APIM GatewayLogs entry. This combination means the 404 came from APIM v2’s HTTP frontend, not the gateway runtime. Any theory about CORS, JWT, subscription keys, rate limits, or backend TLS is automatically irrelevant, because none of those run at the HTTP frontend layer.

Response Content-Type: text/html on an APIM error. APIM returns JSON errors from the gateway runtime. HTML errors come from AGW or APIM’s frontend layers. If you see HTML and no Request-Context header, the request didn’t reach your APIM policy code.

Zero WAF log entries at the timestamp of a 404. WAF rules that fire (allow, matched, blocked) always write log entries. Zero entries means the request wasn’t inspected by WAF. Either AGW rejected it at the HTTP frontend, or the request was forwarded and rejected by the backend before WAF policy evaluation. For the backend case, the AGW serverStatus_s tells you what the backend said.
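The three signals above collapse into a small triage rule. A sketch in JavaScript; the function and field names are my own shorthand for the log fields quoted in this post, not a real API:

```javascript
// Given what the AGW access log and APIM GatewayLogs show for one request,
// name the layer that most likely produced the error.
function classify404({ agwServerStatus, hasGatewayLogEntry, contentType, hasRequestContext }) {
  if (agwServerStatus === 404 && !hasGatewayLogEntry) {
    return 'apim-http-frontend';  // rejected before the policy engine ran
  }
  if (contentType === 'text/html' && !hasRequestContext) {
    return 'frontend-layer';      // AGW or APIM frontend, not policy code
  }
  return 'gateway-runtime';       // a normal APIM policy/runtime error
}

// This incident's signature: AGW says 404, GatewayLogs are empty.
console.log(classify404({
  agwServerStatus: 404,
  hasGatewayLogEntry: false,
  contentType: 'text/html',
  hasRequestContext: false,
}));  // 'apim-http-frontend'
```

Anything classified as `apim-http-frontend` rules out CORS, JWT, subscription-key, and rate-limit theories in one step.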

For broader APIM v2 gateway log patterns (backend TLS rejections, subscription key failures, circuit breaker trips), the KQL recipes post covers the three signatures worth memorising. For related v2 gotchas from the same migration, see APIM v2 CA trust for internal backends and OpenAPI schema import control-plane fetch.

The one-paragraph version

Standard v2 APIM can fail browser preflight requests when a long reCAPTCHA token is sent in the query string. Microsoft documents V2 gateway limits as a 16,384-byte request URL including up to a 2048-byte query string. In this incident the request crossed the query-string threshold, APIM returned a 404 before normal gateway policy handling, and the browser surfaced it as a misleading CORS error. The fix was to move the token from the URL into the POST body.

Azure docs: APIM gateway overview (runtime limits table) · v2 tier overview · CORS policy · Return-response policy