The CORS Error That Was Really APIM v2's 2 KB Query-String Limit: A Debugging Story
A P1 hit production minutes after the StandardV2 cutover. The browser logged 'CORS Missing Allow Origin' on file uploads. Five different smoking guns led nowhere. The root cause was a documented APIM v2 query-string cap of 2048 bytes (within the 16 KB request-URL limit), which surfaces as a 404 before the gateway policy layer ever runs.
APIM v2 Stopped Trusting My Internal CA and the Real Fix Is Not Where Most Teams Look
The day we cut a production APIM over to StandardV2, every internal backend started returning 500. The classic global certificate store does not apply in v2. Here is what actually replaces it, the dirty workaround everyone ships first, and the Terraform shape that gets you back to full chain validation.
Reading APIM v2 Gateway Logs: KQL Recipes for the Three Things That Actually Break
When an APIM v2 request fails, the useful diagnostic is almost never in the HTTP response. It is in the gateway logs, in Log Analytics, one KQL query away. Here are the three failure modes you will actually see in production, the log signatures that identify each, and the queries that find them fast.
OpenAPI Schema Import in APIM v2: Who Fetches the Schema, and Why Your Internal URL Will Fail
The Terraform plan looked identical to the classic version. The pipeline kept failing with a 400 ValidationError that told me nothing. The missing mental model: schema imports happen from Azure's management plane, not from inside your VNet. Here is the debug story and the shape of Terraform that actually works.
Azure Firewall vs NSGs: When the Free Option Actually Costs More
The argument that NSGs can replace Azure Firewall sounds good until you need FQDN egress filtering, centralized logging, or threat intelligence. Here is when each one makes sense and when cutting the firewall creates more problems than it solves.
Why Your Azure Monitor Workbook Shows No Data Even With the Right Permissions
The hidden access control trap in Azure Monitor Workbooks. Resource-context vs workspace-context queries, why Monitoring Reader is not always enough, and the fix that takes five minutes.
Azure Functions Flex Consumption with Locked Storage and the Gotchas That Break Deployments
How to deploy Azure Functions Flex Consumption to secured storage accounts: One Deploy, managed identity, the AzureWebJobsStorage format that matters, and Terraform workarounds.
Serverless Observability at Near-Zero Cost with an Existing Grafana Stack
How I added business metrics to a serverless AWS app using CloudWatch Logs Insights, a k3s CronJob, and an existing Loki/Grafana stack. No new services, no new bills, but several assumptions that matter.
From Kubernetes to $0.50/month: Migrating a Real-Time App to AWS Serverless
How I replaced a Node.js + Socket.io + Kubernetes deployment with API Gateway WebSocket, Lambda, and DynamoDB, cutting costs to near zero while improving reliability.
When Your FinOps Tool Becomes Your Biggest Cost: The AWS Cost Explorer Trap
I built a Grafana dashboard to track AWS costs. The Cost Explorer API calls ended up costing more than all other AWS services combined. Here's what happened and how I fixed it.
When Your Platform Team Can't Agree on the Stack
A real story from an enterprise platform team split over infrastructure tooling. The technical debate was the easy part. The human side (sunk cost, identity, and fear of starting over) is where it gets hard.