The brief

A B2B data company was launching an API product for the first time. Their existing business was bulk data delivery (FTP, CSVs). They wanted to ship an API that enterprise customers could integrate into their own products, with the typical SaaS controls — usage-based billing, rate limiting, per-tenant configuration, audit logs.

The architectural decisions

The interesting calls happened in the first month, when we had to pick a tenant isolation model, a billing strategy, and an auth approach. We chose:

  • Shared schema with Row-Level Security for the operational data. It was cheap and simple, and the customer profile didn't require dedicated databases. Our multi-tenant article explains the framework.
  • Per-request metering pushed into Postgres asynchronously via a batched queue, then rolled up nightly into billing records. We avoided the complexity of real-time billing systems for the early product.
  • API keys with tenant + role context, validated at the API gateway and propagated as signed claims through downstream services.
  • Audit logging on every authenticated request, stored separately from operational data so it could be retained on different schedules.
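
To make the batched metering decision concrete, here is a minimal sketch of the idea, not the production code. The request path only appends to an in-memory buffer, a flush hands a whole batch to a sink (in practice one multi-row insert into Postgres), and a nightly job sums units per tenant per day. Every name here (UsageEvent, MeteringBuffer, rollupDaily, the batch size of 500) is a hypothetical chosen for illustration, and the rollup runs in memory where the real one would presumably be a SQL aggregate.

```typescript
// Hypothetical shape of one metered API call; field names are illustrative.
interface UsageEvent {
  tenantId: string;
  endpoint: string;
  units: number;
  at: Date;
}

// Sink that persists a batch, e.g. a single multi-row INSERT into Postgres.
type FlushFn = (batch: UsageEvent[]) => Promise<void>;

class MeteringBuffer {
  private pending: UsageEvent[] = [];
  private readonly flushFn: FlushFn;
  private readonly maxBatch: number;

  constructor(flushFn: FlushFn, maxBatch: number = 500) {
    this.flushFn = flushFn;
    this.maxBatch = maxBatch;
  }

  // Called on the request path: appends to memory and never awaits I/O.
  record(event: UsageEvent): void {
    this.pending.push(event);
    if (this.pending.length >= this.maxBatch) {
      void this.flush();
    }
  }

  // Called when the batch fills; a timer and a shutdown hook would call it too.
  async flush(): Promise<void> {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    try {
      await this.flushFn(batch);
    } catch {
      // Put the failed batch back so the next flush retries it. This gives
      // at-least-once delivery, so the sink has to tolerate duplicates.
      this.pending = batch.concat(this.pending);
    }
  }

  get size(): number {
    return this.pending.length;
  }
}

// Nightly rollup: total units per tenant per UTC day, keyed "tenant:YYYY-MM-DD".
function rollupDaily(events: UsageEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    const day = e.at.toISOString().slice(0, 10);
    const key = `${e.tenantId}:${day}`;
    totals.set(key, (totals.get(key) ?? 0) + e.units);
  }
  return totals;
}
```

The trade-off in a design like this is that events still in the buffer are lost if the process dies before a flush, which is usually acceptable only when the flush interval is short and billing tolerates a small margin of error.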

The stack

TypeScript across the stack. Fastify for the API. Postgres for the operational store and for short-term metering. ClickHouse for long-term analytics and audit. Cloudflare in front for global edge presence, rate limiting, and DDoS protection. Stripe for billing collection, with our own metering layer doing the per-request accounting.

The hard parts

  • Idempotency. Enterprise customers integrate APIs in messy environments where retries and duplicate submissions are common, so every write endpoint accepts an idempotency key. Replay protection runs in Postgres with a deduplication index.
  • Rate limiting fairness. The naive "X requests per minute per tenant" approach starves small tenants when one large tenant is bursting. We use a token bucket with per-tenant burst allowance and global fairness backpressure.
  • Customer onboarding. Enterprise customers wanted dedicated test environments, sandbox keys, IP allowlisting, and a clean migration path from sandbox to production. None of this is technically hard; it's all detail work that has to be done carefully.
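
The rate limiting point is easier to see in code. The sketch below is one possible reading of "token bucket with per-tenant burst allowance and global fairness backpressure", not the deployed limiter. The specific backpressure rule is an assumption made for illustration: once a shared global bucket drops below a threshold, each tenant must keep most of its own bucket in reserve, so a tenant that has already burst is throttled first while a quiet tenant can still spend a small fair share. FairRateLimiter, LimiterConfig, and all the parameters are hypothetical names.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number; // milliseconds
}

interface LimiterConfig {
  tenantRatePerSec: number;  // sustained refill rate per tenant
  tenantBurst: number;       // per-tenant bucket capacity (burst allowance)
  globalRatePerSec: number;  // refill rate of the shared bucket
  globalBurst: number;       // shared bucket capacity
  pressureThreshold: number; // fraction of globalBurst below which backpressure applies
  fairShare: number;         // tokens a tenant may still spend from a full bucket under pressure
}

class FairRateLimiter {
  private readonly cfg: LimiterConfig;
  private readonly tenants = new Map<string, Bucket>();
  private readonly global: Bucket;

  constructor(cfg: LimiterConfig, nowMs: number) {
    this.cfg = cfg;
    this.global = { tokens: cfg.globalBurst, updatedAt: nowMs };
  }

  // Standard token-bucket refill: add rate * elapsed, capped at capacity.
  private refill(b: Bucket, ratePerSec: number, capacity: number, nowMs: number): void {
    const elapsedSec = Math.max(0, nowMs - b.updatedAt) / 1000;
    b.tokens = Math.min(capacity, b.tokens + elapsedSec * ratePerSec);
    b.updatedAt = nowMs;
  }

  // Returns true if the request is admitted. The clock is passed in so the
  // behaviour is deterministic and easy to test.
  tryAcquire(tenantId: string, nowMs: number): boolean {
    const c = this.cfg;
    let t = this.tenants.get(tenantId);
    if (!t) {
      t = { tokens: c.tenantBurst, updatedAt: nowMs };
      this.tenants.set(tenantId, t);
    }
    this.refill(t, c.tenantRatePerSec, c.tenantBurst, nowMs);
    this.refill(this.global, c.globalRatePerSec, c.globalBurst, nowMs);

    // Assumed backpressure rule: when the shared bucket runs low, tenants
    // must keep (tenantBurst - fairShare) tokens in reserve.
    const underPressure = this.global.tokens < c.globalBurst * c.pressureThreshold;
    const floor = underPressure ? c.tenantBurst - c.fairShare : 0;

    if (this.global.tokens < 1 || t.tokens - 1 < floor) return false;
    t.tokens -= 1;
    this.global.tokens -= 1;
    return true;
  }
}
```

With a global burst of 12, a threshold of 0.5, a tenant burst of 10, and a fair share of 2, a tenant that fires ten requests at once gets seven through before backpressure rejects the rest, and a second tenant arriving at the same instant is still admitted for its first two requests.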

The outcome

  • The API launched on schedule and onboarded 50+ enterprise customers in the first six months.
  • P99 latency held under 200ms for 99% of operations.
  • Zero cross-tenant data incidents (verified by ongoing penetration testing).
  • The metering and billing system has reconciled to within ~0.05% of Stripe over its first year. Close enough for sales not to complain.
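
For readers who want the reconciliation figure spelled out, the check behind a number like ~0.05% can be sketched as a relative drift between the internally metered total and the total Stripe billed. This is an illustrative formula under our own assumptions: the function names, the tolerance constant, and the sample totals are hypothetical, and the comparison could equally be made on amounts rather than units.

```typescript
// Relative drift between what the metering layer counted and what was billed.
function reconciliationDrift(metered: number, billed: number): number {
  if (billed === 0) return metered === 0 ? 0 : Infinity;
  return Math.abs(metered - billed) / billed;
}

// 0.05% expressed as a fraction; an assumed alerting threshold.
const DRIFT_TOLERANCE = 0.0005;

function withinTolerance(metered: number, billed: number): boolean {
  return reconciliationDrift(metered, billed) <= DRIFT_TOLERANCE;
}

// Hypothetical monthly totals: 400 units of drift on 1,000,000 billed is 0.04%.
const exampleDrift = reconciliationDrift(1_000_400, 1_000_000);
const exampleOk = withinTolerance(1_000_400, 1_000_000);
```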

What we'd do differently

We underestimated the ops effort around customer-facing observability. Enterprise customers wanted detailed usage dashboards, exportable logs, and per-endpoint analytics — and they wanted these features sooner than we'd scheduled them. If we built the platform again, we'd ship customer observability as part of v1, not v2.