Abstract

This paper documents a reference architecture for production multi-tenant B2B SaaS, covering tenant isolation, identity, billing, observability, and compliance patterns. It is intended as a starting point for greenfield projects and a checklist for existing ones.

The reference stack

  • Application: TypeScript end-to-end (Next.js or Astro frontend, Node or Bun backend).
  • Database: Postgres with Row-Level Security.
  • Cache: Redis or Cloudflare KV.
  • Background jobs: Postgres-backed queue (graphile-worker for Node workers, River if the workers are written in Go) for moderate scale.
  • Identity: Better Auth, Lucia, or a hosted provider (Clerk, WorkOS, Auth0) for SSO-heavy products.
  • Hosting: Cloudflare or AWS.
  • Observability: OpenTelemetry → Grafana Tempo / Honeycomb / Datadog.
  • Billing: Stripe + custom metering layer.

Tenant isolation

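Shared schema with Row-Level Security (RLS) is the default. Every tenant-owned table has a tenant_id column, and RLS policies enforce isolation at the database level. The application sets the tenant context on the pooled connection for each request, scoped to the transaction so that it cannot leak to the next request that reuses the connection. Full details in our guide.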
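
As a minimal sketch of the per-request tenant context, assume a client shaped like node-postgres (a query(text, params) method) and RLS policies that read a custom setting named app.tenant_id, for example USING (tenant_id = current_setting('app.tenant_id')::uuid). The setting name, the TenantDbClient interface, and the withTenant helper are all hypothetical names chosen for illustration.

```typescript
// Minimal client shape, modeled on node-postgres; not the library's own type.
interface TenantDbClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose tenant context is visible to RLS policies.
// `app.tenant_id` is an assumed setting name; use whatever your policies read.
async function withTenant<T>(
  client: TenantDbClient,
  tenantId: string,
  work: (client: TenantDbClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // The third argument (is_local = true) limits the setting to this
    // transaction, so it is gone once the connection returns to the pool.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Scoping the setting to the transaction is the design choice that matters here: a session-level setting would survive on the pooled connection and could be seen by another tenant's request.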

Reserve dedicated-database isolation for enterprise customers who require it (and pay accordingly).

Identity

  • Users belong to tenants. A user can belong to multiple tenants (consultants, agencies).
  • Authentication produces a session token containing user ID and active tenant ID.
  • API authorisation checks both user permissions and tenant membership on every request.
  • Enterprise customers require SSO/SAML support. Build it in early or budget for a painful retrofit.
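
The two-part authorisation check (tenant membership plus user permission) could look roughly like this. The role names, permission strings, and the authorize function are illustrative assumptions, not a prescribed model.

```typescript
type Role = "owner" | "admin" | "member";

interface Session {
  userId: string;
  activeTenantId: string;
}

interface Membership {
  userId: string;
  tenantId: string;
  role: Role;
}

// Hypothetical role-to-permission table; a real product would likely load this from config.
const ROLE_PERMISSIONS: Record<Role, readonly string[]> = {
  owner: ["invoice:read", "invoice:write", "member:manage"],
  admin: ["invoice:read", "invoice:write"],
  member: ["invoice:read"],
};

function authorize(
  session: Session,
  memberships: readonly Membership[],
  permission: string,
): boolean {
  // Check 1: the user must currently belong to the tenant named in the session.
  const membership = memberships.find(
    (m) => m.userId === session.userId && m.tenantId === session.activeTenantId,
  );
  if (!membership) return false;
  // Check 2: the user's role in that tenant must grant the requested permission.
  return ROLE_PERMISSIONS[membership.role].includes(permission);
}
```

Looking membership up on every request, rather than trusting the tenant ID in the token alone, means a user removed from a tenant loses access before the session expires.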

Billing

  • Stripe handles payment collection. Don't roll your own.
  • Subscription state is mirrored in your own database for fast access.
  • Usage-based metering is your responsibility. Stripe's metered billing API helps but doesn't replace the metering layer.
  • Reconcile your records to Stripe nightly. Disputes are inevitable; have data ready.
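
A nightly reconciliation can be reduced to a pure comparison once both sides are loaded. The sketch below assumes the Stripe subscriptions have already been fetched and normalised into plain records; fetching them is out of scope, and the SubscriptionRecord fields are hypothetical.

```typescript
interface SubscriptionRecord {
  subscriptionId: string;
  status: string; // e.g. "active", "past_due", "canceled"
  priceId: string;
}

type Discrepancy =
  | { kind: "missing_locally"; subscriptionId: string }
  | { kind: "missing_in_stripe"; subscriptionId: string }
  | {
      kind: "field_mismatch";
      subscriptionId: string;
      field: "status" | "priceId";
      local: string;
      stripe: string;
    };

// Compares the local mirror against records already fetched from Stripe.
function reconcile(
  local: readonly SubscriptionRecord[],
  stripe: readonly SubscriptionRecord[],
): Discrepancy[] {
  const localById = new Map<string, SubscriptionRecord>();
  for (const record of local) localById.set(record.subscriptionId, record);

  const stripeIds = new Set<string>();
  const discrepancies: Discrepancy[] = [];

  for (const remote of stripe) {
    stripeIds.add(remote.subscriptionId);
    const mine = localById.get(remote.subscriptionId);
    if (!mine) {
      discrepancies.push({ kind: "missing_locally", subscriptionId: remote.subscriptionId });
      continue;
    }
    for (const field of ["status", "priceId"] as const) {
      if (mine[field] !== remote[field]) {
        discrepancies.push({
          kind: "field_mismatch",
          subscriptionId: remote.subscriptionId,
          field,
          local: mine[field],
          stripe: remote[field],
        });
      }
    }
  }

  for (const mine of local) {
    if (!stripeIds.has(mine.subscriptionId)) {
      discrepancies.push({ kind: "missing_in_stripe", subscriptionId: mine.subscriptionId });
    }
  }
  return discrepancies;
}
```

Keeping the comparison free of I/O makes it easy to test and to rerun against archived snapshots when a dispute arrives.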

Audit logging

  • Every authenticated mutation is logged with: user, tenant, action, target, before/after state, timestamp.
  • Logs are stored separately from operational data — different retention, different indexes.
  • Sensitive actions (auth changes, data exports, deletions) get extra scrutiny, such as an alert in addition to the log entry.
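
One possible shape for the audit record described above is sketched here. The field names follow the list in the first bullet; the action names in SENSITIVE_ACTIONS and the buildAuditEvent helper are assumptions for illustration.

```typescript
interface AuditEvent {
  userId: string;
  tenantId: string;
  action: string;
  target: string;
  before: unknown;
  after: unknown;
  timestamp: string; // ISO 8601, UTC
  sensitive: boolean;
}

// Hypothetical action names; replace with the actions your product actually emits.
const SENSITIVE_ACTIONS = new Set<string>([
  "auth.password_changed",
  "data.exported",
  "record.deleted",
]);

type AuditInput = Omit<AuditEvent, "timestamp" | "sensitive">;

// Builds the record; the caller supplies `now` in tests so output is deterministic.
function buildAuditEvent(input: AuditInput, now: Date = new Date()): AuditEvent {
  return {
    ...input,
    timestamp: now.toISOString(),
    // Flagged events can be routed to alerting in addition to the audit store.
    sensitive: SENSITIVE_ACTIONS.has(input.action),
  };
}
```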

Observability

  • Every log line, metric, and trace span is tagged with tenant ID.
  • Distributed tracing across services. The browser-to-database trace is the minimum.
  • Per-tenant dashboards for support triage. "Tenant X is reporting issues" needs to be debuggable in seconds.
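
For logs, tenant tagging can be enforced by construction rather than by convention. This is a small sketch with a hypothetical createTenantLogger helper and an assumed tenant_id field name; the equivalent for traces is setting a tenant attribute on each span.

```typescript
type LogFields = Record<string, unknown>;
type LogLevel = "info" | "warn" | "error";
type LogSink = (line: string) => void;

// Returns a logger bound to one tenant. Create it once per request, after the
// tenant has been resolved, and pass it down instead of a global logger.
function createTenantLogger(tenantId: string, sink: LogSink = console.log) {
  return (level: LogLevel, message: string, fields: LogFields = {}): void => {
    // tenant_id is written last so a caller-supplied field cannot overwrite it.
    sink(JSON.stringify({ ...fields, level, message, tenant_id: tenantId }));
  };
}
```

With every line carrying the same field name, a per-tenant dashboard becomes a single filter on tenant_id.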

Compliance scaffolding

  • Encryption at rest (database, object storage) by default.
  • Encryption in transit (TLS) by default.
  • Secret management via a managed service (AWS Secrets Manager, GCP Secret Manager, 1Password Service Accounts).
  • SOC 2 evidence collection is automated where possible (Vanta, Drata, Secureframe).
  • Data deletion requests have a documented process. GDPR / CCPA timelines are real.

Background processing

  • Every job carries its tenant ID, and the worker establishes that tenant context before running the job.
  • Job idempotency is mandatory. Workers will retry.
  • Per-tenant job rate limits prevent one customer's workload from starving others.
  • Failed jobs go to a dead-letter queue with alerting.
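
The idempotency and dead-letter rules can be sketched with an in-memory runner. This is illustrative only: JobRunner is a hypothetical class, and in production the completed-job record would live in Postgres, written in the same transaction as the job's side effects.

```typescript
interface Job {
  id: string; // stable across retries, so it doubles as the idempotency key
  tenantId: string;
  payload: unknown;
  attempts: number;
}

type Outcome = "done" | "skipped_duplicate" | "retry" | "dead_lettered";

class JobRunner {
  readonly deadLetters: Job[] = [];
  private readonly completed = new Set<string>();
  private readonly handler: (job: Job) => Promise<void>;
  private readonly maxAttempts: number;

  constructor(handler: (job: Job) => Promise<void>, maxAttempts = 3) {
    this.handler = handler;
    this.maxAttempts = maxAttempts;
  }

  async run(job: Job): Promise<Outcome> {
    // A redelivered job that already succeeded is acknowledged without rerunning.
    if (this.completed.has(job.id)) return "skipped_duplicate";
    try {
      await this.handler(job);
      this.completed.add(job.id);
      return "done";
    } catch {
      job.attempts += 1;
      if (job.attempts >= this.maxAttempts) {
        // An alerting hook would be triggered here.
        this.deadLetters.push(job);
        return "dead_lettered";
      }
      return "retry";
    }
  }
}
```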

Caching

  • Cache keys always include tenant ID. Always.
  • Cache invalidation strategies depend on the data; prefer stale-while-revalidate for analytics and time-to-live for everything else.
  • Don't cache things that change per-user unless the cost of staleness is well understood.
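
One way to make the first rule hard to violate is to route every cache access through a key builder that requires the tenant ID. The tenantCacheKey helper and the "tenant" prefix are hypothetical.

```typescript
// Builds a cache key that always starts with the tenant ID.
function tenantCacheKey(tenantId: string, ...parts: string[]): string {
  if (tenantId.length === 0) {
    throw new Error("tenantId is required for every cache key");
  }
  // Each segment is encoded so a ':' inside an ID cannot collide with the separator.
  return ["tenant", tenantId, ...parts].map((segment) => encodeURIComponent(segment)).join(":");
}
```

For example, tenantCacheKey("t1", "user", "42") yields "tenant:t1:user:42", and the same lookup for another tenant can never resolve to that entry.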

API design

  • REST for most resources. tRPC if you're shipping a TypeScript-only stack.
  • Idempotency keys on every mutation endpoint.
  • Cursor-based pagination, not offset.
  • Strict versioning. Breaking changes ship under a new version, because changing an existing version breaks customer integrations.
  • Rate limiting per tenant + per API key.
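
Cursor-based pagination is sketched below over an in-memory array so the logic is easy to follow. The Row shape, the cursor encoding, and the listPage function are assumptions; against Postgres the same idea is a keyset query such as WHERE (created_at, id) > ($1, $2) ORDER BY created_at, id LIMIT $3.

```typescript
interface Row {
  id: string;
  createdAt: number;
}

interface Page {
  items: Row[];
  nextCursor: string | null;
}

// The cursor is opaque to clients; its encoding is an internal detail.
function encodeCursor(row: Row): string {
  return encodeURIComponent(JSON.stringify([row.createdAt, row.id]));
}

function decodeCursor(cursor: string): [number, string] {
  const [createdAt, id] = JSON.parse(decodeURIComponent(cursor)) as [number, string];
  return [createdAt, id];
}

// `sortedRows` must be sorted ascending by (createdAt, id); id breaks ties.
function listPage(sortedRows: readonly Row[], limit: number, cursor?: string): Page {
  let start = 0;
  if (cursor !== undefined) {
    const [createdAt, id] = decodeCursor(cursor);
    // First row strictly after the cursor position.
    start = sortedRows.findIndex(
      (r) => r.createdAt > createdAt || (r.createdAt === createdAt && r.id > id),
    );
    if (start === -1) return { items: [], nextCursor: null };
  }
  const items = sortedRows.slice(start, start + limit);
  const hasMore = start + limit < sortedRows.length;
  const last = items[items.length - 1];
  return { items, nextCursor: hasMore && last !== undefined ? encodeCursor(last) : null };
}
```

Because the cursor names a position in the sort order rather than a row count, rows inserted or deleted between requests do not cause skipped or repeated items, which is the main weakness of offset pagination.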

Onboarding flow

  • Self-serve sign-up creates a personal tenant by default.
  • Team invitations work over email with single-use tokens.
  • Enterprise customers get manual provisioning with SSO setup.
  • First-run UX is curated — don't dump new users into an empty product.
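
Single-use invitation tokens come down to consuming the token on first redemption. The sketch below is in-memory and makes several assumptions not stated above: tokens are generated elsewhere by a cryptographically secure random source, invitations expire, and the InviteStore class is a hypothetical name. A production store would also keep only a hash of the token.

```typescript
interface Invite {
  tenantId: string;
  email: string;
  expiresAt: number; // epoch milliseconds; expiry is an assumption of this sketch
}

class InviteStore {
  private readonly pending = new Map<string, Invite>();

  issue(token: string, invite: Invite): void {
    this.pending.set(token, invite);
  }

  // Returns the invite on first valid use, and null for unknown, reused, or expired tokens.
  redeem(token: string, now: number): Invite | null {
    const invite = this.pending.get(token);
    if (!invite) return null;
    // Consume the token before checking expiry so it can never be replayed.
    this.pending.delete(token);
    return invite.expiresAt > now ? invite : null;
  }
}
```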

Recommendations

  • Pick boring, proven tools. Most "new" data infrastructure is unnecessary.
  • Build the seams for the migrations you'll need later. RLS-shared today, dedicated-database in three years.
  • Invest in observability before you need it. Adding it after the first crisis is too late.
  • Get billing right from day one. Bad billing is a constant tax on the company.

Conclusion

Multi-tenant SaaS architecture is well understood in 2026. The mistake is usually not a wrong technology choice but a missing operational discipline (audit logging, observability, compliance) that gets added years later under duress. Build the discipline in from the start, and the architecture takes care of itself.