Abstract
This paper documents a reference architecture for production multi-tenant B2B SaaS, covering tenant isolation, identity, billing, observability, and compliance patterns. It is intended as a starting point for greenfield projects and a checklist for existing ones.
The reference stack
- Application: TypeScript end-to-end (Next.js or Astro frontend, Node or Bun backend).
- Database: Postgres with Row-Level Security.
- Cache: Redis or Cloudflare KV.
- Background jobs: Postgres-backed queue (graphile-worker, pg-boss) for moderate scale.
- Identity: Better Auth, or a hosted provider (Clerk, WorkOS, Auth0) for SSO-heavy products.
- Hosting: Cloudflare or AWS.
- Observability: OpenTelemetry → Grafana Tempo / Honeycomb / Datadog.
- Billing: Stripe + custom metering layer.
Tenant isolation
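Shared schema with Row-Level Security (RLS) is the default. Every tenant-owned table has a tenant_id column, and RLS policies enforce isolation at the database level. The application sets the tenant context on the pooled connection for each request, scoped to a transaction so that it cannot leak to the next request that reuses the connection. The application's database role should not own the tables or bypass RLS, because table owners and superusers skip policies unless RLS is forced. Full details in our guide.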
Reserve dedicated-database isolation for enterprise customers who require it (and pay accordingly).
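A minimal sketch of the per-request tenant context, assuming a pg-style client with a query(text, values) method. The Queryable interface, the withTenant helper, the invoices table, and the app.tenant_id setting name are all illustrative assumptions, not part of any library. Postgres's set_config with is_local = true limits the setting to the current transaction, which is what makes it safe on pooled connections.

```typescript
// Hypothetical structural type: any client exposing a pg-style query method.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
}

// An example policy this helper pairs with (table and setting names are assumptions):
//   ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
//   ALTER TABLE invoices FORCE ROW LEVEL SECURITY;
//   CREATE POLICY tenant_isolation ON invoices
//     USING (tenant_id = current_setting('app.tenant_id')::uuid);

// Runs `work` inside a transaction whose tenant context is set for RLS.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument `true` makes the setting transaction-local, so it is
    // discarded at COMMIT or ROLLBACK and never reaches the next request.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (error) {
    await client.query("ROLLBACK");
    throw error;
  }
}
```

A request handler would check a client out of the pool, call withTenant with the tenant ID from the session, and release the client afterwards.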
Identity
- Users belong to tenants. A user can belong to multiple tenants (consultants, agencies).
- Authentication produces a session token containing user ID and active tenant ID.
- API authorisation checks both user permissions and tenant membership on every request.
- Enterprise buyers require SSO/SAML support. Build it in early or budget for a painful retrofit.
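The per-request check described above can be sketched as a pure function. The role names, permission strings, and the ROLE_PERMISSIONS table are assumptions chosen for illustration; a real product would load memberships from its own store.

```typescript
type Role = "owner" | "admin" | "member";
type Permission = "invoice:read" | "invoice:write" | "member:invite";

interface Session {
  userId: string;
  activeTenantId: string;
}

interface Membership {
  userId: string;
  tenantId: string;
  role: Role;
}

// Illustrative role-to-permission mapping (an assumption, not a standard).
const ROLE_PERMISSIONS: Record<Role, Permission[]> = {
  owner: ["invoice:read", "invoice:write", "member:invite"],
  admin: ["invoice:read", "invoice:write", "member:invite"],
  member: ["invoice:read"],
};

// Allows the request only if the user is a member of the session's active
// tenant AND that membership's role grants the needed permission.
function authorize(session: Session, memberships: Membership[], needed: Permission): boolean {
  for (const membership of memberships) {
    const matches =
      membership.userId === session.userId &&
      membership.tenantId === session.activeTenantId;
    if (matches) {
      return ROLE_PERMISSIONS[membership.role].indexOf(needed) !== -1;
    }
  }
  return false; // no membership in the active tenant
}
```

Because a user can belong to several tenants, the same user may be allowed an action in one tenant and denied it in another; the active tenant ID in the session decides which membership applies.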
Billing
- Stripe handles payment collection. Don't roll your own.
- Subscription state is mirrored in your own database for fast access.
- Usage-based metering is your responsibility. Stripe's metered billing API helps but doesn't replace the metering layer.
- Reconcile your records to Stripe nightly. Disputes are inevitable; have data ready.
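The nightly reconciliation can be sketched as a diff between the local mirror and the records fetched from Stripe. The record shape and the findDrift function are hypothetical; fetching from Stripe and paging through results are left out, and only the comparison step is shown.

```typescript
interface SubscriptionRecord {
  subscriptionId: string;
  status: string; // e.g. "active", "past_due", "trialing"
}

interface Drift {
  subscriptionId: string;
  localStatus: string | null; // null: missing from the local mirror
  stripeStatus: string | null; // null: missing from Stripe
}

// Returns every subscription whose status differs between the two sources,
// including records that exist on only one side. Output is sorted by ID.
function findDrift(local: SubscriptionRecord[], stripe: SubscriptionRecord[]): Drift[] {
  const localById = new Map<string, string>();
  local.forEach((record) => localById.set(record.subscriptionId, record.status));
  const stripeById = new Map<string, string>();
  stripe.forEach((record) => stripeById.set(record.subscriptionId, record.status));

  const ids: string[] = [];
  localById.forEach((_status, id) => ids.push(id));
  stripeById.forEach((_status, id) => {
    if (!localById.has(id)) ids.push(id);
  });
  ids.sort();

  return ids
    .map((id) => ({
      subscriptionId: id,
      localStatus: localById.get(id) ?? null,
      stripeStatus: stripeById.get(id) ?? null,
    }))
    .filter((drift) => drift.localStatus !== drift.stripeStatus);
}
```

An empty result means the mirror agrees with Stripe; anything else is worth alerting on before a customer or a dispute surfaces it.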
Audit logging
- Every authenticated mutation is logged with: user, tenant, action, target, before/after state, timestamp.
- Logs are stored separately from operational data — different retention, different indexes.
- Sensitive actions (auth changes, data exports, deletions) get additional scrutiny beyond the standard log entry, such as real-time alerts and periodic review.
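One way to capture the fields listed above is to wrap each mutation in a helper that records the before and after state. This is a sketch under assumptions: the AuditEntry shape, the auditedMutation helper, and the sink callback are hypothetical names, and the sink stands in for whatever separate store holds the audit log.

```typescript
interface AuditEntry<T> {
  userId: string;
  tenantId: string;
  action: string;
  target: string;
  before: T | null;
  after: T | null;
  timestamp: string; // ISO 8601
}

interface Actor {
  userId: string;
  tenantId: string;
}

// Applies `mutate` and writes one audit entry describing the change.
// `mutate` must return a new value rather than modify its input in place,
// otherwise `before` and `after` would point at the same object.
function auditedMutation<T>(
  actor: Actor,
  action: string,
  target: string,
  before: T | null,
  mutate: (current: T | null) => T | null,
  sink: (entry: AuditEntry<T>) => void,
  now: Date,
): T | null {
  const after = mutate(before);
  sink({
    userId: actor.userId,
    tenantId: actor.tenantId,
    action,
    target,
    before,
    after,
    timestamp: now.toISOString(),
  });
  return after;
}
```

A deletion is the same call with a mutate function that returns null, which leaves the deleted state preserved in the before field.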
Observability
- Every log line, metric, and trace span is tagged with tenant ID.
- Distributed tracing spans every service. A single trace from browser to database is the minimum.
- Per-tenant dashboards for support triage. "Tenant X is reporting issues" needs to be debuggable in seconds.
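For log lines, tenant tagging can be sketched as a logger bound to the tenant at request start. The createTenantLogger function, the tenant_id field name, and the JSON line format are assumptions for illustration; with OpenTelemetry the equivalent step would be setting a tenant attribute on each span, which is not shown here.

```typescript
type LogFields = Record<string, string | number | boolean>;

interface TenantLogger {
  info(message: string, fields?: LogFields): void;
  error(message: string, fields?: LogFields): void;
}

// Returns a logger that stamps every line with the bound tenant ID.
// `write` is the output sink (for example, a function that prints the line).
function createTenantLogger(tenantId: string, write: (line: string) => void): TenantLogger {
  const emit = (level: string, message: string, fields: LogFields = {}): void => {
    // tenant_id is spread last so caller-supplied fields cannot override it.
    write(JSON.stringify({ ...fields, level, message, tenant_id: tenantId }));
  };
  return {
    info: (message, fields) => emit("info", message, fields),
    error: (message, fields) => emit("error", message, fields),
  };
}
```

Binding the tenant once per request, rather than passing it to each call, makes an untagged log line hard to write by accident.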
Compliance scaffolding
- Encryption at rest (database, object storage) by default.
- Encryption in transit (TLS) by default.
- Secret management via a managed service (AWS Secrets Manager, GCP Secret Manager, 1Password Service Accounts).
- SOC 2 evidence collection is automated where possible (Vanta, Drata, Secureframe).
- Data deletion requests have a documented process. GDPR / CCPA timelines are real.
Background processing
- Every job carries its tenant ID, and workers establish that tenant's context before running it.
- Job idempotency is mandatory. Workers will retry.
- Per-tenant job rate limits prevent one customer's workload from starving others.
- Failed jobs go to a dead-letter queue with alerting.
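Job idempotency can be sketched as a completion record checked before the handler runs. The IdempotentRunner class and Job shape are hypothetical, and the in-memory set is a stand-in: with a Postgres-backed queue, the completion record would normally be written in the same transaction as the job's side effects.

```typescript
interface Job {
  id: string;
  tenantId: string;
  payload: string;
}

type Outcome = "ran" | "skipped";

class IdempotentRunner {
  // In-memory stand-in for a durable "completed jobs" table.
  private readonly completed = new Set<string>();

  // Runs the handler at most once per (tenant, job ID) pair.
  // If the handler throws, the job is NOT marked complete, so a retry
  // will run it again.
  run(job: Job, handler: (job: Job) => void): Outcome {
    const key = `${job.tenantId}:${job.id}`;
    if (this.completed.has(key)) {
      return "skipped";
    }
    handler(job);
    this.completed.add(key);
    return "ran";
  }
}
```

Including the tenant ID in the key keeps two tenants that happen to reuse a job ID from suppressing each other's work.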
Caching
- Cache keys always include tenant ID. Always.
- Cache invalidation strategies depend on the data; prefer stale-while-revalidate for analytics and time-to-live for everything else.
- Don't cache things that change per-user unless the cost of staleness is well understood.
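A minimal sketch of tenant-scoped cache keys with a time-to-live cache. The tenantCacheKey function, the "t:" key prefix, and the TtlCache class are illustrative assumptions; against Redis or Cloudflare KV only the key builder would carry over, with expiry handled by the store.

```typescript
// Builds a cache key that always starts with the tenant ID. Each part is
// URI-encoded so a ":" inside a part cannot collide with the separator.
function tenantCacheKey(tenantId: string, namespace: string, id: string): string {
  if (tenantId.length === 0) {
    throw new Error("tenantId is required for every cache key");
  }
  return ["t", tenantId, namespace, id].map((part) => encodeURIComponent(part)).join(":");
}

// Small in-memory TTL cache with an injectable clock (milliseconds).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

Routing every read and write through the key builder means a lookup without a tenant ID fails loudly instead of returning another tenant's data.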
API design
- REST for most resources. tRPC if you're shipping a TypeScript-only stack.
- Idempotency keys on every mutation endpoint.
- Cursor-based pagination, not offset.
- Strict versioning. API changes break customer integrations.
- Rate limiting per tenant + per API key.
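Cursor-based pagination can be sketched over an in-memory array, ordering by creation time with the ID as a tie-breaker. The Row shape, the listPage function, and the "createdAt:id" cursor format are assumptions; in Postgres the same idea is a keyset query along the lines of WHERE (created_at, id) > ($1, $2) ORDER BY created_at, id LIMIT $3, and a real API would usually make the cursor opaque to clients.

```typescript
interface Row {
  id: string;
  createdAt: number; // epoch milliseconds
}

interface Page {
  items: Row[];
  nextCursor: string | null; // null when there are no further rows
}

// Returns up to `limit` rows that sort strictly after the cursor position.
function listPage(rows: Row[], limit: number, cursor: string | null): Page {
  const sorted = rows
    .slice()
    .sort((a, b) => a.createdAt - b.createdAt || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));

  let remaining = sorted;
  if (cursor !== null) {
    const separator = cursor.indexOf(":");
    if (separator < 0) throw new Error("malformed cursor");
    const afterTime = Number(cursor.slice(0, separator));
    const afterId = cursor.slice(separator + 1);
    remaining = sorted.filter(
      (row) => row.createdAt > afterTime || (row.createdAt === afterTime && row.id > afterId),
    );
  }

  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = remaining.length > limit;
  return {
    items,
    nextCursor: hasMore && last !== undefined ? `${last.createdAt}:${last.id}` : null,
  };
}
```

Unlike offset pagination, the cursor names a position in the ordering, so rows inserted or deleted between requests do not cause items to be skipped or repeated.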
Onboarding flow
- Self-serve sign-up creates a personal tenant by default.
- Team invitations work over email with single-use tokens.
- Enterprise customers get manual provisioning with SSO setup.
- First-run UX is curated — don't dump new users into an empty product.
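Single-use invitation tokens can be sketched as a redemption step that marks the token as used on first success. The Invitation shape, the redeemInvitation function, and the expiry field are assumptions (the text above only requires single use); token generation and email delivery are out of scope, and a production store would typically keep a hash of the token rather than the token itself.

```typescript
interface Invitation {
  token: string;
  tenantId: string;
  email: string;
  expiresAt: number; // epoch milliseconds (assumed field)
  usedAt: number | null;
}

type RedeemResult =
  | { ok: true; tenantId: string }
  | { ok: false; reason: "unknown" | "expired" | "used" };

// Redeems a token at most once. On success the invitation is marked used,
// so a second attempt with the same token is rejected.
function redeemInvitation(invitations: Invitation[], token: string, now: number): RedeemResult {
  for (const invitation of invitations) {
    if (invitation.token !== token) continue;
    if (invitation.usedAt !== null) return { ok: false, reason: "used" };
    if (now >= invitation.expiresAt) return { ok: false, reason: "expired" };
    invitation.usedAt = now;
    return { ok: true, tenantId: invitation.tenantId };
  }
  return { ok: false, reason: "unknown" };
}
```

Against a database, the check and the update should happen in one atomic statement so that two concurrent redemptions cannot both succeed.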
Recommendations
- Pick boring, proven tools. Most "new" data infrastructure is unnecessary.
- Build the seams for the migrations you'll need later: for example, shared-schema RLS today and dedicated databases for some tenants in three years.
- Invest in observability before you need it. Adding it after the first crisis is too late.
- Get billing right from day one. Bad billing is a constant tax on the company.
Conclusion
Multi-tenant SaaS architecture is well-understood in 2026. The mistake isn't usually a wrong technology choice — it's a missing operational discipline (audit logging, observability, compliance) added years later under duress. Build the discipline in from the start, and the architecture takes care of itself.