
RFCs at Scale: What Changes When Your Team Grows from 10 to 100 Engineers

The RFC process that works for a small team will collapse at scale — here's how to evolve it at each growth stage.

DesignDoc Team · 6 min read

Your RFC process is a reflection of your organization. At ten engineers, it's a conversation. At a hundred, it's infrastructure. The mistake most teams make is treating it the same way at both ends.

We've watched dozens of engineering teams grow through distinct phases, and the RFC process breaks in predictable ways at each stage. Here's what actually changes, and what to do about it.

The Small Team (5-10 Engineers)

At this size, RFCs are almost optional. Everyone sits in the same room (or the same Slack channel), and context flows freely. When someone writes an RFC, it's usually because the decision is genuinely complex — a database migration, a new service boundary, an authentication overhaul.

The process looks something like this: someone writes a Google Doc, drops a link in Slack, three people comment over lunch, and the team reaches consensus by Thursday. The RFC might be a page long. Nobody complains about process overhead because there isn't any.

This works because of two properties that don't scale: shared context and low coordination cost. Every engineer roughly knows what every other engineer is working on. There's no need for formal discovery because you'll hear about everything at standup.

Enjoy this phase. It won't last.

The Growing Team (20-50 Engineers)

This is where RFC processes most commonly break. The team has split into 3-6 squads, each owning different parts of the system. Engineers on the payments team have no idea what the infrastructure team decided last month about container orchestration, and vice versa.

Three problems emerge simultaneously:

Cross-team decisions become the hard ones. The most important RFCs are now the ones that span team boundaries — API contracts, shared libraries, data model changes that affect multiple services. These are exactly the RFCs that are hardest to get right because no single team owns them.

Discovery collapses. That Google Doc from six months ago? Nobody can find it. A new engineer joins the platform team and proposes an event-driven architecture, not knowing that the infrastructure team evaluated and rejected Apache Kafka three months ago in favor of a simpler message queue. The reasons were documented somewhere, but "somewhere" might as well be nowhere.

Review becomes inconsistent. Some teams write thorough RFCs. Others write a paragraph and call it done. There's no shared expectation for what an RFC should cover, which means reviewers waste time asking for basics (what are the alternatives? what's the rollback plan?) instead of engaging with the actual decision.

At this stage, you need three things:

  1. A single, searchable location for all RFCs. Not Google Docs. Not Notion. Not Confluence. One place where any engineer can search for "authentication" and find every decision the company has made about it. Stripe is well known for its internal RFC culture, and one reason it works is that RFCs are indexed and discoverable across the entire engineering org.

  2. Lightweight templates. Not bureaucratic forms — just enough structure to ensure every RFC covers the problem statement, proposed solution, alternatives considered, and risks. This saves more time than it costs because reviewers stop asking the same clarifying questions on every RFC.

  3. Cross-team visibility. When the data team publishes an RFC that changes the event schema, the three teams that consume those events need to know about it. This means tagging, notifications, or some mechanism for surfacing relevant RFCs to affected stakeholders.
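To make the template and visibility ideas concrete, here is a minimal Python sketch. It assumes each RFC carries a set of tags and that teams subscribe to the tags they care about. The four section names come from the template described above; the `Rfc` class, the subscription map, and the team names are hypothetical.

```python
from dataclasses import dataclass, field

# Minimum template: the four sections every RFC should cover.
REQUIRED_SECTIONS = (
    "Problem statement",
    "Proposed solution",
    "Alternatives considered",
    "Risks",
)


@dataclass
class Rfc:
    title: str
    owner_team: str
    tags: set[str]
    sections: dict[str, str] = field(default_factory=dict)


def missing_sections(rfc: Rfc) -> list[str]:
    """Return required template sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS
            if not rfc.sections.get(name, "").strip()]


def teams_to_notify(rfc: Rfc, subscriptions: dict[str, set[str]]) -> set[str]:
    """Return teams subscribed to any of the RFC's tags, excluding the authoring team."""
    return {team for team, tags in subscriptions.items()
            if tags & rfc.tags and team != rfc.owner_team}


# Hypothetical tag subscriptions per team.
subscriptions = {
    "data": {"event-schema"},
    "payments": {"event-schema", "billing"},
    "search": {"event-schema"},
    "growth": {"event-schema", "experiments"},
    "infra": {"kubernetes"},
}

rfc = Rfc(
    title="Add currency field to OrderPlaced event",
    owner_team="data",
    tags={"event-schema"},
    sections={"Problem statement": "...", "Proposed solution": "..."},
)
print(missing_sections(rfc))  # ['Alternatives considered', 'Risks']
print(sorted(teams_to_notify(rfc, subscriptions)))  # ['growth', 'payments', 'search']
```

The section check is deliberately crude: it only confirms a section is non-empty, not that it says anything useful. Even so, it catches the RFC that skips alternatives entirely before a reviewer has to ask.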

The Large Organization (100+ Engineers)

At this scale, the RFC process is organizational infrastructure. It needs to be treated with the same rigor as your CI/CD pipeline or your incident response process.

New challenges at this stage:

Review bottleneck becomes acute. A staff engineer or architect who's a domain expert might be tagged on 15 RFCs simultaneously. Without clear expectations about review timelines, some RFCs languish for weeks. Decisions stall. Teams route around the process — they just build the thing and ask for forgiveness later, which defeats the entire purpose.

Quality variance is enormous. With 20+ teams writing RFCs, some are beautifully structured technical documents and others are stream-of-consciousness brain dumps. Reviewers spend more time parsing the document than evaluating the decision.

Institutional knowledge becomes critical. When engineer #47 writes an RFC about caching strategy, they need to know what engineers #3, #12, and #31 decided about caching over the past three years. Without this context, the organization relitigates the same decisions repeatedly.

Google's design document culture is perhaps the most cited example at this scale. Their process includes designated reviewers, explicit approval requirements, and a culture where senior engineers are expected to engage meaningfully with design docs in their area. But even Google has struggled with the discovery problem — finding relevant prior decisions across tens of thousands of documents.

At 100+ engineers, you need to layer on:

Review SLAs. They need not be aggressive; something like "designated reviewers should provide initial feedback within 3 business days" is enough. This prevents the slow death of RFCs that are never reviewed and never rejected, just ignored.
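An SLA like this is easy to check mechanically. The Python sketch below flags review requests that have received no initial feedback after three business days. It assumes review requests can be exported as simple tuples and it ignores public holidays; the tuple layout and the example RFC and reviewer names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only, holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def overdue_reviews(requests, today: date, sla_days: int = 3):
    """Return (rfc, reviewer) pairs with no initial feedback past the SLA deadline.

    Each request is a hypothetical tuple: (rfc, reviewer, requested_on, first_feedback_on).
    """
    return [
        (rfc, reviewer)
        for rfc, reviewer, requested_on, first_feedback_on in requests
        if first_feedback_on is None
        and today > add_business_days(requested_on, sla_days)
    ]


requests = [
    ("RFC-101 cache invalidation", "alice", date(2024, 3, 1), None),  # due Wed Mar 6
    ("RFC-102 auth tokens", "bob", date(2024, 3, 1), date(2024, 3, 4)),  # answered
    ("RFC-103 event schema", "carol", date(2024, 3, 5), None),  # due Fri Mar 8
]
print(overdue_reviews(requests, today=date(2024, 3, 7)))
# [('RFC-101 cache invalidation', 'alice')]
```

One option is to post this list to a shared channel once a day. The goal is not to shame reviewers but to make an ignored RFC visible before it quietly dies.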

Governance for cross-cutting concerns. Security, data privacy, infrastructure costs — these topics affect every team but can't be reviewed by every team. You need designated reviewers for specific domains, or an architecture review board that triages RFCs affecting shared systems. The key is keeping this lightweight. Uber's architecture review process, for example, explicitly scopes review to decisions above a certain impact threshold to avoid bottlenecking small changes.
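One lightweight way to encode this is a routing rule that adds domain reviewers by tag and pulls in the architecture review board only above an impact threshold. The Python sketch below assumes a hypothetical 1-5 impact score and illustrative group names; where to set the threshold is a judgment call each organization has to make.

```python
# Hypothetical mapping from cross-cutting tags to the group that must review them.
DOMAIN_REVIEWERS = {
    "security": "security-reviewers",
    "privacy": "privacy-reviewers",
    "infra-cost": "infra-reviewers",
}

# Hypothetical: on a 1-5 impact scale, only scores at or above this go to the board.
BOARD_THRESHOLD = 4


def required_reviewers(tags: set[str], impact: int, shared_system: bool) -> set[str]:
    """Return the review groups an RFC must go through before approval."""
    groups = {DOMAIN_REVIEWERS[tag] for tag in tags if tag in DOMAIN_REVIEWERS}
    # Small or local changes skip the board so it does not become a bottleneck.
    if shared_system and impact >= BOARD_THRESHOLD:
        groups.add("architecture-review-board")
    return groups


print(sorted(required_reviewers({"privacy", "analytics"}, impact=2, shared_system=False)))
# ['privacy-reviewers']
print(sorted(required_reviewers({"infra-cost"}, impact=4, shared_system=True)))
# ['architecture-review-board', 'infra-reviewers']
```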

Status tracking that actually reflects reality. When you have 200 active RFCs across the org, leadership needs to know which decisions are blocked, which are in review, and which have been approved but not yet implemented. This isn't micromanagement — it's the same visibility you'd want for any critical workflow.
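Status tracking stays trustworthy when the allowed moves between states are explicit. The following Python sketch assumes one possible lifecycle (draft, in review, blocked, approved, implemented, rejected, superseded); the states and transitions are illustrative, not a standard. It rejects illegal moves and rolls statuses up into the counts leadership asks for.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    BLOCKED = "blocked"
    APPROVED = "approved"
    IMPLEMENTED = "implemented"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"


# Hypothetical transition table; adapt it to your own lifecycle.
ALLOWED = {
    Status.DRAFT: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.BLOCKED, Status.APPROVED, Status.REJECTED},
    Status.BLOCKED: {Status.IN_REVIEW, Status.REJECTED},
    Status.APPROVED: {Status.IMPLEMENTED, Status.SUPERSEDED},
    Status.IMPLEMENTED: {Status.SUPERSEDED},
    Status.REJECTED: set(),
    Status.SUPERSEDED: set(),
}


def transition(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value!r} to {new.value!r}")
    return new


def rollup(statuses) -> dict[str, int]:
    """Count RFCs per status, omitting statuses with no RFCs."""
    counts = Counter(statuses)
    return {s.value: counts[s] for s in Status if counts[s]}


statuses = [Status.IN_REVIEW, Status.IN_REVIEW, Status.BLOCKED,
            Status.APPROVED, Status.APPROVED, Status.APPROVED, Status.IMPLEMENTED]
print(rollup(statuses))
# {'in review': 2, 'blocked': 1, 'approved': 3, 'implemented': 1}

try:
    transition(Status.DRAFT, Status.APPROVED)
except ValueError as err:
    print(err)  # cannot move from 'draft' to 'approved'
```

Here "approved" counts RFCs that are decided but not yet built, which is the gap the paragraph above says leadership needs to see.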

Patterns That Work Across Scales

A few things hold true regardless of team size:

Make the lightweight path the default. Not every decision needs a full RFC. The cost of writing and reviewing an RFC should be proportional to the impact of the decision. Many organizations define tiers — a short decision record for small changes, a full RFC for architectural shifts.
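A tiering rule can be written down as plainly as a function. This Python sketch is a hypothetical policy: it sends small, reversible, single-team changes to a short decision record and everything else to a full RFC. The criteria and section lists are assumptions to adapt; the short form loosely follows the common Context/Decision/Consequences layout of architecture decision records.

```python
# Hypothetical section lists for each tier.
TEMPLATES = {
    "decision record": ["Context", "Decision", "Consequences"],
    "full RFC": ["Problem statement", "Proposed solution",
                 "Alternatives considered", "Risks", "Rollback plan"],
}


def choose_tier(teams_affected: int, reversible: bool, changes_shared_contract: bool) -> str:
    """Pick the lightest format that fits the decision's blast radius."""
    if teams_affected <= 1 and reversible and not changes_shared_contract:
        return "decision record"
    return "full RFC"


tier = choose_tier(teams_affected=1, reversible=True, changes_shared_contract=False)
print(tier, TEMPLATES[tier])
# decision record ['Context', 'Decision', 'Consequences']
print(choose_tier(teams_affected=3, reversible=True, changes_shared_contract=True))
# full RFC
```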

Optimize for the reader, not the writer. An RFC is written once but read many times, by reviewers now and by engineers tracing decisions for years afterward. Consistent structure, clear headings, and a summary at the top save hundreds of engineer-hours over the lifetime of a document. Amazon's practice of starting meetings by reading a document in silence reflects the same priority: the meeting is built around reader comprehension.

Link decisions forward and backward. An RFC that supersedes a previous decision should link to it. An implementation that deviates from an RFC should note why. This creates a decision trail that's invaluable when someone asks "why did we build it this way?" six months later.
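If each RFC records which earlier RFC it supersedes, the decision trail falls out of a simple walk over those links, and inverting them gives the forward pointer. The Python sketch below assumes a hypothetical `supersedes` map in which each RFC replaces at most one earlier RFC and is itself replaced at most once.

```python
def decision_trail(rfc_id: str, supersedes: dict[str, str]) -> list[str]:
    """Follow 'supersedes' links from rfc_id back to the oldest decision, newest first."""
    trail = [rfc_id]
    while rfc_id in supersedes:
        rfc_id = supersedes[rfc_id]
        if rfc_id in trail:
            raise ValueError(f"cycle in supersedes links at {rfc_id}")
        trail.append(rfc_id)
    return trail


def superseded_by(supersedes: dict[str, str]) -> dict[str, str]:
    """Invert the backward links so each older RFC points forward to its replacement."""
    return {old: new for new, old in supersedes.items()}


# Hypothetical history: RFC-212 replaced RFC-140, which replaced RFC-031.
supersedes = {"RFC-212": "RFC-140", "RFC-140": "RFC-031"}
print(decision_trail("RFC-212", supersedes))  # ['RFC-212', 'RFC-140', 'RFC-031']
print(superseded_by(supersedes))  # {'RFC-140': 'RFC-212', 'RFC-031': 'RFC-140'}
```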

Treat discovery as a first-class problem. The best RFC in the world is worthless if nobody can find it when they need it. Full-text search, tagging, and status filters aren't nice-to-haves at scale — they're the difference between a living knowledge base and a graveyard of forgotten documents.
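As an illustration of full-text search combined with tag and status filters, here is a minimal in-memory inverted index in Python. It is a sketch under simple assumptions: whole-word matching with no stemming or ranking (so "cache" will not match "caching"), every query term must appear, and tag and status are exact-match filters. The `RfcIndex` class and the sample RFCs are hypothetical; a production setup would more likely sit on an existing search engine.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class RfcIndex:
    def __init__(self) -> None:
        self.meta = {}                   # rfc_id -> {"tags": set, "status": str}
        self.postings = defaultdict(set)  # token -> set of rfc_ids containing it

    def add(self, rfc_id, title, body, tags, status):
        """Index an RFC's title and body, and remember its tags and status for filtering."""
        self.meta[rfc_id] = {"tags": set(tags), "status": status}
        for token in tokenize(f"{title} {body}"):
            self.postings[token].add(rfc_id)

    def search(self, query, tag=None, status=None):
        """Return sorted ids of RFCs containing every query term, after optional filters."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(
            rfc_id for rfc_id in hits
            if (tag is None or tag in self.meta[rfc_id]["tags"])
            and (status is None or self.meta[rfc_id]["status"] == status)
        )


index = RfcIndex()
index.add("RFC-031", "Cache invalidation strategy",
          "Use TTL-based caching for the catalog service", ["caching"], "implemented")
index.add("RFC-140", "Move session cache to Redis",
          "Replace in-process caching with Redis", ["caching", "sessions"], "superseded")
index.add("RFC-212", "Authentication tokens",
          "Rotate signing keys for session tokens", ["authentication"], "in review")

print(index.search("caching"))                          # ['RFC-031', 'RFC-140']
print(index.search("caching", status="implemented"))    # ['RFC-031']
print(index.search("session", tag="authentication"))    # ['RFC-212']
```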

Where Tooling Fits

We built DesignDoc because we watched teams try to solve these problems with general-purpose tools — Google Docs, Notion, GitHub PRs — and hit the same walls every time. General-purpose tools are great for writing, but they don't understand the lifecycle of a technical decision.

An RFC isn't a document. It's a decision with a status, reviewers, dependencies, and an implementation timeline. The tooling should reflect that. Search should understand technical context. Templates should encode your team's standards. Status tracking should show you what's decided, what's blocked, and what's been superseded.
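Modeling an RFC as a decision rather than a document can start as small as a record type. The Python sketch below is hypothetical: it gives each decision a status, reviewers, dependencies, and a target date, and derives "blocked" from dependencies that are not yet settled instead of relying on someone to set it by hand.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Statuses treated as settled for dependency purposes (hypothetical lifecycle).
SETTLED = {"approved", "implemented"}


@dataclass
class Decision:
    rfc_id: str
    status: str  # e.g. "draft", "in review", "approved", "implemented"
    reviewers: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)
    target_date: Optional[date] = None  # planned implementation date, if known


def blocked_on(decision: Decision, by_id: dict[str, Decision]) -> list[str]:
    """Return dependencies that are unknown or not yet approved or implemented."""
    return [
        dep for dep in decision.depends_on
        if dep not in by_id or by_id[dep].status not in SETTLED
    ]


decisions = [
    Decision("RFC-031", "implemented"),
    Decision("RFC-212", "in review", reviewers=["security-reviewers"]),
    Decision("RFC-300", "in review", reviewers=["platform-leads"],
             depends_on=["RFC-031", "RFC-212"], target_date=date(2024, 6, 30)),
]
by_id = {d.rfc_id: d for d in decisions}
print(blocked_on(by_id["RFC-300"], by_id))  # ['RFC-212']
```

Treating an unknown dependency as unresolved is a deliberate choice here: a typo in an RFC id should surface as "blocked" rather than silently pass.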

The process problems at each scale are well-understood. The challenge is having tooling that grows with your team instead of becoming the next thing you need to migrate away from.

Stop losing decisions in Slack and Docs

DesignDoc gives every RFC a structured workflow, inline reviews, and a permanent home.
