Organizing and scaling an effective data team

Here’s a high-level sketch of how to organize and scale an effective data team.

First, for the avoidance of doubt, a company should have an explicitly organized data team whose strategic purpose is advancing the company through data. A few analysts siloed within the finance and product teams won’t cut it. Instead, there should be a distinct, first-class team whose scope eventually includes:

Ensuring focus on the right hierarchy of input and output metrics
Steering the roadmap through insightful analysis and research
Driving optimization through experimentation and machine learning
Developing and maintaining data infrastructure

Second, a data team should be functionally managed and structured like its counterparts in engineering, design, marketing, product management, and other functions. It should have both centralized and distributed elements, and it should become more distributed over time. Concretely, a good heuristic is that any client team that routinely consumes more than 16 data-oncall hours per week should become some data team member’s primary focus. This structure implies two core operational practices:

That data team members maintain primary focus areas wherein they spend ~75% of their time. Usually this involves embedding within a cross-functional team, though sometimes they focus on the data team itself (as infrastructure developers or managers).
That there exists a data-oncall rotation whereby whoever is oncall handles requests that aren’t covered by a primary focus and supports data teammates with review requests and fires. Data-oncall email and Slack aliases should be used to route requests to the appropriate team member, who can then prioritize them alongside their primary work.
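To make the 16-hour heuristic and the oncall rotation a bit more concrete, here is a minimal Python sketch. It assumes oncall hours are logged per client team per week. The function names, the sample team and member names, and the choice to read “routinely” as the median of recent weeks are all illustrative assumptions, not part of any particular tool or process.

```python
from statistics import median

# Threshold from the heuristic above: a client team that routinely consumes
# more than 16 data-oncall hours per week warrants a dedicated primary focus.
FOCUS_THRESHOLD_HOURS = 16.0


def teams_needing_focus(
    weekly_hours: dict[str, list[float]],
    threshold: float = FOCUS_THRESHOLD_HOURS,
) -> list[str]:
    """Return client teams whose typical weekly oncall load exceeds the threshold.

    Assumption: "routinely" is interpreted as the median over the supplied
    window of recent weeks, so one unusually busy week doesn't trigger a new focus.
    """
    flagged = []
    for team, hours in weekly_hours.items():
        if not hours:  # no data logged yet for this team
            continue
        if median(hours) > threshold:
            flagged.append(team)
    return sorted(flagged)


def oncall_schedule(members: list[str], weeks: int) -> list[str]:
    """Assign oncall week by week with a simple round-robin over data team members."""
    if not members:
        raise ValueError("need at least one data team member for the rotation")
    return [members[week % len(members)] for week in range(weeks)]


# Hypothetical usage with made-up teams and members.
load = {
    "growth": [20, 18, 22, 15],   # median 19.0 -> flagged
    "payments": [6, 9, 4, 7],     # median 6.5  -> stays with oncall
    "search": [17, 16, 18, 30],   # median 17.5 -> flagged
}
print(teams_needing_focus(load))                       # ['growth', 'search']
print(oncall_schedule(["ana", "ben", "cho"], weeks=5))  # ['ana', 'ben', 'cho', 'ana', 'ben']
```

Using the median rather than the mean keeps a single fire-drill week from promoting a team to a primary focus; a real team would likely tune both the window and the threshold to its own situation.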

Why operate like this? Primary focuses enable deep domain expertise and strong cross-functional relationships. Meanwhile, an oncall rotation minimizes context-switching costs by concentrating interrupt-driven (“randomizing”) requests on one person at a time, making them as predictable as possible. High-quality data work requires deep concentration and a maker’s schedule. Also, scalability! Focuses grow naturally with clients’ needs, and the data team grows naturally with focuses.

Finally, speaking of growth, here’s an outline of how a data organization might evolve within a startup…

At ~1 member:

No later than the point at which a company has any material volume of users (and thus data), it needs to start building its data team intentionally. It’s natural for a data team’s first member to be a technically skilled bizops person or an analytically skilled engineer who gets sucked into doing analytics work and may not naturally think of themselves as a founding data team member. Encourage them to embrace the opportunity! A high-functioning data team has outsized influence in steering the company’s direction, and there’s tremendous value in establishing good practices early on.
At this early stage, time is primarily spent setting up foundational infrastructure and dashboards. These days “infrastructure” should mostly be plug-and-play SaaS, but backend software engineers can be asked to pitch in on any additional custom stuff that’s needed.

At ~2 members:

The company has made its first dedicated data team hire. The data team will eventually consist of data analysts, data scientists, data engineers, and ML engineers. Early on, however, founders should seek jack-of-all-trades data scientists who can range from strong business analysis to rolling their own infrastructure. They are hard to find but well worth waiting (and paying a premium) for.
The data team operates mostly as a centralized service desk for the rest of the company, which itself is usually still operating more-or-less as a single team.
Data work is still mainly about reporting metrics, and task ownership is informal and round-robin.
There are a couple of reasonable ways of managing a nascent data team. One option is for one of the first hires to be someone with people-management experience who is hired to be the team’s leader from the start, with a clear understanding that 1) they’ll mostly be doing IC work for a while and 2) they’ll be layered (i.e., have a new leader hired above them) if their personal growth doesn’t keep pace with the team’s and company’s needs. The other is to have the team report to a non-data leader for a while, provided this person is sufficiently technical and analytical and excited about devoting meaningful time both to developing the data team’s roadmap and to being the voice of data in leadership settings. Usually the best person for this is the company’s most senior engineering leader. After a while, the team can be spun out under an externally hired or internally developed senior data leader, or remain under the original person if they enjoy and are good at the role.

At ~4 members:

The data team begins to specialize. As the company splits into multiple cross-functional teams, focus areas should emerge.
Increasing specialization and data volumes enable new types of data work: experimentation and exploratory analysis. Over time, more and more of the team’s focus should shift from reactive reporting to proactive exploration and development.

At ~8 members:

The data team should by now have a long-term-committed leader responsible for managing its members and determining its centralized roadmap.
As the data team grows its membership, there is increasing leverage in high-quality infrastructure. So far, data infra has been chiefly maintained by members in their spare time, but now it may make sense to hire someone to focus on it. Such a person should not spend all their days rotely modeling tables. Instead, their responsibility should be framed as developing tools to make this and other frequent tasks (e.g., experiment configuration, dashboard creation) as easy as possible in order to increase the data team’s aggregate output. A culture in which every team member contributes at least a little bit to shared infrastructure is forever desirable.

At ~16 members:

The data team has gotten large enough that a single person cannot manage it. Somewhere before this stage, 1-2 members should have been promoted to leads and eventually transitioned to manager roles as able and desired. The team’s leader now manages managers.
Data volumes, and the potential impact of optimization, are now large enough that meaningful machine learning work is happening.

At ~32+ members:

The data team now consists of multiple sub-teams, perhaps centralized data infra and ML platform teams who support distributed product and business data science/analytics teams.
The data team actively fosters an external brand, mostly for recruiting purposes. Frequent blog posts about interesting insights/projects and open-sourcing broadly-applicable tools/resources are a great way to do this.

Hope this brings some clarity and confidence to your organization! Shoot me a note if you’d like elaboration on anything above. Last, shout out to Wiktor Macura with whom I learned much of the above 🙂.
