neural intelligence Nov 8, 2025 9 min read

feature stores, minus the hype: what to build and what to buy

a practical teardown of the feature store decision, aimed at teams running ML in production without a hundred-person ML platform org.

jagadeesha

co-founder

Feature stores are the part of an ML platform that every team should have and almost no team gets right on the first attempt. The reasons are cultural as much as technical: the feature store sits at the boundary between the data engineering team, the ML engineering team, and the product engineering team, and boundary work is always the most contested. By the time a team knows they need a feature store, they usually have three partial implementations and no single source of truth.

This essay is the teardown we give every client in the first week of a neural intelligence engagement. It is not exhaustive. It tries to answer the two questions that matter: what does a feature store need to do, and where does the buy-versus-build line honestly sit for a team in 2026?

what a feature store is, mechanically

A feature store has one job: produce the same value for the same feature, for the same entity, at the same point in time, regardless of whether it is being asked for by a training job (offline) or a model serving live traffic (online).

That sounds trivial until you try to do it. The subtleties compound:

  1. Features are computed from raw data, which means the feature store needs a materialisation layer. This is usually a batch pipeline for offline features and a streaming or on-demand pipeline for online features.
  2. Features are versioned, because you will change the definition of "customer lifetime value" at some point and both the old and new definitions must remain available for as long as any model references them.
  3. Features are time-travel aware, because when you train a model on historical data, you must retrieve the feature values as they were at the historical moment, not as they are today. This is the single most commonly botched requirement and the source of roughly half of all subtle training-serving skew bugs.
  4. Features have ownership, because at any reasonable scale there are hundreds of features and you need to be able to answer "who can I talk to when this one breaks" without a meeting.

A feature store that does not cleanly solve all four of these is a lookup table, not a feature store. Lookup tables are fine. Just do not call them feature stores.
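Requirement 3 is concrete enough to sketch. Here is a minimal point-in-time lookup, assuming an in-memory history keyed by (entity, feature) — the names and data shapes are illustrative, not any real store's API:

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical in-memory feature history: for each (entity, feature) key,
# a list of (effective_timestamp, value) pairs sorted by timestamp.
history = {
    ("customer:42", "clv_v1"): [
        (datetime(2025, 6, 1), 118.0),
        (datetime(2025, 9, 1), 131.5),
        (datetime(2025, 11, 1), 127.2),
    ],
}

def get_feature_as_of(entity: str, feature: str, as_of: datetime):
    """Return the feature value as it was at `as_of`.

    Training jobs call this with historical timestamps; online serving is
    the degenerate case as_of=now. Same code path, same answer.
    """
    series = history.get((entity, feature), [])
    timestamps = [ts for ts, _ in series]
    # Find the last materialised value at or before `as_of`.
    idx = bisect_right(timestamps, as_of) - 1
    if idx < 0:
        return None  # the feature did not exist yet at that point in time
    return series[idx][1]

# A training row dated 2025-10-15 must see the September value, not today's:
print(get_feature_as_of("customer:42", "clv_v1", datetime(2025, 10, 15)))  # 131.5
```

The lookup-table architecture below fails precisely because it keeps only the last pair in each list.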

the three common architectures

the lookup-table-with-good-intentions

A Redis or DynamoDB keyspace is populated by batch jobs. Online serving reads directly; offline training reads from the batch source. This is the default shape of a first-attempt feature store and it works for roughly six months.

It breaks on time-travel. When you train a new model, you need features as they were three months ago, not as they are now. Your Redis has been overwritten daily since then. You have no way to reconstruct the training data without re-running the batch jobs in a time-travel mode most batch jobs do not support. Your training-serving skew is invisible but nonzero, and your model quality gradually degrades.

the dual-pipeline pattern

Two materialisation paths: an offline path that writes to a versioned historical store (usually a data lake with partitioned parquet), and an online path that writes to a low-latency store (Redis, DynamoDB, etc.). The feature definitions live in code and are shared between the two paths.

This works and is the architecture most serious in-house feature stores end up with. It also costs real engineering time to maintain, because every change to a feature is a change in two places that must remain consistent. Teams that own this architecture should own it fully — clear ownership, on-call, SLOs — or not own it at all.
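The shared-definition idea can be sketched in a few lines — here with a plain dict standing in for Redis and JSONL files standing in for the parquet lake, all names illustrative:

```python
import json
from datetime import date
from pathlib import Path

# The single shared feature definition. Both materialisation paths import
# this function; changing the logic changes both paths at once.
def clv_v2(orders: list[dict]) -> float:
    """Customer lifetime value: sum of order totals, net of refunds."""
    return sum(o["total"] for o in orders if not o.get("refunded"))

def materialise_offline(customer_id: str, orders: list[dict],
                        run_date: date, root: Path) -> None:
    """Batch path: append to a dated partition in the historical store."""
    partition = root / f"dt={run_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    row = {"entity": customer_id, "clv_v2": clv_v2(orders)}
    with open(partition / "features.jsonl", "a") as f:
        f.write(json.dumps(row) + "\n")

def materialise_online(customer_id: str, orders: list[dict], kv: dict) -> None:
    """Streaming/on-demand path: overwrite the latest value in the KV store."""
    kv[f"customer:{customer_id}:clv_v2"] = clv_v2(orders)
```

The maintenance cost lives in everything around this sketch: keeping the two paths deployed, backfilled, and consistent when `clv_v2` changes.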

the managed feature store

A commercial or open-source feature store (Tecton, Feast, Hopsworks, provider-specific offerings) does the dual-pipeline work for you. You declare features once; the platform handles the two materialisation paths, the versioning, the time-travel queries, and the online serving.

These products have matured. Five years ago I would have said "build your own because the products are not there yet." In 2026, I say "buy unless you have a specific reason to build." The specific reasons are narrower than people think.
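For flavour, "declare features once" looks roughly like this in Feast; the shape below matches recent 0.x releases, but the API shifts between versions, and the path and names here are placeholders:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

customer = Entity(name="customer", join_keys=["customer_id"])

clv_source = FileSource(
    path="s3://lake/features/clv/",  # placeholder path
    timestamp_field="event_timestamp",
)

clv_view = FeatureView(
    name="customer_clv_v2",
    entities=[customer],
    ttl=timedelta(days=1),
    schema=[Field(name="clv_v2", dtype=Float32)],
    source=clv_source,
)
```

From this one declaration the platform derives both materialisation paths, the point-in-time training queries, and the online serving lookups — the dual-pipeline work from the previous section, done for you.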

when to build your own

There are three situations in which building your own feature store makes sense:

  1. You have unusual infrastructure. Your online serving runs on hardware, or in regions, that the managed products do not cleanly support.
  2. You have unusual scale. Your feature volume or query rate is large enough that the managed product's pricing model breaks down. This is less common than you'd think; most ML teams are nowhere near this threshold.
  3. You have a pre-existing platform team that is already solving this problem. Sunk cost is not a reason to build, but a working in-house platform is a reason to extend rather than migrate.

If none of the three applies, buying is the right answer. The engineering time you will save goes into model improvement, which is where your team's leverage actually lies.

the part nobody writes about

The hardest part of adopting a feature store — whether built or bought — is not the technology. It is the organisational shift from "ML engineers write their own pipelines" to "ML engineers declare features against a shared platform." The shift requires clear ownership boundaries, a feature review process, and a willingness to say no to ad-hoc feature definitions that bypass the platform.

Most feature store migrations fail not because the technology failed but because the team kept writing bespoke pipelines alongside the platform. The platform solves no problems if it is one of three places features can live. The discipline is the point; the platform is the enforcement mechanism.

the minimum viable feature store, when you are starting

For a team shipping their first ML feature and not ready to adopt a full platform, here is the minimum viable architecture:

  • one source-of-truth feature definition, in code, reviewed in PR
  • one offline materialisation path writing parquet to a dated partition
  • one online serving path writing to a key-value store, keyed by entity and feature name
  • a versioning convention: feature name includes a version suffix, and the PR that changes a feature definition bumps the suffix
  • a convention for training-data generation: always reconstruct historical feature values from the offline store, never from the online store
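The last two conventions can be sketched together — a versioned feature name and a training-data loader that reads only the offline store. The partition layout and file format are illustrative, not prescribed:

```python
import json
from pathlib import Path

FEATURE = "clv_v2"  # versioning convention: the name carries the version suffix

def training_features(root: Path, partition: str) -> dict[str, float]:
    """Build training inputs from the offline store's dated partition.

    The convention: labels dated D join against features from partition D,
    never against the online KV store, whose values reflect the present.
    Layout assumed (illustrative): root/dt=<partition>/features.jsonl,
    one JSON object per line with an "entity" key.
    """
    path = root / f"dt={partition}" / "features.jsonl"
    return {
        row["entity"]: row[FEATURE]
        for row in map(json.loads, path.read_text().splitlines())
    }
```

When the definition of the feature changes, a PR bumps `FEATURE` to `clv_v3`; old partitions keep serving `clv_v2` to any model still trained against it.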

This is not a feature store. It is a scaffold. It will carry you to the point where you have five to ten features in production and an honest reason to adopt a real platform. That reason will arrive sooner than you expect.
