How nested second-stage MMM creates confident misallocation.
Media Mix Modeling has a constraint most teams try to talk around:
You typically have too few observations for the level of detail you’re being asked to estimate.
Two years of weekly data gives you about 104 points. One year gives you 52. Monthly data is worse.
Now compare that with what marketing usually wants:
- Social → TikTok, Meta, Instagram, Snap, Pinterest
- Search → brand, non-brand, competitor, shopping
- Display → prospecting, retargeting, networks
- CTV → publishers, audiences, creatives
Across every channel, the request expands into dozens of coefficients—often with separate adstock and saturation behavior.
You can’t estimate that many distinct effects reliably with that few independent “moves” in the data.
And when you can’t, the model doesn’t stop producing answers.
It just stops producing effects the data can actually separate.
Two responses to the subchannel explosion
When the problem becomes unmodelable, teams usually take one of two paths.
1) Hierarchical structure (the defensible path)
Hierarchical models acknowledge the constraint:
- estimate an average effect at a parent level (“Paid Social”)
- estimate subchannel deviations (TikTok vs Meta vs Snap)
- share information across subchannels
- and—critically—express uncertainty where the data can’t support resolution
This doesn’t magically create variation. But it is at least structurally aligned with the reality of the data.
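To make "sharing information" concrete, here is a minimal empirical-Bayes-style pooling step. It is a sketch, not a full MMM: the raw estimates, standard errors, and `tau` (the assumed between-subchannel standard deviation) are all illustrative inputs I've made up, not part of any specific library.

```python
# Minimal partial-pooling sketch (illustrative, not a full MMM):
# shrink noisy per-subchannel estimates toward a parent-level mean.
import numpy as np

def partial_pool(estimates, std_errors, tau):
    """Empirical-Bayes-style shrinkage toward the parent ("Paid Social") mean.

    tau is the assumed between-subchannel standard deviation: a small tau
    says subchannels behave similarly, which makes shrinkage stronger.
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(std_errors, dtype=float) ** 2
    # Precision-weighted parent-level mean.
    w = 1.0 / (var + tau**2)
    parent = np.sum(w * est) / np.sum(w)
    # Noisier subchannel estimates get pulled harder toward the parent.
    shrink = var / (var + tau**2)
    return parent, (1 - shrink) * est + shrink * parent

# TikTok, Meta, Snap: raw estimates with very different precision.
raw = [3.0, 1.2, -0.5]
se = [1.5, 0.2, 2.0]
parent, pooled = partial_pool(raw, se, tau=0.5)
print(parent, pooled)
```

Note what this does and doesn't do: the well-measured subchannel barely moves, while the noisy ones are pulled toward the parent mean. No resolution is invented; uncertainty is converted into shrinkage.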
2) “Nested” allocation (the tempting shortcut)
The other approach shows up under different names, but the most common version looks like this:
- Fit an MMM at the channel level and estimate a coefficient for a channel (e.g., Social).
- Convert that into a channel “contribution” time series:
  contribution_Social(t) = β_Social × Impressions_Social(t) (or spend, or a transformed version).
- Run a second regression where:
- outcome = that channel contribution time series
- predictors = subchannel impressions/spend (TikTok, Meta, Snap…)
This is often described as a way to “recover subchannel effects without overfitting the main model.”
But the second stage doesn’t recover effects.
It allocates a fixed pie.
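The recipe is easy to reproduce. Here is a minimal sketch of the two stages in the no-adstock, no-saturation case (the data, coefficients, and variable names are all illustrative, not from any real model):

```python
# Sketch of the nested two-stage recipe (no adstock/saturation; fake data).
import numpy as np

rng = np.random.default_rng(0)
T = 104  # two years of weekly data
tiktok = rng.gamma(2.0, 50.0, T)
meta = rng.gamma(2.0, 150.0, T)
snap = rng.gamma(2.0, 30.0, T)
social_total = tiktok + meta + snap

# Stage 1: channel-level MMM with one coefficient for "Social".
y = 0.8 * social_total + rng.normal(0, 200, T)  # outcome with noise
X1 = np.column_stack([np.ones(T), social_total])
intercept, beta_social = np.linalg.lstsq(X1, y, rcond=None)[0]

# Convert the channel effect into a "contribution" time series.
contribution = beta_social * social_total

# Stage 2: regress the constructed contribution on the subchannel series.
X2 = np.column_stack([tiktok, meta, snap])
coefs = np.linalg.lstsq(X2, contribution, rcond=None)[0]

resid = contribution - X2 @ coefs
r2 = 1 - resid.var() / contribution.var()
print(r2)  # essentially 1.0: the outcome was built from the predictors
```

The stage-two R² is not evidence of anything. The outcome is a rescaled sum of the predictors, so a near-perfect fit is baked in before any data is consulted.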
The key issue: the second regression adds no information
Here’s the problem in plain terms:
- The stage-two outcome is constructed from the channel total.
- The stage-two predictors are the parts that sum to that channel total.
So the second regression is trying to explain:
a total using its parts.
In that setup, the second regression can’t create new variation.
It can only redistribute the channel total.
That can produce a clean-looking decomposition.
But it cannot create measurement.
It isn’t learning how TikTok differs from Meta.
It is finding a stable way to reconstruct the channel total using subchannel volume.
That’s not measurement.
It’s accounting with coefficients.
What it collapses to in the simplest case
Assume—just for a moment—no adstock and no saturation.
Then:
- Impressions_Social(t) = Σ Impressions_subchannel(t)
- Contribution_Social(t) = β_Social × Impressions_Social(t)
So stage two becomes:
regress β_Social × Σ Impressions_subchannel(t) on the Impressions_subchannel(t)
But the predictors already sum to the outcome up to a multiplier.
The fit is exact by construction: the predictors span the outcome, so the coefficients serve only to distribute credit across subchannels.
And absent strong constraints, the most natural result is that credit aligns with how much volume each subchannel produced.
So the nested model’s “TikTok effect” becomes:
TikTok’s share of impressions/spend inside Social
rather than
TikTok’s incremental impact per unit exposure
It doesn’t discover subchannel incrementality.
It distributes a channel-level story.
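This collapse is checkable directly. In the sketch below (an assumed linear data-generating process with made-up numbers), TikTok is ten times as incremental per impression as Snap, yet stage two hands every subchannel the same per-unit coefficient, and credit shares simply mirror volume shares:

```python
# Checking the collapse directly (assumed linear DGP; illustrative numbers).
import numpy as np

rng = np.random.default_rng(1)
T = 104
tiktok = rng.gamma(2.0, 50.0, T)
meta = rng.gamma(2.0, 150.0, T)
snap = rng.gamma(2.0, 30.0, T)
total = tiktok + meta + snap

# Ground-truth incrementality per impression: 2.0, 0.8, 0.2.
y = 2.0 * tiktok + 0.8 * meta + 0.2 * snap + rng.normal(0, 100, T)

# Stage 1: one blended channel-level coefficient.
X1 = np.column_stack([np.ones(T), total])
beta_social = np.linalg.lstsq(X1, y, rcond=None)[0][1]
contribution = beta_social * total

# Stage 2: allocate the constructed contribution across the parts.
X2 = np.column_stack([tiktok, meta, snap])
coefs = np.linalg.lstsq(X2, contribution, rcond=None)[0]
print(coefs)  # three identical values: beta_social, not 2.0 / 0.8 / 0.2

# Credit shares equal impression shares, not incrementality shares.
credit_share = coefs * X2.sum(axis=0) / contribution.sum()
volume_share = X2.sum(axis=0) / total.sum()
print(credit_share, volume_share)
```

The true effects never enter stage two at all: everything about them was already blended away into the single stage-one coefficient.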
“We add adstock and saturation” doesn’t fix it
At this point, teams often respond:
“Sure, in the linear case it collapses. But we add adstock and saturation, so it becomes real.”
That’s the moment the method becomes more dangerous.
Adstock and saturation add knobs.
They do not add degrees of freedom.
When the second stage is under-identified, extra nonlinear flexibility doesn’t rescue truth. It increases the space of plausible narratives that can fit the constructed outcome.
What you get is often:
- unstable parameters that still produce a visually clean split
- saturation curves that “explain” noise
- adstock rates that soak up misalignment
- results that look sophisticated but don’t generalize
Complexity isn’t proof of identification.
It’s often camouflage for its absence.
The sanity check: simulation
There’s a simple test for whether a method is doing real recovery:
If you simulate data where you know the true subchannel effects, can the method recover them?
In repeated simulations, this nested two-stage approach typically fails to recover true subchannel effects—especially once adstock and saturation are introduced.
And the failure mode isn’t subtle.
The results are often less sensible than a naive split of credit by spend or impression share.
That’s the signature of a method that is not estimating what it claims to estimate.
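A minimal version of that test (again an assumed linear data-generating process with illustrative parameters): fix known subchannel effects, run the nested two stages many times, and compare the estimates to truth:

```python
# Repeated-simulation sanity check (assumed linear DGP; illustrative numbers).
import numpy as np

TRUE = np.array([2.0, 0.8, 0.2])  # TikTok, Meta, Snap per-impression effects

def nested_estimate(rng, T=104):
    X = np.column_stack([rng.gamma(2.0, s, T) for s in (50.0, 150.0, 30.0)])
    total = X.sum(axis=1)
    y = X @ TRUE + rng.normal(0, 100, T)
    # Stage 1: one channel-level coefficient.
    A = np.column_stack([np.ones(T), total])
    beta = np.linalg.lstsq(A, y, rcond=None)[0][1]
    # Stage 2: allocate the constructed contribution across subchannels.
    return np.linalg.lstsq(X, beta * total, rcond=None)[0]

rng = np.random.default_rng(2)
estimates = np.array([nested_estimate(rng) for _ in range(200)])
mean_est = estimates.mean(axis=0)
spread = estimates.std(axis=0)
print(mean_est)  # all three cluster at one blended value, far from TRUE
print(spread)    # and the spread is small: stably, confidently wrong
```

This is the failure signature in miniature: the TikTok estimate is badly wrong (a blended value near 0.9 instead of 2.0), and it is wrong with a tight distribution, which is exactly what makes it dangerous in a dashboard.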
Why this matters: it creates false resolution
Most MMM limitations are survivable because they fail in an obvious way:
- wide uncertainty
- unstable coefficients
- sensitivity to priors
- weak separation among correlated channels
Nested second-stage allocation fails differently.
It produces:
- crisp splits
- plausible subchannel ROI bars
- a confident-looking decomposition that feels “decision-ready”
It takes a real constraint—lack of degrees of freedom—and turns it into fake precision.
And once organizations start using those subchannel “effects” for planning, targets, and optimization, the measurement system stops being descriptive and starts becoming governing.
The point
When you don’t have enough degrees of freedom, you have two honest options:
- Use structure that shares information and admits uncertainty (hierarchies, calibration anchors, experiments).
- Avoid pretending you have resolution the data cannot support.
The nested second-stage regression is a third option:
It produces resolution by construction.
It doesn’t answer “what’s TikTok’s incremental impact?”
It answers “how should I split a channel-level story across subchannels?”
Those are not the same question.
And if you treat the second as the first, the model won’t just be wrong.
It will be wrong in a way that feels operationally certain—until the business pays for it.