All Models Are Wrong — Part 2
You can't compress what you never collected

“Essentially, all models are wrong, but some are useful.” — George E. P. Box
The purpose of research is to learn something you do not already know. If you knew the answer, you would not pay to run the study. That observation sounds trivial, but it turns out to be the key to understanding both the promise and the limit of synthetic respondents.
Part 1 of this series catalogued how synthetic respondents fail. This post is about why the most important of those failures cannot be engineered away. A synthetic respondent is a model fit to existing data. Every answer it produces is drawn from the distribution it learned in training. It can interpolate among the things it has seen, often impressively. What it cannot do is produce information it never contained.
This would be a manageable limitation if the model behaved differently inside and outside its training distribution. It does not. Ask a synthetic respondent something out of sample and it will not go quiet or flag its uncertainty. It generates a plausible answer and delivers it with the same fluency and confidence as when it is right.
Now consider which questions a client actually pays to answer: a new concept, an unfamiliar market, a behavior no one has measured before. These are out of sample by definition — if the answer were already in someone's data, there would be nothing to commission. Synthetic research is therefore weakest exactly where research matters most. It is excellent at confirming what you could have guessed and unreliable on what you needed to know.
The obvious objection is calibration: anchor the model to real survey data and correct its biases. Calibration works, but only where real data exists. It sharpens the model's coverage of ground that has already been collected. The novel question is, by definition, the ground that has not. Covering it requires collecting real human answers — which is the one thing a synthetic-only approach set out to avoid.
It is worth being precise about why this cannot be engineered away. A model is a compression of its training data, and you cannot compress what you never collected. The limitation is not a missing feature or a sign of an immature product. It follows from information theory: processing data, however cleverly, cannot add information about a question that the data never contained. Every synthetic-only vendor runs into it, and no model release will remove it, because the limit is arithmetic rather than code.
We think the limit is also the opportunity, provided you can see it. Flashpoint.AI scores every study, question by question. The Response Fit Score reads each question and estimates how far the synthetic answer can be trusted: which results rest on well-covered ground, and which are guesses. In effect, it marks your blind spots for you.
That changes the economics of fieldwork. Traditional research puts every question to real people, including the many that a model could have answered for free, and that is slow and expensive. We invert the order. The synthetic panel answers everything in minutes; the fit score identifies what falls out of sample; and the same survey, with no rebuild, is routed to real respondents through Dynata, Cint, or Prolific, aimed only at the questions where humans are the only source of truth. The result is real market data gathered faster and more precisely than the traditional sequence allows, because the fit score showed you where to aim.
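The routing step can be pictured in a few lines. This is a minimal, hypothetical sketch, not Flashpoint.AI's implementation. It assumes each question arrives with a fit score on a 0-to-1 scale and uses an arbitrary cut-off of 0.7. The names `ScoredQuestion` and `route_questions`, the threshold, and the sample questions are all invented for illustration. The sketch does not call any panel provider's API. It only decides which questions would go to real respondents.

```python
from dataclasses import dataclass


# Hypothetical record: question id, text, the synthetic panel's answer,
# and a fit score in [0, 1] (higher = better covered by existing data).
# The scale and field names are assumptions for illustration only.
@dataclass
class ScoredQuestion:
    qid: str
    text: str
    synthetic_answer: str
    fit_score: float


def route_questions(questions, threshold=0.7):
    """Split questions into those where the synthetic answer can stand
    and those that should be fielded to real respondents.

    `threshold` is a hypothetical cut-off, not a documented product setting.
    """
    keep_synthetic = []
    send_to_humans = []
    for q in questions:
        # Reject scores outside the assumed 0-1 scale rather than guess.
        if not 0.0 <= q.fit_score <= 1.0:
            raise ValueError(f"fit score out of range for {q.qid}: {q.fit_score}")
        if q.fit_score >= threshold:
            keep_synthetic.append(q)  # well-covered ground
        else:
            send_to_humans.append(q)  # likely out of sample
    return keep_synthetic, send_to_humans


if __name__ == "__main__":
    # Illustrative inputs only; these are not real scores or answers.
    survey = [
        ScoredQuestion("Q1", "How often do you buy coffee?", "Daily", 0.92),
        ScoredQuestion("Q2", "Would you try drone-delivered coffee?", "Yes", 0.31),
        ScoredQuestion("Q3", "Which brand do you usually buy?", "Brand A", 0.78),
    ]
    synthetic, human = route_questions(survey)
    print("Keep synthetic:", [q.qid for q in synthetic])
    print("Field to real respondents:", [q.qid for q in human])
```

The sketch routes each question on its own rather than the whole study at once. That keeps the human sample focused on the out-of-sample items and leaves the rest of the survey untouched.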
Synthetic panels provide speed, real respondents provide evidence, and the fit score sits between them, telling you which one you are standing on. Much of the industry has demonstrated the easy half of this, that a model can answer familiar questions quickly, and called it the whole theorem. We use the easy half to find the hard half, then go collect the real answer instead of settling for one a model has faked.
Next in the series: what this means for how agencies should actually work.