Sourdough

A consulting investigation: what does it take to build a usable tool for people who aren't thinking about tools?

2023 · case · Consulting, strategy & research


An ML scientist with two provisional patents had a pointed question: why does the localization industry keep failing to solve a quality problem everyone knows is structural? She brought me in to take her research from proof of concept to market: figure out who the buyers were, what they were already funding, and what path got us from thesis to evidence fastest.

The Problem Space

At Amazon, I worked across the full stack of the localization problem: building the linguist marketplace, redesigning performance measurement, and overhauling editorial governance. That last one mattered—I moved governance from scattered and unenforceable across decentralized content systems to a single entry point into the translation pipeline, the one place it could actually stick. I also built systems to turn post-edit corrections into improvement signals, routing them back into the pipeline rather than letting them disappear. The goal was to close the loops. By the time I left, LLMs had arrived and made most of that infrastructure beside the point. A small team with the right setup could bypass years of work. The ceiling I’d been working against was visible across the entire industry at once.

The ML scientist I was working with had spent the previous months proving something she’d long believed: the localization industry’s quality problem isn’t a model problem. Post-edit-based correction is structurally lossy regardless of model quality—corrections fix individual outputs but never encode the rule behind them. The next identical error passes through uncorrected because the rule was never recorded. Give a language model precise, versioned organizational context upstream of generation, and most of what enterprises attribute to model failure disappears. And crucially, it worked with simpler, cheaper non-frontier models too. You didn’t need the best model on the market. You needed the right specification.

The problem isn’t the model. It’s the specification layer.

Her proof used public domain data—strong enough to demonstrate the principle, but limited to a specific domain and content type. To replicate at scale, she needed real enterprise linguistic data. That was the core business challenge, and there were three paths to it: a data partnership with companies that had the data, an investment that funded continued development, or a localization company willing to build it internally. The market investigation was how we figured out which path was real.

Post-edit-based correction captures what was wrong but never encodes why. The correction lives in translation memory, where it can be reused. It doesn’t reach the model or the rule set, where it could prevent the next occurrence. Fit errors—whether output matches this organization’s specific intent, voice, and context—aren’t solvable through better correction because the models aren’t guessing randomly. They’re filling a specification gap the organization left open. Every enterprise has implicit rules about voice, terminology, and context that exist in people’s heads and nowhere else. Post-editing uses those rules to fix outputs. It never transfers them to the system. The fix for this isn’t a better model. It’s moving linguistic expertise upstream, before generation, where it can inform output rather than correct it afterward.
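The two architectures can be sketched in a few lines. This is an illustrative contrast, not the actual product: the names, the spec fields, and the example rules are all assumptions I'm inventing to make the structural difference concrete.

```python
# Hypothetical sketch contrasting the two architectures.
# All names and rules here are illustrative, not the real system.

# --- Post-edit loop: the correction is stored, the rule is not ---
translation_memory = {}  # source -> corrected target; reusable only on exact match

def post_edit(source, raw_output, corrected):
    # Captures WHAT was wrong for this one sentence...
    translation_memory[source] = corrected
    # ...but the WHY ("always use formal register in German UI copy")
    # stays in the linguist's head. The next *different* sentence with
    # the same error passes through uncorrected.

# --- Specification layer: the rule is encoded upstream of generation ---
spec = {
    "voice": "formal",                      # e.g. German 'Sie', never 'du'
    "terminology": {"cart": "Warenkorb"},   # enforced term base
    "version": "2024-06-01",                # versioned, auditable org context
}

def build_prompt(source, spec):
    # The organization's implicit rules become explicit model context,
    # shaping every output instead of fixing individual ones afterward.
    rules = [f"Use {spec['voice']} register."]
    rules += [f"Translate '{s}' as '{t}'." for s, t in spec["terminology"].items()]
    return "\n".join(rules) + f"\nTranslate: {source}"
```

The asymmetry is the whole point: the first function only ever helps on an exact repeat, while the second applies the rule to every sentence the model ever sees.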

The Investigation

The market analysis covered ten buyer and supplier segments. On the buyer side, enterprises were decentralizing: AI-native companies were generating content in multiple languages directly through LLMs, bypassing translation management systems and the localization teams that used to hold quality governance. On the supplier side, language service providers were responding by upgrading the technology behind unchanged pipelines. Better models in the same architecture. More capable components in a fundamentally unchanged process.

The four companies I built profiles on—Spotify, Remitly, Zeiss, and Zoetis—were targets for a data partnership, not just customer profiling. The strategy was to offer analytical reports and positioning work in exchange for access to the linguistic data needed to replicate the results at scale. I worked through the range of possible arrangements, prototyped what a useful analysis deliverable would look like for each company, and developed positioning guidance for approaching existing contacts. Working through those profiles also surfaced something the abstract market segmentation hadn’t: where localization teams existed inside these organizations, they were increasingly marginal to the content generation decisions that mattered. The people actually driving multilingual output were AI teams, product teams, and content operations. None of them were in the traditional localization buyer profile.

To support the broader search for the right buyer, I built a set of ideal customer profile criteria and a discovery tracking system, and developed a structured pilot proposal: a defined engagement where we’d provide diagnostic analysis and implementation support in exchange for access to the organization’s linguistic data. Something tangible to offer in the first conversation, and a way to generate the data we needed through the engagement itself.

Each deliverable was built to answer a specific question on the path from thesis to market signal:

  • 10-segment market analysis — Where does the problem live across the localization landscape? Who has the pain and who has the budget?
  • Customer intelligence profiles (Spotify, Remitly, Zeiss, Zoetis) — Are these the right data partners? What would a credible offer look like to each?
  • ICP frameworks — What does a qualified prospect actually look like beyond these four companies?
  • Discovery Tracker — What do we need to hear to validate each buyer hypothesis, and how will we know when we’ve heard it?
  • Project Charter (data-for-service) — How do we solve the data problem and the revenue problem in the same first engagement?
  • Go-to-market hypothesis — Which track do we test first, and what does success look like?
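The discovery tracker's core logic was pairing each buyer hypothesis with the signal that would validate it and a threshold for when we'd heard it enough times to act. A minimal sketch of that shape, with field names and thresholds that are my assumptions rather than the actual artifact:

```python
# Illustrative shape of the discovery tracker; field names are assumptions.
from dataclasses import dataclass

@dataclass
class BuyerHypothesis:
    profile: str        # e.g. "AI lead at a content-heavy enterprise"
    signal_sought: str  # what we need to hear in discovery conversations
    threshold: int      # independent confirmations that count as "heard it"
    confirmations: int = 0

    def record(self, confirmed: bool) -> None:
        # Log one discovery conversation's outcome.
        if confirmed:
            self.confirmations += 1

    @property
    def validated(self) -> bool:
        return self.confirmations >= self.threshold

# Example hypotheses for the brand owners track (invented for illustration).
tracker = [
    BuyerHypothesis("AI lead", "brand consistency pain is actively funded", 3),
    BuyerHypothesis("Content ops", "multilingual output bypasses the loc team", 3),
]
```

The threshold is what keeps discovery honest: one enthusiastic conversation is an anecdote, and the tracker forces you to say in advance how many independent confirmations turn it into a signal.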

The Insight That Changed the Frame

The customer profiles reinforced something the market segmentation was beginning to show: budget authority for language and content was migrating inside enterprises. Dedicated localization teams—the traditional buyers of translation technology—were losing organizational power as AI-native workflows bypassed them. CSA Research’s 2025 trends report noted that “many organizations have decimated their localization teams” in anticipation of AI-driven cost reductions, even as multilingual content volume increased. Slator’s 2024 market report described the pattern directly: localization teams getting reduced budgets because leadership assumed GenAI would just solve the problem.

|               | Language Services (traditional)              | AI Governance (emerging)                   |
|---------------|----------------------------------------------|--------------------------------------------|
| Market size   | $49.7B (2023)                                | $890M (2024)                               |
| Direction     | Contracting, −4.5% YoY                       | Growing, 45% CAGR to 2029                  |
| Why           | LLMs commoditizing per-word translation work | Spec/context gap across all AI deployment  |
| Budget holder | Localization teams (losing org power)        | AI teams, content ops (gaining budget)     |

The organizations cutting localization headcount weren’t eliminating their multilingual content needs. They were moving that budget to AI teams and content operations. The money wasn’t disappearing. It was migrating to a different pocket. Pursuing the AI governance angle wasn’t just a bet on a faster-growing market. It was following the capital as it moved.
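It's worth making the trajectories in those figures explicit. A back-of-envelope projection, assuming the cited rates hold constant (which real markets rarely do), shows the gap closing fast:

```python
# Back-of-envelope projection from the cited market figures.
# Assumes constant rates, purely to show the direction of travel.
lang_services_2023 = 49.7e9   # contracting ~4.5% per year
ai_governance_2024 = 0.89e9   # growing ~45% CAGR to 2029

lang_services_2029 = lang_services_2023 * (1 - 0.045) ** 6  # ~$37.7B
ai_governance_2029 = ai_governance_2024 * 1.45 ** 5         # ~$5.7B
```

Even under these crude assumptions, the contracting market sheds roughly $12B while the emerging one grows more than sixfold, which is the arithmetic behind "following the capital as it moved."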

Her professional network ran through the traditional localization ecosystem: LSPs, language technology vendors, enterprise localization buyers. That ecosystem understood the problem deeply but was losing the organizational standing to act on it. The new budget holders—AI leads, product teams, content ops—weren’t localization practitioners. They were generating multilingual content through LLMs with no governance infrastructure, and experiencing the consequences as a brand consistency problem, not a translation quality problem.

The specification problem the thesis described wasn’t unique to translation. Every industry running AI into production was hitting the same structural gap. McKinsey’s 2025 State of AI report found that 51% of organizations had experienced at least one significant negative AI consequence in the prior year, with inaccuracy the most commonly cited issue, and fewer than half had taken concrete steps to address it. Gartner projected that over 40% of agentic AI projects would be abandoned by 2027, with inadequate output control as a primary driver. Whole product categories organized around the gap—platforms like Writer and Jasper built their core enterprise value propositions around output specification and brand governance for AI-generated content.

The product could solve the specification problem multilingually, with linguistic expertise embedded from the start. General-purpose context management tools weren’t built for that. Companies generating content in twelve languages through LLMs weren’t finding a solution in the existing market.

Where It Was Heading

The investigation produced two go-to-market hypotheses. Each had its own buyer, entry angle, and proof required.

|             | Localization track                               | Brand owners track                                |
|-------------|--------------------------------------------------|---------------------------------------------------|
| Buyer       | LSPs, enterprise loc teams                       | AI leads, content ops, product teams              |
| Pitch       | Better governance for translation quality        | Brand consistency across multilingual AI output   |
| Advantage   | Deep credibility, existing relationships         | Direct path to budget holder; pain already funded |
| Headwind    | Loc teams losing org power; 20-year MT skepticism | Different product framing; more buyer education  |
| First proof | Translation quality improvement at scale         | Brand coherence gap; multilingual solution        |

For the brand owners track, I developed a discovery plan targeting three buyer profiles, with structured conversations and defined criteria for what a useful signal would look like—not to validate the full product concept, but to establish which profile had the clearest path from recognized pain to funded engagement.

The engagement closed with two active tracks and a developed hypothesis for each. Which to test first was the open question. That’s usually where the most interesting product strategy work happens.

Reflection

Change from within, or go around the establishment. That tension shows up constantly in AI-adjacent markets, and it was the defining question here.

The localization path ran through established relationships, deep domain credibility, buyers who already understood the problem. The case against it wasn’t just about org charts. The localization industry has navigated every wave of automation for 20+ years with the same move: absorb the technology, preserve the human correction layer, call it quality. It worked. Asking the same ecosystem to embrace a paradigm where human expertise moves upstream—where the machine is good enough once you spec it correctly—means asking them to voluntarily dismantle the argument that kept them relevant. Even buyers who intellectually understood the thesis had organizational and cultural reasons to resist it.

The brand owners path was an end run. It bypassed the industry entirely and went directly to buyers generating multilingual content with no governance infrastructure. No 20-year history with MT skepticism. A brand consistency problem they were actively funding in their primary language, with no multilingual solution. The pitch was shorter, the budget was available, and the industry establishment was irrelevant to the conversation.

Both paths were real. The strategic question was which to test first—a go-to-market question, not a product question.

The obvious counter is that AI changes the calculation. You can build faster now, so just iterate your way to the answer. There’s something real in that. But faster building doesn’t reduce the importance of knowing what to build. It shifts the methodology. When you can prototype quickly, the question stops being “can we build this?” and becomes “what’s the smallest foothold we can get into and learn from?” You don’t pitch the full concept and wait for a check. You find the wedge, get in the door, and move fast once you’re there.

That’s what the investigation was trying to set up. Not a product roadmap. A clear first move, with the right buyer, around a problem they were already trying to solve.