AI Can Cross Disciplines. It Still Can’t Replace Expert Judgment

By Philip O. Obazee

First published on Substack on April 28, 2026.

Large language models (LLMs) have become astonishingly good at moving across fields. They can summarize legal briefs, explain econometric methods, draft memos on regulation, compare competing theories, and synthesize technical material into readable prose. That has led many leaders to a seductive conclusion: if modern organizations suffer from overspecialization, maybe AI can become the integrator they’ve been waiting for.

That is the wrong conclusion.

The more useful question is not whether AI sounds broadly knowledgeable. It is whether AI can be trusted to make the final call when one domain’s claims must be translated into another domain’s standards. In finance, that might mean turning a model into a price. In research, it might mean turning observational evidence into a causal claim. In regulation, it might mean turning a transaction description into a legal classification. In publishing, it might mean turning a manuscript’s argument into an editorial verdict.

Those boundary points are where organizations are most fragile. And they are exactly where AI should be treated with the most caution.

The core insight from the paper behind this article is simple: large language models can often help at the borders of knowledge, but they usually cannot replace experts there. They are valuable tools for translation, triage, and synthesis. They are not, in general, reliable terminal arbiters.

The Real Problem Isn’t Knowledge. It’s Judgment at the Seam.

Most discussions of AI ask the wrong question. They ask whether the model “knows enough” across many domains. But organizations rarely fail because no one can generate enough words about a topic. They fail because a claim that was locally acceptable in one domain is treated as globally acceptable after being moved into another.

Think of these as seams.

A seam is the point where one domain hands off to another: where a statistical result becomes a policy recommendation, where a scientific model becomes a business decision, where a legal description becomes a regulatory judgment, or where a technical argument becomes an editorial conclusion.

Inside a single field, expertise can be deep and stable. At the seam, things become delicate. Assumptions that held in one setting may not survive translation into the next. An economist may assume the data are fit for causal interpretation. A regulator may assume the classification tracks economic substance. A reviewer may assume the statistical machinery really supports the prose claim.

In practice, many of the most consequential mistakes in organizations happen not at the center of a discipline, but at these handoff points.

That is why the paper focuses on two actors: the LLM and the Expert. The LLM brings speed, range, and linguistic fluency. The Expert brings final judgment over what counts as admissible once the claim crosses the seam. The paper’s argument is not that AI is unintelligent. It is that the final burden of seam judgment still sits with expertise.

Why AI Often Looks More Capable Than It Really Is

Large language models are persuasive because they reduce the visible friction of crossing fields. They can translate vocabularies, surface analogies, reframe concepts, and produce an integrated narrative where previously there was only fragmentation.

That creates an illusion of arbitration.

But synthesis is not the same thing as adjudication.

A model can generate a highly coherent explanation of a derivative pricing setup, a causal inference design, or a cross-jurisdictional structure. What it cannot safely do, in many cases, is determine whether the crucial assumptions have actually survived the crossing from one domain to another. That requires more than broad textual coverage. It requires distinguishing exactly those cases where superficially similar inputs deserve different final judgments.

This is where the paper’s framework becomes especially powerful. It does not argue that AI fails because it “doesn’t really understand” in some vague philosophical sense. It argues something sharper: the model may not preserve the distinctions that matter for expert judgment.

In other words, the model’s internal way of grouping problems may be too coarse. It can place different real-world cases into the same practical bucket even though an expert would need to treat them differently. Once that happens, final substitution becomes dangerous.
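The coarse-grouping point can be made concrete with a minimal Python sketch. The case names, bucket labels, and verdicts are hypothetical and do not come from the paper; the sketch only shows what it means for a bucket to be "mixed", that is, to hold cases an expert would judge differently.

```python
from collections import defaultdict

# Hypothetical cases: (case id, bucket the model places it in, verdict an expert would reach).
CASES = [
    ("case_a", "bucket_1", "admissible"),
    ("case_b", "bucket_1", "admissible"),
    ("case_c", "bucket_2", "admissible"),
    ("case_d", "bucket_2", "inadmissible"),
]

def find_mixed_buckets(cases):
    """Return the model buckets that contain more than one distinct expert verdict."""
    verdicts_by_bucket = defaultdict(set)
    for _case_id, bucket, verdict in cases:
        verdicts_by_bucket[bucket].add(verdict)
    return sorted(b for b, verdicts in verdicts_by_bucket.items() if len(verdicts) > 1)

print(find_mixed_buckets(CASES))  # prints ['bucket_2']
```

Here the model treats case_c and case_d as the same kind of problem, while the expert does not. Any single answer the model gives for bucket_2 must be wrong for at least one of them.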

That is the paper’s key point: the problem is not simply that AI can be wrong. It is that AI can fold distinctions that matter precisely where decisions become consequential.

The Difference Between Coverage and Commitment

One of the paper’s most important distinctions is between coverage and commitment.

Coverage means the model can produce something plausible in response to a query. It can give you a number, a confidence statement, a polished explanation, or a summary. On this dimension, LLMs are impressive.

Commitment is different. Commitment would mean the model’s answer actually tracks the structure of the decision problem in a way that supports final judgment.

That difference matters enormously in organizations.

A model may be able to answer every question in a workflow. That does not mean it is entitled to decide every question in that workflow.

This is exactly the confusion many firms are now at risk of making. Because the model can produce a fluent output on nearly every task, leaders begin to treat fluency as evidence of admissibility. But producing an answer is not the same thing as warranting the answer.

That gap is especially dangerous in seam-heavy environments, where a technically competent-looking answer can conceal unresolved assumptions. AI may appear confident and complete precisely where an expert would hesitate.

The paper formalizes this as a distinction between the model’s internal information and the expert’s target judgment. In managerial terms, the message is straightforward: don’t confuse broad response capability with final decision authority.

Where AI Helps — and Where It Should Stop

This is not an anti-AI argument. In fact, one of the strongest features of the paper is that it rejects both extremes: the claim that AI can replace experts wholesale and the claim that AI adds no value.

The better way to think about the technology is this:

AI is often highly valuable when it is used to reduce the cost of work on easy or structurally clean cases. It becomes dangerous when organizations let it stand in for expert verification on hard or structurally mixed cases.

That distinction leads to a much clearer operational principle.

Use AI to:

• translate across vocabularies,

• draft first-pass analyses,

• surface analogies,

• summarize large bodies of material,

• organize candidate reasoning paths,

• flag likely areas of concern,

• and triage workloads.

Do not assume AI can safely:

• certify the final validity of cross-domain reasoning,

• decide that hidden assumptions survived translation,

• resolve ambiguous regulatory or legal classifications,

• or replace expert review on structurally mixed cases.

Put differently, AI is often a superb screening device. It is not automatically a sovereign judge.

This is where the paper offers one of its most useful conceptual results: the AI signal can be better than silence while still being worse than expert judgment. That is the right asymmetry. Managers should stop asking whether AI is “good” or “bad” in the abstract and start asking where it falls between those two poles in their specific workflow.
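A small numerical sketch shows how a signal can sit between those two poles. Everything in it is an illustrative assumption, not the paper's model: the three case types, the counts per ten cases, and the payoffs. Each information source is represented as a partition of the case types, and the decision-maker picks the best single action for each cell it can distinguish.

```python
# Hypothetical workload: number of cases of each type per 10 cases.
CASE_COUNTS = {"clean_valid": 6, "mixed_valid": 2, "mixed_invalid": 2}
VALID_STATES = {"clean_valid", "mixed_valid"}
ACTIONS = ("approve", "reject")

def payoff(action, state):
    """Hypothetical payoffs: approving a valid claim earns 1, approving an invalid one costs 3."""
    if action == "reject":
        return 0
    return 1 if state in VALID_STATES else -3

def value_of_partition(partition):
    """Total payoff per 10 cases when only the cell of each case is observed."""
    total = 0
    for cell in partition:
        total += max(
            sum(CASE_COUNTS[state] * payoff(action, state) for state in cell)
            for action in ACTIONS
        )
    return total

silence = [["clean_valid", "mixed_valid", "mixed_invalid"]]       # no distinctions at all
ai_signal = [["clean_valid"], ["mixed_valid", "mixed_invalid"]]   # separates clean from mixed
expert = [["clean_valid"], ["mixed_valid"], ["mixed_invalid"]]    # separates every case type

print(value_of_partition(silence), value_of_partition(ai_signal), value_of_partition(expert))
# prints: 2 6 8
```

Under these assumed numbers the AI signal recovers most of the value by clearing the clean cases, but it leaves the mixed cases unresolved. Only the finer expert partition can approve the valid mixed cases without also approving the invalid ones.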

The Hidden Risk in AI Adoption: More Output, More Seams

Many organizations treat AI adoption as a productivity story: more documents, faster drafting, cheaper analysis, quicker iteration. That is all true.

But the paper points to a deeper institutional issue. AI does not simply process existing work more cheaply. It can also change the composition of the work itself.

When AI makes it easier to produce interdisciplinary, cross-functional, or cross-domain output, it often increases the number of seams in the system.

This is the hidden risk.

A company may think it is saving time by using AI to generate reports that connect legal, financial, technical, and policy dimensions in a single workflow. But if those integrated outputs create more seam-heavy decisions than the organization has expert capacity to verify, the firm may enter a more dangerous regime even while appearing more efficient.

In effect, AI can increase both productivity and verification burden at once.

That matters because expert verification capacity is scarce. Most organizations do not have abundant supplies of people who can validate a pricing model, a cross-border classification, an econometric design, or a technical claim after it has been translated into business language. If AI dramatically increases the volume of seam-heavy output without a corresponding increase in expert review capacity, then the firm’s apparent efficiency may conceal a growing integrity problem.

This is one of the paper’s most practical insights: the benefits of AI adoption depend not just on what the model can do, but on whether the organization’s verification institutions can keep up with the new task environment the model creates.

The Management Question: When Does AI Help, and When Does It Hurt?

To answer that, the paper builds a delegation model that should be very familiar to executives, even if the paper states it formally.

The sequence is simple:

1. The LLM screens the case.

2. If the case looks clean, the process may move forward cheaply.

3. If the case is structurally mixed, an expert should verify it.

4. If the organization fails to verify mixed cases, AI-assisted workflows can become welfare-destroying.

That last phrase is technical, but the managerial meaning is plain: the organization can end up worse off than if it had simply relied on expert review from the beginning.

Why? Because the value of AI comes from saving expert time on easy cases. The harm comes from not checking the hard cases.

The paper shows that there is a threshold — a critical mixed-case density — beyond which the organization starts losing more from unverified hard cases than it gains from cheap screening on easy ones. Once that threshold is crossed, the AI-enabled process looks efficient but is actually degrading decision quality.
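The threshold logic can be illustrated with a toy accounting. It is a simplification for intuition, not the paper's formal result. Assume each clean case screened by AI saves a fixed amount of expert effort, each mixed case that goes unverified costs a fixed loss, and no mixed case is verified. The function and parameter names are hypothetical.

```python
def net_gain_per_case(mixed_share, saving_per_clean_case, loss_per_unverified_mixed_case):
    """Per-case gain of AI screening relative to expert-only review (toy accounting)."""
    clean_share = 1.0 - mixed_share
    return clean_share * saving_per_clean_case - mixed_share * loss_per_unverified_mixed_case

def critical_mixed_share(saving_per_clean_case, loss_per_unverified_mixed_case):
    """Mixed-case share at which gains on clean cases exactly offset losses on mixed ones."""
    return saving_per_clean_case / (saving_per_clean_case + loss_per_unverified_mixed_case)

# Assumed numbers: each clean case saves 1 unit, each unverified mixed case loses 4 units.
threshold = critical_mixed_share(1.0, 4.0)
print("break-even mixed share:", threshold)
for share in (0.10, 0.20, 0.30):
    print(share, round(net_gain_per_case(share, 1.0, 4.0), 2))
```

With a saving of 1 and a loss of 4, the break-even share is 1 / (1 + 4) = 0.2. Below that share of mixed cases the workflow comes out ahead; above it the unverified hard cases cost more than the easy cases save.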

This is a useful way to think about AI deployment portfolios.

If most of your workflow consists of relatively clean, target-pure cases, then AI screening can create substantial value.

If a large share of your workflow consists of edge cases, translation-heavy decisions, regulatory ambiguity, model-risk exposure, or multi-domain interpretation, then the value of AI depends heavily on whether expert verification is reliably triggered where needed.

In short: AI helps when it screens. It hurts when organizations confuse screening with closure.

Four Places Leaders Should Pay Close Attention

The paper illustrates its framework with examples that map directly onto real organizational environments.

1. Finance and model validation

In derivative pricing and related financial work, moving from a model description to a tradable pricing claim is exactly the kind of seam the paper has in mind. AI can explain the setup, draft documentation, and accelerate communication. But whether the pricing claim is actually admissible depends on deeper structural conditions — the kind of things model validators, not prose generators, are supposed to determine.

If a firm lets AI-generated coherence substitute for model validation, it is taking risk without admitting it.

2. Causal claims and empirical strategy

In research, consulting, and analytics functions, AI can help generate empirical strategies and explain identification methods. But it cannot safely certify whether the assumptions that make a strategy valid actually hold in the case at hand.

That means AI can accelerate design conversations, but it cannot replace genuine expert review of causal claims.

3. Regulation and legal classification

Cross-jurisdictional and cross-functional regulatory work is full of seams. AI can be extremely useful in mapping language from one framework to another. But regulatory admissibility depends on distinctions that often lie deeper than vocabulary matching.

This is exactly where a polished AI output can create false comfort.

4. Knowledge work and publishing

AI now makes it easy to produce polished cross-domain writing. That is useful. But it also raises a risk that organizations, journals, and decision-makers mistake smooth integration for verified integration.

The more AI lowers the cost of cross-domain expression, the more valuable expert review becomes — not less.

What Leaders Should Do Now

The paper’s implications for management are clear.

Treat AI as a triage system, not a universal decision-maker

The right default use for LLMs in high-stakes organizations is as a screening or triage instrument. Let them reduce the burden on experts where the cases are genuinely clean. Do not assume that because a model helps in 70% of the workflow it is qualified to close the remaining 30%.

Identify your seam-heavy decisions

Most organizations do not explicitly map where their major seams are. They should. Ask:

• Where do technical results become business decisions?

• Where do legal interpretations become operating choices?

• Where do statistical claims become strategic conclusions?

• Where does model output become customer-facing or regulator-facing action?

Those are the points where AI adoption should be evaluated most carefully.

Build escalation rules for mixed cases

The paper’s framework implies a simple operational rule: whenever the model’s internal grouping is likely to contain cases that experts would treat differently, escalation should be mandatory.

In ordinary managerial language, that means defining trigger conditions for expert review rather than leaving verification to discretion or optimism.
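As a sketch of what an explicit trigger condition might look like, the following Python fragment routes a case according to what past expert checks found for the bucket the model placed it in. The bucket names, the status table, and the rule that unverified buckets escalate by default are all assumptions made for illustration, not a procedure the paper prescribes.

```python
# Hypothetical review log: how past expert checks characterised each model bucket.
BUCKET_STATUS = {
    "bucket_1": "clean",   # experts agreed on every sampled case
    "bucket_2": "mixed",   # experts reached different verdicts on similar-looking cases
}

def route(bucket):
    """Return (destination, reason) for a case the model placed in `bucket`."""
    status = BUCKET_STATUS.get(bucket)
    if status == "clean":
        return ("fast_track", "bucket verified clean")
    if status == "mixed":
        return ("expert_review", "bucket known to be mixed")
    # Assumption: a bucket nobody has checked is treated as mixed, not as clean.
    return ("expert_review", "bucket never verified")

for bucket in ("bucket_1", "bucket_2", "bucket_9"):
    print(bucket, route(bucket))
```

The design choice worth noting is the default. Escalation happens unless a bucket has been shown to be clean, so the burden of proof sits with skipping review rather than with requesting it.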

Align incentives for verification

One of the paper’s most important institutional insights is that harm arises not only from model limitations, but from weak incentives to verify when verification is costly. If experts are rewarded for speed but not for catching seam failures, then AI-assisted workflows will drift toward unsafe deference.

Leaders should measure, reward, and protect the act of checking the hard cases.

Don’t confuse higher throughput with higher integrity

AI will often increase the volume of polished output faster than it increases the capacity to verify that output. This is especially dangerous in organizations that mistake productivity gains for epistemic gains.

If output scales faster than expert validation, the surface of the organization may look better while the foundation quietly weakens.

A Better Way to Think About AI and Expertise

The paper’s contribution is ultimately architectural. It does not tell us to reject AI. It tells us where to place it.

That is exactly what executives need.

The right question is not whether AI is intelligent in the abstract. The right question is whether, in a given institutional role, the model preserves the distinctions that matter for final judgment.

If it does, AI can take on more responsibility.

If it does not, then AI should remain what it often already is at its best: an amplifier, a translator, a triage device, and a force multiplier for experts — not a substitute for them.

This is a much more useful way to think about the future of professional work. The coming years will not be defined by whether machines can produce polished output. They already can. They will be defined by whether organizations learn the difference between systems that help experts work faster and systems that are mistakenly given the authority to decide what only experts are still equipped to judge.

That distinction may turn out to be one of the central management challenges of the AI era.

The Bottom Line

Large language models are powerful because they make it easier to move across domains. But moving across a boundary is not the same thing as deciding what survives on the other side.

That second task — final judgment at the seam — still belongs to expert verification whenever the model’s own internal distinctions remain too coarse for the decision at hand.

Leaders who understand that will get much more value from AI.

Leaders who ignore it may build organizations that are faster, cheaper, and more articulate — but less trustworthy where it matters most.

Further Reading

This article is based on:

Obazee, Philip. LLMs Are Not Polymaths and Reconciliation at the Seams Still Needs Humans: Output-Law Partitions, Expert Arbitration, and Delegated Verification. Polymetrics Americas Research Working Paper No. Econ 04.026.3, April 2026. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6659418

For readers interested in the broader background, see:

Aghion, Philippe, and Jean Tirole. “Formal and Real Authority in Organizations.” Journal of Political Economy 105, no. 1 (1997): 1–29.

Blackwell, David. “Comparison of Experiments.” In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 93–102. 1951.

Blackwell, David. “Equivalent Comparisons of Experiments.” Annals of Mathematical Statistics 24, no. 2 (1953): 265–272.

Collins, Harry, and Robert Evans. Rethinking Expertise. Chicago: University of Chicago Press, 2007.

Kitcher, Philip. “The Division of Cognitive Labor.” Journal of Philosophy 87, no. 1 (1990): 5–22.

Bommasani, Rishi, et al. “On the Opportunities and Risks of Foundation Models.” arXiv:2108.07258 (2022).

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. 2021.

Kalai, Adam T., and Santosh S. Vempala. “Calibrated Language Models Must Hallucinate.” In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 160–171. 2024.
