
Datalab’s Lift: 9B Vision Model for Schema-Valid JSON From PDFs

Author: Andrew
Published on:
Published in: AI

This is one of those releases that sounds like the solution… until you read the fine print and realize it might also make the problem harder to notice.

Datalab just dropped a 9B open-weights vision model called lift that pulls structured JSON out of PDFs using schemas. On paper, that’s the dream: you give it a schema, it reads the PDF, and it gives you JSON that matches your structure. No more “please output valid JSON” begging. No more parse-fail-retry loops. Just clean shape, every time.

And yes, that part is genuinely good. If you’ve ever tried to extract data from PDFs with a normal “ask an LLM for JSON” prompt, you know the pain isn’t only wrong answers. It’s also the format breaking in tiny ways that blow up downstream systems. lift attacks that directly by enforcing structure during generation. It compiles the schema into a grammar and blocks invalid tokens so it can’t output malformed JSON. That’s real engineering, not vibes.
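To make that mechanism concrete, here is a minimal sketch of grammar-constrained decoding in Python. It is a toy, not Datalab's implementation: the state machine, the vocabulary, and the `toy_scores` function standing in for model logits are all hypothetical, invented for illustration. The idea it shows is the one described above: at each step, only tokens the grammar allows are eligible, so the chatty token the "model" prefers can never be emitted.

```python
import json

DIGITS = [str(d) for d in range(10)]

# Hypothetical grammar for outputs shaped like {"total":<integer>},
# written as a state machine: state -> {allowed token: next state}.
TRANSITIONS = {
    "start": {"{": "key"},
    "key": {'"total"': "colon"},
    "colon": {":": "first_digit"},
    # JSON forbids leading zeros, so "0" must be followed by "}".
    "first_digit": {"0": "end", **{d: "more_digits" for d in DIGITS[1:]}},
    "more_digits": {**{d: "more_digits" for d in DIGITS}, "}": "done"},
    "end": {"}": "done"},
}

CHATTY = "Sure, here is the JSON:"
VOCAB = ["{", "}", ":", '"total"', CHATTY] + DIGITS


def toy_scores(prefix, token):
    """Stand-in for model logits: loves chatty text, otherwise follows a script."""
    script = ["{", '"total"', ":", "4", "2", "}"]
    if token == CHATTY:
        return 10.0
    wanted = script[len(prefix)] if len(prefix) < len(script) else "}"
    return 1.0 if token == wanted else 0.0


def constrained_decode(score_fn, max_steps=20):
    """Greedy decoding where tokens outside the grammar are never considered."""
    state, out = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        allowed = TRANSITIONS[state]  # tokens the grammar permits in this state
        token = max(sorted(allowed), key=lambda t: score_fn(out, t))
        out.append(token)
        state = allowed[token]
    return "".join(out), state == "done"


# Without the mask, the top-scoring first token is the chatty one.
unconstrained_first = max(VOCAB, key=lambda t: toy_scores([], t))
text, finished = constrained_decode(toy_scores)
print(repr(unconstrained_first))
print(text, finished, json.loads(text))
```

Note what the mask does and does not control: it decides which tokens are legal, but the digits themselves still come from the model's scores.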

But here’s the catch that matters more than the demo: it guarantees the JSON shape, not that the JSON is right.

That sounds like a small detail until you imagine the real workflow. Say you’re processing invoices. Your schema says invoice_total is a number. Great—lift will return a number. But it can still be the wrong number. It can read the subtotal. Or a shipping line. Or a “balance due” from a prior invoice. And because the output is perfectly shaped, it slides through all the early gates that used to catch failures. You don’t get an obvious parse error. You get clean-looking wrong data.

That’s the kind of reliability that can hurt you, because it fails quietly.
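One way to make a quiet failure loud is a cross-check that uses other fields from the same document. The sketch below is an assumption about how a downstream gate could look, not something lift provides: the field names (`subtotal`, `tax`, `shipping`, `invoice_total`) and the one-cent tolerance are hypothetical. It flags a record whose total does not equal the sum of its parts, which is exactly the case where the model read the subtotal instead of the total.

```python
from decimal import Decimal

# Hypothetical field names; adapt to your own schema.
COMPONENTS = ("subtotal", "tax", "shipping")


def check_invoice_total(record, tolerance=Decimal("0.01")):
    """Return a list of problems; an empty list means the check passed."""
    missing = [k for k in (*COMPONENTS, "invoice_total") if record.get(k) is None]
    if missing:
        return ["cannot cross-check, missing: " + ", ".join(missing)]
    # Go through str() so floats like 8.25 become exact decimals.
    expected = sum(Decimal(str(record[k])) for k in COMPONENTS)
    total = Decimal(str(record["invoice_total"]))
    if abs(expected - total) > tolerance:
        return [f"invoice_total {total} != sum of components {expected}"]
    return []


good = {"subtotal": 100.0, "tax": 8.25, "shipping": 12.0, "invoice_total": 120.25}
bad = dict(good, invoice_total=100.0)  # the model picked up the subtotal
print(check_invoice_total(good))
print(check_invoice_total(bad))
```

Both records are perfectly valid against a schema that says "invoice_total is a number." Only the second check tells them apart.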

The performance numbers underline this tension. The reported results are strong: 90.2% field-level accuracy on an adversarial benchmark and a median of around 9.5 seconds per document, with a comparison implying it’s faster than a hosted baseline they cite. If your job is “extract a handful of fields and route to a human for review,” that’s exciting. You can see the use case: an ops team drowning in PDFs gets a tool that fills most fields correctly, fast, and doesn’t break the pipeline.

But full-document correctness, meaning every field in a document comes out right, is only 20.9%. That’s the number that should change how people talk about this. If a document has many fields, “pretty good on each field” turns into “almost always wrong somewhere.” And in many businesses, “wrong somewhere” is the same as “not automated.”

Imagine you’re pulling 30 fields from a multi-page insurance form. If even one field is wrong—policy number, date, coverage amount—you can’t just shrug. A wrong policy number doesn’t fail gracefully. It attaches the document to the wrong account. A wrong date can trigger a denial. A wrong tax ID can create a compliance mess. This is where the nice-looking JSON becomes dangerous: it encourages you to trust it because it fits your database perfectly.
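The arithmetic behind "pretty good on each field" becoming "almost always wrong somewhere" is worth seeing once. This is a back-of-the-envelope sketch that assumes field errors are independent and every field has the same 90.2% accuracy. Neither assumption is stated in the reported results, and real errors tend to cluster, so treat the output as intuition rather than a prediction.

```python
def p_all_correct(field_accuracy, n_fields):
    """Chance that every field is right, assuming independent, equal-accuracy fields."""
    return field_accuracy ** n_fields


for n in (5, 15, 30):
    print(f"{n:>2} fields: {p_all_correct(0.902, n):.1%} chance the whole document is correct")
```

Under those assumptions, a 30-field form comes out fully correct roughly 4.5% of the time, and a 15-field document lands at about 21%, in the neighborhood of the reported 20.9%. That is an illustration of the compounding effect, not a claim about how many fields the benchmark documents contain.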

To their credit, lift also tries to handle a classic failure mode: hallucinating missing info. They do this by making fields nullable and training the model to return null instead of inventing a value when the PDF doesn’t contain it or it can’t read it. That’s the right direction. In real documents, things are smudged, cut off, missing, or placed in weird spots. Having the model admit “I don’t know” is better than confident nonsense.

Still, “null” is not automatically safe either. If your process treats null as “not applicable” instead of “unknown,” you can silently lose required info. A human might have caught that the tax ID is on page 4 in a faint footer. The system might just accept null and move on. Reliability isn’t only about the model; it’s also about how your workflow interprets uncertainty.
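Here is a small sketch of treating null as "unknown" rather than "not applicable." The schema fragment uses the standard JSON Schema way of making a field nullable (a type list that includes "null"); the field names and the `REQUIRED_FOR_PROCESSING` rule are hypothetical, and the triage step is an assumption about your workflow, not part of lift.

```python
# Hypothetical schema fragment: each field may be a string or null.
SCHEMA = {
    "type": "object",
    "properties": {
        "tax_id": {"type": ["string", "null"]},
        "policy_number": {"type": ["string", "null"]},
        "middle_name": {"type": ["string", "null"]},
    },
}

# Hypothetical business rule: a null here means "unknown, a human must look",
# never "not applicable".
REQUIRED_FOR_PROCESSING = {"tax_id", "policy_number"}


def triage(record, required=REQUIRED_FOR_PROCESSING):
    """Route a record to review when a required field came back null or absent."""
    unknown = sorted(f for f in required if record.get(f) is None)
    return {
        "status": "needs_review" if unknown else "ok",
        "unknown_required": unknown,
    }


extracted = {"tax_id": None, "policy_number": "P-1", "middle_name": None}
print(triage(extracted))
```

A null `middle_name` passes quietly, while a null `tax_id` sends the document to a person who can check that faint footer on page 4.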

Then there’s the part that would make me nervous if I were shipping this into production: schema limitations that can cause the system to silently fall back to unconstrained generation. Some schema features, such as enum, $ref, and anyOf, may fail to compile, and the warning is that when that happens lift may drop the structural guarantee without raising a hard error. That’s not a minor edge case. That’s a footgun.

If your whole reason for using this approach is “valid by construction,” then a silent fallback means you might think you’re protected when you’re not. You’ll build downstream systems assuming the shape is guaranteed, and then one day it isn’t, and you’re back to broken JSON in the pipeline—except now it’s intermittent and harder to debug.
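If the fallback is silent, one defensive move is to make it loud yourself before the schema ever reaches the model. The sketch below walks a schema and refuses to proceed if it uses any of the keywords flagged above. It assumes the "ref" in that list means the JSON Schema keyword `$ref`, and the function names are hypothetical. It does not call lift and knows nothing about what lift's compiler actually accepts, so the keyword list is something you would maintain from their documentation.

```python
# Keywords reported as risky; assumed list, keep it in sync with the vendor docs.
RISKY_KEYWORDS = {"enum", "$ref", "anyOf"}
# Under these keys, dict keys are user-chosen names, not schema keywords.
NAME_MAPS = {"properties", "$defs", "definitions"}


def find_risky_keywords(schema, path="#"):
    """Return JSON-pointer-style paths to every risky keyword in the schema."""
    hits = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}/{key}"
            if key in RISKY_KEYWORDS:
                hits.append(child)
            if key in NAME_MAPS and isinstance(value, dict):
                # Skip the names themselves so a field called "enum" is not flagged.
                for name, sub in value.items():
                    hits.extend(find_risky_keywords(sub, f"{child}/{name}"))
            else:
                hits.extend(find_risky_keywords(value, child))
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            hits.extend(find_risky_keywords(item, f"{path}/{i}"))
    return hits


def require_compilable(schema):
    """Fail loudly instead of risking a silent fallback to unconstrained output."""
    hits = find_risky_keywords(schema)
    if hits:
        raise ValueError("schema uses keywords that may not compile: " + ", ".join(hits))
    return schema


schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["open", "paid"]},
        "enum": {"type": "string"},  # a field merely named "enum" is fine
    },
}
print(find_risky_keywords(schema))
```

Pairing a pre-flight check like this with ordinary schema validation of every output means a dropped guarantee shows up as an error you can see, not as intermittent broken JSON.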

So what do I think this actually is?

It’s a strong step toward industrializing document extraction, but it’s not the finish line people want it to be. It shifts the problem from “formatting and parsing” to “semantic correctness and business rules,” which is the harder problem. That’s not a criticism of the team. If anything, it’s a more honest framing of what “AI for PDFs” really is. The messy part isn’t producing a JSON object. The messy part is being right in a way that the business can safely act on without a human staring at it.

The winners here are teams who already understand that automation is a spectrum. If you treat lift as a fast assistant that produces structured candidates—then you validate, cross-check, and route exceptions—you’ll probably love it. The losers are teams that will see “guaranteed JSON” and interpret it as “guaranteed truth,” then wire it straight into payments, compliance, or customer records.

If you were building a real system around this, would you rather optimize for outputs that are always well-formed even when they might be wrong, or outputs that are messier but fail loudly when the model is unsure?
