Compare PDF to Markdown Extraction Using the JFK Files
Kirk Marple
March 22, 2025
With the release of the JFK Files, it provided a robust set of real-world examples of scanned and handwritten PDFs.
Given the variety of API services, as well as visual LLMs, for PDF to Markdown extraction, here we will compare the output from an example PDF.
This PDF is three pages long, and appears to be a scanned classified message form.

For each output example below, we have copy/pasted the output Markdown into Markdown Live Preview and taken a screenshot of the formatted Markdown.
Using Graphlit
Azure AI Document Intelligence: Layout Model

Azure AI Document Intelligence: Read/OCR Model

Anthropic Claude Sonnet 3.5

Anthropic Claude Sonnet 3.7

Anthropic Claude Sonnet 3.7 (w/ Thinking enabled)

OpenAI GPT-4o

Gemini 2.0 Flash

Gemini 2.0 Pro

Mistral OCR

Using other APIs
Chunkr

Reducto

Reducto (Agentic mode)

LlamaCloud (Premium mode)

LLMWhisperer

Summary
This comparison shows the diversity of PDF extraction results, across available APIs and visual LLMs.
You will need to evaluate the proper solution based on the layout and type of content that you are starting with.
Also, for each of these results, they come with a difference in cost per page, depending on the compute required.
Please email any questions on this article or the Graphlit Platform to questions@graphlit.com.
For more information, you can read our Graphlit Documentation, visit our marketing site, or join our Discord community.