Convert to Markdown

Notes on converting various files to Markdown for use with LLMs and related text analysis pipelines.

# PDF

# Marker

Marker converts documents to markdown, JSON, chunks, and HTML quickly and accurately.

Marker benchmarks favorably compared to cloud services like Llamaparse and Mathpix, as well as other open source tools.

see also

  • docling - converts messy documents into structured data and simplifies downstream document and AI processing by detecting tables, formulas, reading order, OCR, and much more.

# Install

Install using a Nix flake.

# Convert a single file
$ marker_single --output_dir . /path/to/file.pdf # or image
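To convert a whole folder rather than one file, the command above can be wrapped in a small script. This is only a sketch: the helper names `marker_command` and `convert_folder` and the `dry_run` flag are made up for illustration, and the only Marker invocation it relies on is the `marker_single --output_dir` form shown above.

```python
import subprocess
from pathlib import Path


def marker_command(source: Path, output_dir: Path) -> list[str]:
    """Build the marker_single invocation shown above for one file."""
    return ["marker_single", "--output_dir", str(output_dir), str(source)]


def convert_folder(folder: Path, output_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Run marker_single once per PDF in `folder` and return the commands used."""
    commands = [marker_command(pdf, output_dir) for pdf in sorted(folder.glob("*.pdf"))]
    if not dry_run:
        for command in commands:
            # check=True stops the batch at the first failing conversion.
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_folder(Path("papers"), Path("out"), dry_run=True)` only lists the commands, which is a cheap way to check the paths before starting a long run.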

# MarkItDown

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to textract, but with a focus on preserving important document structure and content as Markdown (including headings, lists, tables, links, etc.). While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools, and may not be the best option for high-fidelity document conversions for human consumption.
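Since MarkItDown is a Python library, it can also be called from code. The sketch below assumes the `markitdown` package is installed and uses the `MarkItDown().convert(...)` call and `text_content` attribute as documented in the project's README. The wrapper `to_markdown` and its `converter` parameter are hypothetical, added only so that another converter can be swapped in.

```python
def to_markdown(source: str, converter=None) -> str:
    """Convert one file to Markdown text with a MarkItDown-style converter."""
    if converter is None:
        # Imported lazily so the module loads even if markitdown is missing.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(source)
    # The conversion result exposes the Markdown as plain text.
    return result.text_content
```

For example, `print(to_markdown("report.docx"))` would print the Markdown rendering of a hypothetical `report.docx`.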

# HTML

# Pandoc

# Convert a single file
$ pandoc -f html -t markdown input.html -o output.md
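When many HTML files need converting, each output name can be derived from its input name. This is a sketch: the helper `pandoc_command` is an illustrative name, and it builds exactly the `pandoc -f html -t markdown ... -o ...` invocation shown above, writing the `.md` file next to the source.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path) -> list[str]:
    """Build the pandoc call shown above, writing <name>.md beside the input."""
    target = source.with_suffix(".md")
    return ["pandoc", "-f", "html", "-t", "markdown", str(source), "-o", str(target)]


if __name__ == "__main__":
    # Convert every .html file in the current directory; requires pandoc on PATH.
    for html_file in sorted(Path(".").glob("*.html")):
        subprocess.run(pandoc_command(html_file), check=True)
```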

# see also

Written on June 11, 2025, Last update on June 11, 2025
markdown pdf doc LLM