Caio Pizzol

Why your document editor is lying to you

Open a .docx file in any web-based document editor. It looks fine. The headings are bold, the tables have borders, the numbered lists count up correctly. Everything checks out.

Now save it. Open it in Microsoft Word. Something's off. The tracked changes are gone. The table style lost its banded rows. The numbered list restarted when it shouldn't have. The document looks correct in the editor but the file is corrupted.

This is the conversion problem. And every editor that uses HTML as its rendering layer has it.

The HTML trap

A .docx file is not what most developers think it is. It's a ZIP archive containing structured XML that follows the ECMA-376 specification - Office Open XML, or OOXML. The format has explicit constructs for things HTML was never designed to express: page boundaries, section breaks with different margins, numbered lists with restart rules, table conditional formatting, tracked changes with author metadata, comment threads.
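
To make the "ZIP archive containing structured XML" point concrete, here is a minimal Python sketch that writes a toy package with one part and reads the paragraph text back out. The helper names (build_package, paragraph_texts) are made up for illustration, and the package is deliberately incomplete: a real .docx also needs [Content_Types].xml and relationship parts before Word will open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
    "</w:document>"
)


def build_package() -> bytes:
    """Write a toy ZIP with only word/document.xml (not a valid .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def paragraph_texts(package: bytes) -> list[str]:
    """Unzip the main document part and join the w:t runs of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


texts = paragraph_texts(build_package())
print(texts)
```

Everything an editor shows you is derived from parts like this one. The question is whether it keeps them or replaces them with something else.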

Most document editors don't render OOXML. They convert it to HTML, render the HTML, and convert back when you save. The problem is that conversion is lossy by design.

Here's what happens during DOCX -> HTML:

  • Style references become inline formatting. Word's style system is hierarchical - document defaults, style definitions, conditional table formatting, inline overrides. When you flatten that to inline CSS, you can't edit the style definition and have it cascade anymore.
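  • Numbered list metadata disappears. Word stores list numbering as abstract definitions with restart rules and level overrides. HTML's <ol> has start and type attributes, but nothing that records which abstract definition a list came from, when a level restarts, or which overrides apply, so the original structure can't be rebuilt on save.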
  • Tracked changes lose their structure. Author, timestamp, and revision scope are dropped, and the inserted or deleted text gets baked into the surrounding plain text.
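  • Section breaks are impossible. A Word document can have section 1 in Letter size with 1-inch margins and section 2 in Legal size with 1.5-inch margins. HTML has no notion of per-section page geometry, and CSS page rules only apply to paged output such as print, not to the on-screen layout an editor renders.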
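  • Table conditional formatting vanishes. Word table styles apply different backgrounds to first row, last row, banded columns - all based on cell position. HTML tables have no table-style object to carry those rules, so they get flattened into per-cell formatting and the link back to the style is lost.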
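
The tracked-changes bullet above is the easiest one to see in a few lines. The sketch below is in Python with a hand-written paragraph fragment. The sample text, author, and date are invented, and flatten and insertions are hypothetical helpers. It reads the same paragraph two ways: once as flat text, the way an HTML conversion would see it, and once keeping the w:ins wrapper and its attributes.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# A paragraph where "30" was inserted as a tracked change (invented sample).
PARAGRAPH = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">Payment is due in </w:t></w:r>
  <w:ins w:id="1" w:author="Alice" w:date="2024-01-15T10:00:00Z">
    <w:r><w:t>30</w:t></w:r>
  </w:ins>
  <w:r><w:t xml:space="preserve"> days.</w:t></w:r>
</w:p>
"""


def flatten(element: ET.Element) -> str:
    """Join every w:t descendant: all revision structure is discarded."""
    return "".join(t.text or "" for t in element.iterfind(".//w:t", NS))


def insertions(paragraph: ET.Element) -> list[dict]:
    """Keep the w:ins wrappers: who inserted what, and when."""
    return [
        {
            "author": ins.get(f"{{{W_NS}}}author"),
            "date": ins.get(f"{{{W_NS}}}date"),
            "text": flatten(ins),
        }
        for ins in paragraph.iterfind("w:ins", NS)
    ]


p = ET.fromstring(PARAGRAPH.strip())
print(flatten(p))
print(insertions(p))
```

Both readings produce the same visible sentence. Only the second one can be written back as a tracked change.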

The editor renders something that looks right. The file underneath is damaged. That's the lie.

Why this is hard

OOXML is a 5,000-page specification. Most of it is noise for any given use case. But the parts that matter often omit critical details that only surface when you compare your rendering against Word's actual behavior. We've been documenting these gaps at ooxml.dev as we find them.

Some examples from building SuperDoc:

The spec says tblGrid is optional. Word crashes without it. If you're generating .docx files programmatically and skip the table grid element because the spec says you can, the file won't open in Word. Nowhere in the spec does it say this.
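
If you generate files programmatically, a cheap guard is to check for the grid before writing. This is a rough Python sketch under the assumption that you have the body XML as a string. tables_missing_grid is a hypothetical helper, not part of any library, and the sample markup is hand-written.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def tables_missing_grid(body_xml: str) -> list[int]:
    """Return the index (in document order) of every w:tbl lacking a w:tblGrid child."""
    root = ET.fromstring(body_xml)
    missing = []
    for index, tbl in enumerate(root.iter(f"{{{W_NS}}}tbl")):
        if tbl.find("w:tblGrid", NS) is None:
            missing.append(index)
    return missing


SAMPLE = f"""
<w:body xmlns:w="{W_NS}">
  <w:tbl>
    <w:tblPr/>
    <w:tblGrid><w:gridCol w:w="4680"/><w:gridCol w:w="4680"/></w:tblGrid>
    <w:tr><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc></w:tr>
  </w:tbl>
  <w:tbl>
    <w:tblPr/>
    <w:tr><w:tc><w:p/></w:tc></w:tr>
  </w:tbl>
</w:body>
"""

missing = tables_missing_grid(SAMPLE.strip())
print(missing)
```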

Border grouping has undocumented rules. Consecutive paragraphs with matching border properties and a w:between element form a visual group. Setting w:between to nil does NOT prevent grouping - it means "group but don't draw a separator." If you normalize that to undefined during parsing, you lose the grouping signal entirely.
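
One way to avoid losing that signal is to parse w:between into three states instead of two. The Python sketch below is illustrative only: the Between enum and between_state are names made up here, and it does not implement the grouping rule itself, just the distinction that the parser has to keep.

```python
import xml.etree.ElementTree as ET
from enum import Enum

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


class Between(Enum):
    ABSENT = "absent"  # no w:between element at all
    NIL = "nil"        # element present with w:val="nil"
    DRAWN = "drawn"    # element present with any other value


def between_state(paragraph: ET.Element) -> Between:
    """Keep 'missing' and 'nil' apart instead of collapsing both to None."""
    between = paragraph.find("w:pPr/w:pBdr/w:between", NS)
    if between is None:
        return Between.ABSENT
    if between.get(f"{{{W_NS}}}val") == "nil":
        return Between.NIL
    return Between.DRAWN


def make_paragraph(between_xml: str) -> ET.Element:
    """Build a bordered paragraph with an optional w:between child (test fixture)."""
    return ET.fromstring(
        f'<w:p xmlns:w="{W_NS}"><w:pPr><w:pBdr>'
        '<w:top w:val="single" w:sz="4" w:space="1" w:color="auto"/>'
        f"{between_xml}</w:pBdr></w:pPr></w:p>"
    )


no_between = make_paragraph("")
nil_between = make_paragraph('<w:between w:val="nil"/>')
drawn_between = make_paragraph('<w:between w:val="single" w:sz="4"/>')

print(between_state(no_between), between_state(nil_between), between_state(drawn_between))
```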

Units are a mess. OOXML uses twips, EMUs, half-points, eighths of a point, and regular points - sometimes in the same element. Border spacing is in points but border width uses eighths of a point, so sz="12" means 1.5pt, not 12pt. Mix these up and your rendering is silently wrong.
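
A small set of named converters makes these mix-ups harder to write. The function names in this Python sketch are our own. The ratios are the standard ones: 20 twips per point, 12,700 EMUs per point (914,400 per inch), 2 half-points per point, 8 eighths per point.

```python
def twips_to_pt(value: int) -> float:
    """Twips are twentieths of a point (1440 per inch)."""
    return value / 20


def emu_to_pt(value: int) -> float:
    """EMUs: 914,400 per inch, so 12,700 per point."""
    return value / 12700


def half_points_to_pt(value: int) -> float:
    """Font sizes such as w:sz on a run are stored in half-points."""
    return value / 2


def eighths_to_pt(value: int) -> float:
    """Border widths (w:sz on a border element) are in eighths of a point."""
    return value / 8


# A border like <w:top w:val="single" w:sz="12" w:space="4"/>:
border_width_pt = eighths_to_pt(12)  # width uses eighths of a point
border_space_pt = float(4)           # spacing is already in points

print(border_width_pt, border_space_pt)
print(twips_to_pt(1440), emu_to_pt(914400), half_points_to_pt(24))
```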

RTL text breaks every assumption. w:bidi (paragraph direction) and w:rtl (run direction) are independent. You need both. Tab positions are measured from the leading edge - which is the right margin in RTL. If your engine always measures from the left, every RTL tab lands on the wrong side.
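
The tab rule reduces to a single branch. This Python sketch is a simplification, with all names hypothetical: it ignores paragraph indents and assumes the tab position has already been converted to points.

```python
def tab_x(tab_pos_pt: float, content_left_pt: float,
          content_right_pt: float, bidi: bool) -> float:
    """Return the page x-coordinate of a tab stop.

    Tab positions are measured from the leading edge: the left edge of the
    content area for LTR paragraphs, the right edge for RTL (bidi) ones.
    """
    if bidi:
        return content_right_pt - tab_pos_pt
    return content_left_pt + tab_pos_pt


# Letter page (612pt wide) with 1-inch margins: content spans 72pt..540pt.
ltr_x = tab_x(72, content_left_pt=72, content_right_pt=540, bidi=False)
rtl_x = tab_x(72, content_left_pt=72, content_right_pt=540, bidi=True)
print(ltr_x, rtl_x)
```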

Developers have been doing "XML surgery" on .docx files for years because every abstraction layer fails. python-docx, docx4j, the Open XML SDK - they all force you into direct XML manipulation eventually.

Staying in OOXML

At SuperDoc, we made a different choice. Instead of converting DOCX to HTML, we built a rendering pipeline that stays in OOXML territory.

The converter parses the XML and stores raw OOXML properties - it doesn't resolve styles during import. A dedicated style engine applies Word's actual cascade rules at render time: document defaults -> style definitions -> conditional table formatting -> inline overrides. A layout engine computes pagination from OOXML properties directly, handling section breaks, multi-column layouts, and page breaks with Word's algorithms.
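
The cascade order is easier to reason about with a toy model. The Python sketch below is not SuperDoc's style engine. It is a deliberately simplified illustration with invented property names that ignores things like style inheritance chains. It keeps each layer as raw data and resolves only when asked, so editing a style definition still cascades.

```python
from collections import ChainMap


def resolve(doc_defaults: dict, style: dict,
            table_conditional: dict, inline: dict) -> dict:
    """Resolve properties at render time.

    ChainMap looks up keys left to right, so the highest-priority layer
    (inline overrides) goes first and document defaults go last.
    """
    return dict(ChainMap(inline, table_conditional, style, doc_defaults))


# Each layer is stored separately, as imported (property names are invented).
doc_defaults = {"font": "Calibri", "size_pt": 11, "bold": False}
heading_style = {"size_pt": 16, "bold": True}
first_row = {"shading": "D9D9D9"}
inline = {"italic": True}

before = resolve(doc_defaults, heading_style, first_row, inline)

# Editing the style definition changes the next resolve, because nothing
# was flattened into inline formatting on import.
heading_style["size_pt"] = 20
after = resolve(doc_defaults, heading_style, first_row, inline)

print(before["size_pt"], after["size_pt"])
```

Had the layers been merged once at import time, the second resolve would still report 16.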

The result: when you edit and save, the OOXML structure is preserved. Style references don't become inline formatting. Numbered list metadata stays intact. Comments and tracked changes keep their author and timestamp.

Round-trip fidelity isn't a feature we added. It's a consequence of never leaving the format.

Why this matters now

AI agents are starting to work with documents. LLMs can read contracts, generate reports, fill templates. But the moment an agent edits a .docx through an HTML conversion layer, the same problems apply - tracked changes disappear, styles flatten, formatting breaks.

If the integration layer between AI agents and real-world documents is built on HTML conversion, every agent-generated edit will silently corrupt the file. The agent will report success. The document will be damaged.

The tools need to get this right. Not approximately right - actually right. That's what native OOXML rendering means.

The editor should show you what the file actually contains. Not an approximation. Not a projection. The truth.