Multilingual Markdown: handling German, Japanese, French and Spanish content cleanly
AI assistants handle multilingual content surprisingly well. Documents often handle it surprisingly badly. The Markdown is fine; the export breaks. Here's what tends to go wrong when you take non-English Markdown through to PDF or DOCX, and what to watch for in each language.
The common ground
Before the language-specific details, two things will save you grief in any language:
- UTF-8 everywhere. Make sure your source file is UTF-8 (without a BOM), your editor saves UTF-8, and your converter declares UTF-8. Anything else and you'll watch characters silently turn into question marks, or into mojibake such as Ã¤ where ä should be.
- A design system whose fonts cover your script. The usual default document fonts (Helvetica, Arial, Times) cover Latin alphabets just fine. They do not cover Japanese or Chinese. If your PDF renderer can't find a Japanese glyph, you'll get little squares ("tofu") where text should be. For non-Latin work, pick a design system that includes the Noto family (Noto Sans CJK for Japanese and Chinese) or another broad-coverage font.
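A quick way to enforce the UTF-8 rule before export is to check the raw bytes. The Python sketch below assumes the whole file fits in memory; `check_utf8` and `strip_bom` are hypothetical helper names, not part of any tool mentioned here. The last line shows where mojibake comes from: UTF-8 bytes decoded as Latin-1.

```python
import codecs


def check_utf8(raw: bytes) -> list[str]:
    """Return human-readable problems found in a file's raw bytes (empty if clean)."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM (EF BB BF)")
    try:
        raw.decode("utf-8")  # strict decoding: raises on any invalid byte
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    return problems


def strip_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM and leave every other byte untouched."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


if __name__ == "__main__":
    clean = "Größe: 1.000,50 €".encode("utf-8")
    print(check_utf8(clean))                      # []
    print(check_utf8(codecs.BOM_UTF8 + clean))    # ['file starts with a UTF-8 BOM (EF BB BF)']
    print(check_utf8("Größe".encode("latin-1")))  # ['not valid UTF-8 at byte 2']
    # Mojibake: the UTF-8 bytes for "ä" read back as Latin-1.
    print("ä".encode("utf-8").decode("latin-1"))  # Ã¤
```

A file saved as Latin-1 fails at the first non-ASCII byte. That is why a strict decode is a better gate than eyeballing the output.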
German (DE)
German is mostly straightforward Latin-alphabet content with three special concerns:
- Umlauts and sharp s. ä ö ü ß all render fine in any modern font, but watch for legacy systems that mangle them. If you copy from a PDF, the umlauts can come through as decomposed sequences (a + combining diaeresis, U+0308) that look identical to the precomposed ä but don't match it in search.
- Long compound words. "Versicherungsvertragsbedingungen" is one word. In a narrow PDF column it can run past the margin. Cleanup can't fix this, but the design system can: use a system with hyphenation enabled for DE.
- Number formatting. Germans write 1.000,50, not 1,000.50. AI assistants often default to US formatting. Catch it on the way out, especially in tables.
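The two mechanical German issues above, decomposed umlauts and US-style numbers, can be caught with a few lines of Python. This is a sketch under stated assumptions. `to_nfc` and `format_de` are hypothetical helper names. `format_de` swaps the separators by hand instead of using the `locale` module, because a `de_DE` locale may not be installed on the machine doing the export.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose decomposed sequences such as 'u' + U+0308 into one code point ('ü')."""
    return unicodedata.normalize("NFC", text)


def format_de(value: float, decimals: int = 2) -> str:
    """Format a number German-style: '.' groups thousands, ',' marks decimals."""
    us_style = f"{value:,.{decimals}f}"  # e.g. '1,000.50'
    # Swap the two separators via a placeholder so one swap doesn't undo the other.
    return us_style.replace(",", "\x00").replace(".", ",").replace("\x00", ".")


if __name__ == "__main__":
    decomposed = "Mu\u0308nchen"   # 'u' + combining diaeresis, as some PDFs yield
    precomposed = "M\u00fcnchen"   # 'ü' as a single code point
    print(decomposed == precomposed)          # False, though both display "München"
    print(to_nfc(decomposed) == precomposed)  # True
    print(format_de(1000.5))                  # 1.000,50
    print(format_de(1234567.891))             # 1.234.567,89
```

Normalising both the document and the search term to NFC makes the two forms compare equal. The visible text does not change.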
Japanese (JA)
Japanese is where most multilingual document workflows fall over.
- Fonts. Your design system must include a CJK font (Noto Sans CJK or similar). Without it, every Japanese character is tofu.
- Line-breaking. Japanese has no spaces between words, and its line-breaking rules (kinsoku shori) are different: a line must not start with closing punctuation such as 。、」 or end with an opening bracket such as 「. Most PDF renderers handle this if the font is right; some don't. Spot-check a long paragraph in the output.
- Width. Full-width punctuation (。「」) is wider than half-width punctuation. Tables that fit in your editor may overflow on export.
- Mixed content. A document with both English code blocks and Japanese prose needs a monospace font that covers both. Monospace fonts that include Japanese (Source Han Code JP, for example) exist; the default monospaces don't.
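To see why a table that looks aligned in your editor can overflow, count display columns rather than characters. The hypothetical `display_width` helper below uses Python's `unicodedata.east_asian_width` and counts Wide (W) and Fullwidth (F) characters as two columns each. It approximates a monospace editor or terminal, not a PDF renderer, whose actual widths depend on the font. Counting Ambiguous-width (A) characters as one column is an assumption.

```python
import unicodedata


def display_width(text: str) -> int:
    """Columns needed to show `text` in a monospace grid.

    East Asian Wide (W) and Fullwidth (F) characters count as 2 columns,
    combining marks as 0, everything else (including Ambiguous, A) as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # a combining mark sits on the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_cell(text: str, columns: int) -> str:
    """Right-pad a table cell with spaces to `columns` display columns."""
    return text + " " * max(0, columns - display_width(text))


if __name__ == "__main__":
    cells = ["Name", "東京", "「見積書」。", "ｶﾀｶﾅ"]
    widest = max(display_width(c) for c in cells)
    for cell in cells:
        # len() counts code points; display_width() counts columns.
        print(f"| {pad_cell(cell, widest)} | chars={len(cell)} columns={display_width(cell)}")
```

「見積書」。 is six characters but twelve columns. A table padded by character count will therefore drift as soon as Japanese text appears in it.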
French (FR)
French is the language most likely to silently look wrong because of typographic conventions.
- Non-breaking spaces around punctuation. French typographic tradition puts a non-breaking space (often a narrow one, U+202F) before : ; ! ? and », and after «. AI assistants don't always insert these; cleanup can. Because the space is non-breaking, the punctuation can't wrap onto the next line on its own.
- Apostrophes. Use ’ (curly), not ' (straight), in body prose. Inside code, use straight, the same rule as English.
- Guillemets. French uses « » for quotation marks. AI outputs are inconsistent: sometimes correct, sometimes English-style " ". Pick one style and normalise to it.
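Here is a minimal sketch of the French normalisation described above. It assumes it runs on prose only, with code spans and code blocks already set aside by the caller. `french_spacing` is a hypothetical name. The sketch follows one common convention: a narrow no-break space (U+202F) before ; ! ? and inside guillemets, and an ordinary no-break space (U+00A0) before a colon. House styles differ. The lookaheads leave URLs (https://, /?q=1) and times like 14:30 untouched.

```python
import re

NNBSP = "\u202f"  # narrow no-break space
NBSP = "\u00a0"   # ordinary no-break space


def french_spacing(prose: str) -> str:
    """Apply French spacing and quote conventions to prose (not code). Idempotent."""
    # English-style double quotes around a phrase -> « phrase » (assumes balanced pairs).
    prose = re.sub(r'[“"]\s*([^”"]*?)\s*[”"]',
                   lambda m: f"«{NNBSP}{m.group(1)}{NNBSP}»", prose)
    # Normalise whatever space sits inside existing guillemets.
    prose = re.sub(r"«\s*", "«" + NNBSP, prose)
    prose = re.sub(r"\s*»", NNBSP + "»", prose)
    # Straight apostrophe between letters -> curly (l'heure -> l’heure).
    prose = re.sub(r"(?<=\w)'(?=\w)", "’", prose)
    # ; ! ? (including runs like ?!) get a narrow no-break space before them,
    # but only when followed by whitespace, end of text, or a closing mark,
    # so a URL query such as /?q=1 is left alone.
    prose = re.sub(r"(?<=\S)[ \u00a0\u202f]?([;!?]+)(?=\s|$|[»)\]])",
                   lambda m: NNBSP + m.group(1), prose)
    # Colon: same idea, which also protects https:// and times like 14:30.
    prose = re.sub(r"(?<=\S)[ \u00a0\u202f]?:(?=\s|$)", NBSP + ":", prose)
    return prose


if __name__ == "__main__":
    text = 'Il a dit "bonjour" : c\'est l\'heure ! Voir https://exemple.fr/?q=1 à 14:30.'
    print(repr(french_spacing(text)))
```

Each rule replaces whatever space already precedes the punctuation, so running it twice gives the same result. That matters if a cleanup step runs on every export.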
Spanish (ES)
Spanish is closer to English typographically, with three real concerns:
- Inverted question and exclamation marks, as in ¿Cómo estás? and ¡Hola! AI outputs are usually correct but sometimes drop the opener. Spot-check.
- Accented characters. Same UTF-8 considerations as French and German. Decomposed and precomposed forms look identical but can break search and matching if you copy from older PDFs.
- Regional variants. Castilian, Mexican, Argentine, etc., have slightly different conventions. Cleanup can't pick a variant for you; pick one for the document and stay consistent.
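A dropped opener is easy to spot-check mechanically. The sketch below is a heuristic, not a sentence parser. It treats everything between one . ! ? or newline and the next run of ! or ? as a clause, and flags the clause if the matching ¿ or ¡ is missing. `missing_openers` is a hypothetical helper name. An abbreviation such as "Sr." will end a clause early, which is acceptable for a spot-check.

```python
import re

# A clause: any run without . ! ? or a newline, followed by one or more ! / ?.
_CLAUSE = re.compile(r"([^.!?\n]*)([!?]+)")
_OPENER_FOR = {"?": "¿", "!": "¡"}


def missing_openers(text: str) -> list[str]:
    """Return clauses ending in ? or ! that lack the matching ¿ or ¡ opener."""
    flagged = []
    for body, closers in _CLAUSE.findall(text):
        needed = {_OPENER_FOR[c] for c in closers}
        if not needed.issubset(body):  # every required opener must appear in the clause
            flagged.append((body + closers).strip())
    return flagged


if __name__ == "__main__":
    sample = "¿Cómo estás? Hola! Si vienes, ¿me avisas? Qué bien!"
    print(missing_openers(sample))  # ['Hola!', 'Qué bien!']
```

The check looks for the opener anywhere in the clause, not just at its start, so mid-sentence questions like "Si vienes, ¿me avisas?" pass.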
How Markdown Tidy handles it
The cleanup pipeline leaves language-specific UTF-8 content as-is: it doesn't try to normalise accented characters, smart quotes that are conventional in the source language, or punctuation that's correct for the script. It does normalise the AI-introduced inconsistencies (mixed bullet styles, invisible Unicode, broken tables) that affect any language equally. The design systems include broad-coverage fonts so Japanese, Chinese, Korean, accented Latin and Cyrillic all render rather than turning to tofu.
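Markdown Tidy's own implementation isn't described here, so the following is only a hypothetical illustration of the language-neutral kind of fix mentioned above. It strips a few invisible code points and unifies * and + bullets to -. It deliberately leaves fenced code blocks untouched and skips thematic breaks such as * * *. It also keeps ZWJ and ZWNJ (U+200D, U+200C) and the soft hyphen (U+00AD), which carry meaning in emoji, some scripts, and German hyphenation hints. The fence handling is simplified and doesn't match fence lengths.

```python
import re

# Invisible code points stripped here: zero width space, word joiner, and a stray
# BOM / zero width no-break space. ZWJ, ZWNJ and soft hyphen are deliberately kept.
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\u2060\ufeff"))
# A thematic break such as "* * *" must not be mistaken for a bullet.
_THEMATIC_BREAK = re.compile(r"^ {0,3}([*_-])(?:[ \t]*\1){2,}[ \t]*$")
# "* item" or "+ item" (any indentation) -> "- item".
_BULLET = re.compile(r"^([ \t]*)[*+](?=[ \t])")


def tidy(markdown: str) -> str:
    """Hypothetical language-neutral cleanup: strip invisible characters, unify bullets."""
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # simplified: doesn't match fence lengths
        elif not in_fence:
            line = line.translate(_INVISIBLE)
            if not _THEMATIC_BREAK.match(line):
                line = _BULLET.sub(r"\1-", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(repr(tidy("* one\u200b\n+ two\n* * *\n")))  # '- one\n- two\n* * *\n'
```

Accented letters, CJK text and punctuation pass through unchanged, because the sketch only deletes the three listed code points and rewrites bullet markers.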
The product itself is available in English, German, Japanese, French and Spanish — pick the locale in the header and the interface, transactional emails, and content all follow.
Related reading: The AI Markdown cleanup checklist · DOCX vs PDF