Cleaning, reflow, fixed-layout, and language tagging
<span> elements that share the same
attributes. Preserves HTML entities, which makes it well suited to
foreign-language text.
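The span-merging behaviour can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the name merge_adjacent_spans is hypothetical, attributes are compared as raw strings, only directly adjacent spans are merged, and nested spans are left untouched. Because the sketch works on the raw markup and never decodes it, entities such as &eacute; pass through unchanged.

```python
import re

# Matches one non-nested <span ...>...</span>. Nested spans are out of scope
# for this sketch and are simply left as they are.
SPAN_RE = re.compile(r"<span\b([^>]*)>((?:(?!</?span\b).)*)</span>", re.DOTALL)


def merge_adjacent_spans(markup: str) -> str:
    """Merge directly adjacent spans whose attribute strings are identical.

    Hypothetical helper: attributes are compared as raw text, so
    class="a" id="b" and id="b" class="a" count as different.
    """
    out = []
    pos = 0
    run = None  # (attribute string, list of contents) for the current run

    for m in SPAN_RE.finditer(markup):
        gap = markup[pos:m.start()]
        attrs, content = m.group(1), m.group(2)
        if run is not None and gap == "" and attrs == run[0]:
            # Same attributes and nothing in between: extend the current run.
            run[1].append(content)
        else:
            if run is not None:
                out.append(f"<span{run[0]}>{''.join(run[1])}</span>")
            out.append(gap)
            run = (attrs, [content])
        pos = m.end()

    if run is not None:
        out.append(f"<span{run[0]}>{''.join(run[1])}</span>")
    out.append(markup[pos:])
    return "".join(out)


if __name__ == "__main__":
    sample = '<p><span class="fr">caf&eacute;</span><span class="fr"> cr&egrave;me</span></p>'
    print(merge_adjacent_spans(sample))
```

A regular expression keeps the example short and leaves entities alone. A production tool would more likely use a real XHTML parser configured not to resolve entities.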
lang-XX comments
from an annotated PDF, then wraps the matching EPUB
phrases in elements that carry the appropriate xml:lang attributes.
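The tagging step can be sketched as follows, assuming the annotations have already been pulled out of the PDF as (phrase, comment) pairs. Everything here is illustrative: tag_phrases and parse_lang_comment are hypothetical names, the exact "lang-XX" comment syntax is an assumption, and wrapping in a span that carries both xml:lang and lang is one common convention, not necessarily what the tool emits.

```python
import html
import re

# Splits markup into alternating text and tag pieces: even indexes are text,
# odd indexes are tags (because the pattern has one capturing group).
TAG_SPLIT_RE = re.compile(r"(<[^>]+>)")

# Assumed comment syntax: "lang-fr", "lang-de", "lang-pt-BR", and so on.
LANG_COMMENT_RE = re.compile(r"lang-([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)")


def parse_lang_comment(comment: str):
    """Return the language code from a 'lang-XX' comment, or None."""
    m = LANG_COMMENT_RE.fullmatch(comment.strip())
    return m.group(1) if m else None


def tag_phrases(xhtml: str, annotations) -> str:
    """Wrap each annotated phrase in a span carrying xml:lang and lang.

    Only text between tags is searched, so attribute values are never touched.
    """
    for phrase, comment in annotations:
        code = parse_lang_comment(comment)
        if code is None:
            continue  # not a language annotation
        needle = html.escape(phrase, quote=False)
        wrapped = f'<span xml:lang="{code}" lang="{code}">{needle}</span>'
        # Re-split on every pass so spans added earlier count as tags.
        parts = TAG_SPLIT_RE.split(xhtml)
        for i in range(0, len(parts), 2):
            parts[i] = parts[i].replace(needle, wrapped)
        xhtml = "".join(parts)
    return xhtml


if __name__ == "__main__":
    page = '<p title="bon mot">A bon mot, truly.</p>'
    print(tag_phrases(page, [("bon mot", "lang-fr")]))
```

This sketch matches phrases literally, so a phrase that spans inline markup or is written with different entities in the EPUB would need a more tolerant matcher.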