Visually Lossless HTML Compression http://books.google.com/books...
"The verbosity of the Hypertext Markup Language (HTML) remains one of its main weaknesses. This problem can be solved with the aid of HTML specialized compression algorithms. In this work, we describe a visually lossless HTML transform that, combined with generally used compression algorithms, allows to attain high compression ratios. Its core is a transform featuring substitution of words in an HTML document using a static English dictionary, effective encoding of dictionary indexes, numbers, and specific patterns. Visually lossless compression means that the HTML document layout will be modified, but the document displayed in a browser will provide the exact fidelity with the original. The experimental results show that the proposed transform improves the HTML compression efficiency of general purpose compressors on average by 21% in the case of gzip, achieving comparable processing speed. Moreover, we show that the compression ratio of gzip can be improved by up to 32% for the price of higher memory requirements and much slower processing." - Ray Cromwell
This is an HTML analog to my JS "one way transform" clustering work. - Ray Cromwell
Neat paper. The end result of this transform isn't displayable by a browser though, unlike your re-ordered JS that can still be executed. Still a pretty cool idea for archiving vast quantities of HTML (something we're actually doing). Technically any time you strip whitespace, you run the risk of changing visual fidelity, thanks to white-space: pre. - Matt M (inactive)
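To make the whitespace caveat concrete, here is a minimal sketch of a whitespace collapser. The function name `collapse_whitespace` and the list of protected elements are assumptions made for illustration; they are not taken from the paper. It skips elements whose content is whitespace-sensitive by default, but it cannot see CSS, so an element styled with `white-space: pre` is still collapsed.

```python
import re

# Elements whose content is left untouched (an illustrative, not exhaustive, list).
_PRESERVE = re.compile(
    r"<(pre|textarea|script|style)\b.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)
_WS_RUN = re.compile(r"\s+")


def collapse_whitespace(html: str) -> str:
    """Collapse whitespace runs to one space outside the protected elements."""
    out = []
    pos = 0
    for match in _PRESERVE.finditer(html):
        # Collapse the text before the protected block, then copy the block verbatim.
        out.append(_WS_RUN.sub(" ", html[pos:match.start()]))
        out.append(match.group(0))
        pos = match.end()
    out.append(_WS_RUN.sub(" ", html[pos:]))
    return "".join(out)


# The <pre> block survives intact...
safe = collapse_whitespace("<p>a   b</p>\n\n<pre>x   y\n z</pre>  <p>c</p>")

# ...but CSS-driven preformatting is invisible to a markup-only pass.
risky = collapse_whitespace('<div style="white-space: pre">a   b</div>')
```

In the second call the three spaces inside the `div` are reduced to one, so the rendered page would differ from the original. A transform that wants to stay visually lossless would need either to resolve styles or to leave whitespace alone wherever it cannot rule out such a rule.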
Yeah, although I like the idea of recognizing various browsers' liberal tolerance for broken HTML and removing theoretically required constructs or end tags with full knowledge that the parser will add them back in. - Ray Cromwell
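As a rough illustration of dropping end tags that the parser will reinstate, the sketch below removes `</li>` only where the HTML syntax already makes it optional: when the next thing is another `<li>` or the end tag of the enclosing `ul`/`ol`. The function name `drop_optional_li_end_tags` is hypothetical, and the regex approach is an assumption for brevity; a real tool would work on a parsed tree. Note that whitespace between items then ends up inside the preceding `li` in the DOM, which normally does not change rendering but is not strictly identical.

```python
import re

# </li> may be omitted when followed by another <li> or by the list's end tag.
_OPTIONAL_LI_END = re.compile(
    r"</li\s*>(?=\s*(?:<li\b|</(?:ul|ol)\s*>))",
    re.IGNORECASE,
)


def drop_optional_li_end_tags(html: str) -> str:
    """Remove </li> end tags only in positions where the parser implies them."""
    return _OPTIONAL_LI_END.sub("", html)


compact = drop_optional_li_end_tags("<ul><li>a</li><li>b</li></ul>")
nested = drop_optional_li_end_tags("<ul><li>a<ul><li>b</li></ul></li></ul>")
# An end tag followed by other content is left in place.
untouched = drop_optional_li_end_tags("<div><li>a</li>tail</div>")
```

The same pattern could be extended to other elements with optional end tags, but each one has its own conditions, so a blanket removal is not safe.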
I've been running all of our static content through the validator.nu parser, then using the Java transformer code to output HTML4. There are a lot of repeated close tags in the output (</li> is a big one). Might make for an interesting experiment to see what sort of savings you could get. - Matt M (inactive)
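One way such an experiment could be set up, as a sketch only: tally the end tags in a document, then compare raw and gzip-compressed sizes before and after removing them. The helper names (`end_tag_histogram`, `gzip_size`, `compare_sizes`) and the synthetic sample list are assumptions for illustration, not output from the validator.nu pipeline, and the blanket `replace` stands in for a proper optional-tag check.

```python
import gzip
import re
from collections import Counter

_END_TAG = re.compile(r"</([a-zA-Z][a-zA-Z0-9]*)\s*>")


def end_tag_histogram(html: str) -> Counter:
    """Count end tags by lowercase element name."""
    return Counter(name.lower() for name in _END_TAG.findall(html))


def gzip_size(text: str) -> int:
    """Size in bytes of the UTF-8 text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


def compare_sizes(before: str, after: str) -> dict:
    """Raw and gzip byte counts for the original and transformed document."""
    return {
        "raw_before": len(before.encode("utf-8")),
        "raw_after": len(after.encode("utf-8")),
        "gzip_before": gzip_size(before),
        "gzip_after": gzip_size(after),
    }


# Synthetic sample: a 200-item list (made up for this sketch).
sample = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(200)) + "</ul>"
histogram = end_tag_histogram(sample)
report = compare_sizes(sample, sample.replace("</li>", ""))
```

The raw saving is easy to predict (five bytes per `</li>`), while the gzip saving is typically much smaller because repeated tags already compress well. That post-compression gap is the number worth measuring on real content.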