This document guides you through some crucial theoretical terms of the compression domain in general, and it also addresses users who could make use of this SDK. The content of this document is as follows:
There are several ways to compress data, depending on its character. Evidently, picture or voice compression differs strongly from file compression, for example. Intuitively, we can divide compression methods into the following groups:
Lossy compression means that we can throw certain content away and the reconstructed (uncompressed) result will still be "very similar" to the original. The voice compression mentioned above falls in most cases into the lossy group (recall MP3 compression), whereas general data compression falls into the lossless group, because we do not want a single bit of it to get lost.
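The difference between the two groups can be illustrated with a minimal sketch. It assumes Python's standard zlib module for the lossless case. The 4-bit quantization in the lossy case is an assumption chosen only for illustration, not a real audio codec and not part of HTML Athlete.

```python
import zlib

# Lossless: decompressing returns exactly the original bytes.
original = b"AAAAABBBBBCCCCC" * 10
packed = zlib.compress(original)
restored = zlib.decompress(packed)
lossless_ok = restored == original

# Lossy (illustrative assumption): keep only the upper 4 bits of each
# 8-bit sample, so the reconstruction is close to the input but not equal.
samples = [12, 130, 131, 255, 64]
quantized = [s >> 4 for s in samples]        # stored form, 4 bits per sample
reconstructed = [q << 4 for q in quantized]  # "very similar", not identical
max_error = max(abs(a - b) for a, b in zip(samples, reconstructed))

print(lossless_ok)     # True
print(reconstructed)   # [0, 128, 128, 240, 64]
print(max_error)       # 15
```

The lossless branch must reproduce every byte, while the lossy branch accepts a bounded error in exchange for a smaller stored form.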
Another division concerns decompression: either we only need to invert the compression operations and apply them in reverse order to obtain the decompression routine, or decompression is more complicated than that. In this division we talk about a symmetrical compression routine or an asymmetrical compression routine, respectively.
The final division concerns how the routine achieves its best performance. If the algorithm does not care what the input is and always runs the same way, we talk about non-adaptive compression. In the simplest situations the non-adaptive algorithm performs best, but when it is insufficient (e.g. because of the compression rate), a semi-adaptive algorithm comes into play: the algorithm first "looks" at the data and decides on the best way to compress it. In principle it is a kind of preprocessing. Semi-adaptive compression is well suited for buffered data, but when we need to compress a stream, only adaptive compression is possible. Adaptive means that the algorithm adapts its behaviour on the fly, depending on the data arriving at the input.
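The three behaviours can be sketched on a toy rank coder that replaces each symbol with its position in a table. This is a generic illustration under stated assumptions, not HTML Athlete's algorithm. The function names are hypothetical, and move-to-front is used here only as one well-known example of an adaptive scheme.

```python
from collections import Counter

def ranks_non_adaptive(data: str, alphabet: str) -> list[int]:
    """Fixed table: the coder never looks at the data beforehand."""
    table = list(alphabet)
    return [table.index(ch) for ch in data]

def ranks_semi_adaptive(data: str) -> list[int]:
    """Two passes: count all symbols first, then code each by its rank."""
    counts = Counter(data)                        # pass 1: inspect the buffer
    table = sorted(counts, key=lambda s: (-counts[s], s))
    return [table.index(ch) for ch in data]       # pass 2: encode

def ranks_adaptive(data: str, alphabet: str) -> list[int]:
    """One pass: move-to-front updates the table after every symbol."""
    table = list(alphabet)
    out = []
    for ch in data:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))             # adapt on the fly
    return out

print(ranks_non_adaptive("bbbab", "ab"))  # [1, 1, 1, 0, 1]
print(ranks_semi_adaptive("bbbab"))       # [0, 0, 0, 1, 0]
print(ranks_adaptive("bbbab", "ab"))      # [1, 0, 0, 1, 1]
```

The semi-adaptive variant needs the whole buffer before it can emit anything (and a decoder would need its table as well), which is why only the adaptive variant works on a stream.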
You now have a good foundation for classifying HTML compression. In principle, HTML compression is lossless and symmetrical. The HTML Athlete implementation is adaptive.
The basic question of this paragraph has already been asked in its caption. Of course, you can object:
If so, you can stop reading, delete this SDK and go about your own business; this SDK is not targeted at you. HTML compression is appreciated especially by developers who
HTML compression is similar to classical compression, except that the output document is again HTML. The basic principle is nevertheless the same: throwing redundant information away. In our case, the redundancies are whitespace characters such as line breaks (CR, LF), tabs and spaces. Eliminating the right ones reduces the size of the document, which directly speeds up its transmission over the internet. If you still don't believe that merely compressing your sites can save your customers (or visitors) time, let's look at the following example.
Example. In this example, the input (stored in a file) is the HTML code shown below. The example also serves as a demonstration of what HTML Athlete actually can do and does.

<html>
  <head>
    <title>My homepage created by myself</title>
    <style>
      BODY
      {
        // I like silver colored background
        background-color:silver;
        // I also like serif font type
        font-family:"New Times Roman","MS Sans Serif";
      }
    </style>
    <script language="javascript">
      function Welcome()
      {
        alert("Welcome to my homepage.");
      }
    </script>
  </head>
  <body onload="javascript:Welcome()">
    <!-- in the following table the visitors are offered my services -->
    <table cellspacing="0" cellpadding="0">
      <tr>
        <! first service is speech recognition >
        <td width="30%">
          Speech recognition
        </td>
        <td width="60%">
          Anything about speech recognition.
        </td>
        <td width="10%">
          <input type="button" value="ORDER"></input>
      </tr>
    </table>
    <!-- no other services are available by now -->
  </body>
</html>

This HTML takes exactly 1043 bytes on disk. You can see that it contains a lot of unnecessary information, be it comments or whitespace characters; none of them affects the page layout in the browser. After stripping all these characters, the document has the following content:
<html><head><title>My homepage created by myself</title><style>BODY{background-color:silver;font-family:"New Times Roman","MS Sans Serif"}</style><script language=javascript>function Welcome(){alert("Welcome to my homepage.")}</script></head><body onload=javascript:Welcome()><table cellspacing=0 cellpadding=0><tr><td width=30%>Speech recognition</td><td width=60%>Anything about speech recognition.</td><td width=10%><input type=button value=ORDER></tr></table></body></html>
You may have to scroll this page to see the complete compressed content. But note that this HTML document takes only 477 bytes, which means its size was more than cut in half! If you compare the layout of the compressed and the original page, you will discover no differences between them. Multiply this example and you will obtain a fairly good estimate of the average number of bytes saved by the HTML Athlete compression routine.
End of example
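The whitespace part of the example can be approximated with a deliberately naive sketch. It is not HTML Athlete's algorithm: the function name is hypothetical, and the sketch assumes the page contains no elements where whitespace is significant (such as pre or textarea). It also leaves comments, attribute quotes and script content untouched.

```python
import re

def strip_inter_tag_whitespace(html: str) -> str:
    """Remove whitespace-only runs between a closing '>' and the next '<'."""
    return re.sub(r">\s+<", "><", html.strip())

page = (
    "<table>\r\n"
    "\t<tr>\r\n"
    "\t\t<td>Speech recognition</td>\r\n"
    "\t</tr>\r\n"
    "</table>\r\n"
)
small = strip_inter_tag_whitespace(page)

print(small)                  # <table><tr><td>Speech recognition</td></tr></table>
print(len(page), len(small))  # original size, then reduced size
```

Text between tags survives because the pattern only matches runs that consist entirely of whitespace. A real compressor has to make many more decisions than this.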
So now you are ready to jump into the programming part of this documentation. In the following we will discuss some essentials of HTML Athlete.
Let's now take a look at some compression characteristics (not only HTML Athlete's) mentioned in the caption of this paragraph. As you may sense, there is a slight difference between the words "quality" and "rate". We can simply define the term compression quality as the ability of the program to recognize only those parts of the input document that need not be written to the output document. Although this plain definition is very loose, it is fully correct and intuitive.
Now, knowing what compression quality is, we can define the fancy term compression quality rate as the quotient of the recognized unnecessary document parts and all unnecessary document parts. Multiplied by 100, we obtain it as a percentage. Again, this little definition is nothing other than what we would expect.
Note. The term compression quality rate is not interchangeable with the compression rate. The compression rate is the quotient of the output file size and the input file size. The compression quality rate is something a bit different.
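The two quantities can be written down as a small sketch. The function names are hypothetical, and the unit in which "document parts" are counted (bytes are assumed here) is an assumption of this sketch. The counts passed to the quality rate are made up for illustration, while the file sizes are the ones from the example earlier in this document.

```python
def compression_rate(output_size: int, input_size: int) -> float:
    """Quotient of output size and input size (smaller means more saved)."""
    return output_size / input_size

def compression_quality_rate(recognized: int, all_unnecessary: int) -> float:
    """Share of the unnecessary parts that were recognized, in percent."""
    return 100.0 * recognized / all_unnecessary

# Sizes from the example: 1043 bytes in, 477 bytes out.
rate = compression_rate(477, 1043)
print(f"{rate:.3f}")   # 0.457

# Hypothetical counts, for illustration only.
quality = compression_quality_rate(recognized=90, all_unnecessary=100)
print(quality)         # 90.0
```

A compressor can reach a high quality rate on a document that has little to remove and still show a compression rate close to 1, which is why the two terms must not be confused.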
Let's now take a look at HTML Athlete's compression quality and quality rate. I can reveal right now that HTML Athlete fully satisfies the compression quality definition; however, it does not reach a 100% compression quality rate. Its compression quality rate is 95.54%, which can be seen as very good, but a difference of 4.46% remains before it is excellent. This gap exists in order to keep the algorithm reasonably fast. The remainder is a good challenge for future work.
Thank you for using the HTML Athlete compression routine. If you run into trouble while using it, send me an e-mail and describe the bug you found as precisely as possible.
If you become a happy user of HTML Athlete, I would also be grateful if you helped spread this program by placing its icon on at least one of your compressed pages. Here is the whole HTML code, so just copy it:
<a target=_blank href=http://nestorovic.hyperlink.cz/html/en/athlete.html alt="Page compressed by HTML Athlete"> <img src=http://nestorovic.hyperlink.cz/download/athletfree/icon.gif> </a>
My e-mail: thom dot as at centrum dot cz [ thom.as@centrum.cz ]