boilerpipe - algorithms for extracting the main textual content of web pages - https://code.google.com/p...