Sunday, December 30, 2007

HTML Parser -- A good HTML parser

I have used many HTML parsers in my projects. A Google search turns up quite a few of them, but after trying many, I can tell you that HTMLParser is the one you should consider.
Website: htmlparser.sourceforge.net
HTMLParser is a library that helps you parse HTML files. It has two features that I like very much:
1. Extractor: It lets us extract information from an HTML file very easily using filters.
All we have to do is call parser.parse(Filter). A filter can be a single filter, or a predicate that combines several filters (and, or, not, user-defined filters) to help you extract as much information as possible.
2. Visitor pattern: With the visitor pattern you can do whatever you want with a tag when the parser reaches it during traversal. You can change the node's attributes, read information from it, or anything else you need.
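
To make the two ideas concrete, here is a minimal sketch of composable filters and a visitor over a tiny tag tree. It does not call HTMLParser itself: every name in it (ParserSketch, Tag, Filter, Visitor, named, hasAttr, traverse, extract, sampleTree) is a hypothetical stand-in I made up for illustration, so check the library's Javadoc for the real classes and signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical stand-in types, NOT HTMLParser's own classes.
class ParserSketch {

    /** A minimal tag node: a name, some attributes, and child tags. */
    static class Tag {
        final String name;
        final Map<String, String> attrs = new HashMap<>();
        final List<Tag> children = new ArrayList<>();

        Tag(String name) { this.name = name; }

        Tag attr(String key, String value) { attrs.put(key, value); return this; }

        Tag add(Tag child) { children.add(child); return this; }
    }

    /** A filter decides whether a tag should be extracted. */
    interface Filter {
        boolean accept(Tag tag);

        // Combinators build a larger predicate out of smaller filters.
        default Filter and(Filter other) { return t -> accept(t) && other.accept(t); }
        default Filter or(Filter other)  { return t -> accept(t) || other.accept(t); }
        default Filter not()             { return t -> !accept(t); }
    }

    /** A visitor is called once for every tag during traversal. */
    interface Visitor {
        void visit(Tag tag);
    }

    static Filter named(String name)  { return t -> t.name.equals(name); }
    static Filter hasAttr(String key) { return t -> t.attrs.containsKey(key); }

    /** Depth-first traversal that hands each tag to the visitor. */
    static void traverse(Tag root, Visitor visitor) {
        visitor.visit(root);
        for (Tag child : root.children) {
            traverse(child, visitor);
        }
    }

    /** Extraction is a traversal that collects the tags a filter accepts. */
    static List<Tag> extract(Tag root, Filter filter) {
        List<Tag> matches = new ArrayList<>();
        traverse(root, t -> { if (filter.accept(t)) matches.add(t); });
        return matches;
    }

    /** A small hand-built tree standing in for a parsed page. */
    static Tag sampleTree() {
        return new Tag("body")
            .add(new Tag("a").attr("href", "http://example.com/"))
            .add(new Tag("a")) // anchor without an href
            .add(new Tag("div").add(new Tag("img").attr("src", "logo.png")));
    }

    public static void main(String[] args) {
        Tag root = sampleTree();

        // Extractor style: combine two filters and collect the matches.
        List<Tag> links = extract(root, named("a").and(hasAttr("href")));
        System.out.println("links with href: " + links.size());

        // Visitor style: change an attribute on every img tag as it is reached.
        traverse(root, t -> { if (t.name.equals("img")) t.attrs.put("alt", ""); });
        System.out.println("img tags with alt: " + extract(root, hasAttr("alt")).size());
    }
}
```

The point of the sketch is that extraction is just a traversal with a filter attached, which is why the two features fit together so well: once you can visit every tag, collecting or modifying the ones you care about takes only a few lines.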
Besides, HTMLParser is well structured, so it speeds up coding and makes your code easier to maintain.

If you are looking for a good HTML parser, why not give it a try?
PS: You should try HTMLParser version 2.0.
