I just bought and read the Getting Started with Pyparsing PDF book. And it’s good. PyParsing is a way of building a parser using Python code. You should think Yacc/Lex, but readable. It can be used to parse text, and it can also handle HTML.
This is the example from the PyParsing website :
from pyparsing import Word, alphas greet = Word( alphas ) + "," + Word( alphas ) + "!" # <-- grammar defined here hello = "Hello, World!" print hello, "->", greet.parseString( hello )