The "SparqPlug" RDFizing service converts the XHTML DOM into an RDF graph, which can then be queried with SPARQL to extract the desired data. If we use CONSTRUCT queries, the result comes back as a neat bit of RDF/XML in whichever vocabularies or ontologies we choose, ready for use in some other application.
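
To make the idea concrete, here is a minimal sketch of the DOM-to-RDF step using only the Python standard library. The `dom:` namespace, the property names (`nodeName`, `child`, `text`, `attr_*`) and the `dom_to_triples` helper are all assumptions made for illustration; the vocabulary SparqPlug actually uses may well differ, so treat this as a picture of the approach rather than the service's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for DOM terms; SparqPlug's real vocabulary may differ.
DOM = "http://example.org/dom#"
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def dom_to_triples(xhtml):
    """Walk an XHTML document and return (subject, predicate, object) triples."""
    root = ET.fromstring(xhtml)
    triples = []
    counter = 0

    def visit(element, parent_id):
        nonlocal counter
        node_id = f"_:n{counter}"  # one blank node per DOM element
        counter += 1
        tag = element.tag.replace(XHTML_NS, "")
        triples.append((node_id, DOM + "nodeName", tag))
        if parent_id is not None:
            triples.append((parent_id, DOM + "child", node_id))
        for name, value in element.attrib.items():
            triples.append((node_id, DOM + "attr_" + name, value))
        text = (element.text or "").strip()
        if text:
            triples.append((node_id, DOM + "text", text))
        for child in element:
            visit(child, node_id)

    visit(root, None)
    return triples


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><a href="http://example.org/alice">Alice</a></body>
</html>"""

triples = dom_to_triples(PAGE)

# An illustrative CONSTRUCT query over the hypothetical vocabulary above.
# It is shown as text only; running it needs a SPARQL engine.
QUERY = """
PREFIX dom:  <http://example.org/dom#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { _:person foaf:name ?name . }
WHERE {
  ?a dom:nodeName "a" ;
     dom:text ?name .
}
"""

for triple in triples:
    print(triple)
```

Once the page is a set of triples like these, the CONSTRUCT query picks out the link text and re-expresses it with FOAF, which is the "choose your own vocabulary" part of the approach.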

There are full technical details of the approach in the paper from the upcoming Linked Data on the Web workshop at WWW2008, and the live SparqPlug service is at