Tuesday, 6 August 2013

Extracting html tags based on attribute

I have crawled a page and stored its HTML in a String
object.
Now I want to parse this string and extract all tags that have an itemprop
attribute into an associative array (a map keyed by the itemprop value), for example:
Map<String, String> itemprops = new HashMap<>();
itemprops.put("title", "Some title");
itemprops.put("description", "Some description");
Can I do this with a regex, or is there a library that can do this?
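
To make the desired result concrete, here is a minimal regex-based sketch using only the Java standard library. It is an illustration under assumptions, not a robust solution: it assumes quoted attribute values, no nested elements with the same tag name, and no need to decode HTML entities. The class name ItempropExtractor and the method extract are hypothetical names chosen for this example. For real crawled pages, a proper HTML parser such as jsoup is likely the safer choice, since regexes break easily on irregular markup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for this example only.
class ItempropExtractor {
    // Opening tag with a quoted itemprop attribute.
    // Group 1 = tag name, group 3 = itemprop value.
    private static final Pattern OPEN_TAG = Pattern.compile(
            "<(\\w+)\\b[^>]*?\\sitemprop\\s*=\\s*([\"'])(.*?)\\2[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Optional content="..." attribute, as commonly used on <meta> tags.
    // Group 2 = attribute value.
    private static final Pattern CONTENT_ATTR = Pattern.compile(
            "\\scontent\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns itemprop name -> value. A later duplicate name overwrites an earlier one.
    static Map<String, String> extract(String html) {
        Map<String, String> itemprops = new LinkedHashMap<>();
        Matcher open = OPEN_TAG.matcher(html);
        while (open.find()) {
            String tag = open.group(1);
            String name = open.group(3);
            String value = "";

            Matcher content = CONTENT_ATTR.matcher(open.group());
            if (content.find()) {
                // Prefer an explicit content attribute when present.
                value = content.group(2);
            } else {
                // Otherwise take the text up to the first closing tag of the same name.
                Matcher close = Pattern
                        .compile("</" + Pattern.quote(tag) + "\\s*>", Pattern.CASE_INSENSITIVE)
                        .matcher(html);
                if (close.find(open.end())) {
                    value = html.substring(open.end(), close.start())
                            .replaceAll("<[^>]*>", "") // drop nested markup
                            .trim();
                }
            }
            itemprops.put(name, value);
        }
        return itemprops;
    }
}
```

Calling ItempropExtractor.extract(html) on the crawled string returns the map, so itemprops.get("title") gives the text of the element marked itemprop="title". Elements without a closing tag and without a content attribute map to an empty string.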
