[ajug-members] html / website screen scraper API?
Dan Marchant
driedtoast at gmail.com
Mon Feb 19 14:47:06 EST 2007
Also, Gmail suggests this: http://www.screen-scraper.com/
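
For the "table or single values next to labels" case described in the quoted messages below, here is a rough sketch of what the extraction step might look like once a page has been fetched and cleaned up. It is only an illustration under some assumptions: the markup is already well-formed XHTML (the kind of output a tidying step such as JTidy is meant to produce), it has no DOCTYPE or namespace declaration, and the data sits in two-cell table rows. The class name, method name, and sample markup are all made up for the example, and it uses only the XML and XPath APIs that ship with the JDK rather than any of the libraries mentioned in this thread.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of JTidy, HttpClient, or HtmlUnit.
public class TableScrapeSketch {

    /**
     * Collects (label, value) pairs from every table row that has exactly
     * two cells. Assumes the input is already well-formed XHTML.
     */
    static Map<String, String> labelValuePairs(String xhtml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Rows with a single cell or with three or more cells are skipped.
        NodeList rows = (NodeList) xpath.evaluate(
            "//table//tr[count(td) = 2]", doc, XPathConstants.NODESET);

        // LinkedHashMap keeps the rows in page order.
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            // normalize-space() trims and collapses whitespace in the cell text.
            String label = xpath.evaluate("normalize-space(td[1])", row);
            String value = xpath.evaluate("normalize-space(td[2])", row);
            pairs.put(label, value);
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample markup standing in for a tidied page.
        String page = "<html><body><table>"
            + "<tr><td>Price:</td><td> 19.99 </td></tr>"
            + "<tr><td>In stock:</td><td>yes</td></tr>"
            + "</table></body></html>";
        labelValuePairs(page)
            .forEach((label, value) -> System.out.println(label + " " + value));
    }
}
```

Real pages will likely need a site-specific XPath expression per value, and tidied output that carries a DOCTYPE or an XHTML namespace would need extra parser setup that this sketch leaves out.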
On 2/19/07, Dan Marchant <driedtoast at gmail.com> wrote:
> Try using a combination of:
>
> JTidy (for getting content) - http://jtidy.sourceforge.net/
> HttpClient (for handling the protocol and connections)
>
> Also, XMLUnit works better than HtmlUnit, for some reason, on HTML-based documents.
>
> Hope this helps.
>
> - Dan
>
>
> On 2/19/07, Curt Smith <csmith at javadepot.com> wrote:
> > Greetings ajug'ers,
> >
> > I need to scrape info off a dozen different public websites and incoming
> > email that's in HTML format. The info is typically a table or single
> > values next to labels, but it'll get more complex, I'm sure. Some info
> > will require logging in via custom login pages, cookies, etc.
> >
> > There are two SourceForge projects: HtmlUnit and HttpUnit. Both would be
> > good for simple scraping of values and tables.
> >
> > Googling "scraping public websites" finds commercial APIs (links on
> > the right side of the Google results page).
> >
> > Does anyone have experience with this technology or these APIs?
> >
> > Thanks, Curt Smith
> >
> >
> >
> > _______________________________________________
> > ajug-members mailing list
> > ajug-members at ajug.org
> > http://www.ajug.org/mailman/listinfo/ajug-members
> >
>