[ajug-members] html / website screen scraper API?
Steven Robbins
steven.robbins at benefitfocus.com
Mon Feb 19 15:56:24 EST 2007
Depending on how you need the data, you might use something like Dapper
(www.dappit.com) to create "services" from web page content.
I have played with it a bit but haven't used it "in production" yet.
Hope this helps,
Steven Robbins
-----Original Message-----
From: ajug-members-bounces at ajug.org
[mailto:ajug-members-bounces at ajug.org] On Behalf Of Curt Smith
Sent: Monday, February 19, 2007 2:31 PM
To: ajug-members at ajug.org
Subject: [ajug-members] html / website screen scraper API?
Greetings ajug'ers,
I need to scrape info from a dozen different public websites, as well as
from incoming email in HTML format. The info is typically a table or
single values next to labels, but I'm sure it'll get more complex. Some
info will require logging in via custom login pages, cookies, etc.
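As a rough illustration of the "single values next to labels" case, here is a minimal Java sketch that uses only java.util.regex. The class name LabelValueScraper and its extract method are hypothetical, and the sketch assumes simple, well-formed rows with exactly two cells (label, then value). Messier pages, and most HTML email, would likely need a real HTML parser or a tool like HtmlUnit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelValueScraper {
    // Hypothetical helper: matches <tr><td>label</td><td>value</td></tr>.
    // The "tempered" groups stop at the closing cell tag, so a match
    // cannot spill across rows. Assumes no nested tables.
    private static final Pattern ROW = Pattern.compile(
        "<tr[^>]*>\\s*<t[dh][^>]*>((?:(?!</t[dh]>).)*)</t[dh]>"
            + "\\s*<td[^>]*>((?:(?!</td>).)*)</td>\\s*</tr>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns label -> value pairs in document order. */
    public static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            String label = clean(m.group(1));
            String value = clean(m.group(2));
            // Drop a trailing colon from labels such as "Price:"
            if (label.endsWith(":")) {
                label = label.substring(0, label.length() - 1).trim();
            }
            result.put(label, value);
        }
        return result;
    }

    // Strip inner tags, decode a few common entities, collapse whitespace.
    private static String clean(String s) {
        String text = TAG.matcher(s).replaceAll("");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&"); // decode &amp; last
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<table>"
            + "<tr><td>Symbol:</td><td><b>JAVA</b></td></tr>"
            + "<tr><td>Price:</td><td>12.34</td></tr>"
            + "</table>";
        extract(html).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running main prints "Symbol = JAVA" and "Price = 12.34". Regexes are brittle against real-world markup, so this is only a starting point for quick one-off extraction.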
There are two SourceForge projects, HtmlUnit and httpunit. Both would be
good for simple scraping of values and tables.
Googling "scraping public websites" turns up commercial APIs (links on
the right side of the Google results page).
Does anyone have experience with, or thoughts on, this technology or these APIs?
Thanks, Curt Smith
_______________________________________________
ajug-members mailing list
ajug-members at ajug.org
http://www.ajug.org/mailman/listinfo/ajug-members