Data Extractor
Posted: Mon Jan 05, 2009 11:28 pm
I've never used these forums to plug other software, and this program has little or nothing to do with graphing, but I was so impressed with it that I feel compelled to speak up. I recently needed to grab e-mail addresses from a web site - specifically, bike club contact e-mail addresses from 7 states, so I could send race announcements to them. The USA Cycling site has all those addresses, but the way it is organized doesn't lend itself to an easy copy/paste: you get a map of the US, click on a state, and land on a state-specific page with links to each club page, and on those club pages you'll find the individual e-mail addresses.
I was in a bit of a rush, so I did all of this manually the first time - it took me a bit over 2 hours of really tedious effort to retrieve 330 e-mail addresses. There has to be a better way, right? Right. And here it is:
http://www.iconico.com/DataExtractor/index.aspx
With Data Extractor I loaded the page addresses (URLs) for the 7 states I was interested in and selected "Extract URLs from webpages", which gave me a list of all links from those pages. I sorted the list and deleted every link that didn't include "club=" in the URL. With the remaining list I selected "Extract Emails from webpages" and, voila, 330 e-mail addresses from 330 pages in 5 minutes. Most of that time was spent downloading the pages; my actual interaction with the program was less than 1 minute.
Really well done program that I highly recommend if you need this sort of thing. Under $30.
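For anyone who would rather script it, the same two-pass idea (collect links, keep the ones containing "club=", then pull e-mail addresses from those pages) can be roughly sketched in Python. This is only an illustration under assumptions, not how Data Extractor works internally: the names `extract_urls`, `extract_emails`, `fetch`, and `harvest` are made up for this sketch, the e-mail pattern is a simple approximation that will miss obfuscated addresses, and it assumes the pages are plain HTML that can be fetched without a login.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_urls(html, base_url):
    """Return the sorted, de-duplicated absolute URLs linked from a page."""
    parser = LinkCollector()
    parser.feed(html)
    return sorted({urljoin(base_url, href) for href in parser.hrefs})


# Simple approximation of an e-mail address; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the sorted, de-duplicated e-mail addresses found in a page."""
    return sorted(set(EMAIL_RE.findall(html)))


def fetch(url):
    """Download a page and decode it, replacing any undecodable bytes."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def harvest(state_urls, marker="club="):
    """Pass 1: gather links containing `marker`. Pass 2: gather e-mails."""
    club_urls = set()
    for url in state_urls:
        links = extract_urls(fetch(url), url)
        club_urls.update(link for link in links if marker in link)

    emails = set()
    for url in sorted(club_urls):
        emails.update(extract_emails(fetch(url)))
    return sorted(emails)
```

Calling `harvest()` with a list of the state page URLs would return the combined address list. Most of the run time would still go to downloading pages, and it would be polite to add a short pause between requests.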
I found out about this application by way of the Association of Shareware Professionals (ASP) newsgroups. If you're a software developer and aren't a member, this is the best hundred bucks you'll ever spend. See http://asp-shareware.org for more information. I've been a member since '02 and have never regretted it.
Now back to DPlot...
