how is a crawl done


gribbs
05-02-2006, 12:03 PM
I am curious how this works. I hear a lot that webmasters will do a "crawl" and gather data from the net to use in compiling a database. For example, if there is a public directory website for, let's say, "hair salons" in a certain zip code and you want to extract them from that site, isn't there a program that "crawls" the site and collects the data?

Let's say the site has 2000 hair salons listed with URLs to their websites, phone #s, etc. How does one gather that info? (I don't mean simply copying the HTML.) I thought there were programs that would grab it and put the results into columns. How is this done?

ExtraDog
05-03-2006, 01:31 AM
Basic concept:
http://en.wikipedia.org/wiki/Web_crawler

There are apps out there to do it with; just google it.
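
For anyone who wants to see the basic concept in code, here is a rough sketch in Python, not a finished tool. It starts at one URL, pulls the links out of each page, and follows only the ones on the same site. The names `LinkParser`, `crawl`, and `FAKE_SITE` are made up for illustration, and the "site" is a small in-memory dictionary so the sketch runs without touching the network; a real crawler would swap in an HTTP fetch and should respect the site's robots.txt and terms of use.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of a single site.

    `fetch` is any function that takes a URL and returns the page HTML
    (or None if the page is missing). Returns a dict of {url: html}.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urljoin(url, href).split("#")[0]
            # Stay on the starting site and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical two-page site used in place of real HTTP requests.
FAKE_SITE = {
    "http://example.com/": '<a href="/page2">next</a> <a href="http://other.com/">out</a>',
    "http://example.com/page2": '<a href="/">home</a>',
}

pages = crawl("http://example.com/", FAKE_SITE.get)
```

The `seen` set is what keeps the crawl from looping forever when pages link back to each other, and `max_pages` is a simple safety cap.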

quest
05-04-2006, 02:42 PM
It is a process called screen scraping: using regular expressions to pull the data you are looking for out of an HTML page.

You really need to be a regex master (http://www.phpbuilder.com/columns/dario19990616.php3).

-Beau
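
A minimal sketch of the screen scraping idea in Python, assuming a made-up directory page where each salon sits in a `<div class="listing">` with a link and a phone number. The markup, the salon names, and the helper names `scrape` and `to_csv` are all hypothetical; a real site will need its own pattern, and regexes like this tend to break as soon as the page layout changes.

```python
import csv
import io
import re
from html import unescape

# Hypothetical sample of what a directory page might contain.
SAMPLE_HTML = """
<div class="listing"><a href="http://www.shearmagic.example">Shear Magic</a> <span class="phone">(555) 123-4567</span></div>
<div class="listing"><a href="http://www.curlupanddye.example">Curl Up &amp; Dye</a> <span class="phone">(555) 987-6543</span></div>
"""

# One named group per column we want to pull out of each listing.
LISTING = re.compile(
    r'<a href="(?P<url>[^"]+)">(?P<name>[^<]+)</a>\s*'
    r'<span class="phone">(?P<phone>[^<]+)</span>'
)


def scrape(page):
    """Return a list of (name, url, phone) tuples found in the page."""
    return [
        (unescape(m.group("name")), m.group("url"), m.group("phone"))
        for m in LISTING.finditer(page)
    ]


def to_csv(rows):
    """Turn the scraped rows into CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "url", "phone"])
    writer.writerows(rows)
    return out.getvalue()


rows = scrape(SAMPLE_HTML)
print(to_csv(rows))
```

The CSV output is the "columns" part: it opens directly in a spreadsheet or loads into a database table.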