How can I make my PHP web crawler automatically skip broken links?

Jerome Yurow December 17, 2011

I have a list of links on a website that I wish to check with a PHP web crawler program. Some of these links may be broken. Rather than attempting to follow a broken link and getting, say, a 404 error message, I would like to skip over the link before trying to load it.

I am using the features of simple_html_dom.php in my web crawler. So I would like to detect a broken link before I perform $html->load_file($link);

How can I do this?
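One possible approach (a sketch, not something confirmed in this thread): request only the HTTP headers first with PHP's get_headers(), and call load_file() only when the final status code looks healthy. The function names below are made up for illustration.

```php
<?php
// Sketch: probe a URL with get_headers() before handing it to
// simple_html_dom's load_file(), so a dead link never reaches the parser.
// get_headers() follows redirects, so the last status line in the array
// reflects the final response.

// Pull the final HTTP status code out of a get_headers() result.
function status_from_headers(array $headers) {
    $status = 0;
    foreach ($headers as $h) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $h, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

function link_is_ok($link) {
    $headers = @get_headers($link);   // false on DNS/connection failure
    if ($headers === false) {
        return false;
    }
    $status = status_from_headers($headers);
    return $status >= 200 && $status < 400;   // treat 2xx/3xx as alive
}

// Usage:
// if (link_is_ok($link)) {
//     $html->load_file($link);
// }
```

Note that this costs one extra request per link; for a large list, a HEAD request via cURL with a short timeout would be cheaper, but the idea is the same.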


  1. Jeff Fabish
    December 17, 2011 at 7:24 pm
    • Jerry Yurow
      December 18, 2011 at 10:02 pm

      Jeff,

      Thanks for your answer. It turns out that, this time, I do not have to worry so much about broken links. Rather, when a warning message like the one below appears, I would like it to go into a file in a sub-directory of my own choosing on my website instead of onto my output screen. I have tried the PHP statements:

      ini_set('error_log','www.yurowdesigns.com/programs/error.log');
      ini_set('log_errors',TRUE);

      in my PHP script, which are supposed to do this, but they do not seem to do anything. My error.log file remains empty, and I am still getting warning messages like the one below on my output screen. I do not want to suppress these messages--they are useful--just to re-route them to my error.log file.

      Warning: "file_get_contents(http://www.yurowdesigns.com/UkraineSIG/test.asp) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
      in /home/yurow/wwwroot/yurowdesigns.com/programs/simple_html_dom.php on line 555"

      Any ideas?
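      One likely culprit, offered here as an assumption rather than a confirmed diagnosis: error_log expects a filesystem path, not a URL, so 'www.yurowdesigns.com/programs/error.log' is treated as a relative path that probably does not resolve. The absolute path below is inferred from the path shown in the warning message.

      ```php
      <?php
      // Sketch: use a filesystem path for error_log. The directory is an
      // assumption based on the path visible in the quoted warning
      // (/home/yurow/wwwroot/yurowdesigns.com/programs/...); it must be
      // writable by the web server for logging to work.
      ini_set('log_errors', '1');
      ini_set('display_errors', '0');   // keep warnings off the output page
      ini_set('error_log', '/home/yurow/wwwroot/yurowdesigns.com/programs/error.log');

      // From here on, warnings such as the 404 from file_get_contents()
      // should be appended to error.log instead of being printed.
      ```

      If the file still stays empty, checking that the web server user can write to that directory would be the next step.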

      • Jeff Fabish
        December 20, 2011 at 11:55 pm

        Hi Jerry,

        Can you post the source code to PasteBin so I can analyze it?

        Thanks,
        - Jeff