How can I make my PHP web crawler automatically skip broken links?

Jerome Yurow December 17, 2011

I have a list of links on a website that I wish to check with a PHP web crawler program. Some of these links may be broken. Rather than attempting to follow a broken link and getting, say, a 404 error message, I would like to skip over the link before trying to load it.

I am using the features of simple_html_dom.php in my web crawler. So I would like to detect a broken link before I perform $html->load_file($link);

How can I do this?
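One common approach (a sketch using PHP's cURL extension; the helper name `link_is_alive` is my own, not part of simple_html_dom) is to issue a HEAD request first and skip any URL that fails to connect or answers with a status of 400 or above:

```php
<?php
// Hypothetical helper: returns true when the URL responds with an HTTP
// status below 400. CURLOPT_NOBODY makes this a HEAD request, so only
// the headers are fetched, never the page body.
function link_is_alive($url)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;                                // malformed URL
    }
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up after 10s
    if (curl_exec($ch) === false) {                  // DNS error, timeout, ...
        curl_close($ch);
        return false;
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status < 400;                            // 404, 500, ... => broken
}

// Usage: only call load_file() on links that actually respond.
// if (link_is_alive($link)) {
//     $html->load_file($link);
// }
```

Note this costs one extra request per link; on a large crawl you may prefer to just attempt the load and handle the failure.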

  1. Jeff Fabish
    December 17, 2011 at 7:24 pm
    • Jerry Yurow
      December 18, 2011 at 10:02 pm


      Thanks for your answer.  It turns out that, this time, I do not have to worry so much about broken links.  Rather than seeing a warning message like the one below on my output screen, I would like it to go into a file in a sub-directory of my own choosing on my website.  I have tried the PHP statements:


      in my PHP script that are supposed to do this, but they do not seem to do anything.  My error.log file remains empty, and I am still getting warning messages like the one below on my output screen.  I do not want to suppress these messages--they are useful--just to re-route them to my error.log file.

      Warning: file_get_contents() [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
      in /home/yurow/wwwroot/ on line 555

      Any ideas?
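      (For reference, the usual recipe for routing warnings into a custom log file looks like the sketch below; the log path here is only an example, and the directory it points at must be writable by the web server.)

```php
<?php
// Sketch: send PHP warnings to a log file instead of the output page.
error_reporting(E_ALL);          // keep reporting warnings...
ini_set('display_errors', '0');  // ...but stop printing them to the page
ini_set('log_errors', '1');      // write them to a log instead
// Example path only -- substitute your own sub-directory:
ini_set('error_log', sys_get_temp_dir() . '/crawler-error.log');

// From here on, a failed file_get_contents() warning goes to the log file.
```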

      • Jeff Fabish
        December 20, 2011 at 11:55 pm

        Hi Jerry,

        Can you post the source code to PasteBin so I can analyze it?

        - Jeff