When I create my own web crawler, why is the page blank with no links when I follow the advice in an old MakeUseOf article?

Lisa Swanstrom March 6, 2012

I’ve been reading James Bruce’s article on building a basic web crawler (How To Build A Basic Web Crawler To Pull Information From A Website, Part 1 and Part 2). It’s super clear, but the second example isn’t working for me. It’s supposed to print a bunch of links, but I just get a blank page. Any ideas what I might be doing wrong?
Thanks!


  1. James Bruce
    March 7, 2012 at 8:45 am

    Hi Lisa. Did you change the target URL? My old site that was used in the example code is dead now, so it's not going to be able to grab anything from there. 

    Instead of

    $target_url = "http://www.tokyobit.com";

    try

    $target_url = "http://www.ipadboardgames.org";

    ;)

    • Swanstro
      March 8, 2012 at 7:32 pm

      Hi, Bruce. Thanks so much for taking the time to reply.  Yes, I tried several different target URLs, but no dice.  When I looked at the error log on my site, I saw this message:

      Filename cannot be empty in /home4/swanstre/public_html/bots/simple_html_dom.php on line 555

      Thanks so much for your time,
      Lisa

      • James Bruce
        March 11, 2012 at 8:59 am

        Could you post all your code to a pastebin somewhere? I could try running it on my own server - but I think maybe your server is limited in some way that's breaking the Simple HTML DOM parsing...

        • Swanstro
          March 16, 2012 at 7:19 pm

          hi, bruce -- yes, or if you have an email address that you wouldn't mind throwing my way, that would work too. my address is swanstro at gmail dot com

        • Swanstro
          March 16, 2012 at 7:40 pm

          I meant James. Gah!

        • James Bruce
          March 18, 2012 at 1:44 pm

          Sorry, yes. jamesbruce @ this website. 
