Preventing Search Engines Crawling Content

Discussion in 'Website Design Forum:' started by bamme, Mar 19, 2010.

  1. bamme

    bamme Senior Member

    Hi everyone

    I have gone really quiet on DF recently. I'm dealing with a rather distressing problem, which is some illegal, quite questionable content on the net. I have gone through legal procedures to ask the webmasters in question to remove or change the content, and they have, but my problem now is with Google: though the sensitive images have been removed, Google still displays them in its cache.

    I've tried to use the removal request tool, but in cases where the webmasters have not removed the image from their server and have just changed the content of the image itself, the removal request tool doesn't work and denies my request.

    In some cases, the webmaster has redirected a page I've asked them to remove to another page. When you type in an image URL too, you get sent to the same redirected page. The Google removal request tool doesn't work for me here either.

    I've sought help from a couple of people who both say the webmaster needs either to make the content return a 404 or to use meta information to tell search engines not to crawl it. Unfortunately, I doubt any of the webmasters who have decided to change rather than remove the images in question, in order to preserve the page itself, will want to make their page uncrawlable or replace it with an ugly plain 404.

    So my task is to give these webmasters a nice option, e.g. a custom 404 page with a link to different content, or meta information telling search engines not to index the content I've asked them to remove, alongside the page redirects they've set up. I have NEVER done anything like that before, and am afraid I will ask the webmasters to do something so complicated that it puts them off and they won't do it.

    Asking them to go through the Google image removal request tool themselves afterwards is not an option: there are several images per page, and I very much doubt anyone would want to spend time doing this. I'd do this bit after they have done the above, and if I can I would like to take on as much of the workload as possible so that webmasters have less to do and are more likely to comply.

    Does anyone know the simplest way for a webmaster to pretty much tell Google either "hey, this image has changed" or "hey, don't crawl this" or "there's nothing here" (custom 404)? Which one is easiest, and could I do it for them?

    Wow, that was an essay, but I hope the point is across and that someone can help! Thank you.
     
  2. Greg

    Greg Active Member

    Hi Emma,

    You could ask the webmaster to add rules to their robots.txt file (assuming they have one) to prevent search engines from crawling and indexing the content/pages in question.

    So you would simply need to ask the webmaster to add the rules to their robots.txt file; if they don't have one, ask them to upload the one you provide to their root directory.

    Code:
    User-agent: *
    Disallow: /images/imagename1.jpg
    Disallow: /pagename.html
    So go to each site, take a note of each image file, and/or page, then add these to a .txt document (notepad will be fine) and send them the robots.txt file with your e-mail.

    Of course this won't remove the content from their site, it will just prevent the search engines crawling and indexing the content. That's the simplest/easiest way I can think of asking them, aside from the legal option which it sounds like you've come to a dead end on.
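
    A minimal Python sketch for sanity-checking rules like these before e-mailing the file. It uses the standard library's urllib.robotparser, which applies the same prefix-matching idea but is not Google's own parser, so treat the result as a rough check only. The paths are placeholders modelled on the example above, and blocked_paths is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; swap in the real image and page paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/imagename1.jpg
Disallow: /pagename.html
"""


def blocked_paths(robots_txt, paths, agent="*"):
    """Return the paths that the given rules stop `agent` from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    candidates = ["/images/imagename1.jpg", "/images/other.jpg", "/pagename.html"]
    print(blocked_paths(ROBOTS_TXT, candidates))
```

    Rules are matched as path prefixes, so a rule written as /pagename.html/ (with a slash on the end) would not cover /pagename.html itself.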

    Hope that helps,
    Greg
     
  3. Harry

    Harry Senior Member

    Uh oh, what have you been up to? ;)
     
  4. bamme

    bamme Senior Member

    Thanks Greg, that seems simple enough for the pics that have been removed from servers :) But what if the person has just changed what's in the picture, and the picture name and its place on the webpage stay the same? Can they do anything in this case apart from getting Google to noindex it? Can they not say to Google "hey, my page needs recaching"? I understand that for this purpose Google would probably say "ahh, go away", but what if a site's branding or logo changed or something? There must be a way.. pinging? I'm not sure at all..

    @Harry: Haha, I haven't done anything bad. It's actually just horrible really: they're pics from when I was pretty sick for a couple of years, and because I look ill (of course) it's ruining the modelling jobs I get.. pretty upsetting actually.
     
  5. Jazajay

    Jazajay Active Member

    Hey Emma, sorry, will get round to reading your PM soon. Pics you want removed, hey? Won't be asking what you've been up to then. :D

    Mmm....interesting.
    Want to PM me the pic address as it is found in Google image search/cache? It may help me in getting the said image changed.

    Here are several solutions for you to go on.

    1. If the image has changed, get links from pages already indexed in Google (bear in mind it's probably in Bing, Yahoo! and Ask as well) and point them to the image itself.

    So for example <a href="site/noughty-pics-of-you-cough-ill.png">New changed picture</a>

    The search engine then revisits the image, notices that it has been changed and updates its cached version.

    2. Get the webmaster/mistress to add this to the head of their page:
    <meta name="robots" content="noimageindex" />

    For say 2 months.


    What that does is tell the search engines not to index images from within the page; the rest of the content is still indexable. Then wait until they have revisited and hopefully removed the image.

    3. Wait. If the image has been changed then theoretically all you have to do is wait until they have been round again and picked up the new image. Now as this is Google it could take months TBH; Bing will probably remove it quite fast if you link to it, as Bing indexes pages completely differently.
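
    For option 2, a rough Python sketch of how you could check that a page really carries the tag once the webmaster says it is in. It only reads the HTML you pass in (fetching the page is left out), and RobotsMetaParser and robots_directives are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noimageindex" /></head></html>'
    print("noimageindex" in robots_directives(page))
```

    If "noimageindex" is missing from the result, the tag is either absent or misspelled, which is worth checking before waiting weeks for a recrawl.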

    Let me know the URL of the image in Google's cache and I may be able to help you out further. :)
     
  6. Harry

    Harry Senior Member

    Subtle, dude. Subtle...
     
  7. Jazajay

    Jazajay Active Member

    Ssshh, my intentions were totally honourable, lol. I mean she's sick in them after all, right? :D
     
  8. bamme

    bamme Senior Member

    Hahaa, love Harry's comeback to that. It didn't occur to me at all though, tbh. I was just going to say that I would, even if they are from my modelling days, but they are not nice to look at, or I wouldn't be bothered about it. It was a 2-3 year job after all!

    Anyway, I don't particularly want to give the URL, and it's not because I don't trust you Jaz; it's because it's a bit uncomfortable for me even to look at, and although some forums are boring, this one has cool people that I actually want to keep chatting to as a normal-sized, healthy-looking person :)

    So yeah!

    The robots meta thing is my last resort I guess, as this requires the webmasters to do something. What I have done is given these instructions to them:

    Would the robots meta be easier than this? I thought doing it the above way would save their SEO..

    With this method:

    This isn't as simple as it first seems :(

    My current process of trying to remove these is pretty much as you directed, but there seems to be no option for "they've changed what's in the picture, not the name of the pic or its location; I don't mind that you don't remove it, I just want you to update your cache for that pic":

    1. Go to the webpage removal request tool.
    2. Choose the radio button option "Information or image that appears in the Google search results." It does not return a 404, so I cannot choose the "Outdated page/image" option, nor does it appear in SafeSearch results, so I can't choose that option.

    3. I am faced with these 2 options:
    The site owner has modified this page so that it no longer contains the information or image that concerns me
    or
    The site owner has removed this page/image or blocked it from being indexed by using robots.txt or meta tags

    Well, they haven't removed it, as I explained above. And they have not modified their page. They have changed what is in the picture. So.. I'm not sure what else to do really.
     
  9. bamme

    bamme Senior Member

    Okay.. well, I've talked one of the webmasters into using a robots.txt to remove the images from Google's cache.

    Here is what I've given him:


    Is this okay, and is it a working robots.txt?
     
  10. bamme

    bamme Senior Member

    I haven't written "with the backslash on the end" in my robots.txt file; I was just pointing it out as I'm not sure if there's meant to be a backslash or not :/
     
  11. Jazajay

    Jazajay Active Member

    Change it to this:

    User-agent: *
    Disallow: /the/directory/holding_the/images/the_image1.jpg
    Disallow: /the/directory/holding_the/images/the_image2.jpg

    As that tells ALL search engines not to index the_image1.jpg and the_image2.jpg.

    The rest is redundant and would block all the images in that image directory from being indexed, not just your 2. : )
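
    If there are a lot of images, a short Python sketch like the one below could turn a list of image URLs into a file of that shape. It is only an illustration: build_robots_txt is a hypothetical helper and the example.com addresses are made up. It assumes all the URLs belong to the same site, since a robots.txt only applies to the host it is served from.

```python
from urllib.parse import urlsplit


def build_robots_txt(image_urls, agent="*"):
    """Build a robots.txt body with one Disallow line per image URL."""
    lines = [f"User-agent: {agent}"]
    seen = set()
    for url in image_urls:
        path = urlsplit(url).path  # robots.txt rules use the path only
        if path and path not in seen:  # skip empty paths and duplicates
            seen.add(path)
            lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    urls = [
        "http://www.example.com/the/directory/holding_the/images/the_image1.jpg",
        "http://www.example.com/the/directory/holding_the/images/the_image2.jpg",
    ]
    print(build_robots_txt(urls), end="")
```

    If the webmaster already has a robots.txt, the Disallow lines would need to be merged into their existing "User-agent: *" group rather than uploaded as a replacement file.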
     
  12. bamme

    bamme Senior Member

    Thanks Jaz :) I was just reading back on this.. maybe I dismissed this one too fast..


    What do you mean by point them to the image itself? Literally put links to the images on a blank webpage, whack it on the internet, and hope Google finds the links?

    Also, these images don't return 404s as you know, and some return redirects (301 I think). Would the method above (whatever it is) still work?
     
  13. Jazajay

    Jazajay Active Member

    Hence why I asked to see the URL of the image, as I don't know of any images that redirect TBH. :)

    Well, that's one way of doing it, but as it will be a page with one link, probably orphaned, the chances of it ever being picked up, or even of that link being followed, would be really small.

    Got a site?
    Link to the image in the href on one of your indexed pages.

    The thing that most people don't get about images, and the reason you never want them indexed, is that images acquire PageRank. Get good links pointing to the image from pages with a high PageRank (Equity/link...whatever) and the more often ALL the search engines will follow them, and the quicker the image will be updated, especially if it is a magical redirecting image. :D
     
  14. bamme

    bamme Senior Member

    Ahh okay, then I will :) That's a good idea. Thanks.

    To be honest, I don't actually know what you meant there :S
     
  15. Jazajay

    Jazajay Active Member

    Don't worry about it. Basically, just get a link from a page that is already indexed in the search engines. : )
     
  16. bamme

    bamme Senior Member

    Thanks, okay, I'll give that a try! Is there any way to notify the search engines or prompt them a bit more to look at the new page? Someone said a while ago that pinging might help, but I've never really seen that have an effect.
     
  17. bamme

    bamme Senior Member

    Can I ask, Jaz: if the image itself has changed, but its placement in the code on the page and the image name itself have not changed, will this still work?
     
  18. Jazajay

    Jazajay Active Member

    Why it should get updated......eventually. Just one of those things TBH. :)
     
  19. bamme

    bamme Senior Member

    Okay, well, they are taking months..
    Can I use "Disallow" in a robots.txt and upload it to stop them keeping it there? Or will this only take effect at the same time as Google would naturally remove an image that's no longer there, as this is the only time they'd crawl the site and find the Disallow?
     
  20. bamme

    bamme Senior Member

    And would "noarchive" in a robots.txt work as well?
     
