
Preventing Search Engines Crawling Content


bamme

Senior Member
#1
Hi everyone

I have gone really quiet on DF recently. I'm dealing with a bit of a distressing problem: some illegal, quite questionable content on the net. I have gone through legal procedures to ask the webmasters in question to remove or change the content, and they have, but my problem now is with Google. Though the sensitive images have been removed, Google still displays them in its cache.

I've tried to use the removal request tool, but in cases where the webmasters have not removed the image from their server and have only changed the content of the image itself, the removal request tool doesn't work and denies my request.

In some cases, the webmaster has redirected a page I've asked them to remove to another page. When you type in an image URL, too, you get sent to the same redirected page. Google's removal request tool doesn't work for me here either.

I've sought help from a couple of people, who both say the webmaster needs either to make the content return a 404 or to use meta information to tell search engines not to crawl it. Unfortunately, I doubt any of the webmasters who decided to change rather than remove the images (in order to preserve the page itself) will want to make their page uncrawlable or show an ugly plain 404.

So my task is to give these webmasters a nice option, e.g. a custom 404 page with a link to different content, or meta information telling search engines not to index the content I've asked them to remove, alongside the page redirects they've set up. I have NEVER done anything like that before, and I'm afraid I will ask the webmasters to do something so complicated that it puts them off and they won't do it.

Asking them to go through the Google image removal request tool themselves afterwards is not an option: there are several images per page, and I very much doubt anyone would want to spend time doing this. I'd do that bit myself after they have done the above. Also, if I can, I would like to take on as much of the workload as possible so that the webmasters have less to do and are more likely to comply.

Does anyone know the simplest way for a webmaster to tell Google either "hey, this image has changed", "hey, don't crawl this" or "there's nothing here" (custom 404)? Which one is easiest, and could I do it for them?

Wow, that was an essay, but I hope the point comes across, and that someone can help! Thank you.
 

Greg

Active Member
#2
Hi Emma,

You could ask the webmaster to add rules to their robots.txt file (assuming they have one) to prevent search engines from crawling and indexing the content/pages in question.

So you would simply need to ask the webmaster to add the rules to their robots.txt file or, if they don't have one, to upload the one you provide to their root directory.

Code:
User-agent: *
Disallow: /images/imagename1.jpg
Disallow: /pagename.html
So go to each site, take a note of each image file, and/or page, then add these to a .txt document (notepad will be fine) and send them the robots.txt file with your e-mail.

Of course, this won't remove the content from their site; it will just prevent the search engines from crawling and indexing it. That's the simplest/easiest way I can think of to ask them, aside from the legal option, which it sounds like you've come to a dead end on.
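Before sending a robots.txt to a webmaster, it can help to check that the rules block exactly the URLs you expect and nothing else. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the paths are the placeholder names from the post above, and `is_blocked` is a hypothetical helper name. This parser uses simple prefix matching from the original robots.txt conventions, so it may not agree with every search engine's interpretation in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules from the post above; swap in the real paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/imagename1.jpg
Disallow: /pagename.html
"""


def is_blocked(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Return True if these robots.txt rules forbid user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # each directive must be on its own line
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ["/images/imagename1.jpg", "/pagename.html", "/images/other.jpg"]:
        print(path, "blocked" if is_blocked(ROBOTS_TXT, path) else "allowed")
```

Note that a rule with a trailing slash, such as `Disallow: /pagename.html/`, only matches paths that begin with `/pagename.html/`, so it would not block `/pagename.html` itself.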

Hope that helps,
Greg
 

bamme

Senior Member
#4
Thanks Greg, that seems simple enough for the pics that have been removed from servers. :) But what if the person has just changed what's in the picture, and the picture name and its place on the webpage stay the same? Can they do anything in this case apart from getting Google to noindex it? Can they not say to Google "hey, my page needs recaching"? I understand Google would probably say "go away" for this purpose, but what if a site's branding/logo changed or something... there must be a way. Pinging? I'm not sure at all.

@Harry: Haha, I haven't done anything bad. It's actually just horrible, really: they're pics from a couple of years when I was pretty sick, and because I look ill (of course), they're ruining modelling jobs I get. Pretty upsetting, actually.
 

Jazajay

Active Member
#5
Hey Emma, sorry, I will get round to reading your PM soon. Pics you want removed, hey? Won't be asking what you've been up to, then. :D

Mmm....interesting.
Want to PM me the pic address as it is found in Google image search, or in its cache? It may help me better help you get the said image changed.

Here are several solutions for you to go on.

1. If the image has changed, get links from pages already indexed in Google pointing to the image itself. Bear in mind it's probably in Bing, Yahoo! and Ask as well.

So for example <a href="site/noughty-pics-of-you-cough-ill.png">New changed picture</a>

The search engine then revisits the image, notices that it has changed and updates its cached version.

2. Get the webmaster/mistress to add this to the head of their page:
<meta name="robots" content="noimageindex" />

For, say, 2 months.


What that does is tell the search engines not to index images from within the page; the rest of the content is still indexable. Then wait until they have revisited the page and, hopefully, removed the image.

3. Wait. If the image has been changed, then theoretically all you have to do is wait until they have been round again and updated to the new image. Now, as this is Google, it could take months TBH. Bing will probably remove it quite fast if you link to it, as Bing indexes pages completely differently.
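For point 2, once the tag is in place it is worth confirming that the page actually being served contains it before starting the two-month wait. A minimal sketch, assuming you already have the page's HTML as a string (fetching it is left out); it uses Python's standard `html.parser` module, and `robots_directives` is a hypothetical helper name. It only looks at `<meta name="robots">` tags, not crawler-specific ones such as `<meta name="googlebot">`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attribute values may be None.
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noimageindex" /></head><body></body></html>'
    print("noimageindex" in robots_directives(page))
```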

Let me know the URL of the image in Google's cache and I may be able to help you out further. :)
 

bamme

Senior Member
#8
Hahaa, love Harry's comeback to that. It didn't occur to me at all, though, TBH. I was just going to say that I would, even if they are from my modelling days, but they are not nice to look at, or I wouldn't be bothered about it. It was a 2-3 year job, after all!

Anyway, I don't particularly want to give the URL, and it's not because I don't trust you, Jaz. It's because it's a bit uncomfortable for me even to look at, and although some forums are boring, this one has cool people that I actually want to keep chatting to as a normal-sized, healthy-looking person. :)

So yeah!

The robots meta thing is my last resort, I guess, as it requires the webmasters to do something. What I have done is give them these instructions:

For the images to be fully removed from Google's image results, a webmaster needs to notify Google through Webmaster Tools, following these steps:

1. Log in to Webmaster Tools.
2. On the Webmaster Tools home page, click the site you want.
3. On the Dashboard, click *Site configuration* in the left-hand navigation.
4. Click *Crawler access*, and then click *Remove URL*.
5. Click *New removal request*.
6. Select *Cached copy of a Google search result* and then click *Next*.
7. Type the URL of the page whose cache you want removed from search results, and then click *Submit Removal Request*. Note that the URL is case-sensitive: you will need to submit the URL using exactly the same characters and the same capitalization that the site uses.
Would the robots meta be easier than this? I thought doing it the above way would save their SEO.

With this method:

1. If the image has changed, get links from pages already indexed in Google pointing to the image itself. Bear in mind it's probably in Bing, Yahoo! and Ask as well.

So for example <a href="site/noughty-pics-of-you-cough-ill.png">New changed picture</a>
This isn't as simple as it first seems. :(

My current process for trying to remove these is pretty much as you directed, but there seems to be no option for "they've changed what's in the picture, not the name of the pic or its location; I don't mind that you don't remove it, I just want you to update your cache for that pic":

1. Go to the webpage removal request tool.
2. Choose the radio button option "Information or image that appears in the Google search results." The image does not return a 404, so I cannot choose the "Outdated page/image" option, nor does it appear in SafeSearch results, so I can't choose that option either.

3. I am faced with these two options:
The site owner has modified this page so that it no longer contains the information or image that concerns me
or
The site owner has removed this page/image or blocked it from being indexed by using robots.txt or meta tags

Well, they haven't removed it, as I explained above. And they have not modified their page; they have changed what is in the picture. So... I'm not sure what else to do, really.
 

bamme

Senior Member
#9
Okay... well, I've talked one of the webmasters into using a robots.txt to remove the images from Google's cache.

Here is what I've given him:

User-agent: *
Disallow: /the/directory/holding_the/images/ (with the backslash on the end)
Disallow: /the/directory/holding_the/images/the_image1.jpg
Disallow: /the/directory/holding_the/images/the_image2.jpg

User-agent: googlebot-image
Disallow: /the/directory/holding_the/images/ (with the backslash on the end)
Disallow: /the/directory/holding_the/images/the_image1.jpg
Disallow: /the/directory/holding_the/images/the_image2.jpg

Is this okay and a working robots.txt?
 

bamme

Senior Member
#10
I haven't written "with the backslash on the end" in my robots.txt file; I was just pointing it out, as I'm not sure whether there's meant to be a slash (strictly a forward slash, not a backslash) on the end or not. :/
 

Jazajay

Active Member
#11
Change it to this:

User-agent: *
Disallow: /the/directory/holding_the/images/the_image1.jpg
Disallow: /the/directory/holding_the/images/the_image2.jpg

As that tells ALL search engines not to crawl (and so not to index) the_image1.jpg and the_image2.jpg.

The rest is redundant, and the directory rule would block all their images in that image directory from being indexed, not just your two. :)
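To see why the directory line matters, the two versions can be compared with Python's standard `urllib.robotparser`, as a rough sketch. The paths are the placeholders from the thread, `some_other_photo.jpg` is a hypothetical extra image in the same directory that the webmaster wants to keep, and `Googlebot-Image` is used as the crawler name. The parser does simple prefix matching, which should agree with the major engines for plain `Disallow` rules like these but is not guaranteed to in every edge case.

```python
from urllib.robotparser import RobotFileParser

# bamme's first version, without the explanatory notes.
ORIGINAL = """\
User-agent: *
Disallow: /the/directory/holding_the/images/
Disallow: /the/directory/holding_the/images/the_image1.jpg
Disallow: /the/directory/holding_the/images/the_image2.jpg

User-agent: googlebot-image
Disallow: /the/directory/holding_the/images/
Disallow: /the/directory/holding_the/images/the_image1.jpg
Disallow: /the/directory/holding_the/images/the_image2.jpg
"""

# Jazajay's suggested version.
SUGGESTED = """\
User-agent: *
Disallow: /the/directory/holding_the/images/the_image1.jpg
Disallow: /the/directory/holding_the/images/the_image2.jpg
"""

DIRECTORY = "/the/directory/holding_the/images/"


def allowed(robots_txt: str, path: str, user_agent: str = "Googlebot-Image") -> bool:
    """Return True if user_agent may fetch path under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # some_other_photo.jpg is hypothetical: an image the webmaster wants to keep indexed.
    for name in ["the_image1.jpg", "the_image2.jpg", "some_other_photo.jpg"]:
        path = DIRECTORY + name
        print(name, "original:", allowed(ORIGINAL, path),
              "suggested:", allowed(SUGGESTED, path))
```

In the original, a crawler that finds a group naming it specifically (here `googlebot-image`) follows that group instead of the `*` group, and both groups contain the bare directory rule, so every image in the folder ends up blocked. The suggested version blocks only the two named files, for every crawler.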
 

bamme

Senior Member
#12
Thanks, Jaz. :) I was just reading back over this... maybe I dismissed this one too fast.


1. If the image has changed, get links from pages already indexed in Google pointing to the image itself. Bear in mind it's probably in Bing, Yahoo! and Ask as well.

So for example <a href="site/noughty-pics-of-you-cough-ill.png">New changed picture</a>
What do you mean by "point them to the image itself"? Literally put links to the images on a blank webpage, whack it on the internet, and hope Google finds the links?

Also, these images don't return 404s, as you know, and some return redirects (301, I think). Would the method above (whatever it is) still work?
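Whether an image URL answers with 200, 301 or 404 matters for which removal-tool option can be picked, and a browser follows redirects silently, so it helps to look at the raw status. A minimal sketch using Python's standard `http.client`, which never follows redirects; `head_status` is a hypothetical helper, and the local demo server with `/changed.jpg`, `/moved.jpg` and `/gone.jpg` is made up purely so the example runs without touching a real site. Some servers answer `HEAD` differently from `GET`, so treat the result as a first check only.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def head_status(url: str):
    """Send one HEAD request and return (status code, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()  # a 301 is reported as-is, not followed
        return response.status, response.getheader("Location")
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server imitating the three cases from the thread."""

    def do_HEAD(self):
        if self.path == "/changed.jpg":    # image swapped in place: still 200
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
        elif self.path == "/moved.jpg":    # image URL redirected to another page
            self.send_response(301)
            self.send_header("Location", "/some-other-page.html")
        else:                              # image really gone
            self.send_response(404)
        self.end_headers()

    def log_message(self, format, *args):  # keep the demo output quiet
        pass


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    for path in ["/changed.jpg", "/moved.jpg", "/gone.jpg"]:
        print(path, head_status(base + path))
    server.shutdown()
    server.server_close()
```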
 

Jazajay

Active Member
#13
Also, these images don't return 404s, as you know, and some return redirects (301, I think). Would the method above (whatever it is) still work?
That's why I asked to see the URL of the image, as I don't know of any images that redirect, TBH. :)

Literally put links to the images on a blank webpage, whack it on the internet, and hope Google finds the links?
Well, that's one way of doing it, but as it will be a page with one link, probably orphaned, the chances of it ever being picked up, or even of that link being followed, would be really small.

Got a site?
Link to the image in the href on one of your indexed pages.

The thing most people don't get about images, and hence why you never want them indexed, is that images acquire PageRank. Get good links pointing to an image from pages with a high PageRank (equity, link juice... whatever), and the more often ALL the search engines follow those links, the quicker the image will be updated, especially if it is a magical redirecting image. :D
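Following the suggestion of linking to the image from one of your own indexed pages, the snippet below builds the `<a href>` markup for a list of image URLs, escaping them so characters such as `&` don't break the HTML. A trivial sketch: `image_links` and the example URL are hypothetical, and nothing here controls when, or whether, a search engine will follow the links.

```python
from html import escape


def image_links(image_urls, label="Updated picture"):
    """Return one <a href="..."> line per image URL, ready to paste into an indexed page."""
    return "\n".join(
        f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
        for url in image_urls
    )


if __name__ == "__main__":
    # Hypothetical URL; use the exact image address, including capitalization.
    print(image_links(["https://www.example.com/images/the_image1.jpg"]))
```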
 

bamme

Senior Member
#14
Got a site?
Link to the image in the href on one of your indexed pages.
Ahh, okay then, I will. :) That's a good idea, thanks.

The thing most people don't get about images, and hence why you never want them indexed, is that images acquire PageRank. Get good links pointing to an image from pages with a high PageRank (equity, link juice... whatever), and the more often ALL the search engines follow those links, the quicker the image will be updated, especially if it is a magical redirecting image.
To be honest, I don't actually know what you meant there. :S
 

bamme

Senior Member
#16
Thanks, okay, I'll give that a try! Is there any way to notify the search engines, or prompt them a bit more to look at the new page? Someone said a while ago that pinging might help, but I've never really seen it have an effect.
 

bamme

Senior Member
#17
Can I ask, Jaz: if the image itself has changed, but its placement in the page's code and the image name have not changed, will this still work?
 

bamme

Senior Member
#19
Okay, well, they are taking months...
Can I upload a robots.txt with a "Disallow" rule to stop them keeping it there? Or will this only take effect at the same point Google would naturally remove an image that's no longer there, since that is the only time they'd crawl the site and find the Disallow?