Bandwidth Limitation for Robots
I noticed the increasing amount of search bots trying to spider my blog. Especially "slurp" and "msnbot" are reaching a critical stage concerning my already limited bandwidth. Half of the bandwidth is already wasted on the robots, and the more active and publicly visible my blog becomes, the sooner the bandwidth limit will be reached.
I suggest allowing the users at Blogspirit to create their own robots.txt file to limit, restrict and possibly exclude the robots' aggressive behavior. For example, the entry "User-Agent: Slurp" combined with "Crawl-Delay: 20" already helps to slow the robots down, while the meta tag still invites them to index the complete blog. Here is an example from the "Stats > Detailed Statistics" page:
Robots (from search engines)
Robots | Hits | Bandwidth | Last Visit |
slurp | 410 | 17 MB | 15/01/2005 |
msnbot | 101 | 2 MB | 15/01/2005 |
googlebot | 70 | 3 MB | 14/01/2005 |
crawl | 24 | 517 KB | 14/01/2005 |
ia_archiver | 5 | 146 KB | 14/01/2005 |
spider | 4 | 164 KB | 11/01/2005 |
webclipping.com | 3 | 125 KB | 13/01/2005 |
bbot | 1 | 42 KB | 09/01/2005 |
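A minimal robots.txt along the lines suggested above might look like this (the delay values are only an illustration, not a recommendation):

```
User-Agent: Slurp
Crawl-Delay: 20

User-Agent: msnbot
Crawl-Delay: 20
```

Note that Crawl-Delay is a non-standard directive: Slurp and msnbot honored it at the time, while Googlebot ignores it.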
Do you ever get the feeling that someone is watching you – or better yet, looking for you? That is how I feel when I see these on my sites. Spooky…
Well, in fact… ah – no. I don't get this feeling. I know these are robots and automated software systems, so there's nothing to be afraid of. So far they haven't had a chance to threaten my digital existence, you see? :)
Hi Mike,
I noticed the major increase in traffic from slurp etc., as well as lots of commercial links, in my detailed statistics report in January.
Do I need to worry about intrusions as a result of this traffic?
So much information at your blog – and so much that I simply cannot understand!
Best wishes,
Ann
Dear Ann,
the robots are basically harmless, since they are nothing but services run by search engine providers like Google. The robots crawl (or "spider") through your whole site and download all accessible data; they follow the links and add the content to the search engine's cache (this is again "spidering"). Since there is usually no limitation for robots, they crawl the site continuously – the more links and entries you include in your blog, and the more people link to your site, the more often the robots return. You can restrict them with a robots.txt file in the server's root directory, e.g. https://mikeschnoor.com/robots.txt, or include special meta tags in the HTML – refer to my source code for example:
Here I allow the robots to index the site and instruct them to follow the links.
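The snippet referred to here is presumably the standard robots meta tag:

```
<meta name="robots" content="index, follow">
```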
Here I basically inform the browser (ms ie, firefox) and any other robot not to cache the site.
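This is most likely done with the http-equiv meta tags – "pragma" for older browsers such as MS IE, "cache-control" for newer clients and robots:

```
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
```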
This expiry is set for 1 hour. 60 seconds x 60 minutes = 3600 seconds.
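In meta-tag form this would presumably be (the value being the number of seconds calculated above):

```
<meta http-equiv="expires" content="3600">
```

Strictly speaking, the expires header expects an HTTP date rather than a number of seconds; a value a client cannot parse is generally treated as "already expired".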
The robot is instructed to revisit this site in 14 days, and not within one day which is the usual routine.
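The corresponding tag is the informal revisit-after meta tag, which not all robots honor:

```
<meta name="revisit-after" content="14 days">
```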
Best regards
Mike