Records of inappropriate access to websites

For the benefit of the wider web community, this page gives details of inappropriate access by robots, scripts, and other forms of suspicious access detected at the following websites:,, and

Robot name | Site accessed | Log file | Form of bad access
Linguee Bot | | linguee-bot-access.txt | Ignored or did not read robots.txt

This robot did not look at robots.txt before downloading the entire site, including the /norobots/ section, which robots.txt specifically forbids. The robot's IP address matches the one given on the web page named in its self-described user agent string.
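For comparison, a well-behaved robot reads robots.txt first and checks every URL against it before downloading. Below is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URLs, and the agent name ExampleBot are assumptions for illustration, not copies of this site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration only: every robot
# is asked to stay out of the /norobots/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /norobots/
"""


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt above permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/norobots/page.html"):
        print(url, may_fetch(url))
```

A real crawler would instead call set_url() with the site's robots.txt address and read() it once before starting, then apply can_fetch() to each URL in the same way.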

Lightspeed | | lightspeed.txt | Ignored or did not read robots.txt

This robot identified itself as an Internet Explorer browser in its user agent field and ignored robots.txt, never attempting to read it. It requests pages with no delay between requests and does not ask for compressed content, so it uses more bandwidth than necessary.
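Spacing out requests is not difficult. The Python sketch below enforces a minimum gap between successive requests. The class name Throttle and the two-second default are hypothetical choices, not values taken from this page; the clock and sleep functions are injectable so the timing logic can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive requests.

    Hypothetical helper: the 2-second default is an assumed value.
    """

    def __init__(
        self,
        min_interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request: Optional[float] = None  # when the previous request went out

    def wait(self) -> float:
        """Block until the next request is allowed; return how long we slept."""
        now = self.clock()
        delay = 0.0
        if self.last_request is not None:
            elapsed = now - self.last_request
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self.sleep(delay)
        # The request goes out after any sleep, so record that moment.
        self.last_request = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.5)
    for page in ("/a.html", "/b.html", "/c.html"):
        slept = throttle.wait()
        # The real fetch of `page` would happen here.
        print(f"fetch {page} after sleeping {slept:.2f}s")
```

The crawler calls wait() immediately before each fetch; if enough time has already passed, it returns at once without sleeping.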

Multiple Korean-language Android phones | | kanji-logs-memory-cgi-ko-kr-unique-ip.txt | Bandwidth drain

A series of IP addresses, apparently belonging to Korean-language Android mobile phones, attempted to download PNG images. All of them used user agent strings of the form

Mozilla/5.0 (Linux; U; Android 2.2; ko-kr; SHW-M180L Build/FROYO) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1

A total of 20,738 accesses from 4,814 unique IP addresses were made from 5th January to 14th February 2011. The requests continued even when an empty response or a redirect was sent, suggesting that this was an attempt to hog bandwidth rather than to obtain a collection of images.
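Totals of this kind, and an extract with one line per address, can be derived from an Apache access log. Below is a minimal Python sketch, assuming the standard "combined" log format and using the ko-kr locale tag in the user agent as the filter. The function name and marker are hypothetical, and this is not necessarily how the figures on this page were produced.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarise(lines: Iterable[str], marker: str = "ko-kr") -> Tuple[int, int, List[str]]:
    """Return (matching accesses, unique IPs, first log line from each IP).

    A line counts as matching when its user agent contains `marker`
    (here the assumed locale tag "ko-kr").
    """
    total = 0
    first_seen: Dict[str, str] = {}  # dicts preserve insertion order
    for line in lines:
        m = COMBINED.match(line)
        if m is None or marker not in m.group("agent"):
            continue  # unparseable line or a different user agent
        total += 1
        # Keep only the first line per address, as in the attached extract.
        first_seen.setdefault(m.group("ip"), line.rstrip("\n"))
    return total, len(first_seen), list(first_seen.values())
```

Passing an open log file to summarise() returns the totals and the extract lines. This simple pattern skips any request line that contains escaped quotes, so a production version would need a more careful parser.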

The attached extract from the Apache log file contains one line for each unique IP address; repeated accesses from the same address are omitted.

| | | Search for vulnerabilities

This robot looks for badly-programmed PHP files (there are none on this site).
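Because the site has no PHP files, any request for a path ending in .php can be treated as a probe. Below is a minimal Python sketch of such a log filter, assuming Apache-style request lines. The function name is hypothetical, and the check ignores other probe patterns.

```python
import re
from typing import Optional

# The quoted request inside an Apache log line, e.g. "GET /admin/setup.php HTTP/1.1"
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+"')


def php_probe(line: str) -> Optional[str]:
    """Return the requested path if it names a .php file, else None.

    Assumes the site serves no PHP at all, so any such request is a probe.
    """
    m = REQUEST.search(line)
    if m is None:
        return None
    path = m.group("path").split("?", 1)[0]  # ignore any query string
    return path if path.lower().endswith(".php") else None
```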

Trend Micro | | trend-micro-attacks.txt | Bandwidth drain

A robot falsely identifying itself as Internet Explorer 6.0, operating from multiple IP addresses belonging to Trend Micro, repeatedly downloads the same pages. It ignores robots.txt and does not use compression when downloading. See also Trend Micro Malware Discussions (some of the IP addresses are the same).


A site sent a series of repeated automated requests to the English-to-katakana converter.

Yeti/1.0 | | yeti.txt | Badly-programmed robot
The "Yeti" robot with a user agent string
Yeti/1.0 (NHN Corp.;
is a badly-programmed robot. It fires off multiple requests for pages at an alarming rate. Unlike almost every other web robot, Yeti does not use gzip compression, so it downloads pages in uncompressed form and wastes a huge amount of bandwidth. Its user agent string contains a URL for a help page, but that page is only provided in Korean, even though the crawler downloads English-language pages.
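A robot asks for compressed pages by sending an Accept-Encoding: gzip request header, and the server can then reply with a gzip-compressed body. To give a rough idea of what is lost when a robot skips this, the Python sketch below compares a body's size before and after compression with the standard gzip module. The sample page is invented and deliberately repetitive, so the ratio it produces is illustrative only; real savings depend on the content.

```python
import gzip
from typing import Tuple


def transfer_sizes(body: bytes) -> Tuple[int, int]:
    """Return (uncompressed, gzip-compressed) size of a response body in bytes."""
    return len(body), len(gzip.compress(body))


# Invented, deliberately repetitive HTML page, used only for illustration.
PAGE = (
    "<html><body>\n"
    + "<p>An English-language page about kanji and kana.</p>\n" * 200
    + "</body></html>\n"
).encode("utf-8")

if __name__ == "__main__":
    plain, packed = transfer_sizes(PAGE)
    print(f"uncompressed: {plain} bytes, gzip: {packed} bytes")
```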

Copyright © Ben Bullock 2009-2023. All rights reserved. For comments, questions, and corrections, please email Ben Bullock or use the discussion group at Google Groups.