User agent
A user agent is the client application used with a particular network protocol; the term most commonly refers to applications that access the World Wide Web. Web user agents range from web browsers to search engine crawlers ("spiders"), as well as screen readers and braille browsers used by people with disabilities.
When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This string forms part of the HTTP request as a header line prefixed with User-Agent: (header names are case-insensitive, so User-agent: is equivalent). It typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL or e-mail address so that the webmaster can contact the operator of the bot.
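As a minimal sketch of how a client attaches this string to a request, the following uses Python's standard urllib.request module. The bot name, contact URL, and target URL are hypothetical, chosen only for illustration; the request object is built but never sent.

```python
from urllib.request import Request

# Hypothetical bot name and contact URL, used only for illustration.
USER_AGENT = "ExampleBot/0.1 (+http://www.example.com/bot.html)"


def build_request(url):
    """Return a request object carrying the User-Agent header (nothing is sent)."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("http://www.example.com/")
# urllib stores header names with only the first letter capitalized,
# which is harmless because HTTP header names are case-insensitive.
print(req.get_header("User-agent"))
```

A server that receives such a request would see the string as the value of its User-Agent header line.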
The user-agent string is one of the criteria by which crawlers can be excluded from certain pages or parts of a website using the Robots Exclusion Standard (robots.txt). A webmaster who feels that parts of the site should not be included in the data gathered by a particular crawler, or that a particular crawler is using too much bandwidth, can use it to request that the crawler not visit those pages.
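The exclusion mechanism can be illustrated with Python's standard urllib.robotparser module. This is a sketch only: the robots.txt content and the crawler names ExampleBot and OtherBot are invented for the example, and a real crawler would fetch the file from the site instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is kept out of /private/,
# every other crawler may visit everything.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the rules permit this agent to fetch this path."""
    return parser.can_fetch(agent, path)


print(allowed("ExampleBot", "/private/page.html"))
print(allowed("ExampleBot", "/index.html"))
print(allowed("OtherBot", "/private/page.html"))
```

Compliance is voluntary: the file only states a request, and it is up to each crawler to match its own name against the rules and honor them.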
User agent spoofing
At various points in its history, use of the Web has been dominated by one browser to the extent that many websites are designed to work with that particular browser, rather than according to standards from bodies such as the W3C and IETF. Such sites often include "browser sniffing" code, which alters the content sent out depending on the User-Agent string received. As a result, less popular browsers may not be sent complex content even though they could handle it correctly, or in extreme cases may be refused all content. Various browsers therefore "cloak" or "spoof" this string in order to identify themselves as something else to such detection code; often, the browser's real identity is then included later in the string.
The earliest example of this is Internet Explorer's use of a User-Agent string beginning "Mozilla/<version> (compatible; MSIE <version>...", in order to receive content intended for Netscape Navigator, its main rival at the time of its development. "Mozilla" here does not refer to the open-source Mozilla browser, which was developed much later, but to the original codename for Navigator, which was also the name of the Netscape company mascot. This format of User-Agent string has since been copied by other user agents, partly because Explorer, in turn, came to dominate.
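The effect of the "Mozilla" prefix can be sketched with a deliberately naive sniffing function. The function and its "complex"/"basic" labels are hypothetical and do not reproduce any real site's detection code; the three strings are taken from the example list in this article.

```python
def choose_content(user_agent):
    """Naive sniffing: only strings starting with "Mozilla/" get the complex page."""
    if user_agent.startswith("Mozilla/"):
        return "complex"
    return "basic"


# Strings taken from the example list in this article.
NAVIGATOR = "Mozilla/4.8 [en] (Windows NT 5.0; U)"
EXPLORER = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
OPERA = "Opera/7.23 (Windows 98; U) [en]"

for ua in (NAVIGATOR, EXPLORER, OPERA):
    print(choose_content(ua), "<-", ua)
```

Under this check Explorer receives the same content as Navigator because of its prefix, while the uncloaked Opera string falls through to the basic page regardless of what the browser can actually render.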
More recently, with Internet Explorer becoming by far the dominant browser, rivals such as Opera and Safari implemented systems whereby the user could select a false User-Agent string to send, such as that of a recent version of Explorer. Some (e.g. Safari) duplicate the User-Agent string they are trying to spoof exactly; others (e.g. Opera) duplicate the User-Agent string but add the genuine browser name to the end. The latter approach leads to a string containing three names and versions: first, the user agent claims to be "Mozilla" (i.e. Netscape Navigator); then, "MSIE" (Internet Explorer); and finally, the actual browser, such as "Opera".
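The layering of names can be made visible with a short sketch that lists which of the three names a string claims. The regular expressions are assumptions written for this illustration and cover only the names discussed here; the sample strings come from the example list in this article.

```python
import re

# Illustrative patterns only; they cover just the three names discussed in the text.
PATTERNS = [
    ("Mozilla", re.compile(r"^Mozilla/\d")),
    ("MSIE", re.compile(r"\bMSIE \d")),
    ("Opera", re.compile(r"\bOpera[ /]\d")),
]


def claimed_names(user_agent):
    """Return, in order, the browser names that the string claims."""
    return [name for name, pattern in PATTERNS if pattern.search(user_agent)]


# Strings taken from the example list in this article.
OPERA_CLOAKED = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.03 [en]"
OPERA_PLAIN = "Opera/8.00 (Windows NT 5.1; U; en)"
SAFARI_CLOAKED = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"

print(claimed_names(OPERA_CLOAKED))
print(claimed_names(OPERA_PLAIN))
print(claimed_names(SAFARI_CLOAKED))
```

The cloaked Safari string yields only "Mozilla" and "MSIE", which shows why an exact duplicate cannot be told apart from a genuine Explorer string by inspecting the header alone.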
This cycle of sniffing and spoofing is expected to continue among web browsers. In response, some standards-minded web developers have started the "Viewable With Any Browser" campaign, which encourages developers to design webpages according to official standards rather than for any particular browser.
As of 2005, websites are generally more standards-compliant than at earlier points in the history of the Web. However, outdated JavaScript that effectively locks out browsers other than Explorer or Navigator is still in use, especially on smaller, non-corporate websites. This is often blamed on voodoo programming, in the form of copying and pasting older code without understanding what effect it will have on the website.
Example user-agent strings
Browsers
- Internet Explorer 5.5 on Windows 2000: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
- Internet Explorer 6.0 in MSN on Windows 98: Mozilla/4.0 (compatible; MSIE 6.0; MSN 2.5; Windows 98)
- Internet Explorer 7.0 beta running on Windows Longhorn: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)
- Internet Explorer 5.2 on Mac OS X: Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)
- Konqueror 3.1 (French): Mozilla/5.0 (compatible; Konqueror/3.1; Linux 2.4.22-10mdk; X11; i686; fr, fr_FR)
- Mozilla 1.7.8 on Linux: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511
- Mozilla Firefox 1.0.4 on Windows XP: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4
- Mozilla Firefox 1.0.4 on Ubuntu Linux, on AMD64: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050512 Firefox
- Mozilla Firefox 1.0.4 on FreeBSD 5.4 on i386: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.8) Gecko/20050609 Firefox/1.0.4
- Netscape 4.8 on Windows 2000: Mozilla/4.8 [en] (Windows NT 5.0; U)
- Netscape 7 on Sun Solaris 8: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0
- Netscape 8.0.1 on Windows XP using Gecko: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20050519 Netscape/8.0.1
- Netscape 8.0.1 on Windows XP using MSHTML (with .NET installed): Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215) Netscape/8.0.1
- Opera 6.03 on Windows 2000, cloaked as MSIE: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.03 [en]
- Opera 7.23 on Windows 98: Opera/7.23 (Windows 98; U) [en]
- Opera 8.00 on Windows XP: Opera/8.00 (Windows NT 5.1; U; en)
- Opera 8.00 on Gentoo Linux: Opera/8.0 (X11; Linux i686; U; cs)
- Safari v125 on Mac OS X: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/124 (KHTML, like Gecko) Safari/125
- Safari v125 on Mac OS X, cloaked as MSIE: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)
- ELinks 0.4pre5 on Linux: ELinks (0.4pre5; Linux 2.4.27 i686; 80x25)
- Links 0.99pre14 under Cygwin on Windows 2000: Links (0.99pre14; CYGWIN_NT-5.0 1.5.16(0.128/4/2) i686; 80x25)
- Links 2.1pre17 under Gentoo Linux: Links (2.1pre17; Linux 2.6.11-gentoo-r8 i686; 80x24)
- Lynx 2.8.4rel.1 on Linux: Lynx/2.8.4rel.1 libwww-FM/2.14
- Off By One 3.5a on Windows XP: Mozilla/4.7 (compatible; OffByOne; Windows 2000)
- w3m on FreeBSD: w3m/0.5.1
Bots
- Crawler for Ask Jeeves/Teoma: Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
- cURL: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7b zlib/1.2.2
- Googlebot: Googlebot/2.1 (+http://www.google.com/bot.html)
- Googlebot alternate: Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)
- Grub: Mozilla/4.0 (compatible; grub-client-1.4.3; Crawl your own stuff with http://grub.org)
- MSN bot: msnbot/0.11 (+http://search.msn.com/msnbot.htm)
- wget: Wget/1.9
- Yahoo! Slurp: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
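Several of the bot strings above embed a contact URL, as described in the introduction. A minimal sketch of extracting it follows; the regular expression is an assumption that fits the strings listed here and is not part of any standard.

```python
import re

# Illustrative pattern: an http(s) URL running up to whitespace, ";" or ")".
URL_PATTERN = re.compile(r"https?://[^\s;)]+")


def contact_url(user_agent):
    """Return the first URL embedded in the string, or None if there is none."""
    match = URL_PATTERN.search(user_agent)
    return match.group(0) if match else None


# Strings taken from the bot list above.
GOOGLEBOT = "Googlebot/2.1 (+http://www.google.com/bot.html)"
GRUB = "Mozilla/4.0 (compatible; grub-client-1.4.3; Crawl your own stuff with http://grub.org)"
WGET = "Wget/1.9"

for ua in (GOOGLEBOT, GRUB, WGET):
    print(contact_url(ua))
```

Tools such as wget that send only a name and version carry no contact information, so the function returns None for them.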
External links
- List of user agent strings (http://www.pgts.com.au/pgtsj/pgtsj0208c.html)
- User Agent Switcher for Firefox and Mozilla (GNU GPL license) (http://www.chrispederick.com/work/firefox/useragentswitcher/)