Wget

[Missing image: Wget-screenshot.png (screenshot of a simple invocation of Wget, taken on Linux)]

GNU Wget is a free software program that implements simple and powerful content retrieval from web servers and is part of the GNU project. Its name is derived from World Wide Web and get, connotative of its primary function. It currently supports downloading via the HTTP, HTTPS, and FTP protocols, the most popular TCP/IP-based protocols used for web browsing.

Wget's features include recursive download, conversion of links for offline viewing of local HTML, support for proxies, and much more. It appeared in 1996, coinciding with the boom in the Web's popularity, which led to its wide use among Unix users and its distribution with all major Linux distributions. Written in portable C, Wget can be easily installed on any Unix-like system and has been ported to diverse environments, including Mac OS X, Microsoft Windows [1] (http://xoomer.virgilio.it/hherold/), and VMS [2] (http://www.antinode.org/dec/sw/wget.html). It has also been used as the basis for graphical programs such as gwget. [3] (http://gnome.org/projects/gwget/index.html)

Features

Robustness

Wget has been designed for robustness over slow or unstable network connections. If a download does not complete due to a network problem, Wget will automatically try to continue the download from where it left off, and repeat this until the whole file has been retrieved. It was one of the first clients to make use of the then-new Range HTTP header to support this feature.
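
The resume behaviour described above is controlled by the -c (--continue) and --tries options; a sketch, with a placeholder URL:

```shell
# Retry up to 20 times, resuming each attempt (-c) from where the
# previous one left off rather than starting over; Wget sends a
# Range header so the server returns only the missing portion.
wget -c --tries=20 http://example.com/large-file.iso
```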

Recursive download

Wget can extract resources linked from HTML pages and download them in sequence, repeating the process recursively until all the visible pages have been downloaded or a maximum recursion depth specified by the user has been reached. The downloaded pages are saved in a directory structure resembling that on the remote server. This "recursive download" enables partial or complete mirroring of web sites via HTTP. Links in downloaded HTML pages can be adjusted to point to locally downloaded material for offline viewing. When performing this kind of automatic mirroring of web sites, Wget supports the Robots Exclusion Standard.

Recursive download works with FTP as well, where Wget issues the LIST command to find which additional files to download, repeating this process for directories and files under the one specified in the top URL. Shell-like wildcards are supported when the download of FTP URLs is requested.
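
Wildcard support can be seen in an invocation like the following (the directory is the GNU FTP site used elsewhere in this article); the quotes stop the shell from trying to expand the pattern itself:

```shell
# Download every 1.x source tarball in the directory; Wget issues
# LIST and matches the wildcard against the server's file listing.
wget 'ftp://ftp.gnu.org/pub/gnu/wget/wget-1.*.tar.gz'
```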

When downloading recursively over either HTTP or FTP, Wget can be instructed to inspect the timestamps of local and remote files, and download only the remote files newer than the corresponding local ones. This allows easy mirroring of HTTP and FTP sites, but is considered inefficient and more error-prone when compared to programs designed for mirroring from the ground up, such as rsync. On the other hand, Wget does not require special server-side software for mirroring, so the comparison is at least somewhat flawed.
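
Timestamp-based mirroring corresponds to the -N option, usually combined with recursion; the --mirror (-m) switch is shorthand for a suitable combination. A sketch, with a placeholder URL:

```shell
# The first run downloads everything; later runs re-fetch only the
# files whose remote timestamp is newer than the local copy.
wget -r -N http://example.com/docs/

# Shorthand: -m enables recursion, timestamping, and infinite
# recursion depth in a single switch.
wget -m http://example.com/docs/
```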

Non-interactiveness

Wget is non-interactive in the sense that, once started, it does not require user interaction and does not need to control a TTY, being able to log its progress to a separate file for later inspection. That way the user can start Wget and log off, leaving the program unattended. By contrast, most graphical and curses-based Web browsers require the user to remain logged in and to manually restart failed downloads, which can be a great hindrance when transferring a lot of data.
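
Unattended operation as described above typically combines the -b (background) and -o (log file) options; the URL and file names here are placeholders:

```shell
# Detach into the background and write all progress messages to
# download.log instead of the terminal; the user can now log off.
wget -b -o download.log http://example.com/big-archive.tar.gz

# Check on the transfer later:
tail download.log
```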

Portability

Written in a highly portable style of C with minimal dependencies on third-party libraries, Wget requires little more than a C compiler and a BSD-like interface to TCP/IP networking. Designed as a Unix program invoked from the Unix shell, Wget has been ported to numerous Unix-like environments and systems, such as Cygwin and Mac OS X, as well as to Microsoft Windows.

Other

  • Wget supports download through proxies, which are widely deployed to provide web access inside company firewalls and to cache and quickly deliver frequently accessed content.
  • It makes use of persistent HTTP connections where available.
  • IPv6 is supported on systems that include the appropriate interfaces.
  • SSL/TLS is supported for encrypted downloads using the OpenSSL library.
  • Files larger than 2 GB are supported on 32-bit systems that include the appropriate interfaces.
  • Download speed may be limited to avoid using up all of the available bandwidth.
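
The last two items on the list correspond to the --limit-rate option and to proxy support, which is commonly enabled through the standard http_proxy environment variable; the hosts and rate below are placeholders:

```shell
# Cap the transfer at roughly 200 KB/s, leaving bandwidth for
# other users of the connection.
wget --limit-rate=200k http://example.com/video.mpg

# Fetch a page through an HTTP proxy by setting the environment
# variable that Wget honours.
http_proxy=http://proxy.example.com:8080 wget http://example.com/page.html
```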

Using Wget

Typical usage of GNU Wget consists of invoking it from the command line, providing one or more URLs as arguments.

# Download the title page of the English language wikipedia to a file
# named "index.html".
wget http://en.wikipedia.org/

# Download Wget's source code from the GNU ftp site.
wget ftp://ftp.gnu.org/pub/gnu/wget/wget-1.10.tar.gz

More complex usage includes automatic download of multiple URLs into a directory hierarchy.

# Download the title page of the English language wikipedia, along with
# the images and style sheets needed to display the page, and convert the
# URLs inside it to refer to locally available content.
wget -p -k http://en.wikipedia.org/

# Download the entire contents of en.wikipedia.org
wget -r -l0 http://en.wikipedia.org/

Authors and copyright

GNU Wget was written by Hrvoje Nikšić with contributions by many other people, including Dan Harkless, Ian Abbott, and Mauro Tortonesi. Significant contributions are credited in the AUTHORS file included in the distribution, and all remaining ones are documented in the changelogs, also included with the program. Wget is now maintained by Mauro Tortonesi.

The copyright to Wget belongs to the Free Software Foundation, whose policy is to require copyright assignments for all non-trivial contributions to GNU software.

History

Early history

Wget is the descendant of an earlier program named Geturl by the same author, the development of which commenced in late 1995. The name was changed to Wget after the author became aware of an earlier Amiga program named GetURL, written by James Burton [4] (http://jmsh.net/james/) in ARexx.

Wget filled a gap in the web downloading software available in the mid-1990s. No single program could reliably download files via both the HTTP and FTP protocols. Existing programs either supported only FTP (such as NcFTP and dl (ftp://gnjilux.srk.fer.hr/pub/unix/util/dl/)) or were written in Perl, which was not yet ubiquitous at the time. While Wget was inspired by features of some existing programs, it aimed to support both HTTP and FTP and to let users build it using only the standard development tools found on every Unix system.

At that time many Unix users struggled behind extremely slow university and dial-up Internet connections, leading to a growing need for a downloading agent that could deal with transient network failures without assistance from the human operator.

Notable releases

The following releases represent notable milestones in Wget's development. Features listed next to each release are edited for brevity and do not constitute comprehensive information about the release, which is available in the NEWS file distributed with Wget [5] (http://svn.dotsrc.org/repo/wget/tags/WGET_1_10/NEWS).

  • Geturl 1.0, released January 1996, was the first publicly available release. The first English-language announcement can be traced to this Usenet news posting (http://groups-beta.google.com/group/comp.infosystems.www.announce/msg/4268334d269d42ce?hl=en), which probably refers to Geturl 1.3.4, released shortly before.
  • Wget 1.4.0, released November 1996, was the first version to use the name Wget. It was also the first release distributed under the terms of the GNU GPL, Geturl having been distributed under an ad-hoc no-warranty license.
  • Wget 1.4.3, released February 1997, was the first version released as part of the GNU project with the copyright assigned to the FSF.
  • Wget 1.5.3, released September 1998, was a milestone in Wget's popularity. This version was bundled with many Linux distributions, which exposed the program to a much wider audience.
  • Wget 1.6, released December 1999, incorporated many bug fixes for the (by then stale) 1.5.3 release, largely thanks to the effort of Dan Harkless.
  • Wget 1.7, released June 2001, introduced SSL support, cookies, and persistent connections.
  • Wget 1.8, released December 2001, added bandwidth throttling, new progress indicators, and the breadth-first traversal of the hyperlink graph.
  • Wget 1.9, released October 2003, included experimental IPv6 support and the ability to POST data to HTTP servers.
  • Wget 1.10, released June 2005, introduced large file support, IPv6 support on dual-family systems, NTLM authorization, and SSL improvements. The maintainership was picked up by Mauro Tortonesi.

Development and release cycle

Wget is developed in an open fashion, most of the design decisions typically being discussed on the public mailing list [6] (http://news.gmane.org/gmane.comp.web.wget.general) followed by users and developers. Bug reports are relayed to the same list.

Source contributions

The preferred method of contributing to Wget's code and documentation is through source updates in the form of patch files generated by the diff utility. Patches intended for inclusion in Wget are submitted to a designated mailing list [7] (http://news.gmane.org/gmane.comp.web.wget.patches) where they are reviewed by the maintainers. Patches that pass the maintainers' scrutiny are installed in the sources, and all others are rejected. Instructions on patch creation as well as style guidelines are outlined in the PATCHES document provided with the distribution [8] (http://svn.dotsrc.org/repo/wget/trunk/PATCHES). Because all changes go through this list, even ones from core developers, the subscribers of the list can track Wget development and provide feedback.
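
The workflow described above can be sketched with a self-contained example; the file names and contents are purely illustrative stand-ins for real Wget sources:

```shell
# Illustrative stand-ins for a source file before and after a change.
printf 'old line\n' > retr.c.orig
printf 'new line\n' > retr.c

# Produce a unified diff suitable for mailing to the patches list.
# (diff exits with status 1 when the files differ, as expected here.)
diff -u retr.c.orig retr.c > my-fix.patch || true

# A maintainer applies the patch to a pristine copy of the file.
cp retr.c.orig pristine.c
patch pristine.c < my-fix.patch
```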

The source code can also be tracked via version control, which has recently switched from CVS to Subversion. The repository used to be browsable, but the latest commits will remain unavailable via the web until a new web interface is installed for the Subversion repository.

Releases

When a sufficient number of features or bug fixes accumulate during development, Wget is released to the general public via the GNU FTP site and its mirrors. Because the project is run entirely by volunteers, there is no external pressure to issue a release, nor are there enforceable release deadlines.

Releases are numbered as versions of the form major.minor[.revision], such as Wget 1.10 or Wget 1.8.2. An increase of the major version number represents large and possibly incompatible changes in Wget's behavior or a radical redesign of the code base. An increase of the minor version number designates the addition of new features and bug fixes. A new revision indicates a release that, compared to the previous revision, contains only bug fixes. Revision zero is omitted, meaning that, for example, Wget 1.10 is the same as 1.10.0. Wget does not use the odd-even release number convention popularized by the Linux kernel.
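
The numbering scheme can be illustrated with a small shell sketch; the version strings are just examples:

```shell
# Split a Wget-style version string into its components.  A missing
# revision defaults to zero, so 1.10 and 1.10.0 denote the same release.
parse_version() {
    IFS=. read -r major minor revision <<EOF
$1
EOF
    echo "major=$major minor=$minor revision=${revision:-0}"
}

parse_version 1.10     # major=1 minor=10 revision=0
parse_version 1.8.2    # major=1 minor=8 revision=2
```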

At any moment there are two branches of development: the trunk, where new features get added, and the stable branch, forked after each minor release, where only the bug fixes are applied. All revision-level releases are built off the stable branch; all minor version releases are built off the trunk.

Criticisms of Wget

Several criticisms of Wget have recurred in public forums and mailing lists. The most important ones are:

  • Wget supports few download protocols, especially compared to cURL. It supports none of the media streaming protocols, such as MMS and RTSP, nor the increasingly popular peer-to-peer protocols. While the lack of media-protocol support can be explained by the absence of open specifications, many nevertheless see Wget's code base as being centered on HTTP and FTP.
  • It has lagged behind with support for the more recent HTTP and FTP features. For example, it still uses HTTP/1.0, which was acceptable several years ago, but less so now that most of the world has upgraded to HTTP/1.1.
  • It is not flexible enough to be used as a building block for shell scripts, because its recursive download is all-or-nothing: it cannot be made to do only part of the work and output the information needed to continue, which could then be filtered or otherwise processed by other programs. (Printing the list of URLs to download without actually fetching them would be an example of such output.) This has led some people to use cURL for tasks for which Wget might otherwise be useful.
  • Its mirroring facilities are ill-suited to modern web sites, notably in their inability to download content referenced from JavaScript or CSS.
  • Wget is easily confused when mirroring complex sites, especially on repeated mirroring runs, because it lacks a database in which to store metadata describing previous runs.
  • Wget's development has often been perceived as slow and sporadic; many users have been frustrated by the lack of response to feature requests. One example is large file support, which was added only in 2005 even though the feature had been widely and frequently requested since at least 2002.
  • Wget's large number of options has been criticized as feature creep—as of this writing (June 2005), Wget has more than 100 command-line switches. Although the options are well-documented in the manual, they are daunting even to the experienced user, and the interaction of various options can sometimes lead to surprises.
  • Several security flaws have been exposed in Wget. [9] (http://marc.theaimsgroup.com/?l=bugtraq&m=103962838628940&w=2) [10] (http://marc.theaimsgroup.com/?l=bugtraq&m=108481268725276&w=2) [11] (http://marc.theaimsgroup.com/?l=bugtraq&m=110269474112384&w=2) Although the flaws have not adversely impacted the majority of users and have since been corrected, it has been claimed that the code base was not written with security in mind and that the maintainers are insufficiently sensitive to security-related concerns.

Wget's maintainers have stated their awareness of these criticisms, and claim to be working on addressing them in future releases.

License

GNU Wget is distributed under the terms of the GNU General Public License, version 2 or later, with a special exception that allows distribution of binaries linked against the OpenSSL library. The text of the exception follows:

In addition, as a special exception, the Free Software Foundation gives permission to link the code of its release of Wget with the OpenSSL project's "OpenSSL" library (or with modified versions of it that use the same license as the "OpenSSL" library), and distribute the linked executables. You must obey the GNU General Public License in all respects for all of the code used other than "OpenSSL". If you modify this file, you may extend this exception to your version of the file, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.

Because the OpenSSL exception means the license is no longer just the well-studied GNU GPL, and is therefore harder to understand, the exception clause is expected to be removed once Wget is modified to also link with the GnuTLS library.

Wget's documentation, in the form of a Texinfo reference manual, is distributed under the terms of the GNU Free Documentation License, version 1.2 or later. The man page usually distributed on Unix-like systems is automatically generated from a subset of the Texinfo manual and falls under the terms of the same license.
