Wget

GNU Wget is a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.

URL Format

http://host[:port]/directory/file
ftp://host[:port]/directory/file

ftp://user:password@host/path
http://user:password@host/path

Credentials can also be supplied on the command line:
--user=user
--password=password
--ask-password
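
For example (host, path, and credentials below are placeholders):

wget ftp://alice:secret@example.com/pub/file.tar.gz
wget --user=alice --ask-password https://example.com/private/report.pdf

Credentials embedded in the URL or passed with --password are visible to other users via ps; --ask-password avoids this.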

Basics

-b, --background

Go to background immediately after startup. If no output file is specified via ‘-o’, output is redirected to wget-log.
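
For example, to start a large download in the background and watch the default log (URL is a placeholder):

wget -b http://example.com/large.iso
tail -f wget-log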

-o logfile, --output-file=logfile

Log all messages to logfile. The messages are normally reported to standard error.

-a logfile, --append-output=logfile

Append to logfile. This is the same as ‘-o’, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.
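
For example, logging one run and appending a second (filenames and URLs are placeholders):

wget -o fetch.log http://example.com/a.zip
wget -a fetch.log http://example.com/b.zip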


-i file, --input-file=file

Read URLs from a local or external file. If ‘-’ is specified as file, URLs are read from the standard input. (Use ‘./-’ to read from a file literally named ‘-’.)
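
For example, assuming a file urls.txt with one URL per line:

wget -i urls.txt
cat urls.txt | wget -i -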

-B URL, --base=URL

Resolves relative links using URL as the point of reference. For instance, if you specify ‘http://foo/bar/a.html’ for URL, and Wget reads ‘../baz/b.html’ from the input file, it would be resolved to ‘http://foo/baz/b.html’.
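
Reusing the example above, and assuming links.txt is an input file containing relative links such as ../baz/b.html:

wget -B http://foo/bar/a.html -i links.txt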


-t number, --tries=number

Set number of tries to number. Specify 0 or ‘inf’ for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like “connection refused” or “not found” (404), which are not retried.
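
For example, to allow only 3 attempts, or to retry indefinitely on a flaky connection (URL is a placeholder):

wget -t 3 http://example.com/file.zip
wget --tries=inf http://example.com/file.zip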

-c, --continue

Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. Without ‘-c’, Wget would download the remote file again, saving it to FILENAME.1 and leaving the truncated file alone.
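
For example, to resume an interrupted download (URL is a placeholder):

wget -c http://example.com/large.iso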

--no-check-certificate

Don’t check the server certificate against the available certificate authorities. Also don’t require the URL host name to match the common name presented by the certificate.
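
For example, against a host with a self-signed certificate (URL is a placeholder):

wget --no-check-certificate https://self-signed.example.com/file.tar.gz

Use this only when you trust the host by other means, since it disables protection against man-in-the-middle attacks.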


-p, --page-requisites

This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

-k, --convert-links

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

Each link will be changed in one of two ways:

  • The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary combinations of directories.

  • The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.

Because of this, local browsing works reliably.
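
Combining the two, for example, fetches a single page with everything needed to view it offline (URL is a placeholder):

wget -p -k http://example.com/article.html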

-r, --recursive

Turn on recursive retrieving. The default maximum depth is 5.

-l depth, --level=depth

Set the maximum number of subdirectories that Wget will recurse into to depth. In order to prevent one from accidentally downloading very large websites when using recursion this is limited to a depth of 5 by default, i.e., it will traverse at most 5 directories deep starting from the provided URL. Set ‘-l 0’ or ‘-l inf’ for infinite recursion depth.
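
For example, to recurse at most two levels deep (URL is a placeholder):

wget -r -l 2 http://example.com/docs/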

-D domain-list, --domains=domain-list

Set domains to be followed. domain-list is a comma-separated list of domains.

--exclude-domains domain-list

Set domains that are not to be followed. domain-list is a comma-separated list of domains.
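
For example, to stay within example.com while skipping one of its subdomains (domains are placeholders):

wget -r -D example.com --exclude-domains cdn.example.com http://example.com/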
