pwget man page on Cygwin

PWGET(1)		 Perl pwget URL fetch utility		      PWGET(1)

NAME
       pwget - Perl Web URL fetch program

SYNOPSIS
	   pwget http://example.com/ [URL ...]
	   pwget --config $HOME/config/pwget.conf --tag linux --tag emacs ..
	   pwget --verbose --overwrite http://example.com/
	   pwget --verbose --overwrite --output ~/dir/ http://example.com/
	   pwget --new --overwrite http://example.com/package-1.1.tar.gz

DESCRIPTION
       Automate periodic downloads of files and packages.

       If you periodically retrieve the latest versions of certain programs,
       this is the Perl script for you. Run it from a cron job, or once a
       week, to download the newest versions of files from around the net.

   Wget and this program
       At this point you may wonder why you would need this Perl program
       when the wget(1) C program has been the standard for ages. Well:

       1) Perl is cross-platform and more easily extendable.

       2) You can record file download criteria in a configuration file and
       use Perl regular expressions to select downloads.

       3) The program can analyze web pages and "search" only for the
       download links as instructed.

       4) Last but not least, it can track the newest packages whose names
       have changed since the last download. Heuristics determine the newest
       file or package according to the file name skeleton defined in the
       configuration.

       This program does not replace wget(1), because it does not offer as
       many options as wget, such as recursive downloads and timestamp
       comparison. Use wget for ad hoc downloads and this utility for files
       that change (new releases of archives) or that you monitor
       periodically.

   Short introduction
       This small utility makes it possible to keep a list of URLs in a
       configuration file and periodically retrieve those pages or files with
       simple commands. It is best suited for small batch jobs, e.g.
       downloading the most recent versions of software files. If the file
       for a URL already exists on disk, supply option --overwrite to allow
       overwriting it.

       While you can run this program from the command line to retrieve
       individual files, it has been designed to use a separate configuration
       file given with the --config option. In the configuration file you can
       control the downloading with separate directives, like "save:", which
       saves the file under a different name. The simplest way to retrieve
       the latest version of a package from a site is:

	   pwget --new --overwrite --verbose \
	      http://www.example.com/package-1.00.tar.gz

       Do not worry about the filename "package-1.00.tar.gz". The latest
       version, say "package-3.08.tar.gz", will be retrieved. The option --new
       instructs the program to look for a newer version than the one named
       in the URL.

       If the URL ends in a slash, the directory listing of the remote
       machine is stored in the file:

	   !path!000root-file

       The content of this file is either index.html or the directory
       listing, depending on whether the HTTP or FTP protocol is used.

OPTIONS
       -A, --regexp-content REGEXP
	   Download a file only if its content matches REGEXP. This option
	   makes downloads slow, because each file is read into memory as a
	   single string and REGEXP is then matched against the whole
	   content.

	   For example, to download Emacs Lisp files (.el) written by Mr. Foo,
	   matching case-insensitively:

	       pwget -v -r '\.el$' -A "(?i)Author: Mr. Foo" \
		 http://www.emacswiki.org/elisp/index.html

       -C, --create-paths
	   Create paths that do not exist in "lcd:" directives.

	   By default, an "lcd:" directive that points to a non-existing
	   directory stops the program. With this option, local directories
	   are created as needed, making it possible to re-create the exact
	   directory structure described in the configuration file.
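
	   The difference between the default behaviour and --create-paths
	   can be pictured with the following shell sketch. The function
	   lcd_dir and its second argument ("1" standing for -C) are
	   hypothetical illustrations, not part of pwget:

```shell
# Hypothetical sketch of how an "lcd:" directory is treated.
# $1 = directory named by the lcd: directive
# $2 = 1 when --create-paths (-C) is in effect, anything else otherwise
lcd_dir() {
    if [ -d "$1" ]; then
        cd "$1"                      # directory exists: just change to it
    elif [ "$2" = 1 ]; then
        mkdir -p "$1" && cd "$1"     # -C: create the whole path, then enter it
    else
        echo "lcd_dir: not a directory: $1" >&2
        return 1                     # default: stop with an error
    fi
}
```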

       -c, --config FILE
	   Read URLs from configuration FILE. This option can be given
	   multiple times; all configuration files are read. If no
	   configuration file is given, the file pointed to by an environment
	   variable is read. See ENVIRONMENT.

	   The configuration file layout is explained in section CONFIGURATION
	   FILE.

       --chdir DIRECTORY
	   Do a chdir() to DIRECTORY before any URL download starts. This is
	   like doing:

	       cd DIRECTORY
	       pwget http://example.com/index.html

       -d, --debug [LEVEL]
	   Turn on debugging with a positive LEVEL number; zero means no
	   debugging. This option turns on --verbose too.

       -e, --extract
	   Unpack files after retrieving them. The commands used to unpack
	   typical archive files are defined in the program; make sure these
	   programs are in your PATH. Win32 users are encouraged to install the
	   Cygwin utilities, where these programs come as standard. Refer to
	   section SEE ALSO.

	     .tar => tar
	     .tgz => tar + gzip
	     .gz  => gzip
	     .bz2 => bzip2
	     .zip => unzip
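
	   As an illustration only, the table above can be read as a shell
	   "case" statement. The function name unpack_command and the exact
	   tar/gzip/bzip2/unzip flags are assumptions; the commands pwget
	   really runs are defined inside the program and may differ:

```shell
# Hypothetical: print the unpack command suggested by the table above.
# Compound suffixes are tested first so that .tar.gz is not treated as a
# plain .gz file (an assumption: .tar.gz and .tar.bz2 are not listed).
unpack_command() {
    case "$1" in
        *.tar.gz|*.tgz) echo "tar -xzf $1" ;;     # tar + gzip
        *.tar.bz2)      echo "tar -xjf $1" ;;     # tar + bzip2
        *.tar)          echo "tar -xf $1" ;;
        *.gz)           echo "gzip -d $1" ;;
        *.bz2)          echo "bzip2 -d $1" ;;
        *.zip)          echo "unzip $1" ;;
        *)              return 1 ;;               # not a known archive type
    esac
}
```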

       -F, --firewall FIREWALL
	   Use FIREWALL when accessing files via ftp:// protocol.

       -h, --help
	   Print help page in text.

       --help-html
	   Print help page in HTML.

       --help-man
	   Print help page in Unix manual page format. Feed this output to
	   "nroff -man" in order to read it.

       -m, --mirror SITE
	   If URL points to the Sourceforge download area, use mirror SITE for
	   downloading. Alternatively the full URL can include the mirror
	   information. An example:

	       --mirror kent http://downloads.sourceforge.net/foo/foo-1.0.0.tar.gz

       -n, --new
	   Get the newest file. This applies to data files, which do not have
	   the extension .asp or .html. When new releases are announced, the
	   version number in the filename usually tells which one is current,
	   so getting a hardcoded file with:

	       pwget -o -v http://example.com/dir/program-1.3.tar.gz

	   is not usually practical from an automation point of view. Adding
	   the --new option to the command line causes a double pass: a) the
	   whole http://example.com/dir/ is examined for all files, and b)
	   files approximately matching the filename program-1.3.tar.gz are
	   examined, heuristically sorted, and the file with the latest version
	   number is retrieved.
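
	   The second pass can be approximated in shell, assuming the
	   directory listing is already available one name per line and that
	   GNU sort(1) with -V is installed. The function newest_like is a
	   hypothetical sketch; pwget's own heuristics are more elaborate:

```shell
# Hypothetical sketch of the second pass of --new.
# $1    = skeleton file name from the URL, e.g. program-1.3.tar.gz
# stdin = names found in the remote directory, one per line
newest_like() {
    stem=${1%%-[0-9]*}               # "program"
    rest=${1#"$stem"-}               # "1.3.tar.gz"
    old=${rest%%[!0-9.]*}            # "1.3."  (version plus trailing dot)
    ext=${rest#"$old"}               # "tar.gz"
    latest=$(
        while IFS= read -r name; do
            case "$name" in
                ("$stem"-*".$ext") ;;        # same stem and extension
                (*) continue ;;
            esac
            v=${name#"$stem"-}
            v=${v%".$ext"}
            case "$v" in
                ('' | *[!0-9.]*) continue ;; # version must be digits and dots
            esac
            printf '%s\n' "$v"
        done | sort -V | tail -n 1           # highest version wins
    )
    [ -n "$latest" ] && printf '%s-%s.%s\n' "$stem" "$latest" "$ext"
}
```

	   For example, feeding it the names program-1.3.tar.gz,
	   program-4.7.tar.gz and program-4.10.tar.gz selects
	   program-4.10.tar.gz, because version sorting compares 4.10 as
	   greater than 4.7.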

       --no-lcd
	   Ignore "lcd:" directives in configuration file.

	   In the configuration file, any "lcd:" directives are obeyed as they
	   are seen. If you want to retrieve the URL into your current
	   directory instead, supply this option. Otherwise the file ends up in
	   the directory pointed to by "lcd:".

       --no-save
	   Ignore "save:" directives in the configuration file. If the URLs
	   have "save:" options, they are ignored during fetch. You usually
	   want to combine --no-lcd with --no-save.

       --no-extract
	   Ignore "x:" directives in configuration file.

       -O, --output DIR
	   Before retrieving any files, chdir to DIR.

       -o, --overwrite
	   Allow overwriting existing files when retrieving URLs.  Combine
	   this with --skip-version if you periodically update files.

       --proxy PROXY
	   Use PROXY server for HTTP (see --firewall for FTP). The port number
	   is optional:

	       --proxy http://example.com.proxy.com
	       --proxy example.com.proxy.com:8080

       -p, --prefix PREFIX
	   Add PREFIX to all retrieved files.

       -P, --postfix POSTFIX
	   Add POSTFIX to all retrieved files.

       -D, --prefix-date
	   Add an ISO 8601 "YYYY-MM-DD::" date prefix to all retrieved files.
	   This is added before any --prefix-www or --prefix.

       -W, --prefix-www
	   By default files are stored under the same name as in the URL, but
	   if you retrieve files that have identical names from different
	   sites, this option stores each one separately by prefixing the file
	   name with the site name.

	       http://example.com/page.html    --> example.com::page.html
	       http://example2.com/page.html   --> example2.com::page.html
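
	   The naming scheme shown in the examples of this manual page
	   ("YYYY-MM-DD::" from --prefix-date, then "site::" from
	   --prefix-www) can be reproduced with a small shell function. The
	   function local_name and its arguments are hypothetical, and
	   --prefix/--postfix are left out because their position relative to
	   the site name is not documented here:

```shell
# Hypothetical: build a local file name from a URL.
# $1 = URL
# $2 = date string, or empty (stands for --prefix-date)
# $3 = 1 to add the site name (stands for --prefix-www)
local_name() {
    hostpath=${1#*://}            # www.example.com/file.pl
    site=${hostpath%%/*}          # www.example.com
    name=${1##*/}                 # file.pl
    if [ "$3" = 1 ]; then
        name="$site::$name"
    fi
    if [ -n "$2" ]; then
        name="$2::$name"          # the date always comes first
    fi
    printf '%s\n' "$name"
}
```

	   A typical call would be
	   local_name http://www.example.com/file.pl "$(date +%Y-%m-%d)" 1.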

       -r, --regexp REGEXP
	   Retrieve files matching REGEXP at the destination URL site. This is
	   like saying "connect to the URL and get all files matching REGEXP".
	   Here all gzip-compressed files are fetched from an HTTP server
	   directory:

	       pwget -v -r "\.gz" http://example.com/archive/

	   Caveat: currently works only for http:// URLs.

       -R, --config-regexp REGEXP
	   Retrieve URLs matching REGEXP from configuration file. This cancels
	   --tag options in the command line.

       -s, --selftest
	   Run some internal tests. For maintainer or developer only.

       --sleep SECONDS
	   Sleep SECONDS before the next URL request. When using regexp-based
	   downloads that may return many hits, some sites disallow successive
	   requests within a short period of time. This option makes the
	   program sleep SECONDS between retrievals to avoid 'Service
	   unavailable' errors.

       --stdout
	   Retrieve URL and write to stdout.

       --skip-version
	   Do not download files that have a version number and that already
	   exist on disk. Suppose you have these files and you use option
	   --skip-version:

	       package.tar.gz
	       file-1.1.tar.gz

	   Only package.tar.gz is retrieved, because file-1.1.tar.gz contains a
	   version number and the file has not changed since the last
	   retrieval. The idea is that the version number in the distribution
	   increases with every release, but some distributions do not contain
	   a version number. At regular intervals you may want to load those
	   packages again, but skip the versioned files. In short: this option
	   does not make much sense without the additional option --new.

	   If you want to reload a versioned file, add option --overwrite.
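
	   The rule can be summarized as a shell test. This hypothetical
	   sketch treats "a dash followed by a digit" as a version number,
	   which simplifies pwget's own check. The names has_version and
	   should_skip are illustrative:

```shell
# Hypothetical sketch of the --skip-version decision.
has_version() {
    case "$1" in
        *-[0-9]*) return 0 ;;      # e.g. file-1.1.tar.gz
        *)        return 1 ;;      # e.g. package.tar.gz
    esac
}

# Succeeds (meaning "skip") only when the name is versioned AND the file
# already exists in the current directory.
should_skip() {
    has_version "$1" && [ -e "$1" ]
}
```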

       -t, --test, --dry-run
	   Run in test mode.

       -T, --tag NAME [NAME] ...
	   Search for tag NAME in the config file and download only the entries
	   defined under that tag. Refer to the --config FILE option
	   description. You can give multiple --tag switches. Combining this
	   option with --regexp does not make sense and the consequences are
	   undefined.

       -v, --verbose [NUMBER]
	   Print verbose messages.

       -V, --version
	   Print version information.

EXAMPLES
       Get files from site:

	   pwget http://www.example.com/dir/package.tar.gz ..

       Display the copyright file for the GNU make package from the Debian
       pages:

	   pwget --stdout --regexp 'copyright$' http://packages.debian.org/unstable/make

       Get all mailing list archive files that match "gz":

	   pwget --regexp gz  http://example.com/mailing-list/archive/download/

       Read a directory and store it in the file
       YYYY-MM-DD::!dir!000root-file:

	   pwget --prefix-date --overwrite --verbose http://www.example.com/dir/

       To download the newest version of the package, but only if it is not
       already on disk. The --new option instructs the program to find newer
       packages; the filename is used only as a skeleton for the files to
       look for:

	   pwget --overwrite --skip-version --new --verbose \
	       ftp://ftp.example.com/dir/packet-1.23.tar.gz

       To overwrite the file and add a date prefix to the file name:

	   pwget --prefix-date --overwrite --verbose \
	      http://www.example.com/file.pl

	   --> YYYY-MM-DD::file.pl

       To add date and WWW site prefixes to the filenames:

	   pwget --prefix-date --prefix-www --overwrite --verbose \
	      http://www.example.com/file.pl

	   --> YYYY-MM-DD::www.example.com::file.pl

       Get all updated files under the configuration file's tag "updates":

	   pwget --verbose --overwrite --skip-version --new --tag updates
	   pwget -v -o --skip-version -n -T updates

       Get files as listed in the configuration file, but into the current
       directory, ignoring any "lcd:" and "save:" directives:

	   pwget --config $HOME/config/pwget.conf \
	       --no-lcd --no-save --overwrite --verbose \
	       http://www.example.com/file.pl

       To check the configuration file, run the program with a non-matching
       regexp; it parses the file and checks the "lcd:" directives on the
       way:

	   pwget -v -r dummy-regexp

	   -->

	   pwget.DirectiveLcd: LCD [$EUSR/directory ...]
	   is not a directory at /users/foo/bin/pwget line 889.

CONFIGURATION FILE
   Comments
       The configuration file is NOT Perl code. Comments start with hash
       character (#).

   Variables
       Currently, variables are expanded only in lcd: directives (and in
       INCLUDE file names, see below). Do not try to use them anywhere else,
       like in URLs.

       Path variables for lcd: are defined using the following notation.
       Spaces are not allowed in the VALUE part (no directory names with
       spaces). Variable names are case sensitive. Variables substitute
       environment variables with the same name. Environment variables are
       immediately available.

	   DIR  = /home/my/dir         # define variable
	   FILE = $DIR/some/file       # Use previously defined variable
	   FTP  = $HOME/ftp            # Use environment variable

       The right-hand side can refer to previously defined variables or
       existing environment variables. To repeat: this is not Perl code,
       although it may look like it; it is just the syntax allowed in the
       configuration file. Notice that a dollar sign is used on the
       right-hand side when a variable is referred to, but not on the
       left-hand side when the variable is defined. An example of a possible
       configuration file is given in the configuration file example below;
       its tags are hierarchically ordered without a limit.

       Warning: remember to use different variable names in separate include
       files. All variables are global.

   Include files
       It is possible to include more configuration files with the statement

	   INCLUDE <path-to-file-name>

       Variable expansions are possible in the file name. There is no limit
       on how many includes are used or how deeply they are nested. Every
       file is included only once, so it is safe to have multiple includes
       of the same file. Every include is read, so put the most important
       overriding includes last:

	   INCLUDE </etc/pwget.conf>		# Global
	   INCLUDE <$HOME/config/pwget.conf>	# HOME overrides it

       A special "THIS" keyword means the directory of the current include
       file, which makes it possible to include several files from the same
       directory where the initial include file resides:

	   # Start of config at /etc/pwget.conf

	   # THIS = /etc, current location
	   include <THIS/pwget-others.conf>

	   # Refers to directory where current user is: the pwd
	   include <pwget-others.conf>

	   # end

   Configuration file example
       The configuration file can contain many directives, each of which
       ends in a colon. The usage of each directive is best explained by
       examining the configuration file below and reading the commentary
       near each directive.

	   #   $HOME/config/pwget.conf - Perl pwget configuration file

	   ROOT	  = $HOME		       # define variables
	   CONF	  = $HOME/config
	   UPDATE = $ROOT/updates
	   DOWNL  = $ROOT/download

	   #   Include more configuration files. It is possible to
	   #   split a huge file in pieces and have "linux",
	   #   "win32", "debian", "emacs" configurations in separate
	   #   and manageable files.

	   INCLUDE <$CONF/pwget-other.conf>
	   INCLUDE <$CONF/pwget-more.conf>

	   tag1: local-copies tag1: local      # multiple names to this category

	       lcd:  $UPDATE		       # chdir directive

	       #  This is shown to the user with option --verbose
	       print: Notice, this site moved YYYY-MM-DD, update your bookmarks

	       file://absolute/dir/file-1.23.tar.gz

	   tag1: external

	     lcd:  $DOWNL

	     tag2: external-http

	       http://www.example.com/page.html
	       http://www.example.com/page.html save:/dir/dir/page.html

	     tag2: external-ftp

	       ftp://ftp.com/dir/file.txt.gz save:xx-file.txt.gz login:foo pass:passwd x:

	       lcd: $HOME/download/package

	       ftp://ftp.com/dir/package-1.1.tar.gz new:

	     tag2: package-x

	       lcd: $DOWNL/package-x

	       #  Person announces new files in his homepage, download all
	       #  announced files. Unpack everything (x:) and remove any
	       #  existing directories (xopt:rm)

	       http://example.com/~foo pregexp:\.tar\.gz$ x: xopt:rm

	   # End of configuration file pwget.conf

LIST OF DIRECTIVES IN CONFIGURATION FILE
       All directives must be on the same line as the URL. The program scans
       each line and determines all options given on it for that URL.
       Directives can be overridden by command line options.

       cnv:CONVERSION
	   Currently only cnv:text is available.

	   Convert the downloaded page to text. This directive always needs
	   either save: or rename:, because only those directives change the
	   filename. Here is an example:
	   Here is an example:

	       http://example.com/dir/file.html cnv:text save:file.txt
	       http://example.com/dir/ pregexp:\.html cnv:text rename:s/html/txt/

	   A text: shorthand directive can be used instead of cnv:text.

       cregexp:REGEXP
	   Download a file only if its content matches REGEXP. This is the same
	   as option --regexp-content. In this example, Emacs Lisp packages
	   (.el) in the directory listing are downloaded, but only if their
	   content indicates that the author is Mr. Foo:

	       http://example.com/index.html cregexp:(?i)author:.*Foo pregexp:\.el$

       lcd:DIRECTORY
	   Set the local download directory to DIRECTORY (chdir to it). Any
	   environment variables in the path name are substituted. If this
	   directive is found, it overrides the --output setting. If the path
	   is not a directory, terminate with an error. See also
	   --create-paths and --no-lcd.

       login:LOGIN-NAME
	   Ftp login name. Default value is "anonymous".

       mirror:SITE
	   This is relevant only to Sourceforge, which does not allow direct
	   downloads with links. Visit the project's Sourceforge homepage to
	   see which mirrors are available for downloading.

	   An example:

	     http://sourceforge.net/projects/austrumi/files/austrumi/austrumi-1.8.5/austrumi-1.8.5.iso/download new: mirror:kent

       new:
	   Get the newest file. This variable is reset to the value of --new
	   after the line has been processed. "Newest" means that an "ls"
	   command is run on the FTP site (or the equivalent on HTTP directory
	   listings), and any files that resemble the filename are examined,
	   sorted, and heuristically ranked by version number to determine
	   which one is the latest. For example, files that have version
	   information in YYYYMMDD format will most likely be retrieved
	   correctly.

	   Time stamps of the files are not checked.

	   The only requirement is that filename "must" follow the universal
	   version numbering standard:

	       FILE-VERSION.extension	   # de facto VERSION is defined as [\d.]+

	       file-19990101.tar.gz	   # ok
	       file-1999.0101.tar.gz	   # ok
	       file-1.2.3.5.tar.gz	   # ok

	       file1234.txt		   # not recognized. Must have "-"
	       file-0.23d.tar.gz	   # warning, letters are problematic

	   Files that have some alphabetic version indicator at the end of
	   VERSION may not be handled correctly. Contact the developer and
	   inform him about the de facto standard so that files can be
	   retrieved more intelligently.
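
	   The naming rule above can be checked with grep(1) before adding a
	   URL to the configuration file. The regular expressions and the
	   function name classify_name are an approximation written for this
	   illustration, not the ones used inside pwget:

```shell
# Hypothetical check of a file name against FILE-VERSION.extension.
# Prints "ok", "warning" (letters inside VERSION) or "unrecognized".
classify_name() {
    if printf '%s\n' "$1" | grep -Eq '^.+-[0-9][0-9.]*\.[A-Za-z][A-Za-z0-9.]*$'; then
        echo ok
    elif printf '%s\n' "$1" | grep -Eq '^.+-[0-9][0-9.]*[A-Za-z]+\.'; then
        echo warning
    else
        echo unrecognized          # e.g. no "-" before the version
    fi
}
```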

	   NOTE: For the new: directive to know what kind of files to look
	   for, it needs a file template. You can use a direct link to some
	   filename. Here the location "http://www.example.com/downloads" is
	   examined and the filename template is taken as "file-1.1.tar.gz"
	   to search for files that might be newer, like "file-9.1.10.tar.gz":

	     http://www.example.com/downloads/file-1.1.tar.gz new:

	   If the filename appears in a named page, use directive file: for
	   the template. In this case the "download.html" page is examined for
	   files looking like "file.*tar.gz" and the latest one is selected:

	     http://www.example.com/project/download.html file:file-1.1.tar.gz new:

       overwrite: o:
	   Same as turning on --overwrite

       page:
	   Read web page and apply commands to it. An example: contact the
	   root page and save it:

	      http://example.com/~foo page: save:foo-homepage.html

	   In order to find the correct information from the page, other
	   directives are usually supplied to guide the searching.

	   1) Adding directive "pregexp:ARCHIVE-REGEXP" matches the A HREF
	   links in the page.

	   2) Adding directive new: instructs to find newer VERSIONS of the
	   file.

	   3) Adding directive "file:DOWNLOAD-FILE" tells what template to use
	   to construct the downloadable file name. This is needed for the
	   "new:" directive.

	   4) A directive "vregexp:VERSION-REGEXP" matches the exact location
	   in the page from which the version information is extracted. The
	   default regexp looks for a line that says "The latest version ...
	   is ... N.N". The regexp must return submatch 2 for the version
	   number.

	   AN EXAMPLE

	   Search for newer files from an HTTP directory listing. Examine page
	   http://www.example.com/download/dir for model "package-1.1.tar.gz"
	   and find a newer file. E.g. "package-4.7.tar.gz" would be
	   downloaded.

	       http://www.example.com/download/dir/package-1.1.tar.gz new:

	   AN EXAMPLE

	   Search for newer files from the content of the page. The directive
	   file: acts as a model for filenames to pay attention to.

	       http://www.example.com/project/download.html new: pregexp:tar.gz file:package-1.1.tar.gz

	   AN EXAMPLE

	   Use directive rename: to change the filename before storing it on
	   disk. Here, the version number is attached to the actual filename:

	       file.el-1.1
	       file.el-1.2

	   The directives needed would be as follows; entries have been broken
	   into separate lines for legibility:

	       http://example.com/files/
	       pregexp:\.el-\d
	       vregexp:(file.el-([\d.]+))
	       file:file.el-1.1
	       new:
	       rename:s/-[\d.]+//

	   This effectively reads: "See if there is new version of something
	   that looks like file.el-1.1 and save it under name file.el by
	   deleting the extra version number at the end of original filename".

	   AN EXAMPLE

	   Contact the absolute page: at
	   http://www.example.com/~foo/package.html and search for A HREF URLs
	   in the page that match pregexp:. In addition, do another scan and
	   search for the version number in the page at the position that
	   matches vregexp: (submatch 2).

	   After all the pieces have been found, use the template file: to
	   construct the retrievable file name from the version number found
	   by vregexp:. The actual download location is a combination of the
	   page: location and the A HREF link matched by pregexp:.

	   The directives needed would be as follows; entries have been broken
	   into separate lines for legibility:

	       http://www.example.com/~foo/package.html
	       page:
	       pregexp: package.tar.gz
	       vregexp: ((?i)latest.*?version.*?\b([\d][\d.]+).*)
	       file: package-1.3.tar.gz
	       new:
	       x:

	   An example of web page where the above would apply:

	       <HTML>
	       <BODY>

	       The latest version of package is <B>2.4.1</B> It can be
	       downloaded in several forms:

		   <A HREF="download/files/package.tar.gz">Tar file</A>
		   <A HREF="download/files/package.zip">ZIP file</A>

	       </BODY>
	       </HTML>

	   For this example, assume that "package.tar.gz" is a symbolic link
	   pointing to the latest release file "package-2.4.1.tar.gz". Thus
	   the actual download location would have been
	   "http://www.example.com/~foo/download/files/package-2.4.1.tar.gz".

	   Why not simply download "package.tar.gz"? Because then the program
	   can't decide if the version at the page is newer than one stored on
	   disk from the previous download. With version numbers in the file
	   names, the comparison is possible.
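
	   The steps of this example can be traced by hand in the shell. The
	   sketch below is hypothetical. It uses a sed(1) expression as a
	   rough stand-in for the vregexp: above (sed has no (?i), so the case
	   of the first letters is spelled out), and the function names are
	   illustrative only:

```shell
# Hypothetical: print the "N.N..." that follows "latest ... version".
extract_version() {
    sed -nE 's/.*[Ll]atest.*[Vv]ersion[^0-9]*([0-9][0-9.]*).*/\1/p'
}

# Hypothetical: combine page URL, A HREF link and file: template.
# $1 = page URL, $2 = HREF, $3 = template (file:), $4 = version found
build_download_url() {
    page_dir=${1%/*}                      # http://www.example.com/~foo
    case "$2" in
        */*) href_dir=${2%/*}/ ;;         # download/files/
        *)   href_dir= ;;
    esac
    stem=${3%%-[0-9]*}                    # package
    rest=${3#"$stem"-}                    # 1.3.tar.gz
    old=${rest%%[!0-9.]*}                 # 1.3.
    ext=${rest#"$old"}                    # tar.gz
    printf '%s/%s%s-%s.%s\n' "$page_dir" "$href_dir" "$stem" "$4" "$ext"
}
```

	   Applied to the HTML above, extract_version yields 2.4.1, and
	   build_download_url then produces the same download location as
	   described in the text.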

       page:find
	   This option is obsolete; do not use it.

	   This is for HTTP only. Use directive regexp: for the FTP protocol.

	   This is a more general instruction than the page: and vregexp:
	   directives explained above.

	   Download every URL on the HTML page that matches pregexp:RE. In a
	   typical situation the page maintainer lists his software on the
	   development page. This example would download every tar.gz file on
	   the page. Note that the REGEXP is matched against the A HREF link
	   content, not the actual text that is displayed on the page:

	       http://www.example.com/index.html page:find pregexp:\.tar.gz$

	   You can also use additional regexp-no: directive if you want to
	   exclude files after the pregexp: has matched a link.

	       http://www.example.com/index.html page:find pregexp:\.tar.gz$ regexp-no:desktop

       pass:PASSWORD
	   For FTP logins. Default value is "nobody@example.com".

       pregexp:RE
	   Search A HREF links in page matching a regular expression. The
	   regular expression must be a single word with no whitespace. This
	   is incorrect:

	       pregexp:(this regexp )

	   It must be written as:

	       pregexp:(this\s+regexp\s)

       print:MESSAGE
	   Print the associated message to a user who requests the matching
	   tag name. This directive must be on a separate line inside the tag.

	       tag1: linux

		 print: this download site moved 2002-02-02, check your bookmarks.
		 http://new.site.com/dir/file-1.1.tar.gz new:

	   The "print:" directive for tag is shown only if user turns on
	   --verbose mode:

	       pwget -v -T linux

       rename:PERL-CODE
	   Rename each file using PERL-CODE. The PERL-CODE must be a complete
	   Perl program with no spaces anywhere. The following variables are
	   available during the eval() of the code:

	       $ARG = current file name
	       $url = complete url for the file
	       The code must return $ARG which is used for file name

	   For example, if a page contains links to .html files that are in
	   fact text files, the following statement changes the file
	   extensions:

	       http://example.com/dir/ page:find pregexp:\.html rename:s/html/txt/

	   You can also call the function "MonthToNumber($string)" if the
	   filename contains a written month name, like <2005-February.mbox>.
	   The function converts the name into a number. Many mailing list
	   archives can be downloaded cleanly this way.

	       #  This will download SA-Exim Mailing list archives:
	       http://lists.merlins.org/archives/sa-exim/ pregexp:\.txt$ rename:$ARG=MonthToNumber($ARG)

	   Here is a more complicated example:

	       http://www.contactor.se/~dast/svnusers/mbox.cgi pregexp:mbox.*\d$ rename:my($y,$m)=($url=~/year=(\d+).*month=(\d+)/);$ARG="$y-$m.mbox"

	   Let's break that one apart. It is worth spending some time with
	   this example, since the possibilities are limitless.

	       1. Connect to page
		  http://www.contactor.se/~dast/svnusers/mbox.cgi

	       2. Search the page for URLs matching regexp 'mbox.*\d$'. A
		  found link could match hrefs like this:
		  http://svn.haxx.se/users/mbox.cgi?year=2004&month=12

	       3. The found link is put in $ARG (same as $_), which can be used
		  to extract a suitable mailbox name with Perl code that is
		  evaluated. The resulting name must appear in $ARG. Thus the
		  code effectively extracts two items from the link to form a
		  mailbox name:

		   my ($y, $m) = ( $url =~ /year=(\d+).*month=(\d+)/ );
		   $ARG = "$y-$m.mbox";

		   => 2004-12.mbox

	   Just remember that the Perl code that follows the "rename:"
	   directive must not contain any spaces. It must all be readable as
	   one string.

       regexp:REGEXP
	   Get all files in ftp directory matching regexp. Directive save: is
	   ignored.

       regexp-no:REGEXP
	   After the "regexp:" directive has matched, exclude files that match
	   REGEXP.

       Regexp:REGEXP
	   This option is for interactive use. Retrieve all files from HTTP or
	   FTP site which match REGEXP.

       save:LOCAL-FILE-NAME
	   Save file under this name to local disk.

       tagN:NAME
	   Downloads can be grouped under "tagN" so that e.g. option --tag NAME
	   starts downloading files from the matching "tagN: NAME" line until
	   the next tag of the same level is found. There is currently no limit
	   on the number of tag levels (tag1, tag2, tag3, ...), so you can
	   arrange your downloads hierarchically in the configuration file. For
	   example, to download all the Linux files that you monitor, you would
	   give option --tag linux. To download only the latest NT Emacs
	   binary, you would give option --tag emacs-nt. Notice that you do not
	   give the "level" in the option; the program finds it out from the
	   configuration file after the tag name matches.

	   The downloading stops at the next tag of the "same level". That is,
	   tag2 stops only at the next tag2, when an upper-level tag (tag1) is
	   found, or at the end of the file.

	       tag1: linux	       # All Linux downloads under this category

		   tag2: sunsite    tag2: another-name-for-this-spot

		   #   List of files to download from here

		   tag2: ftp.funet.fi

		   #   List of files to download from here

	       tag1: emacs-binary

		   tag2: emacs-nt

		   tag2: xemacs-nt

		   tag2: emacs

		   tag2: xemacs
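
	   The selection rule can be sketched with awk(1). This is a
	   simplified, hypothetical model of how a tag picks its URL lines. It
	   ignores directives, variables and INCLUDE files, and the function
	   name urls_for_tag is illustrative:

```shell
# Hypothetical: print the URLs that "--tag NAME" would select.
# $1 = tag name; configuration text on stdin.
urls_for_tag() {
    awk -v want="$1" '
        /^[ \t]*tag[0-9]+:/ {
            match($0, /tag[0-9]+:/)
            level = substr($0, RSTART + 3, RLENGTH - 4) + 0   # tag2: gives 2
            # a tag of the same or an upper level ends the current section
            if (active && level <= active_level) active = 0
            if (!active) {
                # one line may carry several "tagN: name" pairs
                for (i = 1; i < NF; i++)
                    if ($i ~ /^tag[0-9]+:$/ && $(i + 1) == want) {
                        active = 1
                        active_level = level
                    }
            }
            next
        }
        active && /^[ \t]*[a-z]+:\/\// { print $1 }   # URL without directives
    '
}
```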

       x:  Extract (unpack) the file after download. See also options
	   --extract and --no-extract. An archive file, say .tar.gz, is
	   extracted in the current download location (see directive lcd:).

	   The unpack procedure checks the contents of the archive to see if
	   the package is correctly formed. The de facto archive format is

	       package-N.NN.tar.gz

	   In the archive, all files are supposed to be stored under the
	   proper subdirectory with version information:

	       package-N.NN/doc/README
	       package-N.NN/doc/INSTALL
	       package-N.NN/src/Makefile
	       package-N.NN/src/some-code.java

	   "IMPORTANT:" If the archive does not have a subdirectory for all
	   files, a subdirectory is created and all items are unpacked under
	   it. The default subdirectory name is constructed from the archive
	   name with the current date stamp in the format:

	       package-YYYY.MMDD

	   If the archive name contains something that looks like a version
	   number, the created directory will be constructed from it, instead
	   of current date.

	       package-1.43.tar.gz    =>  package-1.43
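
	   The directory naming rule can be sketched in shell. The function
	   unpack_dir is hypothetical and recognizes only a few common archive
	   suffixes. It takes the date stamp as an argument so that the result
	   is predictable:

```shell
# Hypothetical: name of the directory an archive would be unpacked into.
# $1 = archive file name, $2 = date stamp in YYYY.MMDD format
unpack_dir() {
    base=$1
    for suffix in .tar.gz .tgz .tar.bz2 .tar .zip; do
        case "$base" in
            *"$suffix") base=${base%"$suffix"}; break ;;
        esac
    done
    case "$base" in
        *-[0-9]*) printf '%s\n' "$base" ;;          # package-1.43
        *)        printf '%s-%s\n' "$base" "$2" ;;  # package-YYYY.MMDD
    esac
}
```

	   In real use the date would come from date +%Y.%m%d.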

       xx: Like directive x: but extract the archive "as is", without checking
	   content of the archive. If you know that it is ok for the archive
	   not to include any subdirectories, use this option to suppress
	   creation of an artificial root package-YYYY.MMDD.

       xopt:rm
	   This option removes any previous unpack directory.

	   Sometimes the files in the archive are all read-only, and unpacking
	   the archive a second time, after some period of time, would display:

	       tar: package-3.9.5/.cvsignore: Could not create file:
	       Permission denied

	       tar: package-3.9.5/BUGS: Could not create file:
	       Permission denied

	   This is not a serious error, because the archive was already on
	   disk and tar did not overwrite previous files. It might be good to
	   inform the archive maintainer, that the files have wrong
	   permissions. It is customary to expect that distributed packages
	   have writable flag set for all files.

ERRORS
       Here is a list of possible error messages and how to deal with them.
       Turning on --debug will help you understand how the program has
       interpreted the configuration file or command line options. Pay close
       attention to the generated output, because it may reveal that a
       regexp for a site is too loose or too tight.

       ERROR {URL-HERE} Bad file descriptor
	   This is a "file not found" error. You have written the filename
	   incorrectly. Double-check the configuration file's line.

BUGS AND LIMITATIONS
       "Sourceforge note": Downloading archive files from Sourceforge requires
       some trickery because of the redirections and load balancers the site
       uses. The Sourceforge pages have also undergone many changes during
       their existence. Due to these changes there is an ugly hack in the
       program that uses wget(1) to get certain information from the site.
       This could have been implemented in pure Perl, but as of now the
       developer hasn't had time to remove the wget(1) dependency. No doubt,
       using wget(1) here is ironic. If you have Perl skills, go ahead and
       look at UrlHttGet() and UrlHttGetWget(), and send patches.

       The program was initially designed to read options from one line.
       Unfortunately it is not possible to change the program to read
       configuration file directives from multiple lines, e.g. by using
       backslashes (\) to indicate a continued line.

ENVIRONMENT
       Variable "PWGET_CFG" can point to the root configuration file. The
       configuration file is read at startup if it exists.

	   export PWGET_CFG=$HOME/conf/pwget.conf     # /bin/bash syntax
	   setenv PWGET_CFG $HOME/conf/pwget.conf     # /bin/csh syntax

EXIT STATUS
       Not defined.

DEPENDENCIES
       External utilities:

	   wget(1)   only needed for Sourceforge.net downloads
		     see BUGS AND LIMITATIONS

       Non-core Perl modules from CPAN:

	   LWP::UserAgent
	   Net::FTP

       The following modules are loaded in run-time only if directive cnv:text
       is used. Otherwise these modules are not loaded:

	   HTML::Parse
	   HTML::TextFormat
	   HTML::FormatText

       This module is loaded in run-time only if HTTPS scheme is used:

	   Crypt::SSLeay

SEE ALSO
       lwp-download(1) lwp-mirror(1) lwp-request(1) lwp-rget(1) wget(1)

AUTHOR
       Jari Aalto

LICENSE AND COPYRIGHT
       Copyright (C) 1996-2013 Jari Aalto

       This program is free software; you can redistribute and/or modify the
       program under the terms of the GNU General Public License, either
       version 2 of the License, or (at your option) any later version.

perl v5.14.4			  2013-09-11			      PWGET(1)