fsexam(1) User Commands fsexam(1)
NAME
fsexam - examine encoding of file name or content and convert to UTF-8
SYNOPSIS
fsexamc [-a] [-b] [-d dry-run-result-file] [-E module-name]
[-e encoding-list] [-F] [-f 'expression'] [-g history-length] [-H] [-k]
[-L log-file] [-l] [-n] [-P] [-p] [-R] [-r] [-S] [-s] [-t] [-w]
[pathname...]
fsexamc [-V]
fsexamc [-?]
fsexam [-a] [-b] [-E module-name] [-e encoding-list] [-F]
[-f 'expression'] [-g history-length] [-H] [-k] [-L log-file] [-l] [-n]
[-P] [-p] [-R] [-r] [-S] [-s] [-t] [-w] [pathname...]
fsexam [-V]
fsexam [-?]
DESCRIPTION
The fsexam graphical user interface utility examines file names or file
contents and tries to convert them from legacy encodings to UTF-8 using
a given encoding list, the system default encoding list, or both.
The fsexamc command behaves like fsexam except that it is a command-line
interface utility.
By default, when converting file names, fsexam processes the names of
regular files, directories, and symbolic links. When converting file
contents, it handles only regular plain text files by default. Use
"-E module-name" to enable special file handling.
fsexam ignores most non-plain-text files, such as binary files, office
documents, and image files. Forcing the conversion of such files with
the -F option might produce unexpected results. Internally, fsexam uses
the file(1) utility to determine whether a file is a plain text file.
By default, fsexam converts file names. To convert file contents
instead, specify the -t option.
To help find the best encoding, fsexam has an encoding list for each
supported language. Each list includes the most popular codesets or
encodings of the corresponding language. For example, for Simplified
Chinese, fsexam specifies GB18030, BIG5, EUC-TW, and so on. The list is
used to generate conversion candidates. Use the "-e encoding-list"
option to supply encodings in addition to, or instead of, the system
pre-defined ones. If the -a option is specified, additional encodings
suggested by the encoding auto-detection library are added to the
encoding list. Encodings specified with the -e option have higher
priority than automatically detected encodings.
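The Python sketch below illustrates how an encoding list could yield conversion candidates. It tries each encoding in priority order: encodings given with -e come first, then auto-detected ones. It keeps only those encodings that decode the raw bytes without error. This is a hypothetical model, not fsexam's code. The function name `conversion_candidates`, the use of Python codec names such as "gb18030", and the duplicate-removal rule are all assumptions.

```python
def conversion_candidates(raw: bytes, user_encodings: list[str],
                          detected_encodings: list[str]) -> list[tuple[str, str]]:
    """Return (encoding, text) pairs for every encoding that decodes raw cleanly.

    Hypothetical sketch: user-specified encodings (-e) are tried before
    auto-detected ones (-a); an encoding listed twice keeps its first position.
    """
    ordered: list[str] = []
    for enc in user_encodings + detected_encodings:
        if enc.lower() not in [e.lower() for e in ordered]:
            ordered.append(enc)

    candidates: list[tuple[str, str]] = []
    for enc in ordered:
        try:
            text = raw.decode(enc)  # strict: any invalid byte sequence raises
        except (UnicodeDecodeError, LookupError):
            continue  # failed conversion or unknown codec: not a candidate
        candidates.append((enc, text))
    return candidates


legacy_name = "中文".encode("gb18030")
found = conversion_candidates(legacy_name, ["gb18030", "big5"], ["utf-8", "gb18030"])
print([enc for enc, _ in found])
```

In this model, a batch (-b) run would take the first pair, and an interactive run would offer all of them. "utf-8" never appears because these bytes are not valid UTF-8.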
OPTIONS
The following options are supported:
-a
--auto-detect
Enables encoding auto-detection. fsexam guesses the
encodings of file names or file contents by using the
interfaces of an encoding auto-detection library. Use
this option when you do not know the encodings of the
files.
Note that for file name conversions, statistics-based
auto-detection may be unreliable because file names
contain only a few characters.
-b
--batch
Batch mode, also known as non-interactive mode. In this
mode, fsexam does not display candidates or wait for the
user's selection or confirmation.
Make sure that your terminal can display UTF-8
characters correctly when using this option. Otherwise,
illegible or garbled characters may be displayed.
-d dry-run-result-file
--dry-run-result dry-run-result-file
Specifies the dry run result file. With the -n option,
the dry run result is stored in this file. Without the
-n option, fsexam performs the conversions described in
the supplied dry run result file.
The dry run result file is created if it does not
exist. If it exists as a regular file, it is truncated
to zero length and overwritten.
After fsexam creates a dry run result file, you can edit
it and then feed it back to fsexam to perform
conversions based on the edited content. Edit the file
carefully so that its format is preserved. The
recommended edit is to delete any wrong or inappropriate
candidates and to make the correct one the first
candidate. For more information, refer to fsexam(4).
If the edited file does not conform to the file format
described in fsexam(4), fsexam prints a warning message
and exits without doing anything.
-E module-name
--enable-module module-name
Enables special file handling. Currently, the only
available module is "COMPRESS". Specify "ALL" to enable
all available modules.
The COMPRESS module supports several popular compressed
and archive file formats: currently .tar, .tar.gz,
.tar.bz2, .zip, and .tar.Z. With the -t option, fsexam
converts the contents of the files stored in archived
or compressed files. Without -t, fsexam converts file
names.
Note that the COMPRESS module ignores symbolic links
inside archived or compressed files, and it also ignores
the -n option. The COMPRESS module handles archived or
compressed files only if the -R option is specified. If
no suitable ISO8859-1 locale is available on the system,
this option is not supported, as described in the NOTES
section.
-e encoding-list
--encoding-list encoding-list
Specifies one or more encodings, separated by colons or
commas, to use during conversion.
If neither this option nor -a is specified, fsexam uses
the system pre-defined encoding list for the current
locale.
If this option is specified without -a, -p, or -P, the
supplied list of encodings replaces the system
pre-defined encoding list for this session. Use -p to
place it before (prepend it to) the system pre-defined
encoding list, or -P to place it after (append it to)
that list. To make the encoding list permanent instead
of applying it to the current session only, use the -S
option.
When used with the -a option, fsexam merges the supplied
encoding list with the auto-detected encodings. Note
that the supplied encodings have higher priority than
the auto-detected ones.
In non-interactive mode, fsexam uses the first encoding
that successfully converts the file name or file content
to UTF-8. In interactive mode, fsexam displays every
candidate that was successfully converted to UTF-8 from
an encoding in the list. Encodings that fail to convert
are not displayed as candidates.
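The way -e, -p, -P, and -a combine with the system pre-defined list can be modeled as follows. This is a hypothetical sketch: the function `effective_encoding_list` and its parameters are invented. Placing auto-detected encodings after all other entries is an assumption derived only from the priority rule above.

```python
from typing import Optional


def effective_encoding_list(system: list[str],
                            user: Optional[list[str]] = None,
                            detected: Optional[list[str]] = None,
                            prepend: bool = False,
                            append: bool = False) -> list[str]:
    """Model how -e, -p, -P and -a could combine into one ordered list.

    Hypothetical sketch; placing auto-detected encodings last is an assumption.
    """
    if user:
        if prepend:                 # -e with -p: user list goes first
            base = user + system
        elif append:                # -e with -P: user list goes last
            base = system + user
        else:                       # -e alone replaces the system list
            base = list(user)
    else:
        base = list(system)         # -p / -P have no effect without -e
    if detected:                    # -a: detected encodings rank below -e
        base = base + detected

    result: list[str] = []
    for enc in base:                # drop duplicates, keeping the first position
        if enc not in result:
            result.append(enc)
    return result


system_list = ["GB18030", "BIG5", "EUC-TW"]
print(effective_encoding_list(system_list, user=["GBK"], prepend=True))
print(effective_encoding_list(system_list, user=["GBK"], append=True))
```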
-F
--force-convert
Forced conversion mode. By default, fsexam checks
whether a file name or file content is already in UTF-8
and, if it is, does not convert it. However, because
fsexam has no completely accurate way to determine
whether a string is in UTF-8, a byte sequence in a
legacy encoding is sometimes accepted as a valid UTF-8
string. For example, three Simplified Chinese characters
in GB2312 (two bytes per character) could be treated as
two valid UTF-8 characters (three bytes per character).
Use this option to bypass the verification step and
force the conversion.
Use this option with caution, and avoid combining it
with -R whenever possible: it may convert file names or
contents that are genuinely UTF-8 into unintended
characters.
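Python's built-in codecs can reproduce the ambiguity that -F works around. The six bytes below were chosen for this illustration and are not taken from fsexam. Read as GB2312, they form three two-byte characters. The same bytes also form two well-formed three-byte UTF-8 characters. A check that merely attempts a strict UTF-8 decode therefore accepts them; `looks_like_utf8` is a hypothetical helper for such a check, not fsexam's own test.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Naive validity test: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes picked for this illustration: 3 GB2312 characters = 2 UTF-8 characters.
raw = bytes([0xE4, 0xB8, 0xB0, 0xE4, 0xB0, 0xA1])

as_gb2312 = raw.decode("gb2312")  # read in pairs: E4B8, B0E4, B0A1
as_utf8 = raw.decode("utf-8")     # read in triples: E4B8B0, E4B0A1

print(len(as_gb2312), len(as_utf8), looks_like_utf8(raw))
```

Both decodes succeed. Byte-level validity alone therefore cannot tell which reading the author of the name intended.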
-f 'expression'
--find-expression 'expression'
Searches for files according to 'expression'. The
'expression' here is a subset of the expression used by
find(1). Unlike find(1), however, the first item of
'expression' must be the path name of the starting point
of the directory hierarchy to search. The path name can
be followed by these primaries and their combinations:
-name, -amin, -atime, -cmin, -ctime, -group, -mmin,
-mtime, -user. Refer to find(1) for more information.
Internally, fsexam uses find(1) to perform the search.
Enclose the whole expression in single quotes, because
the shell may expand special characters in it if you use
double quotes.
When this option is used, all operands are ignored.
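A sketch of how a -f expression could be checked and turned into a find(1) argument vector follows. It assumes only the rules stated above: the path comes first, and each following item is one of the listed primaries plus its argument. `find_argv` and `ALLOWED_PRIMARIES` are hypothetical names. Operators that combine primaries (such as -o or !) are deliberately not modeled.

```python
import shlex

# Primaries listed in the -f description above.
ALLOWED_PRIMARIES = {"-name", "-amin", "-atime", "-cmin", "-ctime",
                     "-group", "-mmin", "-mtime", "-user"}


def find_argv(expression: str) -> list[str]:
    """Validate a -f style expression and build an argv list for find(1).

    Hypothetical sketch: the starting path must come first, and every
    following item must be an allowed primary plus one argument.
    """
    items = shlex.split(expression)  # honors the quotes inside the expression
    if not items or items[0].startswith("-"):
        raise ValueError("expression must start with a path name")
    path, rest = items[0], items[1:]
    for i in range(0, len(rest), 2):
        primary = rest[i]
        if primary not in ALLOWED_PRIMARIES:
            raise ValueError(f"unsupported item: {primary}")
        if i + 1 >= len(rest):
            raise ValueError(f"{primary} needs an argument")
    return ["find", path, *rest]


print(find_argv('. -name "*.txt" -mtime -7'))
```

Passing the list form (rather than a single shell string) to a process launcher avoids a second round of shell expansion.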
-g history-length
--set-history-length history-length
Sets the history length. fsexam saves information about
the operations it has performed and uses that
information for restore operations.
By default, fsexam saves history information for the
last 100 fsexam executions, as long as disk space
permits. A single batch conversion counts as one
execution. Use this option to change the default value.
If you reduce the length, the oldest history information
is purged. When the number of history entries reaches
the limit, fsexam discards the oldest entry to record
the new one.
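The retention policy described above behaves like a bounded queue. The following sketch models it with `collections.deque` and its `maxlen` argument. The `History` class is hypothetical and says nothing about how fsexam actually stores history on disk.

```python
from collections import deque


class History:
    """Hypothetical model of the history: one entry per fsexam execution."""

    def __init__(self, length: int = 100):  # 100 is the documented default
        self._entries: deque = deque(maxlen=length)

    def record(self, entry) -> None:
        # Appending to a full deque with maxlen drops its oldest item,
        # like discarding the oldest history entry.
        self._entries.append(entry)

    def set_length(self, length: int) -> None:
        # Rebuilding with a smaller maxlen keeps only the newest entries,
        # so the older history is purged.
        self._entries = deque(self._entries, maxlen=length)

    def entries(self) -> list:
        return list(self._entries)


history = History(length=3)
for run in ["run1", "run2", "run3", "run4"]:
    history.record(run)
print(history.entries())  # ['run2', 'run3', 'run4']
```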
-H
--hidden
Handles hidden files. Unless this option is specified,
hidden files, whose names start with a dot (.), are
ignored.
-k
--no-check-symlink-content
By default, during file name conversions, if both a
symbolic link and its source are included in the
user-supplied list of files, or under a starting
directory given as an operand, fsexam tries to keep them
consistent. That is, if the source name is converted,
fsexam converts not only the name of the symbolic link
itself, when applicable, but also the content of the
symbolic link. If the source name is not converted for
some reason, the content of the corresponding symbolic
link is not converted either, and a warning message is
issued. If either the link or its source is not in the
operand list, fsexam may break the symbolic link.
This default symbolic link processing requires more
resources and computation time, so use the -k option to
bypass it if you have no symbolic links.
During content conversions and dry runs, fsexam does not
examine symbolic link contents.
-l
--list-avail-encoding
Lists all encodings supported by fsexam.
-L log-file
--log-file log-file
Writes log messages to log-file. By default, no log
file is written.
The basic log file format is:
(category) fullpath: message
The possible "category" values are "ERROR", "WARNING",
and "INFO". "fullpath" is the full path of the file
being handled, and "message" briefly describes the
result of the operation.
If "fullpath" or "message" contains non-UTF-8
characters, fsexam writes their byte values in
hexadecimal, each prefixed with "\x", for example
"\xAE\x89".
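The log line format above can be sketched with a custom codec error handler. This model rests on an assumption: valid UTF-8 characters are written as-is, and only undecodable bytes are escaped, in upper-case hex to match the "\xAE\x89" example. Python's built-in "backslashreplace" handler would produce lower-case hex. The handler name "fsexam_hex_sketch" and `format_log_line` are hypothetical.

```python
import codecs


def _hex_escape(exc: UnicodeError):
    """Error handler: replace each undecodable byte with \\xHH (upper-case hex)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(f"\\x{b:02X}" for b in bad), exc.end


codecs.register_error("fsexam_hex_sketch", _hex_escape)

CATEGORIES = ("ERROR", "WARNING", "INFO")


def format_log_line(category: str, fullpath: bytes, message: bytes) -> str:
    """Build one "(category) fullpath: message" line (hypothetical helper)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    path = fullpath.decode("utf-8", errors="fsexam_hex_sketch")
    text = message.decode("utf-8", errors="fsexam_hex_sketch")
    return f"({category}) {path}: {text}"


print(format_log_line("INFO", b"/export/home/user/\xae\x89.txt", b"converted"))
```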
-n
--dry-run
Dry run mode. In this mode, fsexam writes conversion
information to the dry-run-result-file specified with
the -d option instead of actually converting the file
names or contents.
If used with the -a option, the dry-run-result-file may
contain more candidates.
Note that compressed or archived files are not supported
in this mode, and the consistency between symbolic links
and their sources is not maintained.
-P
--append-encoding-list
When used with the -e option, fsexam appends
encoding-list to the system pre-defined encoding list.
Otherwise, it has no effect.
-p
--prepend-encoding-list
When used with the -e option, fsexam prepends
encoding-list to the system pre-defined encoding list.
Otherwise, it has no effect.
-R
--recursive
Recursive mode. In this mode, fsexam recursively
converts all applicable files and subdirectories under
the directories specified as operands.
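The traversal implied by -R, and its interaction with -H and -w, can be sketched with `os.walk`. `iter_paths` is a hypothetical helper. Skipping every name that starts with "." unless hidden=True, and mapping -w to `followlinks=True`, are assumptions based on the descriptions of those options.

```python
import os


def iter_paths(root: str, hidden: bool = False, follow: bool = False):
    """Yield paths under root that a recursive (-R) run might visit.

    Hypothetical sketch: names starting with "." are skipped unless
    hidden=True (like -H); follow=True descends into symbolic links to
    directories (like -w).
    """
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow):
        if not hidden:
            # Pruning dirnames in place stops os.walk from descending into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


for path in iter_paths("."):
    print(path)
```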
-r
--remote
With this option, fsexam also handles files on NFS and
other remote file systems. Without this option, fsexam
handles files on local disks only.
Avoid mounting or unmounting file systems within a
directory hierarchy that fsexam is examining while it is
running.
-S
--save-encoding-list
By default, the encoding-list argument of the -e option
is used only for the current session. If this option is
specified, fsexam makes the encoding-list argument
permanent. This may override the default, system
pre-defined encoding list. To avoid that, combine this
option with -p or -P to prepend or append the list,
respectively.
-s
--restore
Restores file names to their original names. To restore
file contents instead, also specify the -t option.
This option is useful for restoring files to their
previous state after wrong conversions have been made.
When this option is used on a file, fsexam restores its
name or content. When it is used on a directory together
with the -R option, fsexam restores the directory and
all files and subdirectories under it to their original
names or contents.
-t
--conv-content
Converts file contents rather than file names. By
default, fsexam handles plain text files only.
Internally, fsexam uses file(1) to determine whether a
file is a plain text file.
If any files or directories have multibyte characters in
their names, convert the file names before converting
the contents. Otherwise, you may get illegible
characters in your log-file or dry-run-result-file.
-w
--follow
If specified with -R, fsexam follows symbolic links to
directories as if they were regular directories. If -R
is not specified, fsexam converts only the symbolic
links and their sources; if a source is itself a
symbolic link, fsexam continues with that link's source,
and so on. By default, fsexam does not follow symbolic
links.
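Without -R, the chain of sources that -w walks through can be sketched with `os.readlink`. `symlink_chain` and its `max_hops` guard against link loops are hypothetical. Relative link targets are resolved against the directory that contains the link, as the kernel does.

```python
import os


def symlink_chain(path: str, max_hops: int = 40) -> list[str]:
    """Return path followed by each successive symbolic link source.

    Hypothetical sketch of the -w behavior without -R; raises OSError on
    link loops or overly long chains.
    """
    chain = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            return chain  # reached a non-link (or a missing) source
        target = os.readlink(current)
        # A relative target is relative to the link's own directory;
        # os.path.join keeps an absolute target unchanged.
        current = os.path.join(os.path.dirname(current), target)
        chain.append(current)
    raise OSError(f"too many levels of symbolic links: {path}")


print(symlink_chain("."))
```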
-V
--version
Print the version number of fsexam and halt.
-?
--help
Print usage information and halt.
OPERANDS
The following operand is supported:
pathname The pathname of a file or a directory to be converted.
All arguments after "--" are treated as operands, even
if they begin with a '-' character. If fsexam encounters
'-' as an operand, or if no operand is given, fsexam
reads pathnames from the standard input.
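The operand rules above can be modeled as follows.

- The set of options that take an argument is copied from the SYNOPSIS.
- The function `collect_pathnames` is hypothetical.
- Several behaviors are assumptions:
  - A literal "-" after "--" is treated as a file name.
  - Option-argument parsing is simplified.
  - The -f rule (operands ignored) is not modeled.

```python
import io
from typing import Iterable

# Options from the SYNOPSIS that take an argument (short and long forms).
OPTIONS_WITH_ARGUMENT = {
    "-d", "-E", "-e", "-f", "-g", "-L",
    "--dry-run-result", "--enable-module", "--encoding-list",
    "--find-expression", "--set-history-length", "--log-file",
}


def collect_pathnames(argv: list[str], stdin: Iterable[str]) -> list[str]:
    """Hypothetical sketch of the operand rules described above."""
    operands: list[str] = []
    use_stdin = False
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            operands.extend(argv[i + 1:])  # everything after "--" is an operand
            break
        if arg == "-":
            use_stdin = True               # "-" asks for pathnames on stdin
        elif arg in OPTIONS_WITH_ARGUMENT:
            i += 1                         # skip the option's argument
        elif not arg.startswith("-"):
            operands.append(arg)
        # any other argument starting with "-" is a flag such as -R or --batch
        i += 1
    if use_stdin or not operands:
        operands.extend(line.rstrip("\n") for line in stdin if line.strip())
    return operands


print(collect_pathnames(["--batch"], io.StringIO("a.txt\nb.txt\n")))  # ['a.txt', 'b.txt']
```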
EXAMPLES
Example 1: Convert the name of a file
The following will convert the name of file "myfile" using the system
pre-defined encoding list:
example% fsexam myfile
If there is no pre-defined encoding for the current locale, fsexam will
exit with error messages.
Example 2: Recursively convert the names of files and subdirectories
under the directory "mydir" with the given encoding list
example% fsexam -e GB18030:BIG5:EUC-TW --recursive mydir
Example 3: Dry run fsexam with auto-detected encoding
The following scans the directory "mydir", tries to convert the file
and directory names under it to UTF-8 using the system pre-defined plus
auto-detected encodings, and stores the result in the file
"mydryrunresult" without actually changing the names:
example% fsexam --auto-detect --dry-run -d mydryrunresult \
--recursive mydir
Example 4: Perform scenario based conversions using a dry run result
file
The following performs scenario-based conversions by using the dry run
result file "mydryrunresult". The first candidate for each file name is
used. If there is no candidate for a file, no action is taken on it:
example% fsexam -d mydryrunresult
Example 5: Forcibly convert a file name
The following converts the name of the file "myfile" by using the
system pre-defined encodings even if fsexam determines that the name is
already in UTF-8. This option should be used with caution because it may
corrupt file names and contents that are already in UTF-8:
example% fsexam --force-convert myfile
Example 6: Convert files generated by other utility
The following two commands have the same effect: they convert the names
of the files listed by the find(1) command by using the system
pre-defined and auto-detected encodings:
example% /usr/bin/find . -name "*.txt" | fsexam --auto-detect
example% fsexam --auto-detect `/usr/bin/find . -name "*.txt"`
The following is similar to the two examples above, except that it uses
only the system pre-defined encodings and takes the list of files from
ls(1):
example% /usr/bin/ls *.txt | fsexam
The following searches for all files whose names end with '.txt' under
the current directory and converts their names by using the system
pre-defined encoding list:
example% fsexam -f '. -name "*.txt"'
Example 7: Batch mode conversion
The following recursively converts the file names under the directory
"mydir" by using GB18030 and BIG5, without prompting, and uses the first
candidate for each file name:
example% fsexam --batch -e GB18030:BIG5 --recursive mydir
Example 8: Follow symbolic links and handle hidden files
The following recursively converts file names under the directory
"mydir", following all symbolic links to directories, including those
found in the directories the links point to. Hidden files under the
directory are also converted:
example% fsexam --follow --hidden --recursive mydir
Example 9: Convert file contents recursively using specified encoding
list
The following recursively scans the files under the directory "mydir".
For each plain text file, fsexam auto-detects its possible encodings,
combines them with GB18030 and BIG5 (which take priority over the
detected encodings), and tries to convert the file by using these
encodings one by one. Once a conversion succeeds, fsexam is done with
the file and the remaining encodings are not tried. If a file is
compressed or archived, fsexam first extracts it into a temporary
directory, performs the operation above, archives and compresses the
result again, and replaces the original file:
example% fsexam --conv-content --recursive -e GB18030:BIG5 \
--auto-detect --enable-module COMPRESS mydir
Example 10: Restore a file name or a file content
The following restores the file "myfile" to its original name:
example% fsexam --restore myfile
The following restores the content of "myfile" to its original content:
example% fsexam --conv-content --restore myfile
EXIT STATUS
The following exit values are returned:
0 File names or contents were converted successfully, or the
corresponding information was written to a dry run result file
successfully.
>0 An error occurred. More information can be found in the log file
if the "-L log-file" option was specified.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
┌─────────────────────────────┬─────────────────────────────┐
│ ATTRIBUTE TYPE │ ATTRIBUTE VALUE │
├─────────────────────────────┼─────────────────────────────┤
│Availability │SUNWfsexam │
├─────────────────────────────┼─────────────────────────────┤
│Interface stability │Committed │
└─────────────────────────────┴─────────────────────────────┘
SEE ALSO
file(1), find(1), locale(1), tar(1), libauto_ef(3LIB), fsexam(4)
NOTES
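When you want to convert the names of many files, do not convert them
one by one in a loop. Instead, construct a list of the files and pass
the whole list to fsexam in a single invocation. Each fsexam execution
is recorded as a separate history entry (see -g), and a symbolic link
can be kept consistent with its source only when both are in the
operand list (see -k). For example, the following is not recommended:
for file in *
do
fsexamc -b "$file"
done
Instead, pass all of the files at once:
fsexamc -b *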
It is highly recommended to run this utility in a UTF-8 locale.
Otherwise, you may see illegible or garbled characters. Because fsexam
has system pre-defined lists of the most popular encodings for every
language, and UTF-8 locales provide the best multiscript capability,
fsexam works most smoothly in the UTF-8 locale of your language.
As described in the NOTES section of the tar(1) man page, if an archive
contains files whose names were created by processes running in
multiple or different locales, a locale that uses the full 8-bit coding
space (0x00 to 0xff), such as en_US.ISO8859-1, should be used both to
create the archive and to extract files from it. For that reason, when
you enable the COMPRESS module with the -E option, fsexam(1) tries to
use the en_US.ISO8859-1, fr_FR.ISO8859-1, de_DE.ISO8859-1,
es_ES.ISO8859-1, it_IT.ISO8859-1, or sv_SE.ISO8859-1 locale. If none of
these locales is available on the current system, the -E option is
ignored and a warning message is issued.
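Python's "latin-1" codec, its name for ISO8859-1, shows why a full 8-bit locale preserves archived names. Every byte from 0x00 to 0xff maps to exactly one character, so any legacy-encoded name survives a decode and re-encode unchanged. A UTF-8 interpretation rejects the same bytes. This demonstrates byte transparency only. It is not how fsexam or tar(1) is implemented, and `survives_round_trip` is a hypothetical helper.

```python
def survives_round_trip(raw: bytes, codec: str) -> bool:
    """True if decoding raw with codec and encoding it back gives raw again."""
    try:
        return raw.decode(codec).encode(codec) == raw
    except UnicodeError:
        return False


every_byte = bytes(range(256))              # the full 8-bit coding space
legacy_name = "中文.txt".encode("gb18030")  # a name created in a GB18030 locale

print(survives_round_trip(every_byte, "latin-1"))   # each byte maps to one character
print(survives_round_trip(legacy_name, "latin-1"))
print(survives_round_trip(legacy_name, "utf-8"))    # rejected: not valid UTF-8
```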
SunOS 5.7 16 Apr 2007 fsexam(1)