2 .\" $Id: file.man,v 1.57 2005/08/18 15:18:22 christos Exp $
4 .Dt FILE 1 "Copyright but distributable"
8 .Nd determine file type
14 .Op Fl m Ar magicfiles
20 This manual page documents version 4.21 of the
22 utility which tests each argument in an attempt to classify it.
23 There are three sets of tests, performed in this order:
24 file system tests, magic number tests, and language tests.
27 test that succeeds causes the file type to be printed.
29 The type printed will usually contain one of the words
31 (the file contains only
32 printing characters and a few common control
33 characters and is probably safe to read on an
37 (the file contains the result of compiling a program
38 in a form understandable to some
43 meaning anything else (data is usually
46 Exceptions are well-known file formats (core files, tar archives)
47 that are known to contain binary data.
48 When modifying the file
49 .Pa /usr/share/misc/magic
50 or the program itself,
51 .Em "preserve these keywords" .
52 People depend on knowing that all the readable files in a directory
56 Do not do as Berkeley did and change
57 .Dq Li "shell commands text"
59 .Dq Li "shell script" .
61 .Pa /usr/share/misc/magic
62 is built mechanically from a large number of small files in
65 in the source distribution of this program.
67 The file system tests are based on examining the return from a
70 The program checks to see if the file is empty,
71 or if it is some sort of special file.
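.Pp
As a rough illustration only, the checks described so far can be sketched in Python.
The name
.Li filesystem_type
and the strings it returns are hypothetical and are not taken from this program;
the sketch assumes a system that provides
.Xr lstat 2 .
.Bd -literal -offset indent
```python
import os
import stat


def filesystem_type(path):
    """Return a description for empty or special files, or None.

    Hypothetical helper: the wording of the results is illustrative.
    """
    info = os.lstat(path)  # lstat, so that a symbolic link is not followed
    mode = info.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and info.st_size == 0:
        return "empty"
    return None  # an ordinary non-empty file: go on to the later tests
```
.Ed
.Pp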
72 Any known file types appropriate to the system you are running on
73 (sockets, symbolic links, or named pipes (FIFOs) on those systems that
75 are intuited if they are defined in
76 the system header file
79 The magic number tests are used to check for files with data in
80 particular fixed formats.
81 The canonical example of this is a binary executable (compiled program)
83 file, whose format is defined in
87 in the standard include directory.
90 stored in a particular place
91 near the beginning of the file that tells the
94 that the file is a binary executable, and which of several types thereof.
97 has been applied by extension to data files.
98 Any file with some invariant identifier at a small fixed
99 offset into the file can usually be described in this way.
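.Pp
The idea of an invariant identifier at a fixed offset can be sketched in Python as follows.
This is a minimal illustration, not the matching code used by this program:
the table
.Li MAGIC_TABLE ,
its three entries and the function
.Li magic_type
are assumptions chosen for the example.
.Bd -literal -offset indent
```python
# Hypothetical table: (offset, expected bytes, description).
# Byte 127 followed by "ELF", the "%PDF-" prefix, and "ustar" at
# offset 257 are well-known signatures.
MAGIC_TABLE = [
    (0, bytes([127]) + b"ELF", "ELF"),
    (0, b"%PDF-", "PDF document"),
    (257, b"ustar", "tar archive"),
]


def magic_type(data):
    """Return the description of the first matching entry, or None."""
    for offset, expected, description in MAGIC_TABLE:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```
.Ed
.Pp
A real magic file carries many more entries, with typed comparisons and
continuation lines, which this sketch does not model.
.Pp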
100 The information identifying these files is read from the compiled
102 .Pa /usr/share/misc/magic.mgc ,
104 .Pa /usr/share/misc/magic
if the compiled file does not exist.
109 .Pa $HOME/.magic.mgc ,
114 If a file does not match any of the entries in the magic file,
115 it is examined to see if it seems to be a text file.
116 ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
117 (such as those used on Macintosh and IBM PC systems),
118 UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
119 character sets can be distinguished by the different
120 ranges and sequences of bytes that constitute printable text
122 If a file passes any of these tests, its character set is reported.
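.Pp
A much simplified Python sketch of classifying data by byte ranges follows.
It is an assumption-laden illustration, not the algorithm used by this program:
the function
.Li guess_charset
is hypothetical, it covers only ASCII, UTF-8 and ISO-8859-x,
and the set of accepted control characters is a guess at
.Dq a few common control characters .
.Bd -literal -offset indent
```python
ALLOWED_CONTROLS = {9, 10, 12, 13, 27}  # tab, LF, FF, CR, escape


def is_text_byte(value):
    """True for printable ASCII and the accepted control characters."""
    return value in ALLOWED_CONTROLS or 32 <= value < 127


def guess_charset(data):
    """Classify bytes as ASCII, UTF-8 or ISO-8859 text, else None."""
    if all(is_text_byte(b) for b in data):
        return "ASCII"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and all(
        ord(c) in ALLOWED_CONTROLS or (ord(c) >= 32 and ord(c) != 127)
        for c in text
    ):
        return "UTF-8"
    # ISO-8859-x assigns no printable characters to bytes 128-159.
    if all(is_text_byte(b) or b >= 160 for b in data):
        return "ISO-8859"
    return None  # not recognisable as text in these character sets
```
.Ed
.Pp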
123 ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
126 because they will be mostly readable on nearly any terminal;
127 UTF-16 and EBCDIC are only
128 .Dq Li "character data"
130 they contain text, it is text that will require translation
131 before it can be read.
134 will attempt to determine other characteristics of text-type files.
135 If the lines of a file are terminated by CR, CRLF, or NEL, instead
138 LF, this will be reported.
139 Files that contain embedded escape sequences or overstriking
140 will also be identified.
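.Pp
Detection of line terminators can be sketched in Python as below.
The function
.Li line_terminators
is hypothetical and handles only CR, LF and CRLF;
NEL is left out because its byte value depends on the character set.
.Bd -literal -offset indent
```python
CR, LF = 13, 10


def line_terminators(data):
    """Return the set of terminator names found in ASCII-like data."""
    found = set()
    index = 0
    while index < len(data):
        value = data[index]
        if value == CR:
            if index + 1 < len(data) and data[index + 1] == LF:
                found.add("CRLF")
                index += 1  # skip the LF that belongs to this CRLF pair
            else:
                found.add("CR")
        elif value == LF:
            found.add("LF")
        index += 1
    return found
```
.Ed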
144 has determined the character set used in a text-type file,
146 attempt to determine in what language the file is written.
The language tests look for particular strings (cf.\&
149 that can appear anywhere in the first few blocks of a file.
150 For example, the keyword
152 indicates that the file is most likely a
154 input file, just as the keyword
156 indicates a C program.
157 These tests are less reliable than the previous
158 two groups, so they are performed last.
159 The language test routines also test for some miscellany
164 Any file that cannot be identified as having been written
165 in any of the character sets listed above is simply said to be
168 .Bl -tag -width indent
170 Do not prepend filenames to output lines (brief mode).
171 .It Fl c , -checking-printout
172 Cause a checking printout of the parsed form of the magic file.
173 This is usually used in conjunction with
175 to debug a new magic file before installing it.
179 output file that contains a pre-parsed version of
181 .It Fl f , -files-from Ar namefile
182 Read the names of the files to be examined from
185 before the argument list.
188 or at least one filename argument must be present;
189 to test the standard input, use
191 as a filename argument.
192 .It Fl F , -separator Ar separator
193 Use the specified string as the separator between the filename and the
194 file result returned.
197 .It Fl h , -no-dereference
198 Causes symlinks not to be followed
199 (on systems that support symbolic links).
200 This is the default if the
Causes the file command to output MIME type strings rather than the more
traditional human-readable ones.
208 .Dq Li "text/plain; charset=us-ascii"
210 .Dq Li "ASCII text" .
211 In order for this option to work, file changes the way
it handles files recognised by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
219 .It Fl k , -keep-going
Do not stop at the first match; keep going.
221 .It Fl L , -dereference
Causes symlinks to be followed, as with the like-named option in
224 (on systems that support symbolic links).
225 This is the default if the environment variable
228 .It Fl m , -magic-file Ar list
229 Specify an alternate list of files containing magic numbers.
230 This can be a single file, or a colon-separated list of files.
231 If a compiled magic file is found alongside, it will be used instead.
236 option, the program adds
239 .It Fl n , -no-buffer
240 Force stdout to be flushed after checking each file.
241 This is only useful if checking a list of files.
242 It is intended to be used by programs that want
243 filetype output from a pipe.
245 Do not pad filenames so that they align in the output.
246 .It Fl p , -preserve-date
247 On systems that support
251 attempt to preserve the access time of files analyzed, to pretend that
255 Do not translate unprintable characters to \eooo.
258 translates unprintable characters to their octal representation.
259 .It Fl s , -special-files
262 only attempts to read and determine the type of argument files which
264 reports are ordinary files.
265 This prevents problems, because reading special files may have peculiar
271 to also read argument files which are block or character special files.
272 This is useful for determining the file system types of the data in raw
273 disk partitions, which are block special files.
274 This option also causes
276 to disregard the file size as reported by
278 since on some systems it reports a zero size for raw disk partitions.
280 Print the version of the program and exit.
281 .It Fl z , -uncompress
282 Try to look inside compressed files.
284 Print a help message and exit.
287 .Bl -tag -width ".Pa /usr/share/misc/magic.mime" -compact
288 .It Pa /usr/share/misc/magic.mgc
289 Default compiled list of magic numbers
290 .It Pa /usr/share/misc/magic
291 Default list of magic numbers
292 .It Pa /usr/share/misc/magic.mime.mgc
Default compiled list of magic numbers, used to output MIME types when
297 .It Pa /usr/share/misc/magic.mime
Default list of magic numbers, used to output MIME types when the
303 The environment variable
305 can be used to set the default magic number file name.
306 If that variable is set, then
308 will not attempt to open
315 to the value of this variable as appropriate.
316 The environment variable
controls (on systems that support symbolic links) whether
320 will attempt to follow symlinks or not.
follows symlinks; otherwise it does not.
324 This is also controlled
335 .Sh STANDARDS CONFORMANCE
336 This program is believed to exceed the
338 of FILE(CMD), as near as one can determine from the vague language
340 Its behaviour is mostly compatible with the System V program of the same name.
341 This version knows more magic, however, so it will produce
342 different (albeit more accurate) output in many cases.
344 The one significant difference
345 between this version and System V
346 is that this version treats any white space
347 as a delimiter, so that spaces in pattern strings must be escaped.
350 .Dl ">10 string language impress\ (imPRESS data)"
352 in an existing magic file would have to be changed to
354 .Dl ">10 string language\e impress (imPRESS data)"
356 In addition, in this version, if a pattern string contains a backslash,
360 .Dl "0 string \ebegindata Andrew Toolkit document"
362 in an existing magic file would have to be changed to
364 .Dl "0 string \e\ebegindata Andrew Toolkit document"
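.Pp
The two conversions above can be sketched as a small Python helper.
The function
.Li escape_pattern
is hypothetical and is not part of this program;
it assumes that only spaces and backslashes in a pattern string need escaping.
The backslash is spelled
.Li chr(92)
so that the code contains no literal backslash.
.Bd -literal -offset indent
```python
BACKSLASH = chr(92)


def escape_pattern(pattern):
    """Escape a System V pattern string for use with this version."""
    # Double the backslashes first, so that the backslashes added for
    # spaces in the next step are not doubled again.
    doubled = pattern.replace(BACKSLASH, BACKSLASH * 2)
    return doubled.replace(" ", BACKSLASH + " ")
```
.Ed
.Pp
Applied to the pattern strings of the two examples, the helper yields the
escaped forms shown in the converted entries.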
366 SunOS releases 3.2 and later from Sun Microsystems include a
368 command derived from the System V one, but with some extensions.
369 My version differs from Sun's only in minor ways.
370 It includes the extension of the
375 .Dl ">16 long&0x7fffffff >0 not stripped"
377 The magic file entries have been collected from various sources,
378 mainly USENET, and contributed by various authors.
380 (address below) will collect additional
381 or corrected magic file entries.
382 A consolidation of magic file entries
383 will be distributed periodically.
385 The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
390 command uses a magic file,
391 keep the old magic file around for comparison purposes
393 .Pa /usr/share/misc/magic.orig ) .
396 $ file file.c file /dev/{wd0a,hda}
397 file.c: C program text
398 file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
399 dynamically linked (uses shared libs), stripped
400 /dev/wd0a: block special (0/0)
401 /dev/hda: block special (3/0)
402 $ file -s /dev/wd0{b,d}
404 /dev/wd0d: x86 boot sector
405 $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
406 /dev/hda: x86 boot sector
407 /dev/hda1: Linux/i386 ext2 filesystem
408 /dev/hda2: x86 boot sector
409 /dev/hda3: x86 boot sector, extended partition table
410 /dev/hda4: Linux/i386 ext2 filesystem
411 /dev/hda5: Linux/i386 swap file
412 /dev/hda6: Linux/i386 swap file
413 /dev/hda7: Linux/i386 swap file
414 /dev/hda8: Linux/i386 swap file
418 $ file -i file.c file /dev/{wd0a,hda}
420 file: application/x-executable, dynamically linked (uses shared libs),
422 /dev/hda: application/x-not-regular-file
423 /dev/wd0a: application/x-not-regular-file
430 since at least Research Version 4
431 (man page dated November, 1973).
The System V version introduced one significant change:
433 the external list of magic number types.
434 This slowed the program down slightly but made it a lot more flexible.
436 This program, based on the System V version,
438 .An Ian Darwin Aq ian@darwinsys.com
439 without looking at anybody else's source code.
442 revised the code extensively, making it better than
445 found several inadequacies
446 and provided some magic file entries.
450 .An Rob McMahon Aq cudcv@warwick.ac.uk ,
453 .An Guy Harris Aq guy@netapp.com ,
454 made many changes from 1993 to the present.
456 Primary development and maintenance from 1990 to the present by
457 .An Christos Zoulas Aq christos@astron.com .
460 .An Chris Lowth Aq chris@lowth.com ,
option to output MIME type strings and using an alternative
465 magic file and internal logic.
468 .An Eric Fischer Aq enf@pobox.com ,
470 to identify character codes and attempt to identify the languages
475 The list of contributors to the
477 directory (source for the
478 .Pa /usr/share/misc/magic
479 file) is too long to include here.
480 You know who you are; thank you.
484 Toronto, Canada, 1986-1999.
485 Covered by the standard Berkeley Software Distribution copyright; see the file
487 in the source distribution.
495 from his public-domain
497 program, and are not covered by the above license.
499 There must be a better way to automate the construction of the
501 file from all the glop in
504 Better yet, the magic file should be compiled into binary (say,
506 or, better yet, fixed-length
strings for use in heterogeneous network environments) for faster startup.
509 Then the program would run as fast as the Version 7 program of the same name,
510 with the flexibility of the System V version.
utility uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of
521 files (primarily for programming languages)
is simplistic and inefficient, and requires recompilation to update.
526 clause to follow a series of continuation lines.
528 The magic file and keywords should have regular expression support.
531 as a field delimiter is ugly and makes
532 it hard to edit the files, but is entrenched.
534 It might be advisable to allow upper-case letters in keywords
537 commands vs man page macros.
538 Regular expression support would make this easy.
540 The program does not grok
542 It should be able to figure
544 by seeing some keywords which
545 appear indented at the start of line.
546 Regular expression support would make this easy.
548 The list of keywords in
550 probably belongs in the
553 This could be done by using some keyword like
555 for the offset value.
557 Another optimisation would be to sort
558 the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc., once we
561 Complain about conflicts in the magic file entries.
562 Make a rule that the magic entries sort based on file offset rather
563 than position within the magic file?
565 The program should provide a way to give an estimate
569 We end up removing guesses (e.g.\&
571 as first 5 chars of file) because
572 they are not as good as other guesses (e.g.\&
575 .Dq Li "Return-Path:" ) .
576 Still, if the others do not pan out, it should be possible to use the
579 This program is slower than some vendors' file commands.
580 The new support for multiple character codes makes it even slower.
582 This manual page, and particularly this section, is too long.
584 You can obtain the original author's latest version by anonymous FTP
588 .Pa /pub/file/file-X.YZ.tar.gz