2 .\" $Id: file.man,v 1.54 2003/10/27 18:09:08 christos Exp $
4 .Dt FILE 1 "Copyright but distributable"
8 .Nd determine file type
14 .Op Fl m Ar magicfiles
20 This manual page documents version 4.12 of the
22 utility which tests each argument in an attempt to classify it.
23 There are three sets of tests, performed in this order:
24 file system tests, magic number tests, and language tests.
27 test that succeeds causes the file type to be printed.
29 The type printed will usually contain one of the words
31 (the file contains only
32 printing characters and a few common control
33 characters and is probably safe to read on an
37 (the file contains the result of compiling a program
38 in a form understandable to some
43 meaning anything else (data is usually
46 Exceptions are well-known file formats (core files, tar archives)
47 that are known to contain binary data.
48 When modifying the file
49 .Pa /usr/share/misc/magic
50 or the program itself,
51 .Em "preserve these keywords" .
52 People depend on knowing that all the readable files in a directory
56 Do not do as Berkeley did and change
57 .Dq Li "shell commands text"
59 .Dq Li "shell script" .
61 .Pa /usr/share/misc/magic
62 is built mechanically from a large number of small files in
65 in the source distribution of this program.
67 The file system tests are based on examining the return from a
70 The program checks to see if the file is empty,
71 or if it is some sort of special file.
72 Any known file types appropriate to the system you are running on
73 (sockets, symbolic links, or named pipes (FIFOs) on those systems that
75 are intuited if they are defined in
76 the system header file
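As a rough illustration of the file system tests, the following Python sketch classifies a path from the result of a stat-family call alone, before any file contents are read.
The function name fs_test, the use of lstat, and the returned strings are assumptions made for illustration; they are not the program's actual code or output.

```python
import os
import stat


def fs_test(path):
    """Classify path from lstat data alone.

    Returns a description, or None for a non-empty regular file
    (hypothetical convention: None means "go on to the content tests").
    """
    st = os.lstat(path)  # assumption: lstat, so a symbolic link is reported as such
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None
```

Only a non-empty regular file falls through to the later, content-based tests in this sketch.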
79 The magic number tests are used to check for files with data in
80 particular fixed formats.
81 The canonical example of this is a binary executable (compiled program)
83 file, whose format is defined in
87 in the standard include directory.
90 stored in a particular place
91 near the beginning of the file that tells the
94 that the file is a binary executable, and which of several types thereof.
97 has been applied by extension to data files.
98 Any file with some invariant identifier at a small fixed
99 offset into the file can usually be described in this way.
100 The information identifying these files is read from the compiled
102 .Pa /usr/share/misc/magic.mgc ,
104 .Pa /usr/share/misc/magic
105 if the compiled file does not exist.
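The idea of an invariant identifier at a small fixed offset can be sketched in a few lines of Python.
The table below is a hypothetical, hand-written stand-in for the magic file, and match_magic is an illustrative name; the real program parses its entries from the magic file and supports many more test types than a plain byte comparison.

```python
def match_magic(data, entries):
    """Return the description of the first entry whose bytes sit at its offset."""
    for offset, pattern, description in entries:
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return None  # no entry matched; later tests would run next


# Hypothetical table for illustration: (offset, identifying bytes, description).
ENTRIES = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]
```

Because entries are tried in order and the first match wins, the order of the table matters in this sketch.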
107 If a file does not match any of the entries in the magic file,
108 it is examined to see if it seems to be a text file.
109 ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
110 (such as those used on Macintosh and IBM PC systems),
111 UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
112 character sets can be distinguished by the different
113 ranges and sequences of bytes that constitute printable text
115 If a file passes any of these tests, its character set is reported.
116 ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
119 because they will be mostly readable on nearly any terminal;
120 UTF-16 and EBCDIC are only
121 .Dq Li "character data"
123 they contain text, it is text that will require translation
124 before it can be read.
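A much simplified sketch of this kind of byte-range classification follows.
It is not the program's algorithm: EBCDIC is omitted, UTF-16 is recognised only by a byte order mark, and the function names, the set of accepted control characters, and the returned labels are assumptions chosen for illustration.

```python
# Control characters assumed to be common in text: BEL, BS, TAB, LF, VT, FF, CR, ESC.
TEXT_CONTROLS = {7, 8, 9, 10, 11, 12, 13, 27}


def is_text_byte(b):
    """True for printable 7-bit characters and the common controls above."""
    return 32 <= b < 127 or b in TEXT_CONTROLS


def guess_charset(data):
    """Guess a character set for data (bytes) from its byte ranges."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):  # UTF-16 byte order mark
        return "UTF-16 character data"
    if all(is_text_byte(b) for b in data):
        return "ASCII text"
    # Any unexpected 7-bit control character rules out text entirely.
    if not all(is_text_byte(b) for b in data if b < 0x80):
        return "data"
    try:
        data.decode("utf-8")
        return "UTF-8 text"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x reserves 0x80-0x9f for control codes, so text avoids that range.
    if all(b >= 0xA0 for b in data if b >= 0x80):
        return "ISO-8859 text"
    return "non-ISO extended-ASCII text"
```

The tests run from the most restrictive character set to the least, so that plain ASCII input is never reported as a larger set that happens to contain it.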
127 will attempt to determine other characteristics of text-type files.
128 If the lines of a file are terminated by CR, CRLF, or NEL, instead
131 LF, this will be reported.
132 Files that contain embedded escape sequences or overstriking
133 will also be identified.
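The line terminator check can be pictured with the sketch below.
It assumes a single-byte character set in which NEL is the byte 0x85, and the function name and returned labels are illustrative rather than taken from the program.

```python
def line_terminators(data):
    """Return the set of line terminator names seen in data (bytes)."""
    found = set()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x0D:
            if data[i + 1:i + 2] == b"\n":
                found.add("CRLF")
                i += 1  # consume the LF that belongs to this CRLF pair
            else:
                found.add("CR")
        elif b == 0x0A:
            found.add("LF")
        elif b == 0x85:  # assumption: NEL as encoded in ISO-8859-x
            found.add("NEL")
        i += 1
    return found
```

A file for which this returns anything other than the set containing only LF would be a candidate for a terminator report.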
137 has determined the character set used in a text-type file,
139 attempt to determine in what language the file is written.
140 The language tests look for particular strings (cf
142 that can appear anywhere in the first few blocks of a file.
143 For example, the keyword
145 indicates that the file is most likely a
147 input file, just as the keyword
149 indicates a C program.
150 These tests are less reliable than the previous
151 two groups, so they are performed last.
152 The language test routines also test for some miscellany
157 Any file that cannot be identified as having been written
158 in any of the character sets listed above is simply said to be
161 .Bl -tag -width indent
163 Do not prepend filenames to output lines (brief mode).
164 .It Fl c , -checking-printout
165 Cause a checking printout of the parsed form of the magic file.
166 This is usually used in conjunction with
168 to debug a new magic file before installing it.
172 output file that contains a pre-parsed version of
174 .It Fl f , -files-from Ar namefile
175 Read the names of the files to be examined from
178 before the argument list.
181 or at least one filename argument must be present;
182 to test the standard input, use
184 as a filename argument.
185 .It Fl F , -separator Ar separator
186 Use the specified string as the separator between the filename and the
187 file result returned.
191 Causes the file command to output MIME type strings rather than the more
192 traditional human-readable ones.
194 .Dq Li "text/plain; charset=us-ascii"
196 .Dq Li "ASCII text" .
197 In order for this option to work, file changes the way
198 it handles files recognised by the command itself (such as many of the
199 text file types, directories, etc.), and makes use of an alternative
205 .It Fl k , -keep-going
206 Do not stop at the first match; keep going.
207 .It Fl L , -dereference
208 option causes symlinks to be followed, as the like-named option in
210 (on systems that support symbolic links).
211 .It Fl m , -magic-file Ar list
212 Specify an alternate list of files containing magic numbers.
213 This can be a single file, or a colon-separated list of files.
214 If a compiled magic file is found alongside, it will be used instead.
219 option, the program adds
222 .It Fl n , -no-buffer
223 Force stdout to be flushed after checking each file.
224 This is only useful if checking a list of files.
225 It is intended to be used by programs that want
226 filetype output from a pipe.
228 Do not pad filenames so that they align in the output.
229 .It Fl p , -preserve-date
230 On systems that support
234 attempt to preserve the access time of files analyzed, to pretend that
238 Do not translate unprintable characters to \eooo.
241 translates unprintable characters to their octal representation.
242 .It Fl s , -special-files
245 only attempts to read and determine the type of argument files which
247 reports are ordinary files.
248 This prevents problems, because reading special files may have peculiar
254 to also read argument files which are block or character special files.
255 This is useful for determining the file system types of the data in raw
256 disk partitions, which are block special files.
257 This option also causes
259 to disregard the file size as reported by
261 since on some systems it reports a zero size for raw disk partitions.
263 Print the version of the program and exit.
264 .It Fl z , -uncompress
265 Try to look inside compressed files.
267 Print a help message and exit.
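The default translation of unprintable characters to their octal representation, mentioned in the option list above, amounts to something like the following sketch.
The function name and the choice of which byte values count as printable are assumptions; the program's own rule may differ.

```python
def escape_unprintable(raw):
    """Render bytes as text, writing each unprintable byte as backslash + 3 octal digits."""
    out = []
    for b in raw:
        if 32 <= b < 127:  # assumption: printable means 7-bit printable ASCII
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)
```

Suppressing the translation simply means emitting the bytes unchanged, which is why the raw form can disturb a terminal.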
270 .Bl -tag -width ".Pa /usr/share/misc/magic.mime" -compact
271 .It Pa /usr/share/misc/magic.mgc
272 Default compiled list of magic numbers
273 .It Pa /usr/share/misc/magic
274 Default list of magic numbers
275 .It Pa /usr/share/misc/magic.mime.mgc
276 Default compiled list of magic numbers, used to output MIME types when
280 .It Pa /usr/share/misc/magic.mime
281 Default list of magic numbers, used to output MIME types when the
285 Local additions to magic wisdom.
288 The environment variable
290 can be used to set the default magic number file name.
296 to the value of this variable as appropriate.
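One way to picture how a magic file specification is resolved is the sketch below, which splits a colon-separated list and prefers a compiled file found alongside each entry.
The function name resolve_magic and its exists parameter are hypothetical, and the handling of the MIME variants is left out.

```python
import os


def resolve_magic(spec, exists=os.path.exists):
    """Return the files to load for a colon-separated magic specification.

    exists is a hypothetical hook so the lookup can be tried without real files.
    """
    chosen = []
    for name in spec.split(":"):
        if not name:
            continue  # ignore empty components such as a trailing colon
        compiled = name + ".mgc"
        # A compiled file found alongside takes precedence over the text file.
        chosen.append(compiled if exists(compiled) else name)
    return chosen
```

Passing a custom exists function makes it easy to check which file would be chosen for a given layout without touching the file system.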
302 .Sh STANDARDS CONFORMANCE
303 This program is believed to exceed the
305 of FILE(CMD), as near as one can determine from the vague language
307 Its behaviour is mostly compatible with the System V program of the same name.
308 This version knows more magic, however, so it will produce
309 different (albeit more accurate) output in many cases.
311 The one significant difference
312 between this version and System V
313 is that this version treats any white space
314 as a delimiter, so that spaces in pattern strings must be escaped.
317 .Dl ">10 string language impress\ (imPRESS data)"
319 in an existing magic file would have to be changed to
321 .Dl ">10 string language\e impress (imPRESS data)"
323 In addition, in this version, if a pattern string contains a backslash,
327 .Dl "0 string \ebegindata Andrew Toolkit document"
329 in an existing magic file would have to be changed to
331 .Dl "0 string \e\ebegindata Andrew Toolkit document"
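The effect of treating any white space as a delimiter, with a backslash protecting the following character, can be sketched as below.
This is an illustrative splitter, not the program's parser: the function name is an assumption, and every escaped character is kept literally, whereas a real magic file gives some escape sequences special meanings.

```python
def split_magic_line(line):
    """Split a magic entry into [offset, type, test, message].

    White space separates the first three fields; a backslash makes the
    next character part of the current field, so escaped spaces and
    escaped backslashes survive inside the test string.
    """
    fields = []
    current = []
    i = 0
    n = len(line)
    while i < n and len(fields) < 3:
        ch = line[i]
        if ch == "\\" and i + 1 < n:
            current.append(line[i + 1])  # keep the escaped character literally
            i += 2
        elif ch.isspace():
            if current:
                fields.append("".join(current))
                current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        fields.append("".join(current))
    message = line[i:].strip()  # everything after the test field is the message
    return fields + [message]
```

With the escaped form of the first entry shown earlier, the test field comes out as the two words joined by a single space; without the escape, the second word would be taken as the start of the message.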
333 SunOS releases 3.2 and later from Sun Microsystems include a
335 command derived from the System V one, but with some extensions.
336 My version differs from Sun's only in minor ways.
337 It includes the extension of the
342 .Dl ">16 long&0x7fffffff >0 not stripped"
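Read literally, the entry above masks a long value taken from offset 16 before comparing it with zero.
The sketch below shows that arithmetic under stated assumptions: a 4-byte signed long, an explicit little-endian default, and a hypothetical function name; the program itself is not claimed to work this way internally.

```python
import struct


def masked_long_gt(data, offset, mask, threshold, byteorder="<"):
    """True if the 4-byte long at offset, ANDed with mask, exceeds threshold."""
    if len(data) < offset + 4:
        return False  # too short to hold the field, so the test cannot match
    (value,) = struct.unpack_from(byteorder + "l", data, offset)
    return (value & mask) > threshold
```

With the mask 0x7fffffff the top bit is ignored, so a value in which only that bit is set does not satisfy the test.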
344 The magic file entries have been collected from various sources,
345 mainly USENET, and contributed by various authors.
347 (address below) will collect additional
348 or corrected magic file entries.
349 A consolidation of magic file entries
350 will be distributed periodically.
352 The order of entries in the magic file is significant.
353 Depending on what system you are using, the order in which
354 they are put together may be incorrect.
357 command uses a magic file,
358 keep the old magic file around for comparison purposes
360 .Pa /usr/share/misc/magic.orig ) .
363 $ file file.c file /dev/{wd0a,hda}
364 file.c: C program text
365 file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
366 dynamically linked (uses shared libs), stripped
367 /dev/wd0a: block special (0/0)
368 /dev/hda: block special (3/0)
369 $ file -s /dev/wd0{b,d}
371 /dev/wd0d: x86 boot sector
372 $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
373 /dev/hda: x86 boot sector
374 /dev/hda1: Linux/i386 ext2 filesystem
375 /dev/hda2: x86 boot sector
376 /dev/hda3: x86 boot sector, extended partition table
377 /dev/hda4: Linux/i386 ext2 filesystem
378 /dev/hda5: Linux/i386 swap file
379 /dev/hda6: Linux/i386 swap file
380 /dev/hda7: Linux/i386 swap file
381 /dev/hda8: Linux/i386 swap file
385 $ file -i file.c file /dev/{wd0a,hda}
387 file: application/x-executable, dynamically linked (uses shared libs),
389 /dev/hda: application/x-not-regular-file
390 /dev/wd0a: application/x-not-regular-file
397 since at least Research Version 4
398 (man page dated November, 1973).
399 The System V version introduced one significant change:
400 the external list of magic number types.
401 This slowed the program down slightly but made it a lot more flexible.
403 This program, based on the System V version,
405 .An Ian Darwin Aq ian@darwinsys.com
406 without looking at anybody else's source code.
409 revised the code extensively, making it better than
412 found several inadequacies
413 and provided some magic file entries.
417 .An Rob McMahon Aq cudcv@warwick.ac.uk ,
420 .An Guy Harris Aq guy@netapp.com ,
421 made many changes from 1993 to the present.
423 Primary development and maintenance from 1990 to the present by
424 .An Christos Zoulas Aq christos@astron.com .
427 .An Chris Lowth Aq chris@lowth.com ,
431 option to output MIME type strings and using an alternative
432 magic file and internal logic.
435 .An Eric Fischer Aq enf@pobox.com ,
437 to identify character codes and attempt to identify the languages
442 The list of contributors to the
444 directory (source for the
445 .Pa /usr/share/misc/magic
446 file) is too long to include here.
447 You know who you are; thank you.
451 Toronto, Canada, 1986-1999.
452 Covered by the standard Berkeley Software Distribution copyright; see the file
454 in the source distribution.
462 from his public-domain
464 program, and are not covered by the above license.
466 There must be a better way to automate the construction of the
468 file from all the glop in
471 Better yet, the magic file should be compiled into binary (say,
473 or, better yet, fixed-length
475 strings for use in heterogeneous network environments) for faster startup.
476 Then the program would run as fast as the Version 7 program of the same name,
477 with the flexibility of the System V version.
481 utility uses several algorithms that favor speed over accuracy,
482 and so it can be misled about the contents of
488 files (primarily for programming languages)
489 is simplistic, inefficient and requires recompilation to update.
493 clause to follow a series of continuation lines.
495 The magic file and keywords should have regular expression support.
498 as a field delimiter is ugly and makes
499 it hard to edit the files, but is entrenched.
501 It might be advisable to allow upper-case letters in keywords
504 commands vs man page macros.
505 Regular expression support would make this easy.
507 The program does not grok
509 It should be able to figure
511 by seeing some keywords which
512 appear indented at the start of line.
513 Regular expression support would make this easy.
515 The list of keywords in
517 probably belongs in the
520 This could be done by using some keyword like
522 for the offset value.
524 Another optimisation would be to sort
525 the magic file so that we can just run down all the
526 tests for the first byte, first word, first long, etc, once we
528 Complain about conflicts in the magic file entries.
529 Make a rule that the magic entries sort based on file offset rather
530 than position within the magic file?
532 The program should provide a way to give an estimate
536 We end up removing guesses (e.g.\&
538 as first 5 chars of file) because
539 they are not as good as other guesses (e.g.\&
542 .Dq Li "Return-Path:" ) .
543 Still, if the others do not pan out, it should be possible to use the
546 This program is slower than some vendors' file commands.
547 The new support for multiple character codes makes it even slower.
549 This manual page, and particularly this section, is too long.
551 You can obtain the original author's latest version by anonymous FTP
555 .Pa /pub/file/file-X.YZ.tar.gz