1 .\" $File: file.man,v 1.73 2008/02/19 17:58:00 rrt Exp $
7 .Nd determine file type
15 .Op Fl m Ar magicfiles
23 This manual page documents version __VERSION__ of the
28 tests each argument in an attempt to classify it.
29 There are three sets of tests, performed in this order:
30 filesystem tests, magic tests, and language tests.
33 test that succeeds causes the file type to be printed.
35 The type printed will usually contain one of the words
37 (the file contains only
38 printing characters and a few common control
39 characters and is probably safe to read on an
43 (the file contains the result of compiling a program
44 in a form understandable to some
49 meaning anything else (data is usually
52 Exceptions are well-known file formats (core files, tar archives)
53 that are known to contain binary data.
54 When modifying magic files or the program itself, make sure to
55 .Em "preserve these keywords" .
56 Users depend on knowing that all the readable files in a directory
60 Don't do as Berkeley did and change
61 .Dq shell commands text
65 The filesystem tests are based on examining the return from a
68 The program checks to see if the file is empty,
69 or if it's some sort of special file.
70 Any known file types appropriate to the system you are running on
71 (sockets, symbolic links, or named pipes (FIFOs) on those systems that
73 are intuited if they are defined in
74 the system header file
77 The magic tests are used to check for files with data in
78 particular fixed formats.
79 The canonical example of this is a binary executable (compiled program)
81 file, whose format is defined in
86 in the standard include directory.
89 stored in a particular place
90 near the beginning of the file that tells the
91 .Dv UNIX operating system
92 that the file is a binary executable, and which of several types thereof.
95 has been applied by extension to data files.
96 Any file with some invariant identifier at a small fixed
97 offset into the file can usually be described in this way.
98 The information identifying these files is read from the compiled
101 or the files in the directory
103 if the compiled file does not exist. In addition, if
107 exists, it will be used in preference to the system magic files.
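The idea of a magic number at a fixed offset can be illustrated with a short shell sketch. This is not how the program itself is implemented: the helper name first_bytes_hex is hypothetical, and the two-byte gzip signature (0x1f 0x8b at offset 0) is used only as a well-known example of an invariant identifier.

```shell
# Hypothetical helper, not part of file(1): print the first N bytes of a
# file as lowercase hex digits with no separators.
first_bytes_hex() {
    # $1 = path, $2 = number of bytes to read from offset 0
    od -An -tx1 -N "$2" "$1" | tr -d ' \n'
}

# Build a small sample whose first two bytes are the gzip magic number.
sample=$(mktemp)
printf '\037\213\010\000rest-of-stream' > "$sample"

# Compare the bytes at the fixed offset against the known identifier.
magic=$(first_bytes_hex "$sample" 2)
if [ "$magic" = "1f8b" ]; then
    verdict="gzip compressed data"
else
    verdict="data"
fi
echo "$verdict"    # prints "gzip compressed data"
rm -f "$sample"
```

A real magic file generalizes this by listing many offset, type, and value triples, so that new formats can be described without changing the program.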
109 If a file does not match any of the entries in the magic file,
110 it is examined to see if it seems to be a text file.
111 ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
112 (such as those used on Macintosh and IBM PC systems),
113 UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
114 character sets can be distinguished by the different
115 ranges and sequences of bytes that constitute printable text
117 If a file passes any of these tests, its character set is reported.
118 ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
121 because they will be mostly readable on nearly any terminal;
122 UTF-16 and EBCDIC are only
125 they contain text, it is text that will require translation
126 before it can be read.
129 will attempt to determine other characteristics of text-type files.
130 If the lines of a file are terminated by CR, CRLF, or NEL, instead
131 of the Unix-standard LF, this will be reported.
132 Files that contain embedded escape sequences or overstriking
133 will also be identified.
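As a rough illustration of the line-terminator check, the following shell sketch counts carriage returns and line feeds in a file. It is an assumption-laden approximation, not the program's actual logic: the helper name line_endings is hypothetical, NEL is not handled, and equal CR and LF counts are simply taken to mean CRLF.

```shell
# Hypothetical helper, not part of file(1): guess the line terminator
# convention of a text file from its CR and LF byte counts.
line_endings() {
    # $(( ... )) normalizes the padded output of wc -c on some systems.
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr" -eq 0 ]; then
        echo "LF"        # no CR at all: Unix-style (or one unterminated line)
    elif [ "$lf" -eq 0 ]; then
        echo "CR"        # CR only
    elif [ "$cr" -eq "$lf" ]; then
        echo "CRLF"      # crude: assumes every CR is paired with an LF
    else
        echo "mixed"
    fi
}

tmp=$(mktemp)
printf 'one\r\ntwo\r\n' > "$tmp"
crlf_result=$(line_endings "$tmp")
printf 'one\ntwo\n' > "$tmp"
lf_result=$(line_endings "$tmp")
rm -f "$tmp"
echo "$crlf_result $lf_result"    # prints "CRLF LF"
```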
137 has determined the character set used in a text-type file,
139 attempt to determine in what language the file is written.
140 The language tests look for particular strings (cf.
142 ) that can appear anywhere in the first few blocks of a file.
143 For example, the keyword
145 indicates that the file is most likely a
147 input file, just as the keyword
149 indicates a C program.
150 These tests are less reliable than the previous
151 two groups, so they are performed last.
152 The language test routines also test for some miscellany
157 Any file that cannot be identified as having been written
158 in any of the character sets listed above is simply said to be
161 .Bl -tag -width indent
163 Do not prepend filenames to output lines (brief mode).
164 .It Fl c , -checking-printout
165 Cause a checking printout of the parsed form of the magic file.
166 This is usually used in conjunction with the
168 flag to debug a new magic file before installing it.
172 output file that contains a pre-parsed version of the magic file or directory.
173 .It Fl e , -exclude Ar testname
174 Exclude the test named in
176 from the list of tests made to determine the file type. Valid test names
182 application type (only on EMX).
184 Check for various types of ASCII files.
186 Don't look for, or inside, compressed files.
188 Don't print ELF details.
190 Don't look for Fortran sequences inside ASCII files.
192 Don't consult magic files.
194 Don't examine tar files.
196 Don't look for known tokens inside ASCII files.
198 Don't look for troff sequences inside ASCII files.
200 .It Fl f , -files-from Ar namefile
201 Read the names of the files to be examined from
204 before the argument list.
207 or at least one filename argument must be present;
208 to test the standard input, use
210 as a filename argument.
211 .It Fl F , -separator Ar separator
212 Use the specified string as the separator between the filename and the
213 file result returned. Defaults to
215 .It Fl h , -no-dereference
216 option causes symlinks not to be followed
217 (on systems that support symbolic links). This is the default if the
222 Causes the file command to output MIME type strings rather than the more
223 traditional human-readable ones. Thus it may say
224 .Dq text/plain; charset=us-ascii
227 In order for this option to work, file changes the way
228 it handles files recognized by the command itself (such as many of the
229 text file types, directories, etc.), and makes use of an alternative
235 .It Fl -mime-type , -mime-encoding
238 but print only the specified element(s).
239 .It Fl k , -keep-going
240 Don't stop at the first match; keep going. Subsequent matches will be
244 (If you want a newline, see the
247 .It Fl L , -dereference
248 option causes symlinks to be followed, as the like-named option in
250 (on systems that support symbolic links).
251 This is the default if the environment variable
254 .It Fl m , -magic-file Ar list
255 Specify an alternate list of files and directories containing magic.
256 This can be a single item, or a colon-separated list.
257 If a compiled magic file is found alongside a file or directory, it will be used instead.
258 .It Fl n , -no-buffer
259 Force stdout to be flushed after checking each file.
260 This is only useful if checking a list of files.
261 It is intended to be used by programs that want filetype output from a pipe.
263 Don't pad filenames so that they align in the output.
264 .It Fl p , -preserve-date
265 On systems that support
269 attempt to preserve the access time of files analyzed, to pretend that
273 Don't translate unprintable characters to \eooo.
276 translates unprintable characters to their octal representation.
277 .It Fl s , -special-files
280 only attempts to read and determine the type of argument files which
282 reports are ordinary files.
283 This prevents problems, because reading special files may have peculiar
289 to also read argument files which are block or character special files.
290 This is useful for determining the filesystem types of the data in raw
291 disk partitions, which are block special files.
292 This option also causes
294 to disregard the file size as reported by
296 since on some systems it reports a zero size for raw disk partitions.
298 Print the version of the program and exit.
299 .It Fl z , -uncompress
300 Try to look inside compressed files.
302 Output a null character
304 after the end of the filename. Nice to
306 the output. This does not affect the separator which is still printed.
308 Print a help message and exit.
311 .Bl -tag -width __MAGIC__.mgc -compact
313 Default compiled list of magic.
315 Directory containing default magic files.
318 The environment variable
320 can be used to set the default magic file name.
321 If that variable is set, then
323 will not attempt to open
328 to the value of this variable as appropriate.
329 The environment variable
331 controls (on systems that support symbolic links) whether
333 will attempt to follow symlinks or not. If set, then
335 follows symlinks; otherwise it does not. This is also controlled
342 .Xr magic __FSECTION__ ,
347 .Sh STANDARDS CONFORMANCE
348 This program is believed to exceed the System V Interface Definition
349 of FILE(CMD), as near as one can determine from the vague language
351 Its behavior is mostly compatible with the System V program of the same name.
352 This version knows more magic, however, so it will produce
353 different (albeit more accurate) output in many cases.
354 .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
356 The one significant difference
357 between this version and System V
358 is that this version treats any white space
359 as a delimiter, so that spaces in pattern strings must be escaped.
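The space-escaping rule can be sketched with sed. The helper name escape_pattern is hypothetical, and the sketch assumes the pattern string contains no backslashes of its own.

```shell
# Hypothetical helper: escape every space in a pattern string so that it
# is not taken as a field delimiter in the magic file.
escape_pattern() {
    # In the sed replacement, "\\ " yields a literal backslash plus a space.
    printf '%s\n' "$1" | sed 's/ /\\ /g'
}

escaped=$(escape_pattern 'language impress')
echo "$escaped"    # prints "language\ impress"
```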
361 .Bd -literal -offset indent
362 >10 string language impress\ (imPRESS data)
365 in an existing magic file would have to be changed to
366 .Bd -literal -offset indent
367 >10 string language\e impress (imPRESS data)
370 In addition, in this version, if a pattern string contains a backslash,
373 .Bd -literal -offset indent
374 0 string \ebegindata Andrew Toolkit document
377 in an existing magic file would have to be changed to
378 .Bd -literal -offset indent
379 0 string \e\ebegindata Andrew Toolkit document
382 SunOS releases 3.2 and later from Sun Microsystems include a
384 command derived from the System V one, but with some extensions.
385 My version differs from Sun's only in minor ways.
386 It includes the extension of the
390 .Bd -literal -offset indent
391 >16 long&0x7fffffff >0 not stripped
394 The magic file entries have been collected from various sources,
395 mainly USENET, and contributed by various authors.
396 Christos Zoulas (address below) will collect additional
397 or corrected magic file entries.
398 A consolidation of magic file entries
399 will be distributed periodically.
401 The order of entries in the magic file is significant.
402 Depending on what system you are using, the order in which
403 they are put together may be incorrect.
406 command uses a magic file,
407 keep the old magic file around for comparison purposes
409 .Pa __MAGIC__.orig ).
411 .Bd -literal -offset indent
412 $ file file.c file /dev/{wd0a,hda}
413 file.c: C program text
414 file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
415 dynamically linked (uses shared libs), stripped
416 /dev/wd0a: block special (0/0)
417 /dev/hda: block special (3/0)
419 $ file -s /dev/wd0{b,d}
421 /dev/wd0d: x86 boot sector
423 $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
424 /dev/hda: x86 boot sector
425 /dev/hda1: Linux/i386 ext2 filesystem
426 /dev/hda2: x86 boot sector
427 /dev/hda3: x86 boot sector, extended partition table
428 /dev/hda4: Linux/i386 ext2 filesystem
429 /dev/hda5: Linux/i386 swap file
430 /dev/hda6: Linux/i386 swap file
431 /dev/hda7: Linux/i386 swap file
432 /dev/hda8: Linux/i386 swap file
436 $ file -i file.c file /dev/{wd0a,hda}
438 file: application/x-executable
439 /dev/hda: application/x-not-regular-file
440 /dev/wd0a: application/x-not-regular-file
447 .Dv UNIX since at least Research Version 4
448 (man page dated November, 1973).
449 The System V version introduced one significant change:
450 the external list of magic types.
451 This slowed the program down slightly but made it a lot more flexible.
453 This program, based on the System V version,
454 was written by Ian Darwin <ian@darwinsys.com>
455 without looking at anybody else's source code.
457 John Gilmore revised the code extensively, making it better than
459 Geoff Collyer found several inadequacies
460 and provided some magic file entries.
461 The `&' operator was contributed by Rob McMahon, cudcv@warwick.ac.uk, 1989.
463 Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
465 Primary development and maintenance from 1990 to the present by
466 Christos Zoulas (christos@astron.com).
468 Altered by Chris Lowth, chris@lowth.com, 2000:
471 option to output MIME type strings, using an alternative
472 magic file and internal logic.
474 Altered by Eric Fischer (enf@pobox.com), July, 2000,
475 to identify character codes and attempt to identify the languages
478 Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
479 support and merge MIME and non-MIME magic, support directories as well
480 as files of magic, apply many bug fixes and improve the build system.
482 The list of contributors to the
484 directory (magic files)
485 is too long to include here.
486 You know who you are; thank you.
487 Many contributors are listed in the source files.
489 Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
490 Covered by the standard Berkeley Software Distribution copyright; see the file
491 LEGAL.NOTICE in the source distribution.
497 were written by John Gilmore from his public-domain
499 program, and are not covered by the above license.
502 There must be a better way to automate the construction of the Magic
503 file from all the glop in Magdir.
507 uses several algorithms that favor speed over accuracy,
508 thus it can be misled about the contents of
512 The support for text files (primarily for programming languages)
513 is simplistic and inefficient, and requires recompilation to update.
515 The list of keywords in
517 probably belongs in the Magic file.
518 This could be done by using some keyword like
520 for the offset value.
522 Complain about conflicts in the magic file entries.
523 Make a rule that the magic entries sort based on file offset rather
524 than position within the magic file?
526 The program should provide a way to give an estimate
530 We end up removing guesses (e.g.
532 as first 5 chars of file) because
533 they are not as good as other guesses (e.g.
538 Still, if the others don't pan out, it should be possible to use the
541 This manual page, and particularly this section, is too long.
544 returns 0 on success, and non-zero on error.
546 You can obtain the original author's latest version by anonymous FTP
550 .Dv /pub/file/file-X.YZ.tar.gz