1 .\" $File: file.man,v 1.131 2018/07/24 21:33:56 christos Exp $
7 .Nd determine file type
11 .Op Fl bcdEhiklLNnprsSvzZ0
14 .Op Fl Fl mime-encoding
19 .Op Fl m Ar magicfiles
20 .Op Fl P Ar name=value
25 .Op Fl m Ar magicfiles
29 This manual page documents version __VERSION__ of the
34 tests each argument in an attempt to classify it.
35 There are three sets of tests, performed in this order:
36 filesystem tests, magic tests, and language tests.
39 test that succeeds causes the file type to be printed.
41 The type printed will usually contain one of the words
43 (the file contains only
44 printing characters and a few common control
45 characters and is probably safe to read on an
49 (the file contains the result of compiling a program
50 in a form understandable to some
55 meaning anything else (data is usually
58 Exceptions are well-known file formats (core files, tar archives)
59 that are known to contain binary data.
60 When modifying magic files or the program itself, make sure to
61 .Em "preserve these keywords" .
62 Users depend on knowing that all the readable files in a directory
66 Don't do as Berkeley did and change
67 .Dq shell commands text
71 The filesystem tests are based on examining the return from a
74 The program checks to see if the file is empty,
75 or if it's some sort of special file.
76 Any known file types appropriate to the system you are running on
77 (sockets, symbolic links, or named pipes (FIFOs) on those systems that
79 are intuited if they are defined in the system header file
82 The magic tests are used to check for files with data in
83 particular fixed formats.
84 The canonical example of this is a binary executable (compiled program)
86 file, whose format is defined in
91 in the standard include directory.
94 stored in a particular place
95 near the beginning of the file that tells the
98 that the file is a binary executable, and which of several types thereof.
101 has been applied by extension to data files.
102 Any file with some invariant identifier at a small fixed
103 offset into the file can usually be described in this way.
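The idea of an invariant identifier at a small fixed offset can be sketched in a few lines of Python. This is only an illustration, not the way the program is implemented: the table and the function name match_magic are hypothetical, and the four signatures are well-known format identifiers rather than entries copied from the magic database.

```python
from typing import Optional

# (offset, magic bytes, description) triples.
# Illustrative table only; the real magic database is far richer.
SIGNATURES = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (257, b"ustar", "POSIX tar archive"),
]


def match_magic(data: bytes) -> Optional[str]:
    """Return the description of the first matching signature, else None."""
    for offset, magic, description in SIGNATURES:
        # Slicing past the end of the buffer yields a short slice,
        # so short inputs simply fail to match.
        if data[offset:offset + len(magic)] == magic:
            return description
    return None
```

A caller would read the first few hundred bytes of a file and pass them to match_magic; a result of None means that none of the fixed-offset identifiers was present.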
104 The information identifying these files is read from the compiled
107 or the files in the directory
109 if the compiled file does not exist.
114 exists, it will be used in preference to the system magic files.
116 If a file does not match any of the entries in the magic file,
117 it is examined to see if it seems to be a text file.
118 ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
119 (such as those used on Macintosh and IBM PC systems),
120 UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
121 character sets can be distinguished by the different
122 ranges and sequences of bytes that constitute printable text
124 If a file passes any of these tests, its character set is reported.
125 ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
128 because they will be mostly readable on nearly any terminal;
129 UTF-16 and EBCDIC are only
132 they contain text, it is text that will require translation
133 before it can be read.
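As a rough illustration of how byte ranges and sequences can separate these character sets, consider the following Python sketch. It is a simplification under stated assumptions, not the algorithm the program uses: the function name guess_charset is hypothetical, UTF-16 is recognised only by its byte order mark, EBCDIC is not handled, and the set of control characters accepted as text is a guess.

```python
# Control characters accepted as part of text: BEL, BS, TAB, LF, FF, CR, ESC.
# This selection is an assumption made for the sketch.
TEXT_CONTROLS = frozenset(b"\a\b\t\n\f\r\x1b")


def guess_charset(data: bytes) -> str:
    """Guess a character set for data, or return "data" if it is not text."""
    # UTF-16 is detected here only through its byte order mark.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "UTF-16"
    low = [b for b in data if b < 0x80]
    # Any other control character (or DEL) means this is not plain text.
    if any((b < 0x20 and b not in TEXT_CONTROLS) or b == 0x7F for b in low):
        return "data"
    if len(low) == len(data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x leaves 0x80-0x9F to control codes; other 8-bit
    # character sets put printable characters there.
    if not any(0x80 <= b <= 0x9F for b in data):
        return "ISO-8859"
    return "non-ISO extended-ASCII"
```

The order of the checks matters: valid UTF-8 is tested before the 8-bit sets, because an arbitrary ISO-8859 byte sequence is rarely valid UTF-8, whereas every UTF-8 sequence is a plausible 8-bit one.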
136 will attempt to determine other characteristics of text-type files.
137 If the lines of a file are terminated by CR, CRLF, or NEL, instead
138 of the Unix-standard LF, this will be reported.
139 Files that contain embedded escape sequences or overstriking
140 will also be identified.
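The line terminator check can be pictured with a short Python sketch. The function name line_terminators is hypothetical and the code works on already-decoded text, which is an assumption of the sketch; it only shows how CR, CRLF, and NEL (U+0085) can be told apart from LF.

```python
def line_terminators(text: str) -> set:
    """Return the kinds of line terminator that occur in text."""
    found = set()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            # A CR immediately followed by LF counts as one CRLF terminator.
            if text[i + 1:i + 2] == "\n":
                found.add("CRLF")
                i += 1
            else:
                found.add("CR")
        elif ch == "\n":
            found.add("LF")
        elif ch == "\x85":
            found.add("NEL")
        i += 1
    return found
```

A file for which this returns only {"LF"} uses the Unix-standard terminator; any other result is the kind of property described above.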
144 has determined the character set used in a text-type file,
146 attempt to determine in what language the file is written.
147 The language tests look for particular strings (cf.
149 that can appear anywhere in the first few blocks of a file.
150 For example, the keyword
152 indicates that the file is most likely a
154 input file, just as the keyword
156 indicates a C program.
157 These tests are less reliable than the previous
158 two groups, so they are performed last.
159 The language test routines also test for some miscellany
164 Any file that cannot be identified as having been written
165 in any of the character sets listed above is simply said to be
168 .Bl -tag -width indent
170 Causes the file command to output the file type and creator code as
171 used by older MacOS versions.
172 The code consists of eight letters:
173 the first four describe the file type, the last four the creator.
174 .It Fl b , Fl Fl brief
175 Do not prepend filenames to output lines (brief mode).
176 .It Fl C , Fl Fl compile
179 output file that contains a pre-parsed version of the magic file or directory.
180 .It Fl c , Fl Fl checking-printout
181 Cause a checking printout of the parsed form of the magic file.
182 This is usually used in conjunction with the
184 flag to debug a new magic file before installing it.
186 Prints internal debugging information to stderr.
188 On filesystem errors (file not found, etc.), instead of handling the error
189 as regular output as POSIX mandates and continuing, issue an error message
191 .It Fl e , Fl Fl exclude Ar testname
192 Exclude the test named in
194 from the list of tests made to determine the file type.
195 Valid test names are:
196 .Bl -tag -width compress
199 application type (only on EMX).
201 Various types of text files (this test will try to guess the text
202 encoding, irrespective of the setting of the
206 Different text encodings for soft magic tests.
208 Ignored for backwards compatibility.
210 Prints details of Compound Document Files.
212 Checks for, and looks inside, compressed files.
214 Prints ELF file details, provided soft magic tests are enabled and the
217 Consults magic files.
219 Examines tar files by verifying the checksum of the 512 byte tar header.
220 Excluding this test can provide more detailed content description by using
221 the soft magic method.
227 Print a slash-separated list of valid extensions for the file type found.
228 .It Fl F , Fl Fl separator Ar separator
229 Use the specified string as the separator between the filename and the
230 file result returned.
233 .It Fl f , Fl Fl files-from Ar namefile
234 Read the names of the files to be examined from
237 before the argument list.
240 or at least one filename argument must be present;
241 to test the standard input, use
243 as a filename argument.
246 is unwrapped and the enclosed filenames are processed when this option is
247 encountered and before any further options processing is done.
248 This allows one to process multiple lists of files with different command line
249 arguments on the same
252 Thus if you want to set the delimiter, you need to do it before you specify
253 the list of files, like:
254 .Dq Fl F Ar @ Fl f Ar namefile ,
256 .Dq Fl f Ar namefile Fl F Ar @ .
257 .It Fl h , Fl Fl no-dereference
258 option causes symlinks not to be followed
259 (on systems that support symbolic links).
260 This is the default if the environment variable
263 .It Fl i , Fl Fl mime
264 Causes the file command to output MIME type strings rather than the more
265 traditional human-readable ones.
267 .Sq text/plain; charset=us-ascii
270 .It Fl Fl mime-type , Fl Fl mime-encoding
273 but print only the specified element(s).
274 .It Fl k , Fl Fl keep-going
275 Don't stop at the first match, keep going.
276 Subsequent matches will be
280 (If you want a newline, see the
283 The magic pattern with the highest strength (see the
286 .It Fl l , Fl Fl list
287 Shows a list of patterns and their strength sorted descending by
290 which is used for the matching (see also the
293 .It Fl L , Fl Fl dereference
294 option causes symlinks to be followed, as the like-named option in
296 (on systems that support symbolic links).
297 This is the default if the environment variable
300 .It Fl m , Fl Fl magic-file Ar magicfiles
301 Specify an alternate list of files and directories containing magic.
302 This can be a single item, or a colon-separated list.
303 If a compiled magic file is found alongside a file or directory,
304 it will be used instead.
305 .It Fl N , Fl Fl no-pad
306 Don't pad filenames so that they align in the output.
307 .It Fl n , Fl Fl no-buffer
308 Force stdout to be flushed after checking each file.
309 This is only useful if checking a list of files.
310 It is intended to be used by programs that want filetype output from a pipe.
311 .It Fl p , Fl Fl preserve-date
312 On systems that support
316 attempt to preserve the access time of files analyzed, to pretend that
319 .It Fl P , Fl Fl parameter Ar name=value
320 Set various parameter limits.
321 .Bl -column "elf_phnum" "Default" "XXXXXXXXXXXXXXXXXXXXXXXXXXX" -offset indent
322 .It Sy "Name" Ta Sy "Default" Ta Sy "Explanation"
323 .It Li indir Ta 15 Ta recursion limit for indirect magic
324 .It Li name Ta 30 Ta use count limit for name/use magic
325 .It Li elf_notes Ta 256 Ta max ELF notes processed
326 .It Li elf_phnum Ta 128 Ta max ELF program sections processed
327 .It Li elf_shnum Ta 32768 Ta max ELF sections processed
328 .It Li regex Ta 8192 Ta length limit for regex searches
329 .It Li bytes Ta 1048576 Ta max number of bytes to read from file
332 Don't translate unprintable characters to \eooo.
335 translates unprintable characters to their octal representation.
336 .It Fl s , Fl Fl special-files
339 only attempts to read and determine the type of argument files which
341 reports are ordinary files.
342 This prevents problems, because reading special files may have peculiar
348 to also read argument files which are block or character special files.
349 This is useful for determining the filesystem types of the data in raw
350 disk partitions, which are block special files.
351 This option also causes
353 to disregard the file size as reported by
355 since on some systems it reports a zero size for raw disk partitions.
356 .It Fl S , Fl Fl no-sandbox
357 On systems where libseccomp
358 .Pa ( https://github.com/seccomp/libseccomp )
361 flag disables sandboxing which is enabled by default.
362 This option is needed for file to execute external decompressing programs,
365 flag is specified and the built-in decompressors are not available.
366 .It Fl v , Fl Fl version
367 Print the version of the program and exit.
368 .It Fl z , Fl Fl uncompress
369 Try to look inside compressed files.
370 .It Fl Z , Fl Fl uncompress-noreport
371 Try to look inside compressed files, but report information about the contents
372 only, not the compression.
373 .It Fl 0 , Fl Fl print0
374 Output a null character
376 after the end of the filename.
380 This does not affect the separator, which is still printed.
382 If this option is repeated more than once, then
384 prints just the filename followed by a NUL followed by the description
385 (or ERROR: text) followed by a second NUL for each entry.
387 Print a help message and exit.
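When the print0 option described above is given more than once, each entry is a filename, a NUL, the description (or ERROR: text), and a second NUL. A consumer of that stream might look like the Python sketch below. The function name parse_print0_pairs is hypothetical, and the sketch assumes the stream has exactly the shape just described, with nothing else between entries.

```python
import os


def parse_print0_pairs(output: bytes) -> list:
    """Split NUL-delimited output into (filename, description) pairs."""
    fields = output.split(b"\0")
    # The stream ends with a NUL, which leaves one empty trailing field.
    if fields and fields[-1] == b"":
        fields.pop()
    if len(fields) % 2 != 0:
        raise ValueError("expected filename/description pairs")
    pairs = []
    for i in range(0, len(fields), 2):
        # Filenames are arbitrary bytes; os.fsdecode keeps them round-trippable.
        name = os.fsdecode(fields[i])
        description = fields[i + 1].decode("utf-8", "replace")
        pairs.append((name, description))
    return pairs
```

Because NUL cannot occur in a filename, this form stays unambiguous even for names that contain the separator string, spaces, or newlines.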
390 The environment variable
392 can be used to set the default magic file name.
393 If that variable is set, then
395 will not attempt to open
400 to the value of this variable as appropriate.
401 The environment variable
403 controls (on systems that support symbolic links), whether
405 will attempt to follow symlinks or not.
408 follows symlinks; otherwise it does not.
409 This is also controlled by the
415 .Bl -tag -width __MAGIC__.mgc -compact
417 Default compiled list of magic.
419 Directory containing default magic files.
425 if the operation was successful or
427 if an error was encountered.
428 The following errors cause diagnostic messages, but don't affect the program
429 exit code (as POSIX requires), unless
432 .Bl -bullet -compact -offset indent
434 A file cannot be found
436 There is no permission to read a file
438 The file type cannot be determined
441 .Bd -literal -offset indent
442 $ file file.c file /dev/{wd0a,hda}
443 file.c: C program text
444 file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
445 dynamically linked (uses shared libs), stripped
446 /dev/wd0a: block special (0/0)
447 /dev/hda: block special (3/0)
449 $ file -s /dev/wd0{b,d}
451 /dev/wd0d: x86 boot sector
453 $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
454 /dev/hda: x86 boot sector
455 /dev/hda1: Linux/i386 ext2 filesystem
456 /dev/hda2: x86 boot sector
457 /dev/hda3: x86 boot sector, extended partition table
458 /dev/hda4: Linux/i386 ext2 filesystem
459 /dev/hda5: Linux/i386 swap file
460 /dev/hda6: Linux/i386 swap file
461 /dev/hda7: Linux/i386 swap file
462 /dev/hda8: Linux/i386 swap file
466 $ file -i file.c file /dev/{wd0a,hda}
468 file: application/x-executable
469 /dev/hda: application/x-not-regular-file
470 /dev/wd0a: application/x-not-regular-file
477 .Xr magic __FSECTION__ ,
479 .Sh STANDARDS CONFORMANCE
480 This program is believed to exceed the System V Interface Definition
481 of FILE(CMD), as near as one can determine from the vague language
483 Its behavior is mostly compatible with the System V program of the same name.
484 This version knows more magic, however, so it will produce
485 different (albeit more accurate) output in many cases.
486 .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
488 The one significant difference
489 between this version and System V
490 is that this version treats any white space
491 as a delimiter, so that spaces in pattern strings must be escaped.
493 .Bd -literal -offset indent
494 \*[Gt]10 string language impress\ (imPRESS data)
497 in an existing magic file would have to be changed to
498 .Bd -literal -offset indent
499 \*[Gt]10 string language\e impress (imPRESS data)
502 In addition, in this version, if a pattern string contains a backslash,
505 .Bd -literal -offset indent
506 0 string \ebegindata Andrew Toolkit document
509 in an existing magic file would have to be changed to
510 .Bd -literal -offset indent
511 0 string \e\ebegindata Andrew Toolkit document
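The two conversions shown above can be applied mechanically when porting pattern strings from an older magic file. The following Python sketch is one way to do it; the function name escape_magic_string is hypothetical, and it handles only the two cases illustrated here (backslashes and spaces), not other white space.

```python
def escape_magic_string(pattern: str) -> str:
    """Escape backslashes and spaces in a pattern string for a magic file."""
    # Backslashes are doubled first, so that the backslashes added
    # in front of spaces are not themselves doubled.
    escaped = pattern.replace("\\", "\\\\")
    return escaped.replace(" ", "\\ ")
```

For the examples above, the pattern language impress gains a backslash before its space, and the pattern beginning with a backslash followed by begindata gains a second leading backslash.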
514 SunOS releases 3.2 and later from Sun Microsystems include a
516 command derived from the System V one, but with some extensions.
517 This version differs from Sun's only in minor ways.
518 It includes the extension of the
522 .Bd -literal -offset indent
523 \*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
526 On systems where libseccomp
527 .Pa ( https://github.com/seccomp/libseccomp )
530 enforces limiting system calls to only the ones necessary for the
531 operation of the program.
532 This enforcement does not provide any security benefit when
534 is asked to decompress input files by running external programs with
538 To enable execution of external decompressors, one needs to disable
543 The magic file entries have been collected from various sources,
544 mainly USENET, and contributed by various authors.
545 Christos Zoulas (address below) will collect additional
546 or corrected magic file entries.
547 A consolidation of magic file entries
548 will be distributed periodically.
550 The order of entries in the magic file is significant.
551 Depending on what system you are using, the order in which
552 they are put together may be incorrect.
555 command uses a magic file,
556 keep the old magic file around for comparison purposes
558 .Pa __MAGIC__.orig ) .
563 .Dv UNIX since at least Research Version 4
564 (man page dated November, 1973).
565 The System V version introduced one significant change:
566 the external list of magic types.
567 This slowed the program down slightly but made it a lot more flexible.
569 This program, based on the System V version,
570 was written by Ian Darwin
571 .Aq ian@darwinsys.com
572 without looking at anybody else's source code.
574 John Gilmore revised the code extensively, making it better than
576 Geoff Collyer found several inadequacies
577 and provided some magic file entries.
580 operator by Rob McMahon,
581 .Aq cudcv@warwick.ac.uk ,
586 made many changes from 1993 to the present.
588 Primary development and maintenance from 1990 to the present by
590 .Aq christos@astron.com .
592 Altered by Chris Lowth
593 .Aq chris@lowth.com ,
596 option to output mime type strings, using an alternative
597 magic file and internal logic.
599 Altered by Eric Fischer
602 to identify character codes and attempt to identify the languages
605 Altered by Reuben Thomas
607 2007-2011, to improve MIME support, merge MIME and non-MIME magic,
608 support directories as well as files of magic, apply many bug fixes,
609 update and fix a lot of magic, improve the build system, improve the
610 documentation, and rewrite the Python bindings in pure Python.
612 The list of contributors to the
614 directory (magic files)
615 is too long to include here.
616 You know who you are; thank you.
617 Many contributors are listed in the source files.
619 Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
620 Covered by the standard Berkeley Software Distribution copyright; see the file
621 COPYING in the source distribution.
627 were written by John Gilmore from his public-domain
629 program, and are not covered by the above license.
631 Please report bugs and send patches to the bug tracker at
632 .Pa http://bugs.astron.com/
633 or the mailing list at
636 .Pa http://mailman.astron.com/mailman/listinfo/file
639 Fix output so that tests for MIME and APPLE flags are not needed all
640 over the place, and actual output is only done in one place.
642 Suggestion: push possible outputs on to a list, then pick the
643 last-pushed (most specific, one hopes) value at the end, or
644 use a default if the list is empty.
645 This should not slow down evaluation.
649 and printing \e012- between entries is clumsy and complicated; refactor
652 Some of the encoding logic is hard-coded in encoding.c and could be moved
653 to the magic files if we had a !:charset annotation.
655 Continue to squash all magic bugs.
656 See Debian BTS for a good source.
658 Store arbitrarily long strings, for example for %s patterns, so that
659 they can be printed out.
660 Fixes Debian bug #271672.
661 This can be done by allocating strings in a string pool, storing the
662 string pool at the end of the magic file and converting all the string
663 pointers to relative offsets from the string pool.
665 Add syntax for relative offsets after current level (Debian bug #466037).
667 Make file -ki work, i.e. give multiple MIME types.
669 Add a zip library so we can peek inside Office2007 documents to
670 print more details about their contents.
672 Add an option to print URLs for the sources of the file descriptions.
674 Combine script searches and add a way to map executable names to MIME
675 types (e.g. have a magic value for !:mime which causes the resulting
676 string to be looked up in a table).
677 This would avoid adding the same magic repeatedly for each new
678 hash-bang interpreter.
680 When a file descriptor is available, we can skip and adjust the buffer
681 instead of the hacky buffer management we do now.
687 to check for consistency at compile time (duplicate
690 pointing to undefined
697 more efficient by keeping a sorted list of names.
698 Special-case ^ to flip endianness in the parser so that it does not
699 have to be escaped, and document it.
701 If the offsets specified internally in the file exceed the buffer size
704 variable in file.h), then we don't seek to that offset, but we give up.
705 It would be better if buffer management were done when the file descriptor
706 is available, so that we can move around the file.
707 One must be careful though because this has performance (and thus security
710 You can obtain the original author's latest version by anonymous FTP
714 .Pa /pub/file/file-X.YZ.tar.gz .