contrib/xz/src/xz/xz.1

   1 '\" t
   2 .\"
   3 .\" Author: Lasse Collin
   4 .\"
   5 .\" This file has been put into the public domain.
   6 .\" You can do whatever you want with this file.
   7 .\"
   8 .TH XZ 1 "2014-12-16" "Tukaani" "XZ Utils"
   9 .
  10 .SH NAME
  11 xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
  12 .
  13 .SH SYNOPSIS
  14 .B xz
  15 .RI [ option... ]
  16 .RI [ file... ]
  17 .
  18 .SH COMMAND ALIASES
  19 .B unxz
  20 is equivalent to
  21 .BR "xz \-\-decompress" .
  22 .br
  23 .B xzcat
  24 is equivalent to
  25 .BR "xz \-\-decompress \-\-stdout" .
  26 .br
  27 .B lzma
  28 is equivalent to
  29 .BR "xz \-\-format=lzma" .
  30 .br
  31 .B unlzma
  32 is equivalent to
  33 .BR "xz \-\-format=lzma \-\-decompress" .
  34 .br
  35 .B lzcat
  36 is equivalent to
  37 .BR "xz \-\-format=lzma \-\-decompress \-\-stdout" .
  38 .PP
  39 When writing scripts that need to decompress files,
  40 it is recommended to always use the name
  41 .B xz
  42 with appropriate arguments
  43 .RB ( "xz \-d"
  44 or
  45 .BR "xz \-dc" )
  46 instead of the names
  47 .B unxz
  48 and
  49 .BR xzcat .
  50 .
  51 .SH DESCRIPTION
  52 .B xz
  53 is a general-purpose data compression tool with
  54 command line syntax similar to
  55 .BR gzip (1)
  56 and
  57 .BR bzip2 (1).
  58 The native file format is the
  59 .B .xz
  60 format, but the legacy
  61 .B .lzma
  62 format used by LZMA Utils and
  63 raw compressed streams with no container format headers
  64 are also supported.
  65 .PP
  66 .B xz
  67 compresses or decompresses each
  68 .I file
  69 according to the selected operation mode.
  70 If no
  71 .I files
  72 are given or
  73 .I file
  74 is
  75 .BR \- ,
  76 .B xz
  77 reads from standard input and writes the processed data
  78 to standard output.
  79 .B xz
  80 will refuse (display an error and skip the
  81 .IR file )
  82 to write compressed data to standard output if it is a terminal.
  83 Similarly,
  84 .B xz
  85 will refuse to read compressed data
  86 from standard input if it is a terminal.
  87 .PP
  88 Unless
  89 .B \-\-stdout
  90 is specified,
  91 .I files
  92 other than
  93 .B \-
  94 are written to a new file whose name is derived from the source
  95 .I file
  96 name:
  97 .IP \(bu 3
  98 When compressing, the suffix of the target file format
  99 .RB ( .xz
 100 or
 101 .BR .lzma )
 102 is appended to the source filename to get the target filename.
 103 .IP \(bu 3
 104 When decompressing, the
 105 .B .xz
 106 or
 107 .B .lzma
 108 suffix is removed from the filename to get the target filename.
 109 .B xz
 110 also recognizes the suffixes
 111 .B .txz
 112 and
 113 .BR .tlz ,
 114 and replaces them with the
 115 .B .tar
 116 suffix.
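.PP
The naming rules above can be sketched in Python.
This is only an illustration, not the code that
.B xz
itself uses.
The function name, its parameters, and the handling of
a name that consists of nothing but a suffix are assumptions.
It returns None in the suffix cases where the file would be skipped.
.nf
```python
def target_name(source, decompress=False, fmt="xz"):
    """Return the target filename, or None if the file would be skipped."""
    if not decompress:
        # Compressing: append the suffix of the target format.
        own = (".xz", ".txz") if fmt == "xz" else (".lzma", ".tlz")
        if source.endswith(own):
            return None  # already has a suffix of the target format
        return source + "." + fmt
    # Decompressing: strip .xz or .lzma, replace .txz or .tlz with .tar.
    rules = ((".xz", ""), (".lzma", ""), (".txz", ".tar"), (".tlz", ".tar"))
    for suffix, replacement in rules:
        # Assumption: a name made up of the suffix alone is not accepted.
        if source.endswith(suffix) and len(source) > len(suffix):
            return source[:-len(suffix)] + replacement
    return None  # no suffix of a supported file format
```
.fi
.PP
For instance, under these assumptions a source named foo.txz
maps to foo.tar when decompressing and is skipped when compressing to the
.B .xz
format.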
 117 .PP
 118 If the target file already exists, an error is displayed and the
 119 .I file
 120 is skipped.
 121 .PP
 122 Unless writing to standard output,
 123 .B xz
 124 will display a warning and skip the
 125 .I file
 126 if any of the following applies:
 127 .IP \(bu 3
 128 .I File
 129 is not a regular file.
 130 Symbolic links are not followed,
 131 and thus they are not considered to be regular files.
 132 .IP \(bu 3
 133 .I File
 134 has more than one hard link.
 135 .IP \(bu 3
 136 .I File
 137 has setuid, setgid, or sticky bit set.
 138 .IP \(bu 3
 139 The operation mode is set to compress and the
 140 .I file
 141 already has a suffix of the target file format
 142 .RB ( .xz
 143 or
 144 .B .txz
 145 when compressing to the
 146 .B .xz
 147 format, and
 148 .B .lzma
 149 or
 150 .B .tlz
 151 when compressing to the
 152 .B .lzma
 153 format).
 154 .IP \(bu 3
 155 The operation mode is set to decompress and the
 156 .I file
 157 doesn't have a suffix of any of the supported file formats
 158 .RB ( .xz ,
 159 .BR .txz ,
 160 .BR .lzma ,
 161 or
 162 .BR .tlz ).
 163 .PP
 164 After successfully compressing or decompressing the
 165 .IR file ,
 166 .B xz
 167 copies the owner, group, permissions, access time,
 168 and modification time from the source
 169 .I file
 170 to the target file.
 171 If copying the group fails, the permissions are modified
 172 so that the target file doesn't become accessible to users
 173 who didn't have permission to access the source
 174 .IR file .
 175 .B xz
 176 doesn't support copying other metadata like access control lists
 177 or extended attributes yet.
 178 .PP
 179 Once the target file has been successfully closed, the source
 180 .I file
 181 is removed unless
 182 .B \-\-keep
 183 was specified.
 184 The source
 185 .I file
 186 is never removed if the output is written to standard output.
 187 .PP
 188 Sending
 189 .B SIGINFO
 190 or
 191 .B SIGUSR1
 192 to the
 193 .B xz
 194 process makes it print progress information to standard error.
 195 This has only limited use since when standard error
 196 is a terminal, using
 197 .B \-\-verbose
 198 will display an automatically updating progress indicator.
 199 .
 200 .SS "Memory usage"
 201 The memory usage of
 202 .B xz
 203 varies from a few hundred kilobytes to several gigabytes
 204 depending on the compression settings.
 205 The settings used when compressing a file determine
 206 the memory requirements of the decompressor.
 207 Typically the decompressor needs 5\ % to 20\ % of
 208 the amount of memory that the compressor needed when
 209 creating the file.
 210 For example, decompressing a file created with
 211 .B xz \-9
 212 currently requires 65\ MiB of memory.
 213 Still, it is possible to have
 214 .B .xz
 215 files that require several gigabytes of memory to decompress.
 216 .PP
  217 Users of older systems in particular may find
 218 the possibility of very large memory usage annoying.
 219 To prevent uncomfortable surprises,
 220 .B xz
 221 has a built-in memory usage limiter, which is disabled by default.
 222 While some operating systems provide ways to limit
  223 the memory usage of processes, relying on them
 224 wasn't deemed to be flexible enough (e.g. using
 225 .BR ulimit (1)
 226 to limit virtual memory tends to cripple
 227 .BR mmap (2)).
 228 .PP
 229 The memory usage limiter can be enabled with
 230 the command line option \fB\-\-memlimit=\fIlimit\fR.
 231 Often it is more convenient to enable the limiter
 232 by default by setting the environment variable
 233 .BR XZ_DEFAULTS ,
 234 e.g.\&
 235 .BR XZ_DEFAULTS=\-\-memlimit=150MiB .
 236 It is possible to set the limits separately
 237 for compression and decompression
 238 by using \fB\-\-memlimit\-compress=\fIlimit\fR and
 239 \fB\-\-memlimit\-decompress=\fIlimit\fR.
 240 Using these two options outside
 241 .B XZ_DEFAULTS
 242 is rarely useful because a single run of
 243 .B xz
 244 cannot do both compression and decompression and
 245 .BI \-\-memlimit= limit
 246 (or \fB\-M\fR \fIlimit\fR)
 247 is shorter to type on the command line.
 248 .PP
 249 If the specified memory usage limit is exceeded when decompressing,
 250 .B xz
 251 will display an error and decompressing the file will fail.
 252 If the limit is exceeded when compressing,
 253 .B xz
 254 will try to scale the settings down so that the limit
 255 is no longer exceeded (except when using \fB\-\-format=raw\fR
 256 or \fB\-\-no\-adjust\fR).
 257 This way the operation won't fail unless the limit is very small.
 258 The scaling of the settings is done in steps that don't
 259 match the compression level presets, e.g. if the limit is
 260 only slightly less than the amount required for
 261 .BR "xz \-9" ,
 262 the settings will be scaled down only a little,
 263 not all the way down to
 264 .BR "xz \-8" .
 265 .
 266 .SS "Concatenation and padding with .xz files"
 267 It is possible to concatenate
 268 .B .xz
 269 files as is.
 270 .B xz
 271 will decompress such files as if they were a single
 272 .B .xz
 273 file.
 274 .PP
 275 It is possible to insert padding between the concatenated parts
 276 or after the last part.
 277 The padding must consist of null bytes and the size
 278 of the padding must be a multiple of four bytes.
 279 This can be useful e.g. if the
 280 .B .xz
 281 file is stored on a medium that measures file sizes
 282 in 512-byte blocks.
 283 .PP
 284 Concatenation and padding are not allowed with
 285 .B .lzma
 286 files or raw streams.
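.PP
Concatenation can be tried out with a short Python sketch.
It uses the lzma module from the Python standard library,
which wraps liblzma, as a stand-in for the
.B xz
command; it is not a description of how
.B xz
is implemented.
Padding between the parts is not shown here.
.nf
```python
import lzma

first = lzma.compress(b"hello, ")
second = lzma.compress(b"world")

# Joining two complete .xz files byte for byte gives another valid input.
combined = first + second

# All streams are decoded and their output is joined together.
restored = lzma.decompress(combined)
```
.fi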
 287 .
 288 .SH OPTIONS
 289 .
 290 .SS "Integer suffixes and special values"
 291 In most places where an integer argument is expected,
 292 an optional suffix is supported to easily indicate large integers.
 293 There must be no space between the integer and the suffix.
 294 .TP
 295 .B KiB
 296 Multiply the integer by 1,024 (2^10).
 297 .BR Ki ,
 298 .BR k ,
 299 .BR kB ,
 300 .BR K ,
 301 and
 302 .B KB
 303 are accepted as synonyms for
 304 .BR KiB .
 305 .TP
 306 .B MiB
 307 Multiply the integer by 1,048,576 (2^20).
 308 .BR Mi ,
 309 .BR m ,
 310 .BR M ,
 311 and
 312 .B MB
 313 are accepted as synonyms for
 314 .BR MiB .
 315 .TP
 316 .B GiB
 317 Multiply the integer by 1,073,741,824 (2^30).
 318 .BR Gi ,
 319 .BR g ,
 320 .BR G ,
 321 and
 322 .B GB
 323 are accepted as synonyms for
 324 .BR GiB .
 325 .PP
 326 The special value
 327 .B max
 328 can be used to indicate the maximum integer value
 329 supported by the option.
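.PP
The suffix rules can be summarized with a small Python sketch.
It is an illustration under stated assumptions, not the parser used by
.BR xz :
the names are hypothetical, and the
.I maximum
parameter stands for whatever the largest value
of the option in question happens to be.
.nf
```python
MULTIPLIERS = {
    "KiB": 2**10, "Ki": 2**10, "k": 2**10, "kB": 2**10, "K": 2**10,
    "KB": 2**10,
    "MiB": 2**20, "Mi": 2**20, "m": 2**20, "M": 2**20, "MB": 2**20,
    "GiB": 2**30, "Gi": 2**30, "g": 2**30, "G": 2**30, "GB": 2**30,
}

def parse_size(text, maximum):
    """Parse an integer with an optional suffix; "max" gives maximum."""
    if text == "max":
        return maximum
    end = 0
    while end < len(text) and text[end] in "0123456789":
        end += 1
    number, suffix = text[:end], text[end:]
    if not number:
        raise ValueError("no integer in " + repr(text))
    # A space before the suffix ends up in suffix and is rejected here.
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError("unknown suffix " + repr(suffix))
    return int(number) * MULTIPLIERS.get(suffix, 1)
```
.fi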
 330 .
 331 .SS "Operation mode"
 332 If multiple operation mode options are given,
 333 the last one takes effect.
 334 .TP
 335 .BR \-z ", " \-\-compress
 336 Compress.
 337 This is the default operation mode when no operation mode option
 338 is specified and no other operation mode is implied from
 339 the command name (for example,
 340 .B unxz
 341 implies
 342 .BR \-\-decompress ).
 343 .TP
 344 .BR \-d ", " \-\-decompress ", " \-\-uncompress
 345 Decompress.
 346 .TP
 347 .BR \-t ", " \-\-test
 348 Test the integrity of compressed
 349 .IR files .
 350 This option is equivalent to
 351 .B "\-\-decompress \-\-stdout"
 352 except that the decompressed data is discarded instead of being
 353 written to standard output.
 354 No files are created or removed.
 355 .TP
 356 .BR \-l ", " \-\-list
 357 Print information about compressed
 358 .IR files .
 359 No uncompressed output is produced,
 360 and no files are created or removed.
 361 In list mode, the program cannot read
 362 the compressed data from standard
 363 input or from other unseekable sources.
 364 .IP ""
 365 The default listing shows basic information about
 366 .IR files ,
 367 one file per line.
  368 To get more detailed information, also use the
 369 .B \-\-verbose
 370 option.
 371 For even more information, use
 372 .B \-\-verbose
 373 twice, but note that this may be slow, because getting all the extra
 374 information requires many seeks.
 375 The width of verbose output exceeds
 376 80 characters, so piping the output to e.g.\&
 377 .B "less\ \-S"
 378 may be convenient if the terminal isn't wide enough.
 379 .IP ""
 380 The exact output may vary between
 381 .B xz
 382 versions and different locales.
 383 For machine-readable output,
 384 .B \-\-robot \-\-list
 385 should be used.
 386 .
 387 .SS "Operation modifiers"
 388 .TP
 389 .BR \-k ", " \-\-keep
 390 Don't delete the input files.
 391 .TP
 392 .BR \-f ", " \-\-force
 393 This option has several effects:
 394 .RS
 395 .IP \(bu 3
 396 If the target file already exists,
 397 delete it before compressing or decompressing.
 398 .IP \(bu 3
 399 Compress or decompress even if the input is
 400 a symbolic link to a regular file,
 401 has more than one hard link,
 402 or has the setuid, setgid, or sticky bit set.
 403 The setuid, setgid, and sticky bits are not copied
 404 to the target file.
 405 .IP \(bu 3
 406 When used with
 407 .B \-\-decompress
 408 .BR \-\-stdout
 409 and
 410 .B xz
 411 cannot recognize the type of the source file,
 412 copy the source file as is to standard output.
 413 This allows
 414 .B xzcat
 415 .B \-\-force
 416 to be used like
 417 .BR cat (1)
 418 for files that have not been compressed with
 419 .BR xz .
  420 Note that in the future,
 421 .B xz
 422 might support new compressed file formats, which may make
 423 .B xz
 424 decompress more types of files instead of copying them as is to
 425 standard output.
 426 .BI \-\-format= format
 427 can be used to restrict
 428 .B xz
 429 to decompress only a single file format.
 430 .RE
 431 .TP
 432 .BR \-c ", " \-\-stdout ", " \-\-to\-stdout
 433 Write the compressed or decompressed data to
 434 standard output instead of a file.
 435 This implies
 436 .BR \-\-keep .
 437 .TP
 438 .B \-\-single\-stream
 439 Decompress only the first
 440 .B .xz
 441 stream, and
 442 silently ignore possible remaining input data following the stream.
 443 Normally such trailing garbage makes
 444 .B xz
 445 display an error.
 446 .IP ""
 447 .B xz
 448 never decompresses more than one stream from
 449 .B .lzma
 450 files or raw streams, but this option still makes
 451 .B xz
 452 ignore the possible trailing data after the
 453 .B .lzma
 454 file or raw stream.
 455 .IP ""
 456 This option has no effect if the operation mode is not
 457 .B \-\-decompress
 458 or
 459 .BR \-\-test .
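.IP ""
The effect can be mimicked with the lzma module from
the Python standard library.
This is a sketch of the behavior only, not of how
.B xz
implements it: a single decoder object stops at the end
of the first stream and leaves the following bytes untouched.
.nf
```python
import lzma

first = lzma.compress(b"first stream")
second = lzma.compress(b"second stream")

decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
output = decoder.decompress(first + second)

# The decoder has reached the end of the first stream; everything
# that followed it is kept, undecoded, in unused_data.
leftover = decoder.unused_data
```
.fi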
 460 .TP
 461 .B \-\-no\-sparse
 462 Disable creation of sparse files.
 463 By default, if decompressing into a regular file,
 464 .B xz
 465 tries to make the file sparse if the decompressed data contains
 466 long sequences of binary zeros.
 467 It also works when writing to standard output
 468 as long as standard output is connected to a regular file
 469 and certain additional conditions are met to make it safe.
 470 Creating sparse files may save disk space and speed up
 471 the decompression by reducing the amount of disk I/O.
 472 .TP
 473 \fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf
 474 When compressing, use
 475 .I .suf
 476 as the suffix for the target file instead of
 477 .B .xz
 478 or
 479 .BR .lzma .
 480 If not writing to standard output and
 481 the source file already has the suffix
 482 .IR .suf ,
 483 a warning is displayed and the file is skipped.
 484 .IP ""
 485 When decompressing, recognize files with the suffix
 486 .I .suf
 487 in addition to files with the
 488 .BR .xz ,
 489 .BR .txz ,
 490 .BR .lzma ,
 491 or
 492 .B .tlz
 493 suffix.
 494 If the source file has the suffix
 495 .IR .suf ,
 496 the suffix is removed to get the target filename.
 497 .IP ""
 498 When compressing or decompressing raw streams
 499 .RB ( \-\-format=raw ),
 500 the suffix must always be specified unless
 501 writing to standard output,
 502 because there is no default suffix for raw streams.
 503 .TP
 504 \fB\-\-files\fR[\fB=\fIfile\fR]
 505 Read the filenames to process from
 506 .IR file ;
 507 if
 508 .I file
 509 is omitted, filenames are read from standard input.
 510 Filenames must be terminated with the newline character.
 511 A dash
 512 .RB ( \- )
 513 is taken as a regular filename; it doesn't mean standard input.
  514 If filenames are also given as command line arguments, they are
 515 processed before the filenames read from
 516 .IR file .
 517 .TP
 518 \fB\-\-files0\fR[\fB=\fIfile\fR]
 519 This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except
 520 that each filename must be terminated with the null character.
 521 .
 522 .SS "Basic file format and compression options"
 523 .TP
 524 \fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat
 525 Specify the file
 526 .I format
 527 to compress or decompress:
 528 .RS
 529 .TP
 530 .B auto
 531 This is the default.
 532 When compressing,
 533 .B auto
 534 is equivalent to
 535 .BR xz .
 536 When decompressing,
 537 the format of the input file is automatically detected.
 538 Note that raw streams (created with
 539 .BR \-\-format=raw )
 540 cannot be auto-detected.
 541 .TP
 542 .B xz
 543 Compress to the
 544 .B .xz
 545 file format, or accept only
 546 .B .xz
 547 files when decompressing.
 548 .TP
 549 .BR lzma ", " alone
 550 Compress to the legacy
 551 .B .lzma
 552 file format, or accept only
 553 .B .lzma
 554 files when decompressing.
 555 The alternative name
 556 .B alone
 557 is provided for backwards compatibility with LZMA Utils.
 558 .TP
 559 .B raw
  560 Compress or decompress a raw stream (no headers).
 561 This is meant for advanced users only.
  562 To decode raw streams, you need to use
 563 .B \-\-format=raw
 564 and explicitly specify the filter chain,
 565 which normally would have been stored in the container headers.
 566 .RE
 567 .TP
 568 \fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck
 569 Specify the type of the integrity check.
 570 The check is calculated from the uncompressed data and
 571 stored in the
 572 .B .xz
 573 file.
 574 This option has an effect only when compressing into the
 575 .B .xz
 576 format; the
 577 .B .lzma
 578 format doesn't support integrity checks.
 579 The integrity check (if any) is verified when the
 580 .B .xz
 581 file is decompressed.
 582 .IP ""
 583 Supported
 584 .I check
 585 types:
 586 .RS
 587 .TP
 588 .B none
 589 Don't calculate an integrity check at all.
 590 This is usually a bad idea.
 591 This can be useful when integrity of the data is verified
 592 by other means anyway.
 593 .TP
 594 .B crc32
 595 Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet).
 596 .TP
 597 .B crc64
 598 Calculate CRC64 using the polynomial from ECMA-182.
 599 This is the default, since it is slightly better than CRC32
 600 at detecting damaged files and the speed difference is negligible.
 601 .TP
 602 .B sha256
 603 Calculate SHA-256.
 604 This is somewhat slower than CRC32 and CRC64.
 605 .RE
 606 .IP ""
 607 Integrity of the
 608 .B .xz
 609 headers is always verified with CRC32.
 610 It is not possible to change or disable it.
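.IP ""
The check types can be explored with the lzma module from
the Python standard library, which wraps liblzma.
The following is a sketch, not a description of the
.B xz
tool itself; the helper name
.B check_of
is hypothetical, and SHA-256 support is assumed to be
compiled into the underlying library.
.nf
```python
import lzma

data = b"example payload" * 100

# With no check argument the module stores a CRC64.
default_blob = lzma.compress(data)
sha_blob = lzma.compress(data, check=lzma.CHECK_SHA256)

def check_of(blob):
    """Return the ID of the integrity check stored in an .xz stream."""
    decoder = lzma.LZMADecompressor()
    decoder.decompress(blob)  # the check is verified while decoding
    return decoder.check
```
.fi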
 611 .TP
 612 .B \-\-ignore\-check
 613 Don't verify the integrity check of the compressed data when decompressing.
 614 The CRC32 values in the
 615 .B .xz
 616 headers will still be verified normally.
 617 .IP ""
 618 .B "Do not use this option unless you know what you are doing."
 619 Possible reasons to use this option:
 620 .RS
 621 .IP \(bu 3
 622 Trying to recover data from a corrupt .xz file.
 623 .IP \(bu 3
 624 Speeding up decompression.
 625 This matters mostly with SHA-256 or
 626 with files that have compressed extremely well.
 627 It's recommended to not use this option for this purpose
 628 unless the file integrity is verified externally in some other way.
 629 .RE
 630 .TP
 631 .BR \-0 " ... " \-9
 632 Select a compression preset level.
 633 The default is
 634 .BR \-6 .
 635 If multiple preset levels are specified,
 636 the last one takes effect.
 637 If a custom filter chain was already specified, setting
 638 a compression preset level clears the custom filter chain.
 639 .IP ""
 640 The differences between the presets are more significant than with
 641 .BR gzip (1)
 642 and
 643 .BR bzip2 (1).
 644 The selected compression settings determine
 645 the memory requirements of the decompressor,
  646 so using too high a preset level might make it painful
 647 to decompress the file on an old system with little RAM.
 648 Specifically,
 649 .B "it's not a good idea to blindly use \-9 for everything"
 650 like it often is with
 651 .BR gzip (1)
 652 and
 653 .BR bzip2 (1).
 654 .RS
 655 .TP
 656 .BR "\-0" " ... " "\-3"
 657 These are somewhat fast presets.
 658 .B \-0
 659 is sometimes faster than
 660 .B "gzip \-9"
 661 while compressing much better.
 662 The higher ones often have speed comparable to
 663 .BR bzip2 (1)
 664 with comparable or better compression ratio,
 665 although the results
 666 depend a lot on the type of data being compressed.
 667 .TP
 668 .BR "\-4" " ... " "\-6"
 669 Good to very good compression while keeping
 670 decompressor memory usage reasonable even for old systems.
 671 .B \-6
 672 is the default, which is usually a good choice
 673 e.g. for distributing files that need to be decompressible
 674 even on systems with only 16\ MiB RAM.
 675 .RB ( \-5e
 676 or
 677 .B \-6e
 678 may be worth considering too.
 679 See
 680 .BR \-\-extreme .)
 681 .TP
 682 .B "\-7 ... \-9"
 683 These are like
 684 .B \-6
 685 but with higher compressor and decompressor memory requirements.
 686 These are useful only when compressing files bigger than
 687 8\ MiB, 16\ MiB, and 32\ MiB, respectively.
 688 .RE
 689 .IP ""
 690 On the same hardware, the decompression speed is approximately
 691 a constant number of bytes of compressed data per second.
 692 In other words, the better the compression,
 693 the faster the decompression will usually be.
 694 This also means that the amount of uncompressed output
 695 produced per second can vary a lot.
 696 .IP ""
 697 The following table summarises the features of the presets:
 698 .RS
 699 .RS
 700 .PP
 701 .TS
 702 tab(;);
 703 c c c c c
 704 n n n n n.
 705 Preset;DictSize;CompCPU;CompMem;DecMem
 706 \-0;256 KiB;0;3 MiB;1 MiB
 707 \-1;1 MiB;1;9 MiB;2 MiB
 708 \-2;2 MiB;2;17 MiB;3 MiB
 709 \-3;4 MiB;3;32 MiB;5 MiB
 710 \-4;4 MiB;4;48 MiB;5 MiB
 711 \-5;8 MiB;5;94 MiB;9 MiB
 712 \-6;8 MiB;6;94 MiB;9 MiB
 713 \-7;16 MiB;6;186 MiB;17 MiB
 714 \-8;32 MiB;6;370 MiB;33 MiB
 715 \-9;64 MiB;6;674 MiB;65 MiB
 716 .TE
 717 .RE
 718 .RE
 719 .IP ""
 720 Column descriptions:
 721 .RS
 722 .IP \(bu 3
 723 DictSize is the LZMA2 dictionary size.
  724 It is a waste of memory to use a dictionary bigger than
 725 the size of the uncompressed file.
 726 This is why it is good to avoid using the presets
 727 .BR \-7 " ... " \-9
 728 when there's no real need for them.
 729 At
 730 .B \-6
 731 and lower, the amount of memory wasted is
 732 usually low enough to not matter.
 733 .IP \(bu 3
 734 CompCPU is a simplified representation of the LZMA2 settings
 735 that affect compression speed.
 736 The dictionary size affects speed too,
 737 so while CompCPU is the same for levels
 738 .BR \-6 " ... " \-9 ,
 739 higher levels still tend to be a little slower.
 740 To get even slower and thus possibly better compression, see
 741 .BR \-\-extreme .
 742 .IP \(bu 3
 743 CompMem contains the compressor memory requirements
 744 in the single-threaded mode.
 745 It may vary slightly between
 746 .B xz
 747 versions.
 748 Memory requirements of some of the future multithreaded modes may
 749 be dramatically higher than that of the single-threaded mode.
 750 .IP \(bu 3
 751 DecMem contains the decompressor memory requirements.
 752 That is, the compression settings determine
 753 the memory requirements of the decompressor.
 754 The exact decompressor memory usage is slightly more than
 755 the LZMA2 dictionary size, but the values in the table
 756 have been rounded up to the next full MiB.
 757 .RE
 758 .TP
 759 .BR \-e ", " \-\-extreme
 760 Use a slower variant of the selected compression preset level
 761 .RB ( \-0 " ... " \-9 )
 762 to hopefully get a little bit better compression ratio,
 763 but with bad luck this can also make it worse.
 764 Decompressor memory usage is not affected,
 765 but compressor memory usage increases a little at preset levels
 766 .BR \-0 " ... " \-3 .
 767 .IP ""
 768 Since there are two presets with dictionary sizes
 769 4\ MiB and 8\ MiB, the presets
 770 .B \-3e
 771 and
 772 .B \-5e
 773 use slightly faster settings (lower CompCPU) than
 774 .B \-4e
 775 and
 776 .BR \-6e ,
 777 respectively.
 778 That way no two presets are identical.
 779 .RS
 780 .RS
 781 .PP
 782 .TS
 783 tab(;);
 784 c c c c c
 785 n n n n n.
 786 Preset;DictSize;CompCPU;CompMem;DecMem
 787 \-0e;256 KiB;8;4 MiB;1 MiB
 788 \-1e;1 MiB;8;13 MiB;2 MiB
 789 \-2e;2 MiB;8;25 MiB;3 MiB
 790 \-3e;4 MiB;7;48 MiB;5 MiB
 791 \-4e;4 MiB;8;48 MiB;5 MiB
 792 \-5e;8 MiB;7;94 MiB;9 MiB
 793 \-6e;8 MiB;8;94 MiB;9 MiB
 794 \-7e;16 MiB;8;186 MiB;17 MiB
 795 \-8e;32 MiB;8;370 MiB;33 MiB
 796 \-9e;64 MiB;8;674 MiB;65 MiB
 797 .TE
 798 .RE
 799 .RE
 800 .IP ""
  801 For example, there are a total of four presets that use an
 802 8\ MiB dictionary, whose order from the fastest to the slowest is
 803 .BR \-5 ,
 804 .BR \-6 ,
 805 .BR \-5e ,
 806 and
 807 .BR \-6e .
 808 .TP
 809 .B \-\-fast
 810 .PD 0
 811 .TP
 812 .B \-\-best
 813 .PD
 814 These are somewhat misleading aliases for
 815 .B \-0
 816 and
 817 .BR \-9 ,
 818 respectively.
 819 These are provided only for backwards compatibility
 820 with LZMA Utils.
 821 Avoid using these options.
 822 .TP
 823 .BI \-\-block\-size= size
 824 When compressing to the
 825 .B .xz
 826 format, split the input data into blocks of
 827 .I size
 828 bytes.
 829 The blocks are compressed independently from each other,
 830 which helps with multi-threading and
 831 makes limited random-access decompression possible.
 832 This option is typically used to override the default
 833 block size in multi-threaded mode,
 834 but this option can be used in single-threaded mode too.
 835 .IP ""
 836 In multi-threaded mode about three times
 837 .I size
 838 bytes will be allocated in each thread for buffering input and output.
 839 The default
 840 .I size
 841 is three times the LZMA2 dictionary size or 1 MiB,
 842 whichever is more.
 843 Typically a good value is 2\-4 times
 844 the size of the LZMA2 dictionary or at least 1 MiB.
 845 Using
 846 .I size
  847 less than the LZMA2 dictionary size is a waste of RAM
 848 because then the LZMA2 dictionary buffer will never get fully used.
 849 The sizes of the blocks are stored in the block headers,
 850 which a future version of
 851 .B xz
 852 will use for multi-threaded decompression.
 853 .IP ""
 854 In single-threaded mode no block splitting is done by default.
 855 Setting this option doesn't affect memory usage.
 856 No size information is stored in block headers,
 857 thus files created in single-threaded mode
 858 won't be identical to files created in multi-threaded mode.
 859 The lack of size information also means that a future version of
 860 .B xz
  861 won't be able to decompress the files in multi-threaded mode.
 862 .TP
 863 .BI \-\-block\-list= sizes
 864 When compressing to the
 865 .B .xz
 866 format, start a new block after
 867 the given intervals of uncompressed data.
 868 .IP ""
 869 The uncompressed
 870 .I sizes
 871 of the blocks are specified as a comma-separated list.
 872 Omitting a size (two or more consecutive commas) is a shorthand
 873 to use the size of the previous block.
 874 .IP ""
 875 If the input file is bigger than the sum of
 876 .IR sizes ,
 877 the last value in
 878 .I sizes
 879 is repeated until the end of the file.
 880 A special value of
 881 .B 0
 882 may be used as the last value to indicate that
 883 the rest of the file should be encoded as a single block.
 884 .IP ""
 885 If one specifies
 886 .I sizes
 887 that exceed the encoder's block size
 888 (either the default value in threaded mode or
 889 the value specified with \fB\-\-block\-size=\fIsize\fR),
 890 the encoder will create additional blocks while
 891 keeping the boundaries specified in
 892 .IR sizes .
 893 For example, if one specifies
 894 .B \-\-block\-size=10MiB
 895 .B \-\-block\-list=5MiB,10MiB,8MiB,12MiB,24MiB
 896 and the input file is 80 MiB,
 897 one will get 11 blocks:
 898 5, 10, 8, 10, 2, 10, 10, 4, 10, 10, and 1 MiB.
 899 .IP ""
 900 In multi-threaded mode the sizes of the blocks
 901 are stored in the block headers.
 902 This isn't done in single-threaded mode,
 903 so the encoded output won't be
 904 identical to that of the multi-threaded mode.
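.IP ""
The splitting rule can be reproduced with a short Python sketch.
It is an illustration of the rule as described here,
not the encoder's actual code, and the function name is hypothetical.
Omitted sizes and the special value
.B 0
are not handled.
.nf
```python
def split_blocks(total, sizes, block_size):
    """Return the block sizes produced for an input of total bytes."""
    if not sizes or min(sizes) <= 0 or block_size <= 0:
        raise ValueError("sizes and block_size must be positive")
    blocks = []
    remaining = total
    index = 0
    while remaining > 0:
        # Past the end of the list, the last value is repeated.
        interval = min(sizes[min(index, len(sizes) - 1)], remaining)
        index += 1
        remaining -= interval
        # An interval larger than the block size is cut further,
        # but the requested boundary at its end is kept.
        while interval > 0:
            block = min(interval, block_size)
            blocks.append(block)
            interval -= block
    return blocks
```
.fi
.IP ""
Called with an 80 MiB input, the list 5, 10, 8, 12, 24 MiB,
and a 10 MiB block size, the sketch yields the same
11 blocks as the example above.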
 905 .TP
 906 .BI \-\-flush\-timeout= timeout
 907 When compressing, if more than
 908 .I timeout
  909 milliseconds (a positive integer) have passed since the previous flush and
 910 reading more input would block,
 911 all the pending input data is flushed from the encoder and
 912 made available in the output stream.
 913 This can be useful if
 914 .B xz
 915 is used to compress data that is streamed over a network.
 916 Small
 917 .I timeout
 918 values make the data available at the receiving end
 919 with a small delay, but large
 920 .I timeout
 921 values give better compression ratio.
 922 .IP ""
 923 This feature is disabled by default.
 924 If this option is specified more than once, the last one takes effect.
 925 The special
 926 .I timeout
 927 value of
 928 .B 0
 929 can be used to explicitly disable this feature.
 930 .IP ""
 931 This feature is not available on non-POSIX systems.
 932 .IP ""
 933 .\" FIXME
 934 .B "This feature is still experimental."
 935 Currently
 936 .B xz
 937 is unsuitable for decompressing the stream in real time due to how
 938 .B xz
 939 does buffering.
 940 .TP
 941 .BI \-\-memlimit\-compress= limit
 942 Set a memory usage limit for compression.
 943 If this option is specified multiple times,
 944 the last one takes effect.
 945 .IP ""
 946 If the compression settings exceed the
 947 .IR limit ,
 948 .B xz
 949 will adjust the settings downwards so that
 950 the limit is no longer exceeded and display a notice that
 951 automatic adjustment was done.
 952 Such adjustments are not made when compressing with
 953 .B \-\-format=raw
 954 or if
 955 .B \-\-no\-adjust
 956 has been specified.
 957 In those cases, an error is displayed and
 958 .B xz
 959 will exit with exit status 1.
 960 .IP ""
 961 The
 962 .I limit
 963 can be specified in multiple ways:
 964 .RS
 965 .IP \(bu 3
 966 The
 967 .I limit
 968 can be an absolute value in bytes.
 969 Using an integer suffix like
 970 .B MiB
 971 can be useful.
 972 Example:
 973 .B "\-\-memlimit\-compress=80MiB"
 974 .IP \(bu 3
 975 The
 976 .I limit
 977 can be specified as a percentage of total physical memory (RAM).
 978 This can be useful especially when setting the
 979 .B XZ_DEFAULTS
 980 environment variable in a shell initialization script
 981 that is shared between different computers.
 982 That way the limit is automatically bigger
 983 on systems with more memory.
 984 Example:
 985 .B "\-\-memlimit\-compress=70%"
 986 .IP \(bu 3
 987 The
 988 .I limit
 989 can be reset back to its default value by setting it to
 990 .BR 0 .
 991 This is currently equivalent to setting the
 992 .I limit
 993 to
 994 .B max
 995 (no memory usage limit).
 996 Once multithreading support has been implemented,
 997 there may be a difference between
 998 .B 0
 999 and
1000 .B max
1001 for the multithreaded case, so it is recommended to use
1002 .B 0
1003 instead of
1004 .B max
1005 until the details have been decided.
1006 .RE
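.IP ""
The three forms of
.I limit
can be summarized in a Python sketch.
It is a simplification under stated assumptions, not the logic of
.BR xz :
the function name is hypothetical, only the
.B MiB
suffix is handled, percentages are rounded down,
and the amount of RAM is passed in by the caller.
.nf
```python
def resolve_limit(spec, total_ram):
    """Turn a limit argument into bytes; None means no limit."""
    if spec == "max":
        return None
    if spec.endswith("%"):
        # A percentage of total physical memory (rounded down here).
        return total_ram * int(spec[:-1]) // 100
    if spec.endswith("MiB"):
        value = int(spec[:-3]) * 2**20
    else:
        value = int(spec)
    # 0 resets the limit to its default, currently no limit at all.
    return None if value == 0 else value
```
.fi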
1007 .IP ""
1008 See also the section
1009 .BR "Memory usage" .
1010 .TP
1011 .BI \-\-memlimit\-decompress= limit
1012 Set a memory usage limit for decompression.
1013 This also affects the
1014 .B \-\-list
1015 mode.
1016 If the operation is not possible without exceeding the
1017 .IR limit ,
1018 .B xz
1019 will display an error and decompressing the file will fail.
1020 See
1021 .BI \-\-memlimit\-compress= limit
1022 for possible ways to specify the
1023 .IR limit .
1024 .TP
1025 \fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit
1026 This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit
1027 \fB\-\-memlimit\-decompress=\fIlimit\fR.
1028 .TP
1029 .B \-\-no\-adjust
1030 Display an error and exit if the compression settings exceed
1031 the memory usage limit.
1032 The default is to adjust the settings downwards so
1033 that the memory usage limit is not exceeded.
1034 Automatic adjusting is always disabled when creating raw streams
1035 .RB ( \-\-format=raw ).
1036 .TP
1037 \fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads
1038 Specify the number of worker threads to use.
1039 Setting
1040 .I threads
1041 to a special value
1042 .B 0
1043 makes
1044 .B xz
1045 use as many threads as there are CPU cores on the system.
1046 The actual number of threads can be less than
1047 .I threads
1048 if the input file is not big enough
1049 for threading with the given settings or
1050 if using more threads would exceed the memory usage limit.
1051 .IP ""
1052 Currently the only threading method is to split the input into
1053 blocks and compress them independently from each other.
1054 The default block size depends on the compression level and
 1055 can be overridden with the
1056 .BI \-\-block\-size= size
1057 option.
1058 .
1059 .SS "Custom compressor filter chains"
1060 A custom filter chain allows specifying
1061 the compression settings in detail instead of relying on
1062 the settings associated with the presets.
1063 When a custom filter chain is specified,
1064 preset options (\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR)
1065 earlier on the command line are forgotten.
1066 If a preset option is specified
1067 after one or more custom filter chain options,
1068 the new preset takes effect and
1069 the custom filter chain options specified earlier are forgotten.
1070 .PP
1071 A filter chain is comparable to piping on the command line.
1072 When compressing, the uncompressed input goes to the first filter,
1073 whose output goes to the next filter (if any).
1074 The output of the last filter gets written to the compressed file.
1075 The maximum number of filters in the chain is four,
1076 but typically a filter chain has only one or two filters.
1077 .PP
1078 Many filters have limitations on where they can be
1079 in the filter chain:
1080 some filters can work only as the last filter in the chain,
1081 some only as a non-last filter, and some work in any position
1082 in the chain.
1083 Depending on the filter, this limitation is either inherent to
1084 the filter design or exists to prevent security issues.
1085 .PP
1086 A custom filter chain is specified by using one or more
1087 filter options in the order they are wanted in the filter chain.
1088 That is, the order of filter options is significant!
1089 When decoding raw streams
1090 .RB ( \-\-format=raw ),
1091 the filter chain is specified in the same order as
1092 it was specified when compressing.
1093 .PP
1094 Filters take filter-specific
1095 .I options
1096 as a comma-separated list.
1097 Extra commas in
1098 .I options
1099 are ignored.
1100 Every option has a default value, so you need to
1101 specify only those you want to change.
1102 .PP
1103 To see the whole filter chain and
1104 .IR options ,
1105 use
1106 .B "xz \-vv"
1107 (that is, use
1108 .B \-\-verbose
1109 twice).
1110 This also works for viewing the filter chain options used by presets.
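.PP
As a rough illustration of why the order matters,
the following sketch uses the
.B lzma
module from the Python standard library
(a binding to liblzma, not the
.B xz
command itself) to build a two-filter chain and
to decode the resulting raw stream with the same chain.
The sample data and the variable names are made up for this example.
.nf
.ft CW
```python
import lzma

# Filter chain in encoding order: Delta first, LZMA2 last.
# LZMA2 works only as the last filter, Delta only as a non-last filter.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 2},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

# Hypothetical sample data: 16-bit little-endian counter values.
data = b"".join(i.to_bytes(2, "little") for i in range(5000))

# A raw stream has no container headers, so the chain is not stored in it.
raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=chain)

# Decoding takes the same chain, in the same order as when compressing.
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=chain)
```
.ft R
.fi
.PP
Swapping the two list entries would put LZMA2 in a non-last position,
which the position limitations described above do not allow.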
1111 .TP
1112 \fB\-\-lzma1\fR[\fB=\fIoptions\fR]
1113 .PD 0
1114 .TP
1115 \fB\-\-lzma2\fR[\fB=\fIoptions\fR]
1116 .PD
1117 Add LZMA1 or LZMA2 filter to the filter chain.
1118 These filters can be used only as the last filter in the chain.
1119 .IP ""
1120 LZMA1 is a legacy filter,
1121 which is supported almost solely due to the legacy
1122 .B .lzma
1123 file format, which supports only LZMA1.
1124 LZMA2 is an updated
1125 version of LZMA1 to fix some practical issues of LZMA1.
1126 The
1127 .B .xz
1128 format uses LZMA2 and doesn't support LZMA1 at all.
1129 Compression speed and ratios of LZMA1 and LZMA2
1130 are practically the same.
1131 .IP ""
1132 LZMA1 and LZMA2 share the same set of
1133 .IR options :
1134 .RS
1135 .TP
1136 .BI preset= preset
1137 Reset all LZMA1 or LZMA2
1138 .I options
1139 to
1140 .IR preset .
1141 .I Preset
1142 consists of an integer, which may be followed by single-letter
1143 preset modifiers.
1144 The integer can be from
1145 .B 0
1146 to
1147 .BR 9 ,
1148 matching the command line options \fB\-0\fR ... \fB\-9\fR.
1149 The only supported modifier is currently
1150 .BR e ,
1151 which matches
1152 .BR \-\-extreme .
1153 If no
1154 .B preset
1155 is specified, the default values of LZMA1 or LZMA2
1156 .I options
1157 are taken from the preset
1158 .BR 6 .
1159 .TP
1160 .BI dict= size
1161 Dictionary (history buffer)
1162 .I size
1163 indicates how many bytes of the recently processed
1164 uncompressed data are kept in memory.
1165 The algorithm tries to find repeating byte sequences (matches) in
1166 the uncompressed data, and replace them with references
1167 to the data currently in the dictionary.
1168 The bigger the dictionary, the higher the chance
1169 of finding a match.
1170 Thus, increasing dictionary
1171 .I size
1172 usually improves compression ratio, but
1173 a dictionary bigger than the uncompressed file is a waste of memory.
1174 .IP ""
1175 Typical dictionary
1176 .I size
1177 is from 64\ KiB to 64\ MiB.
1178 The minimum is 4\ KiB.
1179 The maximum for compression is currently 1.5\ GiB (1536\ MiB).
1180 The decompressor already supports dictionaries up to
1181 one byte less than 4\ GiB, which is the maximum for
1182 the LZMA1 and LZMA2 stream formats.
1183 .IP ""
1184 Dictionary
1185 .I size
1186 and match finder
1187 .RI ( mf )
1188 together determine the memory usage of the LZMA1 or LZMA2 encoder.
1189 Decompressing requires a dictionary
1190 .I size
1191 at least as big as the one used when compressing,
1192 thus the memory usage of the decoder is determined
1193 by the dictionary size used when compressing.
1194 The
1195 .B .xz
1196 headers store the dictionary
1197 .I size
1198 either as
1199 .RI "2^" n
1200 or
1201 .RI "2^" n " + 2^(" n "\-1),"
1202 so these
1203 .I sizes
1204 are somewhat preferred for compression.
1205 Other
1206 .I sizes
1207 will get rounded up when stored in the
1208 .B .xz
1209 headers.
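.IP ""
The rounding can be sketched in Python as follows.
This only illustrates the 2^n and 2^n + 2^(n\-1) rule stated above;
it is not the code that
.B xz
uses, and the function name is invented for the example.
.nf
.ft CW
```python
def stored_dict_size(size):
    """Round size (in bytes) up to the next 2^n or 2^n + 2^(n-1)."""
    n = 12  # 2^12 bytes = 4 KiB, the documented minimum dictionary size
    while True:
        # The two candidates for each n are tried in ascending order.
        for candidate in (1 << n, (1 << n) + (1 << (n - 1))):
            if candidate >= size:
                return candidate
        n += 1
```
.ft R
.fi
.IP ""
Under this rule, a 5\ MiB dictionary would be stored as 6\ MiB,
while 8\ MiB is stored unchanged.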
1210 .TP
1211 .BI lc= lc
1212 Specify the number of literal context bits.
1213 The minimum is 0 and the maximum is 4; the default is 3.
1214 In addition, the sum of
1215 .I lc
1216 and
1217 .I lp
1218 must not exceed 4.
1219 .IP ""
1220 All bytes that cannot be encoded as matches
1221 are encoded as literals.
1222 That is, literals are simply 8-bit bytes
1223 that are encoded one at a time.
1224 .IP ""
1225 The literal coding makes an assumption that the highest
1226 .I lc
1227 bits of the previous uncompressed byte correlate
1228 with the next byte.
1229 E.g. in typical English text, an upper-case letter is
1230 often followed by a lower-case letter, and a lower-case
1231 letter is usually followed by another lower-case letter.
1232 In the US-ASCII character set, the highest three bits are 010
1233 for upper-case letters and 011 for lower-case letters.
1234 When
1235 .I lc
1236 is at least 3, the literal coding can take advantage of
1237 this property in the uncompressed data.
1238 .IP ""
1239 The default value (3) is usually good.
1240 If you want maximum compression, test
1241 .BR lc=4 .
1242 Sometimes it helps a little, and
1243 sometimes it makes compression worse.
1244 If it makes it worse, test e.g.\&
1245 .B lc=2
1246 too.
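.IP ""
The US-ASCII property mentioned above can be checked with a few lines of Python.
This is only a demonstration of the bit patterns,
not of how the encoder works internally;
the helper name is made up.
.nf
.ft CW
```python
def top_bits(ch, lc=3):
    """Return the highest lc bits (1 to 8) of an 8-bit character as a string."""
    return format(ord(ch) >> (8 - lc), "0{}b".format(lc))

# Every upper-case letter shares one prefix, every lower-case letter another.
upper = {top_bits(c) for c in "ABCXYZ"}
lower = {top_bits(c) for c in "abcxyz"}
```
.ft R
.fi
.IP ""
With two bits instead of three, both groups share the prefix 01,
which is why
.I lc
needs to be at least 3 to tell the two cases apart.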
1247 .TP
1248 .BI lp= lp
1249 Specify the number of literal position bits.
1250 The minimum is 0 and the maximum is 4; the default is 0.
1251 .IP ""
1252 .I Lp
1253 affects what kind of alignment in the uncompressed data is
1254 assumed when encoding literals.
1255 See
1256 .I pb
1257 below for more information about alignment.
1258 .TP
1259 .BI pb= pb
1260 Specify the number of position bits.
1261 The minimum is 0 and the maximum is 4; the default is 2.
1262 .IP ""
1263 .I Pb
1264 affects what kind of alignment in the uncompressed data is
1265 assumed in general.
1266 The default means four-byte alignment
1267 .RI (2^ pb =2^2=4),
1268 which is often a good choice when there's no better guess.
1269 .IP ""
1270 When the alignment is known, setting
1271 .I pb
1272 accordingly may reduce the file size a little.
1273 E.g. with text files having one-byte
1274 alignment (US-ASCII, ISO-8859-*, UTF-8), setting
1275 .B pb=0
1276 can improve compression slightly.
1277 For UTF-16 text,
1278 .B pb=1
1279 is a good choice.
1280 If the alignment is an odd number like 3 bytes,
1281 .B pb=0
1282 might be the best choice.
1283 .IP ""
1284 Even though the assumed alignment can be adjusted with
1285 .I pb
1286 and
1287 .IR lp ,
1288 LZMA1 and LZMA2 still slightly favor 16-byte alignment.
1289 It might be worth taking into account when designing file formats
1290 that are likely to be often compressed with LZMA1 or LZMA2.
1291 .TP
1292 .BI mf= mf
1293 Match finder has a major effect on encoder speed,
1294 memory usage, and compression ratio.
1295 Usually Hash Chain match finders are faster than Binary Tree
1296 match finders.
1297 The default depends on the
1298 .IR preset :
1299 0 uses
1300 .BR hc3 ,
1301 1\-3
1302 use
1303 .BR hc4 ,
1304 and the rest use
1305 .BR bt4 .
1306 .IP ""
1307 The following match finders are supported.
1308 The memory usage formulas below are rough approximations,
1309 which are closest to the reality when
1310 .I dict
1311 is a power of two.
1312 .RS
1313 .TP
1314 .B hc3
1315 Hash Chain with 2- and 3-byte hashing
1316 .br
1317 Minimum value for
1318 .IR nice :
1319 3
1320 .br
1321 Memory usage:
1322 .br
1323 .I dict
1324 * 7.5 (if
1325 .I dict
1326 <= 16 MiB);
1327 .br
1328 .I dict
1329 * 5.5 + 64 MiB (if
1330 .I dict
1331 > 16 MiB)
1332 .TP
1333 .B hc4
1334 Hash Chain with 2-, 3-, and 4-byte hashing
1335 .br
1336 Minimum value for
1337 .IR nice :
1338 4
1339 .br
1340 Memory usage:
1341 .br
1342 .I dict
1343 * 7.5 (if
1344 .I dict
1345 <= 32 MiB);
1346 .br
1347 .I dict
1348 * 6.5 (if
1349 .I dict
1350 > 32 MiB)
1351 .TP
1352 .B bt2
1353 Binary Tree with 2-byte hashing
1354 .br
1355 Minimum value for
1356 .IR nice :
1357 2
1358 .br
1359 Memory usage:
1360 .I dict
1361 * 9.5
1362 .TP
1363 .B bt3
1364 Binary Tree with 2- and 3-byte hashing
1365 .br
1366 Minimum value for
1367 .IR nice :
1368 3
1369 .br
1370 Memory usage:
1371 .br
1372 .I dict
1373 * 11.5 (if
1374 .I dict
1375 <= 16 MiB);
1376 .br
1377 .I dict
1378 * 9.5 + 64 MiB (if
1379 .I dict
1380 > 16 MiB)
1381 .TP
1382 .B bt4
1383 Binary Tree with 2-, 3-, and 4-byte hashing
1384 .br
1385 Minimum value for
1386 .IR nice :
1387 4
1388 .br
1389 Memory usage:
1390 .br
1391 .I dict
1392 * 11.5 (if
1393 .I dict
1394 <= 32 MiB);
1395 .br
1396 .I dict
1397 * 10.5 (if
1398 .I dict
1399 > 32 MiB)
1400 .RE
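.IP ""
The formulas above can be collected into a small Python helper
for quick estimates.
This is a sketch of the rough approximations listed here,
not a replacement for the real memory usage calculation;
the function name is hypothetical.
.nf
.ft CW
```python
MiB = 1024 * 1024

def encoder_memory_estimate(mf, dict_size):
    """Rough encoder memory usage in bytes for a match finder and dict size."""
    if mf == "hc3":
        if dict_size <= 16 * MiB:
            return dict_size * 7.5
        return dict_size * 5.5 + 64 * MiB
    if mf == "hc4":
        return dict_size * (7.5 if dict_size <= 32 * MiB else 6.5)
    if mf == "bt2":
        return dict_size * 9.5
    if mf == "bt3":
        if dict_size <= 16 * MiB:
            return dict_size * 11.5
        return dict_size * 9.5 + 64 * MiB
    if mf == "bt4":
        return dict_size * (11.5 if dict_size <= 32 * MiB else 10.5)
    raise ValueError("unknown match finder: " + mf)
```
.ft R
.fi
.IP ""
For example,
.B bt4
with an 8\ MiB dictionary gives an estimate of 92\ MiB,
and with a 64\ MiB dictionary 672\ MiB.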
1401 .TP
1402 .BI mode= mode
1403 Compression
1404 .I mode
1405 specifies the method to analyze
1406 the data produced by the match finder.
1407 Supported
1408 .I modes
1409 are
1410 .B fast
1411 and
1412 .BR normal .
1413 The default is
1414 .B fast
1415 for
1416 .I presets
1417 0\-3 and
1418 .B normal
1419 for
1420 .I presets
1421 4\-9.
1422 .IP ""
1423 Usually
1424 .B fast
1425 is used with Hash Chain match finders and
1426 .B normal
1427 with Binary Tree match finders.
1428 This is also what the
1429 .I presets
1430 do.
1431 .TP
1432 .BI nice= nice
1433 Specify what is considered to be a nice length for a match.
1434 Once a match of at least
1435 .I nice
1436 bytes is found, the algorithm stops
1437 looking for possibly better matches.
1438 .IP ""
1439 .I Nice
1440 can be 2\-273 bytes.
1441 Higher values tend to give better compression ratio
1442 at the expense of speed.
1443 The default depends on the
1444 .IR preset .
1445 .TP
1446 .BI depth= depth
1447 Specify the maximum search depth in the match finder.
1448 The default is the special value of 0,
1449 which makes the compressor determine a reasonable
1450 .I depth
1451 from
1452 .I mf
1453 and
1454 .IR nice .
1455 .IP ""
1456 Reasonable
1457 .I depth
1458 for Hash Chains is 4\-100 and 16\-1000 for Binary Trees.
1459 Using very high values for
1460 .I depth
1461 can make the encoder extremely slow with some files.
1462 Avoid setting the
1463 .I depth
1464 over 1000 unless you are prepared to interrupt
1465 the compression in case it is taking far too long.
1466 .RE
1467 .IP ""
1468 When decoding raw streams
1469 .RB ( \-\-format=raw ),
1470 LZMA2 needs only the dictionary
1471 .IR size .
1472 LZMA1 needs also
1473 .IR lc ,
1474 .IR lp ,
1475 and
1476 .IR pb .
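.IP ""
The LZMA2 case can be tried out with the Python
.B lzma
module, which wraps liblzma.
The sketch below is an illustration under that assumption, not an
.B xz
command line: the encoder gets non-default
.I lc
and
.I pb
values, while the decoder is told only the dictionary size.
The payload is made up.
.nf
.ft CW
```python
import lzma

data = b"example payload " * 1000  # made-up sample data

# Encoder: explicit dictionary size plus non-default lc and pb.
enc_chain = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 20, "lc": 2, "pb": 0}]
raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=enc_chain)

# Decoder: only the dictionary size is specified for LZMA2.
dec_chain = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 20}]
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=dec_chain)
```
.ft R
.fi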
1477 .TP
1478 \fB\-\-x86\fR[\fB=\fIoptions\fR]
1479 .PD 0
1480 .TP
1481 \fB\-\-powerpc\fR[\fB=\fIoptions\fR]
1482 .TP
1483 \fB\-\-ia64\fR[\fB=\fIoptions\fR]
1484 .TP
1485 \fB\-\-arm\fR[\fB=\fIoptions\fR]
1486 .TP
1487 \fB\-\-armthumb\fR[\fB=\fIoptions\fR]
1488 .TP
1489 \fB\-\-sparc\fR[\fB=\fIoptions\fR]
1490 .PD
1491 Add a branch/call/jump (BCJ) filter to the filter chain.
1492 These filters can be used only as a non-last filter
1493 in the filter chain.
1494 .IP ""
1495 A BCJ filter converts relative addresses in
1496 the machine code to their absolute counterparts.
1497 This doesn't change the size of the data,
1498 but it increases redundancy,
1499 which can help LZMA2 to produce a 0\-15\ % smaller
1500 .B .xz
1501 file.
1502 The BCJ filters are always reversible,
1503 so using a BCJ filter for the wrong type of data
1504 doesn't cause any data loss, although it may make
1505 the compression ratio slightly worse.
1506 .IP ""
1507 It is fine to apply a BCJ filter on a whole executable;
1508 there's no need to apply it only on the executable section.
1509 Applying a BCJ filter on an archive that contains both executable
1510 and non-executable files may or may not give good results,
1511 so it generally isn't good to blindly apply a BCJ filter when
1512 compressing binary packages for distribution.
1513 .IP ""
1514 These BCJ filters are very fast and
1515 use an insignificant amount of memory.
1516 If a BCJ filter improves compression ratio of a file,
1517 it can improve decompression speed at the same time.
1518 This is because, on the same hardware,
1519 the decompression speed of LZMA2 is roughly
1520 a fixed number of bytes of compressed data per second.
1521 .IP ""
1522 These BCJ filters have known problems related to
1523 the compression ratio:
1524 .RS
1525 .IP \(bu 3
1526 Some types of files containing executable code
1527 (e.g. object files, static libraries, and Linux kernel modules)
1528 have the addresses in the instructions filled with filler values.
1529 These BCJ filters will still do the address conversion,
1530 which will make the compression worse with these files.
1531 .IP \(bu 3
1532 Applying a BCJ filter on an archive containing multiple similar
1533 executables can make the compression ratio worse than not using
1534 a BCJ filter.
1535 This is because the BCJ filter doesn't detect the boundaries
1536 of the executable files, and doesn't reset
1537 the address conversion counter for each executable.
1538 .RE
1539 .IP ""
1540 Both of the above problems will be fixed
1541 in the future in a new filter.
1542 The old BCJ filters will still be useful in embedded systems,
1543 because the decoder of the new filter will be bigger
1544 and use more memory.
1545 .IP ""
1546 Different instruction sets have different alignment:
1547 .RS
1548 .RS
1549 .PP
1550 .TS
1551 tab(;);
1552 l n l
1553 l n l.
1554 Filter;Alignment;Notes
1555 x86;1;32-bit or 64-bit x86
1556 PowerPC;4;Big endian only
1557 ARM;4;Little endian only
1558 ARM-Thumb;2;Little endian only
1559 IA-64;16;Big or little endian
1560 SPARC;4;Big or little endian
1561 .TE
1562 .RE
1563 .RE
1564 .IP ""
1565 Since the BCJ-filtered data is usually compressed with LZMA2,
1566 the compression ratio may be improved slightly if
1567 the LZMA2 options are set to match the
1568 alignment of the selected BCJ filter.
1569 For example, with the IA-64 filter, it's good to set
1570 .B pb=4
1571 with LZMA2 (2^4=16).
1572 The x86 filter is an exception;
1573 it's usually good to stick to LZMA2's default
1574 four-byte alignment when compressing x86 executables.
1575 .IP ""
1576 All BCJ filters support the same
1577 .IR options :
1578 .RS
1579 .TP
1580 .BI start= offset
1581 Specify the start
1582 .I offset
1583 that is used when converting between relative
1584 and absolute addresses.
1585 The
1586 .I offset
1587 must be a multiple of the alignment of the filter
1588 (see the table above).
1589 The default is zero.
1590 In practice, the default is good; specifying a custom
1591 .I offset
1592 is almost never useful.
1593 .RE
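.IP ""
The advice about matching
.I pb
to the alignment of the BCJ filter can be sketched with the Python
.B lzma
module.
The alignment values are copied from the table above;
the helper name and the payload are invented for the example,
and the payload is only a stand-in for real machine code.
.nf
.ft CW
```python
import lzma

# Alignment in bytes for each BCJ filter, taken from the table above.
BCJ_ALIGNMENT = {
    lzma.FILTER_X86: 1,
    lzma.FILTER_POWERPC: 4,
    lzma.FILTER_ARM: 4,
    lzma.FILTER_ARMTHUMB: 2,
    lzma.FILTER_IA64: 16,
    lzma.FILTER_SPARC: 4,
}

def bcj_chain(bcj_filter, preset=6):
    """Build a BCJ + LZMA2 chain with pb chosen from the BCJ alignment."""
    if bcj_filter == lzma.FILTER_X86:
        pb = 2  # x86 is the exception: keep the default four-byte alignment
    else:
        pb = BCJ_ALIGNMENT[bcj_filter].bit_length() - 1  # log2 of alignment
    return [
        {"id": bcj_filter},
        {"id": lzma.FILTER_LZMA2, "preset": preset, "pb": pb},
    ]

payload = bytes(range(256)) * 64  # stand-in for an executable
xz_data = lzma.compress(payload, format=lzma.FORMAT_XZ,
                        filters=bcj_chain(lzma.FILTER_IA64))
```
.ft R
.fi
.IP ""
Because the
.B .xz
headers store the filter chain,
decompressing the result does not require repeating it.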
1594 .TP
1595 \fB\-\-delta\fR[\fB=\fIoptions\fR]
1596 Add the Delta filter to the filter chain.
1597 The Delta filter can be used only as a non-last filter
1598 in the filter chain.
1599 .IP ""
1600 Currently only simple byte-wise delta calculation is supported.
1601 It can be useful when compressing e.g. uncompressed bitmap images
1602 or uncompressed PCM audio.
1603 However, special purpose algorithms may give significantly better
1604 results than Delta + LZMA2.
1605 This is true especially with audio,
1606 which compresses faster and better e.g. with
1607 .BR flac (1).
1608 .IP ""
1609 Supported
1610 .IR options :
1611 .RS
1612 .TP
1613 .BI dist= distance
1614 Specify the
1615 .I distance
1616 of the delta calculation in bytes.
1617 .I distance
1618 must be 1\-256.
1619 The default is 1.
1620 .IP ""
1621 For example, with
1622 .B dist=2
1623 and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be
1624 A1 B1 01 02 01 02 01 02.
1625 .RE
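.IP ""
The example above can be reproduced with a minimal pure-Python sketch
of a byte-wise delta calculation.
It assumes that the bytes before the start of the input count as zero,
which is consistent with the first two bytes being unchanged in the example;
the function names are made up and this is not the liblzma implementation.
.nf
.ft CW
```python
def delta_encode(data, dist=1):
    """Subtract the byte dist positions earlier from each byte, modulo 256."""
    out = bytearray()
    for i, byte in enumerate(data):
        prev = data[i - dist] if i >= dist else 0
        out.append((byte - prev) & 0xFF)
    return bytes(out)

def delta_decode(data, dist=1):
    """Reverse delta_encode by adding back the already decoded byte."""
    out = bytearray()
    for i, byte in enumerate(data):
        prev = out[i - dist] if i >= dist else 0
        out.append((byte + prev) & 0xFF)
    return bytes(out)

sample = bytes.fromhex("A1 B1 A2 B3 A3 B5 A4 B7")
encoded = delta_encode(sample, dist=2)
```
.ft R
.fi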
1626 .
1627 .SS "Other options"
1628 .TP
1629 .BR \-q ", " \-\-quiet
1630 Suppress warnings and notices.
1631 Specify this twice to suppress errors too.
1632 This option has no effect on the exit status.
1633 That is, even if a warning was suppressed,
1634 the exit status to indicate a warning is still used.
1635 .TP
1636 .BR \-v ", " \-\-verbose
1637 Be verbose.
1638 If standard error is connected to a terminal,
1639 .B xz
1640 will display a progress indicator.
1641 Specifying
1642 .B \-\-verbose
1643 twice will give even more verbose output.
1644 .IP ""
1645 The progress indicator shows the following information:
1646 .RS
1647 .IP \(bu 3
1648 Completion percentage is shown
1649 if the size of the input file is known.
1650 That is, the percentage cannot be shown in pipes.
1651 .IP \(bu 3
1652 Amount of compressed data produced (compressing)
1653 or consumed (decompressing).
1654 .IP \(bu 3
1655 Amount of uncompressed data consumed (compressing)
1656 or produced (decompressing).
1657 .IP \(bu 3
1658 Compression ratio, which is calculated by dividing
1659 the amount of compressed data processed so far by
1660 the amount of uncompressed data processed so far.
1661 .IP \(bu 3
1662 Compression or decompression speed.
1663 This is measured as the amount of uncompressed data consumed
1664 (compression) or produced (decompression) per second.
1665 It is shown after a few seconds have passed since
1666 .B xz
1667 started processing the file.
1668 .IP \(bu 3
1669 Elapsed time in the format M:SS or H:MM:SS.
1670 .IP \(bu 3
1671 Estimated remaining time is shown
1672 only when the size of the input file is
1673 known and a couple of seconds have already passed since
1674 .B xz
1675 started processing the file.
1676 The time is shown in a less precise format which
1677 never has any colons, e.g. 2 min 30 s.
1678 .RE
1679 .IP ""
1680 When standard error is not a terminal,
1681 .B \-\-verbose
1682 will make
1683 .B xz
1684 print the filename, compressed size, uncompressed size,
1685 compression ratio, and possibly also the speed and elapsed time
1686 on a single line to standard error after compressing or
1687 decompressing the file.
1688 The speed and elapsed time are included only when
1689 the operation took at least a few seconds.
1690 If the operation didn't finish, e.g. due to user interruption,
1691 the completion percentage is also printed
1692 if the size of the input file is known.
1693 .TP
1694 .BR \-Q ", " \-\-no\-warn
1695 Don't set the exit status to 2
1696 even if a condition worth a warning was detected.
1697 This option doesn't affect the verbosity level, thus both
1698 .B \-\-quiet
1699 and
1700 .B \-\-no\-warn
1701 have to be used to not display warnings and
1702 to not alter the exit status.
1703 .TP
1704 .B \-\-robot
1705 Print messages in a machine-parsable format.
1706 This is intended to ease writing frontends that want to use
1707 .B xz
1708 instead of liblzma, which may be the case with various scripts.
1709 The output with this option enabled is meant to be stable across
1710 .B xz
1711 releases.
1712 See the section
1713 .B "ROBOT MODE"
1714 for details.
1715 .TP
1716 .BR \-\-info\-memory
1717 Display, in human-readable format, how much physical memory (RAM)
1718 .B xz
1719 thinks the system has and the memory usage limits for compression
1720 and decompression, and exit successfully.
1721 .TP
1722 .BR \-h ", " \-\-help
1723 Display a help message describing the most commonly used options,
1724 and exit successfully.
1725 .TP
1726 .BR \-H ", " \-\-long\-help
1727 Display a help message describing all features of
1728 .BR xz ,
1729 and exit successfully.
1730 .TP
1731 .BR \-V ", " \-\-version
1732 Display the version number of
1733 .B xz
1734 and liblzma in human readable format.
1735 To get machine-parsable output, specify
1736 .B \-\-robot
1737 before
1738 .BR \-\-version .
1739 .
1740 .SH "ROBOT MODE"
1741 The robot mode is activated with the
1742 .B \-\-robot
1743 option.
1744 It makes the output of
1745 .B xz
1746 easier to parse by other programs.
1747 Currently
1748 .B \-\-robot
1749 is supported only together with
1750 .BR \-\-version ,
1751 .BR \-\-info\-memory ,
1752 and
1753 .BR \-\-list .
1754 It will be supported for compression and
1755 decompression in the future.
1756 .
1757 .SS Version
1758 .B "xz \-\-robot \-\-version"
1759 will print the version number of
1760 .B xz
1761 and liblzma in the following format:
1762 .PP
1763 .BI XZ_VERSION= XYYYZZZS
1764 .br
1765 .BI LIBLZMA_VERSION= XYYYZZZS
1766 .TP
1767 .I X
1768 Major version.
1769 .TP
1770 .I YYY
1771 Minor version.
1772 Even numbers are stable.
1773 Odd numbers are alpha or beta versions.
1774 .TP
1775 .I ZZZ
1776 Patch level for stable releases or
1777 just a counter for development releases.
1778 .TP
1779 .I S
1780 Stability.
1781 0 is alpha, 1 is beta, and 2 is stable.
1782 .I S
1783 should always be 2 when
1784 .I YYY
1785 is even.
1786 .PP
1787 .I XYYYZZZS
1788 are the same on both lines if
1789 .B xz
1790 and liblzma are from the same XZ Utils release.
1791 .PP
1792 Examples: 4.999.9beta is
1793 .B 49990091
1794 and
1795 5.0.0 is
1796 .BR 50000002 .
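.PP
A script could split the eight digits into their fields
roughly as in the following Python sketch.
The function name is hypothetical and the input lines are
the two examples given above.
.nf
.ft CW
```python
def parse_robot_version(line):
    """Split a line such as XZ_VERSION=50000002 into its fields."""
    name, _, digits = line.strip().partition("=")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("expected eight digits after the equals sign")
    stability = {"0": "alpha", "1": "beta", "2": "stable"}.get(
        digits[7], "unknown")
    # Fields: name, major (X), minor (YYY), patch (ZZZ), stability (S).
    return name, int(digits[0]), int(digits[1:4]), int(digits[4:7]), stability

beta = parse_robot_version("XZ_VERSION=49990091")
stable = parse_robot_version("LIBLZMA_VERSION=50000002")
```
.ft R
.fi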
1797 .
1798 .SS "Memory limit information"
1799 .B "xz \-\-robot \-\-info\-memory"
1800 prints a single line with three tab-separated columns:
1801 .IP 1. 4
1802 Total amount of physical memory (RAM) in bytes
1803 .IP 2. 4
1804 Memory usage limit for compression in bytes.
1805 A special value of zero indicates the default setting,
1806 which for single-threaded mode is the same as no limit.
1807 .IP 3. 4
1808 Memory usage limit for decompression in bytes.
1809 A special value of zero indicates the default setting,
1810 which for single-threaded mode is the same as no limit.
1811 .PP
1812 In the future, the output of
1813 .B "xz \-\-robot \-\-info\-memory"
1814 may have more columns, but never more than a single line.
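.PP
Because more columns may be added later,
a parser should read only the columns it knows.
The following Python sketch shows one way to do that;
the function name is hypothetical and the sample line is made up,
not captured from a real system.
.nf
.ft CW
```python
TAB = chr(9)  # the columns are separated by tab characters

def parse_info_memory(line):
    """Return (total_ram, compress_limit, decompress_limit) in bytes."""
    fields = line.strip().split(TAB)
    # Use only the first three columns; any later ones are ignored.
    total, compress, decompress = (int(field) for field in fields[:3])
    return total, compress, decompress

# Made-up sample: 8 GiB of RAM and default (zero) limits.
sample = TAB.join(["8589934592", "0", "0"])
limits = parse_info_memory(sample)
```
.ft R
.fi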
1815 .
1816 .SS "List mode"
1817 .B "xz \-\-robot \-\-list"
1818 uses tab-separated output.
1819 The first column of every line has a string
1820 that indicates the type of the information found on that line:
1821 .TP
1822 .B name
1823 This is always the first line when starting to list a file.
1824 The second column on the line is the filename.
1825 .TP
1826 .B file
1827 This line contains overall information about the
1828 .B .xz
1829 file.
1830 This line is always printed after the
1831 .B name
1832 line.
1833 .TP
1834 .B stream
1835 This line type is used only when
1836 .B \-\-verbose
1837 was specified.
1838 There are as many
1839 .B stream
1840 lines as there are streams in the
1841 .B .xz
1842 file.
1843 .TP
1844 .B block
1845 This line type is used only when
1846 .B \-\-verbose
1847 was specified.
1848 There are as many
1849 .B block
1850 lines as there are blocks in the
1851 .B .xz
1852 file.
1853 The
1854 .B block
1855 lines are shown after all the
1856 .B stream
1857 lines; different line types are not interleaved.
1858 .TP
1859 .B summary
1860 This line type is used only when
1861 .B \-\-verbose
1862 was specified twice.
1863 This line is printed after all
1864 .B block
1865 lines.
1866 Like the
1867 .B file
1868 line, the
1869 .B summary
1870 line contains overall information about the
1871 .B .xz
1872 file.
1873 .TP
1874 .B totals
1875 This line is always the very last line of the list output.
1876 It shows the total counts and sizes.
1877 .PP
1878 The columns of the
1879 .B file
1880 lines:
1881 .PD 0
1882 .RS
1883 .IP 2. 4
1884 Number of streams in the file
1885 .IP 3. 4
1886 Total number of blocks in the stream(s)
1887 .IP 4. 4
1888 Compressed size of the file
1889 .IP 5. 4
1890 Uncompressed size of the file
1891 .IP 6. 4
1892 Compression ratio, for example
1893 .BR 0.123 .
1894 If ratio is over 9.999, three dashes
1895 .RB ( \-\-\- )
1896 are displayed instead of the ratio.
1897 .IP 7. 4
1898 Comma-separated list of integrity check names.
1899 The following strings are used for the known check types:
1900 .BR None ,
1901 .BR CRC32 ,
1902 .BR CRC64 ,
1903 and
1904 .BR SHA\-256 .
1905 For unknown check types,
1906 .BI Unknown\- N
1907 is used, where
1908 .I N
1909 is the Check ID as a decimal number (one or two digits).
1910 .IP 8. 4
1911 Total size of stream padding in the file
1912 .RE
1913 .PD
1914 .PP
1915 The columns of the
1916 .B stream
1917 lines:
1918 .PD 0
1919 .RS
1920 .IP 2. 4
1921 Stream number (the first stream is 1)
1922 .IP 3. 4
1923 Number of blocks in the stream
1924 .IP 4. 4
1925 Compressed start offset
1926 .IP 5. 4
1927 Uncompressed start offset
1928 .IP 6. 4
1929 Compressed size (does not include stream padding)
1930 .IP 7. 4
1931 Uncompressed size
1932 .IP 8. 4
1933 Compression ratio
1934 .IP 9. 4
1935 Name of the integrity check
1936 .IP 10. 4
1937 Size of stream padding
1938 .RE
1939 .PD
1940 .PP
1941 The columns of the
1942 .B block
1943 lines:
1944 .PD 0
1945 .RS
1946 .IP 2. 4
1947 Number of the stream containing this block
1948 .IP 3. 4
1949 Block number relative to the beginning of the stream
1950 (the first block is 1)
1951 .IP 4. 4
1952 Block number relative to the beginning of the file
1953 .IP 5. 4
1954 Compressed start offset relative to the beginning of the file
1955 .IP 6. 4
1956 Uncompressed start offset relative to the beginning of the file
1957 .IP 7. 4
1958 Total compressed size of the block (includes headers)
1959 .IP 8. 4
1960 Uncompressed size
1961 .IP 9. 4
1962 Compression ratio
1963 .IP 10. 4
1964 Name of the integrity check
1965 .RE
1966 .PD
1967 .PP
1968 If
1969 .B \-\-verbose
1970 was specified twice, additional columns are included on the
1971 .B block
1972 lines.
1973 These are not displayed with a single
1974 .BR \-\-verbose ,
1975 because getting this information requires many seeks
1976 and can thus be slow:
1977 .PD 0
1978 .RS
1979 .IP 11. 4
1980 Value of the integrity check in hexadecimal
1981 .IP 12. 4
1982 Block header size
1983 .IP 13. 4
1984 Block flags:
1985 .B c
1986 indicates that compressed size is present, and
1987 .B u
1988 indicates that uncompressed size is present.
1989 If the flag is not set, a dash
1990 .RB ( \- )
1991 is shown instead to keep the string length fixed.
1992 New flags may be added to the end of the string in the future.
1993 .IP 14. 4
1994 Size of the actual compressed data in the block (this excludes
1995 the block header, block padding, and check fields)
1996 .IP 15. 4
1997 Amount of memory (in bytes) required to decompress
1998 this block with this
1999 .B xz
2000 version
2001 .IP 16. 4
2002 Filter chain.
2003 Note that most of the options used at compression time
2004 cannot be known, because only the options
2005 that are needed for decompression are stored in the
2006 .B .xz
2007 headers.
2008 .RE
2009 .PD
2010 .PP
2011 The columns of the
2012 .B summary
2013 lines:
2014 .PD 0
2015 .RS
2016 .IP 2. 4
2017 Amount of memory (in bytes) required to decompress
2018 this file with this
2019 .B xz
2020 version
2021 .IP 3. 4
2022 .B yes
2023 or
2024 .B no
2025 indicating if all block headers have both compressed size and
2026 uncompressed size stored in them
2027 .PP
2028 .I Since
2029 .B xz
2030 .I 5.1.2alpha:
2031 .IP 4. 4
2032 Minimum
2033 .B xz
2034 version required to decompress the file
2035 .RE
2036 .PD
2037 .PP
2038 The columns of the
2039 .B totals
2040 line:
2041 .PD 0
2042 .RS
2043 .IP 2. 4
2044 Number of streams
2045 .IP 3. 4
2046 Number of blocks
2047 .IP 4. 4
2048 Compressed size
2049 .IP 5. 4
2050 Uncompressed size
2051 .IP 6. 4
2052 Average compression ratio
2053 .IP 7. 4
2054 Comma-separated list of integrity check names
2055 that were present in the files
2056 .IP 8. 4
2057 Stream padding size
2058 .IP 9. 4
2059 Number of files.
2060 This is here to
2061 keep the order of the earlier columns the same as on
2062 .B file
2063 lines.
2064 .PD
2065 .RE
2066 .PP
2067 If
2068 .B \-\-verbose
2069 was specified twice, additional columns are included on the
2070 .B totals
2071 line:
2072 .PD 0
2073 .RS
2074 .IP 10. 4
2075 Maximum amount of memory (in bytes) required to decompress
2076 the files with this
2077 .B xz
2078 version
2079 .IP 11. 4
2080 .B yes
2081 or
2082 .B no
2083 indicating if all block headers have both compressed size and
2084 uncompressed size stored in them
2085 .PP
2086 .I Since
2087 .B xz
2088 .I 5.1.2alpha:
2089 .IP 12. 4
2090 Minimum
2091 .B xz
2092 version required to decompress the file
2093 .RE
2094 .PD
2095 .PP
2096 Future versions may add new line types and
2097 new columns can be added to the existing line types,
2098 but the existing columns won't be changed.
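.PP
A frontend can stay compatible with such additions by dispatching on
the first column and ignoring anything it does not recognize.
The Python sketch below illustrates the idea for the
.B file
lines only.
The names are invented for the example, and the sample lines are
made up to follow the column layout described above;
they are not real
.B xz
output.
.nf
.ft CW
```python
TAB = chr(9)  # the columns are separated by tab characters

FILE_COLUMNS = ("streams", "blocks", "compressed", "uncompressed",
                "ratio", "checks", "padding")

def parse_robot_list(lines):
    """Turn list-mode lines (without newlines) into dictionaries."""
    records = []
    for line in lines:
        kind, *cols = line.split(TAB)
        record = {"type": kind, "columns": cols}
        if kind == "file":
            # Columns 2 to 8 get names; any extra columns stay in "columns".
            record.update(zip(FILE_COLUMNS, cols))
        records.append(record)
    return records

# Made-up sample lines, not real xz output.
sample = [
    TAB.join(["name", "foo.xz"]),
    TAB.join(["file", "1", "1", "1024", "4096", "0.250", "CRC64", "0"]),
    TAB.join(["totals", "1", "1", "1024", "4096", "0.250", "CRC64", "0", "1"]),
]
records = parse_robot_list(sample)
```
.ft R
.fi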
2099 .
2100 .SH "EXIT STATUS"
2101 .TP
2102 .B 0
2103 All is good.
2104 .TP
2105 .B 1
2106 An error occurred.
2107 .TP
2108 .B 2
2109 Something worth a warning occurred,
2110 but no actual errors occurred.
2111 .PP
2112 Notices (not warnings or errors) printed on standard error
2113 don't affect the exit status.
2114 .
2115 .SH ENVIRONMENT
2116 .B xz
2117 parses space-separated lists of options
2118 from the environment variables
2119 .B XZ_DEFAULTS
2120 and
2121 .BR XZ_OPT ,
2122 in this order, before parsing the options from the command line.
2123 Note that only options are parsed from the environment variables;
2124 all non-options are silently ignored.
2125 Parsing is done with
2126 .BR getopt_long (3)
2127 which is used also for the command line arguments.
2128 .TP
2129 .B XZ_DEFAULTS
2130 User-specific or system-wide default options.
2131 Typically this is set in a shell initialization script to enable
2132 .BR xz 's
2133 memory usage limiter by default.
2134 Excluding shell initialization scripts
2135 and similar special cases, scripts must never set or unset
2136 .BR XZ_DEFAULTS .
2137 .TP
2138 .B XZ_OPT
2139 This is for passing options to
2140 .B xz
2141 when it is not possible to set the options directly on the
2142 .B xz
2143 command line.
2144 This is the case e.g. when
2145 .B xz
2146 is run by a script or tool, e.g. GNU
2147 .BR tar (1):
2148 .RS
2149 .RS
2150 .PP
2151 .nf
2152 .ft CW
2153 XZ_OPT=\-2v tar caf foo.tar.xz foo
2154 .ft R
2155 .fi
2156 .RE
2157 .RE
2158 .IP ""
2159 Scripts may use
2160 .B XZ_OPT
2161 e.g. to set script-specific default compression options.
2162 It is still recommended to allow users to override
2163 .B XZ_OPT
2164 if that is reasonable, e.g. in
2165 .BR sh (1)
2166 scripts one may use something like this:
2167 .RS
2168 .RS
2169 .PP
2170 .nf
2171 .ft CW
2172 XZ_OPT=${XZ_OPT\-"\-7e"}
2173 export XZ_OPT
2174 .ft R
2175 .fi
2176 .RE
2177 .RE
2178 .
2179 .SH "LZMA UTILS COMPATIBILITY"
2180 The command line syntax of
2181 .B xz
2182 is practically a superset of
2183 .BR lzma ,
2184 .BR unlzma ,
2185 and
2186 .B lzcat
2187 as found in LZMA Utils 4.32.x.
2188 In most cases, it is possible to replace
2189 LZMA Utils with XZ Utils without breaking existing scripts.
2190 There are some incompatibilities though,
2191 which may sometimes cause problems.
2192 .
2193 .SS "Compression preset levels"
2194 The numbering of the compression level presets is not identical in
2195 .B xz
2196 and LZMA Utils.
2197 The most important difference is how dictionary sizes
2198 are mapped to different presets.
2199 Dictionary size is roughly equal to the decompressor memory usage.
2200 .RS
2201 .PP
2202 .TS
2203 tab(;);
2204 c c c
2205 c n n.
2206 Level;xz;LZMA Utils
2207 \-0;256 KiB;N/A
2208 \-1;1 MiB;64 KiB
2209 \-2;2 MiB;1 MiB
2210 \-3;4 MiB;512 KiB
2211 \-4;4 MiB;1 MiB
2212 \-5;8 MiB;2 MiB
2213 \-6;8 MiB;4 MiB
2214 \-7;16 MiB;8 MiB
2215 \-8;32 MiB;16 MiB
2216 \-9;64 MiB;32 MiB
2217 .TE
2218 .RE
2219 .PP
2220 The dictionary size differences affect
2221 the compressor memory usage too,
2222 but there are some other differences between
2223 LZMA Utils and XZ Utils, which
2224 make the difference even bigger:
2225 .RS
2226 .PP
2227 .TS
2228 tab(;);
2229 c c c
2230 c n n.
2231 Level;xz;LZMA Utils 4.32.x
2232 \-0;3 MiB;N/A
2233 \-1;9 MiB;2 MiB
2234 \-2;17 MiB;12 MiB
2235 \-3;32 MiB;12 MiB
2236 \-4;48 MiB;16 MiB
2237 \-5;94 MiB;26 MiB
2238 \-6;94 MiB;45 MiB
2239 \-7;186 MiB;83 MiB
2240 \-8;370 MiB;159 MiB
2241 \-9;674 MiB;311 MiB
2242 .TE
2243 .RE
2244 .PP
2245 The default preset level in LZMA Utils is
2246 .B \-7
2247 while in XZ Utils it is
2248 .BR \-6 ,
2249 so both use an 8 MiB dictionary by default.
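.PP
As an illustration, assume a script that used to run
.B "lzma \-7"
from LZMA Utils and should keep creating
.B .lzma
files with the same 8 MiB dictionary.
Going by the first table above, the matching
.B xz
level is
.BR \-6 ;
the file name is a placeholder:
.RS
.PP
.nf
.ft CW
xz \-\-format=lzma \-6 foo
.ft R
.fi
.RE
.PP
Compressor memory usage will still differ,
as the second table shows.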
2250 .
2251 .SS "Streamed vs. non-streamed .lzma files"
2252 The uncompressed size of the file can be stored in the
2253 .B .lzma
2254 header.
2255 LZMA Utils does that when compressing regular files.
The alternative is to mark the uncompressed size as unknown
and use an end-of-payload marker to indicate
where the decompressor should stop.
LZMA Utils uses this method when the uncompressed size isn't known,
which is the case, for example, in pipes.
2261 .PP
2262 .B xz
2263 supports decompressing
2264 .B .lzma
files with or without an end-of-payload marker, but all
2266 .B .lzma
2267 files created by
2268 .B xz
will use an end-of-payload marker and have the uncompressed size
marked as unknown in the
2271 .B .lzma
2272 header.
2273 This may be a problem in some uncommon situations.
2274 For example, a
2275 .B .lzma
2276 decompressor in an embedded device might work
2277 only with files that have known uncompressed size.
2278 If you hit this problem, you need to use LZMA Utils
2279 or LZMA SDK to create
2280 .B .lzma
2281 files with known uncompressed size.
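.PP
One way to see which variant an existing file uses
is to look at its header.
The following is only a sketch:
it assumes the usual 13-byte
.B .lzma
header layout (one properties byte, a 4-byte dictionary size,
and an 8-byte uncompressed size),
in which an unknown size is stored as eight 0xFF bytes.
The file name is a placeholder:
.RS
.PP
.nf
.ft CW
od \-An \-tx1 \-j5 \-N8 foo.lzma
.ft R
.fi
.RE
.PP
If all eight bytes are printed as
.BR ff ,
the uncompressed size is marked as unknown
and the file relies on the end-of-payload marker.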
2282 .
2283 .SS "Unsupported .lzma files"
2284 The
2285 .B .lzma
2286 format allows
2287 .I lc
2288 values up to 8, and
2289 .I lp
2290 values up to 4.
2291 LZMA Utils can decompress files with any
2292 .I lc
2293 and
2294 .IR lp ,
2295 but always creates files with
2296 .B lc=3
2297 and
2298 .BR lp=0 .
2299 Creating files with other
2300 .I lc
2301 and
2302 .I lp
2303 is possible with
2304 .B xz
2305 and with LZMA SDK.
2306 .PP
2307 The implementation of the LZMA1 filter in liblzma
2308 requires that the sum of
2309 .I lc
2310 and
2311 .I lp
not exceed 4.
2313 Thus,
2314 .B .lzma
files that exceed this limit cannot be decompressed with
2316 .BR xz .
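.PP
To check a file in advance, the
.I lc
and
.I lp
values can be read from the first byte of the header.
This is a minimal sketch, assuming the conventional encoding
of the properties byte as (pb * 5 + lp) * 9 + lc
and a placeholder file name:
.RS
.PP
.nf
.ft CW
# Read the first header byte as an unsigned decimal number.
props=$(od \-An \-tu1 \-N1 foo.lzma | tr \-d ' ')
lc=$((props % 9))
lp=$((props / 9 % 5))
if [ $((lc + lp)) \-gt 4 ]; then
    echo "lc + lp is too big for xz."
fi
.ft R
.fi
.RE
.PP
With the LZMA Utils defaults the properties byte is 93,
which decodes to
.B lc=3
and
.BR lp=0 .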
2317 .PP
2318 LZMA Utils creates only
2319 .B .lzma
2320 files which have a dictionary size of
2321 .RI "2^" n
2322 (a power of 2) but accepts files with any dictionary size.
2323 liblzma accepts only
2324 .B .lzma
2325 files which have a dictionary size of
2326 .RI "2^" n
2327 or
2328 .RI "2^" n " + 2^(" n "\-1)."
2329 This is to decrease false positives when detecting
2330 .B .lzma
2331 files.
2332 .PP
2333 These limitations shouldn't be a problem in practice,
2334 since practically all
2335 .B .lzma
2336 files have been compressed with settings that liblzma will accept.
2337 .
2338 .SS "Trailing garbage"
2339 When decompressing,
LZMA Utils silently ignores everything after the first
2341 .B .lzma
2342 stream.
2343 In most situations, this is a bug.
This also means that LZMA Utils
doesn't support decompressing concatenated
2346 .B .lzma
2347 files.
2348 .PP
2349 If there is data left after the first
2350 .B .lzma
2351 stream,
2352 .B xz
2353 considers the file to be corrupt unless
2354 .B \-\-single\-stream
2355 was used.
2356 This may break obscure scripts which have
2357 assumed that trailing garbage is ignored.
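.PP
A script that relied on the old behavior can ask for it explicitly.
For example, to decompress only the first stream of a hypothetical file
.I foo.lzma
and ignore whatever follows it:
.RS
.PP
.nf
.ft CW
xz \-dc \-\-single\-stream foo.lzma > foo
.ft R
.fi
.RE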
2358 .
2359 .SH NOTES
2360 .
2361 .SS "Compressed output may vary"
2362 The exact compressed output produced from
2363 the same uncompressed input file
2364 may vary between XZ Utils versions even if
2365 compression options are identical.
2366 This is because the encoder can be improved
2367 (faster or better compression)
2368 without affecting the file format.
2369 The output can vary even between different
2370 builds of the same XZ Utils version,
2371 if different build options are used.
2372 .PP
2373 The above means that once
2374 .B \-\-rsyncable
2375 has been implemented,
2376 the resulting files won't necessarily be rsyncable
2377 unless both old and new files have been compressed
2378 with the same xz version.
2379 This problem can be fixed if a part of the encoder
2380 implementation is frozen to keep rsyncable output
2381 stable across xz versions.
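.PP
Because the compressed output is not stable,
comparing
.B .xz
files byte for byte is not a reliable way
to tell whether their contents match.
A sketch that compares the uncompressed data instead,
using placeholder file names and
.BR cksum (1):
.RS
.PP
.nf
.ft CW
xz \-dc old/foo.xz | cksum
xz \-dc new/foo.xz | cksum
.ft R
.fi
.RE
.PP
If both commands print the same checksum and size,
the uncompressed data is very likely identical.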
2382 .
2383 .SS "Embedded .xz decompressors"
2384 Embedded
2385 .B .xz
2386 decompressor implementations like XZ Embedded don't necessarily
2387 support files created with integrity
2388 .I check
2389 types other than
2390 .B none
2391 and
2392 .BR crc32 .
2393 Since the default is
2394 .BR \-\-check=crc64 ,
2395 you must use
2396 .B \-\-check=none
2397 or
2398 .B \-\-check=crc32
2399 when creating files for embedded systems.
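.PP
To see which integrity check an existing file uses,
the robot-mode list output can be inspected.
This sketch assumes that the seventh column of the
.B file
line holds the check names, as described for
.BR "\-\-robot \-\-list" ;
.I foo.xz
is a placeholder:
.RS
.PP
.nf
.ft CW
xz \-\-robot \-\-list foo.xz | awk '$1 == "file" {print $7}'
.ft R
.fi
.RE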
2400 .PP
2401 Outside embedded systems, all
2402 .B .xz
2403 format decompressors support all the
2404 .I check
2405 types, or at least are able to decompress
2406 the file without verifying the
2407 integrity check if the particular
2408 .I check
2409 is not supported.
2410 .PP
2411 XZ Embedded supports BCJ filters,
2412 but only with the default start offset.
2413 .
2414 .SH EXAMPLES
2415 .
2416 .SS Basics
2417 Compress the file
2418 .I foo
2419 into
2420 .I foo.xz
2421 using the default compression level
2422 .RB ( \-6 ),
2423 and remove
2424 .I foo
2425 if compression is successful:
2426 .RS
2427 .PP
2428 .nf
2429 .ft CW
2430 xz foo
2431 .ft R
2432 .fi
2433 .RE
2434 .PP
2435 Decompress
2436 .I bar.xz
2437 into
2438 .I bar
2439 and don't remove
2440 .I bar.xz
2441 even if decompression is successful:
2442 .RS
2443 .PP
2444 .nf
2445 .ft CW
2446 xz \-dk bar.xz
2447 .ft R
2448 .fi
2449 .RE
2450 .PP
2451 Create
2452 .I baz.tar.xz
2453 with the preset
2454 .B \-4e
2455 .RB ( "\-4 \-\-extreme" ),
2456 which is slower than e.g. the default
2457 .BR \-6 ,
2458 but needs less memory for compression and decompression (48\ MiB
2459 and 5\ MiB, respectively):
2460 .RS
2461 .PP
2462 .nf
2463 .ft CW
2464 tar cf \- baz | xz \-4e > baz.tar.xz
2465 .ft R
2466 .fi
2467 .RE
2468 .PP
2469 A mix of compressed and uncompressed files can be decompressed
2470 to standard output with a single command:
2471 .RS
2472 .PP
2473 .nf
2474 .ft CW
2475 xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
2476 .ft R
2477 .fi
2478 .RE
2479 .
2480 .SS "Parallel compression of many files"
2481 On GNU and *BSD,
2482 .BR find (1)
2483 and
2484 .BR xargs (1)
2485 can be used to parallelize compression of many files:
2486 .RS
2487 .PP
2488 .nf
2489 .ft CW
2490 find . \-type f \e! \-name '*.xz' \-print0 \e
2491     | xargs \-0r \-P4 \-n16 xz \-T1
2492 .ft R
2493 .fi
2494 .RE
2495 .PP
2496 The
2497 .B \-P
2498 option to
2499 .BR xargs (1)
2500 sets the number of parallel
2501 .B xz
2502 processes.
2503 The best value for the
2504 .B \-n
2505 option depends on how many files there are to be compressed.
2506 If there are only a couple of files,
2507 the value should probably be 1;
2508 with tens of thousands of files,
2509 100 or even more may be appropriate to reduce the number of
2510 .B xz
2511 processes that
2512 .BR xargs (1)
2513 will eventually create.
2514 .PP
2515 The option
2516 .B \-T1
2517 for
2518 .B xz
2519 is there to force it to single-threaded mode, because
2520 .BR xargs (1)
2521 is used to control the amount of parallelization.
2522 .
2523 .SS "Robot mode"
2524 Calculate how many bytes have been saved in total
2525 after compressing multiple files:
2526 .RS
2527 .PP
2528 .nf
2529 .ft CW
2530 xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}'
2531 .ft R
2532 .fi
2533 .RE
2534 .PP
A script may want to know that it is using a new enough
2536 .BR xz .
2537 The following
2538 .BR sh (1)
2539 script checks that the version number of the
2540 .B xz
2541 tool is at least 5.0.0.
2542 This method is compatible with old beta versions,
2543 which didn't support the
2544 .B \-\-robot
2545 option:
2546 .RS
2547 .PP
2548 .nf
2549 .ft CW
2550 if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" ||
2551         [ "$XZ_VERSION" \-lt 50000002 ]; then
2552     echo "Your xz is too old."
2553 fi
2554 unset XZ_VERSION LIBLZMA_VERSION
2555 .ft R
2556 .fi
2557 .RE
2558 .PP
2559 Set a memory usage limit for decompression using
2560 .BR XZ_OPT ,
2561 but if a limit has already been set, don't increase it:
2562 .RS
2563 .PP
2564 .nf
2565 .ft CW
2566 NEWLIM=$((123 << 20))  # 123 MiB
2567 OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3)
2568 if [ $OLDLIM \-eq 0 \-o $OLDLIM \-gt $NEWLIM ]; then
2569     XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM"
2570     export XZ_OPT
2571 fi
2572 .ft R
2573 .fi
2574 .RE
2575 .
2576 .SS "Custom compressor filter chains"
2577 The simplest use for custom filter chains is
customizing an LZMA2 preset.
2579 This can be useful,
2580 because the presets cover only a subset of the
2581 potentially useful combinations of compression settings.
2582 .PP
2583 The CompCPU columns of the tables
2584 from the descriptions of the options
2585 .BR "\-0" " ... " "\-9"
2586 and
2587 .B \-\-extreme
2588 are useful when customizing LZMA2 presets.
2589 Here are the relevant parts collected from those two tables:
2590 .RS
2591 .PP
2592 .TS
2593 tab(;);
2594 c c
2595 n n.
2596 Preset;CompCPU
2597 \-0;0
2598 \-1;1
2599 \-2;2
2600 \-3;3
2601 \-4;4
2602 \-5;5
2603 \-6;6
2604 \-5e;7
2605 \-6e;8
2606 .TE
2607 .RE
2608 .PP
2609 If you know that a file requires
a somewhat big dictionary (e.g. 32 MiB) to compress well,
2611 but you want to compress it quicker than
2612 .B "xz \-8"
2613 would do, a preset with a low CompCPU value (e.g. 1)
2614 can be modified to use a bigger dictionary:
2615 .RS
2616 .PP
2617 .nf
2618 .ft CW
2619 xz \-\-lzma2=preset=1,dict=32MiB foo.tar
2620 .ft R
2621 .fi
2622 .RE
2623 .PP
2624 With certain files, the above command may be faster than
2625 .B "xz \-6"
2626 while compressing significantly better.
2627 However, it must be emphasized that only some files benefit from
2628 a big dictionary while keeping the CompCPU value low.
The most obvious situation
where a big dictionary can help a lot
2631 is an archive containing very similar files
2632 of at least a few megabytes each.
2633 The dictionary size has to be significantly bigger
2634 than any individual file to allow LZMA2 to take
2635 full advantage of the similarities between consecutive files.
2636 .PP
2637 If very high compressor and decompressor memory usage is fine,
2638 and the file being compressed is
2639 at least several hundred megabytes, it may be useful
2640 to use an even bigger dictionary than the 64 MiB that
2641 .B "xz \-9"
2642 would use:
2643 .RS
2644 .PP
2645 .nf
2646 .ft CW
2647 xz \-vv \-\-lzma2=dict=192MiB big_foo.tar
2648 .ft R
2649 .fi
2650 .RE
2651 .PP
2652 Using
2653 .B \-vv
2654 .RB ( "\-\-verbose \-\-verbose" )
2655 like in the above example can be useful
2656 to see the memory requirements
2657 of the compressor and decompressor.
2658 Remember that using a dictionary bigger than
the size of the uncompressed file is a waste of memory,
2660 so the above command isn't useful for small files.
2661 .PP
2662 Sometimes the compression time doesn't matter,
2663 but the decompressor memory usage has to be kept low
2664 e.g. to make it possible to decompress the file on
2665 an embedded system.
2666 The following command uses
2667 .B \-6e
2668 .RB ( "\-6 \-\-extreme" )
2669 as a base and sets the dictionary to only 64\ KiB.
2670 The resulting file can be decompressed with XZ Embedded
2671 (that's why there is
2672 .BR \-\-check=crc32 )
2673 using about 100\ KiB of memory.
2674 .RS
2675 .PP
2676 .nf
2677 .ft CW
2678 xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo
2679 .ft R
2680 .fi
2681 .RE
2682 .PP
2683 If you want to squeeze out as many bytes as possible,
2684 adjusting the number of literal context bits
2685 .RI ( lc )
2686 and number of position bits
2687 .RI ( pb )
2688 can sometimes help.
2689 Adjusting the number of literal position bits
2690 .RI ( lp )
2691 might help too, but usually
2692 .I lc
2693 and
2694 .I pb
2695 are more important.
2696 E.g. a source code archive contains mostly US-ASCII text,
so something like the following might give
a slightly (about 0.1\ %) smaller file than
2699 .B "xz \-6e"
2700 (try also without
2701 .BR lc=4 ):
2702 .RS
2703 .PP
2704 .nf
2705 .ft CW
2706 xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar
2707 .ft R
2708 .fi
2709 .RE
2710 .PP
2711 Using another filter together with LZMA2 can improve
2712 compression with certain file types.
E.g. to compress an x86-32 or x86-64 shared library
2714 using the x86 BCJ filter:
2715 .RS
2716 .PP
2717 .nf
2718 .ft CW
2719 xz \-\-x86 \-\-lzma2 libfoo.so
2720 .ft R
2721 .fi
2722 .RE
2723 .PP
2724 Note that the order of the filter options is significant.
2725 If
2726 .B \-\-x86
2727 is specified after
2728 .BR \-\-lzma2 ,
2729 .B xz
2730 will give an error,
2731 because there cannot be any filter after LZMA2,
2732 and also because the x86 BCJ filter cannot be used
2733 as the last filter in the chain.
2734 .PP
2735 The Delta filter together with LZMA2
2736 can give good results with bitmap images.
2737 It should usually beat PNG,
2738 which has a few more advanced filters than simple
2739 delta but uses Deflate for the actual compression.
2740 .PP
2741 The image has to be saved in uncompressed format,
2742 e.g. as uncompressed TIFF.
2743 The distance parameter of the Delta filter is set
2744 to match the number of bytes per pixel in the image.
E.g. a 24-bit RGB bitmap needs
2746 .BR dist=3 ,
2747 and it is also good to pass
2748 .B pb=0
2749 to LZMA2 to accommodate the three-byte alignment:
2750 .RS
2751 .PP
2752 .nf
2753 .ft CW
2754 xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff
2755 .ft R
2756 .fi
2757 .RE
2758 .PP
2759 If multiple images have been put into a single archive (e.g.\&
2760 .BR .tar ),
2761 the Delta filter will work on that too as long as all images
2762 have the same number of bytes per pixel.
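.PP
For example, assuming a directory
.I images
that contains only uncompressed 24-bit RGB images,
the archive can be created and compressed in one step:
.RS
.PP
.nf
.ft CW
tar cf \- images | xz \-\-delta=dist=3 \-\-lzma2=pb=0 > images.tar.xz
.ft R
.fi
.RE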
2763 .
2764 .SH "SEE ALSO"
2765 .BR xzdec (1),
2766 .BR xzdiff (1),
2767 .BR xzgrep (1),
2768 .BR xzless (1),
2769 .BR xzmore (1),
2770 .BR gzip (1),
2771 .BR bzip2 (1),
2772 .BR 7z (1)
2773 .PP
2774 XZ Utils: <http://tukaani.org/xz/>
2775 .br
2776 XZ Embedded: <http://tukaani.org/xz/embedded.html>
2777 .br
2778 LZMA SDK: <http://7-zip.org/sdk.html>