sys/contrib/zstd/programs/zstd.1.md

   1 zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
   2 ============================================================================
   3
   4 SYNOPSIS
   5 --------
   6
   7 `zstd` [*OPTIONS*] [-|_INPUT-FILE_] [-o _OUTPUT-FILE_]
   8
   9 `zstdmt` is equivalent to `zstd -T0`
  10
  11 `unzstd` is equivalent to `zstd -d`
  12
  13 `zstdcat` is equivalent to `zstd -dcf`
  14
  15
  16 DESCRIPTION
  17 -----------
  18 `zstd` is a fast lossless compression algorithm and data compression tool,
  19 with command line syntax similar to `gzip (1)` and `xz (1)`.
  20 It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
  21 `zstd` offers highly configurable compression speed,
  22 with fast modes at > 200 MB/s per core,
  23 and strong modes nearing lzma compression ratios.
  24 It also features a very fast decoder, with speeds > 500 MB/s per core.
  25
  26 `zstd` command line syntax is generally similar to gzip,
  27 but features the following differences:
  28
  29   - Source files are preserved by default.
  30     It's possible to remove them automatically by using the `--rm` option.
  31   - When compressing a single file, `zstd` displays progress notifications
  32     and result summary by default.
  33     Use `-q` to turn them off.
  34   - `zstd` does not accept input from console,
  35     but it properly accepts `stdin` when it's not the console.
  36   - `zstd` displays a short help page when the command line contains an error.
  37     Use `-q` to turn it off.
  38
  39 `zstd` compresses or decompresses each _file_ according to the selected
  40 operation mode.
  41 If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
  42 and writes the processed data to standard output.
  43 `zstd` will refuse to write compressed data to standard output
  44 if it is a terminal: it will display an error message and skip the _file_.
  45 Similarly, `zstd` will refuse to read compressed data from standard input
  46 if it is a terminal.
  47
  48 Unless `--stdout` or `-o` is specified, _files_ are written to a new file
  49 whose name is derived from the source _file_ name:
  50
  51 * When compressing, the suffix `.zst` is appended to the source filename to
  52   get the target filename.
  53 * When decompressing, the `.zst` suffix is removed from the source filename to
  54   get the target filename.
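
The naming rule above can be modelled in a few lines of Python. This is only an illustrative sketch, not how `zstd` is implemented: the function name `derive_output_name` is hypothetical, and raising an error when the `.zst` suffix is missing is an assumption of the sketch, since the text above does not say what happens in that case.

```python
def derive_output_name(source: str, decompress: bool, suffix: str = ".zst") -> str:
    """Return the target filename for a source filename (illustrative only)."""
    if not decompress:
        # Compression: the suffix is appended to the source filename.
        return source + suffix
    if not source.endswith(suffix) or source == suffix:
        # Assumption of this sketch: names without a removable suffix are rejected.
        raise ValueError(f"{source!r} does not end with {suffix!r}")
    # Decompression: the suffix is removed from the source filename.
    return source[: -len(suffix)]


if __name__ == "__main__":
    print(derive_output_name("data.tar", decompress=False))
    print(derive_output_name("data.tar.zst", decompress=True))
```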
  55
  56 ### Concatenation with .zst files
  57 It is possible to concatenate `.zst` files as is.
  58 `zstd` will decompress such files as if they were a single `.zst` file.
  59
  60 OPTIONS
  61 -------
  62
  63 ### Integer suffixes and special values
  64 In most places where an integer argument is expected,
  65 an optional suffix is supported to easily indicate large integers.
  66 There must be no space between the integer and the suffix.
  67
  68 * `KiB`:
  69     Multiply the integer by 1,024 (2\^10).
  70     `Ki`, `K`, and `KB` are accepted as synonyms for `KiB`.
  71 * `MiB`:
  72     Multiply the integer by 1,048,576 (2\^20).
  73     `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.
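
As a rough illustration of the suffix rules above, the following Python sketch converts a suffixed integer into a plain integer. The helper name `parse_size` is hypothetical, and treating suffixes as case-sensitive is an assumption of the sketch rather than documented behavior.

```python
import re

# Multipliers for the suffixes listed above; every synonym is a power of two.
_SUFFIXES = {
    "": 1,
    "K": 1 << 10, "KB": 1 << 10, "Ki": 1 << 10, "KiB": 1 << 10,
    "M": 1 << 20, "MB": 1 << 20, "Mi": 1 << 20, "MiB": 1 << 20,
}


def parse_size(text: str) -> int:
    """Parse an integer with an optional suffix, e.g. '64KiB' or '2M'."""
    # fullmatch rejects any space between the integer and the suffix.
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"invalid size: {text!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2)]


if __name__ == "__main__":
    for example in ("100", "64KiB", "2M"):
        print(example, "=", parse_size(example))
```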
  74
  75 ### Operation mode
  76 If multiple operation mode options are given,
  77 the last one takes effect.
  78
  79 * `-z`, `--compress`:
  80     Compress.
  81     This is the default operation mode when no operation mode option is specified
  82     and no other operation mode is implied from the command name
  83     (for example, `unzstd` implies `--decompress`).
  84 * `-d`, `--decompress`, `--uncompress`:
  85     Decompress.
  86 * `-t`, `--test`:
  87     Test the integrity of compressed _files_.
  88     This option is equivalent to `--decompress --stdout` except that the
  89     decompressed data is discarded instead of being written to standard output.
  90     No files are created or removed.
  91 * `-b#`:
  92     Benchmark file(s) using compression level #
  93 * `--train FILEs`:
  94     Use FILEs as a training set to create a dictionary.
  95     The training set should contain a lot of small files (> 100).
  96 * `-l`, `--list`:
  97     Display information related to a zstd compressed file, such as size, ratio, and checksum.
  98     Some of these fields may not be available.
  99     This command can be augmented with the `-v` modifier.
 100
 101 ### Operation modifiers
 102
 103 * `-#`:
 104     `#` compression level \[1-19] (default: 3)
 105 * `--ultra`:
 106     unlocks high compression levels 20+ (maximum 22), using a lot more memory.
 107     Note that decompression will also require more memory when using these levels.
 108 * `--long[=#]`:
 109     enables long distance matching with `#` `windowLog`; if `#` is not
 110     present, it defaults to `27`.
 111     This increases the window size (`windowLog`) and memory usage for both the
 112     compressor and decompressor.
 113     This setting is designed to improve the compression ratio for files with
 114     long matches at a large distance.
 115
 116     Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
 117     `--memory=windowSize` needs to be passed to the decompressor.
 118 * `--fast[=#]`:
 119     switch to ultra-fast compression levels.
 120     If `=#` is not present, it defaults to `1`.
 121     The higher the value, the faster the compression speed,
 122     at the cost of some compression ratio.
 123     This setting overrides the compression level if one was set previously.
 124     Similarly, if a compression level is set after `--fast`, it overrides it.
 125
 126 * `-T#`, `--threads=#`:
 127     Compress using `#` working threads (default: 1).
 128     If `#` is 0, attempt to detect and use the number of physical CPU cores.
 129     In all cases, the number of threads is capped to ZSTDMT_NBTHREADS_MAX==200.
 130     This modifier does nothing if `zstd` is compiled without multithread support.
 131 * `--single-thread`:
 132     Do not spawn a thread for compression; use the caller thread instead.
 133     This is the only available mode when multithread support is disabled.
 134     In this mode, compression is serialized with I/O.
 135     (This is different from `-T1`, which spawns 1 compression thread in parallel with I/O).
 136     Single-thread mode also features lower memory usage.
 137 * `-D file`:
 138     use `file` as Dictionary to compress or decompress FILE(s)
 139 * `--nodictID`:
 140     do not store dictionary ID within frame header (dictionary compression).
 141     The decoder will have to rely on implicit knowledge about which dictionary to use,
 142     and it won't be able to check if it's correct.
 143 * `-o file`:
 144     save result into `file` (only possible with a single _INPUT-FILE_)
 145 * `-f`, `--force`:
 146     overwrite output without prompting, and (de)compress symbolic links
 147 * `-c`, `--stdout`:
 148     force write to standard output, even if it is the console
 149 * `--[no-]sparse`:
 150     enable / disable sparse FS support,
 151     to make files with many zeroes smaller on disk.
 152     Creating sparse files may save disk space and speed up decompression by
 153     reducing the amount of disk I/O.
 154     default: enabled when output is into a file,
 155     and disabled when output is stdout.
 156     This setting overrides default and can force sparse mode over stdout.
 157 * `--rm`:
 158     remove source file(s) after successful compression or decompression
 159 * `-k`, `--keep`:
 160     keep source file(s) after successful compression or decompression.
 161     This is the default behavior.
 162 * `-r`:
 163     operate recursively on directories
 164 * `--format=FORMAT`:
 165     compress and decompress in other formats. If compiled with
 166     support, zstd can compress to or decompress from other compression algorithm
 167     formats. Possibly available options are `gzip`, `xz`, `lzma`, and `lz4`.
 168 * `-h`/`-H`, `--help`:
 169     display help/long help and exit
 170 * `-V`, `--version`:
 171     display version number and exit.
 172     Advanced : `-vV` also displays supported formats.
 173     `-vvV` also displays POSIX support.
 174 * `-v`:
 175     verbose mode
 176 * `-q`, `--quiet`:
 177     suppress warnings, interactivity, and notifications.
 178     specify twice to suppress errors too.
 179 * `-C`, `--[no-]check`:
 180     add integrity check computed from uncompressed data (default: enabled)
 181 * `--`:
 182     All arguments after `--` are treated as files
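
The relationship between `--long[=#]`, `windowLog`, and the window size described above is simple arithmetic: the window size is 2 raised to `windowLog`. The Python sketch below only restates that arithmetic; the function names are hypothetical and nothing here calls `zstd` itself.

```python
DEFAULT_LONG_WINDOW_LOG = 27  # value used by --long when no `#` is given
MAX_IMPLICIT_WINDOW_LOG = 27  # above this, the decompressor needs an explicit option


def long_window_size(window_log=None) -> int:
    """Window size in bytes implied by --long[=#]."""
    if window_log is None:
        window_log = DEFAULT_LONG_WINDOW_LOG
    return 1 << window_log


def decoder_needs_option(window_log: int) -> bool:
    """True when --long=windowLog or --memory=windowSize must be passed to the decompressor."""
    return window_log > MAX_IMPLICIT_WINDOW_LOG


if __name__ == "__main__":
    for wlog in (27, 30):
        print(wlog, long_window_size(wlog), decoder_needs_option(wlog))
```

With the default of 27 the window is 128 MiB, which is the largest window the decompressor accepts without an explicit option.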
 183
 184
 185 DICTIONARY BUILDER
 186 ------------------
 187 `zstd` offers _dictionary_ compression,
 188 which greatly improves efficiency on small files and messages.
 189 It's possible to train `zstd` with a set of samples,
 190 the result of which is saved into a file called a `dictionary`.
 191 Then during compression and decompression, reference the same dictionary,
 192 using command `-D dictionaryFileName`.
 193 Compression of small files similar to the sample set will be greatly improved.
 194
 195 * `--train FILEs`:
 196     Use FILEs as training set to create a dictionary.
 197     The training set should contain a lot of small files (> 100),
 198     and weigh typically 100x the target dictionary size
 199     (for example, 10 MB for a 100 KB dictionary).
 200
 201     Supports multithreading if `zstd` is compiled with threading support.
 202     Equivalent to `--train-cover=d=8,steps=4`.
 203     Additional parameters can be specified with `--train-cover`.
 204     The legacy dictionary builder can be accessed with `--train-legacy`.
 205 * `-o file`:
 206     Dictionary saved into `file` (default name: dictionary).
 207 * `--maxdict=#`:
 208     Limit dictionary to specified size (default: 112640).
 209 * `-#`:
 210     Use `#` compression level during training (optional).
 211     Will generate statistics more tuned for selected compression level,
 212     resulting in a _small_ compression ratio improvement for this level.
 213 * `-B#`:
 214     Split input files in blocks of size # (default: no split)
 215 * `--dictID=#`:
 216     A dictionary ID is a locally unique ID that a decoder can use to verify it is
 217     using the right dictionary.
 218     By default, zstd will create a 4-byte random number ID.
 219     It's possible to give a precise number instead.
 220     Short numbers have an advantage: an ID < 256 will only need 1 byte in the
 221     compressed frame header, and an ID < 65536 will only need 2 bytes.
 222     This compares favorably to the 4-byte default.
 223     However, it's up to the dictionary manager not to assign the same ID to
 224     two different dictionaries.
 225 * `--train-cover[=k=#,d=#,steps=#]`:
 226     Select parameters for the default dictionary builder algorithm named cover.
 227     If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
 228     If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
 229     If _steps_ is not specified, then the default value of 40 is used.
 230     Requires that _d_ <= _k_.
 231
 232     Selects segments of size _k_ with highest score to put in the dictionary.
 233     The score of a segment is computed by the sum of the frequencies of all the
 234     subsegments of size _d_.
 235     Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
 236     algorithm will run faster with _d_ <= 8.
 237     Good values for _k_ vary widely based on the input data, but a safe range is
 238     [2 * _d_, 2000].
 239     Supports multithreading if `zstd` is compiled with threading support.
 240
 241     Examples:
 242
 243     `zstd --train-cover FILEs`
 244
 245     `zstd --train-cover=k=50,d=8 FILEs`
 246
 247     `zstd --train-cover=d=8,steps=500 FILEs`
 248
 249     `zstd --train-cover=k=50 FILEs`
 250
 251 * `--train-legacy[=selectivity=#]`:
 252     Use legacy dictionary builder algorithm with the given dictionary
 253     _selectivity_ (default: 9).
 254     The smaller the _selectivity_ value, the denser the dictionary,
 255     improving its efficiency but reducing its possible maximum size.
 256     `--train-legacy=s=#` is also accepted.
 257
 258     Examples:
 259
 260     `zstd --train-legacy FILEs`
 261
 262     `zstd --train-legacy=selectivity=8 FILEs`
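
The header cost of a dictionary ID, as described under `--dictID=#` above, can be summarized with a small Python sketch. The function name is hypothetical, and the sketch only covers positive IDs that fit in 4 bytes; it restates the thresholds given above and does not read or write real frames.

```python
def dict_id_field_bytes(dict_id: int) -> int:
    """Bytes needed in the frame header to store a dictionary ID."""
    if not 1 <= dict_id < 1 << 32:
        # Assumption of this sketch: only positive IDs that fit in 4 bytes.
        raise ValueError(f"dictionary ID out of range: {dict_id}")
    if dict_id < 256:
        return 1
    if dict_id < 65536:
        return 2
    return 4


if __name__ == "__main__":
    for dict_id in (42, 1000, 3_000_000_000):
        print(dict_id, "->", dict_id_field_bytes(dict_id), "byte(s)")
```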
 263
 264
 265 BENCHMARK
 266 ---------
 267
 268 * `-b#`:
 269     benchmark file(s) using compression level #
 270 * `-e#`:
 271     benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive)
 272 * `-i#`:
 273     minimum evaluation time, in seconds (default: 3s), benchmark mode only
 274 * `-B#`, `--block-size=#`:
 275     cut file(s) into independent blocks of size # (default: no block)
 276 * `--priority=rt`:
 277     set process priority to real-time
 278
 279 **Output Format:** CompressionLevel#Filename : InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed
 280
 281 **Methodology:** For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy.
 282
 283 ADVANCED COMPRESSION OPTIONS
 284 ----------------------------
 285 ### --zstd[=options]:
 286 `zstd` provides 22 predefined compression levels.
 287 The selected or default predefined compression level can be changed with
 288 advanced compression options.
 289 The _options_ are provided as a comma-separated list.
 290 You may specify only the options you want to change and the rest will be
 291 taken from the selected or default compression level.
 292 The list of available _options_:
 293
 294 - `strategy`=_strat_, `strat`=_strat_:
 295     Specify a strategy used by a match finder.
 296
 297     There are 8 strategies numbered from 1 to 8, from faster to stronger:
 298     1=ZSTD\_fast, 2=ZSTD\_dfast, 3=ZSTD\_greedy, 4=ZSTD\_lazy,
 299     5=ZSTD\_lazy2, 6=ZSTD\_btlazy2, 7=ZSTD\_btopt, 8=ZSTD\_btultra.
 300
 301 - `windowLog`=_wlog_, `wlog`=_wlog_:
 302     Specify the maximum number of bits for a match distance.
 303
 304     A higher number of bits increases the chance to find a match, which usually
 305     improves compression ratio.
 306     It also increases memory requirements for the compressor and decompressor.
 307     The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit
 308     platforms and 31 (2 GiB) on 64-bit platforms.
 309
 310     Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
 311     `--memory=windowSize` needs to be passed to the decompressor.
 312
 313 - `hashLog`=_hlog_, `hlog`=_hlog_:
 314     Specify the maximum number of bits for a hash table.
 315
 316     Bigger hash tables cause fewer collisions, which usually makes compression
 317     faster, but require more memory during compression.
 318
 319     The minimum _hlog_ is 6 (64 B) and the maximum is 26 (128 MiB).
 320
 321 - `chainLog`=_clog_, `clog`=_clog_:
 322     Specify the maximum number of bits for a hash chain or a binary tree.
 323
 324     Higher numbers of bits increase the chance to find a match, which usually
 325     improves compression ratio.
 326     It also slows down compression speed and increases memory requirements for
 327     compression.
 328     This option is ignored for the ZSTD_fast strategy.
 329
 330     The minimum _clog_ is 6 (64 B) and the maximum is 28 (256 MiB).
 331
 332 - `searchLog`=_slog_, `slog`=_slog_:
 333     Specify the maximum number of searches in a hash chain or a binary tree
 334     using logarithmic scale.
 335
 336     More searches increase the chance to find a match, which usually increases
 337     compression ratio but decreases compression speed.
 338
 339     The minimum _slog_ is 1 and the maximum is 26.
 340
 341 - `searchLength`=_slen_, `slen`=_slen_:
 342     Specify the minimum searched length of a match in a hash table.
 343
 344     Larger search lengths usually decrease compression ratio but improve
 345     decompression speed.
 346
 347     The minimum _slen_ is 3 and the maximum is 7.
 348
 349 - `targetLen`=_tlen_, `tlen`=_tlen_:
 350     The impact of this field varies depending on the selected strategy.
 351
 352     For ZSTD\_btopt and ZSTD\_btultra, it specifies the minimum match length
 353     that causes match finder to stop searching for better matches.
 354     A larger `targetLen` usually improves compression ratio
 355     but decreases compression speed.
 356
 357     For ZSTD\_fast, it specifies
 358     the amount of data skipped between match sampling.
 359     Impact is reversed : a larger `targetLen` increases compression speed
 360     but decreases compression ratio.
 361
 362     For all other strategies, this field has no impact.
 363
 364     The minimum _tlen_ is 1 and the maximum is 999.
 365
 366 - `overlapLog`=_ovlog_,  `ovlog`=_ovlog_:
 367     Determine `overlapSize`, amount of data reloaded from previous job.
 368     This parameter is only available when multithreading is enabled.
 369     Reloading more data improves compression ratio, but decreases speed.
 370
 371     The minimum _ovlog_ is 0, and the maximum is 9.
 372     0 means "no overlap", hence completely independent jobs.
 373     9 means "full overlap", meaning up to `windowSize` is reloaded from previous job.
 374     Reducing _ovlog_ by 1 reduces the amount of reload by a factor of 2.
 375     Default _ovlog_ is 6, which means "reload `windowSize / 8`".
 376     Exception : the maximum compression level (22) has a default _ovlog_ of 9.
 377
 378 - `ldmHashLog`=_ldmhlog_, `ldmhlog`=_ldmhlog_:
 379     Specify the maximum size for a hash table used for long distance matching.
 380
 381     This option is ignored unless long distance matching is enabled.
 382
 383     Bigger hash tables usually improve compression ratio at the expense of more
 384     memory during compression and a decrease in compression speed.
 385
 386     The minimum _ldmhlog_ is 6 and the maximum is 26 (default: 20).
 387
 388 - `ldmSearchLength`=_ldmslen_, `ldmslen`=_ldmslen_:
 389     Specify the minimum searched length of a match for long distance matching.
 390
 391     This option is ignored unless long distance matching is enabled.
 392
 393     Larger/very small values usually decrease compression ratio.
 394
 395     The minimum _ldmslen_ is 4 and the maximum is 4096 (default: 64).
 396
 397 - `ldmBucketSizeLog`=_ldmblog_, `ldmblog`=_ldmblog_:
 398     Specify the size of each bucket for the hash table used for long distance
 399     matching.
 400
 401     This option is ignored unless long distance matching is enabled.
 402
 403     Larger bucket sizes improve collision resolution but decrease compression
 404     speed.
 405
 406     The minimum _ldmblog_ is 0 and the maximum is 8 (default: 3).
 407
 408 - `ldmHashEveryLog`=_ldmhevery_, `ldmhevery`=_ldmhevery_:
 409     Specify the frequency of inserting entries into the long distance matching
 410     hash table.
 411
 412     This option is ignored unless long distance matching is enabled.
 413
 414     Larger values will improve compression speed. Deviating far from the
 415     default value will likely result in a decrease in compression ratio.
 416
 417     The default value is `wlog - ldmhlog`.
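
The `overlapLog` rules above (9 means the full window, each step down halves the reload, 0 means no overlap) can be written out as arithmetic. The Python sketch below is an illustration of that description, with a hypothetical function name; it is not taken from the `zstd` sources.

```python
def overlap_size(window_size: int, ovlog: int = 6) -> int:
    """Amount of data reloaded from the previous job, in bytes."""
    if not 0 <= ovlog <= 9:
        raise ValueError(f"ovlog must be in [0, 9], got {ovlog}")
    if ovlog == 0:
        # 0 means "no overlap": jobs are completely independent.
        return 0
    # 9 reloads the full window; each step below 9 halves the amount.
    return window_size >> (9 - ovlog)


if __name__ == "__main__":
    window = 1 << 23  # 8 MiB window, i.e. windowLog=23
    for ovlog in (0, 6, 9):
        print(ovlog, overlap_size(window, ovlog))
```

With the default _ovlog_ of 6, the shift is 3 bits, which gives the `windowSize / 8` mentioned above.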
 418
 419 ### -B#:
 420 Select the size of each compression job.
 421 This parameter is available only when multi-threading is enabled.
 422 Default value is `4 * windowSize`, which means it varies depending on compression level.
 423 `-B#` makes it possible to select a custom value.
 424 Note that job size must respect a minimum value which is enforced transparently.
 425 This minimum is either 1 MB, or `overlapSize`, whichever is larger.
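
The job size rule can be sketched in Python as follows. The function name is hypothetical, and reading "1 MB" as 1 MiB (2\^20 bytes) is an assumption of this sketch; the text does not say which unit is meant.

```python
MIN_JOB_SIZE = 1 << 20  # assumption: "1 MB" is read as 1 MiB


def effective_job_size(window_size: int, overlap_size: int, requested=None) -> int:
    """Job size in bytes after applying the default and the enforced minimum."""
    # Default is 4 * windowSize unless -B# selects a custom value.
    job_size = 4 * window_size if requested is None else requested
    # The minimum is 1 MB or overlapSize, whichever is larger.
    return max(job_size, MIN_JOB_SIZE, overlap_size)


if __name__ == "__main__":
    window = 1 << 23
    print(effective_job_size(window, overlap_size=window // 8))
    print(effective_job_size(window, overlap_size=window // 8, requested=1000))
```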
 426
 427 ### Example
 428 The following parameters set advanced compression options to those of
 429 predefined level 19 for files bigger than 256 KB:
 430
 431 `--zstd`=windowLog=23,chainLog=23,hashLog=22,searchLog=6,searchLength=3,targetLength=48,strategy=6
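
When such an option string is generated by a script, it is just a comma-separated list of `name=value` pairs. The Python sketch below builds the string from the example above; the helper name `build_zstd_option` is hypothetical and the sketch does not validate option names or ranges.

```python
def build_zstd_option(params: dict) -> str:
    """Join name=value pairs into a single --zstd= argument."""
    return "--zstd=" + ",".join(f"{name}={value}" for name, value in params.items())


if __name__ == "__main__":
    # Values copied from the example above.
    example = {
        "windowLog": 23,
        "chainLog": 23,
        "hashLog": 22,
        "searchLog": 6,
        "searchLength": 3,
        "targetLength": 48,
        "strategy": 6,
    }
    print(build_zstd_option(example))
```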
 432
 433 BUGS
 434 ----
 435 Report bugs at: https://github.com/facebook/zstd/issues
 436
 437 AUTHOR
 438 ------
 439 Yann Collet