2 .\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
3 .\" Copyright (c) 2019, 2020 by Delphix. All rights reserved.
4 .\" Copyright (c) 2019 Datto Inc.
5 .\" The contents of this file are subject to the terms of the Common Development
6 .\" and Distribution License (the "License"). You may not use this file except
7 .\" in compliance with the License. You can obtain a copy of the license at
8 .\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
10 .\" See the License for the specific language governing permissions and
11 .\" limitations under the License. When distributing Covered Code, include this
12 .\" CDDL HEADER in each file and include the License file at
13 .\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
14 .\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
15 .\" own identifying information:
16 .\" Portions Copyright [yyyy] [name of copyright owner]
17 .TH ZFS-MODULE-PARAMETERS 5 "Aug 24, 2020" OpenZFS
19 zfs\-module\-parameters \- ZFS module parameters
23 Description of the different parameters to the ZFS module.
25 .SS "Module parameters"
32 \fBdbuf_cache_max_bytes\fR (ulong)
35 Maximum size in bytes of the dbuf cache. The target size is the smaller of
36 this value and \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size. The
37 behavior of the dbuf cache and its associated settings can be observed via the
38 \fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
40 Default value: \fBULONG_MAX\fR.
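The sizing rule above can be sketched in Python. This is only an illustration: the helper name and the 4GB ARC target are assumptions, and the module itself may apply further adjustments.

```python
ULONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long


def dbuf_cache_target_bytes(arc_target_bytes,
                            dbuf_cache_max_bytes=ULONG_MAX,
                            dbuf_cache_shift=5):
    """Hypothetical helper: smaller of the explicit cap and ARC target / 2^shift."""
    return min(dbuf_cache_max_bytes, arc_target_bytes >> dbuf_cache_shift)


arc_target = 4 * 1024**3  # illustrative 4GB ARC target
print(dbuf_cache_target_bytes(arc_target))  # 134217728, i.e. 1/32 of 4GB
```

With the default cap of ULONG_MAX the shift always decides the result; a smaller dbuf_cache_max_bytes takes over only when it is below that fraction.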
46 \fBdbuf_metadata_cache_max_bytes\fR (ulong)
49 Maximum size in bytes of the metadata dbuf cache. The target size is
50 the smaller of this value and \fB1/2^dbuf_metadata_cache_shift\fR (1/64) of the
51 target ARC size. The behavior of the metadata dbuf cache and its associated
52 settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR kstat.
54 Default value: \fBULONG_MAX\fR.
60 \fBdbuf_cache_hiwater_pct\fR (uint)
63 The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted
66 Default value: \fB10\fR%.
72 \fBdbuf_cache_lowater_pct\fR (uint)
75 The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops
78 Default value: \fB10\fR%.
84 \fBdbuf_cache_shift\fR (int)
87 Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
88 of the target ARC size.
90 Default value: \fB5\fR.
96 \fBdbuf_metadata_cache_shift\fR (int)
99 Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR,
100 to a log2 fraction of the target ARC size.
102 Default value: \fB6\fR.
108 \fBdmu_object_alloc_chunk_shift\fR (int)
111 dnode slots allocated in a single operation as a power of 2. The default value
112 minimizes lock contention for the bulk operation performed.
114 Default value: \fB7\fR (128).
120 \fBdmu_prefetch_max\fR (int)
123 Limit the amount we can prefetch with one call to this amount (in bytes).
124 This helps to limit the amount of memory that can be used by prefetching.
126 Default value: \fB134,217,728\fR (128MB).
132 \fBignore_hole_birth\fR (int)
135 This is an alias for \fBsend_holes_without_birth_time\fR.
141 \fBl2arc_feed_again\fR (int)
144 Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
147 Use \fB1\fR for yes (default) and \fB0\fR to disable.
153 \fBl2arc_feed_min_ms\fR (ulong)
156 Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only
157 applicable in related situations.
159 Default value: \fB200\fR.
165 \fBl2arc_feed_secs\fR (ulong)
168 Seconds between L2ARC writes.
170 Default value: \fB1\fR.
176 \fBl2arc_headroom\fR (ulong)
179 How far through the ARC lists to search for L2ARC cacheable content, expressed
180 as a multiplier of \fBl2arc_write_max\fR.
181 ARC persistence across reboots can be achieved with persistent L2ARC by setting
182 this parameter to \fB0\fR allowing the full length of ARC lists to be searched
183 for cacheable content.
185 Default value: \fB2\fR.
191 \fBl2arc_headroom_boost\fR (ulong)
194 Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
195 successfully compressed before writing. A value of \fB100\fR disables this
198 Default value: \fB200\fR%.
204 \fBl2arc_mfuonly\fR (int)
207 Controls whether only MFU metadata and data are cached from ARC into L2ARC.
208 This may be desired to avoid wasting space on L2ARC when reading/writing large
209 amounts of data that are not expected to be accessed more than once. The
210 default is \fB0\fR, meaning both MRU and MFU data and metadata are cached.
211 When turning off (\fB0\fR) this feature some MRU buffers will still be present
212 in ARC and eventually cached on L2ARC.
214 Use \fB0\fR for no (default) and \fB1\fR for yes.
220 \fBl2arc_meta_percent\fR (int)
223 Percent of ARC size allowed for L2ARC-only headers.
224 Since L2ARC buffers are not evicted on memory pressure, too many headers
225 on a system with an irrationally large L2ARC can render it slow or unusable.
226 This parameter limits L2ARC writes and rebuilds to stay within that percentage.
228 Default value: \fB33\fR%.
234 \fBl2arc_trim_ahead\fR (ulong)
237 Trims ahead of the current write size (\fBl2arc_write_max\fR) on L2ARC devices
238 by this percentage of write size if we have filled the device. If set to
239 \fB100\fR we TRIM twice the space required to accommodate upcoming writes. A
240 minimum of 64MB will be trimmed. It also enables TRIM of the whole L2ARC device
241 upon creation or addition to an existing pool or if the header of the device is
242 invalid upon importing a pool or onlining a cache device. A value of \fB0\fR
243 disables TRIM on L2ARC altogether and is the default as it can put significant
244 stress on the underlying storage devices. This will vary depending on how well
245 the specific device handles these commands.
247 Default value: \fB0\fR%.
253 \fBl2arc_noprefetch\fR (int)
256 Do not write buffers to L2ARC if they were prefetched but not used by
259 Use \fB1\fR for yes (default) and \fB0\fR to disable.
265 \fBl2arc_norw\fR (int)
268 No reads during writes.
270 Use \fB1\fR for yes and \fB0\fR for no (default).
276 \fBl2arc_write_boost\fR (ulong)
279 Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
280 while they remain cold.
282 Default value: \fB8,388,608\fR.
288 \fBl2arc_write_max\fR (ulong)
291 Max write bytes per interval.
293 Default value: \fB8,388,608\fR.
299 \fBl2arc_rebuild_enabled\fR (int)
302 Rebuild the L2ARC when importing a pool (persistent L2ARC). This can be
303 disabled if there are problems importing a pool or attaching an L2ARC device
304 (e.g. the L2ARC device is slow in reading stored log metadata, or the metadata
305 has become somehow fragmented/unusable).
307 Use \fB1\fR for yes (default) and \fB0\fR for no.
313 \fBl2arc_rebuild_blocks_min_l2size\fR (ulong)
316 Min size (in bytes) of an L2ARC device required in order to write log blocks
317 in it. The log blocks are used upon importing the pool to rebuild
318 the L2ARC (persistent L2ARC). Rationale: for L2ARC devices less than 1GB, the
319 amount of data l2arc_evict() evicts is significant compared to the amount of
320 restored L2ARC data. In this case do not write log blocks in L2ARC in order not
323 Default value: \fB1,073,741,824\fR (1GB).
329 \fBmetaslab_aliquot\fR (ulong)
332 Metaslab granularity, in bytes. This is roughly similar to what would be
333 referred to as the "stripe size" in traditional RAID arrays. In normal
334 operation, ZFS will try to write this amount of data to a top-level vdev
335 before moving on to the next one.
337 Default value: \fB524,288\fR.
343 \fBmetaslab_bias_enabled\fR (int)
346 Enable metaslab group biasing based on its vdev's over- or under-utilization
347 relative to the pool.
349 Use \fB1\fR for yes (default) and \fB0\fR for no.
355 \fBmetaslab_force_ganging\fR (ulong)
358 Make some blocks above a certain size be gang blocks. This option is used
359 by the test suite to facilitate testing.
361 Default value: \fB16,777,217\fR.
367 \fBzfs_keep_log_spacemaps_at_export\fR (int)
370 Prevent log spacemaps from being destroyed during pool exports and destroys.
372 Use \fB1\fR for yes and \fB0\fR for no (default).
378 \fBzfs_metaslab_segment_weight_enabled\fR (int)
381 Enable/disable segment-based metaslab selection.
383 Use \fB1\fR for yes (default) and \fB0\fR for no.
389 \fBzfs_metaslab_switch_threshold\fR (int)
392 When using segment-based metaslab selection, continue allocating
393 from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
394 worth of buckets have been exhausted.
396 Default value: \fB2\fR.
402 \fBmetaslab_debug_load\fR (int)
405 Load all metaslabs during pool import.
407 Use \fB1\fR for yes and \fB0\fR for no (default).
413 \fBmetaslab_debug_unload\fR (int)
416 Prevent metaslabs from being unloaded.
418 Use \fB1\fR for yes and \fB0\fR for no (default).
424 \fBmetaslab_fragmentation_factor_enabled\fR (int)
427 Enable use of the fragmentation metric in computing metaslab weights.
429 Use \fB1\fR for yes (default) and \fB0\fR for no.
435 \fBmetaslab_df_max_search\fR (int)
438 Maximum distance to search forward from the last offset. Without this limit,
439 fragmented pools can see >100,000 iterations and metaslab_block_picker()
440 becomes the performance limiting factor on high-performance storage.
442 With the default setting of 16MB, we typically see less than 500 iterations,
443 even with very fragmented, ashift=9 pools. The maximum number of iterations
444 possible is: \fBmetaslab_df_max_search / (2 * (1<<ashift))\fR.
445 With the default setting of 16MB this is 16*1024 (with ashift=9) or 2048
448 Default value: \fB16,777,216\fR (16MB)
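The iteration bound quoted above can be checked with a short Python sketch. The function name is hypothetical; only the formula comes from the text.

```python
def max_search_iterations(metaslab_df_max_search, ashift):
    """Upper bound from the formula above: max_search / (2 * (1 << ashift))."""
    return metaslab_df_max_search // (2 * (1 << ashift))


default_max_search = 16 * 1024 * 1024  # 16MB default

print(max_search_iterations(default_max_search, 9))   # 16384 (16*1024)
print(max_search_iterations(default_max_search, 12))  # 2048
```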
454 \fBmetaslab_df_use_largest_segment\fR (int)
457 If we are not searching forward (due to metaslab_df_max_search,
458 metaslab_df_free_pct, or metaslab_df_alloc_threshold), this tunable controls
459 what segment is used. If it is set, we will use the largest free segment.
460 If it is not set, we will use a segment of exactly the requested size (or
463 Use \fB1\fR for yes and \fB0\fR for no (default).
469 \fBzfs_metaslab_max_size_cache_sec\fR (ulong)
472 When we unload a metaslab, we cache the size of the largest free chunk. We use
473 that cached size to determine whether or not to load a metaslab for a given
474 allocation. As more frees accumulate in that metaslab while it's unloaded, the
475 cached max size becomes less and less accurate. After a number of seconds
476 controlled by this tunable, we stop considering the cached max size and start
477 considering only the histogram instead.
479 Default value: \fB3600 seconds\fR (one hour)
485 \fBzfs_metaslab_mem_limit\fR (int)
488 When we are loading a new metaslab, we check the amount of memory being used
489 to store metaslab range trees. If it is over a threshold, we attempt to unload
490 the least recently used metaslab to prevent the system from clogging all of
491 its memory with range trees. This tunable sets the percentage of total system
492 memory that is the threshold.
494 Default value: \fB25 percent\fR
500 \fBzfs_vdev_default_ms_count\fR (int)
503 When a vdev is added, target this number of metaslabs per top-level vdev.
505 Default value: \fB200\fR.
511 \fBzfs_vdev_default_ms_shift\fR (int)
514 Default limit for metaslab size.
516 Default value: \fB29\fR [meaning (1 << 29) = 512MB].
522 \fBzfs_vdev_max_auto_ashift\fR (ulong)
525 Maximum ashift used when optimizing for logical -> physical sector size on new
528 Default value: \fBASHIFT_MAX\fR (16).
534 \fBzfs_vdev_min_auto_ashift\fR (ulong)
537 Minimum ashift used when creating new top-level vdevs.
539 Default value: \fBASHIFT_MIN\fR (9).
545 \fBzfs_vdev_min_ms_count\fR (int)
548 Minimum number of metaslabs to create in a top-level vdev.
550 Default value: \fB16\fR.
556 \fBvdev_validate_skip\fR (int)
559 Skip label validation steps during pool import. Changing is not recommended
560 unless you know what you are doing and are recovering a damaged label.
562 Default value: \fB0\fR.
568 \fBzfs_vdev_ms_count_limit\fR (int)
571 Practical upper limit of total metaslabs per top-level vdev.
573 Default value: \fB131,072\fR.
579 \fBmetaslab_preload_enabled\fR (int)
582 Enable metaslab group preloading.
584 Use \fB1\fR for yes (default) and \fB0\fR for no.
590 \fBmetaslab_lba_weighting_enabled\fR (int)
593 Give more weight to metaslabs with lower LBAs, assuming they have
594 greater bandwidth as is typically the case on a modern constant
595 angular velocity disk drive.
597 Use \fB1\fR for yes (default) and \fB0\fR for no.
603 \fBmetaslab_unload_delay\fR (int)
606 After a metaslab is used, we keep it loaded for this many txgs, to attempt to
607 reduce unnecessary reloading. Note that both this many txgs and
608 \fBmetaslab_unload_delay_ms\fR milliseconds must pass before unloading will
611 Default value: \fB32\fR.
617 \fBmetaslab_unload_delay_ms\fR (int)
620 After a metaslab is used, we keep it loaded for this many milliseconds, to
621 attempt to reduce unnecessary reloading. Note that both this many
622 milliseconds and \fBmetaslab_unload_delay\fR txgs must pass before unloading
625 Default value: \fB600000\fR (ten minutes).
631 \fBsend_holes_without_birth_time\fR (int)
634 When set, the hole_birth optimization will not be used, and all holes will
635 always be sent on zfs send. This is useful if you suspect your datasets are
636 affected by a bug in hole_birth.
638 Use \fB1\fR for on (default) and \fB0\fR for off.
644 \fBspa_config_path\fR (charp)
649 Default value: \fB/etc/zfs/zpool.cache\fR.
655 \fBspa_asize_inflation\fR (int)
658 Multiplication factor used to estimate actual disk consumption from the
659 size of data being written. The default value is a worst case estimate,
660 but lower values may be valid for a given pool depending on its
661 configuration. Pool administrators who understand the factors involved
662 may wish to specify a more realistic inflation factor, particularly if
663 they operate close to quota or capacity limits.
665 Default value: \fB24\fR.
671 \fBspa_load_print_vdev_tree\fR (int)
674 Whether to print the vdev tree in the debugging message buffer during pool import.
675 Use 0 to disable and 1 to enable.
677 Default value: \fB0\fR.
683 \fBspa_load_verify_data\fR (int)
686 Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
687 import. Use 0 to disable and 1 to enable.
689 An extreme rewind import normally performs a full traversal of all
690 blocks in the pool for verification. If this parameter is set to 0,
691 the traversal skips non-metadata blocks. It can be toggled once the
692 import has started to stop or start the traversal of non-metadata blocks.
694 Default value: \fB1\fR.
700 \fBspa_load_verify_metadata\fR (int)
703 Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
704 pool import. Use 0 to disable and 1 to enable.
706 An extreme rewind import normally performs a full traversal of all
707 blocks in the pool for verification. If this parameter is set to 0,
708 the traversal is not performed. It can be toggled once the import has
709 started to stop or start the traversal.
711 Default value: \fB1\fR.
717 \fBspa_load_verify_shift\fR (int)
720 Sets the maximum number of bytes to consume during pool import to the log2
721 fraction of the target ARC size.
723 Default value: \fB4\fR.
729 \fBspa_slop_shift\fR (int)
732 Normally, we don't allow the last 3.125% (1/(2^spa_slop_shift)) of space
733 in the pool to be consumed. This ensures that we don't run the pool
734 completely out of space, due to unaccounted changes (e.g. to the MOS).
735 It also limits the worst-case time to allocate space. If we have
736 less than this amount of free space, most ZPL operations (e.g. write,
737 create) will return ENOSPC.
739 Default value: \fB5\fR.
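As a rough sketch of the reservation described above, the following Python computes the slop space as the pool size shifted right by spa_slop_shift. The helper name and the 1TiB pool are assumptions, and any floor or ceiling the implementation may enforce on the reservation is deliberately ignored.

```python
def slop_bytes(pool_bytes, spa_slop_shift=5):
    """Hypothetical helper: space held back, as pool size / 2^spa_slop_shift."""
    return pool_bytes >> spa_slop_shift


one_tib = 1024**4  # illustrative pool size

print(slop_bytes(one_tib))            # 34359738368 (32GiB)
print(slop_bytes(one_tib) / one_tib)  # 0.03125
```

Raising spa_slop_shift by one halves the reserved fraction.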
745 \fBvdev_removal_max_span\fR (int)
748 During top-level vdev removal, chunks of data are copied from the vdev
749 which may include free space in order to trade bandwidth for IOPS.
750 This parameter determines the maximum span of free space (in bytes)
751 which will be included as "unnecessary" data in a chunk of copied data.
753 The default value here was chosen to align with
754 \fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing
755 regular reads (but there's no reason it has to be the same).
757 Default value: \fB32,768\fR.
763 \fBvdev_file_logical_ashift\fR (ulong)
766 Logical ashift for file-based devices.
768 Default value: \fB9\fR.
774 \fBvdev_file_physical_ashift\fR (ulong)
777 Physical ashift for file-based devices.
779 Default value: \fB9\fR.
785 \fBzap_iterate_prefetch\fR (int)
788 If this is set, when we start iterating over a ZAP object, zfs will prefetch
789 the entire object (all leaf blocks). However, this is limited by
790 \fBdmu_prefetch_max\fR.
792 Use \fB1\fR for on (default) and \fB0\fR for off.
798 \fBzfetch_array_rd_sz\fR (ulong)
801 If prefetching is enabled, disable prefetching for reads larger than this size.
803 Default value: \fB1,048,576\fR.
809 \fBzfetch_max_distance\fR (uint)
812 Max bytes to prefetch per stream (default 8MB).
814 Default value: \fB8,388,608\fR.
820 \fBzfetch_max_streams\fR (uint)
823 Max number of streams per zfetch (prefetch streams per file).
825 Default value: \fB8\fR.
831 \fBzfetch_min_sec_reap\fR (uint)
834 Min time before an active prefetch stream can be reclaimed
836 Default value: \fB2\fR.
842 \fBzfs_abd_scatter_enabled\fR (int)
845 Controls whether the ARC uses scatter/gather lists. When disabled (\fB0\fR),
846 all allocations are forced to be linear in kernel memory. Disabling can improve
847 performance in some code paths at the expense of fragmented kernel memory.
849 Default value: \fB1\fR.
855 \fBzfs_abd_scatter_max_order\fR (uint)
858 Maximum number of consecutive memory pages allocated in a single block for
859 scatter/gather lists. Default value is specified by the kernel itself.
861 Default value: \fB10\fR at the time of this writing.
867 \fBzfs_abd_scatter_min_size\fR (uint)
870 This is the minimum allocation size that will use scatter (page-based)
871 ABD's. Smaller allocations will use linear ABD's.
873 Default value: \fB1536\fR (512B and 1KB allocations will be linear).
879 \fBzfs_arc_dnode_limit\fR (ulong)
882 When the number of bytes consumed by dnodes in the ARC exceeds this number of
883 bytes, try to unpin some of it in response to demand for non-metadata. This
884 value acts as a ceiling to the amount of dnode metadata, and defaults to 0, which
885 indicates that a percentage of the ARC meta buffers, given by
886 \fBzfs_arc_dnode_limit_percent\fR, may be used for dnodes.
888 See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
889 when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
890 than in response to overall demand for non-metadata.
893 Default value: \fB0\fR.
899 \fBzfs_arc_dnode_limit_percent\fR (ulong)
902 Percentage of ARC meta buffers that can be consumed by dnodes.
904 See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
905 higher priority if set to nonzero value.
907 Default value: \fB10\fR%.
913 \fBzfs_arc_dnode_reduce_percent\fR (ulong)
916 Percentage of ARC dnodes to try to scan in response to demand for non-metadata
917 when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.
920 Default value: \fB10\fR% of the number of dnodes in the ARC.
926 \fBzfs_arc_average_blocksize\fR (int)
929 The ARC's buffer hash table is sized based on the assumption of an average
930 block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
931 to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
932 For configurations with a known larger average block size this value can be
933 increased to reduce the memory footprint.
936 Default value: \fB8192\fR.
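The "1MB per 1GB" figure above follows from one pointer per average-sized block. A minimal Python sketch, assuming that simple model (the helper name is hypothetical, and any rounding the implementation performs is not modelled):

```python
def hash_table_bytes(phys_mem_bytes, avg_blocksize=8192, pointer_bytes=8):
    """Hypothetical estimate: one pointer per average-sized block of memory."""
    return (phys_mem_bytes // avg_blocksize) * pointer_bytes


one_gib = 1024**3

print(hash_table_bytes(one_gib))                        # 1048576 (1MB)
print(hash_table_bytes(one_gib, avg_blocksize=131072))  # 65536 (64KB)
```

The second call shows why a larger known average block size shrinks the table.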
942 \fBzfs_arc_eviction_pct\fR (int)
945 When \fBarc_is_overflowing()\fR, \fBarc_get_data_impl()\fR waits for this
946 percent of the requested amount of data to be evicted. For example, by
947 default for every 2KB that's evicted, 1KB of it may be "reused" by a new
948 allocation. Since this is above 100%, it ensures that progress is made
949 towards getting \fBarc_size\fR under \fBarc_c\fR. Since this is finite, it
950 ensures that allocations can still happen, even during the potentially long
951 time that \fBarc_size\fR is more than \fBarc_c\fR.
953 Default value: \fB200\fR.
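The relationship between a requested allocation and the amount that must be evicted first can be illustrated as follows. This is a sketch of the arithmetic only; the function name is an assumption and the code does not model the actual waiting logic.

```python
def bytes_to_evict(requested_bytes, zfs_arc_eviction_pct=200):
    """Hypothetical helper: eviction required before an allocation proceeds."""
    return requested_bytes * zfs_arc_eviction_pct // 100


print(bytes_to_evict(1024))  # 2048: 2KB evicted for each 1KB allocated
```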
959 \fBzfs_arc_evict_batch_limit\fR (int)
962 Number of ARC headers to evict per sub-list before proceeding to another sub-list.
963 This batch-style operation prevents entire sub-lists from being evicted at once
964 but comes at a cost of additional unlocking and locking.
966 Default value: \fB10\fR.
972 \fBzfs_arc_grow_retry\fR (int)
975 If set to a non-zero value, it replaces the arc_grow_retry value.
976 The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before
977 trying to resume growth after a memory pressure event.
979 Default value: \fB0\fR.
985 \fBzfs_arc_lotsfree_percent\fR (int)
988 Throttle I/O when free system memory drops below this percentage of total
989 system memory. Setting this value to 0 will disable the throttle.
991 Default value: \fB10\fR%.
997 \fBzfs_arc_max\fR (ulong)
1000 Max size of ARC in bytes. If set to 0 then the max size of ARC is determined
1001 by the amount of system memory installed. For Linux, 1/2 of system memory will
1002 be used as the limit. For FreeBSD, the larger of all system memory - 1GB or
1003 5/8 of system memory will be used as the limit. This value must be at least
1004 67108864 (64 megabytes).
1006 This value can be changed dynamically with some caveats. It cannot be set back
1007 to 0 while running and reducing it below the current ARC size will not cause
1008 the ARC to shrink without memory pressure to induce shrinking.
1010 Default value: \fB0\fR.
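The platform defaults described above, used when the parameter is left at 0, can be sketched in Python. The function name and the memory sizes are illustrative assumptions; only the two rules are taken from the text.

```python
GIB = 1024**3


def default_arc_max(phys_mem_bytes, platform):
    """Hypothetical helper mirroring the defaults described for zfs_arc_max=0."""
    if platform == "linux":
        return phys_mem_bytes // 2
    if platform == "freebsd":
        # Larger of (all memory - 1GB) and 5/8 of memory.
        return max(phys_mem_bytes - GIB, phys_mem_bytes * 5 // 8)
    raise ValueError("unknown platform: " + platform)


print(default_arc_max(16 * GIB, "linux"))    # 8589934592 (8GiB)
print(default_arc_max(16 * GIB, "freebsd"))  # 16106127360 (15GiB)
print(default_arc_max(2 * GIB, "freebsd"))   # 1342177280 (5/8 of 2GiB)
```

On small FreeBSD systems the 5/8 rule wins; on larger ones the "all but 1GB" rule does.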
1016 \fBzfs_arc_meta_adjust_restarts\fR (ulong)
1019 The number of restart passes to make while scanning the ARC attempting
1020 to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
1021 This value should not need to be tuned but is available to facilitate
1022 performance analysis.
1024 Default value: \fB4096\fR.
1030 \fBzfs_arc_meta_limit\fR (ulong)
1033 The maximum allowed size in bytes that meta data buffers are allowed to
1034 consume in the ARC. When this limit is reached meta data buffers will
1035 be reclaimed even if the overall arc_c_max has not been reached. This
1036 value defaults to 0 which indicates that a percent which is based on
1037 \fBzfs_arc_meta_limit_percent\fR of the ARC may be used for meta data.
1039 This value may be changed dynamically except that it cannot be set back to 0
1040 for a specific percent of the ARC; it must be set to an explicit value.
1042 Default value: \fB0\fR.
1048 \fBzfs_arc_meta_limit_percent\fR (ulong)
1051 Percentage of ARC buffers that can be used for meta data.
1053 See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
1054 higher priority if set to nonzero value.
1057 Default value: \fB75\fR%.
1063 \fBzfs_arc_meta_min\fR (ulong)
1066 The minimum allowed size in bytes that meta data buffers may consume in
1067 the ARC. This value defaults to 0 which disables a floor on the amount
1068 of the ARC devoted to meta data.
1070 Default value: \fB0\fR.
1076 \fBzfs_arc_meta_prune\fR (int)
1079 The number of dentries and inodes to be scanned looking for entries
1080 which can be dropped. This may be required when the ARC reaches the
1081 \fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
1082 in the ARC. Increasing this value will cause the dentry and inode caches
1083 to be pruned more aggressively. Setting this value to 0 will disable
1084 pruning the inode and dentry caches.
1086 Default value: \fB10,000\fR.
1092 \fBzfs_arc_meta_strategy\fR (int)
1095 Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
1096 A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
1097 A value of 1 (BALANCED) indicates that additional data buffers may be evicted if
1098 that is required in order to evict the required number of meta data buffers.
1100 Default value: \fB1\fR.
1106 \fBzfs_arc_min\fR (ulong)
1109 Min size of ARC in bytes. If set to 0 then arc_c_min will default to
1110 consuming the larger of 32M or 1/32 of total system memory.
1112 Default value: \fB0\fR.
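A small Python sketch of the default described above, for the case where the parameter is 0. The helper name and memory sizes are assumptions made for illustration.

```python
MIB = 1024**2


def default_arc_min(phys_mem_bytes):
    """Hypothetical helper: larger of 32M and 1/32 of total system memory."""
    return max(32 * MIB, phys_mem_bytes // 32)


print(default_arc_min(16 * 1024**3))  # 536870912 (512MiB)
print(default_arc_min(512 * MIB))     # 33554432 (the 32M floor applies)
```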
1118 \fBzfs_arc_min_prefetch_ms\fR (int)
1121 Minimum time prefetched blocks are locked in the ARC, specified in ms.
1122 A value of \fB0\fR will default to 1000 ms.
1124 Default value: \fB0\fR.
1130 \fBzfs_arc_min_prescient_prefetch_ms\fR (int)
1133 Minimum time "prescient prefetched" blocks are locked in the ARC, specified
1134 in ms. These blocks are meant to be prefetched fairly aggressively ahead of
1135 the code that may use them. A value of \fB0\fR will default to 6000 ms.
1137 Default value: \fB0\fR.
1143 \fBzfs_max_missing_tvds\fR (int)
1146 Number of missing top-level vdevs which will be allowed during
1147 pool import (only in read-only mode).
1149 Default value: \fB0\fR
1155 \fBzfs_max_nvlist_src_size\fR (ulong)
1158 Maximum size in bytes allowed to be passed as zc_nvlist_src_size for ioctls on
1159 /dev/zfs. This prevents a user from causing the kernel to allocate an excessive
1160 amount of memory. When the limit is exceeded, the ioctl fails with EINVAL and a
1161 description of the error is sent to the zfs-dbgmsg log. This parameter should
1162 not need to be touched under normal circumstances. On FreeBSD, the default is
1163 based on the system limit on user wired memory. On Linux, the default is
1164 \fBKMALLOC_MAX_SIZE\fR .
1166 Default value: \fB0\fR (kernel decides)
1172 \fBzfs_multilist_num_sublists\fR (int)
1175 To allow more fine-grained locking, each ARC state contains a series
1176 of lists for both data and meta data objects. Locking is performed at
1177 the level of these "sub-lists". This parameter controls the number of
1178 sub-lists per ARC state, and also applies to other uses of the
1179 multilist data structure.
1181 Default value: \fB4\fR or the number of online CPUs, whichever is greater
1187 \fBzfs_arc_overflow_shift\fR (int)
1190 The ARC size is considered to be overflowing if it exceeds the current
1191 ARC target size (arc_c) by a threshold determined by this parameter.
1192 The threshold is calculated as a fraction of arc_c using the formula
1193 "arc_c >> \fBzfs_arc_overflow_shift\fR".
1195 The default value of 8 causes the ARC to be considered to be overflowing
1196 if it exceeds the target size by 1/256th (about 0.4%) of the target size.
1198 When the ARC is overflowing, new buffer allocations are stalled until
1199 the reclaim thread catches up and the overflow condition no longer exists.
1201 Default value: \fB8\fR.
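The overflow test described above can be sketched as follows. The function names and the 8GiB target are assumptions; the shift formula is the one quoted in the text, and the exact comparison used by the module may differ.

```python
def overflow_threshold(arc_c, zfs_arc_overflow_shift=8):
    """Threshold from the text: arc_c >> zfs_arc_overflow_shift."""
    return arc_c >> zfs_arc_overflow_shift


def is_overflowing(arc_size, arc_c, zfs_arc_overflow_shift=8):
    """Hypothetical check: ARC size exceeds the target by more than the threshold."""
    return arc_size > arc_c + overflow_threshold(arc_c, zfs_arc_overflow_shift)


arc_c = 8 * 1024**3  # illustrative 8GiB target

print(overflow_threshold(arc_c))                 # 33554432 (32MiB)
print(is_overflowing(arc_c + 16 * 1024**2, arc_c))  # False
print(is_overflowing(arc_c + 64 * 1024**2, arc_c))  # True
```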
1208 \fBzfs_arc_p_min_shift\fR (int)
1211 If set to a non zero value, this will update arc_p_min_shift (default 4)
1213 arc_p_min_shift is used to shift of arc_c for calculating both min and max
1216 Default value: \fB0\fR.
1222 \fBzfs_arc_p_dampener_disable\fR (int)
1225 Disable arc_p adapt dampener
1227 Use \fB1\fR for yes (default) and \fB0\fR to disable.
1233 \fBzfs_arc_shrink_shift\fR (int)
1236 If set to a non zero value, this will update arc_shrink_shift (default 7)
1239 Default value: \fB0\fR.
1245 \fBzfs_arc_pc_percent\fR (uint)
1248 Percentage of the pagecache size down to which the ARC may be reclaimed.
1250 This tunable allows ZFS arc to play more nicely with the kernel's LRU
1251 pagecache. It can guarantee that the ARC size won't collapse under scanning
1252 pressure on the pagecache, yet still allows arc to be reclaimed down to
1253 zfs_arc_min if necessary. This value is specified as percent of pagecache
1254 size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
1255 only operates during memory pressure/reclaim.
1257 Default value: \fB0\fR% (disabled).
1263 \fBzfs_arc_shrinker_limit\fR (int)
1266 This is a limit on how many pages the ARC shrinker makes available for
1267 eviction in response to one page allocation attempt. Note that in
1268 practice, the kernel's shrinker can ask us to evict up to about 4x this
1269 for one allocation attempt.
1271 The default limit of 10,000 (in practice, 160MB per allocation attempt with
1272 4K pages) limits the amount of time spent attempting to reclaim ARC memory to
1273 less than 100ms per allocation attempt, even with a small average compressed
1276 The parameter can be set to 0 (zero) to disable the limit.
1278 This parameter only applies on Linux.
1280 Default value: \fB10,000\fR.
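The "160MB per allocation attempt" figure above can be reproduced with a short calculation. The helper name is hypothetical, and the 4x multiplier is the approximate factor mentioned in the text, not an exact kernel constant.

```python
def shrinker_bytes_per_attempt(limit_pages=10000, page_bytes=4096,
                               kernel_multiplier=4):
    """Hypothetical estimate of bytes the kernel may ask to evict per attempt."""
    return limit_pages * page_bytes * kernel_multiplier


print(shrinker_bytes_per_attempt())  # 163840000, roughly 160MB
```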
1286 \fBzfs_arc_sys_free\fR (ulong)
1289 The target number of bytes the ARC should leave as free memory on the system.
1290 Defaults to the larger of 1/64 of physical memory or 512K. Setting this
1291 option to a non-zero value will override the default.
1293 Default value: \fB0\fR.
1299 \fBzfs_autoimport_disable\fR (int)
1302 Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
1304 Use \fB1\fR for yes (default) and \fB0\fR for no.
1310 \fBzfs_checksum_events_per_second\fR (uint)
1313 Rate limit checksum events to this many per second. Note that this should
1314 not be set below the zed thresholds (currently 10 checksums over 10 sec)
1315 or else zed may not trigger any action.
1323 \fBzfs_commit_timeout_pct\fR (int)
1326 This controls the amount of time that a ZIL block (lwb) will remain "open"
1327 when it isn't "full", and it has a thread waiting for it to be committed to
1328 stable storage. The timeout is scaled based on a percentage of the last lwb
1329 latency to avoid significantly impacting the latency of each individual
1330 transaction record (itx).
1332 Default value: \fB5\fR%.
1338 \fBzfs_condense_indirect_commit_entry_delay_ms\fR (int)
1341 Vdev indirection layer (used for device removal) sleeps for this many
1342 milliseconds during mapping generation. Intended for use with the test suite
1343 to throttle vdev removal speed.
1345 Default value: \fB0\fR (no throttle).
1351 \fBzfs_condense_indirect_vdevs_enable\fR (int)
1354 Enable condensing indirect vdev mappings. When set to a non-zero value,
1355 attempt to condense indirect vdev mappings if the mapping uses more than
1356 \fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete
1357 space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR
1358 bytes on-disk. The condensing process is an attempt to save memory by
1359 removing obsolete mappings.
1361 Default value: \fB1\fR.
1367 \fBzfs_condense_max_obsolete_bytes\fR (ulong)
1370 Only attempt to condense indirect vdev mappings if the on-disk size
1371 of the obsolete space map object is greater than this number of bytes
1372 (see \fBzfs_condense_indirect_vdevs_enable\fR).
1374 Default value: \fB1,073,741,824\fR.
1380 \fBzfs_condense_min_mapping_bytes\fR (ulong)
1383 Minimum size vdev mapping to attempt to condense (see
1384 \fBzfs_condense_indirect_vdevs_enable\fR).
1386 Default value: \fB131,072\fR.
1392 \fBzfs_dbgmsg_enable\fR (int)
1395 Internally ZFS keeps a small log to facilitate debugging. By default the log
1396 is disabled, to enable it set this option to 1. The contents of the log can
1397 be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
1398 this proc file clears the log.
1400 Default value: \fB0\fR.
1406 \fBzfs_dbgmsg_maxsize\fR (int)
1409 The maximum size in bytes of the internal ZFS debug log.
1411 Default value: \fB4M\fR.
1417 \fBzfs_dbuf_state_index\fR (int)
1420 This feature is currently unused. It is normally used for controlling what
1421 reporting is available under /proc/spl/kstat/zfs.
1423 Default value: \fB0\fR.
1429 \fBzfs_deadman_enabled\fR (int)
1432 When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
1433 milliseconds, or when an individual I/O takes longer than
1434 \fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
1435 be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
1436 invoked as described by the \fBzfs_deadman_failmode\fR module option.
1437 By default the deadman is enabled and configured to \fBwait\fR which results
1438 in "hung" I/Os only being logged. The deadman is automatically disabled
1439 when a pool gets suspended.
1441 Default value: \fB1\fR.
1447 \fBzfs_deadman_failmode\fR (charp)
1450 Controls the failure behavior when the deadman detects a "hung" I/O. Valid
1451 values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
1453 \fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
1454 "deadman" event will be posted describing that I/O.
1456 \fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
1457 to the I/O pipeline if possible.
1459 \fBpanic\fR - Panic the system. This can be used to facilitate an automatic
1460 fail-over to a properly configured fail-over partner.
1462 Default value: \fBwait\fR.
1468 \fBzfs_deadman_checktime_ms\fR (int)
1471 Check time in milliseconds. This defines the frequency at which we check
1472 for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
1474 Default value: \fB60,000\fR.
1480 \fBzfs_deadman_synctime_ms\fR (ulong)
1483 Interval in milliseconds after which the deadman is triggered and also
1484 the interval after which a pool sync operation is considered to be "hung".
1485 Once this limit is exceeded the deadman will be invoked every
1486 \fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
1488 Default value: \fB600,000\fR.
1494 \fBzfs_deadman_ziotime_ms\fR (ulong)
1497 Interval in milliseconds after which the deadman is triggered and an
1498 individual I/O operation is considered to be "hung". As long as the I/O
1499 remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
1500 milliseconds until the I/O completes.
1502 Default value: \fB300,000\fR.
1508 \fBzfs_dedup_prefetch\fR (int)
1511 Enable prefetching of deduplicated blocks.
1513 Use \fB1\fR for yes and \fB0\fR to disable (default).
1519 \fBzfs_delay_min_dirty_percent\fR (int)
1522 Start to delay each transaction once there is this amount of dirty data,
1523 expressed as a percentage of \fBzfs_dirty_data_max\fR.
1524 This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
1525 See the section "ZFS TRANSACTION DELAY".
1527 Default value: \fB60\fR%.
1533 \fBzfs_delay_scale\fR (int)
1536 This controls how quickly the transaction delay approaches infinity.
1537 Larger values cause longer delays for a given amount of dirty data.
1539 For the smoothest delay, this value should be about 1 billion divided
1540 by the maximum number of operations per second. This will smoothly
1541 handle between 10x and 1/10th this number.
1543 See the section "ZFS TRANSACTION DELAY".
1545 Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
1547 Default value: \fB500,000\fR.
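The rule of thumb above can be sketched in Python. The function names and the
figure of 2,000 operations per second are illustrative assumptions and are not
part of ZFS; only the one-billion rule and the 2^64 bound come from the text.

```python
def suggested_delay_scale(max_ops_per_second):
    """Rule of thumb from the text: about 1 billion / max operations per second."""
    return 1_000_000_000 // max_ops_per_second


def product_fits(delay_scale, dirty_data_max):
    """Check the documented constraint: delay_scale * dirty_data_max < 2^64."""
    return delay_scale * dirty_data_max < 2**64


# Hypothetical pool that can sustain 2,000 operations per second.
scale = suggested_delay_scale(2000)
print(scale)
# Hypothetical zfs_dirty_data_max of 4 GiB.
print(product_fits(scale, 4 * 1024**3))
```

With these assumed inputs the suggested scale matches the documented default.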
1553 \fBzfs_disable_ivset_guid_check\fR (int)
1556 Disables requirement for IVset guids to be present and match when doing a raw
1557 receive of encrypted datasets. Intended for users whose pools were created with
1558 ZFS on Linux pre-release versions and now have compatibility issues.
1560 Default value: \fB0\fR.
1566 \fBzfs_key_max_salt_uses\fR (ulong)
1569 Maximum number of uses of a single salt value before generating a new one for
1570 encrypted datasets. The default value is also the maximum that will be
1573 Default value: \fB400,000,000\fR.
1579 \fBzfs_object_mutex_size\fR (uint)
1582 Size of the znode hashtable used for holds.
1584 Due to the need to hold locks on objects that may not exist yet, kernel mutexes
1585 are not created per-object and instead a hashtable is used where collisions
1586 will result in objects waiting when there is not actually contention on the
1589 Default value: \fB64\fR.
1595 \fBzfs_slow_io_events_per_second\fR (int)
1598 Rate limit delay zevents (which report slow I/Os) to this many per second.
1606 \fBzfs_unflushed_max_mem_amt\fR (ulong)
1609 Upper-bound limit for unflushed metadata changes to be held by the
1610 log spacemap in memory (in bytes).
1612 Default value: \fB1,073,741,824\fR (1GB).
1618 \fBzfs_unflushed_max_mem_ppm\fR (ulong)
1621 Percentage of the overall system memory that ZFS allows to be used
1622 for unflushed metadata changes by the log spacemap.
1623 (value is calculated over 1000000 for finer granularity).
1625 Default value: \fB1000\fR (which is divided by 1000000, resulting in
1626 the limit to be \fB0.1\fR% of memory)
1632 \fBzfs_unflushed_log_block_max\fR (ulong)
1635 Describes the maximum number of log spacemap blocks allowed for each pool.
1636 The default value of 262144 means that the space in all the log spacemaps
1637 can add up to no more than 262144 blocks (which means 32GB of logical
1638 space before compression and ditto blocks, assuming that blocksize is
1641 This tunable is important because it involves a trade-off between import
1642 time after an unclean export and the frequency of flushing metaslabs.
1643 The higher this number is, the more log blocks we allow when the pool is
1644 active which means that we flush metaslabs less often and thus decrease
1645 the number of I/Os for spacemap updates per TXG.
1646 At the same time though, that means that in the event of an unclean export,
1647 there will be more log spacemap blocks for us to read, inducing overhead
1648 in the import time of the pool.
1649 The lower the number, the more flushing occurs, destroying log
1650 blocks more quickly as they become obsolete faster, which leaves fewer blocks
1651 to be read during import time after a crash.
1653 Each log spacemap block existing during pool import leads to approximately
1654 one extra logical I/O issued.
1655 This is the reason why this tunable is exposed in terms of blocks rather
1658 Default value: \fB262144\fR (256K).
1664 \fBzfs_unflushed_log_block_min\fR (ulong)
1667 If the number of metaslabs is small and our incoming rate is high, we
1668 could get into a situation that we are flushing all our metaslabs every
1670 Thus we always allow at least this many log blocks.
1672 Default value: \fB1000\fR.
1678 \fBzfs_unflushed_log_block_pct\fR (ulong)
1681 Tunable used to determine the number of blocks that can be used for
1682 the spacemap log, expressed as a percentage of the total number of
1683 metaslabs in the pool.
1685 Default value: \fB400\fR (read as \fB400\fR% - meaning that the number
1686 of log spacemap blocks is capped at 4 times the number of
1687 metaslabs in the pool).
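As a rough sketch of how the three log block tunables might combine, the
following Python treats \fBzfs_unflushed_log_block_pct\fR as the base limit and
clamps it between \fBzfs_unflushed_log_block_min\fR and
\fBzfs_unflushed_log_block_max\fR. The clamping order and the function name are
assumptions made for illustration; the text only states that at least the
minimum and no more than the maximum number of blocks are allowed.

```python
def log_block_limit(metaslab_count, pct=400, block_min=1000, block_max=262144):
    """Estimate the log spacemap block limit for a pool (illustrative only)."""
    # Percentage of the total number of metaslabs; 400 means 4 blocks per metaslab.
    by_pct = metaslab_count * pct // 100
    # Assumed interaction: never below block_min, never above block_max.
    return max(block_min, min(by_pct, block_max))


for metaslabs in (100, 1000, 100000):
    print(metaslabs, log_block_limit(metaslabs))
```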
1693 \fBzfs_unlink_suspend_progress\fR (uint)
1696 When enabled, files will not be asynchronously removed from the list of pending
1697 unlinks and the space they consume will be leaked. Once this option has been
1698 disabled and the dataset is remounted, the pending unlinks will be processed
1699 and the freed space returned to the pool.
1700 This option is used by the test suite to facilitate testing.
1702 Use \fB0\fR (default) to allow progress and \fB1\fR to pause progress.
1708 \fBzfs_delete_blocks\fR (ulong)
1711 This is used to define a large file for the purposes of delete. Files
1712 containing more than \fBzfs_delete_blocks\fR blocks will be deleted asynchronously
1713 while smaller files are deleted synchronously. Decreasing this value will
1714 reduce the time spent in an unlink(2) system call at the expense of a longer
1715 delay before the freed space is available.
1717 Default value: \fB20,480\fR.
1723 \fBzfs_dirty_data_max\fR (int)
1726 Determines the dirty space limit in bytes. Once this limit is exceeded, new
1727 writes are halted until space frees up. This parameter takes precedence
1728 over \fBzfs_dirty_data_max_percent\fR.
1729 See the section "ZFS TRANSACTION DELAY".
1731 Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
1737 \fBzfs_dirty_data_max_max\fR (int)
1740 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
1741 This limit is only enforced at module load time, and will be ignored if
1742 \fBzfs_dirty_data_max\fR is later changed. This parameter takes
1743 precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
1744 "ZFS TRANSACTION DELAY".
1746 Default value: \fB25\fR% of physical RAM.
1752 \fBzfs_dirty_data_max_max_percent\fR (int)
1755 Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
1756 percentage of physical RAM. This limit is only enforced at module load
1757 time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
1758 The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
1759 one. See the section "ZFS TRANSACTION DELAY".
1761 Default value: \fB25\fR%.
1767 \fBzfs_dirty_data_max_percent\fR (int)
1770 Determines the dirty space limit, expressed as a percentage of all
1771 memory. Once this limit is exceeded, new writes are halted until space frees
1772 up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
1773 one. See the section "ZFS TRANSACTION DELAY".
1775 Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR.
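The default relationship between the dirty data limit and its cap can be
sketched as follows. The function name is hypothetical, the integer rounding is
an assumption, and the sketch ignores any explicitly set byte values, which the
text says take precedence over the percentages.

```python
def default_dirty_data_max(physical_ram_bytes, max_percent=10, max_max_percent=25):
    """Default dirty data limit in bytes, derived only from the percentages."""
    # zfs_dirty_data_max_max_percent: upper bound on the limit.
    cap = physical_ram_bytes * max_max_percent // 100
    # zfs_dirty_data_max_percent: the limit itself, before capping.
    limit = physical_ram_bytes * max_percent // 100
    return min(limit, cap)


# Hypothetical machine with 16 GiB of physical RAM.
print(default_dirty_data_max(16 * 1024**3))
```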
1781 \fBzfs_dirty_data_sync_percent\fR (int)
1784 Start syncing out a transaction group if there's at least this much dirty data
1785 as a percentage of \fBzfs_dirty_data_max\fR. This should be less than
1786 \fBzfs_vdev_async_write_active_min_dirty_percent\fR.
1788 Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR.
1794 \fBzfs_fallocate_reserve_percent\fR (uint)
1797 Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
1798 preallocated for a file in order to guarantee that later writes will not
1799 run out of space. Instead, fallocate() space preallocation only checks
1800 that sufficient space is currently available in the pool or the user's
1801 project quota allocation, and then creates a sparse file of the requested
1802 size. The requested space is multiplied by \fBzfs_fallocate_reserve_percent\fR
1803 to allow additional space for indirect blocks and other internal metadata.
1804 Setting this value to 0 disables support for fallocate(2) and returns
1805 EOPNOTSUPP for fallocate() space preallocation again.
1807 Default value: \fB110\fR%
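The space check described above amounts to scaling the request by the reserve
percentage before comparing it with the available space. A minimal sketch,
assuming integer rounding and using hypothetical function names:

```python
def reserved_bytes(requested_bytes, reserve_percent=110):
    """Space that must be available for a fallocate() request to pass the check."""
    return requested_bytes * reserve_percent // 100


def can_preallocate(requested_bytes, available_bytes, reserve_percent=110):
    """Return False when the check fails or when fallocate support is disabled."""
    if reserve_percent == 0:
        # A value of 0 disables fallocate(2); the caller would see EOPNOTSUPP.
        return False
    return reserved_bytes(requested_bytes, reserve_percent) <= available_bytes


print(reserved_bytes(1000))
print(can_preallocate(1000, 1050))
```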
1813 \fBzfs_fletcher_4_impl\fR (string)
1816 Select a fletcher 4 implementation.
1818 Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
1819 \fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR.
1820 All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
1821 set extensions to be available and will only appear if ZFS detects that they are
1822 present at runtime. If multiple implementations of fletcher 4 are available,
1823 the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
1824 results in the original CPU-based calculation being used. Selecting any option
1825 other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
1826 the respective CPU instruction set being used.
1828 Default value: \fBfastest\fR.
1834 \fBzfs_free_bpobj_enabled\fR (int)
1837 Enable/disable the processing of the free_bpobj object.
1839 Default value: \fB1\fR.
1845 \fBzfs_async_block_max_blocks\fR (ulong)
1848 Maximum number of blocks freed in a single txg.
1850 Default value: \fBULONG_MAX\fR (unlimited).
1856 \fBzfs_max_async_dedup_frees\fR (ulong)
1859 Maximum number of dedup blocks freed in a single txg.
1861 Default value: \fB100,000\fR.
1867 \fBzfs_override_estimate_recordsize\fR (ulong)
1870 Record size calculation override for zfs send estimates.
1872 Default value: \fB0\fR.
1878 \fBzfs_vdev_async_read_max_active\fR (int)
1881 Maximum asynchronous read I/Os active to each device.
1882 See the section "ZFS I/O SCHEDULER".
1884 Default value: \fB3\fR.
1890 \fBzfs_vdev_async_read_min_active\fR (int)
1893 Minimum asynchronous read I/Os active to each device.
1894 See the section "ZFS I/O SCHEDULER".
1896 Default value: \fB1\fR.
1902 \fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
1905 When the pool has more than
1906 \fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
1907 \fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
1908 the dirty data is between min and max, the active I/O limit is linearly
1909 interpolated. See the section "ZFS I/O SCHEDULER".
1911 Default value: \fB60\fR%.
1917 \fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
1920 When the pool has less than
1921 \fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
1922 \fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
1923 the dirty data is between min and max, the active I/O limit is linearly
1924 interpolated. See the section "ZFS I/O SCHEDULER".
1926 Default value: \fB30\fR%.
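The linear interpolation between the two dirty data thresholds can be sketched
as below. The defaults are the ones listed in this page; the function name and
the integer rounding are assumptions, and the sketch is not taken from the ZFS
source.

```python
def async_write_active_limit(dirty_pct, min_active=2, max_active=10,
                             min_dirty_pct=30, max_dirty_pct=60):
    """Active async write limit for a given percentage of dirty data."""
    if dirty_pct <= min_dirty_pct:
        return min_active
    if dirty_pct >= max_dirty_pct:
        return max_active
    # Between the thresholds, scale linearly from min_active to max_active.
    span = max_dirty_pct - min_dirty_pct
    return min_active + (dirty_pct - min_dirty_pct) * (max_active - min_active) // span


for pct in (10, 45, 90):
    print(pct, async_write_active_limit(pct))
```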
1932 \fBzfs_vdev_async_write_max_active\fR (int)
1935 Maximum asynchronous write I/Os active to each device.
1936 See the section "ZFS I/O SCHEDULER".
1938 Default value: \fB10\fR.
1944 \fBzfs_vdev_async_write_min_active\fR (int)
1947 Minimum asynchronous write I/Os active to each device.
1948 See the section "ZFS I/O SCHEDULER".
1950 Lower values are associated with better latency on rotational media but poorer
1951 resilver performance. The default value of 2 was chosen as a compromise. A
1952 value of 3 has been shown to improve resilver performance further at a cost of
1953 further increasing latency.
1955 Default value: \fB2\fR.
1961 \fBzfs_vdev_initializing_max_active\fR (int)
1964 Maximum initializing I/Os active to each device.
1965 See the section "ZFS I/O SCHEDULER".
1967 Default value: \fB1\fR.
1973 \fBzfs_vdev_initializing_min_active\fR (int)
1976 Minimum initializing I/Os active to each device.
1977 See the section "ZFS I/O SCHEDULER".
1979 Default value: \fB1\fR.
1985 \fBzfs_vdev_max_active\fR (int)
1988 The maximum number of I/Os active to each device. Ideally, this will be >=
1989 the sum of each queue's max_active. It must be at least the sum of each
1990 queue's min_active. See the section "ZFS I/O SCHEDULER".
1992 Default value: \fB1,000\fR.
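The two constraints on \fBzfs_vdev_max_active\fR can be checked with a short
script. The table holds the default min_active and max_active values documented
on this page for each queue; the dictionary and function names are hypothetical.

```python
# (min_active, max_active) defaults per I/O queue, as listed in this page.
QUEUE_DEFAULTS = {
    "async_read": (1, 3),
    "async_write": (2, 10),
    "initializing": (1, 1),
    "rebuild": (1, 3),
    "removal": (1, 2),
    "scrub": (1, 2),
    "sync_read": (10, 10),
    "sync_write": (10, 10),
    "trim": (1, 2),
}


def check_max_active(vdev_max_active, queues=QUEUE_DEFAULTS):
    """Compare zfs_vdev_max_active with the sums of the per-queue limits."""
    min_sum = sum(lo for lo, _ in queues.values())
    max_sum = sum(hi for _, hi in queues.values())
    return {
        "min_sum": min_sum,
        "max_sum": max_sum,
        "valid": vdev_max_active >= min_sum,   # required
        "ideal": vdev_max_active >= max_sum,   # recommended
    }


print(check_max_active(1000))
```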
1998 \fBzfs_vdev_rebuild_max_active\fR (int)
2001 Maximum sequential resilver I/Os active to each device.
2002 See the section "ZFS I/O SCHEDULER".
2004 Default value: \fB3\fR.
2010 \fBzfs_vdev_rebuild_min_active\fR (int)
2013 Minimum sequential resilver I/Os active to each device.
2014 See the section "ZFS I/O SCHEDULER".
2016 Default value: \fB1\fR.
2022 \fBzfs_vdev_removal_max_active\fR (int)
2025 Maximum removal I/Os active to each device.
2026 See the section "ZFS I/O SCHEDULER".
2028 Default value: \fB2\fR.
2034 \fBzfs_vdev_removal_min_active\fR (int)
2037 Minimum removal I/Os active to each device.
2038 See the section "ZFS I/O SCHEDULER".
2040 Default value: \fB1\fR.
2046 \fBzfs_vdev_scrub_max_active\fR (int)
2049 Maximum scrub I/Os active to each device.
2050 See the section "ZFS I/O SCHEDULER".
2052 Default value: \fB2\fR.
2058 \fBzfs_vdev_scrub_min_active\fR (int)
2061 Minimum scrub I/Os active to each device.
2062 See the section "ZFS I/O SCHEDULER".
2064 Default value: \fB1\fR.
2070 \fBzfs_vdev_sync_read_max_active\fR (int)
2073 Maximum synchronous read I/Os active to each device.
2074 See the section "ZFS I/O SCHEDULER".
2076 Default value: \fB10\fR.
2082 \fBzfs_vdev_sync_read_min_active\fR (int)
2085 Minimum synchronous read I/Os active to each device.
2086 See the section "ZFS I/O SCHEDULER".
2088 Default value: \fB10\fR.
2094 \fBzfs_vdev_sync_write_max_active\fR (int)
2097 Maximum synchronous write I/Os active to each device.
2098 See the section "ZFS I/O SCHEDULER".
2100 Default value: \fB10\fR.
2106 \fBzfs_vdev_sync_write_min_active\fR (int)
2109 Minimum synchronous write I/Os active to each device.
2110 See the section "ZFS I/O SCHEDULER".
2112 Default value: \fB10\fR.
2118 \fBzfs_vdev_trim_max_active\fR (int)
2121 Maximum trim/discard I/Os active to each device.
2122 See the section "ZFS I/O SCHEDULER".
2124 Default value: \fB2\fR.
2130 \fBzfs_vdev_trim_min_active\fR (int)
2133 Minimum trim/discard I/Os active to each device.
2134 See the section "ZFS I/O SCHEDULER".
2136 Default value: \fB1\fR.
2142 \fBzfs_vdev_queue_depth_pct\fR (int)
2145 Maximum number of queued allocations per top-level vdev expressed as
2146 a percentage of \fBzfs_vdev_async_write_max_active\fR which allows the
2147 system to detect devices that are more capable of handling allocations
2148 and to allocate more blocks to those devices. It allows for dynamic
2149 allocation distribution when devices are imbalanced as fuller devices
2150 will tend to be slower than empty devices.
2152 See also \fBzio_dva_throttle_enabled\fR.
2154 Default value: \fB1000\fR%.
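Read literally, the percentage translates into a per-vdev allocation queue depth
as in this sketch. The function name is hypothetical, and the real allocator may
apply further adjustments that are not described here.

```python
def max_queued_allocations(queue_depth_pct=1000, async_write_max_active=10):
    """Queued allocations per top-level vdev implied by the percentage."""
    return async_write_max_active * queue_depth_pct // 100


print(max_queued_allocations())
```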
2160 \fBzfs_expire_snapshot\fR (int)
2163 Seconds to expire .zfs/snapshot
2165 Default value: \fB300\fR.
2171 \fBzfs_admin_snapshot\fR (int)
2174 Allow the creation, removal, or renaming of entries in the .zfs/snapshot
2175 directory to cause the creation, destruction, or renaming of snapshots.
2176 When enabled this functionality works both locally and over NFS exports
2177 which have the 'no_root_squash' option set. This functionality is disabled
2180 Use \fB1\fR for yes and \fB0\fR for no (default).
2186 \fBzfs_flags\fR (int)
2189 Set additional debugging flags. The following flags may be bitwise-or'd
2201 Enable dprintf entries in the debug log.
2203 2 ZFS_DEBUG_DBUF_VERIFY *
2204 Enable extra dbuf verifications.
2206 4 ZFS_DEBUG_DNODE_VERIFY *
2207 Enable extra dnode verifications.
2209 8 ZFS_DEBUG_SNAPNAMES
2210 Enable snapshot name verification.
2213 Check for illegally modified ARC buffers.
2215 64 ZFS_DEBUG_ZIO_FREE
2216 Enable verification of block frees.
2218 128 ZFS_DEBUG_HISTOGRAM_VERIFY
2219 Enable extra spacemap histogram verifications.
2221 256 ZFS_DEBUG_METASLAB_VERIFY
2222 Verify space accounting on disk matches in-core range_trees.
2224 512 ZFS_DEBUG_SET_ERROR
2225 Enable SET_ERROR and dprintf entries in the debug log.
2227 1024 ZFS_DEBUG_INDIRECT_REMAP
2228 Verify split blocks created by device removal.
2231 Verify TRIM ranges are always within the allocatable range tree.
2233 4096 ZFS_DEBUG_LOG_SPACEMAP
2234 Verify that the log summary is consistent with the spacemap log
2235 and enable zfs_dbgmsgs for metaslab loading and flushing.
2238 * Requires debug build.
2240 Default value: \fB0\fR.
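Because the flags are combined with a bitwise OR, a value for \fBzfs_flags\fR
can be built or decoded as in this sketch. Only flags whose value and name both
appear in the list above are included; the helper names are hypothetical.

```python
# Flag names and values as they appear in the list above.
ZFS_DEBUG_FLAGS = {
    "ZFS_DEBUG_DBUF_VERIFY": 2,
    "ZFS_DEBUG_DNODE_VERIFY": 4,
    "ZFS_DEBUG_SNAPNAMES": 8,
    "ZFS_DEBUG_ZIO_FREE": 64,
    "ZFS_DEBUG_HISTOGRAM_VERIFY": 128,
    "ZFS_DEBUG_METASLAB_VERIFY": 256,
    "ZFS_DEBUG_SET_ERROR": 512,
    "ZFS_DEBUG_INDIRECT_REMAP": 1024,
    "ZFS_DEBUG_LOG_SPACEMAP": 4096,
}


def combine_flags(*names):
    """Bitwise-OR the named flags into a single zfs_flags value."""
    value = 0
    for name in names:
        value |= ZFS_DEBUG_FLAGS[name]
    return value


def decode_flags(value):
    """List the known flag names that are set in a zfs_flags value."""
    return [name for name, bit in ZFS_DEBUG_FLAGS.items() if value & bit]


flags = combine_flags("ZFS_DEBUG_SNAPNAMES", "ZFS_DEBUG_SET_ERROR")
print(flags, decode_flags(flags))
```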
2246 \fBzfs_free_leak_on_eio\fR (int)
2249 If destroy encounters an EIO while reading metadata (e.g. indirect
2250 blocks), space referenced by the missing metadata can not be freed.
2251 Normally this causes the background destroy to become "stalled", as
2252 it is unable to make forward progress. While in this stalled state,
2253 all remaining space to free from the error-encountering filesystem is
2254 "temporarily leaked". Set this flag to cause it to ignore the EIO,
2255 permanently leak the space from indirect blocks that can not be read,
2256 and continue to free everything else that it can.
2258 The default, "stalling" behavior is useful if the storage partially
2259 fails (i.e. some but not all i/os fail), and then later recovers. In
2260 this case, we will be able to continue pool operations while it is
2261 partially failed, and when it recovers, we can continue to free the
2262 space, with no leaks. However, note that this case is actually
2265 Typically pools either (a) fail completely (but perhaps temporarily,
2266 e.g. a top-level vdev going offline), or (b) have localized,
2267 permanent errors (e.g. disk returns the wrong data due to bit flip or
2268 firmware bug). In case (a), this setting does not matter because the
2269 pool will be suspended and the sync thread will not be able to make
2270 forward progress regardless. In case (b), because the error is
2271 permanent, the best we can do is leak the minimum amount of space,
2272 which is what setting this flag will do. Therefore, it is reasonable
2273 for this flag to normally be set, but we chose the more conservative
2274 approach of not setting it, so that there is no possibility of
2275 leaking space in the "partial temporary" failure case.
2277 Default value: \fB0\fR.
2283 \fBzfs_free_min_time_ms\fR (int)
2286 During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
2287 of this much time will be spent working on freeing blocks per txg.
2289 Default value: \fB1,000\fR.
2295 \fBzfs_obsolete_min_time_ms\fR (int)
2298 Similar to \fBzfs_free_min_time_ms\fR but for cleanup of old indirection records
2301 Default value: \fB500\fR.
2307 \fBzfs_immediate_write_sz\fR (long)
2310 Largest data block to write to zil. Larger blocks will be treated as if the
2311 dataset being written to had the property setting \fBlogbias=throughput\fR.
2313 Default value: \fB32,768\fR.
2319 \fBzfs_initialize_value\fR (ulong)
2322 Pattern written to vdev free space by \fBzpool initialize\fR.
2324 Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee).
2330 \fBzfs_initialize_chunk_size\fR (ulong)
2333 Size of writes used by \fBzpool initialize\fR.
2334 This option is used by the test suite to facilitate testing.
2336 Default value: \fB1,048,576\fR
2342 \fBzfs_livelist_max_entries\fR (ulong)
2345 The threshold size (in block pointers) at which we create a new sub-livelist.
2346 Larger sublists are more costly from a memory perspective but the fewer
2347 sublists there are, the lower the cost of insertion.
2349 Default value: \fB500,000\fR.
2355 \fBzfs_livelist_min_percent_shared\fR (int)
2358 If the amount of shared space between a snapshot and its clone drops below
2359 this threshold, the clone turns off the livelist and reverts to the old deletion
2360 method. This is in place because once a clone has been overwritten enough
2361 livelists no longer give us a benefit.
2363 Default value: \fB75\fR.
2369 \fBzfs_livelist_condense_new_alloc\fR (int)
2372 Incremented each time an extra ALLOC blkptr is added to a livelist entry while
2373 it is being condensed.
2374 This option is used by the test suite to track race conditions.
2376 Default value: \fB0\fR.
2382 \fBzfs_livelist_condense_sync_cancel\fR (int)
2385 Incremented each time livelist condensing is canceled while in
2386 spa_livelist_condense_sync.
2387 This option is used by the test suite to track race conditions.
2389 Default value: \fB0\fR.
2395 \fBzfs_livelist_condense_sync_pause\fR (int)
2398 When set, the livelist condense process pauses indefinitely before
2399 executing the synctask - spa_livelist_condense_sync.
2400 This option is used by the test suite to trigger race conditions.
2402 Default value: \fB0\fR.
2408 \fBzfs_livelist_condense_zthr_cancel\fR (int)
2411 Incremented each time livelist condensing is canceled while in
2412 spa_livelist_condense_cb.
2413 This option is used by the test suite to track race conditions.
2415 Default value: \fB0\fR.
2421 \fBzfs_livelist_condense_zthr_pause\fR (int)
2424 When set, the livelist condense process pauses indefinitely before
2425 executing the open context condensing work in spa_livelist_condense_cb.
2426 This option is used by the test suite to trigger race conditions.
2428 Default value: \fB0\fR.
2434 \fBzfs_lua_max_instrlimit\fR (ulong)
2437 The maximum execution time limit that can be set for a ZFS channel program,
2438 specified as a number of Lua instructions.
2440 Default value: \fB100,000,000\fR.
2446 \fBzfs_lua_max_memlimit\fR (ulong)
2449 The maximum memory limit that can be set for a ZFS channel program, specified
2452 Default value: \fB104,857,600\fR.
2458 \fBzfs_max_dataset_nesting\fR (int)
2461 The maximum depth of nested datasets. This value can be tuned temporarily to
2462 fix existing datasets that exceed the predefined limit.
2464 Default value: \fB50\fR.
2470 \fBzfs_max_log_walking\fR (ulong)
2473 The number of past TXGs that the flushing algorithm of the log spacemap
2474 feature uses to estimate incoming log blocks.
2476 Default value: \fB5\fR.
2482 \fBzfs_max_logsm_summary_length\fR (ulong)
2485 Maximum number of rows allowed in the summary of the spacemap log.
2487 Default value: \fB10\fR.
2493 \fBzfs_max_recordsize\fR (int)
2496 We currently support block sizes from 512 bytes to 16MB. The benefits of
2497 larger blocks, and thus larger I/O, need to be weighed against the cost of
2498 COWing a giant block to modify one byte. Additionally, very large blocks
2499 can have an impact on i/o latency, and also potentially on the memory
2500 allocator. Therefore, we do not allow the recordsize to be set larger than
2501 zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
2502 this tunable, and pools with larger blocks can always be imported and used,
2503 regardless of this setting.
2505 Default value: \fB1,048,576\fR.
2511 \fBzfs_allow_redacted_dataset_mount\fR (int)
2514 Allow datasets received with redacted send/receive to be mounted. Normally
2515 disabled because these datasets may be missing key data.
2517 Default value: \fB0\fR.
2523 \fBzfs_min_metaslabs_to_flush\fR (ulong)
2526 Minimum number of metaslabs to flush per dirty TXG
2528 Default value: \fB1\fR.
2534 \fBzfs_metaslab_fragmentation_threshold\fR (int)
2537 Allow metaslabs to keep their active state as long as their fragmentation
2538 percentage is less than or equal to this value. An active metaslab that
2539 exceeds this threshold will no longer keep its active status allowing
2540 better metaslabs to be selected.
2542 Default value: \fB70\fR.
2548 \fBzfs_mg_fragmentation_threshold\fR (int)
2551 Metaslab groups are considered eligible for allocations if their
2552 fragmentation metric (measured as a percentage) is less than or equal to
2553 this value. If a metaslab group exceeds this threshold then it will be
2554 skipped unless all metaslab groups within the metaslab class have also
2555 crossed this threshold.
2557 Default value: \fB95\fR.
2563 \fBzfs_mg_noalloc_threshold\fR (int)
2566 Defines a threshold at which metaslab groups should be eligible for
2567 allocations. The value is expressed as a percentage of free space
2568 beyond which a metaslab group is always eligible for allocations.
2569 If a metaslab group's free space is less than or equal to the
2570 threshold, the allocator will avoid allocating to that group
2571 unless all groups in the pool have reached the threshold. Once all
2572 groups have reached the threshold, all groups are allowed to accept
2573 allocations. The default value of 0 disables the feature and causes
2574 all metaslab groups to be eligible for allocations.
2576 This parameter allows one to deal with pools having heavily imbalanced
2577 vdevs such as would be the case when a new vdev has been added.
2578 Setting the threshold to a non-zero percentage will stop allocations
2579 from being made to vdevs that aren't filled to the specified percentage
2580 and allow lesser filled vdevs to acquire more allocations than they
2581 otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
2583 Default value: \fB0\fR.
2589 \fBzfs_ddt_data_is_special\fR (int)
2592 If enabled, ZFS will place DDT data into the special allocation class.
2594 Default value: \fB1\fR.
2600 \fBzfs_user_indirect_is_special\fR (int)
2603 If enabled, ZFS will place user data (both file and zvol) indirect blocks
2604 into the special allocation class.
2606 Default value: \fB1\fR.
2612 \fBzfs_multihost_history\fR (int)
2615 Historical statistics for the last N multihost updates will be available in
2616 \fB/proc/spl/kstat/zfs/<pool>/multihost\fR
2618 Default value: \fB0\fR.
2624 \fBzfs_multihost_interval\fR (ulong)
2627 Used to control the frequency of multihost writes which are performed when the
2628 \fBmultihost\fR pool property is on. This is one factor used to determine the
2629 length of the activity check during import.
2631 The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR
2632 milliseconds. On average a multihost write will be issued for each leaf vdev
2633 every \fBzfs_multihost_interval\fR milliseconds. In practice, the observed
2634 period can vary with the I/O load and this observed value is the delay which is
2635 stored in the uberblock.
2637 Default value: \fB1000\fR.
2643 \fBzfs_multihost_import_intervals\fR (uint)
2646 Used to control the duration of the activity test on import. Smaller values of
2647 \fBzfs_multihost_import_intervals\fR will reduce the import time but increase
2648 the risk of failing to detect an active pool. The total activity check time is
2649 never allowed to drop below one second.
2651 On import the activity check waits a minimum amount of time determined by
2652 \fBzfs_multihost_interval * zfs_multihost_import_intervals\fR, or the same
2653 product computed on the host which last had the pool imported (whichever is
2654 greater). The activity check time may be further extended if the value of mmp
2655 delay found in the best uberblock indicates actual multihost updates happened
2656 at longer intervals than \fBzfs_multihost_interval\fR. A minimum value of
2657 \fB100ms\fR is enforced.
2659 A value of 0 is ignored and treated as if it was set to 1.
2661 Default value: \fB20\fR.
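The minimum activity check time can be approximated as follows. This is a
simplified sketch under stated assumptions: the function and parameter names
are hypothetical, the remote host's product is passed in directly, and neither
the extension based on the mmp delay in the uberblock nor the enforced 100ms
minimum is modeled.

```python
def min_activity_check_ms(interval_ms=1000, import_intervals=20,
                          remote_product_ms=0):
    """Lower bound on the import activity check duration, in milliseconds."""
    # A value of 0 for zfs_multihost_import_intervals is treated as 1.
    intervals = max(import_intervals, 1)
    local_product_ms = interval_ms * intervals
    # Use the larger of the local and remote products, never below one second.
    return max(local_product_ms, remote_product_ms, 1000)


print(min_activity_check_ms())
```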
2667 \fBzfs_multihost_fail_intervals\fR (uint)
2670 Controls the behavior of the pool when multihost write failures or delays are
2673 When \fBzfs_multihost_fail_intervals = 0\fR, multihost write failures or delays
2674 are ignored. The failures will still be reported to the ZED which depending on
2675 its configuration may take action such as suspending the pool or offlining a
2679 When \fBzfs_multihost_fail_intervals > 0\fR, the pool will be suspended if
2680 \fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds pass
2681 without a successful mmp write. This guarantees the activity test will see
2682 mmp writes if the pool is imported. A value of 1 is ignored and treated as
2683 if it was set to 2. This is necessary to prevent the pool from being suspended
2684 due to normal, small I/O latency variations.
2687 Default value: \fB10\fR.
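The suspension timeout implied by the two multihost tunables can be sketched as
below. The function name is hypothetical, and returning None to mean that
failures are ignored is a choice made for this example only.

```python
def suspend_timeout_ms(fail_intervals=10, interval_ms=1000):
    """Milliseconds without a successful mmp write before the pool is suspended."""
    if fail_intervals == 0:
        # Write failures and delays are ignored; the pool is not suspended.
        return None
    # A value of 1 is treated as 2 to tolerate small I/O latency variations.
    effective = max(fail_intervals, 2)
    return effective * interval_ms


print(suspend_timeout_ms())
```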
2693 \fBzfs_no_scrub_io\fR (int)
2696 Set for no scrub I/O. This results in scrubs not actually scrubbing data and
2697 simply doing a metadata crawl of the pool instead.
2699 Use \fB1\fR for yes and \fB0\fR for no (default).
2705 \fBzfs_no_scrub_prefetch\fR (int)
2708 Set to disable block prefetching for scrubs.
2710 Use \fB1\fR for yes and \fB0\fR for no (default).
2716 \fBzfs_nocacheflush\fR (int)
2719 Disable cache flush operations on disks when writing. Setting this will
2720 cause pool corruption on power loss if a volatile out-of-order write cache
2723 Use \fB1\fR for yes and \fB0\fR for no (default).
2729 \fBzfs_nopwrite_enabled\fR (int)
2734 Use \fB1\fR for yes (default) and \fB0\fR to disable.
2740 \fBzfs_dmu_offset_next_sync\fR (int)
2743 Enable forcing txg sync to find holes. When enabled forces ZFS to act
2744 like prior versions when SEEK_HOLE or SEEK_DATA flags are used, which
2745 when a dnode is dirty causes txg's to be synced so that this data can be
2748 Use \fB1\fR for yes and \fB0\fR to disable (default).
2754 \fBzfs_pd_bytes_max\fR (int)
2757 The number of bytes which should be prefetched during a pool traversal
2758 (e.g. \fBzfs send\fR or other data crawling operations).
2760 Default value: \fB52,428,800\fR.
2766 \fBzfs_per_txg_dirty_frees_percent\fR (ulong)
2769 Tunable to control percentage of dirtied indirect blocks from frees allowed
2770 into one TXG. After this threshold is crossed, additional frees will wait until
2772 A value of zero will disable this throttle.
2774 Default value: \fB5\fR, set to \fB0\fR to disable.
2780 \fBzfs_prefetch_disable\fR (int)
2783 This tunable disables predictive prefetch. Note that it leaves "prescient"
2784 prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
2785 prescient prefetch never issues i/os that end up not being needed, so it
2786 can't hurt performance.
2788 Use \fB1\fR for yes and \fB0\fR for no (default).
2794 \fBzfs_qat_checksum_disable\fR (int)
2797 This tunable disables qat hardware acceleration for sha256 checksums. It
2798 may be set after the zfs modules have been loaded to initialize the qat
2799 hardware as long as support is compiled in and the qat driver is present.
2801 Use \fB1\fR for yes and \fB0\fR for no (default).
2807 \fBzfs_qat_compress_disable\fR (int)
2810 This tunable disables qat hardware acceleration for gzip compression. It
2811 may be set after the zfs modules have been loaded to initialize the qat
2812 hardware as long as support is compiled in and the qat driver is present.
2814 Use \fB1\fR for yes and \fB0\fR for no (default).
2820 \fBzfs_qat_encrypt_disable\fR (int)
2823 This tunable disables qat hardware acceleration for AES-GCM encryption. It
2824 may be set after the zfs modules have been loaded to initialize the qat
2825 hardware as long as support is compiled in and the qat driver is present.
2827 Use \fB1\fR for yes and \fB0\fR for no (default).
2833 \fBzfs_read_chunk_size\fR (long)
2836 Bytes to read per chunk
2838 Default value: \fB1,048,576\fR.
2844 \fBzfs_read_history\fR (int)
2847 Historical statistics for the last N reads will be available in
2848 \fB/proc/spl/kstat/zfs/<pool>/reads\fR
2850 Default value: \fB0\fR (no data is kept).
2856 \fBzfs_read_history_hits\fR (int)
2859 Include cache hits in read history
2861 Use \fB1\fR for yes and \fB0\fR for no (default).
2867 \fBzfs_rebuild_max_segment\fR (ulong)
2870 Maximum read segment size to issue when sequentially resilvering a
2873 Default value: \fB1,048,576\fR.
2879 \fBzfs_reconstruct_indirect_combinations_max\fR (int)
2882 If an indirect split block contains more than this many possible unique
2883 combinations when being reconstructed, consider it too computationally
2884 expensive to check them all. Instead, try at most
2885 \fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
2886 combinations each time the block is accessed. This allows all segment
2887 copies to participate fairly in the reconstruction when all combinations
2888 cannot be checked and prevents repeated use of one bad copy.
2890 Default value: \fB4096\fR.
2896 \fBzfs_recover\fR (int)
2899 Set to attempt to recover from fatal errors. This should only be used as a
2900 last resort, as it typically results in leaked space, or worse.
2902 Use \fB1\fR for yes and \fB0\fR for no (default).
2908 \fBzfs_removal_ignore_errors\fR (int)
2912 Ignore hard IO errors during device removal. When set, if a device encounters
2913 a hard IO error during the removal process the removal will not be cancelled.
2914 This can result in a normally recoverable block becoming permanently damaged
2915 and is not recommended. This should only be used as a last resort when the
2916 pool cannot be returned to a healthy state prior to removing the device.
2918 Default value: \fB0\fR.
2924 \fBzfs_removal_suspend_progress\fR (int)
2928 This is used by the test suite so that it can ensure that certain actions
2929 happen while in the middle of a removal.
2931 Default value: \fB0\fR.
2937 \fBzfs_remove_max_segment\fR (int)
2941 The largest contiguous segment that we will attempt to allocate when removing
2942 a device. This can be no larger than 16MB. If there is a performance
2943 problem with attempting to allocate large blocks, consider decreasing this.
2945 Default value: \fB16,777,216\fR (16MB).
2951 \fBzfs_resilver_disable_defer\fR (int)
2954 Disables the \fBresilver_defer\fR feature, causing an operation that would
2955 start a resilver to restart one in progress immediately.
2957 Default value: \fB0\fR (feature enabled).
2963 \fBzfs_resilver_min_time_ms\fR (int)
2966 Resilvers are processed by the sync thread. While resilvering it will spend
2967 at least this much time working on a resilver between txg flushes.
2969 Default value: \fB3,000\fR.
2975 \fBzfs_scan_ignore_errors\fR (int)
2978 If set to a nonzero value, remove the DTL (dirty time list) upon
2979 completion of a pool scan (scrub) even if there were unrepairable
2980 errors. It is intended to be used during pool repair or recovery to
2981 stop resilvering when the pool is next imported.
2983 Default value: \fB0\fR.
2989 \fBzfs_scrub_min_time_ms\fR (int)
2992 Scrubs are processed by the sync thread. While scrubbing it will spend
2993 at least this much time working on a scrub between txg flushes.
2995 Default value: \fB1,000\fR.
3001 \fBzfs_scan_checkpoint_intval\fR (int)
3004 To preserve progress across reboots the sequential scan algorithm periodically
3005 needs to stop metadata scanning and issue all the verification I/Os to disk.
3006 The frequency of this flushing is determined by the
3007 \fBzfs_scan_checkpoint_intval\fR tunable.
3009 Default value: \fB7200\fR seconds (every 2 hours).
3015 \fBzfs_scan_fill_weight\fR (int)
3018 This tunable affects how scrub and resilver I/O segments are ordered. A higher
3019 number indicates that we care more about how filled in a segment is, while a
3020 lower number indicates we care more about the size of the extent without
3021 considering the gaps within a segment. This value is only tunable upon module
3022 insertion. Changing the value afterwards will have no effect on scrub or
3023 resilver performance.
3025 Default value: \fB3\fR.
3031 \fBzfs_scan_issue_strategy\fR (int)
3034 Determines the order that data will be verified while scrubbing or resilvering.
3035 If set to \fB1\fR, data will be verified as sequentially as possible, given the
3036 amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
3037 may improve scrub performance if the pool's data is very fragmented. If set to
3038 \fB2\fR, the largest mostly-contiguous chunk of found data will be verified
3039 first. By deferring scrubbing of small segments, we may later find adjacent data
3040 to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
3041 strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
3044 Default value: \fB0\fR.
3050 \fBzfs_scan_legacy\fR (int)
3053 A value of 0 indicates that scrubs and resilvers will gather metadata in
3054 memory before issuing sequential I/O. A value of 1 indicates that the legacy
3055 algorithm will be used where I/O is initiated as soon as it is discovered.
3056 Changing this value to 0 will not affect scrubs or resilvers that are already in progress.
3059 Default value: \fB0\fR.
3065 \fBzfs_scan_max_ext_gap\fR (int)
3068 Indicates the largest gap in bytes between scrub / resilver I/Os that will still
3069 be considered sequential for sorting purposes. Changing this value will not
3070 affect scrubs or resilvers that are already in progress.
3072 Default value: \fB2097152 (2 MB)\fR.
3078 \fBzfs_scan_mem_lim_fact\fR (int)
3081 Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
3082 This tunable determines the hard limit for I/O sorting memory usage.
3083 When the hard limit is reached we stop scanning metadata and start issuing
3084 data verification I/O. This is done until we get below the soft limit.
3086 Default value: \fB20\fR which is 5% of RAM (1/20).
3092 \fBzfs_scan_mem_lim_soft_fact\fR (int)
3095 The fraction of the hard limit used to determine the soft limit for I/O sorting
3096 by the sequential scan algorithm. When we cross this limit from below no action
3097 is taken. When we cross this limit from above it is because we are issuing
3098 verification I/O. In this case (unless the metadata scan is done) we stop
3099 issuing verification I/O and start scanning metadata again until we get to the hard limit.
3102 Default value: \fB20\fR which is 5% of the hard limit (1/20).
3108 \fBzfs_scan_strict_mem_lim\fR (int)
3111 Enforces tight memory limits on pool scans when a sequential scan is in
3112 progress. When disabled the memory limit may be exceeded by fast disks.
3114 Default value: \fB0\fR.
3120 \fBzfs_scan_suspend_progress\fR (int)
3123 Freezes a scrub/resilver in progress without actually pausing it. Intended for
3126 Default value: \fB0\fR.
3133 \fBzfs_scan_vdev_limit\fR (int)
3136 Maximum amount of data, in bytes, that can be issued concurrently for scrubs
3137 and resilvers per leaf device.
3139 Default value: \fB41943040\fR.
3145 \fBzfs_send_corrupt_data\fR (int)
3148 Allow sending of corrupt data (ignore read/checksum errors when sending data)
3150 Use \fB1\fR for yes and \fB0\fR for no (default).
3156 \fBzfs_send_unmodified_spill_blocks\fR (int)
3159 Include unmodified spill blocks in the send stream. Under certain circumstances
3160 previous versions of ZFS could incorrectly remove the spill block from an
3161 existing object. Including unmodified copies of the spill blocks creates a
3162 backwards compatible stream which will recreate a spill block if it was
3163 incorrectly removed.
3165 Use \fB1\fR for yes (default) and \fB0\fR for no.
3171 \fBzfs_send_no_prefetch_queue_ff\fR (int)
3174 The fill fraction of the \fBzfs send\fR internal queues. The fill fraction
3175 controls the timing with which internal threads are woken up.
3177 Default value: \fB20\fR.
3183 \fBzfs_send_no_prefetch_queue_length\fR (int)
3186 The maximum number of bytes allowed in \fBzfs send\fR's internal queues.
3188 Default value: \fB1,048,576\fR.
3194 \fBzfs_send_queue_ff\fR (int)
3197 The fill fraction of the \fBzfs send\fR prefetch queue. The fill fraction
3198 controls the timing with which internal threads are woken up.
3200 Default value: \fB20\fR.
3206 \fBzfs_send_queue_length\fR (int)
3209 The maximum number of bytes allowed that will be prefetched by \fBzfs send\fR.
3210 This value must be at least twice the maximum block size in use.
3212 Default value: \fB16,777,216\fR.
3218 \fBzfs_recv_queue_ff\fR (int)
3221 The fill fraction of the \fBzfs receive\fR queue. The fill fraction
3222 controls the timing with which internal threads are woken up.
3224 Default value: \fB20\fR.
3230 \fBzfs_recv_queue_length\fR (int)
3233 The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
3234 must be at least twice the maximum block size in use.
3236 Default value: \fB16,777,216\fR.
3242 \fBzfs_recv_write_batch_size\fR (int)
3245 The maximum amount of data (in bytes) that \fBzfs receive\fR will write in
3246 one DMU transaction. This is the uncompressed size, even when receiving a
3247 compressed send stream. This setting will not reduce the write size below
3248 a single block. Capped at a maximum of 32MB.
3250 Default value: \fB1MB\fR.
3256 \fBzfs_override_estimate_recordsize\fR (ulong)
3259 Setting this variable overrides the default logic for estimating block
3260 sizes when doing a zfs send. The default heuristic is that the average
3261 block size will be the current recordsize. Override this value if most data
3262 in your dataset is not of that size and you require accurate zfs send size estimates.
3265 Default value: \fB0\fR.
3271 \fBzfs_sync_pass_deferred_free\fR (int)
3274 Flushing of data to disk is done in passes. Defer frees starting in this pass
3276 Default value: \fB2\fR.
3282 \fBzfs_spa_discard_memory_limit\fR (int)
3285 Maximum memory used for prefetching a checkpoint's space map on each
3286 vdev while discarding the checkpoint.
3288 Default value: \fB16,777,216\fR.
3294 \fBzfs_special_class_metadata_reserve_pct\fR (int)
3297 Only allow small data blocks to be allocated on the special and dedup vdev
3298 types when the available free space percentage on these vdevs exceeds this
3299 value. This ensures reserved space is available for pool metadata as the
3300 special vdevs approach capacity.
3302 Default value: \fB25\fR.
3308 \fBzfs_sync_pass_dont_compress\fR (int)
3311 Starting in this sync pass, we disable compression (including of metadata).
3312 With the default setting, in practice, we don't have this many sync passes,
3313 so this has no effect.
3315 The original intent was that disabling compression would help the sync passes
3316 to converge. However, in practice disabling compression increases the average
3317 number of sync passes, because when we turn compression off, many blocks'
3318 sizes will change and thus we have to re-allocate (not overwrite) them. It
3319 also increases the number of 128KB allocations (e.g. for indirect blocks and
3320 spacemaps) because these will not be compressed. The 128K allocations are
3321 especially detrimental to performance on highly fragmented systems, which may
3322 have very few free segments of this size, and may need to load new metaslabs
3323 to satisfy 128K allocations.
3325 Default value: \fB8\fR.
3331 \fBzfs_sync_pass_rewrite\fR (int)
3334 Rewrite new block pointers starting in this pass
3336 Default value: \fB2\fR.
3342 \fBzfs_sync_taskq_batch_pct\fR (int)
3345 This controls the number of threads used by the dp_sync_taskq. The default
3346 value of 75% will create a maximum of one thread per cpu.
3348 Default value: \fB75\fR%.
3354 \fBzfs_trim_extent_bytes_max\fR (uint)
3357 Maximum size of TRIM command. Ranges larger than this will be split into
3358 chunks no larger than \fBzfs_trim_extent_bytes_max\fR bytes before being
3359 issued to the device.
3361 Default value: \fB134,217,728\fR.
3367 \fBzfs_trim_extent_bytes_min\fR (uint)
3370 Minimum size of TRIM commands. TRIM ranges smaller than this will be skipped
3371 unless they're part of a larger range which was broken into chunks. This is
3372 done because it's common for these small TRIMs to negatively impact overall
3373 performance. This value can be set to 0 to TRIM all unallocated space.
3375 Default value: \fB32,768\fR.
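As a rough illustration of how the two TRIM extent limits interact, the following Python sketch splits a free range into chunks. The function name and the (offset, length) representation are hypothetical and are not taken from the ZFS source; only the two default values come from the descriptions above.

```python
def trim_chunks(start, length,
                extent_max=134_217_728,   # zfs_trim_extent_bytes_max default
                extent_min=32_768):       # zfs_trim_extent_bytes_min default
    """Split one free range into (offset, size) TRIM chunks.

    Hypothetical helper for illustration only.
    """
    # A range smaller than the minimum is skipped entirely.
    if length < extent_min:
        return []
    chunks = []
    end = start + length
    while start < end:
        # No chunk may exceed the maximum extent size.
        size = min(extent_max, end - start)
        chunks.append((start, size))
        start += size
    return chunks


MiB = 2 ** 20
for offset, size in trim_chunks(0, 300 * MiB):
    print(offset // MiB, size // MiB)
```

Note that the final chunk of a large range is kept even if it is smaller than the minimum, since it is part of a larger range that was broken into chunks.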
3381 \fBzfs_trim_metaslab_skip\fR (uint)
3384 Skip uninitialized metaslabs during the TRIM process. This option is useful
3385 for pools constructed from large thinly-provisioned devices where TRIM
3386 operations are slow. As a pool ages, an increasing fraction of the pool's
3387 metaslabs will be initialized, progressively degrading the usefulness of
3388 this option. This setting is stored when starting a manual TRIM and will
3389 persist for the duration of the requested TRIM.
3391 Default value: \fB0\fR.
3397 \fBzfs_trim_queue_limit\fR (uint)
3400 Maximum number of queued TRIMs outstanding per leaf vdev. The number of
3401 concurrent TRIM commands issued to the device is controlled by the
3402 \fBzfs_vdev_trim_min_active\fR and \fBzfs_vdev_trim_max_active\fR module options.
3405 Default value: \fB10\fR.
3411 \fBzfs_trim_txg_batch\fR (uint)
3414 The number of transaction groups worth of frees which should be aggregated
3415 before TRIM operations are issued to the device. This setting represents a
3416 trade-off between issuing larger, more efficient TRIM operations and the
3417 delay before the recently trimmed space is available for use by the device.
3419 Increasing this value will allow frees to be aggregated for a longer time.
3420 This will result in larger TRIM operations and potentially increased memory
3421 usage. Decreasing this value will have the opposite effect. The default
3422 value of 32 was determined to be a reasonable compromise.
3424 Default value: \fB32\fR.
3430 \fBzfs_txg_history\fR (int)
3433 Historical statistics for the last N txgs will be available in
3434 \fB/proc/spl/kstat/zfs/<pool>/txgs\fR
3436 Default value: \fB0\fR.
3442 \fBzfs_txg_timeout\fR (int)
3445 Flush dirty data to disk at least every N seconds (maximum txg duration)
3447 Default value: \fB5\fR.
3453 \fBzfs_vdev_aggregate_trim\fR (int)
3456 Allow TRIM I/Os to be aggregated. This is normally not helpful because
3457 the extents to be trimmed will have already been aggregated by the
3458 metaslab. This option is provided for debugging and performance analysis.
3460 Default value: \fB0\fR.
3466 \fBzfs_vdev_aggregation_limit\fR (int)
3469 Max vdev I/O aggregation size
3471 Default value: \fB1,048,576\fR.
3477 \fBzfs_vdev_aggregation_limit_non_rotating\fR (int)
3480 Max vdev I/O aggregation size for non-rotating media
3482 Default value: \fB131,072\fR.
3488 \fBzfs_vdev_cache_bshift\fR (int)
3491 Shift size to inflate reads to
3493 Default value: \fB16\fR (effectively 65536).
3499 \fBzfs_vdev_cache_max\fR (int)
3502 Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR size.
3505 Default value: \fB16384\fR.
3511 \fBzfs_vdev_cache_size\fR (int)
3514 Total size of the per-disk cache in bytes.
3516 Currently this feature is disabled as it has been found to not be helpful
3517 for performance and in some cases harmful.
3519 Default value: \fB0\fR.
3525 \fBzfs_vdev_mirror_rotating_inc\fR (int)
3528 A number by which the balancing algorithm increments the load calculation for
3529 the purpose of selecting the least busy mirror member when an I/O immediately
3530 follows its predecessor on rotational vdevs for the purpose of making decisions
3533 Default value: \fB0\fR.
3539 \fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
3542 A number by which the balancing algorithm increments the load calculation for
3543 the purpose of selecting the least busy mirror member when an I/O lacks
3544 locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within
3545 this that are not immediately following the previous I/O are incremented by
3548 Default value: \fB5\fR.
3554 \fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
3557 The maximum distance for the last queued I/O in which the balancing algorithm
3558 considers an I/O to have locality.
3559 See the section "ZFS I/O SCHEDULER".
3561 Default value: \fB1048576\fR.
3567 \fBzfs_vdev_mirror_non_rotating_inc\fR (int)
3570 A number by which the balancing algorithm increments the load calculation for
3571 the purpose of selecting the least busy mirror member on non-rotational vdevs
3572 when I/Os do not immediately follow one another.
3574 Default value: \fB0\fR.
3580 \fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
3583 A number by which the balancing algorithm increments the load calculation for
3584 the purpose of selecting the least busy mirror member when an I/O lacks
3585 locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within
3586 this that are not immediately following the previous I/O are incremented by
3589 Default value: \fB1\fR.
3595 \fBzfs_vdev_read_gap_limit\fR (int)
3598 Aggregate read I/O operations if the gap on-disk between them is within this threshold.
3601 Default value: \fB32,768\fR.
3607 \fBzfs_vdev_write_gap_limit\fR (int)
3610 Aggregate write I/O over gap
3612 Default value: \fB4,096\fR.
3618 \fBzfs_vdev_raidz_impl\fR (string)
3621 Parameter for selecting raidz parity implementation to use.
3623 Options marked (always) below may be selected on module load as they are
3624 supported on all systems.
3625 The remaining options may only be set after the module is loaded, as they
3626 are available only if the implementations are compiled in and supported
3627 on the running system.
3629 Once the module is loaded, the content of
3630 /sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
3631 with the currently selected one enclosed in [].
3632 Possible options are:
3633 fastest - (always) implementation selected using built-in benchmark
3634 original - (always) original raidz implementation
3635 scalar - (always) scalar raidz implementation
3636 sse2 - implementation using SSE2 instruction set (64bit x86 only)
3637 ssse3 - implementation using SSSE3 instruction set (64bit x86 only)
3638 avx2 - implementation using AVX2 instruction set (64bit x86 only)
3639 avx512f - implementation using AVX512F instruction set (64bit x86 only)
3640 avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
3641 aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only)
3642 aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
3643 powerpc_altivec - implementation using Altivec (PowerPC only)
3645 Default value: \fBfastest\fR.
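The bracket convention described above can be parsed with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes the options are separated by whitespace, which the text above does not state explicitly.

```python
def parse_raidz_impl(text):
    """Return (selected, available) from the parameter file's content.

    Hypothetical helper; assumes whitespace-separated option names with
    the currently selected one enclosed in [].
    """
    selected = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return selected, available


# Sample content made up for illustration; on a live system read
# /sys/module/zfs/parameters/zfs_vdev_raidz_impl instead.
sample = "[fastest] original scalar sse2 ssse3 avx2"
print(parse_raidz_impl(sample))
```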
3651 \fBzfs_vdev_scheduler\fR (charp)
3654 \fBDEPRECATED\fR: This option exists for compatibility with older user
3655 configurations. It does nothing except print a warning to the kernel log if set.
3663 \fBzfs_zevent_cols\fR (int)
3666 When zevents are logged to the console use this as the word wrap width.
3668 Default value: \fB80\fR.
3674 \fBzfs_zevent_console\fR (int)
3677 Log events to the console
3679 Use \fB1\fR for yes and \fB0\fR for no (default).
3685 \fBzfs_zevent_len_max\fR (int)
3688 Max event queue length. A value of 0 will result in a calculated value which
3689 increases with the number of CPUs in the system (minimum 64 events). Events
3690 in the queue can be viewed with the \fBzpool events\fR command.
3692 Default value: \fB0\fR.
3698 \fBzfs_zevent_retain_max\fR (int)
3701 Maximum recent zevent records to retain for duplicate checking. Setting
3702 this value to zero disables duplicate detection.
3704 Default value: \fB2000\fR.
3710 \fBzfs_zevent_retain_expire_secs\fR (int)
3713 Lifespan for a recent ereport that was retained for duplicate checking.
3715 Default value: \fB900\fR.
3719 \fBzfs_zil_clean_taskq_maxalloc\fR (int)
3722 The maximum number of taskq entries that are allowed to be cached. When this
3723 limit is exceeded transaction records (itxs) will be cleaned synchronously.
3725 Default value: \fB1048576\fR.
3731 \fBzfs_zil_clean_taskq_minalloc\fR (int)
3734 The number of taskq entries that are pre-populated when the taskq is first
3735 created and are immediately available for use.
3737 Default value: \fB1024\fR.
3743 \fBzfs_zil_clean_taskq_nthr_pct\fR (int)
3746 This controls the number of threads used by the dp_zil_clean_taskq. The default
3747 value of 100% will create a maximum of one thread per cpu.
3749 Default value: \fB100\fR%.
3755 \fBzil_maxblocksize\fR (int)
3758 This sets the maximum block size used by the ZIL. On very fragmented pools,
3759 lowering this (typically to 36KB) can improve performance.
3761 Default value: \fB131072\fR (128KB).
3767 \fBzil_nocacheflush\fR (int)
3770 Disable the cache flush commands that are normally sent to the disk(s) by
3771 the ZIL after an LWB write has completed. Setting this will cause ZIL
3772 corruption on power loss if a volatile out-of-order write cache is enabled.
3774 Use \fB1\fR for yes and \fB0\fR for no (default).
3780 \fBzil_replay_disable\fR (int)
3783 Disable intent logging replay. Can be disabled for recovery from corrupted ZIL.
3786 Use \fB1\fR for yes and \fB0\fR for no (default).
3792 \fBzil_slog_bulk\fR (ulong)
3795 Limit SLOG write size per commit executed with synchronous priority.
3796 Any writes above that will be executed with lower (asynchronous) priority
3797 to limit potential SLOG device abuse by a single active ZIL writer.
3799 Default value: \fB786,432\fR.
3805 \fBzio_deadman_log_all\fR (int)
3808 If non-zero, the zio deadman will produce debugging messages (see
3809 \fBzfs_dbgmsg_enable\fR) for all zios, rather than only for leaf
3810 zios possessing a vdev. This is meant to be used by developers to gain
3811 diagnostic information for hang conditions which don't involve a mutex
3812 or other locking primitive; typically conditions in which a thread in
3813 the zio pipeline is looping indefinitely.
3815 Default value: \fB0\fR.
3821 \fBzio_decompress_fail_fraction\fR (int)
3824 If non-zero, this value represents the denominator of the probability that zfs
3825 should induce a decompression failure. For instance, for a 5% decompression
3826 failure rate, this value should be set to 20.
3828 Default value: \fB0\fR.
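The relationship between the tunable and the induced failure rate is a simple reciprocal. The Python sketch below only restates that arithmetic; both function names are hypothetical.

```python
def decompress_fail_probability(fail_fraction):
    """Probability of an induced failure for a given tunable value."""
    if fail_fraction == 0:
        return 0.0          # zero means no failures are induced
    return 1.0 / fail_fraction


def fail_fraction_for(probability):
    """Tunable value (denominator) closest to the desired probability."""
    return round(1.0 / probability)


print(fail_fraction_for(0.05))   # 5% failure rate
```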
3834 \fBzio_slow_io_ms\fR (int)
3837 When an I/O operation takes more than \fBzio_slow_io_ms\fR milliseconds to
3838 complete, it is marked as a slow I/O. Each slow I/O causes a delay zevent. Slow
3839 I/O counters can be seen with "zpool status -s".
3842 Default value: \fB30,000\fR.
3848 \fBzio_dva_throttle_enabled\fR (int)
3851 Throttle block allocations in the I/O pipeline. This allows for
3852 dynamic allocation distribution when devices are imbalanced.
3853 When enabled, the maximum number of pending allocations per top-level vdev
3854 is limited by \fBzfs_vdev_queue_depth_pct\fR.
3856 Default value: \fB1\fR.
3862 \fBzio_requeue_io_start_cut_in_line\fR (int)
3865 Prioritize requeued I/O
3867 Default value: \fB0\fR.
3873 \fBzio_taskq_batch_pct\fR (uint)
3876 Percentage of online CPUs (or CPU cores, etc) which will run a worker thread
3877 for I/O. These workers are responsible for I/O work such as compression and
3878 checksum calculations. Fractional number of CPUs will be rounded down.
3880 The default value of 75 was chosen to avoid using all CPUs which can result in
3881 latency issues and inconsistent application performance, especially when high
3882 compression is enabled.
3884 Default value: \fB75\fR.
3890 \fBzvol_inhibit_dev\fR (uint)
3893 Do not create zvol device nodes. This may slightly improve startup time on
3894 systems with a very large number of zvols.
3896 Use \fB1\fR for yes and \fB0\fR for no (default).
3902 \fBzvol_major\fR (uint)
3905 Major number for zvol block devices
3907 Default value: \fB230\fR.
3913 \fBzvol_max_discard_blocks\fR (ulong)
3916 Discard (aka TRIM) operations done on zvols will be done in batches of this
3917 many blocks, where block size is determined by the \fBvolblocksize\fR property
3920 Default value: \fB16,384\fR.
3926 \fBzvol_prefetch_bytes\fR (uint)
3929 When adding a zvol to the system, prefetch \fBzvol_prefetch_bytes\fR
3930 from the start and end of the volume. Prefetching these regions
3931 of the volume is desirable because they are likely to be accessed
3932 immediately by \fBblkid(8)\fR or by the kernel scanning for a partition table.
3935 Default value: \fB131,072\fR.
3941 \fBzvol_request_sync\fR (uint)
3944 When processing I/O requests for a zvol, submit them synchronously. This
3945 effectively limits the queue depth to 1 for each I/O submitter. When set
3946 to 0 requests are handled asynchronously by a thread pool. The number of
3947 requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
3949 Default value: \fB0\fR.
3955 \fBzvol_threads\fR (uint)
3958 Max number of threads which can handle zvol I/O requests concurrently.
3960 Default value: \fB32\fR.
3966 \fBzvol_volmode\fR (uint)
3969 Defines the behaviour of zvol block devices when \fBvolmode\fR is set to \fBdefault\fR.
3970 Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none).
3972 Default value: \fB1\fR.
3975 .SH ZFS I/O SCHEDULER
3976 ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
3977 The I/O scheduler determines when and in what order those operations are
3978 issued. The I/O scheduler divides operations into five I/O classes
3979 prioritized in the following order: sync read, sync write, async read,
3980 async write, and scrub/resilver. Each queue defines the minimum and
3981 maximum number of concurrent operations that may be issued to the
3982 device. In addition, the device has an aggregate maximum,
3983 \fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
3984 must not exceed the aggregate maximum. If the sum of the per-queue
3985 maximums exceeds the aggregate maximum, then the number of active I/Os
3986 may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
3987 be issued regardless of whether all per-queue minimums have been met.
3989 For many physical devices, throughput increases with the number of
3990 concurrent operations, but latency typically suffers. Further, physical
3991 devices typically have a limit at which more concurrent operations have no
3992 effect on throughput or can actually cause it to decrease.
3994 The scheduler selects the next operation to issue by first looking for an
3995 I/O class whose minimum has not been satisfied. Once all are satisfied and
3996 the aggregate maximum has not been hit, the scheduler looks for classes
3997 whose maximum has not been satisfied. Iteration through the I/O classes is
3998 done in the order specified above. No further operations are issued if the
3999 aggregate maximum number of concurrent operations has been hit or if there
4000 are no operations queued for an I/O class that has not hit its maximum.
4001 Every time an I/O is queued or an operation completes, the I/O scheduler
4002 looks for new operations to issue.
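The selection rule above can be sketched in Python as follows. This is an illustration of the described behaviour under stated assumptions, not the kernel implementation: the class names, dictionary layout and function name are hypothetical.

```python
# I/O classes in the priority order given above (names are illustrative).
CLASSES = ["sync_read", "sync_write", "async_read", "async_write", "scrub"]


def next_class_to_issue(queued, active, min_active, max_active, aggregate_max):
    """Return the class to issue from next, or None if nothing may be issued.

    queued/active/min_active/max_active map a class name to a count.
    """
    # Nothing is issued once the aggregate maximum has been hit.
    if sum(active.values()) >= aggregate_max:
        return None
    # First pass: a class with queued work whose minimum is not yet satisfied.
    for c in CLASSES:
        if queued[c] > 0 and active[c] < min_active[c]:
            return c
    # Second pass: a class with queued work that is still below its maximum.
    for c in CLASSES:
        if queued[c] > 0 and active[c] < max_active[c]:
            return c
    return None


mins = {c: 1 for c in CLASSES}
maxes = {c: 3 for c in CLASSES}
queued = {"sync_read": 0, "sync_write": 0, "async_read": 3,
          "async_write": 5, "scrub": 1}
active = {"sync_read": 0, "sync_write": 0, "async_read": 1,
          "async_write": 0, "scrub": 0}
print(next_class_to_issue(queued, active, mins, maxes, aggregate_max=10))
```

In this example the async read class already meets its minimum, so the first pass picks the async write class even though async reads have higher priority.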
4004 In general, smaller max_active's will lead to lower latency of synchronous
4005 operations. Larger max_active's may lead to higher overall throughput,
4006 depending on underlying storage.
4008 The ratio of the queues' max_actives determines the balance of performance
4009 between reads, writes, and scrubs. E.g., increasing
4010 \fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
4011 more quickly, but reads and writes to have higher latency and lower throughput.
4013 All I/O classes have a fixed maximum number of outstanding operations
4014 except for the async write class. Asynchronous writes represent the data
4015 that is committed to stable storage during the syncing stage for
4016 transaction groups. Transaction groups enter the syncing state
4017 periodically so the number of queued async writes will quickly burst up
4018 and then bleed down to zero. Rather than servicing them as quickly as
4019 possible, the I/O scheduler changes the maximum number of active async
4020 write I/Os according to the amount of dirty data in the pool. Since
4021 both throughput and latency typically increase with the number of
4022 concurrent operations issued to physical devices, reducing the
4023 burstiness in the number of concurrent operations also stabilizes the
4024 response time of operations from other -- and in particular synchronous
4025 -- queues. In broad strokes, the I/O scheduler will issue more
4026 concurrent operations from the async write queue as there's more dirty data in the pool.
4031 The number of concurrent operations issued for the async write I/O class
4032 follows a piece-wise linear function defined by a few adjustable points.
4035 [Plot: number of concurrent async write operations (vertical axis) versus dirty
4042 data from 0% to 100% of zfs_dirty_data_max (horizontal axis). The curve is flat
4043 at zfs_vdev_async_write_min_active up to
4044 zfs_vdev_async_write_active_min_dirty_percent, rises linearly to
4046 zfs_vdev_async_write_max_active at
4047 zfs_vdev_async_write_active_max_dirty_percent, and is flat beyond that point.]
4050 Until the amount of dirty data exceeds a minimum percentage of the dirty
4051 data allowed in the pool, the I/O scheduler will limit the number of
4052 concurrent operations to the minimum. As that threshold is crossed, the
4053 number of concurrent operations issued increases linearly to the maximum at
4054 the specified maximum percentage of the dirty data allowed in the pool.
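The piece-wise linear function can be written out as a short Python sketch. The function name and the integer rounding are assumptions, and the numeric values in the example are illustrative inputs rather than documented defaults.

```python
def async_write_limit(dirty, dirty_max, min_active, max_active,
                      min_dirty_pct, max_dirty_pct):
    """Concurrent async write limit for a given amount of dirty data.

    Hypothetical helper; dirty and dirty_max use the same unit (e.g. bytes).
    """
    low = dirty_max * min_dirty_pct // 100    # start of the slope
    high = dirty_max * max_dirty_pct // 100   # end of the slope
    if dirty <= low:
        return min_active
    if dirty >= high:
        return max_active
    # Linear interpolation between (low, min_active) and (high, max_active).
    return min_active + (dirty - low) * (max_active - min_active) // (high - low)


# Illustrative values only: 2 to 10 operations between 30% and 60% dirty.
for dirty in (100, 450, 900):
    print(dirty, async_write_limit(dirty, 1000, 2, 10, 30, 60))
```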
4056 Ideally, the amount of dirty data on a busy pool will stay in the sloped
4057 part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
4058 and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
4059 maximum percentage, this indicates that the rate of incoming data is
4060 greater than the rate that the backend storage can handle. In this case, we
4061 must further throttle incoming writes, as described in the next section.
4063 .SH ZFS TRANSACTION DELAY
4064 We delay transactions when we've determined that the backend storage
4065 isn't able to accommodate the rate of incoming writes.
4067 If there is already a transaction waiting, we delay relative to when
4068 that transaction will finish waiting. This way the calculated delay time
4069 is independent of the number of threads concurrently executing transactions.
4072 If we are the only waiter, wait relative to when the transaction
4073 started, rather than the current time. This credits the transaction for
4074 "time already served", e.g. reading indirect blocks.
4076 The minimum time for a transaction to take is calculated as:
4078 min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
4079 min_time is then capped at 100 milliseconds.
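The formula above can be evaluated with the following Python sketch. It assumes that "min" is the dirty data threshold at which delays begin, expressed as a percentage of the maximum, and that "max" is the dirty data limit; the function name, the nanosecond unit and the example values are illustrative assumptions.

```python
CAP_NS = 100_000_000  # the 100 millisecond cap, in nanoseconds


def min_delay_ns(dirty, dirty_max, delay_min_dirty_percent, delay_scale_ns):
    """Minimum transaction time for a given amount of dirty data.

    Hypothetical helper; dirty and dirty_max use the same unit.
    """
    delay_min = dirty_max * delay_min_dirty_percent // 100
    if dirty <= delay_min:
        return 0              # below the threshold there is no delay
    if dirty >= dirty_max:
        return CAP_NS         # avoid dividing by zero at the limit
    t = delay_scale_ns * (dirty - delay_min) // (dirty_max - dirty)
    return min(t, CAP_NS)


# Illustrative values: delays start at 60% dirty, scale of 500,000 ns.
for dirty in (600, 800, 999):
    print(dirty, min_delay_ns(dirty, 1000, 60, 500_000))
```

At the midpoint between the threshold and the limit, the two differences in the formula are equal, so the delay equals the scale value itself.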
4082 The delay has two degrees of freedom that can be adjusted via tunables. The
4083 percentage of dirty data at which we start to delay is defined by
4084 \fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
4085 \fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
4086 delay after writing at full speed has failed to keep up with the incoming write
4087 rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
4088 this variable determines the amount of delay at the midpoint of the curve.
4092 [Plot, linear scale: transaction delay (0 to 10ms, vertical axis) versus dirty
4108 data from 0% to 100% of zfs_dirty_data_max (horizontal axis). The curve is
4111 labeled with zfs_delay_scale and its midpoint.]
4116 Note that since the delay is added to the outstanding time remaining on the
4117 most recent transaction, the delay is effectively the inverse of IOPS.
4118 Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
4119 was chosen such that small changes in the amount of accumulated dirty data
4120 in the first 3/4 of the curve yield relatively small differences in the amount of delay.
4123 The effects can be easier to understand when the amount of delay is
4124 represented on a log scale:
4128 [Plot, log scale: transaction delay (up to 100ms, vertical axis) versus dirty
4137 data from 0% to 100% of zfs_dirty_data_max (horizontal axis), with
4148 zfs_delay_scale marked on the curve.]
4152 Note here that only as the amount of dirty data approaches its limit does
4153 the delay start to increase rapidly. The goal of a properly tuned system
4154 should be to keep the amount of dirty data out of that range by first
4155 ensuring that the appropriate limits are set for the I/O scheduler to reach
4156 optimal throughput on the backend storage, and then by changing the value
4157 of \fBzfs_delay_scale\fR to increase the steepness of the curve.