Michael Tuexen [Sat, 4 May 2019 10:06:17 +0000 (10:06 +0000)]
MFC r343960:
Fix a locking issue in the IPPROTO_SCTP level SCTP_PEER_ADDR_THLDS socket
option. The problem affects only setsockopt with invalid parameters.
Michael Tuexen [Sat, 4 May 2019 10:04:00 +0000 (10:04 +0000)]
MFC r340179:
Don't use a function when neither INET nor INET6 are defined.
This is a valid case for the userland stack, where this fixes
two set-but-not-used warnings in this case.
Thanks to Christian Wright for reporting the issue.
Michael Tuexen [Sat, 4 May 2019 09:25:32 +0000 (09:25 +0000)]
MFC r343951:
Fix locking for IPPROTO_SCTP level SCTP_DEFAULT_PRINFO socket option.
This problem occurred when calling setsockopt() will invalid parameters.
Michael Tuexen [Sat, 4 May 2019 09:11:17 +0000 (09:11 +0000)]
MFC r343661:
When handling SYN-ACK segments in the SYN-RCVD state, set tp->snd_wnd
consistently.
This inconsistency was observed when working on the bug reported in
PR 235256, although it does not fix the reported issue. The fix for
the PR will be a separate commit.
PR: 235256
Reviewed by: rrs@, Richard Scheffenegger
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19033
Michael Tuexen [Sat, 4 May 2019 09:07:53 +0000 (09:07 +0000)]
MFC r343525:
Fix the detection of ECN-setup SYN-ACK packets.
RFC 3168 defines an ECN-setup SYN-ACK packet as on with the ECE flags
set and the CWR flags not set. The code was only checking if ECE flag
is set. This patch adds the check to verify that the CWR flags is not
set.
Submitted by: Richard Scheffenegger
Reviewed by: tuexen@
Differential Revision: https://reviews.freebsd.org/D18996
Rick Macklem [Fri, 3 May 2019 02:58:33 +0000 (02:58 +0000)]
MFC: r346424
Add support for the ModeSetMasked attribute to the NFSv4.1 server.
I do not know of an extant NFSv4.1 client that currently does a Setattr
operation for the ModeSetMasked, but it has been discussed on the linux-nfs
mailing list.
This patch adds support for doing a Setattr of ModeSetMasked, so that it
will work for any future NFSv4.1 client that chooses to do so.
Tested via a hacked FreeBSD NFSv4.1 client.
Rick Macklem [Fri, 3 May 2019 02:44:57 +0000 (02:44 +0000)]
MFC: r346423
Replace "vp" with NULL to make the code more readable.
At the time of this nfsv4_sattr() call, "vp == NULL", so this patch doesn't
change the semantics, but I think it makes the code more readable.
It also makes it consistent with the nfsv4_sattr() call a few lines above
this one. Found during code inspection.
Rick Macklem [Fri, 3 May 2019 02:03:29 +0000 (02:03 +0000)]
MFC: r346365
Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes.
During inspection of a packet trace, I noticed that an NFSv4.0 mount
reported that it supported attributes that are only defined for NFSv4.1.
In practice, this bug appears to be benign, since NFSv4.0 clients will
not use attributes that were added for NFSv4.1.
However, this was not correct and this patch fixes the NFSv4.0 server
so that it only supports attributes defined for NFSv4.0.
It also adds a definition for NFSv4.1 attributes that can only be set,
although it is only defined as 0 for now.
This is anticipation of the addition of support for the NFSv4.1 mode+mask
attribute soon.
Glen Barber [Fri, 3 May 2019 00:45:31 +0000 (00:45 +0000)]
MFC r346959:
Reduce the default image size for virtual machine disk images from
30GB to 3GB. The raw images can be resized using truncate(1), and
other formats can be resized with tools included in hypervisors.
Enable the growfs(8) rc(8) at firstboot if the disk was resized
prior to booting the virtual machine for the first time.
Rick Macklem [Thu, 2 May 2019 01:02:26 +0000 (01:02 +0000)]
MFC: r346709
Add support to nfsdumpstate for printing of INET6 addresses for locks.
r346190 added support for printing of INET6 addresses for the "-o" option
(all opens) but missed adding support for INET6 addresses for the "-l" option.
This patch adds that support.
John Baldwin [Wed, 1 May 2019 21:45:15 +0000 (21:45 +0000)]
MFC 345655: Remove nested epochs from lagg(4).
lagg_bcast_start appeared to have a bug in that was using the last
lagg port structure after exiting the epoch that was keeping that
structure alive. However, upon further inspection, the epoch was
already entered by the caller (lagg_transmit), so the epoch enter/exit
in lagg_bcast_start was actually unnecessary.
This commit generally removes uses of the net epoch via LAGG_RLOCK to
protect the list of ports when the list of ports was already protected
by an existing LAGG_RLOCK in a caller, or the LAGG_XLOCK.
It also adds a missing epoch enter/exit in lagg_snd_tag_alloc while
accessing the lagg port structures. An ifp is still accessed via an
unsafe reference after the epoch is exited, but that is true in the
current code and will be fixed in a future change.
John Baldwin [Wed, 1 May 2019 18:17:42 +0000 (18:17 +0000)]
MFC 346066: Refine r330113 to honor the ProducerConsumer flag most of the time.
While it is true that the ACPI spec says that the flag is only valid
on Extended Address Space Descriptors, examples of other descriptors
in the spec use the ProducerConsumer flag explicitly, and real
hardware uses it as well. In fact, even in the ASL of the Thunder X2
for which r330113 was a workaround, some devices use this flag on
non-Extended Address Space Descriptors correctly. Instead, only
ignore the flag for resources associated with the UART devices on the
Thunder X2 using the "ARMH0011" HID to identify these devices.
This should fix regressions from ignoring this flag in other contexts
such as Hyper-V.
John Baldwin [Wed, 1 May 2019 17:30:14 +0000 (17:30 +0000)]
MFC 346063: Don't pre-reserve resources for CPU devices when they are set.
CPUs can use shared (RF_SHAREABLE) resources for the I/O port used for
entering and exiting C states. If this I/O port is included in an ACPI
system resource device, then this happens to still work, but if the port
wasn't part of a system resource device, only the first CPU could allocate
the I/O port and use C states since resource_list_reserve() was always
allocating the resource from nexus0 without RF_SHAREABLE. By avoiding
the reservation, the flags from the bus_alloc_resource() in the CPU driver
(which include RF_SHAREABLE) are honored.
Alexander Motin [Wed, 1 May 2019 13:42:56 +0000 (13:42 +0000)]
MFC r346644: Call delist_dev() before destroy_dev_sched_cb().
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.
MFC r345843:
Follow the declared behaviour that specifies server string format in
bsnmpclient(3).
snmp_parse_server() function accepts string where some fields can be
omitted: [trans::][community@][server][:port]
"trans" field can be "udp", "udp6", "dgram" and "stream".
"community" can be empty string, if it is omitted, the default value
will be used. For read_community it is "public", for write_comminity
it is "private". "server" field can be hostname, IPv4 address or IPv6
address. IPv6 address should be specified in brackets "[]".
If port is omitted, the default value "snmp" will be used for "udp"
and "udp6" transports. So, now for bsnmpget(1) and bsnmwalk(1) it is
not required to specify all fields in argument of '-s' option. E.g.
This patch adds a new table begemotSnmpdTransInetTable that uses the
InetAddressType textual convention and can be used to create listening
ports for IPv4, IPv6, zoned IPv6 and based on DNS names. It also supports
future extension beyond UDP by adding a protocol identifier to the table
index. In order to support this gensnmptree had to be modified.
Submitted by: harti
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16654
MFC r346263:
tcpdump: disable Capsicum if -E option is provided.
The -E is used to provide a secret for decrypting IPsec.
The secret may be provided through command line or as the file.
The problem is that tcpdump doesn't support yet opening files in capability mode
and the file may contain a list of the files to open.
As a workaround, for now, let's just disable capsicum if the -E
the option is provided.
Cy Schubert [Wed, 1 May 2019 01:42:38 +0000 (01:42 +0000)]
MFC r341759, r341796, r341839, r341989, r346591:
The following five MFCs update wpa 2.6 --> 2.8.
r341759:
MFV r341618: Update wpa 2.6 --> 2.7.
r341796:
Clean stale wpa dependencies and objects after r341759
The wpa update added some source files with the same name as a file in
another directory (found via .PATH in the previous version). Having a
stale entry in a .depend file means the new file won't be built, so test
for this case and if found remove all of wpa's dependency files.
Sponsored by: The FreeBSD Foundation
r341839:
Set default ciphers.
Submitted by: jkim@
r341989:
Makefile.inc1: update stale wpa dependency removal statement
Only stale .depend files are removed; do not mention object files.
John Baldwin [Tue, 30 Apr 2019 23:53:54 +0000 (23:53 +0000)]
MFC 344711: Fix missed posted interrupts in VT-x in bhyve.
When a vCPU is HLTed, interrupts with a priority below the processor
priority (PPR) should not resume the vCPU while interrupts at or above
the PPR should. With posted interrupts, bhyve maintains a bitmap of
pending interrupts in PIR descriptor along with a single 'pending'
bit. This bit is checked by a CPU running in guest mode at various
places to determine if it should be checked. In addition, another CPU
can force a CPU in guest mode to check for pending interrupts by
sending an IPI to a special IDT vector reserved for this purpose.
bhyve had a bug in that it would only notify a guest vCPU of an
interrupt (e.g. by sending the special IPI or by resuming it if it was
idle due to HLT) if an interrupt arrived that was higher priority than
PPR and no interrupts were currently pending. This assumed that if
the 'pending' bit was set, any needed notification was already in
progress. However, if the first interrupt sent to a HLTed vCPU was
lower priority than PPR and the second was higher than PPR, the first
interrupt would set 'pending' but not notify the vCPU, and the second
interrupt would not notify the vCPU because 'pending' was already set.
To fix this, track the priority of pending interrupts in a separate
per-vCPU bitmask and notify a vCPU anytime an interrupt arrives that
is above PPR and higher than any previously-received interrupt.
This was found and debugged in the bhyve port to SmartOS maintained by
Joyent. Relevant SmartOS bugs with more background:
cxgbe(4): Make sure bundled_fw is always initialized before use.
This fixes a bug that prevented the driver from auto-flashing the
firmware when it didn't see one on the card. This feature was
introduced in r321390 and this bug was introduced in r343269.
cxgbe/netmap: Fix cxgbe netmap when interface is DOWN
A kernel panic can occur if the cxgbe interface is DOWN
when activating netmap. This patch prevents the driver
from freeing up cxgbe netmap resources when they have not
been allocated.
Submitted by: Nicolas Witkowski <nwitkowski@verisign.com>
Reviewed by: np
Sponsored by: Verisign, Inc.
Differential Revision: https://reviews.freebsd.org/D17802
fw_outstanding"(outstanding IOs at firmware level) counter gets screwed up when R1 fastpath
writes are running. Some of the cases which are not handled properly in driver are:
1. With R1 fastpath supported, single write from CAM layer can consume 2 MPT frames
at driver/firmware level for fastpath qualification(if fw_outstanding < controller Queue Depth).
Due to this driver has to throttle IOs coming from CAM layer as well as second fastpath
write(of R1 write) against Adapter Queue Depth.
If "fw_outstanding" reaches to adapter queue depth, driver should return IOs from CAM layer with
device busy status.While allocating second MPT frame(corresponding to R1 FP write) also, driver
should ensure fw_outstanding should not exceed adapter QD.
2. For R1 fastpath writes completion, driver decrements "fw_oustanding" counter without
really returning MPT frame to free pool. It may cause IOs(with heavy IOs running, consuming whole
adapter Queue Depth) consuming MPT frames reserved for DCMDs(management commands) and
DCMDs(internal and sent by application) not getting MPT frame will start failing.
Below is one test case to hit the issue described above-
1. Run heavy IOs (outstanding IOs should hit adapter Queue Depth).
2. Run management tool (Broadcom's storcli tool) querying adapter in loop (run command- "storcli64 /c0 show" in loop).
3. Management tool's requests would start failing due to non-availability of free MPT frames as all frames would be consumed by IOs.
Fix: Increment/decrement of "fw_outstanding" counter should be in sync with MPT frame get/return.
r345058
Allocated MFI frames should be same as MPT frames reserved for DCMDs
Ian Lepore [Tue, 30 Apr 2019 00:54:31 +0000 (00:54 +0000)]
MFC r346489:
Move the reporting of spurious interrupts under bootverbose control, because
occasional spurious interrupts are a normal thing on this hardware. Also,
change the name of the cpu-local interrupt controller driver from local_intc
to lintc, because the name gets built into interrupt names, which have to
fit into a 19-byte field for stats reporting (so this allows 5 more bytes
of the actual interrupt name to be displayed).
Enji Cooper [Mon, 29 Apr 2019 23:07:19 +0000 (23:07 +0000)]
MFC r345351:
r345351 (by bdrewery):
Build common kernel dependencies before modules.
This ensures files like genassym.o and awk/mfiles are generated before
descending into the modules build. It may also allow some module builds
to not recreate files that are already present in the KERNBUILDDIR.
This fixes a rare build race where genassym.o is missing and assym.inc
is empty.
More work is planned around this to reduce some redundant dependency
generation in modules.
Enji Cooper [Mon, 29 Apr 2019 19:32:11 +0000 (19:32 +0000)]
MFC r345723:
PROG_OVERRIDE_VARS should override default values if specified
The behavior prior to this change would not override default values if set in
`bsd.own.mk`, or (in the more general case) globally before `bsd.progs.mk` was
included. This affected `bsd.test.mk` as well, since it consumes
`bsd.progs.mk`.
Some examples of this failing behavior are as follows:
* `BINMODE` defaults to 0555 per `bsd.own.mk`. If someone wanted to set the
`BINMODE` to `NOBINMODE` (0444) for `prog`, for example, like
`BINMODE.prog= ${NOBINMODE}`, `bsd.progs.mk` would not honor the per-PROG
setting.
* An application, `prog`, does not build at `WARNS?= 6`. Before this change,
setting to a lower `WARNS` value, e.g., `WARNS.prog= 3`, would have been
impossible, requiring that `prog` be built from another directory,
the global `WARNS` be lowered, or a per-PROG value needing to be set
across the board. None of the above workarounds is desirable.
This change unbreaks variables defined in `PROG_OVERRIDE_VARS` which have
defaults set before `bsd.progs.mk` is included, by setting them to their
defined values if set on a per-PROG basis.
Enji Cooper [Mon, 29 Apr 2019 19:12:47 +0000 (19:12 +0000)]
MFC r346539:
Fix `get_int_via_sysctlbyname(..)` on Jenkins
Initialize `oldlen` to the size of the value, instead of leaving the value
unitialized. Leaving it unitialized seems to work by accident on amd64 when
running 64-bit programs, but not on i386.
This matches patterns in use in other programs.
PR: 237458
Tested on: ^/head (amd64), ^/stable/11 (i386)
Alexander Motin [Mon, 29 Apr 2019 19:10:24 +0000 (19:10 +0000)]
MFC r344192 (by sef): Add support for a virtual hostname to nfsd
Specifically, this allows (via "-V vhostname") telling nfsd what principal
to use, instead of the hostname. This is used at iXsystems for fail-over in
HA systems.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D19191
Enji Cooper [Mon, 29 Apr 2019 19:09:44 +0000 (19:09 +0000)]
MFC r346542:
Fix sys.kern.coredump_phnum_test.coredump_phnum on i386
The zero-padding when printing out the Size field is on 32-bit architectures is
5, not 15. Adjust the regular expression to work with both the 32-bit and
64-bit case.
r346568: ar: test for writing 64-bit format only if symbol count is nonzero
This is a minor simplification; if we do not have any symbols the empty
symbol table can be in 32-bit format.
r346569: ar: use array notation to access s_so
This is somewhat more readable than pointer arithmetic. Also remove an
unnecessary cast while here.
r346582: ar: shuffle symbol offsets during conversion for 32-bit ar archives
During processing we maintain symbol offsets in the 64-bit s_so array,
and when writing the archive convert to 32-bit if no offsets are greater
than 4GB. However, this was somewhat inefficient as we looped over the
array twice: first, converting to big endian and second, writing each
32-bit value one at a time (and incorrectly so on big-endian platforms).
Instead, when writing a 32-bit archive shuffle convert symbol data to
big endian (as required by the ar format) and shuffle to the beginning
of the allocation at the same time.
Also correct emission of the symbol count on big endian platforms.
Further changes are planned, but this should fix powerpc64.
Ed Maste [Mon, 29 Apr 2019 18:25:39 +0000 (18:25 +0000)]
MFC r339648: ar: report errno on warning/error
Previously ar would report an error like "ar: fatal: Write error"
without including additional errno information. Change warnings and
errors to include archive_errno() so that the user may have some idea
of the reason for the failure.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.
Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.
r346545:
libbe(3): allow creation of arbitrary depth boot environments
libbe currently only provides an API to create a recursive boot environment,
without any formal support for intentionally limiting the depth. This
changeset adds an API, be_create_depth, that may be used to arbitrarily
restrict the depth of the new BE.
r346546:
libbe(3): Add a test for be creation
r346680:
libbe(3): Copy received properties as well
This was inherently broken on send|recv datasets.
r346700:
libbe(3): Fix mis-application of patch (SHLIBDIR)
Rob's patch in D18564 cemented the SHLIBDIR because bsd.own.mk (included by
src.opts.mk) sets it to /usr/lib. r346546 did somehow not apply this part of
the patch, leaving it to get installed to the wrong place and subsequently
removed via ObsoleteFiles.
r346705:
libbe(3): Fix libcompat build
SHLIBDIR should still be optionally set, just before src.opts.mk is included
so that libcompat can properly override it. This fixes lib32 failures
reported by both Jenkins and Michael Butler.
if_bridge and if_vxlan conversion to this deterministic MAC address KPI has
been MFC as well. This is potentially error prone as the generated address
range for these has decreased, but I've deemed this acceptable for stable
branches due to collisions for thees interfaces being easily remedied.
I have no intention of switching anything else to this KPI in any stable
branches.
r345139:
ether: centralize fake hwaddr generation
We currently have two places with identical fake hwaddr generation --
if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other
interfaces that may also need a fake addr.
r345151:
ether_fakeaddr: Use 'b' 's' 'd' for the prefix
This has the advantage of being obvious to sniff out the designated prefix
by eye and it has all the right bits set. Comment stolen from ffec.
I've removed bryanv@'s pending question of using the FreeBSD OUI range --
no one has followed up on this with a definitive action, and there's no
particular reason to shoot for it and the administrative overhead that comes
with deciding exactly how to use it.
r346324:
net: adjust randomized address bits
Give devices that need a MAC a 16-bit allocation out of the FreeBSD
Foundation OUI range. Change the name ether_fakeaddr to ether_gen_addr now
that we're dealing real MAC addresses with a real OUI rather than random
locally-administered addresses.
r346328:
Compile sha1.c when ether support is included
sha1 is used by ether_gen_addr after r346324. Perhaps in an ideal world we
could detect that the kernel's been compiled without sha1_* bits included
and silently fallback to arc4random instead because these platforms/kernel
configs are far and few between. It's fairly lightweight, though, so just
include it for now.
MFC r346307, r346618: Further DTB building consolidation/documentation
r346307:
fdt: further consolidate DTB building and revise manpage
FDT_DTS_FILE was built separately with a rule in sys/conf/files and
recreated the rules we used in dtb.mk. Now that we have other infrastructure
to build a DTB along with the kernel, fold FDT_DTS_FILE into that since it
doesn't have any special requirements.
fdt(4) never got revised to mention the DTS/DTSO make options, so do that
now.
r346618:
fdt: stop installing FDT_DTS_FILE
r346307 inadvertently started installing FDT_DTS_FILE along with the kernel.
While this isn't necessarily bad, it was not intended or discussed and it
actively breaks some current setups that don't anticipate any .dtb being
installed when it's using static fdt. This change could be reconsidered down
the line, but it needs to be done with prior discussion.
Fix it by pushing FDT_DTS_FILE build down into the raw dtb.build.mk bits.
This technically allows modules building DTS to accidentally specify an
FDT_DTS_FILE that gets built but isn't otherwise useful (since it's not
installed), but I suspect this isn't a big deal and would get caught with
any kind of testing -- and perhaps this might end up useful in some other
way, for example by some module wanting to embed fdt in some other way than
our current/normal mechanism.
Colin Percival [Sat, 27 Apr 2019 04:00:50 +0000 (04:00 +0000)]
MFC r346628: Split the pkg configuration file FreeBSD.conf into versions
for {latest, quarterly} and use Makefile logic to decide which one to
install (right now, unconditionally "latest").
Rick Macklem [Sat, 27 Apr 2019 01:58:51 +0000 (01:58 +0000)]
MFC: r346191
Add support for INET6 addresses to the kernel code that dumps open/lock state.
PR#223036 reported that INET6 callback addresses were not printed by
nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure,
so that nfsdumpstate(8) can print them out, post-r346190.
Alexander Motin [Fri, 26 Apr 2019 17:21:12 +0000 (17:21 +0000)]
MFC r345656: Do not map small IOCTL buffers to KVA, but copy.
CAM IOCTL interfaces traditionally mapped user-space data buffers to KVA.
It was nice originally, but now it takes too much to handle respective
TLB shootdowns, while small kernel memory allocations up to 64KB backed
by UMA and accompanied by copyin()/copyout() can be much cheaper.
For large buffers mapping still may have sense, and unmapped I/O would
be even better, but the last unfortunately is more tricky, since unmapped
I/O API is too specific to struct bio now.
We cannot just assume that any name which ends with a letter is a group
That's not been true since we allowed renaming of network interfaces. It's also
not true for things like epair0a.
Try to retrieve the group members for the name to check, since we'll get ENOENT
if the group doesn't exist.
Marcin Wojtas [Fri, 26 Apr 2019 01:41:55 +0000 (01:41 +0000)]
MFC r345438,r345842,r346259,r346261: TPM as possible entropy source
r345438:
Allow using TPM as entropy source
TPM has a built-in RNG, with its own entropy source.
The driver was extended to harvest 16 random bytes from TPM every 10 seconds.
A new build option "TPM_HARVEST" was introduced - for now, however, it
is not enabled by default in the GENERIC config.
r345842:
Add a cv_wait to the TPM2.0 harvesting function
Marcin Wojtas [Fri, 26 Apr 2019 00:48:52 +0000 (00:48 +0000)]
MFC r344840: Extend libsecureboot(old libve) to obtain trusted certificates from UEFI and implement revocation
UEFI related headers were copied from edk2.
A new build option "MK_LOADER_EFI_SECUREBOOT" was added to allow
loading of trusted anchors from UEFI.
Certificate revocation support is also introduced.
The forbidden certificates are loaded from dbx variable.
Verification fails in two cases:
There is a direct match between cert in dbx and the one in the chain.
The CA used to sign the chain is found in dbx.
One can also insert a hash of TBS section of a certificate into dbx.
In this case verifications fails only if a direct match with a
certificate in chain is found.
Marcin Wojtas [Fri, 26 Apr 2019 00:39:30 +0000 (00:39 +0000)]
MFC r343911: Allow reading the UEFI variable size
When loading bigger variables form UEFI it is necessary to know their
size beforehand, so that an appropriate amount of memory can be
allocated. The easiest way to do this is to try to read the variable
with buffer size equal 0, expecting EFI_BUFFER_TOO_SMALL error to be
returned. Allow such possible approach in efi_getenv routine.
MFC: r345888: Use IN_foo() macros from sys/netinet/in.h inplace of
handcrafted code
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros. Correct that by replacing handcrafted
code with proper macro usage.
Alexander Motin [Thu, 25 Apr 2019 21:04:38 +0000 (21:04 +0000)]
MFC r339826 (by yuripv):
Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3"). Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).
Alexander Motin [Thu, 25 Apr 2019 14:41:29 +0000 (14:41 +0000)]
MFC r345200: MFV r336930: 9284 arc_reclaim_thread has 2 jobs
`arc_reclaim_thread()` calls `arc_adjust()` after calling
`arc_kmem_reap_now()`; `arc_adjust()` signals `arc_get_data_buf()` to
indicate that we may no longer be `arc_is_overflowing()`.
The problem is, `arc_kmem_reap_now()` can take several seconds to
complete, has no impact on `arc_is_overflowing()`, but due to how the
code is structured, can impact how long the ARC will remain in the
`arc_is_overflowing()` state.
The fix is to use seperate threads to:
1. keep `arc_size` under `arc_c`, by calling `arc_adjust()`, which
improves `arc_is_overflowing()`
2. keep enough free memory in the system, by calling
`arc_kmem_reap_now()` plus `arc_shrink()`, which improves
`arc_available_memory()`.