Casper services expect that the first 3 descriptors (stdin/stdout/stderr)
will point to /dev/null. Which Casper will ensure later. The Casper
services are forked from the original process. If the initial process
closes one of those descriptors, Casper may reuse one of them for it on
purpose. If this is the case, then renumarate the descriptors used by
Casper to higher numbers. This is done already after the fork, so it
doesn't break the parent process.
Kristof Provost [Sat, 15 May 2021 11:45:55 +0000 (13:45 +0200)]
pf: Convenience function for optional (numeric) arguments
Add _opt() variants for the uint* functions. These functions set the
provided default value if the nvlist doesn't contain the relevant value.
This is helpful for optional values (e.g. when the API is extended to
add new fields).
While here simplify the header by also using macros to create the
prototypes for the macro-generated function implementations.
netgraph/ng_base: Renaming a node to the same name is a noop
Detailed analysis in https://github.com/genneko/freebsd-vimage-jails/issues/2
brought the problem down to a double call of ng_node_name() before and
after a vnet move. Because the name of the node is already known
(occupied by itself), the second call fails.
PR: 241954
Reported by: Paul Armstrong
Differential Revision: https://reviews.freebsd.org/D30110
Ed Maste [Thu, 7 Nov 2019 15:51:44 +0000 (15:51 +0000)]
linux_renameat2: improve flag checks
In the cases where Linux returns an error (e.g. passing in an undefined
flag) there's no need for us to emit a message. (The target of this
message is a developer working on the linuxulatorm, not the author of
presumably broken Linux software).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21606
Dmitry Chagin [Wed, 26 May 2021 05:34:32 +0000 (08:34 +0300)]
linux_common: retire extra module version.
The second 'linuxcommon' line was added by c66f5b079d2a259c3a65b1efe0f2143cd030dc52
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to 'linux_common' and retire 'linuxcommon' line.
Tijl Coosemans [Sun, 6 Dec 2020 10:58:55 +0000 (10:58 +0000)]
Move V4L feature declarations and DTrace provider definitions from
linux_common.c to linux_util.c so they become available on i386.
linux_common.c defines the linux_common kernel module but this module does
not exist on i386 and linux_common.c is not included in the linux module.
linux_util.c is included in the linux_common module on amd64 and the linux
module on i386.
Remove linux_common.c from files.i386 again. It was added recently in
r367433 when the DTrace provider definitions were moved.
The V4L feature declarations were moved to linux_common in r283423.
Tijl Coosemans [Sat, 5 Dec 2020 14:53:24 +0000 (14:53 +0000)]
Fix i386 linux module after r367395.
In r367395 parts of machine dependent linux_dummy.c were moved to a new
machine independent file sys/compat/linux/linux_dummy.c and the existing
linux_dummy.c was renamed to linux_dummy_machdep.c.
Add linux_dummy_machdep.c to the linux module for i386.
Rename sys/amd64/linux32/linux_dummy.c for consistency.
Add the new linux_dummy.c to the linux module for i386.
Conrad Meyer [Fri, 6 Nov 2020 22:04:57 +0000 (22:04 +0000)]
linux(4): Fix loadable modules after r367395
Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.
John Baldwin [Tue, 20 Apr 2021 20:22:11 +0000 (13:22 -0700)]
etcupdate: Gracefully handle SIGINT when building trees.
Run the 'build_tree' function inside of a subshell and trap SIGINT to
return an error to the caller. This allows callers to gracefully
cleanup a partially created tree.
While here, redirect stdout/stderr of the subshell to the log file
instead of applying redirections individually to each command executed
while building the tree.
John Baldwin [Tue, 20 Apr 2021 20:21:42 +0000 (13:21 -0700)]
etcupdate: Always extract to a temporary tree.
etcupdate has had a somewhat nasty race condition since its creation
in that its state machine can get very confused if it is interrupted
while building the tree to compare against. This is exacerbated by
the fact that etcupdate doesn't emit any output while building the
tree which can take several seconds (especially in recent years with
the addition of the tree-wide buildconfig/installconfig passes).
To mitigate this, always install a new tree into a temporary directory
created via mktemp as was previously done only for dry-runs via -n.
The existing trees are only rotated and the new tree installed as
/var/db/etcupdate/current after the update command has completed.
David Bright [Mon, 24 May 2021 19:02:43 +0000 (14:02 -0500)]
pciconf: Fix up pciconf -lc output
The pciconf command fails to emit newlines when particular ecap field
values are seen. Fix them up. This has been seen on several systems at
$JOB. The documentation for PCI capabilities says that capability
type 0 should not be used once the spec for PCI capabilities was
published, but that seems more wishful-thinking than reality. pciconf
also chooses not to print fields related to field values that are
zero, but it seems several of these fields are zero on actual
hardware.
Sponsored by: Dell EMC Isilon
Submitted by: Robert Herndon (Robert.Herndon@dell.com)
David Bright [Mon, 24 May 2021 17:12:15 +0000 (12:12 -0500)]
libsa: Fix infinite loop in bzipfs & gzipfs
A bug in the loader's bzipfs & gzipfs filesystems caused compressed
kernel and modules not to work on EFI systems with a veriexec-enabled
loader. Since the size of files in these filesystems are not known
_a priori_ `stat` would initialize the size to -1 and the loader would
then hang in an infinite loop while trying to seek (read) to the end
of file since the loop termination condition compares the current
offset to that negative target position.
Markus Stoff [Tue, 18 May 2021 20:35:33 +0000 (22:35 +0200)]
ng_parse: IP address parsing in netgraph eating too many characters
Once the final component of the IP address has been parsed, the offset
on the input must not be advanced, as this would remove an unparsed
character from the input.
Submitted by: Markus Stoff
Reviewed by: donner
Differential Revision: https://reviews.freebsd.org/D26489
Kristof Provost [Thu, 3 Jun 2021 13:22:19 +0000 (15:22 +0200)]
pf tests: Make killstate:match more robust
The killstate:match test starts nc as a background process. There was no
guarantee that the nc process would have connected by the time we check
for states, so this test occasionally failed without good reason.
Teach the test to wait for at least some states to turn up before
executing the critical checks.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Lutz Donnerhacke [Sat, 15 May 2021 13:24:12 +0000 (15:24 +0200)]
libalias: Remove unused function LibAliasCheckNewLink
The functionality to detect a newly created link after processing a
single packet is decoupled from the packet processing. Every new
packet is processed asynchronously and will reset the indicator, hence
the function is unusable. I made a Google search for third party code,
which uses the function, and failed to find one.
That's why the function should be removed: It unusable and unused.
A much simplified API/ABI will remain in anything below 14.
Mark Johnston [Mon, 31 May 2021 22:51:14 +0000 (18:51 -0400)]
x86: Fix lapic_ipi_alloc() on i386
The loop which checks to see if "dynamic" IDT entries are allocated
needs to compare with the trampoline address of the reserved ISR.
Otherwise it will never succeed.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Kristof Provost [Tue, 1 Jun 2021 14:05:47 +0000 (16:05 +0200)]
pf: Fix more ioctl memory leaks
We must also remember to free nvlists added to a parent nvlist with
nvlist_append_nvlist_array().
More importantly, when nvlist_pack() allocates memory for us it does so
in the M_NVLIST zone, so we must free it with free(.., M_NVLIST). Using
free(.., M_TEMP) as we did silently failed to free the memory.
Rick Macklem [Fri, 21 May 2021 01:37:40 +0000 (18:37 -0700)]
nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open
Commit b3d4c70dc60f added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.
This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.
Rick Macklem [Wed, 19 May 2021 21:52:56 +0000 (14:52 -0700)]
nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server. For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs. The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply. However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.
This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.
This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.
Jessica Clarke [Fri, 28 May 2021 18:07:17 +0000 (19:07 +0100)]
aic7xxx: Fix re-building firmware with -fno-common
The generated C output for aicasm_scan.l defines yylineno already, so
references to it from other files should use an extern declaration.
The STAILQ_HEAD use in aicasm_symbol.h also provided an identifier,
causing it to both define the struct type and define a variable of that
struct type, causing any C file including the header to define the same
variable. This variable is not used (and confusingly clashes with a
field name just below) and was likely caused by confusion when switching
between defining fields using similar type macros and defining the type
itself.
Kristof Provost [Thu, 27 May 2021 09:28:36 +0000 (11:28 +0200)]
libpfctl: fix memory leak
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string.
Rick Macklem [Wed, 19 May 2021 21:52:56 +0000 (14:52 -0700)]
nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server. For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs. The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply. However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.
This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.
This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.
Lutz Donnerhacke [Sun, 23 May 2021 12:43:00 +0000 (14:43 +0200)]
tests/libalias: Test LibAliasIn and redirection
Rework the tests to check the correct layer in a single test. Factor
out tests for reuse in other modules. Extend the test suite for
libalias(3) to incoming connections. Test the various types of
redirections.
gettimeofday(3) is almost as expensive as the calls to libalias.
So the call frequency for this call is reduced by a factor of 1000 in
order to neglect it's influence.
Using NAT entries became more realistic: A communication of a random
length of up to 150 packets (10% outgoing, 90% incoming) is applied
for each entry.
Add port forwardings to the performance tests. This will cause random
incoming packets to match the random port forwardings opends beforehand.
After a long test run, a lot of ressouces have been allocated.
Measure the time tot free them.
Dmitry Chagin [Wed, 19 May 2021 21:08:25 +0000 (00:08 +0300)]
tcsh: cleanup source tree to reduce diff size.
Remove makefiles, configure files and unused at build time files
to reduce the diff size. Otherwise the diff contains a lot of
unnecessary lines what makes reviewing and merging proccess so hard,
especially for re@.
Kristof Provost [Tue, 18 May 2021 07:24:50 +0000 (09:24 +0200)]
pf: Move nvlist conversion functions to pf_nv
Separate the conversion functions (between kernel structs and nvlists)
to pf_nv. This reduces the size of pf_ioctl.c, which is already quite
large and complex, a good bit. It also keeps all the fairly
straightforward conversion code together.
Rick Macklem [Tue, 18 May 2021 22:53:54 +0000 (15:53 -0700)]
nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.
This patch only affects mounts from Linux clients when delegations
are enabled on the server.
Rick Macklem [Tue, 18 May 2021 23:17:58 +0000 (16:17 -0700)]
nfsd: Reduce the callback timeout to 800msec
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4sec.
This patch redefines the constant as being in milliseconds and it
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1second.
This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.
tcp: Use local CC data only in the correct context
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
Rick Macklem [Sun, 16 May 2021 23:40:01 +0000 (16:40 -0700)]
NFSv4 server: Re-establish the delegation recall timeout
Commit 7a606f280a3e allowed the server to do retries of CB_RECALL
callbacks every couple of seconds. This was needed to allow the
Linux client to re-establish the back channel.
However this patch broke the delegation timeout check, such that
it would just keep retrying CB_RECALLS.
If the client has crashed or been network patitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.
This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.
This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.
Lutz Donnerhacke [Fri, 14 May 2021 13:08:08 +0000 (15:08 +0200)]
libalias: Style cleanup
libalias is a convolut of various coding styles modified by a series
of different editors enforcing interesting convetions on spacing and
comments.
This patch is a baseline to start with a perfomance rework of
libalias. Upcoming patches should be focus on the code, not on the
style. That's why most annoying style errors should be fixed
beforehand.
Alex Richardson [Tue, 19 Jan 2021 11:32:32 +0000 (11:32 +0000)]
libalias: Fix -Wcast-align compiler warnings
This fixes -Wcast-align warnings caused by the underaligned `struct ip`.
This also silences them in the public functions by changing the function
signature from char * to void *. This is source and binary compatible and
avoids the -Wcast-align warning.
Lutz Donnerhacke [Sun, 16 May 2021 21:37:37 +0000 (23:37 +0200)]
test/libalias: Tests for instantiation and outgoing NAT
In order to modify libalias for performance, the existing
functionality must not change. Enforce this.
Testing LibAliasOut functionality. This concentrates the typical use
case of initiating data transfers from the inside. Provide a
exhaustive test for the data structure in order to check for
performance improvements.
In order to compare upcoming changes for their effectivness, measure
performance by counting opertions and the runtime of each operation
over the time. Accumulate all tests in a single instance, so make it
complicated over the time. If you wait long enough, you will notice
the expiry of old flows.
Mark Johnston [Fri, 28 May 2021 14:41:43 +0000 (10:41 -0400)]
libradius: Fix attribute length validation in rad_get_attr(3)
The length of the attribute header needs to be excluded when comparing
the attribute length against the length of the packet. Otherwise,
validation may incorrectly fail when fetching the final attribute in a
message.
Fixes: 8d5c78130 ("libradius: Fix input validation bugs")
Reported by: Peter Eriksson
Tested by: Peter Eriksson
Sponsored by: The FreeBSD Foundation
Kevin Bowling [Fri, 16 Apr 2021 23:15:50 +0000 (16:15 -0700)]
e1000: Correct promisc multicast filter handling
There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.
The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.
Kevin Bowling [Wed, 21 Apr 2021 02:35:14 +0000 (19:35 -0700)]
ixgbe: Improve device name strings
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts.
Kevin Bowling [Sun, 25 Apr 2021 08:22:23 +0000 (01:22 -0700)]
e1000: Rework em_msi_link interrupt filter
* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.
In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.
82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.
This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.
Kevin Bowling [Wed, 21 Apr 2021 05:27:48 +0000 (22:27 -0700)]
e1000: Improve device name strings
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)
Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872
Major improvement to build parallelism for googletest internal tests
Currently the googletest internal tests build after the matching library.
However, each of these is serialized at the top level makefile.
Additionally some of the tests (e.g. the gmock-matches-test) take up to
90 seconds to build with clang -O2. Having to wait for this test to
complete before continuing to the next directory seriously slows down the
parllelism of a -j32 build.
Before this change running `make -C lib/googletest -j32 -s` in buildenv
took 202 seconds, now it's 153 due to improved parallelism.
Reviewed By: emaste (no objection)
Differential Revision: https://reviews.freebsd.org/D26748
Significantly reduce compile time for googletest internal tests
Clang's optimizer spends a really long time on these tests at -O2, so we now
use -O0 instead. This reduces the -j32 time for lib/googletest/test from 131s
to 29s. Using -O0 also reduces the disk usage from 144MB (at -O2) / 92MB (at
-O1) to 82MB.
Reviewed By: ngie, dim
Differential Revision: https://reviews.freebsd.org/D26751
Don Morris [Thu, 20 May 2021 14:54:38 +0000 (10:54 -0400)]
ufs: Avoid M_WAITOK allocations when building a dirhash
At this point the directory's vnode lock is held, so blocking while
waiting for free pages makes the system more susceptible to deadlock in
low memory conditions. This is particularly problematic on NUMA systems
as UMA currently implements a strict first-touch policy.
ufsdirhash_build() already uses M_NOWAIT for other allocations and
already handled failures for the block array allocation, so just convert
to M_NOWAIT.
Lutz Donnerhacke [Thu, 11 Feb 2021 22:59:11 +0000 (23:59 +0100)]
netgraph/ng_bridge: Avoid cache thrashing
Hint the compiler, that this update is needed at most once per second.
Only in this case the memory line needs to be written. This will
reduce the amount of cache trashing during forward of most frames.
Lutz Donnerhacke [Tue, 12 Jan 2021 21:28:29 +0000 (22:28 +0100)]
netgraph/ng_bridge: become SMP aware
The node ng_bridge underwent a lot of changes in the last few months.
All those steps were necessary to distinguish between structure
modifying and read-only data transport paths. Now it's done, the node
can perform frame forwarding on multiple cores in parallel.
Use the new control message to move ethernet addresses from a link to
a new link in ng_bridge(4). Send this message instead of doing the
work directly requires to move the loop detection into the control
message processing. This will delay the loop detection by a few
frames.
This decouples the read-only activity from the modification under a
more strict writer lock.
netgraph/ng_bridge: learn MACs via control message
Add a new control message to move ethernet addresses to a given link
in ng_bridge(4). Send this message instead of doing the work directly.
This decouples the read-only activity from the modification under a
more strict writer lock.
Decoupling the work is a prerequisite for multithreaded operation.