nullfs: dirty v_object must imply the need for inactivation
Otherwise pages are cleaned some time later when the lower fs decides
that it is time to do it. This mostly manifests itself as delayed
mtime update, e.g. breaking make-like programs.
Reported by: mav
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There is no need to own vnode interlock, since v_object is type stable
and can only change to/from NULL, and no other checks in the function
access fields protected by the interlock. Remove the need variable, the
result of the test is directly usable as return value.
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
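A minimal userspace sketch of the resulting shape, not the nullfs code itself. The toy_* types and the want_recycle flag are hypothetical stand-ins; the point is only that a dirty object implies inactivation is needed and that the test result is returned directly, without a local "need" variable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's VM object and vnode. */
struct toy_object {
	bool dirty;			/* pages written but not yet cleaned */
};

struct toy_vnode {
	struct toy_object *v_object;	/* may be NULL */
	bool want_recycle;		/* assumed other reason to inactivate */
};

/*
 * Report whether inactivation is needed.  The result of the test is
 * used as the return value directly.
 */
static bool
toy_need_inactive(const struct toy_vnode *vp)
{
	return (vp->want_recycle ||
	    (vp->v_object != NULL && vp->v_object->dirty));
}
```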
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code. Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.
rack: honor prior socket buffer lock when doing the upcall
While partially reverting D24237 with D29690, due to introducing some
unintended effects for in-kernel TCP consumers, the preexisting lock
on the socket send buffer was not considered properly.
Found by: markj
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30390
Mark Johnston [Fri, 21 May 2021 21:44:46 +0000 (17:44 -0400)]
Fix handling of errors from pru_send(PRUS_NOTREADY)
PRUS_NOTREADY indicates that the caller has not yet populated the chain
with data, and so it is not ready for transmission. This is used by
sendfile (for async I/O) and KTLS (for encryption). In particular, if
pru_send returns an error, the caller is responsible for freeing the
chain since other implicit references to the data buffers exist.
For async sendfile, it happens that an error will only be returned if
the connection was dropped, in which case tcp_usr_ready() will handle
freeing the chain. But since KTLS can be used in conjunction with the
regular socket I/O system calls, many more error cases - which do not
result in the connection being dropped - are reachable. In these cases,
KTLS was effectively assuming success.
So:
- Change sosend_generic() to free the mbuf chain if
pru_send(PRUS_NOTREADY) fails. Nothing else owns a reference to the
chain at that point.
- Similarly, in vn_sendfile() change the !async I/O && KTLS case to free
the chain.
- If async I/O is still outstanding when pru_send fails in
vn_sendfile(), set an error in the sfio structure so that the
connection is aborted and the mbuf chain is freed.
Reviewed by: gallatin, tuexen
Discussed with: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30349
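The ownership rule for the sosend_generic() case can be sketched in userspace as follows. Everything here is hypothetical (toy_mbuf, toy_send_notready(), toy_sosend()); it only illustrates "if the send fails, the caller frees the chain", and is not the kernel implementation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified mbuf: a singly linked chain of buffers. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
};

static int toy_live_mbufs;	/* outstanding allocations, for checking */

static struct toy_mbuf *
toy_chain_alloc(int n)
{
	struct toy_mbuf *head = NULL, *m;

	while (n-- > 0) {
		m = malloc(sizeof(*m));
		if (m == NULL)
			break;
		m->m_next = head;
		head = m;
		toy_live_mbufs++;
	}
	return (head);
}

static void
toy_chain_free(struct toy_mbuf *m)
{
	struct toy_mbuf *next;

	for (; m != NULL; m = next) {
		next = m->m_next;
		free(m);
		toy_live_mbufs--;
	}
}

/*
 * Stand-in for pru_send(PRUS_NOTREADY).  On success the protocol takes
 * ownership of the chain (it is simply freed here); on failure the
 * chain is left untouched.
 */
static int
toy_send_notready(struct toy_mbuf *m, int fail)
{
	if (fail)
		return (ENOTCONN);
	toy_chain_free(m);
	return (0);
}

/* Caller side: nothing else owns the chain, so free it on error. */
static int
toy_sosend(int nbufs, int fail)
{
	struct toy_mbuf *m = toy_chain_alloc(nbufs);
	int error;

	error = toy_send_notready(m, fail);
	if (error != 0)
		toy_chain_free(m);
	return (error);
}
```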
Mark Johnston [Fri, 21 May 2021 21:44:40 +0000 (17:44 -0400)]
tcp: Make error handling in tcp_usr_send() more consistent
- Free the input mbuf in a single place instead of in every error path.
- Handle PRUS_NOTREADY consistently.
- Flush the socket's send buffer if an implicit connect fails. At that
point the mbuf has already been enqueued but we don't want to keep it
in the send buffer.
Reviewed by: gallatin, tuexen
Discussed with: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30349
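The "free in a single place" pattern from the first item, as a userspace sketch. toy_buf and toy_usr_send() are hypothetical; the error codes are only examples, not the ones tcp_usr_send() returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer that records what happened to it. */
struct toy_buf {
	int freed;	/* number of times the buffer was released */
	int queued;	/* set when ownership moved to the send queue */
};

static void
toy_free(struct toy_buf *m)
{
	m->freed++;
}

/*
 * Every error path jumps to "out", so the input buffer is released in
 * exactly one place.  On success the pointer is cleared because the
 * buffer now belongs to the (imaginary) send queue.
 */
static int
toy_usr_send(struct toy_buf *m, int connected, int cantsend)
{
	int error = 0;

	if (cantsend) {
		error = EPIPE;
		goto out;
	}
	if (!connected) {
		error = ENOTCONN;
		goto out;
	}
	m->queued = 1;
	m = NULL;
out:
	if (m != NULL)
		toy_free(m);
	return (error);
}
```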
Kirk McKusick [Fri, 21 May 2021 20:41:40 +0000 (13:41 -0700)]
Fix fsck_ufs segfaults with gjournal (SU+J)
The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c)
while attempting to traverse the buffer cache to flush dirty buffers.
The tail queue used for the buffer cache was not initialized before
dropping into gjournal_check(). Move the buffer initialization earlier
so that it has been done before calling gjournal_check().
Reported by: crypt47, nvass
Fix by: Robert Wing
Tested by: Robert Wing
PR: 255030
PR: 255979
MFC after: 3 days
Sponsored by: Netflix
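A small userspace illustration of the ordering requirement, using the queue(3) TAILQ macros. The toy_* names are hypothetical and this is not the fsck_ffs buffer cache; it only shows that the head is initialized before anything inserts into or walks the queue.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical cached buffer. */
struct toy_buf {
	int dirty;
	TAILQ_ENTRY(toy_buf) link;
};

TAILQ_HEAD(toy_bufhead, toy_buf);

static struct toy_bufhead toy_cache;

/* Must run before any other toy_cache_* call. */
static void
toy_cache_init(void)
{
	TAILQ_INIT(&toy_cache);
}

static void
toy_cache_add(struct toy_buf *bp)
{
	TAILQ_INSERT_TAIL(&toy_cache, bp, link);
}

/* Walk the cache, mark dirty buffers clean, return how many there were. */
static int
toy_cache_flush(void)
{
	struct toy_buf *bp;
	int n = 0;

	TAILQ_FOREACH(bp, &toy_cache, link) {
		if (bp->dirty) {
			bp->dirty = 0;
			n++;
		}
	}
	return (n);
}
```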
Emmanuel Vadot [Sun, 16 May 2021 14:21:43 +0000 (16:21 +0200)]
extres: regulator: Fix regulator_status for already enabled regulators
If a regulator hasn't been enabled by a driver but is enabled in hardware
(most likely by U-Boot), regulator_status will return that it
is enabled, and so any call to regulator_disable will panic as it wasn't
enabled by one of our drivers.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30293
mmccam: Add two new XPT for MMC and use them in mmc_sim and sdhci
For the discovery phase of SD/eMMC we need to do some transactions in an async
way.
The classic CAM XPT_{GET,SET}_TRAN_SETTING cannot be used in an async way.
This also allows us to split the discovery phase into a more complete state
machine, and we no longer mtx_sleep with an arbitrary value to wait for completion
of the tasks.
For mmc_sim we now do the SET_TRAN_SETTING in a taskqueue so we can call
the needed functions for regulators/clocks without the CAM lock(s). This
still needs to be done for sdhci.
We also now save the host OCR in the discovery phase; this wasn't done before and
only worked because the same ccb was reused.
Marcin Wojtas [Fri, 21 May 2021 09:29:22 +0000 (11:29 +0200)]
Disable stack gap for ntpd during build.
When starting, ntpd calls setrlimit(2) to limit the maximum size of its
stack. The stack limit chosen by ntpd is 200K, so when stack gap
is enabled, the stack gap is larger than this limit, which results
in ntpd crashing.
There is a window where threads are removed from the process list and where
the thread destructor is invoked. Catch that window by waiting for all
task_struct allocations to be returned before freeing the UMA zone in the
LinuxKPI. Else UMA may fail to release the zone due to concurrent access
and panic:
panic() - Bad link element prev->next != elm
zone_release()
bucket_drain()
bucket_free()
zone_dtor()
zone_free_item()
uma_zdestroy()
linux_current_uninit()
This failure can be triggered by loading and unloading the LinuxKPI module
in a loop:
while true
do
kldload linuxkpi
kldunload linuxkpi
done
Lutz Donnerhacke [Mon, 17 May 2021 21:49:31 +0000 (23:49 +0200)]
test/libalias: Tests for outgoing NAT
Testing LibAliasOut functionality. This concentrates on the typical use
case of initiating data transfers from the inside. Provide an
exhaustive test for the data structure in order to check for
performance improvements.
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.
Michael Tuexen [Fri, 21 May 2021 07:45:00 +0000 (09:45 +0200)]
tcp: Fix sending of TCP segments with IP level options
When bringing in TCP over UDP support in
https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605,
the length of IP level options was considered when locating the
transport header. This was incorrect and is fixed by this patch.
X-MFC with: https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605
MFC after: 3 days
Reviewed by: markj, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D30358
Wojciech Macek [Wed, 5 May 2021 03:28:56 +0000 (05:28 +0200)]
ip_mroute: refactor bw_meter API
The API should work as follows:
- periodically report Lower-or-EQual bandwidth (LEQ) connections
over kernel socket, if user application registered for such
per-flow notifications
- report Greater-or-EQual (GEQ) bandwidth as soon as it reaches
specified value in configured time window
Custom implementation of callouts was removed. There is no
point in doing a callout wheel here as generic callouts
do exactly the same. Performance is not critical
for such reporting, so the biggest concern should be
to have code which can be easily maintained.
This is preparation for a rework of the locking, which is currently highly inefficient.
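The two reporting rules can be sketched as below. This is an illustration under assumptions, not the ip_mroute code: the toy_meter layout, the byte-based threshold and the "report GEQ at most once per window" flag are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_mode { TOY_GEQ, TOY_LEQ };

/* Hypothetical per-flow meter. */
struct toy_meter {
	enum toy_mode mode;
	uint64_t threshold;	/* bytes per time window */
	uint64_t measured;	/* bytes seen in the current window */
	bool reported;		/* GEQ already reported in this window */
};

/* Account one packet; return true if a GEQ report is due right now. */
static bool
toy_meter_packet(struct toy_meter *m, uint64_t len)
{
	m->measured += len;
	if (m->mode == TOY_GEQ && !m->reported &&
	    m->measured >= m->threshold) {
		m->reported = true;
		return (true);
	}
	return (false);
}

/*
 * Periodic timer (a generic callout in the kernel): return true if an
 * LEQ report is due for the window that just ended, then start a new
 * window.
 */
static bool
toy_meter_tick(struct toy_meter *m)
{
	bool report;

	report = (m->mode == TOY_LEQ && m->measured <= m->threshold);
	m->measured = 0;
	m->reported = false;
	return (report);
}
```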
Rick Macklem [Fri, 21 May 2021 01:37:40 +0000 (18:37 -0700)]
nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open
Commit b3d4c70dc60f added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.
This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.
kldxref: do not error out if specified path is not directory, for -d mode
kldxref(8) is the only tool that can dump FreeBSD kernel module
metadata, with the -d option. But the command line requirements for that
are inconvenient, since the parser requires that argv[0] is a directory
containing the whole set of modules to generate the xref file.
For -d, allow argv[0] to be a regular file, so it is now possible to do e.g.
$ kldxref -d /boot/kernel/ufs.ko
to see only ufs.ko metadata.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30368
Warner Losh [Thu, 20 May 2021 17:26:46 +0000 (11:26 -0600)]
md5: portability fix -- include stdbool.h explicitly
stdbool.h needs to be included to use type bool variables. Due to
namespace pollution, this gets brought in on FreeBSD, but not on
other systems. Include it explicitly.
John Baldwin [Thu, 20 May 2021 16:59:11 +0000 (09:59 -0700)]
iscsi: Move the maximum data segment limits into 'struct icl_conn'.
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.
John Baldwin [Thu, 20 May 2021 16:58:59 +0000 (09:58 -0700)]
iscsi: Always free a cdw before its associated ctl_io.
cxgbei stores state about a target transfer in the ctl_private[] array
of a ctl_io that is freed when a target transfer (represented by the
cdw) is freed. As such, freeing a ctl_io before a cdw that references
it can result in a use after free in cxgbei. Two of the four places
freed the cdw first, and the other two freed the ctl_io first. Fix
the latter two places to free the cdw first.
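The ordering constraint, as a toy model. The toy_* structures are hypothetical and far simpler than a cdw or ctl_io; the "freed" flag stands in for memory that would really be gone.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the I/O and the transfer that uses it. */
struct toy_io {
	int freed;		/* set once the I/O has been released */
	int xfer_state;		/* per-transfer state kept inside the I/O */
};

struct toy_cdw {
	struct toy_io *io;	/* the I/O this transfer belongs to */
};

static void
toy_io_free(struct toy_io *io)
{
	io->freed = 1;
}

/*
 * Releasing the transfer touches state stored in the I/O.  Return -1 if
 * the I/O is already gone; in real code that would be a use after free.
 */
static int
toy_cdw_free(struct toy_cdw *cdw)
{
	if (cdw->io->freed)
		return (-1);
	cdw->io->xfer_state = 0;
	cdw->io = NULL;
	return (0);
}

/* Teardown in the safe order: the transfer first, then the I/O. */
static int
toy_teardown(struct toy_cdw *cdw)
{
	struct toy_io *io = cdw->io;
	int error;

	error = toy_cdw_free(cdw);
	toy_io_free(io);
	return (error);
}
```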
Don Morris [Thu, 20 May 2021 14:54:38 +0000 (10:54 -0400)]
ufs: Avoid M_WAITOK allocations when building a dirhash
At this point the directory's vnode lock is held, so blocking while
waiting for free pages makes the system more susceptible to deadlock in
low memory conditions. This is particularly problematic on NUMA systems
as UMA currently implements a strict first-touch policy.
ufsdirhash_build() already uses M_NOWAIT for other allocations and
already handled failures for the block array allocation, so just convert
to M_NOWAIT.
Kristof Provost [Tue, 18 May 2021 13:03:01 +0000 (15:03 +0200)]
pfctl: Fix crash on ALTQ configuration
The following config could crash pfctl:
altq on igb0 fairq bandwidth 1Gb queue { qLink }
queue qLink fairq(default)
That happens because when we're parsing the parent queue (on igb0) it
doesn't have a parent, and the check in eval_pfqueue_fairq() checks
pa->parent rather than parent.
Kristof Provost [Thu, 13 May 2021 07:51:28 +0000 (09:51 +0200)]
pf: Support killing floating states by interface
Floating states get assigned to interface 'all' (V_pfi_all), so when we
try to flush all states for an interface, states originally created
through this interface are not flushed. Only if-bound states can be
flushed in this way.
Given that we track the original interface we can check if the state's
interface is 'all', and if so compare to the orig_if instead.
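The comparison described above, as a self-contained sketch. toy_kif, toy_state and toy_all are hypothetical stand-ins for the pf structures and for V_pfi_all.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_kif {
	const char *name;
};

/* Stand-in for the 'all' pseudo-interface used by floating states. */
static struct toy_kif toy_all = { "all" };

struct toy_state {
	struct toy_kif *kif;		/* bound interface, or &toy_all */
	struct toy_kif *orig_kif;	/* interface that created the state */
};

/* Does this state belong to "kif" for the purpose of killing states? */
static bool
toy_state_on_if(const struct toy_state *s, const struct toy_kif *kif)
{
	if (s->kif == &toy_all)
		return (s->orig_kif == kif);
	return (s->kif == kif);
}
```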
Marcin Wojtas [Wed, 19 May 2021 15:27:42 +0000 (17:27 +0200)]
Rename ofwpci.c to ofw_pcib.c
It's a class0 driver that implements some pcib methods and creates
a pci bus as its children.
The "ofw_pci" name will be used by a new driver that will be a subclass
of the pci bus.
No functional changes intended.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30226
The new Mikrotik 10/25G NIC is mostly compatible with AR8151 hardware,
with a few exceptions:
* card supports only 32bit DMA operations
* card does not support write-one-to-clear semantics for interrupt status
register
* MDIO operations can take longer to complete
This patch adds support for Mikrotik 10/25G NIC to the alc driver
while maintaining support for all earlier HW.
This was tested on Intel i7-4790K system with Mikrotik 10/25G NIC.
This was tested on Intel i7-4790K system with RB44Ge (AR8151 based 4-port NIC)
to verify backwards compatibility.
Rick Macklem [Wed, 19 May 2021 21:52:56 +0000 (14:52 -0700)]
nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server. For NFSv4.0, the client will
receive an NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive an NFSERR_BADSESSION reply when the lease
has expired for these RPCs. The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply. However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.
This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.
This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.
Kirk McKusick [Wed, 19 May 2021 21:38:21 +0000 (14:38 -0700)]
Fix fsck_ffs Pass 1b error exit "bad inode number 256 to nextinode".
Pass 1b of fsck_ffs runs only when Pass 1 has found duplicate blocks.
Pass 1 only knows that a block is duplicate when it finds the second
instance of its use. The role of Pass 1b is to find the first use
of all the duplicate blocks. It makes a pass over the cylinder groups
looking for these blocks. When moving to the next cylinder group,
Pass 1b failed to properly calculate the starting inode number for
the cylinder group resulting in the above error message when it
tried to read the first inode in the cylinder group.
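For illustration only: on UFS the inodes of a cylinder group are numbered consecutively, so the first inode of group cg is cg times the number of inodes per group. The helper below is hypothetical and is not taken from fsck_ffs; the parameter names are illustrative.

```c
#include <stdint.h>

/*
 * First inode number of cylinder group "cg", given "ipg" inodes per
 * group.  The multiplication is done in 64 bits to avoid overflow.
 */
static uint64_t
toy_cg_first_inode(uint32_t cg, uint32_t ipg)
{
	return ((uint64_t)cg * ipg);
}
```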
Dmitry Chagin [Wed, 19 May 2021 21:08:25 +0000 (00:08 +0300)]
tcsh: cleanup source tree to reduce diff size.
Remove makefiles, configure files and files unused at build time
to reduce the diff size. Otherwise the diff contains a lot of
unnecessary lines, which makes the review and merge process hard,
especially for re@.
Warner Losh [Wed, 19 May 2021 17:26:20 +0000 (11:26 -0600)]
md5: Create md5sum, etc compatible programs
On Linux, there's a similar set of programs to ours, but that end in the
letters 'sum'. These act basically like FreeBSD versions run with the -r
option. Add code so that when the program name ends in 'sum' you get the
Linux -r behavior. This is enough to make most things that use sha*sum
work correctly (the -c / --check options, as well as the long args are
not implemented). When running with the -sum programs, ignore -t instead
of running internal speed tests and make -c an error.
Reviewed by: sef, and kp and allanjude (earlier version)
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30309
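One way to detect the 'sum' suffix, shown as a standalone sketch. toy_is_sum_name() is a hypothetical helper and the actual md5 code may test the name differently.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the last path component of "progname" ends in "sum",
 * e.g. "md5sum" or "/usr/bin/sha256sum".
 */
static bool
toy_is_sum_name(const char *progname)
{
	const char *base = strrchr(progname, '/');
	size_t len;

	base = (base != NULL) ? base + 1 : progname;
	len = strlen(base);
	return (len >= 3 && strcmp(base + len - 3, "sum") == 0);
}
```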
Eugene Grosbein [Wed, 19 May 2021 13:02:31 +0000 (20:02 +0700)]
rc.d: unbreak sysctl lastload
/etc/rc.d/securelevel is supposed to run /etc/rc.d/sysctl lastload
late at boot time to apply /etc/sysctl.conf settings that fail
to apply early. However, this does not work in default configuration
because of kern_securelevel_enable="NO" by default.
Add a new script, /etc/rc.d/sysctl_lastload, that starts unconditionally.
Bjoern A. Zeeb [Thu, 6 May 2021 14:25:52 +0000 (14:25 +0000)]
arm64: rockchip, implement the two rk805/808 clocks
While the xin32k clk was implemented in rk3399_cru as a fixed rate
clock, migrate it to rk805 as we will also need the 2nd clock
'rtc_clko_wifi' for SDIO and BT.
Both clocks remain fixed rate, and while the 1st one is always on
(though that is not expressed in the clk framework), the 2nd one
we can toggle on/off.
Reviewed-by: manu
Tested-by: manu
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26870
Rick Macklem [Tue, 18 May 2021 23:17:58 +0000 (16:17 -0700)]
nfsd: Reduce the callback timeout to 800msec
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1 second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4 seconds.
This patch redefines the constant as being in milliseconds and
sets it to 800 msec, to ensure the RPC
reply is sent in less than 1 second.
This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.
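Keeping the constant in milliseconds and converting at the point of use avoids mixing up ticks and seconds. A sketch of such a conversion, assuming a clock of hz ticks per second; TOY_CBTIMEOUT_MS and toy_ms_to_ticks() are hypothetical names, and how nfsd actually converts the value is not shown here.

```c
/* Callback timeout in milliseconds. */
#define TOY_CBTIMEOUT_MS	800

/*
 * Convert milliseconds to clock ticks for a clock running at "hz"
 * ticks per second, rounding up so a non-zero timeout never becomes 0.
 */
static int
toy_ms_to_ticks(int ms, int hz)
{
	return ((ms * hz + 999) / 1000);
}
```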
Rick Macklem [Tue, 18 May 2021 22:53:54 +0000 (15:53 -0700)]
nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.
This patch only affects mounts from Linux clients when delegations
are enabled on the server.
Markus Stoff [Tue, 18 May 2021 20:35:33 +0000 (22:35 +0200)]
ng_parse: IP address parsing in netgraph eating too many characters
Once the final component of the IP address has been parsed, the offset
on the input must not be advanced, as this would remove an unparsed
character from the input.
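A standalone sketch of the intended behaviour: only the '.' separators between components are skipped, and nothing is consumed after the final component. toy_parse_ipaddr() is hypothetical and is not the ng_parse code.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a dotted-quad IPv4 address at the start of "s".  On success,
 * store the four octets in "out" and return the number of characters
 * consumed; on failure return -1.
 */
static int
toy_parse_ipaddr(const char *s, uint8_t out[4])
{
	const char *p = s;
	char *end;
	unsigned long v;
	int i;

	for (i = 0; i < 4; i++) {
		if (*p < '0' || *p > '9')
			return (-1);
		v = strtoul(p, &end, 10);
		if (v > 255)
			return (-1);
		out[i] = (uint8_t)v;
		p = end;
		if (i < 3) {
			/* Skip the separator between two components. */
			if (*p != '.')
				return (-1);
			p++;
		}
		/* After the last component the offset is not advanced. */
	}
	return ((int)(p - s));
}
```

With the input "10.0.0.1}" the function consumes 8 characters and leaves the '}' for the caller.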
ttydev_write: prevent stops while terminal is busied
Since the busy state is checked by all blocked writes, stopping a process
which waits in ttydisc_write() causes a cascade. Use sigdeferstop()
to avoid the issue.