tcp: Tidying up the conditionals for unwinding a spurious RTO
- Use the semantically correct TSTMP_xx macro when comparing
timestamps. (No functional change)
- check for bad retransmits only when TSopt is present in ACK
(don't assume there will be a valid TSopt in the TCP options struct)
- exclude tsecr == 0, since that most likely indicates an
invalid ts echo return (tsecr) value.
Reviewed By: tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34062
tcp: Rewind erraneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.
Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
Kyle Evans [Thu, 27 Jan 2022 18:02:17 +0000 (12:02 -0600)]
cp: fix some cases with infinite recursion
As noted in the PR, cp -R has some surprising behavior. Typically, when
you `cp -R foo bar` where both foo and bar exist, foo is cleanly copied
to foo/bar. When you `cp -R foo foo` (where foo clearly exists), cp(1)
goes a little off the rails as it creates foo/foo, then discovers that
and creates foo/foo/foo, so on and so forth, until it eventually fails.
POSIX doesn't seem to disallow this behavior, but it isn't very useful.
GNU cp(1) will detect the recursion and squash it, but emit a message in
the process that it has done so.
This change seemingly follows the GNU behavior, but it currently doesn't
warn about the situation -- the author feels that the final product is
about what one might expect from doing this and thus, doesn't need a
warning. The author doesn't feel strongly about this.
Gleb Smirnoff [Thu, 27 Jan 2022 17:41:31 +0000 (09:41 -0800)]
rtsock: always set m_pkthdr.rcvif when queueing on netisr
netisr uses global workstreams and after dequeueing an mbuf it
uses rcvif to get the VNET of the mbuf. Of course, this is not
needed when kernel is compiled without VIMAGE. It came out that
routing socket does not set rcvif if compiled without VIMAGE.
Make this assignment not depending on VIMAGE option.
Tom Jones [Thu, 27 Jan 2022 17:24:45 +0000 (17:24 +0000)]
Remove SMALL conditionals from gzip
gzip has SMALL conditionals which enable building a reduced size version
of the binary. These exist as part of the introduction of BSD licensed
gzip in 2004 in NetBSD and appear to have been required to reach a size
for inclusion in their install media. For more information see commits
to gzip in the NetBSD tree on the 28th of March 2004.
SMALL doesn't appear to be hooked up to our build system and
complicates gzip quite a bit.
This test was written because execvp was found to improperly handle the
argc == 0 case when it falls back from an ENOEXEC. We could probably
mostly revert it now, but let's just fix the test for the time being and
circle back later to decide if we want to simplify execvp. The test
will likely remain either way just to make sure execvp isn't working
around the newly enforced restriction with the fallback.
Fixes: 301cb491ea41 ("execvp: fix up the ENOEXEC fallback")
Reported by: jenkins via lwhsu@
Tom Jones [Thu, 27 Jan 2022 17:10:06 +0000 (17:10 +0000)]
bsdinstall: Add quotes around error message argument
When error is called with a message with spaces (and probably multiple
lines) these are passed into dialog unquoted and an error message was
presented, wrap with quotes.
Andriy Gapon [Thu, 27 Jan 2022 16:49:27 +0000 (18:49 +0200)]
mmc_da: create disk(9) for pre-2.0 SD cards
It does not look like there is anything in mmc_da code that actually
requires protocol 2.0 or later. dev/mmc code also does not have such a
restriction.
Tested with a very old 2GB mini-SD card. Prior to this change mmc_da
would claim the card but would not expose it to GEOM.
Without MMCCAM:
mmc0: <MMC/SD bus> on sdhci_pci0
mmc0: Probing bus
mmc0: SD probe: OK (OCR: 0x00ff8000)
mmc0: Current OCR: 0x00ff8000
mmc0: CMD8 failed, RESULT: 1
mmc0: Probing cards
mmc0: New card detected (CID 1c53565344432020100002982e007600)
mmc0: New card detected (CSD 005e00325f5a83d02db7ffbf96800000)
mmc0: Card at relative address 0xb368 added:
mmc0: card: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV
mmc0: quirks: 0
mmc0: bus: 4bit, 50MHz (high speed timing)
mmc0: memory: 3998720 blocks, erase sector 256 blocks
mmc0: setting transfer rate to 50.000MHz (high speed timing)
GEOM: new disk mmcsd0
mmcsd0: 2GB <SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV> at mmc0 50.0MHz/4bit/65535-block
mmc0: setting bus width to 4 bits high speed timing
With MMCCAM and this change:
sdda0 at sdhci_slot0 bus 0 scbus2 target 0 lun 0
sdda0: Relative addr: 0000b368
Card features: <Memory>
sdda0: Serial Number 0002982E
sdda0: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV
GEOM: new disk sdda0
Mateusz Guzik [Thu, 27 Jan 2022 16:24:32 +0000 (16:24 +0000)]
Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1")
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.
Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.
Andrew Gallatin [Thu, 27 Jan 2022 15:35:03 +0000 (10:35 -0500)]
tcp: fix leaks in tcp_chg_pacing_rate error paths
tcp_chg_pacing_rate() is expected to release the hw rate limit table,
but failed to do so in several error cases, leading to ever
increasing counts of flows using the rate.
Andrew Gallatin [Thu, 27 Jan 2022 15:28:15 +0000 (10:28 -0500)]
Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues
When ip_output_send() returns EAGAIN due to issues with send tags (route
change, lagg failover, etc), it must free the mbuf. This is because
ip_output_send() was written as a wrapper/replacement for a direct
call to if_output(), and the contract with if_output() has
historically been that it owns the mbufs once called. When
ip_output_send() failed to free mbufs, it violated this assumption
and lead to leaked mbufs.
This was noticed when using NIC TLS in combination with hardware
rate-limited connections. When seeing lots of NIC output drops
triggered ratelimit send tag changes, we noticed we were leaking
ktls_sessions, send tags and mbufs. This was due ip_output_send()
leaking mbufs which held references to ktls_sessions, which in
turn held references to send tags.
Many thanks to jbh, rrs, hselasky and markj for their help in
debugging this.
Mark Johnston [Thu, 27 Jan 2022 14:53:02 +0000 (09:53 -0500)]
shsec: Allocate data blocks only for BIO_READ/WRITE requests
In particular, there is no need to allocate a data block when passing
BIO_FLUSH requests to child providers, and g_io_request() asserts that
bp->bio_data == NULL for such requests.
PR: 255131
Reported and tested by: nvass@gmx.com
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Andrew Turner [Mon, 24 Jan 2022 11:24:17 +0000 (11:24 +0000)]
Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
Andriy Gapon [Thu, 27 Jan 2022 10:49:04 +0000 (12:49 +0200)]
g_mirror: don't fail reads while losing next-to-last disk
I observed a situation where some read requests failed when a 2-way geom
mirror lost one disk. The problem appears to be in the logic that skips
retrying a failed request when a mirror has only one active disk.
Generally, that makes sense. But during a transition from two disks to
one it is possible that the request failed on the failing disk before it
was inactivated and, so, the remaining active disk is the disk that
should be tried.
This change adds an additional check to ensure that it was the (only)
active disk that was already tried.
Kristof Provost [Thu, 27 Jan 2022 09:16:21 +0000 (10:16 +0100)]
netpfil tests: re-enable dummynet tests
These had been disabled due to panics with queued packets keeping
pointers (in m->m_pkthdr.rcvif) to removed interfaces.
This issue has been resolved in 165746f4e4, so the tests can be run
again.
Kristof Provost [Fri, 19 Nov 2021 22:05:48 +0000 (23:05 +0100)]
netpfil tests: test removing interfaces with pending dummynet packets
Dummynet queues packets with an associated struct ifnet pointer. Ensure
that things do not explode if that interface goes away with packets
still in the queue.
Kristof Provost [Fri, 21 Jan 2022 16:50:15 +0000 (17:50 +0100)]
libpfctl: fix creatorid endianness
We provide the hostid (which is the state creatorid) to the kernel as a
big endian number (see pfctl/pfctl.c pfctl_set_hostid()), so convert it
back to system endianness when we get it from the kernel.
This avoids a confusing mismatch between the value the user configures
and the value displayed in the state.
Gleb Smirnoff [Thu, 27 Jan 2022 05:58:50 +0000 (21:58 -0800)]
dummynet: use m_rcvif_serialize/restore when queueing packets
This fixed panic with interface being removed while packet
was sitting on a queue. This allows to pass all dummynet
tests including forthcoming dummynet:ipfw_interface_removal
and dummynet:pf_interface_removal and demonstrates use of
m_rcvif_serialize() and m_rcvif_restore().
Gleb Smirnoff [Thu, 27 Jan 2022 05:58:44 +0000 (21:58 -0800)]
ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it. For lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.
API wise the only change is that now minimum interface index can be
greater than 1. The holes in interface indexes were always allowed.
Li-Wen Hsu [Wed, 26 Jan 2022 23:11:17 +0000 (07:11 +0800)]
Fix test of ses(4) when there is no SES device exists
glob(3) returns GLOB_NOMATCH if GLOB_NOCHECK or GLOB_NOMAGIC flag is not
passed so ATF_REQUIRE_EQ(r, 0) will cause a precondition check failure if no
/dev/ses* exists.
Remove calling of atf_tc_skip() in ATF_TC_CLEANUP() because it would let
the clean up procedure unfinish.
Kyle Evans [Wed, 26 Jan 2022 01:22:03 +0000 (19:22 -0600)]
tests: add a basic test for argc == 0
The kernel should reject such exec()s now, early on. Instead of adding
the needed boilerplate to write a test in C, just add an -n argument for
"(n)ull argv" to the execve helper and exec this other helper that just
exits silently with argv count.
Kyle Evans [Tue, 25 Jan 2022 22:47:23 +0000 (16:47 -0600)]
execve: disallow argc == 0
The manpage has contained the following verbiage on the matter for just
under 31 years:
"At least one argument must be present in the array"
Previous to this version, it had been prefaced with the weakening phrase
"By convention."
Carry through and document it the rest of the way. Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it. Toss back EINVAL if we ended
up not copying in any args for *execve().
The manpage change can be considered "Obtained from: OpenBSD"
limit sort(1) memory usage to 20% of available main memory
By default BSD sort(1) uses 90% (or at least 50%?) of the available
main memory. That is good for performance for a single job, but not
for a shared OS. For a long running script the performance is less
important than the stability of the server. Also, if a server
with 64GB RAM starts swapping, the performance goes south and
hurts other running applications.
Note: this change does not affect the weekly cron job to
rebuild the locate database. The FreeBSD locate.updatedb
use the -presort option (find -s)
Ryan Moeller [Wed, 26 Jan 2022 18:33:23 +0000 (18:33 +0000)]
zfs: Fix zvol_cdev_open locking
First open locking changes were correctly applied to zvol_geom_open but
incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be
held indefinitely.
Make the first open locking in zvol_cdev_open match zvol_geom_open.
This change has been accepted upstream in openzfs/zfs#13016 but is not
yet merged.
Reviewed by: mav
Fixes: e92ffd9b6268
Sponsored by: iXsystems, Inc.
This fixes an integer overflow for very large partitions around 35 billion
filenames (>2PB). However, in an artificially worst case it may occurs
by only 17 mio filenames on a partition.
Add definitions for TLS receive tags using the existing send tag infrastructure.
Although send tags are strictly used for transmit, the name might be changed
in the future to be more generic.
The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any
VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the
network driver itself when creating the TLS RX decryption offload filter.
TLS receive tags have a modify callback to tell the network driver about
the progress of decryption. Currently decryption is done IP packet by IP
packet, even if the IP packet contains a partial TLS record. The modify
callback allows the network driver to keep track of TCP sequence numbers
pointing to the beginning of TLS records after TCP packet reassembly.
These callbacks only happen when encrypted or partially decrypted data is
received and are used to verify the decryptions starting point for the
hardware. Typically the hardware will guess where TLS headers start and
needs help from the software to know if the guess was correct. This is
the purpose of the modify callback.
This flag allows a full text search on man pages. Although this is a last resort
option, it can be useful to pin point a certain man page.
It can be used with -S to narrow the search.
Unlike the Linux version, the search takes place in the rendered text so it
avoids false-positives when the text is found in comments in the source files.
It relies on `grep(1)` and `mandoc(1)` to do its job.
Add flag documentation and EXAMPLES to the manual page (bump .Dd).
Andriy Gapon [Sat, 27 Nov 2021 18:49:08 +0000 (20:49 +0200)]
cam_get_device: resolve path links before parsing device name
The CAM subsystem uses bus:taget:lun tuple to address peripherals. But
for convenience many userland programs such as camcontrol accept devices
names such as da0. There is a libcam function, cam_open_device, to
support that. It first calls cam_get_device() to parse the device name
as a driver name and a unit (and handle some special device name
prefixes) and then uses cam_lookup_pass() to find a matching pass
device.
This change extends cam_get_device() to apply realpath(3) to the device
name before parsing it. This will allow to use tools such as camcontrol
and smartctl with symbolic links that could be friendlier (more
distinguished) names for devices.
Warner Losh [Tue, 25 Jan 2022 23:23:03 +0000 (16:23 -0700)]
mpr/mps: Fix a race in diagnostic reset
There's a small race in freezing the simq when performing a diagnostic
reset. During this time, a transaction can slip through and encounter
the target id of 0. If we're still in diagnostic reset when we detect
this, return a CAM_DEVICE_NOT_THERE status. Instead, freeze the queue
and return a requeue status, similar to what we do when we're resetting
a target and a transaction get here. The race is unavoidable due to
separate locks for queue and SIM, but easy enough to detect and make
harmless.
Robert Wing [Tue, 25 Jan 2022 16:44:13 +0000 (07:44 -0900)]
bhyve/block_if: allow DIOCGMEDIASIZE ioctl
This is needed to get mediasize of the device after a resize event.
I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which
disables capsicum.
Reviewed by: khng, markj Fixes: ae9ea22e14bf ("bhyve: get mediasize for character devices when ...")
Differential Revision: https://reviews.freebsd.org/D34013
__sflush(): on write error, if nothing was written, reset FILE state back
otherwise the data is just dropped. Check for current position equal to
the buffer base at the entry of the function; if not equal, setvbuf()
was done from the write method and it is not our business to override
the decision.
PR: 76398
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34007
Emmanuel Vadot [Thu, 4 Nov 2021 09:42:37 +0000 (10:42 +0100)]
linuxkpi: Add i2c support
Add i2c support to linuxkpi. This is needed by drm-kmod.
For every i2c_adapter added by i2c_add_adapter we add a child to the
device named "lkpi_iic". This child handle the conversion between
Linux i2c_msgs to FreeBSD iic_msgs.
For every i2c_adapter added by i2c_bit_add_bus we add a child to the
device named "lkpi_iicbb". This child handle the conversion between
Linux i2c_msgs to FreeBSD iic_msgs.
With the help of iic(4), this expose the i2c controller to userspace
allowing a user to query DDC information from a monitor.
e.g.: i2c -f /dev/iic0 -a 0x28 -c 128 -d r
will query the standard EDID from the monitor if plugged.
The bitbang part (lkpi_iicbb) isn't tested at all for now as I don't have
compatible hardware (all my hardware have native i2c controller).
Tested on: Intel (SandyBridge, Skylake, ApolloLake)
Tested on: AMD (Picasso, Polaris (amd64 and arm64))
Don't emit warnings; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar with how we handle seccomp
in linux_prctl().
We have the authorization from the University of California to remove
the advertising clause for a while, wosch@ who also hold a copyright
on this code also approved the relicensing
Gleb Smirnoff [Tue, 25 Jan 2022 05:08:03 +0000 (21:08 -0800)]
tests/net*: destroy interface from inside a jail
There is no guarentee that upon return of 'jail -r' all jail resources
will be released. The test suite used to rely on that. Recent changes
to the PCB zones made jails delay releasing their resources, which ended
with interface leak in the test suite.
Fix that by executing 'ifconfig foo0 destroy' inside the jail, instead
of doing 'jail -r' and expecting interfaces to pop up back immediately
in the parent jail.
Gleb Smirnoff [Tue, 25 Jan 2022 05:07:16 +0000 (21:07 -0800)]
if_clone: correctly destroy a clone from a different vnet
Try to live with cruel reality fact - if_vmove doesn't move an
interface from previous vnet cloning infrastructure to the new
one. Let's admit this as design feature and make it work better.
* Delete two blocks of code that would fallback to vnet0, if a
cloner isn't found. They didn't do any good job and also whole
idea of treating vnet0 as special one is wrong.
* When deleting a cloned interface, lookup its cloner using it's
home vnet.
Gleb Smirnoff [Tue, 25 Jan 2022 05:06:59 +0000 (21:06 -0800)]
if_vmove: improve restoration in cloner's ifgroup membership
* Do a single call into if_clone.c instead of two. The cloner
can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
exists in the new vnet.
nd6: use CARP link level address in SLLAO for NS sent out
When sending an NS, check if we are using a IPv6 CARP address
and if we do, then put proper CARP link level address into
ND_OPT_SOURCE_LINKADDR option and also put PACKET_TAG_CARP tag
on the packet. The latter will enforce CARP link level address
at the data link layer too, which might be necessary for broken
implementations.
The code really follows what NA sending code has been doing since
introduction of carp(4). While here, bring to style(9) the whole
block of code.
Eric Joyner [Thu, 29 Jul 2021 23:24:14 +0000 (16:24 -0700)]
iflib: Allow drivers to determine which queue to TX on
Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.
Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.
(This motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)