Alexander Motin [Wed, 27 Dec 2023 03:30:56 +0000 (22:30 -0500)]
Schedule fast taskqueue callouts on right CPU.
With fast taskqueues using direct callouts we can reduce number of
CPU wakeups by scheduling callout on current CPU if taskqueue calls
taskqueue_enqueue_timeout() on itself. The trick won't work for
regular taskqueues, since the callout thread will occupy the CPU.
It also may not work in case of multiple threads since we do not
know which thread will pick the task, and we do not want excessive
callout migrations. So we optimize only the other cases we can.
In practice this allows iichid(4) taskqueue to stay on CPU where
underlying ig4(4) interrupts are routed and to not kick CPU 0 with
timer interrupts on each sampling period (every 2nd/3rd sleep).
Lexi Winter [Thu, 4 Jan 2024 22:34:58 +0000 (22:34 +0000)]
mail: add volatile in grabh()
setjmp() requires that any stack variables modified between the setjmp
call and the longjmp() must be volatile. This means that 'saveint' in
grabh() must be volatile, since it's modified after the setjmp().
Otherwise, the signal handler is not properly restored, resulting in a
crash (SIGBUS) if ^C is typed twice while composing.
Colin Percival [Thu, 18 Jan 2024 18:30:03 +0000 (10:30 -0800)]
13.3: update stable/13 to -PRERELEASE
This marks the start of the FreeBSD 13.3 release cycle; the stable/13
tree is now in "code slush".
Developers are encouraged to prioritize fixing bugs (and/or merging bug
fixes from HEAD) over new features at this time. Commit approval from
re@ is not required but if new features introduce problems they may be
removed from the release.
Colin Percival [Wed, 27 Dec 2023 08:09:08 +0000 (00:09 -0800)]
x86: Adjust base addr for PCI MCFG regions
Each bus gets 1 MB of address space; the actual base address for an
MCFG bus range is the address from the table plus the starting bus
number times 1 MB.
The PCI spec is unclear on this point, but this change matches what
Linux does, which is likely enough of a de facto standard regardless
of what any de jure standard might attempt to say.
John Baldwin [Thu, 18 Jan 2024 23:19:11 +0000 (15:19 -0800)]
pci_cfgreg: Add shims to preserve ABI of pci_cfgreg(read|write)
This is a direct commit to stable/14 to preserve the ABI of the
the pci_cfgregread and pci_cfgregwrite functions. The new routines
are renamed to add a _domain suffix and macros map the new API to
the new functions.
Note: No API compatibility has been provided as modules in ports
should not be using this internal API (normal PCI drivers use
pci_read_config and pci_write_config with a device_t).
John Baldwin [Wed, 29 Nov 2023 18:31:47 +0000 (10:31 -0800)]
pci_cfgreg: Add a PCI domain argument to the low-level register API
This commit changes the API of pci_cfgreg(read|write) to add a domain
argument (referred to as a segment in ACPI parlance) (note that this
is not the same as a NUMA domain, but something PCI-specific). This
does not yet enable access to domains other than 0, but updates the
API to support domains.
Places that use hard-coded bus/slot/function addresses have been
updated to hardcode a domain of 0. A few places that have the PCI
domain (segment) available such as the acpi_pcib_acpi.c Host-PCI
bridge driver pass the PCI domain.
The hpt27xx(4) and hptnr(4) drivers fail to attach to a device not on
domain 0 since they provide APIs to their binary blobs that only
permit bus/slot/function addressing.
The x86 non-ACPI PCI bus drivers all hardcode a domain of 0 as they do
not support multiple domains.
John Baldwin [Thu, 28 Dec 2023 19:17:22 +0000 (11:17 -0800)]
mbuf.9: Document mtodo
mtodo() accepts an mbuf and offset and returns a void * pointer to the
requested offset into the mbuf's associated data. Similar to mtod(),
no bounds checking is performed.
John Baldwin [Tue, 9 Jan 2024 19:23:10 +0000 (11:23 -0800)]
acpi: Only reserve resources enumerated via _CRS
In particular, don't reserve resources added by drivers via other
means (e.g. acpi_bus_alloc_gas which calls bus_alloc_resource
right after adding the resource).
The intention of reserved resources is to ensure that a resource range
that a bus driver knows is assigned to a device is reserved by the
system even if no driver is attached to the device. This prevents
other "wildcard" resource requests from conflicting with these
resources. For ACPI, the only resources the bus driver knows about
for unattached devices are the resources returned from _CRS. All of
these resources are already reserved now via acpi_reserve_resources
called from acpi_probe_children.
As such, remove the logic from acpi_set_resource to try to reserve
resources when they are set. This permits RF_SHAREABLE to work with
acpi_bus_alloc_gas without requiring hacks like the current one for
CPU device resources in acpi_set_resource.
Reported by: gallatin (RF_SHAREABLE not working)
Diagnosed by: jrtc27
John Baldwin [Tue, 9 Jan 2024 18:57:48 +0000 (10:57 -0800)]
kldxref: Workaround incorrect PT_DYNAMIC in existing powerpc kernels
Existing powerpc kernels include additional sections beyond .dynamic
in the PT_DYNAMIC segment. Relax the requirement for an exact size
match of the section and segment for PowerPC files as a workaround.
Alex Richardson [Tue, 2 Jan 2024 19:06:51 +0000 (11:06 -0800)]
kldxref: fix bootstrapping on Linux with Clang 16
The glibc fts_open() callback type does not have the second const
qualifier and it appears that Clang 16 errors by default for mismatched
function pointer types. Add an ifdef to handle this case.
John Baldwin [Fri, 22 Dec 2023 15:49:40 +0000 (07:49 -0800)]
kldxref: Appease a Coverity warning
While parsing .dynamic, nsym is set when parsing the symbol table from
.dynsym. That parsing also sets ef->ef_symtab to a non-NULL value.
The value of nsym isn't validated until after a check for
ef->ef_symtab being NULL, so nsym always has a valid value when it is
read. However, that chain of events is a bit much for static analysis
to follow, so initialize nsym to 0 before parsing sections to quiet
the warning.
John Baldwin [Fri, 22 Dec 2023 15:49:18 +0000 (07:49 -0800)]
kldxref: Simplify handling of ELF object files
Unlike the backend for ELF DSOs, the object file backend allocated an
aligned chunk of memory and read all of the in-memory sections from
the file into this memory even though most of the file contents were
never used. Instead, just track a set of virtual addresses (based at
0) that each loaded section would be loaded at and only read the
necessary bits from the backing file when needed.
John Baldwin [Fri, 22 Dec 2023 15:49:03 +0000 (07:49 -0800)]
kldxref: Simplify elf_read_raw_data
Use pread as a valid offset is always passed now. Originally the DSO
code read the .hash section in two separate requests and relied on the
implicit offset for the second read, but now the hash table is fetched
in a single call.
Jessica Clarke [Thu, 14 Dec 2023 20:17:20 +0000 (20:17 +0000)]
kldxref: Reduce divergence between per-architecture files
Note that relbase is always 0 for DSOs so its omission for __KLD_SHARED
architectures was not a bug in practice.
Whilst here, also parenthesise the dest offset for where to avoid
transiently creating an out-of-bounds pointer, which is UB (though even
on CHERI architectures, where capability bounds compression can result
in that creating invalid capabilities that will trap on dereference,
optimisation will reassociate to the correct form in practice and thus
work just fine).
John Baldwin [Tue, 12 Dec 2023 23:43:00 +0000 (15:43 -0800)]
kldxref: Make use of libelf to be a portable cross tool
This allows kldxref to operate on kernel objects from any
architecture, not just the native architecture. In particular, this
will permit generating linker.hints files as part of a cross-arch
release build.
- elf.c is a new file that includes various wrappers around libelf
including routines to read ELF data structures such as program and
section headers and ELF relocations into the "generic" forms
described in <gelf.h>. This file also provides routines for
converting a linker set into an array of addresses (GElf_Addr)
as well as reading architecture-specific mod_* structures and
converting them into "generic" Gmod_* forms where pointers are
replaced with addresses.
- The various architecture-specific reloc handlers now use GElf_*
types for most values (including GElf_Rel and GElf_Rela for
relocation structures) and use routines from <sys/endian.h> to read
and write target values. A new linker set matches reloc handlers
to specific ELF (class, encoding, machine) tuples.
- The bits of kldxref.c that write out linker.hints now use the
encoding (ELFDATA2[LM]SB) of the first file encountered in a
directory to set the endianness of the output file. Input files
with a different architecture in the same directory are skipped with
a warning. In addition, the initial version record for the file
must be deferred until the first record is finished since the
architecture of the output file is not known until then.
- Various places that used 'sizeof(void *)' throughout now use
'elf_pointer_size()' to determine the size of a pointer in the
target architecture.
Tested by: amd64 binary on both amd64 and i386 /boot/kernel
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D42966
John Baldwin [Tue, 12 Dec 2023 23:30:16 +0000 (15:30 -0800)]
kldxref: Refactor PNP entry parsing, no functional change
- Add a free_pnp_list to complement parse_pnp_list. Add freeing
of 'new_desc' which was previously leaked.
- Move body of loop that checked a single pnp list element against a
table entry into a parse_pnp_entry function to reduce indentation
and split parse_entry into a smaller function.
- Similarly, split out a record_pnp_info function from parse_entry
which builds the pnp_list and walks a table.
Olivier Certner [Fri, 24 Nov 2023 21:21:16 +0000 (22:21 +0100)]
libthr: thr_attr.c: EINVAL, not ENOTSUP, on invalid arguments
On first read, POSIX may seem ambiguous about the return code for some
scheduling-related pthread functions on invalid arguments. But a more
thorough reading and a bit of standards archeology strongly suggests
that this case should be handled by EINVAL and that ENOTSUP is reserved
for implementations providing only part of the functionality required by
the POSIX option POSIX_PRIORITY_SCHEDULING (e.g., if an implementation
doesn't support SCHED_FIFO, it should return ENOTSUP on a call to, e.g.,
sched_setscheduler() with 'policy' SCHED_FIFO).
This reading is supported by the second sentence of the very definition
of ENOTSUP, as worded in CAE/XSI Issue 5 and POSIX Issue 6: "The
implementation does not support this feature of the Realtime Feature
Group.", and the fact that an additional ENOTSUP case was added to
pthread_setschedparam() in Issue 6, which introduces SCHED_SPORADIC,
saying that pthread_setschedparam() may return it when attempting to
dynamically switch to SCHED_SPORADIC on systems that doesn't support
that.
glibc, illumos and NetBSD also support that reading by always returning
EINVAL, and OpenBSD as well, since it always returns EINVAL but the
corresponding code has a comment suggesting returning ENOTSUP for
SCHED_FIFO and SCHED_RR, which it effectively doesn't support.
Additionally, always returning EINVAL fixes inconsistencies where EINVAL
would be returned on some out-of-range values and ENOTSUP on others.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43006
Marius Strobl [Fri, 12 Jan 2024 22:27:07 +0000 (23:27 +0100)]
uart(4): Honor hardware state of NS8250-class for tsw_busy
In 9750d9e5, I brought the equivalent of the TS_BUSY flag back in a
mostly hardware-agnostic way in order to fix tty_drain() and, thus,
TIOCDRAIN for UARTs with TX FIFOs. This proved to be sufficient for
fixing the regression reported. So in light of the release cycle of
FreeBSD 10.3, I decided that this change was be good enough for the
time being and opted to go with the smallest possible yet generic
(for all UARTs driven by uart(4)) solution addressing the problem at
hand.
However, at least for the NS8250-class the above isn't a complete
fix as these UARTs only trigger an interrupt when the TX FIFO became
empty. At this point, there still can be an outstanding character
left in the transmit shift register as indicated via the LSR. Thus,
this change adds the 3rd (besides the tty(4) and generic uart(4) bits)
part I had in my tree ever since, adding a uart_txbusy method to be
queried in addition for tsw_busy and hooking it up as appropriate
for the NS8250-class.
As it turns out, the exact equivalent of this 3rd part later on was
implemented for uftdi(4) in 9ad221a5.
While at it, explain the rational behind the deliberately missing
locking in uart_tty_busy() (also applying to the generic sc_txbusy
testing already present).
Marius Strobl [Tue, 9 Jan 2024 22:01:46 +0000 (23:01 +0100)]
igb(4): Remove disconnected SYSCTL
The global hw.igb.rx_process_limit knob never was adhered to by the
in-tree version of this driver but similar functionality is available
via the device-specific dev.igb.N.iflib.rx_budget.
While at it, remove the - besides initialization of tx_process_limit -
unused {r,t}x_process_limit members.
Stefan Eßer [Sat, 29 Jul 2023 18:52:53 +0000 (20:52 +0200)]
usr.bin/gh-bc: fix Makefile for WITHOUT_NLS_CATALOGS case
Some macro definitions had been moved into a Makefile section
that depends on MK_NLS_CATALOGS != "no", leading to LTO and the
installation of tests being disabled in the WITHOUT_NLS_CATALOGS
case.
This update fixes a bug where line breaks in printed numbers may not
match the line length set by the user. The value is printed correctly,
just not split as specified in some situations.
Output a line as soon as it is possible to determine that it will have
to be output. For the basic case, this means output each line as it is
read unless it is identical to the previous one. For the -d case, it
means output the first instance as soon as the second is read, unless
the -c option was also given. The -D and -u cases were already fine.
Add test cases for interactive use with no options and with -d.
Daniel Tameling [Sat, 25 Feb 2023 17:25:51 +0000 (10:25 -0700)]
uniq(1): use strtonum to parse options
Previously strtol was used and the result was directly cast to an int
without checking for an overflow. Use strtonum instead since it is
safer and tells us what went wrong.
The standard is somewhat unclear, but on the balance, I believe that the
phrase “the rest of the input line” should be interpreted to mean the
rest of the input line including the terminating newline if and only if
there is one. This means the current implementation is incorrect on two
points:
- First, it suppresses the previous line's newline in the '1' case.
- Second, it unconditionally emits a newline at the end of the output
for non-empty input, even if the input did not end with a newline.
Resolve this by rewriting the main loop. Instead of special-casing the
first line and then assuming that every line ends with a newline, we
remember how each line ends and emit that either at the beginning of
the next line or at the end of the file except in the one case ('+')
where the standard explicitly says not to.
While here, try to reduce diff to upstream a little and update their
RCS tag to reflect the fact that while we've diverged significantly
from them, we've incorporated all their changes. Remove the useless
second RCS tag.
We also update the tests to account for the change in interpretation
of the '1' case and add a test case for unterminated input.
Rewrite `copy_file()` so the lflag and sflag are handled as early as
possible instead of constantly checking that they're not set and then
handling them at the end. This also opens the door to changing the
failure logic at some future point (for instance, we might decide to
fall back to copying if `errno` indicates that the file system does not
support links).
This test case tests two different things: first, that copying a symlink
results in a file with the same contents as the target of the symlink,
rather than a second symlink, and second, that cp will refuse to copy a
file to itself, or to a link to itself, or a link to its target. Leave
the first part in basic_symlink, move the second part to a new test case
named samefile, and slightly expand both cases.
- The HLPR flags are grouped together at the beginning because they are
the standard flags for programs using FTS. Move the N flag out from
among them to its correct place in the sequence.
- The Pflag variable isn't used outside main(), but moving it out lets
us skip initialization and keeps it with its friends H, L and R.
If the destination file exists but we decide unlink it, set the dne
flag. This means we don't need to re-check the conditions that would
have caused us to delete the file when we later need to decide whether
to create or replace it.
Warner Losh [Thu, 7 Dec 2023 19:32:27 +0000 (12:32 -0700)]
cp: Add -N flag, inspired by NetBSD's similar flag
Add -N to supress copying of file flags when -p is specified (explicitly
or implicitly). Often times we don't care about the flags or wish to be
able to copy to NFS, and this comes in handy for that. FreeBSD's and
NetBSD's cp are somewhat different, so I had to reimplement all but one
of the patch hunks...
cp: Don't warn for chflags() failing with EOPNOTSUPP if flags == 0
From NetBSD's utils.c 1.5 importing importing BSDI change, with light
formatting changes:
Author: cgd <cgd@NetBSD.org>
Date: Wed Feb 26 14:40:51 1997 +0000
Patch from BSDI (via Keith Bostic):
>NFS doesn't support chflags; ignore errors unless there's reason
>to believe we're losing bits. (Note, this still won't be right
>if the server supports flags and we were trying to *remove* flags
>on a file that we copied, i.e., that we didn't create.)
Tom Jones [Thu, 27 Jan 2022 17:24:45 +0000 (17:24 +0000)]
Remove SMALL conditionals from gzip
gzip has SMALL conditionals which enable building a reduced size version
of the binary. These exist as part of the introduction of BSD licensed
gzip in 2004 in NetBSD and appear to have been required to reach a size
for inclusion in their install media. For more information see commits
to gzip in the NetBSD tree on the 28th of March 2004.
SMALL doesn't appear to be hooked up to our build system and
complicates gzip quite a bit.
Kyle Evans [Wed, 3 Jan 2024 22:17:59 +0000 (16:17 -0600)]
bhyveload: use a dirfd to support -h
Don't allow lookups from the loader scripts, which in rare cases may be
in guest control depending on the setup, to leave the specified host
root. Open the root dir and strictly do RESOLVE_BENEATH lookups from
there.
cb_open() has been restructured a bit to work nicely with this, using
fdopendir() in the directory case and just using the fd we already
opened in the regular file case.
hostbase_open() was split out to provide an obvious place to apply
rights(4) if that's something we care to do.
man.sh on stable/13 is missing some new features.
Unfortunately this means that this fix is not working as on stable/14.
Be patient and wait until the following 2 commits are ready to merge.
Mike Karels [Sun, 14 Jan 2024 17:01:19 +0000 (11:01 -0600)]
Increase the size of riscv GENERICSD images to 6 GB
The stable/13 snapshot this week failed to build the riscv GENERICSD
image because it ran out of space. Checking main and stable/14
snapshots, they are also low on space, around 100% or more of
capacity. Increase them all from 5 GB to 6 GB. Note, this is the
only riscv image configuration.
Zhenlei Huang [Tue, 7 Nov 2023 04:45:25 +0000 (12:45 +0800)]
kern linker: Do not retry loading modules on EEXIST
LINKER_LOAD_FILE() calls linker_load_dependencies() which will return
EEXIST in case the module to be loaded has already been compiled into
the kernel. Since the format of the module is now recognized then there
is no need to retry loading with a different linker, otherwise the
userland will get misleading error number ENOEXEC.
tcp: prevent spurious empty segments and fix uncommon panic
Only try sending more data on pure ACKs when there is
more data available in the send buffer.
In the case of a retransmitted SYN not being sent due to
an internal error, the snd_una/snd_nxt accounting could
be off, leading to a panic. Pulling snd_nxt up to snd_una
prevents this from happening.
Xin LI [Fri, 12 Jan 2024 05:38:04 +0000 (21:38 -0800)]
releng-gce: Advertise the availability of gVNIC support in GCE images.
This marks FreeBSD GCE images as gVNIC capable by adding the
--guest-os-features=GVNIC flag at creation time as suggested in GCE
documentation[1]. This allows Generation 3 and newer GCE instances
to leverage advanced networking capabilities and performance
enhancements provided by gVNIC. Users will benefit from these
improvements without needing to create custom images.
This commit introduces SRD metrics through sysctl.
The metrics can be queried using the following sysctl node:
sysctl dev.ena.<device index>.ena_srd_info
This commit adds sysctl support for customer metrics.
Different customer metrics can be found in the following sysctl node:
sysctl dev.ena.<device index>.customer_metrics
ena: Introduce shared sample interval for all stats
Rename sample_interval node to stats_sample_interval and move
it up in the sysctl tree to make it clear that it's relevant for
all the stats and not only ENI metrics (Currently, sample interval node
is found under eni_metrics node).
Path to node:
dev.ena.<device_index>.stats_sample_interval
Once this parameter is set it will set the sample interval for all the
stats node including SRD/customer metrics.
Osama Abboud [Mon, 30 Oct 2023 11:27:03 +0000 (11:27 +0000)]
ena: Add sysctl support for spreading IRQs
This commit allows spreading IO IRQs over different CPUs through sysctl.
Two sysctl nodes are introduced:
1- base_cpu: servers as the first CPU to which the first IO IRQ
will be bound.
2- cpu_stride: sets the distance between every two CPUs to which every
two consecutive IO IRQs are bound.
For example for doing the following IO IRQs / CPU binding:
Run the following commands:
sysctl dev.ena.<device index>.irq_affinity.base_cpu=0
sysctl dev.ena.<device_index>.irq_affinity.cpu_stride=2
Also introduced rss_enabled field, which is intended to replace
'#ifdef RSS' in multiple places, in order to prevent code duplication.
We want to bind interrupts to CPUs in case of rss set OR in case
the newly defined sysctl paremeter is set. This requires to remove a
couple of '#ifdef RSS' as well in the structs, since we'll be using the
relevant parameters in the CPU binding code.
Gleb Smirnoff [Mon, 4 Dec 2023 18:18:56 +0000 (10:18 -0800)]
if_tuntap: fix NOIP build
Note: this removes one TUNDEBUG() for the sake of not having one more
ifdefed variable declaration and for the overall code brevity. The call
from tuntap into LRO can be so easily traced with dtrace(1) that an
80-ish printf(9)-based debugging can be omitted.
Michael Tuexen [Thu, 9 Nov 2023 10:37:27 +0000 (11:37 +0100)]
if_tuntap: support receive checksum offloading for tap interfaces
When enabled, pretend that the IPv4 and transport layer checksum
is correct for packets injected via the character device.
This is a prerequisite for adding support for LRO, which will
be added next. Then packetdrill can be used to test the LRO
code in local mode.
Michael Tuexen [Sun, 5 Nov 2023 19:32:46 +0000 (20:32 +0100)]
if_tuntap: trigger the bpf hook on transmitting for the tap interface
The tun interface triggers the bpf hook when a packet is transmitted,
the tap interface triggers it when the packet is read from the
character device. This is inconsistent.
So fix the tap device such that it behaves like the tun device.
This is needed for adding support for the tap device to packetdrill.
Michael Tuexen [Sun, 5 Nov 2023 14:28:54 +0000 (15:28 +0100)]
udplite: make socketoption available on IPv6 sockets
This patch allows the IPPROTO_UDPLITE-level socket options
UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV to be used on
AF_INET6 sockets in addition to AF_INET sockets.