cxgbe(4): let the PF driver use VM work requests for transmit.
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC. Fix the GL limits for VM work requests
while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Remove claim that Allied Forces created "West Germany" in 1953. I can
find no historic substantiation for such a claim. The Federal
Republic of Germany was created by Germans on 23 May 1949, as also
noted in this file.
D Scott Phillips [Mon, 21 Sep 2020 22:24:46 +0000 (22:24 +0000)]
stand/reloc_elf: Handle relative relocations for arm{,64} and riscv
Extend the powerpc relative relocation handling from r240782 to a
handful of other architectures. This is needed to properly read
dependency information from kernel modules.
D Scott Phillips [Mon, 21 Sep 2020 22:22:53 +0000 (22:22 +0000)]
vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE
On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].
This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.
The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.
D Scott Phillips [Mon, 21 Sep 2020 22:21:59 +0000 (22:21 +0000)]
Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
D Scott Phillips [Mon, 21 Sep 2020 22:20:37 +0000 (22:20 +0000)]
Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Mark Johnston [Mon, 21 Sep 2020 22:19:21 +0000 (22:19 +0000)]
Weaken assertions in pmap_l1_to_l2() and pmap_l2_to_l3().
pmap_update_entry() will temporarily clear the valid bit of page table
entries in order to satisfy the arm64 pmap's break-before-make
constraint. pmap_kextract() may operate concurrently on kernel page
table pages, introducing windows where the assertions added in r365879
may fail incorrectly since they implicitly assert that the valid bit is
set. Modify the assertions to handle this.
D Scott Phillips [Mon, 21 Sep 2020 22:19:12 +0000 (22:19 +0000)]
bitset: expand bit index type to `long`
An upcoming patch to use the bitset macros for tracking vm page
dump information could conceivably need more than INT_MAX bits.
Expand the bit type to long so that the extra range is available
on 64-bit platforms where it would most likely be needed.
CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of
type `int`.
Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
that rtentry can contain multiple nexthops.
Alex Richardson [Mon, 21 Sep 2020 19:03:07 +0000 (19:03 +0000)]
awk: Fix subobject out-of-bounds access
When matching a regex with ^, it would attempt to access
gototab[NSTATES][NCHARS+2], and therefore access the state for the \002
character instead. This change is required to run awk under CHERI (with
sub-object bounds) and when running with UBSan instrumentation.
Alan Somers [Mon, 21 Sep 2020 17:48:28 +0000 (17:48 +0000)]
fsx: fix build with WARNS=6
* signed/unsigned comparisons
* use standard warn(3)
* Suppress warnings about local vars and funcs not declared static
* const-correctness
* declaration shadows a variable in the global scope
Some userspace code include sys/kernel.h. Namely, some OpenZFS tests do
this, and it was causing breakage after r365945 due to a lack of bool
typedef. Userspace should not need the TUNABLE_** stuff, so hide it
behind an #ifdef _KERNEL.
Sorry for the breakage.
Reported by: andrew, Michael Butler, Jenkins
Discussed with: kevans, allanjude
If Unwind_Backtrace is broken, ctx.n will still contain ~0, and we will
return that which poor behavior for the user, so return 0 instead.
We could document ~0 to be an error, but that would deviate from the
Linux behavior which is not desirable. Noted by Poul-Henning Kamp
Alex Richardson [Mon, 21 Sep 2020 15:49:02 +0000 (15:49 +0000)]
When building on Ubuntu bootstrap bmake with bash as the default shell
The Ubuntu /bin/sh (dash) removes all environment variables that contain
characters outside the [a-zA-Z0-9_] range and this breaks the bmake tests that
run as part of bootstrapping bmake.
This can be reverted when the bmake tests have been updated.
Alex Richardson [Mon, 21 Sep 2020 15:48:57 +0000 (15:48 +0000)]
Add a tools/build/make.py script that bootstraps bmake and then runs the build
This makes it possible to compile on non-FreeBSD systems since make will
usually be GNU make there. Even if they include bmake, it will often
either be a broken version or too old to build FreeBSD.
This should be the last commit needed to compile FreeBSD on Linux+macOS.
After over two years, I've finally managed to upstream all our local CheriBSD
changes to allow building on Linux (and as a result of being reviewed by more
people they are slightly less ugly than they were before).
It should now be possible to run the following to build on Linux+macOS if you
have LLVM/Clang 10 or newer installed:
MAKEOBJDIRPREFIX=/somewhere ./tools/build/make.py TARGET=amd64 TARGET_ARCH=amd64 buildworld
I have only tested macOS 15, Ubuntu 18.04 and openSUSE Leap, but other Linux
distributions might also work (as long as they ship a recent GLibc and compiler).
Reviewed By: emaste (should be fine to commit to tools/)
Differential Revision: https://reviews.freebsd.org/D16767
David Bright [Mon, 21 Sep 2020 15:45:49 +0000 (15:45 +0000)]
Honor the FWUG value of some drives in nvmecontrol
nvmecontrol tries to upload firmware in chunks as large as it thinks
the device permits. It fails to take into account the FWUG value used
by some drives to advertise the size and alignment limits for firmware
chunks.
- Use the firwmare update granularity value from the
- If the granularity is not reported or not restricted, fall back to
the previously existing logic that calculates the max transfer
size based on MDTS.
- Add firmware update granularity to the identify-controller output.
This adds the getenv_bool() function, to parse a boolean value from a
kernel environment variable or tunable. This works for traditional
boolean values like "0" and "1", and also "true" and "false"
(case-insensitive). These semantics do not yet apply to sysctls declared
using SYSCTL_BOOL with CTLFLAG_TUN (they still only parse 1 and 0).
Also added are two wrapper functions, getenv_is_true() and
getenv_is_false(). These are slightly simpler for callers wishing to
perform a single check of a configuration variable.
Reviewed by: jhb (slightly earlier version)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26270
OTG mode is not supported still. It's easy to do it as a one-off
detection, but the proper support requires continuous monitoring and
communicating the current state to the USB layer.
Also, fix phy0_route setting for H3. Remove duplicate register
definitions.
Tested on Orange Pi PC Plus with dr_mode="peripheral" using
hw.usb.template=3
umodem_load="YES"
Reviewed by: manu
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D26348
Alex Richardson [Mon, 21 Sep 2020 09:03:42 +0000 (09:03 +0000)]
Fix vi build on Linux/macOS
This absolute include causes a build failure on Linux for me:
.../cheri/freebsd/contrib/nvi/cl/../common/common.h:10:10: fatal error: '/usr/include/db.h' file not found
This change patches the file to use #include <db.h> instead until a
solution has been found upstream. See also https://github.com/lichray/nvi2/issues/69
Alex Richardson [Mon, 21 Sep 2020 09:03:32 +0000 (09:03 +0000)]
Prefer bootstrapped tools when running certctl.sh
Otherwise we get lots of warnings when building on Linux/macOS during
installworld:
Scanning /local/scratch/alr48/cheri/output/freebsd-x86/usr/share/certs/blacklisted for certificates...
install: invalid option -- 'U'
Try 'install --help' for more information.
install: invalid option -- 'U'
....
Toomas Soome [Mon, 21 Sep 2020 09:01:10 +0000 (09:01 +0000)]
loader: zfs should support bootonce an nextboot
bootonce feature is temporary, one time boot, activated by
"bectl activate -t BE", "bectl activate -T BE" will reset the bootonce flag.
By default, the bootonce setting is reset on attempt to boot and the next
boot will use previously active BE.
By setting zfs_bootonce_activate="YES" in rc.conf, the bootonce BE will
be set permanently active.
bootonce dataset name is recorded in boot pool labels, bootenv area.
in case of nextboot, the nextboot_enable boolean variable is recorded in
freebsd:nvstore nvlist, also stored in boot pool label bootenv area.
On boot, the loader will process /boot/nextboot.conf if nextboot_enable
is "YES", and will set nextboot_enable to "NO", preventing /boot/nextboot.conf
processing on next boot.
bootonce and nextboot features are usable in both UEFI and BIOS boot.
To use bootonce/nextboot features, the boot loader needs to be updated on disk;
if loader.efi is stored on ESP, then ESP needs to be updated and
for BIOS boot, stage2 (zfsboot or gptzfsboot) needs to be updated
(gpart or other tools).
At this time, only lua loader is updated.
Sponsored by: Netflix, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D25512
atomic_common.h: Fix the volatile qualifier placement in atomic_load_ptr
This was broken in r357940 which introduced the __typeof use. We need
the volatile qualifier to be on the pointee not the pointer otherwise it
does nothing. This was found by mhorne in D26498, noticing there was a
problem (a spin loop condition was hoisted for RISC-V boot code) but not
the root cause of it.
Fix gw updates / flag updates during route changes.
* Zero gw_sdl if switching to interface route - the assumption
that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.
Update the libufs cgget() and cgput() interfaces to have a similar
API to the sbget() and sbput() interfaces. Specifically they take
a file descriptor pointer rather than the struct uufsd *disk pointer
used by the libufs cgread() and cgwrite() interfaces. Update fsck_ffs
to use these revised interfaces.
The fsdb(8) utility uses the fsck_ffs(8) disk I/O interfaces, so
switch from using libufs's bread() to using fsck_ffs's getdatablk()
when importing tools/diag/prtblnos's prtblknos().
Alan Somers [Sat, 19 Sep 2020 19:08:27 +0000 (19:08 +0000)]
fix integer underflow in getgrnam_r and getpwnam_r
Sometimes nscd(8) will return a 1-byte buffer for a nonexistent entry. This
triggered an integer underflow in grp_unmarshal_func, causing getgrnam_r to
return ERANGE instead of 0.
Fix the user's buffer size check, and add a correct check for a too-small
nscd buffer.
Mark Johnston [Sat, 19 Sep 2020 15:22:04 +0000 (15:22 +0000)]
Fix some nits in 1G page support in the amd64 pmap.
- Move assertions out of the main loop to avoid duplicate conditional
expressions, and improve assertion messages.
- Fix va_next updates. In some cases we were not doing the wraparound
check before continuing the loop.
- Use the right va_next. In pmap_advise() and pmap_copy() we would step
through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().
Alex Richardson [Sat, 19 Sep 2020 12:08:16 +0000 (12:08 +0000)]
Fix dtrace tools bootstrap on non-FreeBSD after OpenZFS import
This required surprisingly few build system changes and only two changes to the
openZFS compat headers which have been upstreamed as
https://github.com/openzfs/zfs/pull/10863
Michal Meloun [Sat, 19 Sep 2020 11:27:16 +0000 (11:27 +0000)]
Implement workaround for broken access to configuration space.
Due to a HW bug in the RockChip PCIe implementation, attempting to access
a non-existent register in the configuration space will throw an exception.
Use new bus functions bus_peek() and bus_poke() to overcomme this limitation.
Michal Meloun [Sat, 19 Sep 2020 11:06:41 +0000 (11:06 +0000)]
Add NetBSD compatible bus_space_peek_N() and bus_space_poke_N() functions.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
Colin Percival [Sat, 19 Sep 2020 02:15:56 +0000 (02:15 +0000)]
Move finalize_components_config from get_params to cmd_*.
This allows us to redirect its output in cmd_cron, so that the
"src component not installed, skipped" message will be treated
the same way as other output from freebsd-update cron: Sent
in an email to root (or other address specified) if there are
updates to install, and silenced otherwise.
Rick Macklem [Fri, 18 Sep 2020 23:52:56 +0000 (23:52 +0000)]
Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures in nfsrv_checksequence(). This was fixed by r365789.
A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called
while mutexes are held.
This patch applies a fix similar to r365789, moving the SVC_RELEASE() call
down to after the mutexes are released.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.
Mark Johnston [Fri, 18 Sep 2020 19:03:34 +0000 (19:03 +0000)]
Install library symlinks atomically.
As we do for shared library binaries, pass -S to install(1) when
installing symlinks. Doing so helps avoid transient failures when
libraries are being reinstalled, which seems to be the root cause of
spurious libgcc_s.so link failures during CI builds.
PR: 233769
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26453
build: provide a default WARNS for all in-tree builds
The current default is provided in various Makefile.inc in some top-level
directories and covers a good portion of the tree, but doesn't cover parts
of the build a little deeper (e.g. libcasper).
Provide a default in src.sys.mk and set WARNS to it in bsd.sys.mk if that
variable is defined. This lets us relatively cleanly provide a default WARNS
no matter where you're building in the src tree without breaking things
outside of the tree.
Crunchgen has been updated as a bootstrap tool to work on this change
because it needs r365605 at a minimum to succeed. The cleanup necessary to
successfully walk over this change on WITHOUT_CLEAN builds has been added.
There is a supplemental project to this to list all of the warnings that are
encountered when the environment has WARNS=6 NO_WERROR=yes:
https://warns.kevans.dev -- this project will hopefully eventually go away
in favor of CI doing a much better job than it.
Some IPMI implementations on arm64 are reportedly unable to load our
memstick installer images, but support the older ISO format. Start
generating these for arm64.
Unlike installer ISOs for other platforms, these images are UEFI-only.
Reviewed by: emaste
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26452
Move the initialization of these variables to the beginning of their
respective functions.
On our end this creates a small amount of unneeded churn, as these
variables are properly initialized before their first use in all cases.
However, changing this benefits at least one downstream consumer
(NetApp) by allowing local and future modifications to these functions
to be made without worrying about where the initialization occurs.
Reviewed by: melifaro, rscheff
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26454
Alex Richardson [Fri, 18 Sep 2020 11:22:34 +0000 (11:22 +0000)]
libarchive: fix mismatch between library and test configuration
I was investigating libarchive test failures on CheriBSD and it turns out
we get a reproducible SIGBUS for test_archive_m5, etc. Debugging this shows
that libarchive and the tests disagree when it comes to the definition of
archive_md5_ctx: libarchive assumes it's the OpenSSL type whereas the test
use the libmd type. The latter is not necessarily aligned enough to store
a pointer (16 bytes for CHERI RISC-V), so we were crashing when storing
EVP_MD_CTX* to an 8-byte-aligned archive_md5_ctx.
To avoid problems like this in the future, factor out the common compiler
flags into a Makefile.inc and include that from the tests Makefile.
if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx
and rx), TSO, and RSS if the NIC is capable of performing these operations on
inner VXLAN traffic.
A VXLAN interface inherits the capabilities of its vxlandev interface if one is
specified or of the interface that hosts the vxlanlocal address. If other
interfaces will carry traffic for that VXLAN then they must have the same
hardware capabilities.
On transmit, if_vxlan verifies that the outbound interface has the required
capabilities and then translates the CSUM_ flags to their inner equivalents.
This tells the hardware ifnet that it needs to operate on the inner frame and
not the outer VXLAN headers.
An event is generated when a VXLAN ifnet starts. This allows hardware drivers to
configure their devices to expect VXLAN traffic on the specified incoming port.
On receive, the hardware does RSS and checksum verification on the inner frame.
if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is
not very clear why it didn't do this already.
Future work:
Rx: it should be possible to avoid the first trip up the protocol stack to get
the frame to if_vxlan just so it can decapsulate and requeue for a second trip
up the stack. The hardware NIC driver could directly call an if_vxlan receive
routine for VXLAN traffic instead.
Rx: LRO. depends on what happens with the previous item. There will have to to
be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.
Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port.
This will be used by some upcoming changes to if_vxlan(4). RFC 7348 (VXLAN)
says that the UDP checksum "SHOULD be transmitted as zero. When a packet is
received with a UDP checksum of zero, it MUST be accepted for decapsulation."
But the original IPv6 RFCs did not allow zero UDP checksum. RFC 6935 attempts
to resolve this.
mbuf checksum flags and fields to support tunneling protocols.
These are being added to support VXLAN but will work for GENEVE as well.
ENCAP_RSVD1 will likely become ENCAP_GENEVE in the future.
The size of struct mbuf does not change and that means this change can be MFC'd.
If size wasn't a constraint a cleaner way may have been to add inner_csum_flags
and inner_csum_data to go with csum_flags and csum_data.