This is slightly stripped down from GENERIC64, as PowerMac G5 machines
are incapable of running in LE mode (so we can skip the Mac drivers.)
While technically POWER6 and POWER7 have the hardware capability of running
in LE mode, they have a tendency to trap excessively when a load/store is
misaligned. (an extremely common occurrence in LE code, and one of the main
reasons I consider BE to be superior, as it turns potential security issues
into immediately obvious mangled numbers.)
Additionally, there was no mechanism to control what endian interrupts
are delivered in, so supporting LE operation on POWER6 and POWER7 involves
some really dirty tricks in the interrupt vectors that I would rather
avoid.
IBM drew the line in the sand at POWER8 some time around 2013, embracing
full support for LE in the platform, and making a push across the board
for LE code to target POWER8 as a minimum requirement. As such, usage of
LE kernels on POWER6 and POWER7 is practically nil, despite it being
technically possible to do.
The so-called "TRUELE" feature bit which is the baseline requirement for
needed for PowerPC64LE was introduced in POWER8.
There was a small window cp was broken. Work around this by using :>
instead of cp /dev/null. Ideally, we'd keep the cp /dev/null in the
build as a regression test, but doing so breaks people that upgraded
during the cp breakage and this is simpler than bootstrapping a
working cp since there's no good __FreeBSD_version sign posts for
that.
Suggested by: lots of people
Too stubborn for his own good: imp
When running without a hypervisor, we need to set the ILE bit in the LPCR
ourselves.
For the boot processor, handle it in powernv_attach() like we do for other
LPCR bits.
No change for the APs, as they will use the lpcr global to set up their own
LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this
automatically.
OPAL runs in big endian, so we need to rfid into it to switch endian
atomically when branching to it, and we need to do the
RETURN_TO_NATIVE_ENDIAN dance when it returns to us.
[PowerPC64LE] Fix endianness issues in phyp_vscsi.
Unlike virtio, which in legacy mode is guest endian, the hypervisor vscsi
interface operates in big endian, so we must convert back and forth in several
places.
[PowerPC64LE] Work around qemu TCG bug in mtmsrd emulation.
The TCG implementation of mtmsrd in qemu blindly copies the entire register
to the MSR, instead of the specific bit positions listed in the ISA.
This means that qemu will prematurely switch endian out from under the
running code instead of waiting for the rfid, causing an immediate trap
as it attempts to interpret the next instruction in the wrong endianness.
To work around this, ensure PSL_LE is still set before doing the mtmsrd.
In the future, we may wish to just turn off translation and unconditionally
use rfid to switch to the ofmsr instead of quasi-switching to the ofmsr.
Add a new platform option so this can be disabled. (And so that we can
conditonalize additional QEMU-specific hacks in the platform code.)
[PowerPC64LE] Tell the hypervisor to switch interrupts to LE at CHRP attach.
Since we will need to be able to take traps relatively early in the process,
ensure that the hypervisor changes our ILE for us as soon as we are ready.
Document the calls to send messages to userland via devctl.
devctl_notify will create a message for the specified system,
subsystem and type, optionally adding additional information.
Add O_RESOLVE_BENEATH and AT_RESOLVE_BENEATH to mimic Linux' RESOLVE_BENEATH.
It is like O_BENEATH, but disables to walk out of the subtree rooted
in the starting directory. O_BENEATH does not care if path walks out
if it returned.
Requested by: Dan Gohman <dev@sunfishcode.online>
PR: 248335
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25886
Stop abusing internal namei flag NI_LCF_STRICTRELATIVE as indicator of
cap-restricted lookup. Add designated returned flag NIRES_STRICTREL
to inform kern_openat() that lookup was restricted.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25886
Otherwise a corrupted file entry containing invalid extended attribute
lengths or allocation descriptor lengths can trigger an overflow when
the file entry is loaded.
admbug: 965
PR: 248613
Reported by: C Turt <ecturt@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
cxgbe(4): let the PF driver use VM work requests for transmit.
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC. Fix the GL limits for VM work requests
while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Remove claim that Allied Forces created "West Germany" in 1953. I can
find no historic substantiation for such a claim. The Federal
Republic of Germany was created by Germans on 23 May 1949, as also
noted in this file.
stand/reloc_elf: Handle relative relocations for arm{,64} and riscv
Extend the powerpc relative relocation handling from r240782 to a
handful of other architectures. This is needed to properly read
dependency information from kernel modules.
vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE
On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].
This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.
The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Weaken assertions in pmap_l1_to_l2() and pmap_l2_to_l3().
pmap_update_entry() will temporarily clear the valid bit of page table
entries in order to satisfy the arm64 pmap's break-before-make
constraint. pmap_kextract() may operate concurrently on kernel page
table pages, introducing windows where the assertions added in r365879
may fail incorrectly since they implicitly assert that the valid bit is
set. Modify the assertions to handle this.
An upcoming patch to use the bitset macros for tracking vm page
dump information could conceivably need more than INT_MAX bits.
Expand the bit type to long so that the extra range is available
on 64-bit platforms where it would most likely be needed.
CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of
type `int`.
Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
that rtentry can contain multiple nexthops.
When matching a regex with ^, it would attempt to access
gototab[NSTATES][NCHARS+2], and therefore access the state for the \002
character instead. This change is required to run awk under CHERI (with
sub-object bounds) and when running with UBSan instrumentation.
* signed/unsigned comparisons
* use standard warn(3)
* Suppress warnings about local vars and funcs not declared static
* const-correctness
* declaration shadows a variable in the global scope
Some userspace code include sys/kernel.h. Namely, some OpenZFS tests do
this, and it was causing breakage after r365945 due to a lack of bool
typedef. Userspace should not need the TUNABLE_** stuff, so hide it
behind an #ifdef _KERNEL.
Sorry for the breakage.
Reported by: andrew, Michael Butler, Jenkins
Discussed with: kevans, allanjude
If Unwind_Backtrace is broken, ctx.n will still contain ~0, and we will
return that which poor behavior for the user, so return 0 instead.
We could document ~0 to be an error, but that would deviate from the
Linux behavior which is not desirable. Noted by Poul-Henning Kamp
When building on Ubuntu bootstrap bmake with bash as the default shell
The Ubuntu /bin/sh (dash) removes all environment variables that contain
characters outside the [a-zA-Z0-9_] range and this breaks the bmake tests that
run as part of bootstrapping bmake.
This can be reverted when the bmake tests have been updated.
Add a tools/build/make.py script that bootstraps bmake and then runs the build
This makes it possible to compile on non-FreeBSD systems since make will
usually be GNU make there. Even if they include bmake, it will often
either be a broken version or too old to build FreeBSD.
This should be the last commit needed to compile FreeBSD on Linux+macOS.
After over two years, I've finally managed to upstream all our local CheriBSD
changes to allow building on Linux (and as a result of being reviewed by more
people they are slightly less ugly than they were before).
It should now be possible to run the following to build on Linux+macOS if you
have LLVM/Clang 10 or newer installed:
MAKEOBJDIRPREFIX=/somewhere ./tools/build/make.py TARGET=amd64 TARGET_ARCH=amd64 buildworld
I have only tested macOS 15, Ubuntu 18.04 and openSUSE Leap, but other Linux
distributions might also work (as long as they ship a recent GLibc and compiler).
Reviewed By: emaste (should be fine to commit to tools/)
Differential Revision: https://reviews.freebsd.org/D16767
dab [Mon, 21 Sep 2020 15:45:49 +0000 (15:45 +0000)]
Honor the FWUG value of some drives in nvmecontrol
nvmecontrol tries to upload firmware in chunks as large as it thinks
the device permits. It fails to take into account the FWUG value used
by some drives to advertise the size and alignment limits for firmware
chunks.
- Use the firwmare update granularity value from the
- If the granularity is not reported or not restricted, fall back to
the previously existing logic that calculates the max transfer
size based on MDTS.
- Add firmware update granularity to the identify-controller output.
This adds the getenv_bool() function, to parse a boolean value from a
kernel environment variable or tunable. This works for traditional
boolean values like "0" and "1", and also "true" and "false"
(case-insensitive). These semantics do not yet apply to sysctls declared
using SYSCTL_BOOL with CTLFLAG_TUN (they still only parse 1 and 0).
Also added are two wrapper functions, getenv_is_true() and
getenv_is_false(). These are slightly simpler for callers wishing to
perform a single check of a configuration variable.
Reviewed by: jhb (slightly earlier version)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26270
OTG mode is not supported still. It's easy to do it as a one-off
detection, but the proper support requires continuous monitoring and
communicating the current state to the USB layer.
Also, fix phy0_route setting for H3. Remove duplicate register
definitions.
Tested on Orange Pi PC Plus with dr_mode="peripheral" using
hw.usb.template=3
umodem_load="YES"
Reviewed by: manu
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D26348
This absolute include causes a build failure on Linux for me:
.../cheri/freebsd/contrib/nvi/cl/../common/common.h:10:10: fatal error: '/usr/include/db.h' file not found
This change patches the file to use #include <db.h> instead until a
solution has been found upstream. See also https://github.com/lichray/nvi2/issues/69
Otherwise we get lots of warnings when building on Linux/macOS during
installworld:
Scanning /local/scratch/alr48/cheri/output/freebsd-x86/usr/share/certs/blacklisted for certificates...
install: invalid option -- 'U'
Try 'install --help' for more information.
install: invalid option -- 'U'
....
bootonce feature is temporary, one time boot, activated by
"bectl activate -t BE", "bectl activate -T BE" will reset the bootonce flag.
By default, the bootonce setting is reset on attempt to boot and the next
boot will use previously active BE.
By setting zfs_bootonce_activate="YES" in rc.conf, the bootonce BE will
be set permanently active.
bootonce dataset name is recorded in boot pool labels, bootenv area.
in case of nextboot, the nextboot_enable boolean variable is recorded in
freebsd:nvstore nvlist, also stored in boot pool label bootenv area.
On boot, the loader will process /boot/nextboot.conf if nextboot_enable
is "YES", and will set nextboot_enable to "NO", preventing /boot/nextboot.conf
processing on next boot.
bootonce and nextboot features are usable in both UEFI and BIOS boot.
To use bootonce/nextboot features, the boot loader needs to be updated on disk;
if loader.efi is stored on ESP, then ESP needs to be updated and
for BIOS boot, stage2 (zfsboot or gptzfsboot) needs to be updated
(gpart or other tools).
At this time, only lua loader is updated.
Sponsored by: Netflix, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D25512
atomic_common.h: Fix the volatile qualifier placement in atomic_load_ptr
This was broken in r357940 which introduced the __typeof use. We need
the volatile qualifier to be on the pointee not the pointer otherwise it
does nothing. This was found by mhorne in D26498, noticing there was a
problem (a spin loop condition was hoisted for RISC-V boot code) but not
the root cause of it.
Fix gw updates / flag updates during route changes.
* Zero gw_sdl if switching to interface route - the assumption
that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.
Update the libufs cgget() and cgput() interfaces to have a similar
API to the sbget() and sbput() interfaces. Specifically they take
a file descriptor pointer rather than the struct uufsd *disk pointer
used by the libufs cgread() and cgwrite() interfaces. Update fsck_ffs
to use these revised interfaces.
The fsdb(8) utility uses the fsck_ffs(8) disk I/O interfaces, so
switch from using libufs's bread() to using fsck_ffs's getdatablk()
when importing tools/diag/prtblnos's prtblknos().
fix integer underflow in getgrnam_r and getpwnam_r
Sometimes nscd(8) will return a 1-byte buffer for a nonexistent entry. This
triggered an integer underflow in grp_unmarshal_func, causing getgrnam_r to
return ERANGE instead of 0.
Fix the user's buffer size check, and add a correct check for a too-small
nscd buffer.
Fix some nits in 1G page support in the amd64 pmap.
- Move assertions out of the main loop to avoid duplicate conditional
expressions, and improve assertion messages.
- Fix va_next updates. In some cases we were not doing the wraparound
check before continuing the loop.
- Use the right va_next. In pmap_advise() and pmap_copy() we would step
through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().
Fix dtrace tools bootstrap on non-FreeBSD after OpenZFS import
This required surprisingly few build system changes and only two changes to the
openZFS compat headers which have been upstreamed as
https://github.com/openzfs/zfs/pull/10863
Implement workaround for broken access to configuration space.
Due to a HW bug in the RockChip PCIe implementation, attempting to access
a non-existent register in the configuration space will throw an exception.
Use new bus functions bus_peek() and bus_poke() to overcomme this limitation.
Add NetBSD compatible bus_space_peek_N() and bus_space_poke_N() functions.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
Move finalize_components_config from get_params to cmd_*.
This allows us to redirect its output in cmd_cron, so that the
"src component not installed, skipped" message will be treated
the same way as other output from freebsd-update cron: Sent
in an email to root (or other address specified) if there are
updates to install, and silenced otherwise.
Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures in nfsrv_checksequence(). This was fixed by r365789.
A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called
while mutexes are held.
This patch applies a fix similar to r365789, moving the SVC_RELEASE() call
down to after the mutexes are released.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.