sbin/fsck_msdosfs: Fix an integer overflow on 32-bit platforms.
The purpose of checksize() is to verify that the referenced cluster
chain size matches the recorded file size (up to 2^32 - 1) in the
directory entry. We follow the cluster chain, then multiple the
cluster count by bytes per cluster to get the physical size, then
check it against the recorded size.
When a file is close to 4 GiB (between 4GiB - cluster size and 4GiB,
both non-inclusive), the product of cluster count and bytes per
cluster would be exactly 4 GiB. On 32-bit systems, because size_t
is 32-bit, this would wrap back to 0, which will cause the file be
truncated to 0.
Fix this by using 64-bit physicalSize instead.
This fix is inspired by an Android change request at
https://android-review.googlesource.com/c/platform/external/fsck_msdos/+/1428461
Document the new powerpc64le arch's initial specifications.
Certain things are subject to change while this is experimental. The most
likely change is that long double may switch to quad, dependent on POWER8
emulation assistance for __float128 being set up in the compiler (as
POWER8 does not have IEEE-compatible 128-bit hardware float, unlike POWER9.)
There are some tests available in the NetBSD test suite, but we don't
currently pass all of those; further investigation will go into that. For
now, just add a basic test as well as a test that copies from /dev/null to a
file.
The /dev/null test confirms that the file gets created if it's empty, then
that it truncates the file if it's non-empty. This matches some usage that
was previously employed in the build and was replaced in r366042 by a
simpler shell construct.
I will also plan on coming back to expand these in due time.
Due to enter_idle_powerx fabricating a MSR from scratch, it is necessary
for it to care about the endianness, so we don't accidentally switch
endian the first time we idle a thread.
Took about five seconds to spot after seeing an unmangled backtrace.
The hard bit was needing to temporarily set up a mutex to sort out the
logjam that happens when every thread simultaneously wakes up in the wrong
endian due to the panic IPI and panics, leaving what I can best describe as
"alphabet soup" on the console.
Luckily, I already had a patch sitting around to do that.
This brings POWER8 up to equivilence with POWER9 on PPC64LE.
[PowerPC64LE] Pass our byte order to the sqlite3 build.
Due to the sqlite3 endian detection code preferring to check platform defines
instead of checking endian defines, it is necessary to manually set
the endianness on PowerPC64LE.
Unlike other bi-endian platforms, PowerPC64LE relies entirely on the
generic endianness macros like __BYTE_ORDER__ and has no platform-specific
define to denote little endian.
Add -DSQLITE_BYTEORDER=1234 to the CFLAGS when building libsqlite3 on
powerpc64le.
OPAL unconditionally enters secondary CPUs with only HV and SF set.
I tried writing a secondary entry point instead, but OPAL rejected it
and I am unsure why, so I resorted to making the system reset interrupt
endian-flexible.
This means we take a slight performance hit on wakeup on LE, but it is
a good stopgap until we can figure out a reliable way to make OPAL enter
where we want it to.
It probably makes sense to have it around anyway, because I can imagine
scenarios where the cpu resets itself to BE and does a software reset.
This is slightly stripped down from GENERIC64, as PowerMac G5 machines
are incapable of running in LE mode (so we can skip the Mac drivers.)
While technically POWER6 and POWER7 have the hardware capability of running
in LE mode, they have a tendency to trap excessively when a load/store is
misaligned. (an extremely common occurrence in LE code, and one of the main
reasons I consider BE to be superior, as it turns potential security issues
into immediately obvious mangled numbers.)
Additionally, there was no mechanism to control what endian interrupts
are delivered in, so supporting LE operation on POWER6 and POWER7 involves
some really dirty tricks in the interrupt vectors that I would rather
avoid.
IBM drew the line in the sand at POWER8 some time around 2013, embracing
full support for LE in the platform, and making a push across the board
for LE code to target POWER8 as a minimum requirement. As such, usage of
LE kernels on POWER6 and POWER7 is practically nil, despite it being
technically possible to do.
The so-called "TRUELE" feature bit which is the baseline requirement for
needed for PowerPC64LE was introduced in POWER8.
There was a small window cp was broken. Work around this by using :>
instead of cp /dev/null. Ideally, we'd keep the cp /dev/null in the
build as a regression test, but doing so breaks people that upgraded
during the cp breakage and this is simpler than bootstrapping a
working cp since there's no good __FreeBSD_version sign posts for
that.
Suggested by: lots of people
Too stubborn for his own good: imp
When running without a hypervisor, we need to set the ILE bit in the LPCR
ourselves.
For the boot processor, handle it in powernv_attach() like we do for other
LPCR bits.
No change for the APs, as they will use the lpcr global to set up their own
LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this
automatically.
OPAL runs in big endian, so we need to rfid into it to switch endian
atomically when branching to it, and we need to do the
RETURN_TO_NATIVE_ENDIAN dance when it returns to us.
[PowerPC64LE] Fix endianness issues in phyp_vscsi.
Unlike virtio, which in legacy mode is guest endian, the hypervisor vscsi
interface operates in big endian, so we must convert back and forth in several
places.
[PowerPC64LE] Work around qemu TCG bug in mtmsrd emulation.
The TCG implementation of mtmsrd in qemu blindly copies the entire register
to the MSR, instead of the specific bit positions listed in the ISA.
This means that qemu will prematurely switch endian out from under the
running code instead of waiting for the rfid, causing an immediate trap
as it attempts to interpret the next instruction in the wrong endianness.
To work around this, ensure PSL_LE is still set before doing the mtmsrd.
In the future, we may wish to just turn off translation and unconditionally
use rfid to switch to the ofmsr instead of quasi-switching to the ofmsr.
Add a new platform option so this can be disabled. (And so that we can
conditonalize additional QEMU-specific hacks in the platform code.)
[PowerPC64LE] Tell the hypervisor to switch interrupts to LE at CHRP attach.
Since we will need to be able to take traps relatively early in the process,
ensure that the hypervisor changes our ILE for us as soon as we are ready.
Document the calls to send messages to userland via devctl.
devctl_notify will create a message for the specified system,
subsystem and type, optionally adding additional information.
Add O_RESOLVE_BENEATH and AT_RESOLVE_BENEATH to mimic Linux' RESOLVE_BENEATH.
It is like O_BENEATH, but disables to walk out of the subtree rooted
in the starting directory. O_BENEATH does not care if path walks out
if it returned.
Requested by: Dan Gohman <dev@sunfishcode.online>
PR: 248335
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25886
Stop abusing internal namei flag NI_LCF_STRICTRELATIVE as indicator of
cap-restricted lookup. Add designated returned flag NIRES_STRICTREL
to inform kern_openat() that lookup was restricted.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25886
Otherwise a corrupted file entry containing invalid extended attribute
lengths or allocation descriptor lengths can trigger an overflow when
the file entry is loaded.
admbug: 965
PR: 248613
Reported by: C Turt <ecturt@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
cxgbe(4): let the PF driver use VM work requests for transmit.
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC. Fix the GL limits for VM work requests
while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Remove claim that Allied Forces created "West Germany" in 1953. I can
find no historic substantiation for such a claim. The Federal
Republic of Germany was created by Germans on 23 May 1949, as also
noted in this file.
stand/reloc_elf: Handle relative relocations for arm{,64} and riscv
Extend the powerpc relative relocation handling from r240782 to a
handful of other architectures. This is needed to properly read
dependency information from kernel modules.
vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE
On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].
This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.
The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Weaken assertions in pmap_l1_to_l2() and pmap_l2_to_l3().
pmap_update_entry() will temporarily clear the valid bit of page table
entries in order to satisfy the arm64 pmap's break-before-make
constraint. pmap_kextract() may operate concurrently on kernel page
table pages, introducing windows where the assertions added in r365879
may fail incorrectly since they implicitly assert that the valid bit is
set. Modify the assertions to handle this.
An upcoming patch to use the bitset macros for tracking vm page
dump information could conceivably need more than INT_MAX bits.
Expand the bit type to long so that the extra range is available
on 64-bit platforms where it would most likely be needed.
CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of
type `int`.
Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
that rtentry can contain multiple nexthops.
When matching a regex with ^, it would attempt to access
gototab[NSTATES][NCHARS+2], and therefore access the state for the \002
character instead. This change is required to run awk under CHERI (with
sub-object bounds) and when running with UBSan instrumentation.
* signed/unsigned comparisons
* use standard warn(3)
* Suppress warnings about local vars and funcs not declared static
* const-correctness
* declaration shadows a variable in the global scope
Some userspace code include sys/kernel.h. Namely, some OpenZFS tests do
this, and it was causing breakage after r365945 due to a lack of bool
typedef. Userspace should not need the TUNABLE_** stuff, so hide it
behind an #ifdef _KERNEL.
Sorry for the breakage.
Reported by: andrew, Michael Butler, Jenkins
Discussed with: kevans, allanjude
If Unwind_Backtrace is broken, ctx.n will still contain ~0, and we will
return that which poor behavior for the user, so return 0 instead.
We could document ~0 to be an error, but that would deviate from the
Linux behavior which is not desirable. Noted by Poul-Henning Kamp
When building on Ubuntu bootstrap bmake with bash as the default shell
The Ubuntu /bin/sh (dash) removes all environment variables that contain
characters outside the [a-zA-Z0-9_] range and this breaks the bmake tests that
run as part of bootstrapping bmake.
This can be reverted when the bmake tests have been updated.
Add a tools/build/make.py script that bootstraps bmake and then runs the build
This makes it possible to compile on non-FreeBSD systems since make will
usually be GNU make there. Even if they include bmake, it will often
either be a broken version or too old to build FreeBSD.
This should be the last commit needed to compile FreeBSD on Linux+macOS.
After over two years, I've finally managed to upstream all our local CheriBSD
changes to allow building on Linux (and as a result of being reviewed by more
people they are slightly less ugly than they were before).
It should now be possible to run the following to build on Linux+macOS if you
have LLVM/Clang 10 or newer installed:
MAKEOBJDIRPREFIX=/somewhere ./tools/build/make.py TARGET=amd64 TARGET_ARCH=amd64 buildworld
I have only tested macOS 15, Ubuntu 18.04 and openSUSE Leap, but other Linux
distributions might also work (as long as they ship a recent GLibc and compiler).
Reviewed By: emaste (should be fine to commit to tools/)
Differential Revision: https://reviews.freebsd.org/D16767
dab [Mon, 21 Sep 2020 15:45:49 +0000 (15:45 +0000)]
Honor the FWUG value of some drives in nvmecontrol
nvmecontrol tries to upload firmware in chunks as large as it thinks
the device permits. It fails to take into account the FWUG value used
by some drives to advertise the size and alignment limits for firmware
chunks.
- Use the firwmare update granularity value from the
- If the granularity is not reported or not restricted, fall back to
the previously existing logic that calculates the max transfer
size based on MDTS.
- Add firmware update granularity to the identify-controller output.
This adds the getenv_bool() function, to parse a boolean value from a
kernel environment variable or tunable. This works for traditional
boolean values like "0" and "1", and also "true" and "false"
(case-insensitive). These semantics do not yet apply to sysctls declared
using SYSCTL_BOOL with CTLFLAG_TUN (they still only parse 1 and 0).
Also added are two wrapper functions, getenv_is_true() and
getenv_is_false(). These are slightly simpler for callers wishing to
perform a single check of a configuration variable.
Reviewed by: jhb (slightly earlier version)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26270
OTG mode is not supported still. It's easy to do it as a one-off
detection, but the proper support requires continuous monitoring and
communicating the current state to the USB layer.
Also, fix phy0_route setting for H3. Remove duplicate register
definitions.
Tested on Orange Pi PC Plus with dr_mode="peripheral" using
hw.usb.template=3
umodem_load="YES"
Reviewed by: manu
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D26348
This absolute include causes a build failure on Linux for me:
.../cheri/freebsd/contrib/nvi/cl/../common/common.h:10:10: fatal error: '/usr/include/db.h' file not found
This change patches the file to use #include <db.h> instead until a
solution has been found upstream. See also https://github.com/lichray/nvi2/issues/69
Otherwise we get lots of warnings when building on Linux/macOS during
installworld:
Scanning /local/scratch/alr48/cheri/output/freebsd-x86/usr/share/certs/blacklisted for certificates...
install: invalid option -- 'U'
Try 'install --help' for more information.
install: invalid option -- 'U'
....
bootonce feature is temporary, one time boot, activated by
"bectl activate -t BE", "bectl activate -T BE" will reset the bootonce flag.
By default, the bootonce setting is reset on attempt to boot and the next
boot will use previously active BE.
By setting zfs_bootonce_activate="YES" in rc.conf, the bootonce BE will
be set permanently active.
bootonce dataset name is recorded in boot pool labels, bootenv area.
in case of nextboot, the nextboot_enable boolean variable is recorded in
freebsd:nvstore nvlist, also stored in boot pool label bootenv area.
On boot, the loader will process /boot/nextboot.conf if nextboot_enable
is "YES", and will set nextboot_enable to "NO", preventing /boot/nextboot.conf
processing on next boot.
bootonce and nextboot features are usable in both UEFI and BIOS boot.
To use bootonce/nextboot features, the boot loader needs to be updated on disk;
if loader.efi is stored on ESP, then ESP needs to be updated and
for BIOS boot, stage2 (zfsboot or gptzfsboot) needs to be updated
(gpart or other tools).
At this time, only lua loader is updated.
Sponsored by: Netflix, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D25512
atomic_common.h: Fix the volatile qualifier placement in atomic_load_ptr
This was broken in r357940 which introduced the __typeof use. We need
the volatile qualifier to be on the pointee not the pointer otherwise it
does nothing. This was found by mhorne in D26498, noticing there was a
problem (a spin loop condition was hoisted for RISC-V boot code) but not
the root cause of it.
Fix gw updates / flag updates during route changes.
* Zero gw_sdl if switching to interface route - the assumption
that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.