John Baldwin [Fri, 25 Sep 2020 21:19:56 +0000 (21:19 +0000)]
Revert most of r360179.
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers. In
practice this broke requests submitted from userland.
Warner Losh [Fri, 25 Sep 2020 20:51:07 +0000 (20:51 +0000)]
Adjustments to includes for openzfs in _STANDALONE
Allow the necessary parts of systm.h to be visible in the _STANDALONE
environnment. Limit the reset to only being visible for _KERNEL
builds. Map KASSERT, etc to printf on failure in the bootloader until
we have more confidence things won't break and leave systems
unbootable. Eventually, this should map to a full panic in the
bootloader, but that also needs some enhancement to be more useful.
Original patch was against FreeBSD 12, and a test compile wasn't run against
head. md_tls_tcb_offset field was moved from mdthread to mdproc in the
meantime.
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
Warner Losh [Fri, 25 Sep 2020 19:02:49 +0000 (19:02 +0000)]
Dont let kernel and standalone both be defined at the same time
_KERNEL and _STANDALONE are different things. They cannot both be true
at the same time. If things that are normally visible only to _KERNEL
are needed for the _STANDALONE environment, you need to also make them
visible to _STANDALONE. Often times, this will be just a subset of the
required things for _KERNEL (eg global variables are but one example).
sys/cdefs.h is included by pretty much everything in both the loader
and the kernel, so is the ideal choke point.
Mark Johnston [Fri, 25 Sep 2020 18:55:50 +0000 (18:55 +0000)]
ng_l2tp: Fix callout synchronization in the rexmit timeout handler
A received control packet may cause the transmit queue to be flushed, in
which case ng_l2tp_seq_recv_nr() cancels the transmit timeout handler.
The handler checks to see if it was cancelled before doing anything, but
did so before acquiring the node lock, so a small race window could
cause ng_l2tp_seq_rack_timeout() to attempt to flush an empty queue,
ultimately causing a null pointer dereference.
Summary:
Two bugs:
* Elf32_Auxinfo is broken, using pointers in the union, which are 64-bits not
32.
* freebsd32_sysarch() doesn't update the 'user local' register when handling
MIPS_SET_TLS, leading to a NULL pointer dereference in the 32-bit
application.
Michal Meloun [Fri, 25 Sep 2020 16:44:01 +0000 (16:44 +0000)]
Refine locking inside of syscon driver.
In some cases, the syscon driver may be used by consumer requiring better
control about locking (ie. it may be used as registe file provider for clock
driver which needs locked access to multiple registers).
Add fine locking protocol methods together with bunch of helper functions
in syscon driver and implement this functionality in syscon_generic driver.
Michal Meloun [Fri, 25 Sep 2020 13:52:31 +0000 (13:52 +0000)]
Correctly handle nodes compatible with "syscon", "simple-bus".
Syscon can also have child nodes that share a registration file with it.
To do this correctly, follow these steps:
- subclass syscon from simplebus and expose it if the node is also
"simple-bus" compatible.
- block simplebus probe for this compatible string, so it's priority
(bus pass) doesn't colide with syscon driver.
While I'm in, also block "syscon", "simple-mfd" for the same reason.
TCP: send full initial window when timestamps are in use
The fastpath in tcp_output tries to send out
full segments, and avoid sending partial segments by
comparing against the static t_maxseg variable.
That value does not consider tcp options like timestamps,
while the initial window calculation is using
the correct dynamic tcp_maxseg() function.
Due to this interaction, the last, full size segment
is considered too short and not sent out immediately.
Adjust ssthresh in after_idle to the maximum of
the prior ssthresh, or 3/4 of the prior cwnd. See
RFC2861 section 2 for an in depth explanation for
the rationale around this.
As newreno is the default "fall-through" reaction,
most tcp variants will benefit from this.
vi(1): Add URL to the vi/ex reference manual in the man page
Reported in PR 241985
The manual page references the "vi/ex reference manual" but there is no
information about where to find that document. Add a reference to the manual in
the SEE ALSO section since the Project hosts a copy of it[1].
Change sent upstream[2]
If D26158 gets reviewed and committed, we could close that PR.
pwm(8): fix potential duty overflow, use unsigneds for period and duty
For a long period value and the duty specified as a percentage,
there could be an overflow.
Using unsigned integers aligns the code with struct pwm_state and allows
to safely use periods up to 4 seconds where supported by drivers.
Alan Somers [Thu, 24 Sep 2020 16:27:53 +0000 (16:27 +0000)]
fusefs: fix mmap'd writes in direct_io mode
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".
However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.
Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.
PR: 247276
Reported by: trapexit@spawn.link
Reviewed by: cem
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D26485
Provide MS() and SM() macros for 80211 and wireless drivers.
We have (two versions) of MS() and SM() macros which we use throughout
the wireless code. Change all but three places (ath_hal, rtwn, and rsu)
to the newly provided _IEEE80211_MASKSHIFT() and _IEEE80211_SHIFTMASK()
macros. Also change one internal case using both _S and _M instead of
just _S away from _M (one of the reasons rtwn and rsu were not changed).
This was done semi-mechanically. No functional changes intended.
Andrew Turner [Thu, 24 Sep 2020 07:17:05 +0000 (07:17 +0000)]
Bounce in more cases in the arm64 busdma
We need to use a bounce buffer when the memory we are operating on is not
aligned to a cacheline, and not aligned to the maps alignment.
The former is to stop other threads from dirtying the cacheline while we
are performing DMA operations with it. The latter is to check memory
passed in by a driver is correctly aligned for the device.
Reviewed by: mmel
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26496
Warner Losh [Thu, 24 Sep 2020 07:10:34 +0000 (07:10 +0000)]
Don't define _STANDALONE when building kernel modules.
_STANDALONE is only for the bootloader, not kernel modules. Remove it
from the build. This was harmless before, but sys/malloc.h now does
different things for the standalone environment, triggering the issue.
Warner Losh [Thu, 24 Sep 2020 06:40:35 +0000 (06:40 +0000)]
Create a standalone version of sys/malloc.h
The ZSTD support for the boot loader will need to include files that
use the kernel's malloc interface. Create a standalone stub version
that's functional enough to allow this to work. There's some
limitations in this interface, and it's not quite a perfect
match. Specifically, M_WAITOK allocations can fail because there's
nothing that can be done we no memory is available.
It is only ever called for negative entries and for those it is
just a wrapper around cache_zap_negative_locked_vnode_kl which
always succeeds.
This also fixes a bug where cache_lookup_fallback should have been
calling cache_zap_locked_bucket instead. Note that in order to trigger
the bug NOCACHE must not be set, which currently only happens when
creating a new coredump (and then the coredump-to-be has to have a
negative entry).
dd a new option (-H) to daemon(8) to catch SIGHUP and re-open output_file file when
received.
The default system log rotation mechanism (newsyslog(8)) requires ability to send
signal to a daemon in order to properly complete rotation of the logs in an "atomic"
manner without having to making a copy and truncating original file. Unfortunately
our built-in mechanism to convert "dumb" programs into daemons has no way to handle
this rotation properly. This change adds this ability, to be enabled by supplying -H
option in addition to the -o option.
memfd_create is implemented on top of posixshm, so this is a logically
correct place for them to be. Moreover, this reduces the number of places to
look to run tests when working in this part of the tree.
Warner Losh [Wed, 23 Sep 2020 19:18:53 +0000 (19:18 +0000)]
Use envvar rather than nonstandard hint. lines
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.
Alex Richardson [Wed, 23 Sep 2020 12:54:37 +0000 (12:54 +0000)]
Add github CI for testing cross-building from Linux and macOS
This builds the kernel-toolchain target and an amd64 GENERIC kernel on
Ubuntu 18.04, 20.04 and the latest macOS to ensure that new changes
don't regress building on non-FreeBSD hosts.
Toomas Soome [Wed, 23 Sep 2020 08:22:14 +0000 (08:22 +0000)]
loader: zfs_probe_dev should pick first matching zfs pool
During devswitch probe, we pick boot pool based on boot disk, if the boot
disk happens to have multiple pools in freebsd-zfs partitions, the current
code does pick last pool from boot disk as boot pool. While there is no
way at that stage to test, the more logical approach would be to pick
first matching pool.
This patch is assuming we do pass pool guid pointer with guid value 0,
this will help us to determine, if the guid value is already set or not.
The general suggestion would be not to share disk between different pools.
Xin LI [Wed, 23 Sep 2020 06:52:22 +0000 (06:52 +0000)]
sbin/fsck_msdosfs: Fix an integer overflow on 32-bit platforms.
The purpose of checksize() is to verify that the referenced cluster
chain size matches the recorded file size (up to 2^32 - 1) in the
directory entry. We follow the cluster chain, then multiple the
cluster count by bytes per cluster to get the physical size, then
check it against the recorded size.
When a file is close to 4 GiB (between 4GiB - cluster size and 4GiB,
both non-inclusive), the product of cluster count and bytes per
cluster would be exactly 4 GiB. On 32-bit systems, because size_t
is 32-bit, this would wrap back to 0, which will cause the file be
truncated to 0.
Fix this by using 64-bit physicalSize instead.
This fix is inspired by an Android change request at
https://android-review.googlesource.com/c/platform/external/fsck_msdos/+/1428461
Document the new powerpc64le arch's initial specifications.
Certain things are subject to change while this is experimental. The most
likely change is that long double may switch to quad, dependent on POWER8
emulation assistance for __float128 being set up in the compiler (as
POWER8 does not have IEEE-compatible 128-bit hardware float, unlike POWER9.)
There are some tests available in the NetBSD test suite, but we don't
currently pass all of those; further investigation will go into that. For
now, just add a basic test as well as a test that copies from /dev/null to a
file.
The /dev/null test confirms that the file gets created if it's empty, then
that it truncates the file if it's non-empty. This matches some usage that
was previously employed in the build and was replaced in r366042 by a
simpler shell construct.
I will also plan on coming back to expand these in due time.
Due to enter_idle_powerx fabricating a MSR from scratch, it is necessary
for it to care about the endianness, so we don't accidentally switch
endian the first time we idle a thread.
Took about five seconds to spot after seeing an unmangled backtrace.
The hard bit was needing to temporarily set up a mutex to sort out the
logjam that happens when every thread simultaneously wakes up in the wrong
endian due to the panic IPI and panics, leaving what I can best describe as
"alphabet soup" on the console.
Luckily, I already had a patch sitting around to do that.
This brings POWER8 up to equivilence with POWER9 on PPC64LE.
[PowerPC64LE] Pass our byte order to the sqlite3 build.
Due to the sqlite3 endian detection code preferring to check platform defines
instead of checking endian defines, it is necessary to manually set
the endianness on PowerPC64LE.
Unlike other bi-endian platforms, PowerPC64LE relies entirely on the
generic endianness macros like __BYTE_ORDER__ and has no platform-specific
define to denote little endian.
Add -DSQLITE_BYTEORDER=1234 to the CFLAGS when building libsqlite3 on
powerpc64le.
OPAL unconditionally enters secondary CPUs with only HV and SF set.
I tried writing a secondary entry point instead, but OPAL rejected it
and I am unsure why, so I resorted to making the system reset interrupt
endian-flexible.
This means we take a slight performance hit on wakeup on LE, but it is
a good stopgap until we can figure out a reliable way to make OPAL enter
where we want it to.
It probably makes sense to have it around anyway, because I can imagine
scenarios where the cpu resets itself to BE and does a software reset.
This is slightly stripped down from GENERIC64, as PowerMac G5 machines
are incapable of running in LE mode (so we can skip the Mac drivers.)
While technically POWER6 and POWER7 have the hardware capability of running
in LE mode, they have a tendency to trap excessively when a load/store is
misaligned. (an extremely common occurrence in LE code, and one of the main
reasons I consider BE to be superior, as it turns potential security issues
into immediately obvious mangled numbers.)
Additionally, there was no mechanism to control what endian interrupts
are delivered in, so supporting LE operation on POWER6 and POWER7 involves
some really dirty tricks in the interrupt vectors that I would rather
avoid.
IBM drew the line in the sand at POWER8 some time around 2013, embracing
full support for LE in the platform, and making a push across the board
for LE code to target POWER8 as a minimum requirement. As such, usage of
LE kernels on POWER6 and POWER7 is practically nil, despite it being
technically possible to do.
The so-called "TRUELE" feature bit which is the baseline requirement for
needed for PowerPC64LE was introduced in POWER8.
Warner Losh [Wed, 23 Sep 2020 01:04:25 +0000 (01:04 +0000)]
Work around cp breakage in current from last week
There was a small window cp was broken. Work around this by using :>
instead of cp /dev/null. Ideally, we'd keep the cp /dev/null in the
build as a regression test, but doing so breaks people that upgraded
during the cp breakage and this is simpler than bootstrapping a
working cp since there's no good __FreeBSD_version sign posts for
that.
Suggested by: lots of people
Too stubborn for his own good: imp
When running without a hypervisor, we need to set the ILE bit in the LPCR
ourselves.
For the boot processor, handle it in powernv_attach() like we do for other
LPCR bits.
No change for the APs, as they will use the lpcr global to set up their own
LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this
automatically.
OPAL runs in big endian, so we need to rfid into it to switch endian
atomically when branching to it, and we need to do the
RETURN_TO_NATIVE_ENDIAN dance when it returns to us.