Warner Losh [Sat, 31 Jul 2021 17:57:21 +0000 (11:57 -0600)]
awk: merge fixes to metamode
This is a partial MFC of c63c5ab001106/r349062. The whole thing doesn't
apply cleanly, but this bit, at least, is needed to fix metamode on
stable/12 after the changes to awk were merged from head. Rather than
risk breaking other things, I'm just merging the bit I know that's
needed. All build tools need to be in DEPENDOBJ so the dependency order
is correct and they get built with host tools.
smartpqi: Maintenance commit of Microchip smartpqi
Newly added features and bug fixes in latest Microchip SmartPQI driver.
1) Newly added TMF feature.
2) Added new Huawei & Inspur PCI IDs.
3) Fixed smartpqi driver hangs in Z-Pool while running on FreeBSD12.1
4) Fixed flooding dmesg in kernel while the controller is offline during
ioctls.
5) Avoided unnecessary host memory allocation for rcb sg buffers.
6) Fixed race conditions while accessing internal rcb structure.
7) Fixed logical volumes exposing two different names to the OS,
caused by system memory being overwritten with stale DMA data.
8) Fixed dynamically unloading a smartpqi driver.
9) Added device_shutdown callback instead of deprecated shutdown_final
kernel event in smartpqi driver.
10) Fixed OS crash during physical drive hot removal under
heavy IO.
11) Fixed OS crash during controller lockup/offline during heavy IO.
12) Fixed coverity issues in smartpqi driver
13) Fixed system crash while creating and deleting logical volume in a
continuous loop.
14) Fixed the volume size not being exposed to the OS when it expands.
15) Added HC3 PCI IDs.
16) Fixed compiler issues in 12.2 kernel.
Note: this is a direct commit, submitted by the vendor to support
stable/12
Reviewed by: imp, Murthy Bhat, Scott Benesh
Differential Revision: https://reviews.freebsd.org/D24428
Warner Losh [Mon, 12 Jul 2021 03:26:08 +0000 (21:26 -0600)]
awk: remove proctab.c
proctab.c is a generated file and never should have been committed to
the tree. This file has been added and removed a couple of times, most
recently added by me in my 2019 updates.
Kristof Provost [Tue, 2 Mar 2021 15:57:27 +0000 (16:57 +0100)]
pf tests: Test the match keyword
The new match keyword can currently only assign queues, so we can only
test it with ALTQ.
Set up a basic scenario where we use 'match' to assign ICMP traffic to a
slow queue, and confirm that it's really getting slowed down.
Kristof Provost [Tue, 2 Mar 2021 15:01:04 +0000 (16:01 +0100)]
pf: match keyword support
Support the 'match' keyword.
Note that support is limited to adding queuing information, so without
ALTQ support in the kernel setting match rules is pointless.
For the avoidance of doubt: this is NOT full support for the match
keyword as found in OpenBSD's pf. That could potentially be built on top
of this, but this commit is NOT that.
The general style in sbin/nvmecontrol appears to print uint64_t types
using %j, so I'm using that instead of the more general (but admittedly
ugly) PRIu64.
Warner Losh [Fri, 2 Jul 2021 22:00:42 +0000 (16:00 -0600)]
nvme: coherently read status of completion records
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by the device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.
To work around this problem, we read the status and make sure it is in
phase, then re-read the entire completion record, guaranteeing it's
complete, valid, and consistent. In addition, we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.
Reviewed by: jrtc27@, chuck@
Found by: jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31002
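The read ordering described in this commit can be sketched in plain C.
This is an illustration only, not the driver's code: the record layout,
the cpl_sketch and sketch_read_completion names, and the use of bit 0 as
the phase bit are all assumptions made for the example, and the DMA sync
and memory barriers a real driver needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified completion record; not the driver's real layout. */
struct cpl_sketch {
	uint32_t cdw0;
	uint16_t cid;
	uint16_t status;	/* bit 0 plays the role of the phase bit here */
};

#define	SKETCH_PHASE(status)	((status) & 1u)

/*
 * Copy the entry at 'head' into 'out' only if it is in the expected phase.
 * The status word is read first; the whole record is copied only afterwards,
 * so the copy cannot be older than the status check that validated it.
 */
bool
sketch_read_completion(const volatile struct cpl_sketch *ring, uint32_t head,
    unsigned phase, struct cpl_sketch *out)
{
	uint16_t status;

	assert(ring != NULL && out != NULL);

	status = ring[head].status;		/* step 1: status first */
	if (SKETCH_PHASE(status) != phase)
		return (false);			/* entry not completed yet */

	/* step 2: re-read the entire record now that it is known in phase */
	out->cdw0 = ring[head].cdw0;
	out->cid = ring[head].cid;
	out->status = ring[head].status;
	return (true);
}
```

A caller would loop over the ring, stop at the first entry for which
this returns false, and process only the private copy in 'out'.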
Warner Losh [Fri, 2 Jul 2021 21:58:19 +0000 (15:58 -0600)]
nvme: Fix alignment on nvme structures
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.
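The effect of packing and of an explicit alignment attribute can be
checked with a small C sketch. The field list, the struct names and the
value 8 are assumptions for illustration (the commit text does not
state the alignment chosen), and __attribute__ is a GCC/Clang
extension.

```c
#include <assert.h>
#include <stdint.h>

/* The same hypothetical 16-byte record, declared three ways. */
#define	SKETCH_CPL_FIELDS	\
	uint32_t cdw0;		\
	uint32_t rsvd;		\
	uint16_t sqhd;		\
	uint16_t sqid;		\
	uint16_t cid;		\
	uint16_t status;

/* Packed: the compiler must assume any byte address, so alignment is 1. */
struct cpl_packed  { SKETCH_CPL_FIELDS } __attribute__((packed));

/* Natural: alignment follows the widest member (uint32_t). */
struct cpl_natural { SKETCH_CPL_FIELDS };

/* Over-aligned: promises the compiler more than the members require. */
struct cpl_aligned { SKETCH_CPL_FIELDS } __attribute__((aligned(8)));

static_assert(_Alignof(struct cpl_packed) == 1, "packed drops alignment");
static_assert(_Alignof(struct cpl_aligned) == 8, "explicit alignment");
static_assert(sizeof(struct cpl_aligned) == 16, "size is unchanged");
```

With the lower alignment the compiler may have to split wide loads into
narrower ones on strict-alignment targets, which is the code-generation
cost the commit refers to.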
Warner Losh [Sat, 29 May 2021 05:01:52 +0000 (23:01 -0600)]
nvme: fix a race between failing the controller and failing requests
Part of the nvme recovery process for errors is to reset the
card. Sometimes, this results in failing the entire controller. When nda
is in use, we free the sim, which will sleep until all the I/O has
completed. However, with only one thread, the request fail task never
runs once the reset thread sleeps here. Create two threads to allow I/O
to fail until it's all processed and the reset task can proceed.
This is a temporary kludge until I can work out questions that arose
during the review, not least of which is what race queueing to a
failure task solved. The original commit is vague and other error paths
in the same context do a direct failure. I'll investigate that more
completely before committing a change to a direct failure. mav@
raised this issue during the review, but didn't otherwise object.
Multiple threads, though, solve the problem in the meantime until other
such means can be perfected.
Warner Losh [Thu, 11 Mar 2021 15:42:44 +0000 (08:42 -0700)]
nvme: use config_intrhook_drain to avoid removable card races
nvme drives are configured early in boot. However, a number of the configuration
steps take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.
The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.
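A manual in-place swap of this kind might look like the sketch below.
The function name is hypothetical and this is not the routine from the
NVMe headers; it only shows that reversing bytes one at a time needs
neither memcpy() nor an aligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the 8 bytes at 'p' in place; 'p' needs no particular alignment. */
void
sketch_swap64_inplace(uint8_t *p)
{
	assert(p != NULL);

	for (size_t i = 0; i < 4; i++) {
		uint8_t tmp = p[i];

		p[i] = p[7 - i];
		p[7 - i] = tmp;
	}
}
```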
Warner Losh [Fri, 4 Dec 2020 21:34:48 +0000 (21:34 +0000)]
nvme: Remove a wmb() that's not necessary.
bus_dmamap_sync() ensures that memory that's prepared for PREWRITE can
be DMA'd immediately after it returns. The details differ, but this
mirrors atomic thread release semantics, at least for the buffers
synced.
For non-x86 platforms, bus_dmamap_sync() has the right syncing and
fences. So in the past, wmb() had been omitted for them.
For x86 platforms, the memory ordering is already strong enough to
ensure DMA to the device sees the current contents. As such, we don't
need the wmb() here. It translates to an sfence which is only needed
for writes to regions that have the write combining attribute set or
when some exotic opcodes are used. The nvme driver does neither of
these. Since bus_dmamap_sync() includes atomic_thread_fence_rel, we
can be assured any optimizer won't reorder the bus_dmamap_sync and the
bus_space_write operations. The wmb() was a vestige of the pre-busdma
version initially committed to the tree.
Michal Meloun [Wed, 2 Dec 2020 16:54:24 +0000 (16:54 +0000)]
NVME: Multiple busdma related fixes.
- in nvme_qpair_process_completions() do dma sync before completion buffer
is used.
- in nvme_qpair_submit_tracker(), don't do explicit wmb() also for arm
and arm64. Bus_dmamap_sync() on these architectures is sufficient to ensure
that all CPU stores are visible to external (including DMA) observers.
- Allocate completion buffer as BUS_DMA_COHERENT. On not-DMA coherent systems,
buffers continuously owned (and accessed) by DMA must be allocated with this
flag. Note that BUS_DMA_COHERENT flag is no-op on DMA coherent systems
(or coherent buses in mixed systems).
Warner Losh [Sat, 24 Oct 2020 01:59:01 +0000 (01:59 +0000)]
nvme: Remove compat code for older kernels
Remove code that supported pre-2011 kernels. CTLTYPE_S64 was defined
in rev 217616. All supported branches have it, so remove its compat
definition as OBE.
Warner Losh [Fri, 1 May 2020 21:24:19 +0000 (21:24 +0000)]
Add KASSERT to ensure sane nsid.
All callers are currently filtering bad nsid to this function,
however, we'll have undefined behavior if that's not true. Add the
KASSERT to prevent that.
Warner Losh [Tue, 20 Jul 2021 04:47:30 +0000 (22:47 -0600)]
awk: Make -F '' and -v FS="" behave the same
IEEE Std 1003.1-2008 mandates that -F str be treated the same as -v
FS=str. For a null string, this was not the case. Since awk(1) documents
that a null string for FS has a specific behavior, make -F '' behave
consistently with -v FS="".
Warner Losh [Thu, 22 Jul 2021 02:24:57 +0000 (20:24 -0600)]
awk: Remove last markings we have on awk
We normally don't add $FreeBSD$ to contrib software. However, these
changes date back to the CVS era of source code management and have been
overlooked. Now that all these files are back to the same as the
upstream bsd-features branch, remove the FreeBSD specific changes, which
are now just $FreeBSD$ and the (FreeBSD) in the version string.
Warner Losh [Thu, 22 Jul 2021 02:22:43 +0000 (20:22 -0600)]
awk: revert to upstream behavior for ranges for gawk compatibility
In 2005, FreeBSD changed one-true-awk to honor the locale's collating
order. This was billed as a temporary patch. It was also compatible with
the then-current behavior of gawk. That temporary patch has lasted 16
years now.
However, IEEE Std 1003.1-2008 changed the behavior of ranges in regular
expressions outside of the "C" and "POSIX" locales to be undefined.
Starting in 2011, gawk 4.0 stopped using the locale for the range
regular expressions and used the traditional behavior only. The
maintainer had grown weary of answering why '[A-Z]' would sometimes
match lower-case expressions. The details are explained here:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
To restore compatibility with other implementations of awk, revert this
patch. FreeBSD is the odd-system out. It also has the nice side effect
of eliminating the last of our differences with upstream one-true-awk.
Warner Losh [Fri, 9 Jul 2021 03:51:24 +0000 (21:51 -0600)]
awk: Reduce diffs with upstream to almost nothing.
In the merge of 20210215, I left two merge conflicts #if 0'd by mistake
to check later rather than resolve them as part of the merge. This code
turns out to be from the original one-true-awk import and not FreeBSD
specific, so remove them.
Remove an extra definition of HAT.
Remove a stylistic change that also appears to be a mismerge along the
way.
Remove FREEBSD-upgrade. Nobody has updated it since the original 2007
cvs import. It talks about old CVS branches that never made it into svn,
let alone git. New imports will follow the standard practices now, so
there's nothing left to document.
Move README to README.md and copy the README.md from upstream over.
This leaves just the $FreeBSD$ lines (which remain for the stable/12
merge) and the strcoll part of ru@'s r201989/d98dd8e5f94c as the only
diffs with upstream. FreeBSD also still has its own man page, which I
don't plan on changing. Once this commit is merged to stable/12, I plan
no further merges to stable/12. Sometime after that I'll remove the
$FreeBSD$ lines to reduce the diffs even more (though I want to make
sure plans won't change first). I also plan to talk to upstream about
this change...
Only list the states relevant to the connection we're killing.
Sometimes there are IPv6 related states (due to the usual IPv6
background traffic of router solicitations, DAD, ...) that cause us to
think we failed to kill the state, which in turn causes the test to fail
intermittently.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Kevin Bowling [Fri, 16 Jul 2021 06:50:14 +0000 (23:50 -0700)]
ixgbe: Print FW NVM and Option ROM versions
It can be useful for system operators to see this kind of information
when correlating issues or requesting support from the OEM or Intel for
hardware and firmware issues.
Add a new option (-H) to daemon(8) to catch SIGHUP and re-open output_file when
received.
The default system log rotation mechanism (newsyslog(8)) requires the ability to send
a signal to a daemon in order to properly complete rotation of the logs in an "atomic"
manner without having to make a copy and truncate the original file. Unfortunately
our built-in mechanism to convert "dumb" programs into daemons has no way to handle
this rotation properly. This change adds this ability, to be enabled by supplying -H
option in addition to the -o option.
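The catch-SIGHUP-and-re-open pattern can be sketched as follows. This
is a generic POSIX illustration, not the code added to daemon(8); all
sketch_* names are hypothetical. The handler only sets a flag and the
re-open happens later from normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t sketch_got_hup;

static void
sketch_on_hup(int sig)
{
	(void)sig;
	sketch_got_hup = 1;	/* only record the request in the handler */
}

/* Install the SIGHUP handler; returns 0 on success, -1 on error. */
int
sketch_install_hup(void)
{
	struct sigaction sa;

	sa.sa_handler = sketch_on_hup;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	return (sigaction(SIGHUP, &sa, NULL));
}

/*
 * Re-open 'path' if a SIGHUP arrived since the last call.
 * Returns 1 if the file was re-opened, 0 if nothing was pending,
 * and -1 if open(2) failed (the old descriptor is then kept).
 */
int
sketch_reopen_if_requested(const char *path, int *fdp)
{
	int nfd;

	assert(path != NULL && fdp != NULL);

	if (!sketch_got_hup)
		return (0);
	sketch_got_hup = 0;

	nfd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (nfd == -1)
		return (-1);
	if (*fdp >= 0)
		close(*fdp);
	*fdp = nfd;
	return (1);
}
```

After the rotation tool renames the old log and sends SIGHUP, the next
call to sketch_reopen_if_requested() creates a fresh file under the
original name, so no copy-and-truncate step is needed.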
Alan Somers [Thu, 20 May 2021 01:10:15 +0000 (19:10 -0600)]
fusefs: correctly set lock owner during FUSE_SETLK
During FUSE_SETLK, the owner field should uniquely identify the calling
process. The fusefs module now sets it to the process's pid.
Previously, it expected the calling process to set it directly, which
was wrong.
libfuse also apparently expects the owner field to be set during
FUSE_GETLK, though I'm not sure why.
Alan Somers [Fri, 18 Jun 2021 00:04:59 +0000 (18:04 -0600)]
fusefs: ensure that FUSE ops' headers' unique values are actually unique
Every FUSE operation has a unique value in its header. As the name
implies, these values are supposed to be unique among all outstanding
operations. And since FUSE_INTERRUPT is asynchronous and racy, it is
desirable that the unique values be unique among all operations that are
"close in time".
Ensure that they are actually unique by incrementing them whenever we
reuse a fuse_dispatcher object, for example during fsync, write, and
listextattr.
Alan Somers [Thu, 24 Dec 2020 19:21:00 +0000 (19:21 +0000)]
fusefs: delete some dead code
The original fusefs GSoC project seems to have envisioned exchanging two
types of messages with FUSE servers. Perhaps vectored and non-vectored?
But in practice only one type has ever been used. Delete the other type.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D27770
Alan Somers [Tue, 15 Jun 2021 20:24:05 +0000 (14:24 -0600)]
fusefs: improve warnings about buggy FUSE servers
The fusefs driver will print warning messages about FUSE servers that
commit protocol violations. Previously it would print those warnings on
every violation, but that could spam the console. Now it will print
each warning no more than once per lifetime of the mount. There is also
now a dtrace probe for each violation.
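One common way to get "at most once per mount" behaviour is a bitmask
kept in the per-mount state. The sketch below assumes that approach;
the structure, the warning names and the function are hypothetical and
are not taken from the fusefs sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical warning classes, one bit each. */
enum sketch_warning {
	SKETCH_WARN_A = 1u << 0,
	SKETCH_WARN_B = 1u << 1
};

/* Hypothetical per-mount state; 'warned' starts at 0 for every new mount. */
struct sketch_mount {
	uint32_t warned;
};

/* Print 'msg' only the first time 'w' is reported for this mount. */
bool
sketch_warn_once(struct sketch_mount *mp, enum sketch_warning w,
    const char *msg)
{
	assert(mp != NULL && msg != NULL);

	if ((mp->warned & (uint32_t)w) != 0)
		return (false);		/* already reported for this mount */
	mp->warned |= (uint32_t)w;
	fprintf(stderr, "WARNING: %s\n", msg);
	return (true);
}
```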
We failed to list the new pf_syncookies.c file in sys/conf/files. This
worked for the usual configurations, where pf is a module, but not for
LINT builds.
Kristof Provost [Wed, 2 Jun 2021 16:16:03 +0000 (18:16 +0200)]
pf tests: Forwarding syncookie test
Test syncookies on a forwarding host. That is, in a setup where the
machine (or vnet) running pf is not the same as the machine (or vnet)
running the server it's protecting.
Kristof Provost [Thu, 20 May 2021 09:54:41 +0000 (11:54 +0200)]
pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature helps pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.
This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.
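The idea of encoding a cookie in the initial sequence number can be
shown with a toy model. Everything here is an assumption for
illustration: the sketch_* names, the flow structure, and above all the
mixing function, which is a simple non-cryptographic hash and not the
algorithm pf or OpenBSD uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection identifier. */
struct sketch_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Toy mixing step; NOT cryptographically secure. */
static uint32_t
sketch_mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 16777619u;
	return (h ^ (h >> 15));
}

/* Server ISN for the SYN+ACK, derived from a secret and the client's SYN. */
uint32_t
sketch_cookie(uint32_t secret, const struct sketch_flow *f,
    uint32_t client_isn)
{
	uint32_t h = 2166136261u;

	assert(f != NULL);

	h = sketch_mix(h, secret);
	h = sketch_mix(h, f->saddr);
	h = sketch_mix(h, f->daddr);
	h = sketch_mix(h, ((uint32_t)f->sport << 16) | f->dport);
	return (sketch_mix(h, client_isn));
}

/*
 * Check the client's final ACK without any stored state: its sequence
 * number is the client ISN + 1 and its acknowledgment number must be
 * the cookie + 1.
 */
bool
sketch_ack_valid(uint32_t secret, const struct sketch_flow *f,
    uint32_t ack_seq, uint32_t ack_ack)
{
	return (ack_ack - 1 == sketch_cookie(secret, f, ack_seq - 1));
}
```

Because the cookie is recomputed from the ACK itself, no state has to
be created when the SYN arrives; state is created only after
sketch_ack_valid() succeeds.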
This can be used for variables which are only used with either
INVARIANTS or WITNESS. Without any annotation they run into dead store
warnings from cc --analyze and always annotating with __unused may hide
bad vars when it should not.
Warner Losh [Thu, 15 Jul 2021 22:17:23 +0000 (16:17 -0600)]
nvme: Enable interrupts after qpair fully constructed
To guard against the ill effects of a spurious interrupt during
construction (or one that was bogusly pending), enable interrupts after
the qpair is completely constructed. Otherwise, we can die with null
pointer dereferences in nvme_qpair_process_completions. This has been
observed in at least one pre-release NVMe drive where the MSIX interrupt
fired while the queue was being created, before we'd started the NVMe
controller card.
The alternative of only turning on the interrupts after the rest was
tried, but was insufficient to work around this bug and made the code
more complicated w/o benefit.
Warner Losh [Thu, 6 May 2021 19:05:09 +0000 (13:05 -0600)]
boot: fix OBJS to not include BTX's crt0.o
According to comments in the Makefile, to make pxeboot work we need to
have crt0.o first. This is needed because the simplified loader in
pxeboot assumes that the startup code is at offset 0 in this binary. In
normal booting, the start address can be obtained from headers of the
binary, but since pxeboot encodes this as a pure binary, it has no way
of knowing where that is and assumes 0. Added comments to that effect
in the Makefile.
We've done this by adding it to OBJS before all the other .o's are
added. However, there's a problem. This also adds it to the CLEANFILES
variable, which causes it to be removed from multiple places. The
dependencies may also cause it to be re-built at a time that's after
boot2 is built. This causes installs to fail because at install time
boot2 is considered to be out of date and the programs to rebuild it are
no longer in the path.
Cope with this problem by just adding it to LDFLAGS instead.
Glanced at by: kevans ("I thought that went in ages ago")
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D28876
nanobsd: Use gpart and create code image before full disk image
The attached patch brings two main changes to the nanobsd script:
1- gpart is used instead of fdisk;
2- the code image is created first, and then used to ``assemble'' the
full disk image.
The patch was first proposed on the freebsd-embedded list:
http://lists.freebsd.org/pipermail/freebsd-embedded/2012-June/001580.html
and is currently under discussion:
http://lists.freebsd.org/pipermail/freebsd-embedded/2014-January/002216.html
Another effect is that the -f option ("suppress code slice extraction")
now implies the -i option ("suppress disk image build").
imp@ applied patch by hand to new legacy.sh, plus tweaked for NANO_LOG vs
NANO_OBJ confusion in original.
Warner Losh [Thu, 15 Jul 2021 03:06:08 +0000 (21:06 -0600)]
loader: make sure CPUTYPE is ignored when building
CPUTYPE?=native causes -march=native to be added to the command
line. When the host machine is haswell, this causes some versions of
clang to generate code that can't execute in the efi boot loader
environment. Set _CPUCFLAGS= to undo what's done in bsd.cpu.mk. bsd.cpu.mk
is included too early to control with NO_CPU_CFLAGS here. The only other
option is to put that in all the Makefiles, and this is less tedious and
error prone.
Alfonso Gregory [Wed, 14 Jul 2021 21:48:35 +0000 (15:48 -0600)]
Remove incorrect __restricted labels from strcspn
strcspn should never have had the __restrict keywords. While both of
these strings are const, it may have unintended side effects. While this
is the kernel, the POSIX definition also omits restrict.
These issues have low impact because they require precise circumstances
to trigger one of them. The disk must be > 2 TiB in size and either:
- The primary GPT header is damaged.
- The freebsd-boot partition is located farther than the first 2 TiB of
the disk and one of its sectors falls at an LBA value that makes
the upper 32 bits of that value change.
Errors and corrections follow:
- decl and incl don't affect CF, so replace with subl/addl $1
- repe uses %cx, so move size to it with movw
- moving a 64-bit value with %cx of 2 (should be 4) so addresses
> 2TB will work.
PR: 233180
Reviewed by: imp@ (applied patch using description in bug)
Differential Revision: https://reviews.freebsd.org/D31100
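The fixed code is x86 assembly, so the following C fragment is only a
model of the arithmetic involved: a 64-bit LBA kept as two 32-bit
halves must propagate the carry out of the low half into the high half.
On x86, incl and decl leave CF unchanged, so a following add-with-carry
would not see that carry; addl $1 does set it. The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Increment a 64-bit LBA stored as two 32-bit halves. */
void
sketch_lba_inc(uint32_t *lo, uint32_t *hi)
{
	uint32_t old;

	assert(lo != NULL && hi != NULL);

	old = *lo;
	*lo = old + 1;
	/* The low half wrapped exactly when the new value is smaller. */
	*hi += (*lo < old) ? 1u : 0u;
}
```

The wrap happens only when the low half is 0xffffffff, which is why the
bug needed a disk larger than 2 TiB and an LBA right at that boundary.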
Warner Losh [Tue, 13 Jul 2021 06:00:33 +0000 (00:00 -0600)]
cam_iosched: use tunable flag and make a bool really a bool
kern.cam.do_dynamic_iosched is really a bool, so change its type to
bool. While I'm here, also use the CTLFLAG_TUN flag instead of a
separate tunable line for it and kern.cam.iosched_alpha_bits.
Young Xiao [Tue, 21 May 2019 07:36:29 +0000 (15:36 +0800)]
Fix potential NULL pointer dereference of device physical path
In ata_dev_advinfo() and nvme_dev_advinfo(), if the physical path is
being stored and there is a malloc failure (malloc(9) is called with
M_NOWAIT), we could wind up in a situation where the device's
physpath_len is set to the length the user provided, but the physpath
itself is NULL.
If another context then comes in to fetch the physical path value, we
would wind up trying to memcpy a NULL pointer into the caller's buffer.
So, set the physpath_len to 0 when we free the physpath on entry into
the store case for the physical path. Reset the length to a non-zero
value only after we've successfully malloced a buffer to hold it.
This mirrors what scsi_xpt.c already does.
Signed-off-by: Young Xiao <92siuyang@gmail.com>
Reviewed by: imp
PR: 238014
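The ordering described above (zero the length when the old buffer is
freed, set it again only after the allocation succeeds) can be sketched
in userland C. The structure, function names and injectable allocator
are hypothetical stand-ins; the kernel code uses malloc(9) with
M_NOWAIT instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state holding a stored physical path. */
struct sketch_dev {
	uint8_t	*physpath;
	size_t	 physpath_len;
};

/* Allocator that always fails, standing in for an M_NOWAIT failure. */
void *
sketch_failing_alloc(size_t len)
{
	(void)len;
	return (NULL);
}

/* Store a new physical path; 'alloc' is injectable so failure can be shown. */
int
sketch_store_physpath(struct sketch_dev *dev, const uint8_t *buf, size_t len,
    void *(*alloc)(size_t))
{
	assert(dev != NULL && buf != NULL && alloc != NULL && len > 0);

	/* Drop the old value and its length together. */
	free(dev->physpath);
	dev->physpath = NULL;
	dev->physpath_len = 0;

	dev->physpath = alloc(len);
	if (dev->physpath == NULL)
		return (ENOMEM);	/* length stays 0, matching the NULL */

	memcpy(dev->physpath, buf, len);
	dev->physpath_len = len;	/* set only after success */
	return (0);
}
```

A reader that copies physpath_len bytes from physpath is then safe on
both paths: after a failed store it sees a length of 0 and copies
nothing.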
John Hood [Sun, 11 Jul 2021 14:44:12 +0000 (08:44 -0600)]
loader: support.4th resets the read buffer incorrectly
Large nextboot.conf files (over 80 bytes) are not read correctly by the
Forth loader, causing file parsing to abort, and nextboot configuration
fails to apply.
Simple repro:
nextboot -e foo=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
shutdown -r now
That will cause the bug to cause a parse failure but shouldn't otherwise
affect the boot. Depending on your loader configuration, you may also
have to set beastie_disable and/or reduce the number of modules loaded
to see the error on a small console screen. 12.0 or CURRENT users will
also have to explicitly use the Forth loader instead of the Lua loader.
The error will look something like:
Warning: syntax error on file /boot/loader.conf.local
foo="xxxxxxxxxxxxxxnextboot_enable="YES"
^
/boot/support.4th has crude file I/O buffering, which uses a buffer
'read_buffer', defined to be 80 bytes by the 'read_buffer_size'
constant. The loader first tastes nextboot.conf, reading and parsing
the first line in it for nextboot_enable="YES". If this is true, then
it reopens the file and parses it like other loader .conf files.
Unfortunately, the file I/O buffering code does not fully reset the
buffer state in the reset_line_reading word. If the last file was read
to the end, that doesn't matter; the file buffer is treated as empty
anyway. But in the nextboot.conf case, the loader will not read to the
end of file if it is over 80 bytes, and the file buffer may be reused
when reading the next file. When the file is reread, the corrupt text
may cause file parsing to abort on bad syntax (if the corrupt line has
<>2 quotes in it), the wrong variable to be set, no variable to be set
at all, or (if the splice happens to land at a line ending) something
approximating normal operation.
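The failure mode can be modelled in C, assuming the buffer state
consists of a fill length and a read offset. The real code is Forth in
/boot/support.4th; the structure and function names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_BUF_SIZE	80

/* Hypothetical model of the line reader's buffer state. */
struct sketch_reader {
	char	buf[SKETCH_BUF_SIZE];
	size_t	len;	/* bytes currently held in buf */
	size_t	ptr;	/* offset of the next unread byte */
};

/* Incomplete reset: the offset is rewound but the old data stays valid. */
void
sketch_reset_incomplete(struct sketch_reader *r)
{
	assert(r != NULL);
	r->ptr = 0;
}

/* Full reset: the buffer is also marked empty, forcing a fresh read. */
void
sketch_reset_full(struct sketch_reader *r)
{
	assert(r != NULL);
	r->ptr = 0;
	r->len = 0;
}

/* Bytes the reader would hand out before reading from the new file. */
size_t
sketch_pending(const struct sketch_reader *r)
{
	assert(r != NULL && r->ptr <= r->len);
	return (r->len - r->ptr);
}
```

If the previous file was abandoned part way through a full buffer, the
incomplete reset leaves all 80 old bytes pending, and they are spliced
in front of the next file's contents, much like the corrupt line in the
error output above.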
The bug is very old, dating back to at least 2000 if not before, and is
still present in 12.0 and CURRENT r345863 (though it is now hidden by
the Lua loader by default).
Suggested one-line fix attached. This does change the behavior of the
reset_line_reading word, which is exported in the line-reading
dictionary (though the export is not documented in loader man pages).
But repo history shows it was probably exported for the PNP support
code, which was never included in the loader build, and was removed 5
months ago.
One thing that puzzles me: how has this bug gone unnoticed/unfixed for
nearly 2 decades? I find it hard to believe that nobody's tried to do
something interesting with nextboot, like load a kernel and filesystem,
which is what I'm doing.