kib [Tue, 16 Jun 2020 21:29:02 +0000 (21:29 +0000)]
rtld: Allow to load ET_DYN && DF_1_PIE when tracing.
This makes old ldd to still work on newer tagged PIE binaries.
Also move debug line for hashes before both decisions to not load are
done, so that the end of digest_dynamic() processing and reason to not
load or load is seen in debug trace.
Noted by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jmallett [Tue, 16 Jun 2020 19:21:28 +0000 (19:21 +0000)]
Improve unit parsing of mpsutil.
Previously, it used atoi(3) to parse the unit parameter, which would silently
yield a unit of 0 in the presence of an invalid unit number. As most users of
mpsutil(8) are likely to have at least a unit 0, this is could have confusing
results.
This behaviour was particularly unintuitive if one incorrectly passed an
adapter device name, or a device path, instead of a unit number. In addition
to using strtoumax(3) instead of atoi(3) to parse unit numbers, support
stripping a device name (e.g. mps1) or path (e.g. /dev/mps2) to just its unit
number.
rrs [Tue, 16 Jun 2020 18:16:45 +0000 (18:16 +0000)]
iSo in doing final checks on OCA firmware with all the latest tweaks the dup-ack checking
packet drill script was failing with a number of unexpected acks. So it turns
out if you have the default recvwin set up to 1Meg (like OCA's do) and you
have no window scaling (like the dupack checking code) then we have another
case where we are always trying to update the rwnd and sending an
ack when we should not.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D25298
eugen [Tue, 16 Jun 2020 17:45:23 +0000 (17:45 +0000)]
newsyslog(8): make configuration parser more robust.
Currently newsyslog supports <include> directive that is used
in our default /etc/newsyslog.conf in the following form:
<include> /usr/local/etc/newsyslog.conf.d/*
While this is suitable for ports installing their own rules
for logs rotation, this also makes newsyslog break entire
processing of all files if it encounters single line it cannot parse.
This includes lines referring to nonexistent username/group for log
ownership, so newsyslog stops calling errx() function in the parser.
With this fix, newsyslog uses warnx() instead of errx() in such cases
to print a warning, recover gracefully and continue with execution.
Among other cases, this unbreaks initial creation of log files
having flag "C" at boot time (newsyslog -CN). This is most important
for systems having RAM-based /var file system like nanobsd(8)-based
that rely on newsyslog to bring system log files into existence.
rrs [Tue, 16 Jun 2020 12:26:23 +0000 (12:26 +0000)]
So it turns out rack has a shortcoming in dup-ack counting. It counts the dupacks but
then does not properly respond to them. This is because a few missing bits are not present.
BBR actually does properly respond (though it also sends a TLP which is interesting and
maybe something to fix)..
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D25294
rmacklem [Tue, 16 Jun 2020 02:31:22 +0000 (02:31 +0000)]
Expose UID_xxx and GID_xxx definitions to userspace.
This patch moves the UID_xxx and GID_xxx definitions out of the
#ifdef _KERNEL section, so that userspace programs like mountd
can use them.
There are a couple of userspace programs that do define UID_ROOT,
but they do not include sys/conf.h. Since they are defined as
the same value, maybe they should be changed to include sys/conf.h.
U-APSD (unscheduled automatic power save delivery) is a power save method
that's a bit better than legacy PS-POLL - stations can mark frames with
an extra flag that tells the AP to leak out more frames after it sends
its own frames rather than needing to send a PS-POLL to get another frame
from the AP.
Now, this code just handles the negotiation bits; it doesn't actually
implement U-APSD. That's up to drivers, and nothing in the tree yet
implements this. I /may/ implement this for ath(4) if I eventually care
enough but right now I plan on just implementing it for firmware offload
based NICs that handle this in the NIC.
I'll commit the ifconfig bit after this and I may have some follow-up
commits as this gets used more by me in local testing.
This should be a glorious no-op for everyone else. If things change
for anyone that isn't fixed by a complete recompile then please reach out
to me.
trasz [Mon, 15 Jun 2020 20:12:10 +0000 (20:12 +0000)]
Make Linux uname(2) return x86_64 to 32-bit apps. This helps Steam.
PR: kern/240432
Analyzed by by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25248
vmaffione [Mon, 15 Jun 2020 19:46:34 +0000 (19:46 +0000)]
if_vtnet: let vtnet_rx_vq_intr() and vtnet_rxq_tq_intr() share code
Since the two functions are similar, introduce a common function
(vtnet_rx_vq_process()) to share common code.
This also improves locking, by ensuring vrxs_rescheduled is accessed
under the RXQ lock, and taskqueue_enqueue() is not called under the
lock (therefore avoiding a spurious duplicate lock warning).
jhb [Mon, 15 Jun 2020 19:38:48 +0000 (19:38 +0000)]
Remove the sed hack for ABI tag notes.
The ELF notes compiled in C were placed in a section with the wrong type
(SHT_PROGBITS instead of SHT_NOTE). Previously, sed was used on the
generated assembly to rewrite the section type. Instead, write the notes
in assembly which permits setting the correct section type directly.
While here, move inline assembly entry points out of C and into assembly
for aarch64, arm, and riscv.
jhb [Mon, 15 Jun 2020 18:57:43 +0000 (18:57 +0000)]
Simplify MACHINE_ARCH to be a single string.
Big endian and armv4 mean that we are now down to only two supported
variants. A future change will use MACHINE_ARCH in assembly which
does not support C-style string concatentation and thus needs
MACHINE_ARCH defined as a single string.
jrtc27 [Sun, 14 Jun 2020 22:39:34 +0000 (22:39 +0000)]
vtnet: Fix regression introduced in r361944
For legacy devices that don't support MrgRxBuf (such as bhyve pre-r358180),
r361944 failed to update the receive handler to account for the additional
padding introduced by the unused num_buffers field that is now always present
in struct vtnet_rx_header. Thus, calculate the padding dynamically based on
vtnet_hdr_size.
vmaffione [Sun, 14 Jun 2020 20:47:31 +0000 (20:47 +0000)]
netmap: vtnet: fix races in vtnet_netmap_reg()
The nm_register callback needs to call nm_set_native_flags()
or nm_clear_native_flags() once the device has been stopped.
However, in the current implementation this is not true,
as the device is stopped by vtnet_init_locked(). This causes
race conditions where the driver crashes as soon as it
dequeues netmap buffers assuming they are mbufs (or the other
way around).
To fix the issue, we extend vtnet_init_locked() with a second
argument that, if not zero, will set/clear the netmap flags.
This results in a huge simplification of the nm_register
callback itself.
Also, use netmap_reset() to check if a ring is going to be
re-initialized in netmap mode.
jilles [Sun, 14 Jun 2020 19:41:24 +0000 (19:41 +0000)]
sh/tests: Add tests for SIGINT in non-jobc background commands
If job control is not enabled, background commands shall ignore SIGINT and
SIGQUIT, and it shall be possible to override that ignore in the same shell.
bdragon [Sun, 14 Jun 2020 16:47:16 +0000 (16:47 +0000)]
[PowerPC] Fix scc z8530 driver
Parts of the z8530 driver were still using the SUN channel spacing.
This was invalid on PowerMac and QEMU, where the attachment was to escc,
not escc-legacy. This means the driver has apparently NEVER worked properly
on Macintosh hardware.
Add documentation for the channel spacing details, and change to using
driver-specific initialization instead of hardcoded spacing so either
spacing can be used.
Fixes boot hang in QEMU when using the serial console, and fixes use on
Xserve serial (and presumably PowerMacs that have a Stealth Serial port
or similar)
tsoome [Sun, 14 Jun 2020 06:58:58 +0000 (06:58 +0000)]
Move font related data structured to sys/font.c and update vtfontcvt
Prepare support to be able to handle font data in loader, consolidate
data structures to sys/font.h and update vtfontcvt.
vtfontcvt update is about to output set of glyphs in form of C source,
the implementation does allow to output compressed or uncompressed font
bitmaps.
rmacklem [Sun, 14 Jun 2020 00:40:00 +0000 (00:40 +0000)]
Modify mountd to use the new struct export_args committed by r362158.
r362158 modified struct export_args for make the ex_flags field 64bits
and also changed the anonymous credentials to allow more than 16 groups.
This patch fixes mountd.c to use the new structure.
It does allocate larger exportlist and grouplist structures now.
That will be fixed in a future commit.
The only visible change will be that the credentials provided for the
-maproot and -mapall exports options can now have more than 16 groups.
rmacklem [Sun, 14 Jun 2020 00:10:18 +0000 (00:10 +0000)]
Fix export_args ex_flags field so that is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it hold a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size.
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)
This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.
adrian [Sat, 13 Jun 2020 23:35:22 +0000 (23:35 +0000)]
[net80211] Handle offloaded AMSDU in AMPDU reordering.
In the 11n world, most NICs did A-MPDU receive/transmit offloading but
not A-MSDU offloading. So, the net80211 A-MPDU receive path would just
receive MPDUs, do the reordering bit, pass it up to the rest of
net80211 for crypto decap and then do A-MSDU decap before throwing ethernet
frames up to the rest of the system.
However 11ac and 11ax NICs are increasingly doing A-MSDU offload (and
newer 11ax stuff does socket offload, but hey I don't want to scare people
JUST yet) - so although A-MPDU reordering may be done in the OS, A-MSDUs
look like a normal MPDU. This means that all the MSDUs are actually
faked into a set of MPDUs with matching 802.11 header - the sequence number,
QoS header and any encryption verification bits (like IV) are just copied.
This shows up as MASSIVE packet loss in net80211, cause after the first MPDU
we just toss the rest.
(And don't get me started about ethernet decap with A-MPDU host reordering;
we'll have to cross that bridge for later 11ac and 11ax bits too.)
Anyway, this work changes each A-MPDU reorder slot into an mbufq.
The mbufq is treated as a whole set of frames to pass up to the stack
and reordered/de-duped as a group. The last frame in the reorder list
is checked to see if it's an A-MSDU final frame so any duplicates are
correctly tossed rather than double-received. Other than that, the
rest of the logic is unchanged.
The previous commit did a small subset of this - if there wasn't any reordering
going on then it'd accept the A-MSDUs. This is the rest of the needed work.
This is a no-op for 11n NICs doing A-MPDU reordering but needing software
A-MSDU decap - they aren't tagged as A-MSDU and so any subsequent
frames added to the reorder slot are tossed.
adrian [Sat, 13 Jun 2020 22:20:02 +0000 (22:20 +0000)]
[net80211] separate out node allocation and node initialisation.
This is a new, optional (for now!) method that drivers can use to separate
node allocation and node initialisation. Right now they're the same, and
drivers that need to do node allocation via firmware commands need to sleep
and thus they need to defer node allocation into an internal taskqueue.
Right now they're just separate but not deferred. Later on if I get the time
we'll start deferring the node and key related operations but that requires
making a bunch of other stuff (notably things that generate frames!) also
async/deferred.
kib [Sat, 13 Jun 2020 18:21:31 +0000 (18:21 +0000)]
Fix ldd for PIE binaries after rtld stopped accepting binaries for dlopen.
ldd proclaims ET_DYN objects as shared libraries and tries to
dlopen(RTLD_TRACE) them to get dependencies. Since PIE binaries are
ET_DYN | DF_1_PIE, refusal to dlopen such binaries breaks ldd.
Fix it by reading and parsing dynamic segment looking for DF_FLAG_1
and taking DF_1_PIE into account when deciding between binary and
library.
Reported by: Dewayne Geraghty <dewayne@heuristicsystems.com.au>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25257
yuripv [Sat, 13 Jun 2020 14:11:02 +0000 (14:11 +0000)]
nvi: fallback to ISO8859-1 as last resort
Current logic of using user's locale encoding that is UTF-8 doesn't make
much sense if we already failed the looks_utf8() check and skipped
encoding set using "fileencoding" as being UTF-8 as well; fallback to
ISO8859-1 in that case.
CVE-2020-11655: SQLite through 3.31.1 allows attackers to cause a denial of
service (segmentation fault) via a malformed window-function query because
the AggInfo object's initialization is mishandled.
CVE-2020-13434: SQLite through 3.32.0 has an integer overflow in
sqlite3_str_vappendf in printf.c.
CVE-2020-13435: SQLite through 3.32.0 has a segmentation fault in
sqlite3ExprCodeTarget in expr.c.
CVE-2020-13630: ext/fts3/fts3.c in SQLite before 3.32.0 has a
use-after-free in fts3EvalNextRow, related to the snippet feature
CVE-2020-13631: SQLite before 3.32.0 allows a virtual table to be renamed
to the name of one of its shadow tables, related to alter.c and build.c.
CVE-2020-13632: ext/fts3/fts3_snippet.c in SQLite before 3.32.0 ha s a
NULL pointer dereference via a crafted matchinfo() query.
dougm [Sat, 13 Jun 2020 01:54:09 +0000 (01:54 +0000)]
Linuxkpi uses the rb-tree structures without using their interfaces,
making them break when the representation changes. Revert changes that
eliminated the color field from rb-trees, leaving everything as it was
before.
jhb [Fri, 12 Jun 2020 23:10:30 +0000 (23:10 +0000)]
Various optimizations to software AES-CCM and AES-GCM.
- Make use of cursors to avoid data copies for AES-CCM and AES-GCM.
Pass pointers into the request's input and/or output buffers
directly to the Update, encrypt, and decrypt hooks rather than
always copying all data into a temporary block buffer on the stack.
- Move handling for partial final blocks out of the main loop.
This removes branches from the main loop and permits using
encrypt/decrypt_last which avoids a memset to clear the rest of the
block on the stack.
- Shrink the on-stack buffers to assume AES block sizes and CCM/GCM
tag lengths.
- For AAD data, pass larger chunks to axf->Update. CCM can take each
AAD segment in a single call. GMAC can take multiple blocks at a
time.
kib [Fri, 12 Jun 2020 22:14:45 +0000 (22:14 +0000)]
Control for Special Register Buffer Data Sampling mitigation.
New microcode update for Intel enables mitigation for SRBDS, which
slows down RDSEED and related instructions. The update also provides
a control to limit the mitigation to SGX enclaves, which should
restore the speed of random generator by the cost of potential
cross-core bufer sampling.
See https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling
GIve the user control over it.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25221
kib [Fri, 12 Jun 2020 22:10:03 +0000 (22:10 +0000)]
rtld: set osrel when in the direct exec mode.
Rtld itself is a shared object which does not have vendor note, so
after the direct exec of ld-elf.so.1 process has p_osrel set to zero.
This affects the ABI of syscalls.
Set osrel to the __FreeBSD_version value at compile time right after
rtld identified direct exec mode. Then, switch to the osrel read from
the binary note or zero if no note, right before starting calling
ifunc resolvers, which is the first byte of the user code.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jhb [Fri, 12 Jun 2020 21:21:18 +0000 (21:21 +0000)]
Various fixes to TLS for MIPS.
- Clear the current thread's TLS pointer on exec. Previously the TLS
pointer (and register) remain unchanged.
- Explicitly clear the TLS pointer when new threads are created.
- Make md_tls_tcb_offset per-process instead of per-thread.
The layout of the TLS and TCB are identical for all threads in a
process, it is only the TLS pointer values themselves that vary by
thread. This also makes setting md_tls_tcb_offset in
cpu_set_user_tls() redundant with the setting in exec_setregs(), so
only set it in exec_setregs().
vangyzen [Fri, 12 Jun 2020 21:17:56 +0000 (21:17 +0000)]
FPU init: allocate initial state from UMA to ensure alignment
The Intel Instruction Set Reference says this about the XSAVE instruction:
Use of a destination operand not aligned to 64-byte boundary
(in either 64-bit or 32-bit modes) results in a general-protection
(#GP) exception.
This alignment happens naturally when all malloc buckets are powers
of two. However, this change is necessary on some systems when
certain non-power-of-two (and non-multiple of 64) malloc buckets
are defined.
Reviewed by: cem; kib; earlier version by jhb
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25098
vangyzen [Fri, 12 Jun 2020 21:10:45 +0000 (21:10 +0000)]
FPU init: Do potentially blocking operations before disabling interrupts
In particular, uma_zcreate creates sysctl oids, which locks an sx lock,
which uses IPIs under contention. IPIs tend not to work very well
when interrupts are disabled. Who knew, right?
rrs [Fri, 12 Jun 2020 19:56:19 +0000 (19:56 +0000)]
So it turns out with the right window scaling you can get the code in all stacks to
always want to do a window update, even when no data can be sent. Now in
cases where you are not pacing thats probably ok, you just send an extra
window update or two. However with bbr (and rack if its paced) every time
the pacer goes off its going to send a "window update".
Also in testing bbr I have found that if we are not responding to
data right away we end up staying in startup but incorrectly holding
a pacing gain of 192 (a loss). This is because the idle window code
does not restict itself to only work with PROBE_BW. In all other
states you dont want it doing a PROBE_BW state change.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D25247
gallatin [Fri, 12 Jun 2020 18:41:12 +0000 (18:41 +0000)]
x86: Bump default msi/msix vector limit to 2048
Given that 64c/128t CPUs are currently available, and that many
devices (nvme, many NICs) desire to map 1 MSI-X vector per core,
or even 1 per-thread, it is becoming far easier to see MSI-X interrupt
setup fail due to msi vector exhaustion, and devices fail to attach at
boot on large system.
This bump costs 12KB on amd64 (and 6KB on i386), which seems
worth the trade off for a better out of the box experience on
high end hardware.
Reviewed by: jhb
MFC after: 21 days
Sponsored by: Netflix
kevans [Fri, 12 Jun 2020 18:13:32 +0000 (18:13 +0000)]
posix_spawn: fix for some custom allocator setups
libc cannot assume that aligned_alloc and free come from jemalloc, or that
any application providing its own malloc and free is actually providing
aligned_alloc.
Switch back to malloc and just make sure we're passing a properly aligned
stack into rfork_thread, as an application perhaps can't reasonably replace
just malloc or just free without headaches.
This unbreaks ksh93 after r361996, which provides malloc/free but no
aligned_alloc.
Reported by: freqlabs
Diagnosed by: Andrew Gierth <andrew_tao173.riddles.org.uk>
X-MFC-With: r361996