mjg [Mon, 13 Jan 2020 14:33:51 +0000 (14:33 +0000)]
ufs: relax an overzealous assert added in r356671
Part of i_flag can persist across a drop to hold count of 0, at which
point the vnode is taken off the lazy list. Then whoever locks and unlocks
the vnode can trip on the assert.
This trips over kyua running a test untarring character devices to ufs.
cy [Mon, 13 Jan 2020 06:55:31 +0000 (06:55 +0000)]
Unbound's config.h is manually maintained, using a ./configure produced
config.h as a guide. In practice contributed software maintains a copy
of config.h within its build directory tree containing its Makefile.
usr.sbin/unbound is the home for its config.h.
mhorne [Mon, 13 Jan 2020 03:39:02 +0000 (03:39 +0000)]
RISC-V: fix global symbol lookups for mpentry with lld
This is a follow up to r356481. In locore.S, before virtual memory is
set up, we should avoid using indirect address lookups through the GOT.
Therefore we need to convert uses of the la instruction to lla, which
always generates an auipc/addi pair of instructions. This conversion was
done for the BSP case, but not the AP case, resulting in a fault
somewhere before mpva and a failure to bring APs online.
Reported by: lwhsu
Reviewed by: lwhsu, jrtc27 (accepted in a comment)
Differential Revision: https://reviews.freebsd.org/D23138
mjg [Mon, 13 Jan 2020 02:39:41 +0000 (02:39 +0000)]
vfs: per-cpu batched requeuing of free vnodes
Constant requeuing adds significant lock contention in certain
workloads. Lessen the problem by batching it.
Per-cpu areas are locked in order to synchronize against UMA freeing
memory.
vnode's v_mflag is converted to short to prevent the struct from
growing.
Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 122.38s user 1780.45s system 6242% cpu 30.480 total
patched: 144.84s user 985.90s system 4856% cpu 23.282 total
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22998
mjg [Mon, 13 Jan 2020 02:37:25 +0000 (02:37 +0000)]
vfs: rework vnode list management
The current notion of an active vnode is eliminated.
Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.
Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.
Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22997
cem [Sun, 12 Jan 2020 20:47:38 +0000 (20:47 +0000)]
getrandom(2): Add Linux GRND_INSECURE API flag
Treat it as a synonym for GRND_NONBLOCK. The reasoning is this:
We have two choices for handling Linux's GRND_INSECURE API flag.
1. We could ignore it completely (like GRND_RANDOM). However, this might
produce the surprising result of GRND_INSECURE requests blocking, when the
Linux API does not block.
2. Alternatively, we could treat GRND_INSECURE requests as requests for
GRND_NONBLOCk. Here, the surprising result for Linux programs is that
invocations with unseeded random(4) will produce EAGAIN, rather than
garbage.
Honoring the flag in the way Linux does seems fraught. If we actually use
the output of a random(4) implementation prior to seeding, we leak some
entropy (in an information theory and also practical sense) from what will
be the initial seed to attackers (or allow attackers to arbitrary DoS
initial seeding, if we don't leak). This seems unacceptable -- it defeats
the purpose of blocking on initial seeding.
Secondary to that concern, before seeding we may have arbitrarily little
entropy collected; producing output from zero or a handful of entropy bits
does not seem particularly useful to userspace.
If userspace can accept garbage, insecure, non-random bytes, they can create
their own insecure garbage with srandom(time(NULL)) or similar. Any program
which would be satisfied with a 3-bit key CTR stream has no need for CSPRNG
bytes. So asking the kernel to produce such an output from the secure
getrandom(2) API seems inane.
For now, we've elected to emulate GRND_INSECURE as an alternative spelling
of GRND_NONBLOCK (2). Consider this API not-quite stable for now. We
guarantee it will never block. But we will attempt to monitor actual port
uptake of this bizarre API and may revise our plans for the unseeded
behavior (prior stable/13 branching).
Approved by: csprng(markm), manpages(bcr)
See also: https://lwn.net/ml/linux-kernel/cover.1577088521.git.luto@kernel.org/
See also: https://lwn.net/ml/linux-kernel/20200107204400.GH3619@mit.edu/
Differential Revision: https://reviews.freebsd.org/D23130
gad [Sun, 12 Jan 2020 20:25:11 +0000 (20:25 +0000)]
Fix the way 'factor' behaves when using OpenSSL to match the description
of how it works when not compiled with OpenSSL.
Also, allow users to specify a hexadecimal number by using a prefix of
'0x'. Before this, users could only specify a hexadecimal value if that
value included a hex digit ('a'-'f') in the value.
tuexen [Sun, 12 Jan 2020 17:52:32 +0000 (17:52 +0000)]
Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
table.
2) The remote address was filled in and the inp was relocated in the
hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.
Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!
bz [Sun, 12 Jan 2020 17:41:09 +0000 (17:41 +0000)]
nd6_rtr: constantly use __func__ for nd6log()
Over time one or two hard coded function names did not match the
actual function anymore. Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consitency.
delphij [Sun, 12 Jan 2020 06:13:52 +0000 (06:13 +0000)]
Tighten FAT checks and fix off-by-one error in corner case.
sbin/fsck_msdosfs/fat.c:
- readfat:
* Only truncate out-of-range cluster pointers (1, or greater than
NumClusters but smaller than CLUST_RSRVD), as the current cluster
may contain some data. We can't fix reserved cluster pointers at
this pass, because we do no know the potential cluster preceding
it.
* Accept valid cluster for head bitmap. This is a no-op, and mainly
to improve code readability, because the 1 is already handled in
the previous else if block.
- truncate_at: absorbed into checkchain.
- checkchain: save the previous node we have traversed in case that we
have a chain that ends with a special (>= CLUST_RSRVD) cluster, or is
free. In these cases, we need to truncate at the cluster preceding the
current cluster, as the current cluster contains a marker instead of
a next pointer and can not be changed to CLUST_EOF (the else case can
happen if the user answered "no" at some point in readfat()).
- clearchain: correct the iterator for next cluster so that we don't
stop after clearing the first cluster.
- checklost: If checkchain() thinks the chain have no cluster, it
doesn't make sense to reconnect it, so don't bother asking.
kevans [Sun, 12 Jan 2020 04:18:36 +0000 (04:18 +0000)]
Makefile.inc1: push /usr/libexec into the BPATH/TMPPATH
${WORLDTMP}/legacy/usr/libexec will only have libexec/ bits that we've
pushed as bootstrap tools, so this is generally safe to include prior to
PATH. The following are the ramifications of this change:
- BPATH addition gets us at least bootstrap flua in WMAKEENV path for
buildenv, for those earlier systems where it's bootstrapped still
- Reworked the sysent target to just set PATH and let it get worked out in
src.lua.mk or individual sysent makefiles -- this gives us back the
ability to overwrite LUA_CMD and use a different/external lua for these
targets. sysent can also now work cleanly in buildenv.
- tools/build/Makefile will now symlink the host flua into build's host
tools so that the above can work without needing to add the host's
/usr/libexec explicitly into TMPPATH.
kevans [Sun, 12 Jan 2020 04:07:03 +0000 (04:07 +0000)]
regulator: small enhancements to regulator_shutdown
Highlights:
- Exit early if we're not disabling unused regulators; there's no need to
take the regulator topology lock and re-evaluate this every iteration, as
it's not going to change.
- Don't emit a notice that we're shutting down a regulator if it's not
enabled, to reduce noise.
- Mention the outcome of the shutdown, to aide debugging and easily let
developer/user collect list of regulators we actually shutdown to
determine problematic one.
Reviewed by: manu
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D22213
mjg [Sat, 11 Jan 2020 22:58:14 +0000 (22:58 +0000)]
vfs: prealloc vnodes in getnewvnode_reserve
Having a reserved vnode count does not guarantee that getnewvnodes wont
block later. Said blocking partially defeats the purpose of reserving in
the first place.
Preallocate instaed. The only consumer was always passing "1" as count
and never nesting reservations.
r302340, as an attempt to fix the localbus child handling post-rman change,
actually broke child resource allocation, due to typos in
fdt_lbc_reg_decode(). This went unnoticed because there aren't any drivers
currently in tree that use localbus.
delphij [Sat, 11 Jan 2020 17:41:20 +0000 (17:41 +0000)]
Correct off-by-two issue when determining FAT type.
In the code we used NumClusters as the upper (non-inclusive) boundary
of valid cluster number, so the actual value was 2 (CLUST_FIRST) more
than the real number of clusters. This causes a FAT16 media with
65524 clusters be treated as FAT32 and might affect FAT12 media with
4084 clusters as well.
To fix this, we increment NumClusters by CLUST_FIRST after the type
determination.
kib [Sat, 11 Jan 2020 09:18:58 +0000 (09:18 +0000)]
rtld: clean up Makefile.
Move all MD statements into $MACHINE_ARCH/Makefile.inc.
Unconditionally apply version script to rtld, the interpreter is not
functional without it for long time.
Reviewed by: brooks, emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23083
mckusick [Sat, 11 Jan 2020 03:18:47 +0000 (03:18 +0000)]
When a read error occurs while fetching a directory block to delete
or rename an entry in it, properly reset the link count of the inode
associated with the entry that was to have been changed.
kevans [Fri, 10 Jan 2020 22:20:23 +0000 (22:20 +0000)]
camdd: initialize devs earlier
GCC9 points out that devs may be used initialized after the bailout label;
in-fact, if num_io_opts != 2 then it is. Move the initialization up a little
bit.
emaste [Fri, 10 Jan 2020 22:00:39 +0000 (22:00 +0000)]
src.opts.mk: force KERBEROS_SUPPORT off where KERBEROS forced off
Explicitly setting WITHOUT_KERBEROS implies WITHOUT_KERBEROS_SUPPORT,
but previously other cases that forced KERBEROS off (such as
WITHOUT_CRYPT) did not also set KERBEROS_SUPPORT off. Because the
_SUPPORT dependent options (KERBEROS/KERBEROS_SUPPORT) are processed
before other dependencies (CRYPT/KERBEROS) it's not easy to make this
happen automatically. Instead just explicitly set KERBEROS_SUPPORT
off where we set KERBEROS off.
Reported by: Michael Dexter's Build Option Survey run
glebius [Fri, 10 Jan 2020 21:22:03 +0000 (21:22 +0000)]
Add pfil(9) hook to vtnet(4).
The patch could be simplier, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.
However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.
glebius [Fri, 10 Jan 2020 19:32:08 +0000 (19:32 +0000)]
Always multiple vm.pgcache_zone_max to number of CPUs, and rename it
respectively. The tunable controls how big is the size of per-cpu
vm page cache. Previously the value was split for all CPUs in system,
so configuring same value on machines with different count of CPUs
yielded in different cache size available to a particular CPU.
manu [Fri, 10 Jan 2020 18:52:14 +0000 (18:52 +0000)]
twsi: Rework how we handle the i2c messages
We use to handle each message separately in i2c_transfer but that cannot
work with message with NOSTOP as it confuses the controller that we disable
the interrupts and start a new message.
Handle every message in the interrupt handler and fire a new start condition
if the previous message have NOSTOP, the controller understand this as a
repeated start.
This fixes booting on Allwinner A10/A20 platform where before the i2c controller
used to write 0 to the PMIC register that control the regulators as it though that
this was the continuation of the write message.
kevans [Fri, 10 Jan 2020 18:24:17 +0000 (18:24 +0000)]
Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.
Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.
kevans [Fri, 10 Jan 2020 18:22:14 +0000 (18:22 +0000)]
makesyscalls.lua: generate all files in /tmp, write into place at the end
This makes makesyscalls.lua more parallel-friendly, or at least not as
hostile to the idea. We get into situations where we're running parallel if
we end up with MAKE_JOBS>1 entering any of the sysent targets, since each
output file is recognized a distinct build step that needs to be executed.
Another commit will add some .ORDER to further improve the situation.
kevans [Fri, 10 Jan 2020 14:40:04 +0000 (14:40 +0000)]
inetd: free WITHOUT_INET6_SUPPORT build of warnings
If inetd is compiled without inet6 support, we need to error out on
rpc+inet6 services rather than attempting to call into rpc bits with an
uninitialized netid.
v4bind is only used with INET6 support, so move it under the proper #ifdefs
with v6bind.
Reported by: Pavel Timofeev <timp87 gmail com>
MFC after: 3 days
kevans [Fri, 10 Jan 2020 14:09:59 +0000 (14:09 +0000)]
a10_ahci: grab the target-supply regulator and enable it
This regulator is marked regulator-boot-on, but it will get shutdown if it's
not actually used/enabled by a driver. This should fix sata on the
cubieboard{1,2}.
Reported by: Ray White @ UWaterloo
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D23112
jhibbits [Fri, 10 Jan 2020 03:16:40 +0000 (03:16 +0000)]
powerpc: Mark cpu_feature-based sysctls as MP_SAFE
hw.floatingpoint and hw.altivec are effectively runtime constants (bits from
the cpu_feature bitfield), so don't need Giant, or any locking for that
matter.
jhibbits [Fri, 10 Jan 2020 01:24:49 +0000 (01:24 +0000)]
powerpc/powernv: Un-Giant-ify opal_nvram driver
It may be possible to make this completely lock free, but for now it's using
a statically allocated bounce buffer in the softc, so it needs to be
guarded.
ian [Thu, 9 Jan 2020 22:51:37 +0000 (22:51 +0000)]
Remove scary-looking printf output that happens when you kldload dtrace on
arm. Replace it with a comment block explaining why the function is empty
on 32-bit arm.
markj [Thu, 9 Jan 2020 20:49:26 +0000 (20:49 +0000)]
libc: Fix a few bugs in the xlocale collation code.
- Fix checks for mmap() failures. [1]
- Set the "map" and "maplen" fields of struct xlocale_collate so that
the table destructor actually does something.
- Free an already-mapped collation file before loading a new one into
the global table.
- Harmonize the prototype and definition of __collate_load_tables_l() by
adding the "static" qualifier to the latter.
PR: 243195
Reported by: cem [1]
Reviewed by: cem, yuripv
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23109
kevans [Thu, 9 Jan 2020 19:22:11 +0000 (19:22 +0000)]
dwc_otg: fix fdt attachment for newer bcm2708-usb nodes
The newer versions of RPi FDT flipped the order of the interrupts
specification and added an 'interrupt-names' property for driver aide in
finding the correct interrupt, rather than assuming the positions. Use it if
it's available, or fallback to the old method if there is no interrupt-names
property with a usb value.
This has been tested with both old RPi3B FDT and new RPi3B FDT, USB again
works on the latter.
Reported by: Tom Yan <tom.ty89 gmail com>
MFC after: 3 days
markj [Thu, 9 Jan 2020 19:17:42 +0000 (19:17 +0000)]
UMA: Don't destroy zones after the system shutdown process starts.
Some kernel subsystems, notably ZFS, will destroy UMA zones from a
shutdown eventhandler. This causes the zone to be drained. For slabs
that are mapped into KVA this can be very expensive and so it needlessly
delays the shutdown process.
Add a new state to the "booted" variable, BOOT_SHUTDOWN. Once
kern_reboot() starts invoking shutdown handlers, turn uma_zdestroy()
into a no-op, provided that the zone does not have a custom finalization
routine.
PR: 242427
Reviewed by: jeff, kib, rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23066
imp [Thu, 9 Jan 2020 18:14:48 +0000 (18:14 +0000)]
Add note to remind me there are three choices for arm32 floating point.
hard use floating point hardware, pass registers to functions in
floating point registers.
softfp use floating point hardware, but pass registers to functions
in integer registers.
soft do floating point calcuations without using floating point
hardware. Pass arguments in integer registers.
FreeBSD 11 and newer assumes hard. 10 and earlier assumed softfp. We have no
real support, at the moment, for soft. It's untested, though, if softfp still
works.
Add a note here since this is a whack-a-doodle combination relative to all other
platforms.
softfp is likely to go away in the future because it was retained for people
using FreeBSD 10 + armv6 needing to transition more slowly from softfp -> hard
than the project. It likely is no longer needed, and may be getting in the
way of people needing 'soft' support.
melifaro [Thu, 9 Jan 2020 17:21:00 +0000 (17:21 +0000)]
Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.
markj [Thu, 9 Jan 2020 14:58:41 +0000 (14:58 +0000)]
lagg: Further cleanup of the rr_limit option.
Add an option flag so that arbitrary updates to a lagg's configuration
do not clear sc_stride. Preseve compatibility for old ifconfig
binaries. Update ifconfig to use the new flag and improve the casting
used when parsing the option parameter.
Modify the RR transmit function to avoid locklessly reading sc_stride
twice. Ensure that sc_stride is always 1 or greater.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23092
emaste [Thu, 9 Jan 2020 14:10:11 +0000 (14:10 +0000)]
revert r356513: libunwind: untested attempt to fix sparc64 build
The patch is untested and is almost certainly insufficient. Per the
author's request, revert until someone with access to sparc64 hardware
can test and report.
kib [Thu, 9 Jan 2020 10:00:24 +0000 (10:00 +0000)]
Resolve relative argv0 for direct exec mode to absolute path for AT_EXECPATH.
We know the binary relative name and can reliably calculate cwd path.
Because realpath(3) was already linked into ld-elf.so.1, reuse it
there to resolve dots and dotdots making the path more canonical.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23014
hselasky [Thu, 9 Jan 2020 09:29:24 +0000 (09:29 +0000)]
Fix a XHCI driver issue with Intel's Gemini Lake SOC.
Do not configure any endpoint twice, but instead keep track of which
endpoints are configured on a per device basis, and use an evaluate
endpoint context command instead. When changing the configuration make
sure all endpoints get deconfigured and the configured endpoint mask
is reset.
This fixes an issue where an endpoint might stop working if there is
an error and the endpoint needs to be reconfigured as a part of the
error recovery mechanism in the FreeBSD USB stack.
kevans [Thu, 9 Jan 2020 04:39:37 +0000 (04:39 +0000)]
md(4): improve documentation of preloading
It's not immediately clear by what mechanism loader(8) will be loading the
preloaded file. Specifically name-drop loader.conf(5) with a pointer to the
module loading section and a description of what the 'name' should look
like, because that certainly isn't clear from the loader.conf(5) standpoint.
The default loader.conf already has a pointer to md(4) where it appears and
the reference to loader.conf in the new version of this manpage should make
it more clear that this is where one should look for information.
Reported by: swills
Reviewed by: swills, manpages (bcr)
With revision by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D22844
kevans [Thu, 9 Jan 2020 04:34:42 +0000 (04:34 +0000)]
stand/fdt: Scale blob size better as overlays apply
Currently, our overlay blob will grow to include the size of the complete
overlay blob we're applying. This doesn't scale very well with a lot of
overlays- they tend to include a lot of overhead, and they will generally
only add a fraction of their total size to the blob they're being applied
to.
To combat this, pack the blob as we apply new overlays and keep track of how
many overlays we've applied. Only ubldr has any fixups to be applied after
overlays, so we only need to re-pad the blob in ubldr. Presumably the
allocation won't fail since we just did a lot worse in trying to apply
overlays and succeeded.
I have no intention of removing the padding in make_dtb.sh. There might be
an argument to be had over whether it should be configurable, since ubldr
*is* the only loader that actually has fixups to be applied and we can do
this at runtime, but I'm not too concerned about this.
This diff has been sitting in Phabricator for a year and a half, but I've
decided to flush it as it does make sure that we're scaling the blob
appropriately and leave room at the end for fixups in case of some freak
circumstance where applying overlays leaves us with a blob of insufficient
size.
Reviewed by: gonzo (a long time ago)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14133
kevans [Thu, 9 Jan 2020 04:03:17 +0000 (04:03 +0000)]
shmfd: posix_fallocate(2): only take rangelock for section we need
Other mechanisms that resize the shmfd grab a write lock from 0 to OFF_MAX
for safety, so we still get proper synchronization of shmfd->shm_size in
effect. There's no need to block readers/writers of earlier segments when
we're just reserving more space, so narrow the scope -- it would likely be
safe to narrow it completely to just the section of the range that extends
beyond our current size, but this likely isn't worth it since the size isn't
stable until the writelock is granted the first time.
kevans [Thu, 9 Jan 2020 03:52:50 +0000 (03:52 +0000)]
if_vmove: return proper error status
if_vmove can fail if it lost a race and the vnet's already been moved. The
callers (and their callers) can generally cope with this, but right now
success is assumed. Plumb out the ENOENT from if_detach_internal if it
happens so that the error's properly reported to userland.