bdrewery [Sat, 15 Jun 2019 17:08:02 +0000 (17:08 +0000)]
Don't force OBJS_DEPEND_GUESS headers onto all objects.
This is in the case of not having any .depend.foo.o yet. Don't force add *.h
as a dependency for those. They are built in beforebuild already when in
SRCS/DPSRCS.
This change allows custom rules, like in bin/sh/Makefile for mksyntax, to not
have cyclic dependency problems when connected to the .depend.* handling.
This is purposely not copied to sys/conf/kern.post.mk as it handles
generating headers slightly differently.
ian [Sat, 15 Jun 2019 16:56:00 +0000 (16:56 +0000)]
Don't call pwmbus_attach_bus(), because it may not be present if this
driver is compiled into the kernel but pwmbus will be loaded as a module
when needed (and because of that, pwmbus_attach_bus() is going away in
the near future). Instead, just directly do what that function did:
register the fdt xfef handle, and attach the pwmbus.
ian [Sat, 15 Jun 2019 16:36:29 +0000 (16:36 +0000)]
In detach(), check for failure of bus_generic_detach(), only release
resources if they got allocated (because detach() gets called from attach()
to handle various failures), and delete the pwmbus child if it got created.
marius [Sat, 15 Jun 2019 11:07:41 +0000 (11:07 +0000)]
- Replace unused and only ever written to members of public iflib(9)
structs with placeholders (in the latter case, IFLIB_MAX_TX_BYTES
etc. are also only ever used for these write-only members if at all,
so both these macros and members can just go). Using these spares
may render it possible to merge certain iflib(9) fixes to stable/12.
Otherwise, changes extending struct if_irq or struct if_shared_ctx
in any way would break KBI as instances of these are allocated by
the driver front-ends (by contrast, struct if_pkt_info as well as
struct if_softc_ctx instances are provided by iflib(9) and, thus,
may grow at least at the end without breaking KBI).
- Make the pvi_name in struct pci_vendor_info const char * as device
identifiers in hardware lookup tables aren't to be expected to ever
change at runtime.
- Similarly, make the pci_vendor_info_t of struct if_shared_ctx which
is used to point to the struct pci_vendor_info arrays provided by
the driver front-ends const.
- Remove the ETH_ADDR_LEN macro from iflib.h; this was duplicating
ETHER_ADDR_LEN of <net/ethernet.h> with iflib(9) actually only
consuming the latter macro.
- Make the name argument of iflib_io_tqg_attach(9) const, matching
the taskqgroup_attach_cpu(9) this function wraps as well as e. g.
iflib_config_gtask_init(9).
- Remove the orphaned iflib_qset_lock_get() prototype.
- Remove some extraneous empty lines.
julian [Sat, 15 Jun 2019 00:47:39 +0000 (00:47 +0000)]
Lightly hide the 'var' inside the macros to read the arm special registers.
I just happenned to have 3rd party code using 'var' as the output variable
which drew my attention to this. variables defined inside macros should be
prefixed to avoid getting shadowed varable wanrings from clang.
asomers [Fri, 14 Jun 2019 20:35:37 +0000 (20:35 +0000)]
open(2): fix the description of O_FSYNC
The man page claims that with O_FSYNC (aka O_SYNC) the kernel will not cache
written data. However, that's not true. Nor does POSIX require it.
Perhaps it was true when that section of the man page was written in r69336
(I haven't checked). But it's not true now. Now the effect is simply that
writes are sent to disk immediately and synchronously, but they're still
cached.
See also: https://pubs.opengroup.org/onlinepubs/9699919799/
See also: ffs_write in sys/ufs/ffs/ffs_vnops.c
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20641
mav [Fri, 14 Jun 2019 20:04:28 +0000 (20:04 +0000)]
Minimize aggsum_compare(&arc_size, arc_c) calls.
For busy ARC situation when arc_size close to arc_c is desired. But
then it is quite likely that aggsum_compare(&arc_size, arc_c) will need
to flush per-CPU buckets to find exact comparison result. Doing that
often in a hot path penalizes whole idea of aggsum usage there, since it
replaces few simple atomic additions with dozens of lock acquisitions.
Replacing aggsum_compare() with aggsum_upper_bound() in code increasing
arc_p when ARC is growing (arc_size < arc_c) according to PMC profiles
allows to save ~5% of CPU time in aggsum code during sequential write
to 12 ZVOLs with 16KB block size on large dual-socket system.
I suppose there some minor arc_p behavior change due to lower precision
of the new code, but I don't think it is a big deal, since it should
affect only very small window in time (aggsum buckets are flushed every
second) and in ARC size (buckets are limited to 10 average ARC blocks
per CPU).
mav [Fri, 14 Jun 2019 19:57:32 +0000 (19:57 +0000)]
Alike to ZoL disable metaslab allocation tracing code.
It is too generous to collect in production debug traces that can only
be read with kernel debugger. Illumos includes special code in their
mdb debugger to read it, we don't.
alc [Fri, 14 Jun 2019 04:01:08 +0000 (04:01 +0000)]
Change the arm64 pmap so that updates to the global count of wired pages are
not performed directly by the pmap. Instead, they are performed by
vm_page_free_pages_toq(). (This is the same approach that we use on x86.)
dougm [Fri, 14 Jun 2019 03:15:54 +0000 (03:15 +0000)]
Avoid using the prev field of vm_map_entry_t in two functions that
iterate over consecutive vm_map entries, and that can easily just
'remember' the prev value instead of looking it up.
mav [Fri, 14 Jun 2019 01:09:10 +0000 (01:09 +0000)]
Update td_runtime of running thread on each statclock().
Normally td_runtime is updated on context switch, but there are some kernel
threads that due to high absolute priority may run for many seconds without
context switches (yes, that is bad, but that is true), which means their
td_runtime was not updated all that time, that made them invisible for top
other then as some general CPU usage.
dougm [Thu, 13 Jun 2019 20:09:07 +0000 (20:09 +0000)]
Create a function for creating objects to back map entries, and one
for giving cred to a map entry backed by an object, and use them
instead of the code duplicated inline now.
vmaffione [Thu, 13 Jun 2019 17:39:32 +0000 (17:39 +0000)]
bhyve: move common code to net_utils.c
Both virtio_net and e82545 network frontends have code to validate and
generate MAC addresses. These functionalities are replicated in the two
files, so we move them in a separate compilation unit.
imp [Thu, 13 Jun 2019 05:19:42 +0000 (05:19 +0000)]
Don't print the request we may be aborting in ciss_notify_abort as
part of ciss_detach. It's a left-over debug that isn't needed and also
discloses a kernel address. Only root could provoke as part of a
devctl or kldunload.
mav [Thu, 13 Jun 2019 01:21:32 +0000 (01:21 +0000)]
Move write aggregation memory copy out of vq_lock.
Memory copy is too heavy operation to do under the congested lock.
Moving it out reduces congestion by many times to almost invisible.
Since the original zio removed from the queue, and the child zio is
not executed yet, I don't see why would the copy need protection.
My guess it just remained like this from the time when lock was not
dropped here, which was added later to fix lock ordering issue.
Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput from 10% to 40%.
dim [Wed, 12 Jun 2019 21:10:37 +0000 (21:10 +0000)]
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt, libc++,
libunwind and openmp to the upstream release_80 branch r363030
(effectively, 8.0.1 rc2). The 8.0.1 release should follow this within a
week or so.
alc [Wed, 12 Jun 2019 20:38:49 +0000 (20:38 +0000)]
Change pmap_demote_l2_locked() so that it removes the superpage mapping on a
demotion failure. Otherwise, some callers to pmap_demote_l2_locked(), such
as pmap_protect(), may leave an incorrect mapping in place on a demotion
failure.
Change pmap_demote_l2_locked() so that it handles addresses that are not
superpage aligned. Some callers to pmap_demote_l2_locked(), such as
pmap_protect(), may not pass a superpage aligned address.
Change pmap_enter_l2() so that it correctly calls vm_page_free_pages_toq().
The arm64 pmap is updating the count of wired pages when freeing page table
pages, so pmap_enter_l2() should pass false to vm_page_free_pages_toq().
shurd [Wed, 12 Jun 2019 18:07:04 +0000 (18:07 +0000)]
Some devices take undesired actions when RTS and DTR are
asserted. Some development boards for example will reset on DTR,
and some radio interfaces will transmit on RTS.
This patch allows "stty -f /dev/ttyu9.init -rtsdtr" to prevent
RTS and DTR from being asserted on open(), allowing these devices
to be used without problems.
jtl [Wed, 12 Jun 2019 16:06:31 +0000 (16:06 +0000)]
The current IPMI KCS code is waiting 100us for all transitions (roughly
between each byte either sent or received). However, most transitions
actually complete in 2-3 microseconds.
By polling the status register with a delay of 4us with exponential
backoff, the performance of most IPMI operations is significantly
improved:
- A BMC update on a Supermicro x9 or x11 motherboard goes from ~1 hour
to ~6-8 minutes.
- An ipmitool sensor list time improves by a factor of 4.
Testing showed no significant improvements on a modern server by using
a lower delay.
The changes should also generally reduce the total amount of CPU or
I/O bandwidth used for a given IPMI operation.
ian [Wed, 12 Jun 2019 16:05:20 +0000 (16:05 +0000)]
Don't attempt to include hwpmc support for armv6, we're missing some of the
necessary support functions in cpu-v6.h, and it may be that the only armv6
platform we support (RPi, the bcm2835 SOC) is incapable of supporting hwpmc.
bdragon [Wed, 12 Jun 2019 15:58:11 +0000 (15:58 +0000)]
Fix PPC970 boot after r348783
r348783 changed the behavior of the kernel mappings and broke booting on G5.
- Split the kernel mapping logic out so that the case where we are
running from the wrong memory space is handled using identity
mappings, and the case where we are not using a DMAP is handled by
forcibly mapping the kernel into the dmap range as intended by
r348783.
mm [Wed, 12 Jun 2019 13:34:12 +0000 (13:34 +0000)]
MFV r348971,r348977:
Sync libarchive with vendor.
Relevant vendor changes:
- check_symlinks_fsobj() without chdir() and fchdir()
- bsdtar.1 manpage fixes
- patches from OpenBSD to libarchive_fe/passphrase.c
- version bumped to 3.4.0
cy [Wed, 12 Jun 2019 11:06:58 +0000 (11:06 +0000)]
Resolve IPv6 checksum errors with stateful inspection. According to
PR/203585 this appears to have been broken by r235959, which predates
the ipfilter 5.1.2 import into FreeBSD.
The IPv6 checksum calculation is incorrect. To resolve this we call
in6_cksum() to do the the heavy lifting for us, through a new function
ipf_pcksum6(). Should we need to revisit this area again, a DTrace probe
is added to aid with future debugging.
cy [Wed, 12 Jun 2019 11:06:54 +0000 (11:06 +0000)]
Register pfil hooks when VNET != vnet0. r302298, which virtualized ipf,
assumed the pfil hook registration performed in ipf_modload() would take
care of this. However ipf_modload() is only called when the ipl kld is
loaded or when ipfilter is first called when it is statically linked
into the kernel at build time.
Prior to this, even though r302298 has been in the tree for a while, it
has never been used. So, r302298 in reality begins now.
manu [Wed, 12 Jun 2019 09:18:23 +0000 (09:18 +0000)]
pkgbase: Add some tags to files installed in distribution target
Add the MK_MAIL dependant file to the runtime package as well as the
MK_KERBEROS ones the empty locate database, the FreeBSD copyright file
and the GENERIC.hints.
Tag the unbound link from /etc to /var to belong in the unbound package.
bdrewery [Wed, 12 Jun 2019 00:03:00 +0000 (00:03 +0000)]
Stop using .OODATE for extracting firmware.
This fixes META_MODE rebuilding since it assumes that it this is
a non-consistent build command. These are always unencoded consistently
though and do not need to use the .OODATE/$? mechanism.
jhb [Tue, 11 Jun 2019 23:00:55 +0000 (23:00 +0000)]
Make the warning intervals for deprecated crypto algorithms tunable.
New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings. The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI
dougm [Tue, 11 Jun 2019 22:41:39 +0000 (22:41 +0000)]
To test to see if a free space is big enough compare the required
length to the difference of the two offsets that define the gap, to
avoid overflow, rather that adding the length to an offset and
comparing that to another offset.
This addresses an overflow issue reported by Peter Holm on i386.
vmaffione [Tue, 11 Jun 2019 15:52:41 +0000 (15:52 +0000)]
bhyve: virtio: introduce vq_kick_enable() and vq_kick_disable()
The VirtIO standard supports two schemes for notification suppression:
a notification enable bit and a more sophisticated one (event_idx) that
also supports delayed notifications. Currently bhyve fully supports
only the first scheme. This patch hides the notification suppression
internals by means of two inline routines, vq_kick_enable() and
vq_kick_disable(), and makes the code more readable.
Moreover, further improve readability by replacing the call to mb()
with a call to atomic_thread_fence_seq_cst(), which is already used
in virtio.c
luporl [Tue, 11 Jun 2019 11:23:15 +0000 (11:23 +0000)]
[PPC] Fix build error when POWERNV is disabled
When building a kernel supporting PSERIES but not POWERNV,
the compiler would complain about an error variable being
possibly used before being initialized.
In practice, however, this should never happen. In any case, it
is now initialized to an error value.
luporl [Tue, 11 Jun 2019 11:16:41 +0000 (11:16 +0000)]
[PPC64] Fix ofw_initrd
Before this change, OFW initrd (as md) handling code was simulating an ofwbus
device. But as there isn't really a Device Tree (DT) node representing OFW
initrd (it is specified in 2 properties under /chosen), its driver was in fact
stealing other driver's DT node. This was noticed after MD_ROOT_MEM became
default and QEMU's USB keyboard stopped working under VNC.
This change consists in simplifying the process of detection and mapping of
initrd memory, turning it into a simple startup step, instead of trying to
simulate a device.
mhorne [Tue, 11 Jun 2019 00:55:54 +0000 (00:55 +0000)]
RISC-V: expose extension bits in AT_HWCAP
AT_HWCAP is a field in the elf auxiliary vector meant to describe
cpu-specific hardware features. For RISC-V we want to use this to
indicate the presence of any standard extensions supported by the CPU.
This allows userland applications to query the system for supported
extensions using elf_aux_info(3).
Support for an extension is indicated by the presence of its
corresponding bit in AT_HWCAP -- e.g. systems supporting the 'c'
extension (compressed instructions) will have the second bit set.
Extensions advertised through AT_HWCAP are only those that are supported
by all harts in the system.
bz [Mon, 10 Jun 2019 23:25:40 +0000 (23:25 +0000)]
A bit of code hygiene (no functional changes).
Hide unused code under #ifdef notyet (in one case the only caller is under
that same ifdef), or if it is arm (not arm64) specific code under the
__arm__ ifdef to not yield -Wunused-function warnings during the arm64
kernel compile.
dougm [Mon, 10 Jun 2019 21:34:07 +0000 (21:34 +0000)]
The computations of vm_map_splay_split and vm_map_splay_merge touch both
children of every entry on the search path as part of updating values of
the max_free field. By comparing the max_free values of an entry and its
child on the search path, the code can avoid accessing the child off the
path in cases where the max_free value decreases along the path.
Specifically, this patch changes splay_split so that the max_free field
of every entry on the search path is replaced, temporarily, by the
max_free field from its child not on the search path or, if the child
in that direction is NULL, then a difference between start and end
values of two pointers already available in the split code, without
following any next or prev pointers. However, to find that max_free
value does not require looking toward that other child if either the
child on the search path has a lower max_free value, or the current max_free
value is zero, because in either case we know that the value of max_free for
the other child is the value we already have. So, the changes to
vm_entry_splay_split make sure that we know all the off-search-path entries
we will need to complete the splay, without looking at all of them. There is
an exception at the bottom of the search path where we cannot rely on the
max_free value in the direction of the NULL pointer that ends the search,
because of the behavior of entry-clipping code.
The corresponding change to vm_splay_entry_merge makes it simpler, since it's
just reversing pointers and updating running maxima.
In a test intended to exercise vigorously the vm_map implementation, the
effect of this change was to reduce the data cache miss rate by 10-14% and
the running time by 5-7%.
dougm [Mon, 10 Jun 2019 21:26:14 +0000 (21:26 +0000)]
Change the check for 'size' wrapping around to zero in kern_mmap to account
for both the lower and upper bound modifications. Change the error returned
to ENOMEM. Rename the parameter size to len and make size a local variable
that stores the value of len after it has been modified.
This addresses concerns expressed by Bruce Evans after r348843.
jhb [Mon, 10 Jun 2019 19:26:57 +0000 (19:26 +0000)]
Add warnings to /dev/crypto for deprecated algorithms.
These algorithms are deprecated algorithms that will have no in-kernel
consumers in FreeBSD 13. Specifically, deprecate the following
algorithms:
- ARC4
- Blowfish
- CAST128
- DES
- 3DES
- MD5-HMAC
- Skipjack
jhb [Mon, 10 Jun 2019 19:22:36 +0000 (19:22 +0000)]
Add warnings for Kerberos GSS algorithms deprecated in RFCs 6649 and 8429.
All of these algorithms are explicitly marked SHOULD NOT in one of these
RFCs.
Specifically, RFC 6649 deprecates all algorithms using DES as well as
the "export-friendly" variant of RC4. RFC 8429 deprecates Triple DES
and the remaining RC4 algorithms.
jhb [Mon, 10 Jun 2019 19:01:54 +0000 (19:01 +0000)]
Remove an overly-aggressive assertion.
While it is true that the new vmspace passed to vmspace_switch_aio
will always have a valid reference due to the AIO job or the extra
reference on the original vmspace in the worker thread, it is not true
that the old vmspace being switched away from will have more than one
reference.
Specifically, when a process with queued AIO jobs exits, the exit hook
in aio_proc_rundown will only ensure that all of the AIO jobs have
completed or been cancelled. However, the last AIO job might have
completed and woken up the exiting process before the worker thread
servicing that job has switched back to its original vmspace. In that
case, the process might finish exiting dropping its reference to the
vmspace before the worker thread resulting in the worker thread
dropping the last reference.
zeising [Mon, 10 Jun 2019 18:19:49 +0000 (18:19 +0000)]
psm(4): Enable touchpads and trackpads by default
Enable synaptics and elantech touchpads, as well as IBM/Lenovo TrackPoints
by default, instead of having users find and toggle a loader tunable.
This makes things like two finger scroll and other modern features work out
of the box with X. By enabling these settings by default, we get a better
desktop experience in X, since xserver and evdev can make use of the more
advanced synaptics and elantech features.
bz [Mon, 10 Jun 2019 14:31:18 +0000 (14:31 +0000)]
Enhance the comment ieee80211_add_channel() to avoid a
misunderstanding that the function does not work additive
when repeatedly called for diffferent bands.
Reviewed by: avos (a few months ago)
MFC after: 2 weeks
jhibbits [Mon, 10 Jun 2019 03:24:38 +0000 (03:24 +0000)]
powernv: Port HMI handler to use the message framework
When an HMI occurs a message event also gets created with the details of the
exception. Hook into the messaging framework to retrieve the HMI message.
Nothing is done with it yet, except to panic on unhandled exception.
jhibbits [Mon, 10 Jun 2019 03:16:55 +0000 (03:16 +0000)]
powerpc/powernv: Reduce the scope of the sensor guarding mutex
vmem_xalloc() cannot be called while holding a nonblocking mutex, warned
by WITNESS. The lock may not be necessary in general, but it avoids
superfluous concurrent OPAL calls for the same sensor.
dougm [Mon, 10 Jun 2019 03:07:10 +0000 (03:07 +0000)]
There are times when a len==0 parameter to mmap is okay. But on a
32-bit machine, a len parameter just a few bytes short of 4G, rounded
up to a page boundary and hitting zero then, is not okay. Return
failure in that case.
oshogbo [Sun, 9 Jun 2019 22:55:21 +0000 (22:55 +0000)]
tail: fix the checks if the file was rotated
The freopen(3) was replaced with fileargs_open(3) and fclose(3).
In the following function, we skip if the stream is standard in, so it is
safe to do so.
This also requires us to change the logic first to open the file and then
check its status. The stat(2) is disallowed in capability mode.
This commit unbrakes the -F option.
The bug was introduced in the r348708.
mhorne [Sun, 9 Jun 2019 15:50:35 +0000 (15:50 +0000)]
RISC-V: Clean up some GENERIC options
Some of the config options that are disabled by default seem to be only
for historical reasons. Enable those that appear to no longer be
problematic. This includes WITH_CTF, STACK, GEOM_RAID, and re-enabling
blacklisted kernel modules.
mhorne [Sun, 9 Jun 2019 15:48:36 +0000 (15:48 +0000)]
RISC-V: Announce real and available memory at boot
Most architectures print their total (real) and available memory during
boot. Properly initialize the realmem global and print these messages.
Also print the physical memory chunks (behind a bootverbose flag).
mhorne [Sun, 9 Jun 2019 15:43:38 +0000 (15:43 +0000)]
Fix global pointer relaxations in the RISC-V kernel
The gp register is intended to used by the linker as another means of
performing relaxations, and should point to the small data section (.sdata).
Currently gp is being used as the pcpu pointer within the kernel, but the more
appropriate choice for this is the tp register, which is unused.
Swap existing usage of gp with tp within the kernel, and set up gp properly
at boot with the value of __global_pointer$ for all harts.
Additionally, remove some cases of accessing tp from the PCB, as it is not
part of the per-thread state. The user's tp and gp should be tracked only
through the trapframe.
vmaffione [Sun, 9 Jun 2019 12:41:21 +0000 (12:41 +0000)]
bhyve: vtnet: simplify thread synchronization
On vtnet device reset it is necessary to wait for threads to stop TX and
RX processing. However, the rx_in_progress variable (used for to wait for
RX processing to stop) is actually useless, and can be removed. Acquiring
and releasing the RX lock is enough to synchronize correctly. Moreover,
it is possible to reset the device while holding both TX and RX locks, so
that the "resetting" variable becomes unnecessary for the RX thread, and
can be protected by the TX lock (instead of being volatile).