vt(4): Resume vt_timer() in vtterm_post_input() only
There is no need to try to resume it after each smaller operation
(putchar, cursor_position, copy, fill).
The resume function already checks if the timer is armed before doing
anything, but it uses an atomic cmpset which is expensive. And resuming
the timer at the end of input processing is enough.
While here, we also skip timer resume if the input is for a window
other than the currently displayed one. I.e. if `ttyv0` is currently
displayed, any changes to `ttyv1` shouldn't resume the timer (which
would refresh `ttyv0`).
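The effect of moving the resume call can be sketched in userland C11. This is
an illustration only: every name below (sketch_resume_timer, sketch_input_old,
sketch_input_new, cas_count, timer_suspended) is hypothetical and none of it is
the vt(4) code. It only shows the compare-and-swap running once per input batch
instead of once per small operation, and not at all for a hidden window.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from vt(4). */
static atomic_int timer_suspended = 1;	/* 1 = timer is not armed */
static unsigned long cas_count;		/* compare-and-swap attempts */

/* Re-arm the timer only if it is currently suspended. */
static void
sketch_resume_timer(void)
{
	int expected = 1;

	cas_count++;
	/* A real driver would schedule its callout when this succeeds. */
	(void)atomic_compare_exchange_strong(&timer_suspended, &expected, 0);
}

/* Old scheme: every small operation tries to resume the timer. */
static void
sketch_input_old(const char *buf, size_t len)
{
	assert(buf != NULL);
	for (size_t i = 0; i < len; i++) {
		(void)buf[i];		/* stand-in for per-character work */
		sketch_resume_timer();
	}
}

/* New scheme: resume once after the batch, only for the visible window. */
static void
sketch_input_new(const char *buf, size_t len, bool window_is_displayed)
{
	assert(buf != NULL);
	for (size_t i = 0; i < len; i++)
		(void)buf[i];		/* stand-in for per-character work */
	if (window_is_displayed)
		sketch_resume_timer();
}
```

For a five-character write, the old scheme attempts five compare-and-swaps,
the new one attempts one, or none if the target window is not displayed.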
By doing the same benchmark as r333669, I get:
* vt(4), before r333669: 1500 ms
* vt(4), with this patch: 760 ms
* syscons(4): 700 ms
The situation was even worse when the vtterm_copy() and vtterm_fill()
callbacks were involved.
The new callbacks are:
* struct terminal_class->tc_pre_input()
* struct terminal_class->tc_post_input()
They are called in teken_input(), surrounding the while() loop.
The goal is to improve input processing speed of vt(4). As a benchmark,
here is the time taken to write a text file of 360 000 lines (26 MiB) on
`ttyv0`:
* vt(4), unmodified: 1500 ms
* vt(4), with this patch: 1200 ms
* syscons(4): 700 ms
This is on a Haswell laptop with a GENERIC-NODEBUG kernel.
At the same time, the locking is changed in the vt_flush() function
which is responsible for drawing the text on screen. So instead of
(indirectly) using VTBUF_LOCK() just to read and reset the dirty area
of the internal buffer, the lock is held for about the entire function,
including the drawing part.
The change is mostly visible while content is scrolling fast: before,
lines could appear garbled while scrolling because the internal buffer
was accessed without locks (once the scrolling was finished, the output
was correct). Now, the scrolling appears correct.
In the end, the locking model is closer to what syscons(4) does.
Andriy Gapon [Wed, 16 May 2018 06:52:08 +0000 (06:52 +0000)]
followup to r332730/r332752: set kdb_why to "trap" for fatal traps
This change updates arm, arm64 and mips architectures. Additionally, it
removes redundant checks for kdb_active where it already results in
kdb_reenter() and adds kdb_reenter() calls where they were missing.
Some architectures check the return value of kdb_trap(), but some don't.
I haven't changed any of that.
Some trap handling routines have a return code. I am not sure if I
provided correct ones for returns after kdb_reenter(). kdb_reenter
should never return unless kdb_jmpbufp is NULL for some reason.
Only compile tested for all affected architectures. There can be bugs
resulting from my poor understanding of architecture specific details.
Ed Maste [Wed, 16 May 2018 02:15:18 +0000 (02:15 +0000)]
Clarify that boot_mute / boot -m mutes kernel console only
Perhaps RB_MUTE could mute user startup (rc) output as well, but right
now it mutes only kernel console output, so make the documentation match
reality.
Ed Maste [Wed, 16 May 2018 01:55:52 +0000 (01:55 +0000)]
intel-ucode-split: list platform ids based on processor_flags
The Intel CPU "Platform Id" is a 3-bit integer reported by a given MSR.
Intel microcode updates have an 8-bit field to indicate Platform Id
compatibility - one bit in the mask for each of the possible Platform Id
values. To simplify interpretation, report the Platform Id mask also as
a list.
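A minimal sketch of the mask-to-list conversion, assuming bit N of the 8-bit
mask selects Platform Id N. The function name and the comma-separated output
format are hypothetical and are not taken from intel-ucode-split itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the Platform Ids selected by an 8-bit mask as a comma-separated
 * list; for example, mask 0x12 (bits 1 and 4) becomes "1,4".
 * Returns the number of Platform Ids found.
 */
static int
platform_ids_from_mask(uint8_t mask, char *buf, size_t buflen)
{
	int count = 0;
	size_t used = 0;

	/* "0,1,2,3,4,5,6,7" plus the terminating NUL needs 16 bytes. */
	assert(buf != NULL && buflen >= 16);
	buf[0] = '\0';
	for (int id = 0; id < 8; id++) {
		if ((mask & (1u << id)) == 0)
			continue;
		used += (size_t)snprintf(buf + used, buflen - used, "%s%d",
		    count > 0 ? "," : "", id);
		count++;
	}
	return (count);
}
```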
Ed Maste [Wed, 16 May 2018 01:33:48 +0000 (01:33 +0000)]
Force WITHOUT_FREEBSD_UPDATE when WITHOUT_PORTSNAP is set
freebsd-update depends on phttpget from portsnap. We could move phttpget
out of portsnap and build it as long as WITHOUT_FREEBSD_UPDATE and
WITHOUT_PORTSNAP are not both set, but for now just make the dependency
explicit.
PR: 228220
Reported by: Dries Michiels
Sponsored by: The FreeBSD Foundation
Andrew Gallatin [Tue, 15 May 2018 23:55:38 +0000 (23:55 +0000)]
Unhook DEBUG_BUFRING from INVARIANTS
Some of the DEBUG_BUFRING checks are racy, and can lead to
spurious assertions when run under high load. Unhook these
from INVARIANTS until the author can fix or remove them.
Warner Losh [Tue, 15 May 2018 22:22:10 +0000 (22:22 +0000)]
Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.) This extends
the method used in da to ada, nda, and mda.
Ed Maste [Tue, 15 May 2018 21:51:29 +0000 (21:51 +0000)]
Add a tool to split Intel microcode into one file per Platform Id
Intel now releases microcode updates in files named after
<family>-<model>-<stepping>. In some cases a single file may include
microcode for multiple Platform Ids for the same family, model, and
stepping. Our current microcode update tooling (/usr/sbin/cpucontrol)
only processes the first microcode update in the file.
This tool splits combined files into individual files with one microcode
update each, named as
<family>-<model>-<stepping>.<platform_id_mask>.
Adding this to tools/ for experimentation and testing. In the future
we'll want to have cpucontrol or other tooling work directly with the
Intel-provided microcode files.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15433
Warner Losh [Tue, 15 May 2018 21:25:35 +0000 (21:25 +0000)]
Hold the reference count until the CCB is released
When a disk disappears and the periph is invalidated, any I/Os that
are pending with the controller can cause a crash when they
complete. Move to holding the softc reference count taken in dastart()
until the I/O is complete rather than only until xpt_action()
returns. (This approach was suggested by Ken Merry.)
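The lifetime rule can be sketched with a bare counter. Everything here is a
hypothetical simplification (struct sketch_softc and the sketch_* functions
are not CAM code); it only shows that a reference taken at submission and
dropped at completion keeps the softc alive while I/O is still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical softc with a bare reference counter. */
struct sketch_softc {
	int  refcount;
	bool invalidated;	/* the disk has gone away */
};

/* Submit an I/O: take a reference that the completion path will drop. */
static void
sketch_start_io(struct sketch_softc *sc)
{
	sc->refcount++;
	/* ... hand the request to the controller ... */
	/* The old scheme dropped the reference here, right after submission. */
}

/* Completion handler: the I/O no longer needs the softc after this. */
static void
sketch_io_done(struct sketch_softc *sc)
{
	assert(sc->refcount > 0);
	sc->refcount--;
}

/* The softc may be freed only once it is invalidated and unreferenced. */
static bool
sketch_can_free(const struct sketch_softc *sc)
{
	return (sc->invalidated && sc->refcount == 0);
}
```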
Marius Strobl [Tue, 15 May 2018 21:15:09 +0000 (21:15 +0000)]
- If present, take advantage of the R/W cache of eMMC revision 1.5 and
later devices. These caches work akin to the ones found in HDDs/SSDs
that ada(4)/da(4) also enable if existent, but likewise increase the
likelihood of data loss in case of a sudden power outage etc. On the
other hand, write performance is up to twice as high for e.g. 1 GiB
files depending on the actual chip and transfer mode employed.
For maximum data integrity, the usage of eMMC caches can be disabled
via the hw.mmcsd.cache tunable.
- Get rid of the NOP mmcsd_open().
Rick Macklem [Tue, 15 May 2018 20:28:50 +0000 (20:28 +0000)]
End grace for the NFSv4 server if all mounts do ReclaimComplete.
The NFSv4 protocol requires that the server only allow reclaim of state
and not issue any new open/lock state for a grace period after booting.
The NFSv4.0 protocol required this grace period to be greater than the
lease duration (over 2 minutes). For NFSv4.1, the client tells the server
that it has done reclaiming state by doing a ReclaimComplete operation.
If all NFSv4 clients are NFSv4.1, the grace period can end once all the
clients have done ReclaimComplete, shortening the time period considerably.
This patch does this. If there are any NFSv4.0 mounts, the grace period
will still be over 2 minutes.
This change is only an optimization and does not affect correct operation.
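The early-exit condition described above can be sketched as a predicate over
the known clients. The structure and function names are hypothetical, not the
NFS server's own, and the choice to wait the full period when no clients are
known is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client record; not the NFS server's real structure. */
struct sketch_client {
	int  minor_version;	/* 0 for NFSv4.0, 1 for NFSv4.1 */
	bool reclaim_complete;	/* client has done ReclaimComplete */
};

/*
 * Return true if the grace period may end early: every known client is
 * NFSv4.1 and has done ReclaimComplete.  Any NFSv4.0 client forces the
 * server to wait out the full grace period.
 */
static bool
sketch_grace_can_end(const struct sketch_client *clients, size_t n)
{
	assert(clients != NULL || n == 0);
	if (n == 0)
		return (false);	/* assumption: no clients, no shortcut */
	for (size_t i = 0; i < n; i++) {
		if (clients[i].minor_version < 1 ||
		    !clients[i].reclaim_complete)
			return (false);
	}
	return (true);
}
```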
Antoine Brodin [Tue, 15 May 2018 17:20:58 +0000 (17:20 +0000)]
vmmdev: return EFAULT when trying to read beyond VM system memory max address
Currently, when using dd(1) to take a VM memory image, the capture never ends,
reading zeroes when it's beyond VM system memory max address.
Return EFAULT when trying to read beyond VM system memory max address.
Andriy Gapon [Tue, 15 May 2018 16:56:30 +0000 (16:56 +0000)]
calibrate lapic timer in native_lapic_setup
The idea is to calibrate the LAPIC timer just once and only on boot,
given that [at present] the timer constants are global and shared
between all processors.
My primary motivation is to fix a panic that can happen when dynamically
switching to lapic timer. The panic is caused by a recursion on
et_hw_mtx when printing the calibration results to console. See the
review for the details of the panic.
Also, the code should become slightly simpler and easier to read. The
previous code was racy too. Multiple processors could start calibrating
the global constants concurrently, although that seems to have been
benign.
Stephen Hurd [Tue, 15 May 2018 16:54:41 +0000 (16:54 +0000)]
Check that ifma_protospec != NULL in inm_lookup
If ifma_protospec is NULL when inm_lookup() is called, there
is a dereference of a NULL struct pointer. This ensures that the struct is
not NULL before comparing the address.
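A simplified sketch of the guard, using hypothetical stand-in structures
(struct sketch_ifma and struct sketch_inm are not the kernel definitions, and
sketch_lookup is not inm_lookup()). The point is only the order of the checks:
test the pointer first, then compare the address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_inm {
	uint32_t addr;			/* group address */
};

struct sketch_ifma {
	struct sketch_inm *protospec;	/* may be NULL */
	struct sketch_ifma *next;
};

/* Find the membership matching addr, skipping entries with no protospec. */
static struct sketch_inm *
sketch_lookup(struct sketch_ifma *head, uint32_t addr)
{
	for (struct sketch_ifma *ifma = head; ifma != NULL;
	    ifma = ifma->next) {
		if (ifma->protospec == NULL)
			continue;	/* the added check: never dereference */
		if (ifma->protospec->addr == addr)
			return (ifma->protospec);
	}
	return (NULL);
}
```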
Andrew Turner [Tue, 15 May 2018 16:44:35 +0000 (16:44 +0000)]
Increase the number of pages we allocate in the arm64 early boot. We are
already close to the limit so increasing the kernel size may cause it to
fail to boot when it runs past the end of allocated memory.
Brooks Davis [Tue, 15 May 2018 16:24:58 +0000 (16:24 +0000)]
Allow freebsd32 __sysctl(2) to return ENOMEM.
This is required by programs like sockstat that read variably sized
sysctls such as kern.file. The normal path has no such restriction and
the restriction was added without comment along with initial support for
freebsd32 in 2002 (r100384).
Sean Bruno [Tue, 15 May 2018 13:30:59 +0000 (13:30 +0000)]
igb(4):
I210: restore functionality if the PXE boot ROM is enabled on this device.
r333345 attempted to determine if this code was needed or if it was some kind
of workaround for a problem. It turns out it's definitely a workaround for
hardware locking and synchronization that manifests itself if the option
ROM is enabled and is selected as a boot device (there was a PXE attempt).
Andriy Gapon [Tue, 15 May 2018 13:27:29 +0000 (13:27 +0000)]
Fix 'zpool create -t <tempname>'
Creating a pool with a temporary name fails when we also specify custom
dataset properties: this is because we mistakenly call
zfs_set_prop_nvlist() on the "real" pool name which, as expected,
cannot be found because the SPA is present in the namespace with the
temporary name.
Fix this by specifying the correct pool name when setting the dataset
properties.
Marcelo Araujo [Tue, 15 May 2018 05:55:29 +0000 (05:55 +0000)]
vq_getchain() can return -1 if some descriptor(s) are invalid and prints
a diagnostic message. So we do a sanity checking on the return value
of vq_getchain().
Navdeep Parhar [Tue, 15 May 2018 04:24:38 +0000 (04:24 +0000)]
cxgbe(4): Filtering related features and fixes.
- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.
Marius Strobl [Mon, 14 May 2018 21:46:06 +0000 (21:46 +0000)]
The broken DDR52 support of Intel Bay Trail eMMC controllers rumored
in the commit log of r321385 has been confirmed via the public VLI54
erratum. Thus, stop advertising DDR52 for these controllers.
Note that this change should hardly make a difference in practice as
eMMC chips from the same era as these SoCs most likely support HS200
at least, probably even up to HS400ES.
Stephen Hurd [Mon, 14 May 2018 20:06:49 +0000 (20:06 +0000)]
Replace rmlock with epoch in lagg
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data. This fixes LORs
reported where the rwlock was obtained when the sxlock was held.
John Baldwin [Mon, 14 May 2018 17:27:53 +0000 (17:27 +0000)]
Make the common interrupt entry point labels local labels.
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame. The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns confusing debuggers. Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.
Nathan Whitehorn [Mon, 14 May 2018 04:00:52 +0000 (04:00 +0000)]
Final fix for alignment issues with the page table first patched with
r333273 and partially reverted with r333594.
Older CPUs implement addition of offsets into the page table by a
bitwise OR rather than actual addition, which only works if the table is
aligned at a multiple of its own size (they also require it to be aligned
at a multiple of 256 KB). Newer ones do not have that requirement, but it
hardly matters to enforce it anyway.
The original code was failing on newer systems with huge amounts of RAM
(> 512 GB), in which the page table was 4 GB in size. Because the
bootstrap memory allocator took its alignment parameter as an int, this
turned into a 0, removing any alignment constraint at all and making
the MMU fail. The first round of this patch (r333273) fixed this case by
aligning it at 256 KB, which broke older CPUs. Fix this instead by widening
the alignment parameter.
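Both effects can be reproduced in a few lines of userland C. This is a sketch
under stated assumptions: the sketch_* names are hypothetical, and a uint32_t
cast stands in for passing the alignment through a 32-bit int. It shows a 4 GB
alignment collapsing to 0, and OR matching addition only when the table base
is aligned to the table size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the old allocator interface: the alignment is
 * squeezed into 32 bits, as happens when it is passed as an int.
 */
static uint32_t
sketch_narrow_align(uint64_t align)
{
	return ((uint32_t)align);	/* 4 GB (1 << 32) becomes 0 */
}

/* Round addr up to a power-of-two alignment; 0 means "no constraint". */
static uint64_t
sketch_align_up(uint64_t addr, uint64_t align)
{
	if (align == 0)
		return (addr);
	assert((align & (align - 1)) == 0);
	return ((addr + align - 1) & ~(align - 1));
}

/* Older CPUs form the entry address with OR instead of addition. */
static uint64_t
sketch_entry_addr_or(uint64_t table_base, uint64_t offset)
{
	return (table_base | offset);
}
```

With a 4 GB table, the narrowed alignment leaves an address such as
0x123456789 untouched, while the full 64-bit alignment rounds it up to
0x200000000, where OR and addition agree for every offset inside the table.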
Matt Macy [Mon, 14 May 2018 00:14:00 +0000 (00:14 +0000)]
epoch(9): allow sx locks to be held across epoch_wait()
The INVARIANTS checks in epoch_wait() were intended to
prevent the block handler from returning with locks held.
What they in fact did was prevent anything except Giant
from being held across it. Check that the number of locks
held has not changed instead.
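The new check can be modelled as "same lock count before and after the block
handler". The names below are hypothetical and this is not the epoch(9)
implementation; a plain counter stands in for the per-thread lock count.

```c
#include <stdbool.h>

/* Hypothetical model; a plain counter stands in for the lock count. */
static int sketch_locks_held;

/* A well-behaved handler: takes a lock and drops it before returning. */
static void
sketch_handler_balanced(void)
{
	sketch_locks_held++;
	sketch_locks_held--;
}

/* A buggy handler: returns while still holding a lock it acquired. */
static void
sketch_handler_leaky(void)
{
	sketch_locks_held++;
}

/*
 * Run the handler and report whether the number of locks held is
 * unchanged.  Locks already held by the caller (for example an sx lock)
 * are allowed; only a change in the count is an error.
 */
static bool
sketch_wait_check(void (*handler)(void))
{
	int before = sketch_locks_held;

	handler();
	return (sketch_locks_held == before);
}
```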
Rick Macklem [Sun, 13 May 2018 23:38:01 +0000 (23:38 +0000)]
Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only effect of not having this patch is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.
Matt Macy [Sun, 13 May 2018 23:24:48 +0000 (23:24 +0000)]
epoch(9): cleanups, additional debug checks, and add global_epoch
- GC the _nopreempt routines
- to really benefit we'd need a separate routine
- they're not currently in use
- they complicate the API for no benefit at this time
- check that we're actually in an epoch section at exit
Rick Macklem [Sun, 13 May 2018 12:42:53 +0000 (12:42 +0000)]
Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plug) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.
Rick Macklem [Sun, 13 May 2018 12:29:09 +0000 (12:29 +0000)]
The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.
- Create getblkx(9) variant of getblk(9) which can return error.
- Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP
be performed before the buffer is created, and that EJUSTRETURN be
returned in case the requested block does not exist.
- Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and
allocating the pages for it), copying from zero_region instead.
The end result is fewer page allocations and less buffer recycling when a
hole is read, which is important for some benchmarks.
Requested and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14917
Matt Macy [Sat, 12 May 2018 20:00:29 +0000 (20:00 +0000)]
hwpmc/epoch - don't reference domain if NUMA is not set
It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.
Mark Johnston [Sat, 12 May 2018 15:35:26 +0000 (15:35 +0000)]
DTrace aarch64: Avoid calling unwind_frame() in the probe context.
unwind_frame() may be instrumented by FBT, leading to recursion into
dtrace_probe(). Manually inline unwind_frame() as we do with stack
unwinding code for other architectures.
According to the Intel SDM (Volume 3, 9.11.7) the BIOS signature MSR
should be zeroed before executing cpuid (although in practice it does
not seem to matter).
PR: 192487
Submitted by: Dan Lukes
Reported by: Henrique de Moraes Holschuh
MFC after: 3 days
Emmanuel Vadot [Sat, 12 May 2018 13:14:01 +0000 (13:14 +0000)]
aw_mmc: Rework regulator handling
Don't enable regulators on attach but deal with them on power_up/power_off.
Only set the voltage for the signaling regulator since I don't have boards
that can change the supply voltage.
Enable 1.8v signaling voltage.
Emmanuel Vadot [Sat, 12 May 2018 13:13:34 +0000 (13:13 +0000)]
aw_mmc: Do not fully init the controller in attach
Only do a reset of the controller at attach and init it at power_up.
We used to enable some interrupts in reset; now only enable the interrupts
we are interested in when doing a request.
While here remove the regulators handling in power_on as it is very wrong
and will be dealt with in another commit.
I've been holding back on this because 1.7.0 requires OpenSSL 1.1.0 or
newer for full DANE support. But we can't wait forever, and nothing in
base uses DANE anyway, so here we go.