Mateusz Guzik [Fri, 18 May 2018 22:57:52 +0000 (22:57 +0000)]
lockmgr: avoid atomic on unlock in the slow path
The code is pretty much guaranteed not to be able to unlock.
This is a minor nit. The code still performs way too many reads.
The altered exclusive-locked condition is supposed to be always
true as well, to be cleaned up at a later date.
Matt Macy [Fri, 18 May 2018 20:13:34 +0000 (20:13 +0000)]
ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@
gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.
When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:
John Baldwin [Fri, 18 May 2018 19:09:11 +0000 (19:09 +0000)]
Be more robust against garbage input on a TOE TLS TX socket.
If a socket is closed or shutdown and a partial record (or what
appears to be a partial record) is waiting in the socket buffer,
discard the partial record and close the connection rather than
waiting forever for the rest of the record.
Matt Macy [Fri, 18 May 2018 17:29:43 +0000 (17:29 +0000)]
epoch(9): Make epochs non-preemptible by default
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.
Mark Johnston [Fri, 18 May 2018 16:59:58 +0000 (16:59 +0000)]
Don't increment addl_page_shortage for wired pages.
Such pages are dequeued as they're encountered during the inactive queue
scan, so by the time we get to the active queue scan, they should have
already been subtracted from the inactive queue length.
Andrew Gallatin [Fri, 18 May 2018 14:14:04 +0000 (14:14 +0000)]
Teach pmcannotate about $TMPDIR and _PATH_TMP
Convert pmcannotate to using $TMPDIR and _PATH_TMP rather than hard
coding /tmp for temporary files. Pmcannotate sometimes needs quite a
lot of space to store the output from objdump, and will fail in odd
ways if that output is truncated due to lack of space in /tmp.
Olivier Houchard [Fri, 18 May 2018 13:28:02 +0000 (13:28 +0000)]
Instead of ignoring the VFP registers, set the dumppcb's pcb_fpusaved
field, so that they are saved, as they may be used in the kernel, in the
EFI and the crypto code.
Make the name of option that toggles IFCAP_HWRXTSTMP capability to
match the name of this capability. It was added recently and is not merged
to stable branch, so I hope it is not too late to change the name.
Mateusz Guzik [Fri, 18 May 2018 07:31:26 +0000 (07:31 +0000)]
amd64: tweak the read_frequently section
1. align to 128 bytes to avoid possible waste from the preceeding section
2. sort entries by alignment SORT_BY_ALIGNMENT, plugging the holes (most
entries are one byte in size, but they got interleaved with bigger ones)
Interestingly I was looking for a feature of the sort earlier and failed
to find it. It turns out the script was already utilizing sorting in other
places, so shame on me.
Thanks for Travis Geiselbrecht for pointing me at the feature.
Navdeep Parhar [Fri, 18 May 2018 06:09:15 +0000 (06:09 +0000)]
cxgbe(4): Implement ifnet callbacks that deal with send tags.
An etid (ethoffload tid) is allocated for a send tag and it acquires a
reference on the traffic class that matches the send parameters
associated with the tag.
Ed Maste [Fri, 18 May 2018 02:58:26 +0000 (02:58 +0000)]
vt: add more cp437 mappings for vga textmode
In UTF-8 locales mandoc uses a number of characters outside of the Basic
Latin group, e.g. from general punctuation or miscellaneous mathematical
symbols, and these rendered as ? in text mode.
This change adds (char, replacement, code point, description):
Matt Macy [Fri, 18 May 2018 01:52:51 +0000 (01:52 +0000)]
epoch: add non-preemptible "critical" variant
adds:
- epoch_enter_critical() - can be called inside a different epoch,
starts a section that will acquire any MTX_DEF mutexes or do
anything that might sleep.
- epoch_exit_critical() - corresponding exit call
- epoch_wait_critical() - wait variant that is guaranteed that any
threads in a section are running.
- epoch_global_critical - an epoch_wait_critical safe epoch instance
Olivier Houchard [Thu, 17 May 2018 22:38:16 +0000 (22:38 +0000)]
In vfp_save_state(), don't bother trying to save the VFP registers if the
provided PCB doesn't have a pcb_fpusaved. All PCBs associated to a thread
should have one, but the dumppcb used when panic'ing doesn't.
Rick Macklem [Thu, 17 May 2018 21:17:20 +0000 (21:17 +0000)]
Add a missing nfsrv_freesession() call for an unlikely failure case.
Since NFSv4.1 clients normally create a single session which supports
both fore and back channels, it is unlikely that a callback will fail
due to a lack of a back channel.
However, if this failure occurred, the session wasn't being dereferenced
and would never be free'd.
Found by inspection during pNFS server development.
Add a "multifunction" device side USB template, which provides mass
storage, CDC ACM (serial), and CDC ECM (ethernet) at the same time.
It's quite similar in function to Linux' "g_multi" gadget.
Reviewed by: hselasky@
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Emmanuel Vadot [Thu, 17 May 2018 19:10:13 +0000 (19:10 +0000)]
release: rpi3: Copy the special rpi3 config.txt
RPI* 32bits and RPI* 64bits have a different config.txt
Copy to correct config.txt to the fat partition of the release image.
Also copy pwm.dtbo as some people want to use it.
Matt Macy [Thu, 17 May 2018 17:59:35 +0000 (17:59 +0000)]
AF_UNIX: make unix socket locking finer grained
This change moves to using a reference count across lock drop / reacquire
to guarantee liveness.
Currently sends on unix sockets contend heavily on read locking the list lock.
unix1_processes in will-it-scale peaks at 6 processes and then declines.
With this change I get a substantial improvement in number of operations per second
with 96 processes:
And even for 2 processes shows a ~18% improvement.
"Small" iron changes (1, 2, and 4 processes):
x before1
+ after1.2
+------------------------------------------------------------------------+
| + |
| x + |
| x + |
| x + |
| x ++ |
| xx ++ |
|x x xx ++ |
| |__________________A_____M_____AM____||
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 113164811977501197138.5 1190369.3 20651.839
+ 10 1203840120505612049191204827.9 353.27404
Difference at 95.0% confidence
14458.6 +/- 13723
1.21463% +/- 1.16683%
(Student's t, pooled s = 14605.2)
x before2
+ after2.2
+------------------------------------------------------------------------+
| +|
| +|
| +|
| +|
| +|
| +|
| x +|
| x +|
| x xx +|
|x xxxx +|
| |___AM_| A|
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 197284320458662038186.5 2030443.8 21367.694
+ 10 240085324021962401043.5 2401172.7 385.40024
Difference at 95.0% confidence
370729 +/- 14198.9
18.2585% +/- 0.826943%
(Student's t, pooled s = 15111.7)
x before4
+ after4.2
N Min Max Median Avg Stddev
x 10 398699439917283990137.5 3989985.2 1300.0164
+ 10 479999048066644806116.5 4805194 1990.6625
Difference at 95.0% confidence
815209 +/- 1579.64
20.4314% +/- 0.0421713%
(Student's t, pooled s = 1681.19)
Emmanuel Vadot [Thu, 17 May 2018 16:21:12 +0000 (16:21 +0000)]
release: arm: Format FAT partition as FAT16
r332674 raised the size of the FAT partition from 2MB to 41MB for some
boards. But we format them in FAT12 and this size appears to be to big
for FAT12 and some SoC bootrom cannot cope with that.
Format the msdosfs partition as FAT16,
Fix off-by-one in usb_decode_str_desc(). Previously it would decode
one character too many. Note that this function is only used to decode
string descriptors generated by the kernel itself.
Reviewed by: hselasky@
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Sean Bruno [Thu, 17 May 2018 14:55:41 +0000 (14:55 +0000)]
Retire vxge(4).
This driver was merged to HEAD one week prior to Exar publicly announcing they
had left the Ethernet market. It is not known to be used and has various code
quality issues spotted by Brooks and Hiren. Retire it in preparation for
FreeBSD 12.0.
Emmanuel Vadot [Thu, 17 May 2018 14:51:22 +0000 (14:51 +0000)]
aw_spi: Fix some silly clock mistake
The module uses the mod clock and not the ahb one.
We need to set the mod clock to twice the speed requested as the smallest
divider in the controller is 2.
The clock test function weren't calculating the register value best on the
best div but on the max one.
The cdr2 test function was using the cdr1 formula.
Dimitry Andric [Thu, 17 May 2018 14:38:58 +0000 (14:38 +0000)]
Pull in r322325 from upstream llvm trunk (by Matthias Braun):
PeepholeOpt cleanup/refactor; NFC
- Less unnecessary use of `auto`
- Add early `using RegSubRegPair(AndIdx) =` to avoid countless
`TargetInstrInfo::` qualifications.
- Use references instead of pointers where possible.
- Remove unused parameters.
- Rewrite the CopyRewriter class hierarchy:
- Pull out uncoalescable copy rewriting functionality into
PeepholeOptimizer class.
- Use an abstract base class to make it clear that rewriters are
independent.
- Remove unnecessary \brief in doxygen comments.
- Remove unused constructor and method from ValueTracker.
- Replace UseAdvancedTracking of ValueTracker with DisableAdvCopyOpt
use.
Even though upstream marked this as "No Functional Change", it does
contain some functional changes, and these fix a compiler hang for one
particular source file in the devel/godot port.
Ed Maste [Thu, 17 May 2018 14:04:59 +0000 (14:04 +0000)]
Add driver for Microchip LAN78xx USB3-GigE controller
This driver supports two Microchip USB-Ethernet controllers:
LAN7800 USB 3.1 to 10/100/1000 Mbps Ethernet
LAN7515 USB 2 to 10/100/1000 Mbps Ethernet with built-in USB hub
The LAN7515 is the Ethernet controller on the Raspberry Pi 3B+.
At present there is no datasheet for the LAN7515, but it is effectively
a USB 2 hub combined with a LAN7800 controller. A comprehensive LAN7800
datasheet is at http://www.microchip.com/wwwproducts/en/LAN7800.
This driver is based on the structure of the smsc(4) driver which
supports Microchip/SMSC's LAN95xx family. (Microchip acquired SMSC
in May 2012.) The Linux lan78xx driver served as a reference for some
functionality and registers.
The 'muge' driver name comes from "Microchip USB Gigabit Ethernet".
I made some style adjustments and minor edits to Arshan's submission.
It will be connected to the build after additional review and testing.
Thanks to Microchip for providing a number of Evaluation Boards (EVBs)
for development and testing.
Submitted by: Arshan Khanifar
Reviewed by: hselasky (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15168
Emmanuel Vadot [Thu, 17 May 2018 10:19:52 +0000 (10:19 +0000)]
allwinner: Add h3 spi driver
This driver is compatible with H3/H5/A64.
Test was done on the OrangePi-PC2 board (H5 based), which have a mx25l1606e
spi flash on it, by writing u-boot image, reading it and booting from the spi.
There is still room for improvement especially on reading using the controller
automatic burst which will avoid us to write dummy data to the TX FIFO.
DMA is also not supported as we currently don't support the DMA controller on
those SoCs
Only add a kernel module for it.
Andriy Gapon [Thu, 17 May 2018 10:16:20 +0000 (10:16 +0000)]
fix a problem with bad performance after wakeup caused by r333321
This change reverts a "while here" part of r333321 that moved clearing
of suspended_cpus to an earlier place.
Apparently, there can be a problem when modifying (shared) memory before
restoring proper cache attributes. So, to be safe, move the clearing to
the old place.
Many thanks to Johannes Lundberg for bisecting the changes to that
particular commit and then bisecting the commit to the particular
change.
Reported by: many
Debugged by: Johannes Lundberg <johalun0@gmail.com>
MFC after: 1 week
X-MFC with: r333321
Mark Johnston [Thu, 17 May 2018 04:08:57 +0000 (04:08 +0000)]
Fix netdump configuration when VIMAGE is enabled.
We need to set the current vnet before iterating over the global
interface list. Because the dump device may only be set from the host,
only proceed with configuration if the thread belongs to the default
vnet. [1]
Also fix a resource leak that occurs if the priv_check() in set_dumper()
fails.
Lawrence Stewart [Thu, 17 May 2018 02:46:27 +0000 (02:46 +0000)]
Plug a memory leak and potential NULL-pointer dereference introduced in r331214.
Each TCP connection that uses the system default cc_newreno(4) congestion
control algorithm module leaks a "struct newreno" (8 bytes of memory) at
connection initialisation time. The NULL-pointer dereference is only germane
when using the ABE feature, which is disabled by default.
While at it:
- Defer the allocation of memory until it is actually needed given that ABE is
optional and disabled by default.
- Document the ENOMEM errno in getsockopt(2)/setsockopt(2).
- Document ENOMEM and ENOBUFS in tcp(4) as being synonymous given that they are
used interchangeably throughout the code.
- Fix a few other nits also accidentally omitted from the original patch.
Navdeep Parhar [Thu, 17 May 2018 01:42:18 +0000 (01:42 +0000)]
cxgbe(4): Allocate offload Tx queues when a card has resources
provisioned for NIC_ETHOFLD and the kernel has option RATELIMIT.
It is possible to use the chip's offload queues for normal NIC Tx and
not just TOE Tx. The difference is that these queues support out of
order processing of work requests and have a per-"flowid" mechanism for
tracking credits between the driver and hardware. This allows Tx for
any number of flows bound to different rate limits to be submitted to a
single Tx queue and the work requests for slow flows won't cause HOL
blocking for the rest.
Navdeep Parhar [Thu, 17 May 2018 00:52:48 +0000 (00:52 +0000)]
cxgbe(4): Add NIC_ETHOFLD to the NIC capabilities allowed by the driver
by default.
This is the first of a series of commits that will add support for
RATELIMIT kernel option to the base if_cxgbe driver, for use with
ordinary NIC traffic "flows". RATELIMIT is already supported by t4_tom
for the fully-offloaded TCP connections that it handles.
Kirk McKusick [Wed, 16 May 2018 23:30:03 +0000 (23:30 +0000)]
Revert change made in base r171522
(https://svnweb.freebsd.org/base?view=revision&revision=304232)
converting clrbuf() (which clears the entire buffer) to vfs_bio_clrbuf()
(which clears only the new pages that have been added to the buffer).
Failure to properly remove pages from the buffer cache can make
pages that appear not to need clearing to actually have bad random
data in them. See for example base r304232
(https://svnweb.freebsd.org/base?view=revision&revision=304232)
which noted the need to set B_INVAL and B_NOCACHE as well as clear
the B_CACHE flag before calling brelse() to release the buffer.
Rather than trying to find all the incomplete brelse() calls, it
is simpler, though more slightly expensive, to simply clear the
entire buffer when it is newly allocated.
Matt Macy [Wed, 16 May 2018 22:29:20 +0000 (22:29 +0000)]
hwpmc: Implement per-thread counters for PMC sampling
This implements per-thread counters for PMC sampling. The thread
descriptors are stored in a list attached to the process descriptor.
These thread descriptors can store any per-thread information necessary
for current or future features. For the moment, they just store the counters
for sampling.
The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.
Stephen Hurd [Wed, 16 May 2018 21:03:22 +0000 (21:03 +0000)]
Work around lack of TX IRQs in iflib for netmap
When poll() is called via netmap, txsync is initially called,
and if there are no available buffers to reclaim, it waits for the driver
to notify of new buffers. Since the TX IRQ is generally not used in iflib
drivers, this ends up causing a timeout.
Work around this by having the reclaim DELAY(1) if it's initially unable
to reclaim anything, then schedule the tx task, which will spin by
continuously rescheduling itself until some buffers are reclaimed. In
general, the delay is enough to allow some buffers to be reclaimed, so
spinning is minimized.
Navdeep Parhar [Wed, 16 May 2018 17:55:16 +0000 (17:55 +0000)]
cxgbe(4): Fall back to a failsafe configuration built into the firmware
if an error is reported while pre-processing the configuration file that
the driver attempted to use.
Also, allow the user to explicitly use the built-in configuration with
hw.cxgbe.config_file="built-in"
MFC after: 2 days
Sponsored by: Chelsio Communications
vt(4): Resume vt_timer() in vtterm_post_input() only
There is no need to try to resume it after each smaller operations
(putchar, cursor_position, copy, fill).
The resume function already checks if the timer is armed before doing
anything, but it uses an atomic cmpset which is expensive. And resuming
the timer at the end of input processing is enough.
While here, we also skip timer resume if the input is for another
windows than the currently displayed one. I.e. if `ttyv0` is currently
displayed, any changes to `ttyv1` shouldn't resume the timer (which
would refresh `ttyv0`).
By doing the same benchmark as r333669, I get:
* vt(4), before r333669: 1500 ms
* vt(4), with this patch: 760 ms
* syscons(4): 700 ms
The situation was even worse when the vtterm_copy() and vtterm_fill()
callbacks were involved.
The new callbacks are:
* struct terminal_class->tc_pre_input()
* struct terminal_class->tc_post_input()
They are called in teken_input(), surrounding the while() loop.
The goal is to improve input processing speed of vt(4). As a benchmark,
here is the time taken to write a text file of 360 000 lines (26 MiB) on
`ttyv0`:
* vt(4), unmodified: 1500 ms
* vt(4), with this patch: 1200 ms
* syscons(4): 700 ms
This is on a Haswell laptop with a GENERIC-NODEBUG kernel.
At the same time, the locking is changed in the vt_flush() function
which is responsible to draw the text on screen. So instead of
(indirectly) using VTBUF_LOCK() just to read and reset the dirty area
of the internal buffer, the lock is held for about the entire function,
including the drawing part.
The change is mostly visible while content is scrolling fast: before,
lines could appear garbled while scrolling because the internal buffer
was accessed without locks (once the scrolling was finished, the output
was correct). Now, the scrolling appears correct.
In the end, the locking model is closer to what syscons(4) does.
Andriy Gapon [Wed, 16 May 2018 06:52:08 +0000 (06:52 +0000)]
followup to r332730/r332752: set kdb_why to "trap" for fatal traps
This change updates arm, arm64 and mips achitectures. Additionally, it
removes redundant checks for kdb_active where it already results in
kdb_reenter() and adds kdb_reenter() calls where they were missing.
Some architectures check the return value of kdb_trap(), but some don't.
I haven't changed any of that.
Some trap handling routines have a return code. I am not sure if I
provided correct ones for returns after kdb_reenter(). kdb_reenter
should never return unless kdb_jmpbufp is NULL for some reason.
Only compile tested for all affected architectures. There can be bugs
resulting from my poor understanding of architecture specific details.
Ed Maste [Wed, 16 May 2018 02:15:18 +0000 (02:15 +0000)]
Clarify that boot_mute / boot -m mutes kernel console only
Perhaps RB_MUTE could mute user startup (rc) output as well, but right
now it mutes only kernel console output, so make the documentation match
reality.