mckusick [Sun, 22 Jan 2017 17:49:14 +0000 (17:49 +0000)]
By default, when doing incremental restores the restore program
overwrites an existing file rather than removing it and creating a
new file. If the old and new version of the file both have extended
attributes and the extended attributes of the two versions of the
file are different, the result is that the new file ends up with
the union of the extended attributes of the old and new files.
To get the behavior of replacing the extended attributes rather
than augmenting them requires explicitly removing the old attributes
and then adding the new ones.
To get this behavior, the old file must be unlinked (which clears
out the old extended attributes). Then the new file of the same
name must be created and the new extended attributes added to it.
This behavior can be obtained by specifying the -u flag when running
restore. Rather than defaulting the -u option to on and possibly
breaking existing scripts using restore, this change simply notes
in the restore.8 manual page that the -u flag is recommended when
using restore on filesystems that contain extended attributes.
PR: 216127
Reported by: dewayne at heuristicsystems.com.au
Differential Revision: https://reviews.freebsd.org/D9208
adrian [Sun, 22 Jan 2017 07:05:41 +0000 (07:05 +0000)]
[athalq] fix rxtimestamp wrapping; print out per-packet timestamp deltas.
The delta here is just between the current TX/RX completion and the previous
TX/RX completion. The metadata needed to link TX descriptor timestamps to their
/completion/ timestamp isn't there yet.
adrian [Sun, 22 Jan 2017 05:45:42 +0000 (05:45 +0000)]
[ath] only apply the AR9300 delimiter workaround for the first sub-frame.
This is supposed to only be applied to the first subframe and only if
RTS/CTS is being done. I'm still not yet checking RTS/CTS exchange status
so it's just happening for all subframes on AR9380 and later.
This gets MCS23 throughput up from around 250mbit to 303mbit with RTS/CTS
protection enabled, and around 330mbit with no HT protection enabled.
Now, MCS23 has a PHY rate of 450mbit and we should be seeing closer to
400mbit for a straight one-way UDP test, but this beats the previous
maximum throughput.
Tested:
* AR9380 (STA) -> AR9580 (AP) - STA with the modifications, doing UDP TX
test using iperf.
jah [Sun, 22 Jan 2017 00:46:04 +0000 (00:46 +0000)]
Like r310481 for i386, move the objects used to create temporary
mappings for armv6 pmap zero and copy operations to the MD PCPU region.
Change sysmap initialization to only allocate KVA pages for CPUs that
are actually present.
While here, collapse CMAP3 into CMAP2 (their use was mutually exclusive
anyway) and "recover" some space in PCPU padding that has always been
available due to 64-byte cacheline padding.
loos [Sat, 21 Jan 2017 23:07:15 +0000 (23:07 +0000)]
Handle the rx queue stall while reading the packets from NIC (when the
descriptor state will not change anymore). This seems to eliminate the
race where we can miss a stalled queue under high load.
While here remove the unnecessary curly brackets.
Reported by: Konstantin Kormashev <konstantin@netgate.com>
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
avos [Sat, 21 Jan 2017 21:03:26 +0000 (21:03 +0000)]
ath: adapt LDPC support checks
Set both IEEE80211_HTCAP_LDPC and IEEE80211_HTC_TXLDPC capability flags
if LDPC is supported + set 'do_ldpc = 1' only when it is not disabled,
not just supported.
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D9277
loos [Sat, 21 Jan 2017 19:49:39 +0000 (19:49 +0000)]
Simplify the handling of small packets padding in cpsw:
- Pad small packets to 60 bytes and not 64 (exclude the CRC bytes);
- Pad the packet using m_append(9). If the packet has enough space for
padding, which is usually true, it is not necessary to append a newly
allocated mbuf to the chain.
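The 60-byte figure above is the 64-byte minimum Ethernet frame minus the 4-byte CRC that the hardware appends. A minimal sketch of that arithmetic on a flat buffer follows; pad_frame() and MIN_NO_CRC are hypothetical names, and the real driver works on mbuf chains through m_append(9) rather than on a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETHER_MIN_LEN	64	/* minimum frame length, CRC included */
#define ETHER_CRC_LEN	4	/* CRC is appended by the hardware */
#define MIN_NO_CRC	(ETHER_MIN_LEN - ETHER_CRC_LEN)	/* 60 bytes */

/*
 * Hypothetical helper: zero-pad a frame of len bytes, stored in a buffer
 * of cap bytes, up to 60 bytes.  Returns the new length, or 0 if the
 * buffer cannot hold the padded frame.
 */
size_t
pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	assert(buf != NULL);
	if (len > cap)
		return (0);		/* inconsistent arguments */
	if (len >= MIN_NO_CRC)
		return (len);		/* already long enough */
	if (cap < MIN_NO_CRC)
		return (0);		/* no room; a driver would chain a new buffer */
	memset(buf + len, 0, MIN_NO_CRC - len);
	return (MIN_NO_CRC);
}
```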
mav [Sat, 21 Jan 2017 19:38:26 +0000 (19:38 +0000)]
Add initial support for CTL module unloading.
It is only a first step and not perfect, but better than nothing.
The main blocker is the CAM target frontend, which cannot be unloaded,
since CAM currently has no mechanism to unregister a periph driver.
Note that mandoc no longer uses sqlite3 but a home-made database format.
An important improvement has been made as well in makewhatis performance:
tests on my laptop show that makewhatis on the entire system goes from 26s to 12s.
adrian [Sat, 21 Jan 2017 06:53:30 +0000 (06:53 +0000)]
[ath] ensure both iv_ampdu_limit and iv_ampdu_rxmax are set.
A recent change enforced the VAP limit as well as the peer limit.
I now need to actually set iv_ampdu_limit or we don't transmit more
than 8K sized aggregates.
This restores the expected (suboptimal, but still much faster) behaviour.
asomers [Fri, 20 Jan 2017 21:40:04 +0000 (21:40 +0000)]
Fix misc Coverity defects in camdd(8)
CID 1341620 Fix a small memory leak
CID 1341630 Though this is technically a false positive, rearrange the
code for clarity.
CID 1341635 Eliminate dead code
CID 1368663 Fix a double mutex unlock in the error path
Also:
* Use sig_atomic_t for variables accessed from signal handlers
* Don't conditionalize free(3) on its argument being non-null
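The two "Also" items are general C idioms rather than anything specific to camdd(8). A small sketch of both, with hypothetical names (need_exit, on_signal, release) that are not taken from the camdd source:

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* A flag written from a signal handler should be volatile sig_atomic_t. */
static volatile sig_atomic_t need_exit = 0;

static void
on_signal(int signo)
{
	(void)signo;
	need_exit = 1;
}

/* Install the handler, deliver SIGTERM to ourselves, report the flag. */
int
install_and_raise(void)
{
	if (signal(SIGTERM, on_signal) == SIG_ERR)
		return (-1);
	raise(SIGTERM);		/* the handler runs before raise() returns */
	return (need_exit);
}

/* free(3) accepts NULL, so no "if (*bufp != NULL)" guard is needed. */
void
release(char **bufp)
{
	assert(bufp != NULL);
	free(*bufp);
	*bufp = NULL;
}
```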
asomers [Fri, 20 Jan 2017 21:21:41 +0000 (21:21 +0000)]
Misc Coverity fixes in camcontrol(8)
CID 1229913 Fix output of "camcontrol persist -i report_capabilities".
The reported Persistent Reservation Types were wrong in all
cases.
CID 1356029 Annotate the code so Coverity will know that this is a false
positive.
CID 1366830 Fix a memory leak in "camcontrol timestamp -s"
CID 1366832 Fix a segfault that could be caused by bad drive firmware
Also, fix the man page entry for the "camcontrol epc state" command to match
what the code does.
hselasky [Fri, 20 Jan 2017 17:40:31 +0000 (17:40 +0000)]
Fix for race leading to endless timer interrupts related to
configtimer().
During normal operation "state->nextcallopt" will always be less than
or equal to "state->nextcall" and checking only "state->nextcallopt"
before calling "callout_process()" is sufficient. However when
"configtimer()" is called a race might happen requiring both of these
binary times to be checked.
Short description of race:
1) A configtimer() call will reset both "state->nextcall" and
"state->nextcallopt" to the same binary time.
2) If a "callout_reset()" call happens between "configtimer()" and the
next "callout_process()" call, "state->nextcallopt" will get updated
and "state->nextcall" will remain at the current time. Refer to logic
inside cpu_new_callout().
3) getnextcpuevent() only respects "state->nextcall" and returns this
value over and over again, even if it is in the past, until "now >=
state->nextcallopt" becomes true. Then these two time variables are
corrected by a "callout_process()" call and the situation goes back to
normal.
The problem manifests itself in different ways. The common factor is that
the timer process(es) consume all CPU on one or more CPU cores for a
long time, blocking other kernel processes from getting execution
time. This can be seen by very high interrupt counts as displayed by
"vmstat -i | grep timer" right after boot.
When EARLY_AP_STARTUP was enabled in r310177 the likelihood of hitting
this bug apparently increased.
Example output from "vmstat -i" before patch:
cpu0:timer 7591 69
cpu9:timer 39031773 358089
cpu4:timer 9359 85
cpu3:timer 9100 83
cpu2:timer 9620 88
Example output from "vmstat -i" after patch:
cpu0:timer 4242 34
cpu6:timer 5531 44
cpu3:timer 6450 52
cpu1:timer 4545 36
cpu9:timer 7153 58
Before the patch, cpu9 in the example above was spinning in a loop and
reached 39 million interrupts just a few seconds after bootup. After
the patch the timer interrupt counts are more or less consistent.
Discussed with: mav @
Reported by: several people
MFC after: 1 week
Sponsored by: Mellanox Technologies
rstone [Fri, 20 Jan 2017 17:16:48 +0000 (17:16 +0000)]
Fix reference to free memory in ixgbe/if_media.c
When ixgbe receives an interrupt indicating that a new optical module
may have been inserted, it discards all of its current media types
by calling ifmedia_removeall() and then creates a new set of media
types for the supported media on the new module. However,
ifmedia_removeall() was maintaining a pointer to whatever the
current media type was before the call to ifmedia_removeall().
The result of this was that any attempt to read the current media
type of the interface (e.g. via ifconfig) would return potentially
garbage data from free memory (or if one were particularly unlucky
on an architecture that does not malloc() from a direct map, page
fault the kernel).
Fix this by NULL'ing out the current media field in if_media.c,
and have ixgbe update the current media type after recreating
them.
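The bug pattern is a "current" pointer that outlives the list entries it points into. A self-contained sketch of the fix idea follows; struct media_list and the media_* functions are hypothetical simplifications, not the real if_media.c data structures.

```c
#include <assert.h>
#include <stdlib.h>

struct media_entry {
	int type;
	struct media_entry *next;
};

struct media_list {
	struct media_entry *head;	/* all known media types */
	struct media_entry *cur;	/* currently selected entry */
};

/* Prepend a new entry; returns 0 on success, -1 on allocation failure. */
int
media_add(struct media_list *ml, int type)
{
	struct media_entry *e;

	assert(ml != NULL);
	e = malloc(sizeof(*e));
	if (e == NULL)
		return (-1);
	e->type = type;
	e->next = ml->head;
	ml->head = e;
	return (0);
}

/* Free every entry and clear cur so that it cannot dangle. */
void
media_removeall(struct media_list *ml)
{
	struct media_entry *e, *next;

	assert(ml != NULL);
	for (e = ml->head; e != NULL; e = next) {
		next = e->next;
		free(e);
	}
	ml->head = NULL;
	ml->cur = NULL;		/* without this, cur points at freed memory */
}
```

After media_removeall() the caller is expected to add the new entries and select a current one again, which mirrors what the commit message says ixgbe now does.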
hselasky [Fri, 20 Jan 2017 15:45:21 +0000 (15:45 +0000)]
Allow transmit packet bufring in software to be disabled.
- Add new sysctl node to control the transmit packet bufring.
- Add optimised version of the transmit routine which outputs packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of task switches, which in
turn influences the overall system CPU usage, depending on the
workload.
- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about acquiring duplicate locks having the same name.
https://www.illumos.org/issues/6569
The core issue I've found is that there is no throttle for how many
deletes get assigned to one TXG. As a result, when deleting large files
we end up filling consecutive TXGs with deletes/frees, then write
throttling other (more important) ops.
There is an easy test case for this problem. Try deleting several
large files (at least 1/2 TB) while you do write ops on the same
pool. What we've seen is performance of these write ops (let's
call it sideload I/O) would drop to zero.
More specifically the problem is that dmu_free_long_range_impl()
can/will fill up all of the dirty data in the pool "instantly",
before many of the sideload ops can get in. So sideload
performance will be impacted until all the files are freed.
The solution we have tested at Nexenta (with positive results)
creates a relatively simple throttle for how many "free" ops we let
into one TXG.
However this solution exposes other problems that should also be
addressed. If we are to slow down freeing of data that means one
has to wait even longer (assuming vnode ref count of 1) to get shell
back after an rm or for NFS thread to finish the free-ing op.
To avoid this the proposed solution is to call zfs_inactive() async
for "large" files. Async freeing then begs for the reclaimed space
to be accounted for in the zpool's "freeing" prop.
The other issue with having a longer delete is the inability to
export/unmount for a longer period of time. The proposed solution
is to interrupt freeing of blocks when a fs is unmounted.
Author: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
avg [Fri, 20 Jan 2017 13:39:07 +0000 (13:39 +0000)]
don't abort writing of a core dump after EFAULT
It's possible to get EFAULT when writing a segment backed by a file
if the segment extends beyond the file.
The core dump could still be useful if we skip the rest of the segment
and proceed to other segments.
The skipped segment (or a portion of it) will be zero-filled.
While there, use 'const' to signify that core_write() only reads the
buffer and use __DECONST before calling vn_rdwr_inchunks() because it
can be used for both reading and writing.
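The change in control flow can be sketched as follows. This is a simplified model under assumptions: dump_segments() is a hypothetical function, and the per-segment write results are passed in as an array of errno values instead of coming from real I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Walk the per-segment write results.  EFAULT no longer aborts the dump:
 * the segment is counted as skipped and the loop moves on.  Any other
 * error still stops the dump and is returned to the caller.
 */
int
dump_segments(const int *results, size_t nsegs, size_t *skipped)
{
	size_t i;

	assert(results != NULL && skipped != NULL);
	*skipped = 0;
	for (i = 0; i < nsegs; i++) {
		if (results[i] == EFAULT) {
			(*skipped)++;	/* leave this segment, keep dumping */
			continue;
		}
		if (results[i] != 0)
			return (results[i]);
	}
	return (0);
}
```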
Before the change:
kernel: Failed to write core file for process mmap_trunc_core (error 14)
kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6
After the change:
kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core
kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped)
Reviewed by: julian, kib
Obtained from: Panzura (older version of the change)
MFC after: 5 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9233
avg [Fri, 20 Jan 2017 13:21:27 +0000 (13:21 +0000)]
vmm_dev: work around a bogus error with gcc 6.3.0
The error is:
vmm_dev.c: In function 'alloc_memseg':
vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull]
Apparently, gcc is unable to figure out that if a ternary operator
produced a non-NULL value once, then the operator with exactly the same
operands would produce the same value again.
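One general way to sidestep this class of warning is to evaluate the ternary expression once and reuse the result. The sketch below illustrates that idea only; struct memseg, SEG_NAME() and seg_name_len() are hypothetical, and this is not necessarily the workaround the commit applied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct memseg {
	char name[16];
};

/* Hypothetical macro: yields the name, or NULL when the name is empty. */
#define SEG_NAME(m)	((m)->name[0] != '\0' ? (m)->name : NULL)

/*
 * Evaluate the ternary once into a local.  After the NULL check the
 * compiler can see that the pointer handed to strlen() is non-NULL,
 * instead of having to reason about a second, identical expansion.
 */
size_t
seg_name_len(const struct memseg *m)
{
	const char *name;

	assert(m != NULL);
	name = SEG_NAME(m);
	if (name == NULL)
		return (0);
	return (strlen(name));
}
```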
hselasky [Fri, 20 Jan 2017 11:11:49 +0000 (11:11 +0000)]
Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.