trasz [Fri, 4 Nov 2016 14:06:21 +0000 (14:06 +0000)]
MFC r297207:
Make the autofs(5) -hosts map more robust, primarily to make it correctly
handle NFS shares containing whitespace. This also adds the -E parameter
to showmount(8).
jhb [Fri, 4 Nov 2016 04:01:59 +0000 (04:01 +0000)]
MFC 301932: Use sbused() instead of sbspace() to avoid signed issues.
Inserting a full mbuf with an external cluster into the socket buffer
resulted in sbspace() returning -MLEN. However, since sb_hiwat is
unsigned, the -MLEN value was converted to unsigned in comparisons. As a
result, the socket buffer was never autosized. Note that sb_lowat is signed
to permit direct comparisons with sbspace(), but sb_hiwat is unsigned.
Follow suit with what tcp_output() does and compare the value of sbused()
with sb_hiwat instead.
Note: Since stable/10 does not include sbused(), this uses sb->sb_cc
instead.
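
The signedness problem described above can be reproduced in isolation. The following is a minimal sketch, not the driver code: the struct, the function names, and the 1/8 and 7/8 thresholds are hypothetical stand-ins chosen only to show how a negative free-space value defeats a comparison against an unsigned high-water mark, and why comparing the used byte count avoids it.

```c
#include <stdio.h>

/* Hypothetical stand-in for the socket buffer fields discussed above. */
struct demo_sockbuf {
	unsigned int hiwat;	/* like sb_hiwat: unsigned high-water mark */
	unsigned int used;	/* like sbused() or sb_cc: bytes buffered */
};

/*
 * Buggy check: "space" may be negative when the buffer is overfull.
 * The cast spells out the conversion C performs implicitly when an
 * int is compared with an unsigned int, so -256 becomes a huge value
 * and the "little space left" test never fires.
 */
static int
demo_should_autosize_buggy(const struct demo_sockbuf *sb, int space)
{
	return ((unsigned int)space < sb->hiwat / 8);
}

/* Fixed check: compare the unsigned used count with the unsigned limit. */
static int
demo_should_autosize_fixed(const struct demo_sockbuf *sb)
{
	return (sb->used >= sb->hiwat / 8 * 7);
}
```

With hiwat = 65536 and 256 bytes more than that in the buffer, the buggy check sees a space of -256 and declines to autosize, while the fixed check reports that the buffer is nearly full.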
jhb [Fri, 4 Nov 2016 03:49:53 +0000 (03:49 +0000)]
MFC 290175,290633,299206,300895,301898: Various TOE fixes.
290175:
cxgbe/tom: decide whether to shove segments or not only if there is
payload to transmit.
290633:
cxgbe/t4_tom: add a knob to the default configuration file to tune
the TOE for LAN operation. It is possible to set this to other values
(cluster for networks with little loss and really tight RTTs, and wan
for relatively large RTTs and/or lossy networks) depending on the
environment in which the TOE is being used.
None of this affects plain NIC operation in any way.
299206:
Set the correct vnet in TOE event handlers.
300895:
cxgbe/t4_tom: Exempt RDMA connections from a TCP sanity test for now, to
avoid panicking debug kernels.
t4_tom does not keep track of a connection once it switches to ULP mode
iWARP. If the connection falls out of ULP mode the driver/hardware seq#
etc. are out of sync. A better fix would be to figure out what the
current seq# are, update the driver's state, and perform all sanity
checks as usual.
301898:
cxgbe/t4_tom: Fix inverted assertion in r300895. It is RDMA
connections and not others that are allowed to fail the receive window
check.
jhb [Fri, 4 Nov 2016 03:25:34 +0000 (03:25 +0000)]
MFC 277763,280146,287631: Various fixes to DDP.
277763:
Lock the socket buffer before jumping to the 'out' label if sblock()
fails in t4_soreceive_ddp().
280146:
Move special DDP handling for closing a connection into a new
handle_ddp_close() function in t4_ddp.c as the logic is similar
to handle_ddp_data(). This allows all knowledge of the special
DDP mbufs to be private to t4_ddp.c as well.
287631:
Add a comment to clarify how to determine the amount of received DDP
data.
jch [Thu, 3 Nov 2016 19:58:12 +0000 (19:58 +0000)]
MFC r307966:
Remove an extraneous call to soisconnected() in syncache_socket(),
introduced with r261242. The useful and expected soisconnected()
call is done in tcp_do_segment().
Found as part of the unrelated PR 212920 investigation.
Slightly improves (~2%) the maximum number of TCP accepts per second.
rmacklem [Thu, 3 Nov 2016 00:58:50 +0000 (00:58 +0000)]
MFC: r307694
A problem w.r.t. interoperation between the FreeBSD NFSv4.1 server with
delegations enabled and the Linux NFSv4.1 client was reported in
reviews.freebsd.org/D7891.
I believe that the FreeBSD server behaviour conforms to the RFC and that
the Linux client has a bug. Therefore, I do not think the proposed patch
is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD
server will issue a write delegation for a read open if possible.
The Linux client then erroneously assumes that the credentials used for
the read open can write the file.
This patch reverses the default value for nfsrv_writedelegifpos to 0 so
that the default behaviour is Linux compatible and adds a sysctl that can
be used to set nfsrv_writedelegifpos.
This change should only affect users that are mounting a FreeBSD server
with delegations enabled (they are not enabled by default) with a Linux
NFSv4.1 client mount.
Certain warning alerts are ignored if they are received. This can mean that
no progress will be made if one peer continually sends those warning alerts.
Implement a count so that we abort the connection if we receive too many.
Issue reported by Shi Lei.
This is a direct commit to stable/10 and stable/9.
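
The counting approach mentioned above might look roughly like the sketch below. Everything here is an assumption made for illustration: the limit of 5, the struct, and the function names are hypothetical, and the source does not say whether the count is reset when other records arrive.

```c
#include <stdbool.h>

/* Hypothetical limit; the commit message does not state the real value. */
#define DEMO_MAX_WARN_ALERTS 5

struct demo_conn {
	unsigned int warn_alerts;	/* ignored warning alerts seen so far */
};

/* Count one ignored warning alert; true means "abort the connection". */
static bool
demo_on_warning_alert(struct demo_conn *c)
{
	c->warn_alerts++;
	return (c->warn_alerts >= DEMO_MAX_WARN_ALERTS);
}

/* Assumption: a record that makes progress resets the count. */
static void
demo_on_progress(struct demo_conn *c)
{
	c->warn_alerts = 0;
}
```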
jhb [Mon, 31 Oct 2016 22:45:11 +0000 (22:45 +0000)]
MFC 291665,291685,291856,297467,302110,302263: Add support for VIs.
291665:
Add support for configuring additional virtual interfaces (VIs) on a port.
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.
Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).
T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).
One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are now accounted to the netmap interface.
The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.
The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.
291685:
Fix build for !TCP_OFFLOAD case.
291856:
Fix RSS build.
297467:
Remove #ifdef's from various structures used in the cxgbe/cxl driver.
This provides a constant ABI and layout for these structures (especially
struct adapter) avoiding some foot shooting.
302110:
cxgbe(4): Merge netmap support from the ncxgbe/ncxl interfaces to the
vcxgbe/vcxl interfaces and retire the 'n' interfaces. The main
cxgbe/cxl interfaces and tunables related to them are not affected by
any of this and will continue to operate as usual.
The driver used to create an additional 'n' interface for every
cxgbe/cxl interface if "device netmap" was in the kernel. The 'n'
interface shared the wire with the main interface but was otherwise
autonomous (with its own MAC address, etc.). It did not have normal
tx/rx but had a specialized netmap-only data path. r291665 added
another set of virtual interfaces (the 'v' interfaces) to the driver.
These had normal tx/rx but no netmap support.
This revision consolidates the features of both the interfaces into the
'v' interface which now has a normal data path, TOE support, and native
netmap support. The 'v' interfaces need to be created explicitly with
the hw.cxgbe.num_vis tunable. This means "device netmap" will not
result in the automatic creation of any virtual interfaces.
The following tunables can be used to override the default number of
queues allocated for each 'v' interface. nofld* = 0 will disable TOE on
the virtual interface and nnm* = 0 will disable native netmap
support.
# number of normal NIC queues
hw.cxgbe.ntxq_vi
hw.cxgbe.nrxq_vi
# number of TOE queues
hw.cxgbe.nofldtxq_vi
hw.cxgbe.nofldrxq_vi
# number of netmap queues
hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi
hw.cxgbe.nnm{t,r}xq{10,1}g tunables have been removed.
--- tl;dr version ---
The workflow for netmap on cxgbe starting with FreeBSD 11 is:
1) "device netmap" in the kernel config.
2) "hw.cxgbe.num_vis=2" in loader.conf. num_vis > 2 is ok too, you'll
end up with multiple autonomous netmap-capable interfaces for every
port.
3) "dmesg | grep vcxl | grep netmap" to verify that the interface has
netmap queues.
4) Use any of the 'v' interfaces for netmap. pkt-gen -i vcxl<n>... .
One major improvement is that the netmap interface has a normal data
path as expected.
5) Just ignore the cxl interfaces if you want to use netmap only. No
need to bring them up. The vcxl interfaces are completely independent
and everything should just work.
---------------------
302263:
cxgbe(4): Do not bring up an interface when IFCAP_TOE is enabled on it.
The interface's queues are functional after VI_INIT_DONE (which is short
of interface-up) and that's all that's needed for t4_tom to communicate
with the chip.
jhb [Mon, 31 Oct 2016 22:03:44 +0000 (22:03 +0000)]
MFC 289401: cxgbe(4): support for the kernel RSS option.
You need PCBGROUP and RSS in the kernel config to use this.
Note: Since RSS is not present in 10.x this is mostly a no-op and is
stubbed out by removing the #include of opt_rss.h. This is merged
primarily to reduce conflicts in future merges, however it does add a
couple of diagnostic messages related to RSS buckets vs RX queue
counts.
dim [Mon, 31 Oct 2016 18:37:44 +0000 (18:37 +0000)]
Pull in r228705 from upstream libc++ trunk (by Eric Fiselier):
[libcxx] Fix PR 22468 - std::function<void()> does not accept
non-void-returning functions
Summary:
The bug can be found here: https://llvm.org/bugs/show_bug.cgi?id=22468
`__invoke_void_return_wrapper` is needed to properly handle calling a
function that returns a value but where the std::function return type
is void. Without this '-Wsystem-headers' will cause
`function::operator()(...)` to not compile.
sbruno [Mon, 31 Oct 2016 16:48:16 +0000 (16:48 +0000)]
MFC r308038:
The buffer address is always overwritten in the extended descriptor format,
we have to refresh it ... always. This fixes problems reported in NetMap
with em(4) devices after conversion to extended descriptor format in
svn r293331.
mav [Mon, 31 Oct 2016 07:21:37 +0000 (07:21 +0000)]
MFC r307523: Make pass driver better support CAM_CDB_POINTER flag.
Previously the pass driver just ignored the flag, so arbitrary kernel code
accessed a user-space pointer, sometimes causing crashes even for correctly
written applications if the user-level context was switched or swapped out.
This patch copies the CDB into kernel space to avoid that.
ed [Sat, 29 Oct 2016 15:04:24 +0000 (15:04 +0000)]
Add posix_tnode to <search.h>.
In r307227 I've refactored the binary search tree functions to use the
posix_tnode type. As this change does not apply cleanly to this version
of FreeBSD, only make the change that matters: add the definition of the
newly introduced type.
This will ease source-level compatibility going forward.
Until we can resolve the numerous hole_birth bugs that have cropped up
recently, and come up with a way going forwards to protect users from
corruption, we should disable the hole_birth feature. Using a tunable
allows those who are confident that their data is correct to continue to
take advantage of the feature.
Closes #188
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
dsl_dataset_space is looking at the ds_bp's fill count while
dmu_objset_write_ready() is concurrently modifying it. This fix adds an
rrwlock to protect the ds_bp.
Closes #180
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
mav [Sat, 29 Oct 2016 08:48:01 +0000 (08:48 +0000)]
MFC r307507, r307509, r307515:
Consider device as clean even if SYNCHRONIZE CACHE failed.
If the device reservation was preempted by another initiator, our sync request
will always fail. Without this change CAM tried to sync the cache on every
following device close, including numerous GEOM tasting opens/closes,
causing lots of useless noise in logs.
mav [Sat, 29 Oct 2016 08:45:06 +0000 (08:45 +0000)]
MFC r307350: Add LUN options to limit UNMAP and WRITE SAME sizes.
CTL itself has no limits on UNMAP and WRITE SAME sizes. But depending
on the backend, large requests may take too much time. To avoid that, new
configuration options allow hinting to the initiator the maximal sizes it
should not exceed.
mav [Fri, 28 Oct 2016 18:25:32 +0000 (18:25 +0000)]
MFC r300881, r302058 (by asomers):
Avoid issuing spa config updates for physical path when not necessary
ZFS's configuration needs to be updated whenever the physical path for a
device changes, but not when a new device is introduced. This is because new
devices necessarily cause config updates, but only if they are actually
accepted into the pool.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When
setting the vdev's physical path, only request a config update if
the physical path has changed. Don't request it when opening a
device for the first time, because the config sync will happen
anyway upstack.
sys/geom/geom_dev.c
Split g_dev_set_physpath and g_dev_set_media out of
g_dev_attrchanged
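
The rule described above (request a config update only when the physical path actually changed, and not on the first open) can be sketched as a small predicate. The function name and the use of NULL to mean "no path recorded yet" are assumptions for illustration, not the vdev_geom.c code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true when a config update should be requested.
 * old_path == NULL models a device being opened for the first time.
 */
static bool
demo_physpath_needs_update(const char *old_path, const char *new_path)
{
	if (old_path == NULL)
		return (false);	/* first open: config sync happens anyway */
	return (strcmp(old_path, new_path) != 0);
}
```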
mav [Fri, 28 Oct 2016 18:24:05 +0000 (18:24 +0000)]
MFC r300059 (by asomers): Speed up vdev_geom_open_by_guids
Speedup is hard to measure because the only time vdev_geom_open_by_guids
gets called on many drives at the same time is during boot. But with
vdev_geom_open hacked to always call vdev_geom_open_by_guids, operations
like "zpool create" speed up by 65%.
* Read all of a vdev's labels in parallel instead of sequentially.
* In vdev_geom_read_config, don't read the entire label, including
the uberblock. That's a waste of RAM. Just read the vdev config
nvlist. Reduces the IO and RAM involved with tasting from 1MB to
448KB.
mav [Fri, 28 Oct 2016 18:22:00 +0000 (18:22 +0000)]
MFC r298814 (by asomers): Fix a use-after-free when "zpool import" fails
Clear vd->vdev_tsd in vdev_geom_close_locked instead of vdev_geom_detach.
In the latter function, it would fail to happen in certain circumstances
where cp->private was unset. Ideally, the latter should never happen, but
it can happen when vdev open fails, or where spares are involved.
mav [Fri, 28 Oct 2016 18:20:14 +0000 (18:20 +0000)]
MFC r298786 (by asomers):
Refactor vdev_geom_attach and friends to reduce code duplication
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Move checks for provider's sectorsize and mediasize into a single
location in vdev_geom_attach. Remove the zfs::vdev::taste class;
it's ok to use the regular vdev class for tasting. Consolidate guid
checks into a single location in vdev_attach_ok. Consolidate some
error handling code from vdev_geom_attach into vdev_geom_detach,
closing a resource leak of geom consumers in the process.
Using zvols as backing devices for ZFS pools is fraught with panics and
deadlocks. For example, attempting to online a missing device in the
presence of a zvol can cause a panic when vdev_geom tastes the zvol. Better
to completely disable vdev_geom from ever opening a zvol. The solution
relies on setting a thread-local variable during vdev_geom_open, and
returning EOPNOTSUPP during zvol_open if that thread-local variable is set.
Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open. Its intent
was to prevent a recursive mutex acquisition panic. However, the new check
for the thread-local variable also fixes that problem.
Also, fix a panic in vdev_geom_taste_orphan. For an unknown reason, this
function was set to panic. But it can occur that a device disappears during
tasting, and it causes no problems to ignore this departure.
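
The thread-local guard described above can be illustrated in user space. This is a sketch under stated assumptions: the flag and both functions are hypothetical stand-ins for vdev_geom_open and zvol_open, and C11 `_Thread_local` is used in place of whatever per-thread storage the kernel code relies on.

```c
#include <errno.h>

/* Hypothetical flag; set only while the vdev open path is running. */
static _Thread_local int demo_in_vdev_open;

/* Stand-in for zvol_open(): refuse when reached from a vdev open. */
static int
demo_zvol_open(void)
{
	if (demo_in_vdev_open)
		return (EOPNOTSUPP);
	return (0);
}

/* Stand-in for vdev_geom_open(): tasting a zvol here fails cleanly. */
static int
demo_vdev_open(void)
{
	int error;

	demo_in_vdev_open = 1;
	error = demo_zvol_open();
	demo_in_vdev_open = 0;
	return (error);
}
```

Because the flag is per-thread, an ordinary open of the zvol from another thread is unaffected while a vdev open is in progress.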
kib [Fri, 28 Oct 2016 12:58:40 +0000 (12:58 +0000)]
MFC r306807:
When making a pause after detecting hard kill of the single-user
shell, ensure that we do sleep for at least the specified time, in
presence of signals.
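
One common way to guarantee a minimum sleep despite signal delivery is to restart nanosleep() with the remaining time whenever it returns EINTR. The sketch below shows that pattern; it is not necessarily how the commit implements it, and the function name is hypothetical.

```c
#include <errno.h>
#include <time.h>

/*
 * Sleep for at least `seconds`, resuming after signal interruptions.
 * Returns 0 on success or -1 with errno set on a non-EINTR failure.
 */
static int
demo_sleep_full(time_t seconds)
{
	struct timespec req = { .tv_sec = seconds, .tv_nsec = 0 };
	struct timespec rem;

	while (nanosleep(&req, &rem) == -1) {
		if (errno != EINTR)
			return (-1);
		req = rem;	/* continue with the time still remaining */
	}
	return (0);
}
```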
jhb [Fri, 28 Oct 2016 03:54:19 +0000 (03:54 +0000)]
MFC 303002: Include process IDs in core dumps.
When threads were added to the kernel, the pr_pid member of the
NT_PRSTATUS note was repurposed to store LWP IDs instead of process
IDs. However, the process ID was no longer recorded in core dumps.
This change adds a pr_pid field to prpsinfo (NT_PRPSINFO). Rather than
bumping the prpsinfo version number, note parsers can use the note's
payload size to determine if pr_pid is present.
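
A note parser could apply the size check described above along these lines. The struct below is a deliberately simplified, hypothetical layout, not the real FreeBSD prpsinfo; only the idea of testing the payload size against the offset of the appended field is taken from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layout; NOT the real prpsinfo structure. */
struct demo_prpsinfo {
	int	pr_version;
	char	pr_fname[17];
	char	pr_psargs[81];
	int	pr_pid;		/* appended field; absent in older dumps */
};

/* True when a note payload of descsz bytes is large enough to hold pr_pid. */
static bool
demo_note_has_pid(size_t descsz)
{
	return (descsz >=
	    offsetof(struct demo_prpsinfo, pr_pid) + sizeof(int));
}
```

A payload that ends exactly where pr_pid would begin is treated as the older format, so the parser falls back to not reporting a process ID.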
davidcs [Wed, 26 Oct 2016 18:13:30 +0000 (18:13 +0000)]
MFC r307578
1. Use taskqueue_create() instead of taskqueue_create_fast() for both
fastpath and slowpath taskqueues.
2. Service all transmits in taskqueue threads.
3. additional stats counters for keeping track of
- bd availability
- tx buf ring not emptied in the fp task queue.
These are drained via timeout taskqueue.
- tx attempts during link down.
jch [Tue, 25 Oct 2016 12:58:36 +0000 (12:58 +0000)]
MFC r307551:
Fix a double-free when an inp transitions to INP_TIMEWAIT state
after having been dropped.
This change enforces in_pcbdrop() logic in tcp_input():
"in_pcbdrop() is used by TCP to mark an inpcb as unused and avoid future packet
delivery or event notification when a socket remains open but TCP has closed."
sephe [Wed, 19 Oct 2016 08:45:19 +0000 (08:45 +0000)]
MFC 307261
hyperv/stor: Fix off-by-one bug; this brings back TRIM support.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reported by: Lili Deng <v-lide microsoft com>
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8238
sevan [Sun, 16 Oct 2016 23:44:02 +0000 (23:44 +0000)]
MFC r306732:
Document the history of fdisk based on the original post to comp.unix.bsd by Julian Elischer [1] and the Mach 2.5 Installation notes [2].
I was unable to pinpoint the exact version of Mach in which the fdisk utility appeared, as I could not find documentation older than version 2.5 & no source code or repo history.
fdisk utility appears as a separate utility[3] in v2.5. Due to this, I have avoided stating the exact version fdisk first appeared in Mach.
Add authors section.
sevan [Sun, 16 Oct 2016 23:39:15 +0000 (23:39 +0000)]
MFC r306731:
Document the history of fdisk based on the original post to comp.unix.bsd by Julian Elischer [1] and the Mach 2.5 Installation notes [2].
I was unable to pinpoint the exact version of Mach in which the fdisk utility appeared, as I could not find documentation older than version 2.5 & no source code or repo history.
fdisk utility appears as a separate utility[3] in v2.5. Due to this, I have avoided stating the exact version fdisk first appeared in Mach.
Add authors section.
Make correction pointed out by igor
[1] https://groups.google.com/d/topic/comp.unix.bsd/Hhi45vAHxDg/discussion
[2] ftp://ftp.mcs.vuw.ac.nz/doc/misc/mach-i386-doc/i386_install.ps
[3] ftp://ftp.mcs.vuw.ac.nz/doc/misc/mach-i386-doc/i386_manpages.ps
PR: 212469
Approved by: bcr (mentor)
Differential Revision: https://reviews.freebsd.org/D8104
sevan [Sun, 16 Oct 2016 23:28:58 +0000 (23:28 +0000)]
MFC r306724:
Add history section for bsdlabel(8)
http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Tahoe/usr/man/cat8/disklabel.0
Remove tab after space, highlighted by igor
sevan [Sun, 16 Oct 2016 23:09:04 +0000 (23:09 +0000)]
MFC r306718:
Add history section for echo(1)
Sourced using the draft copy of the second edition manual
http://www.tuhs.org/Archive/PDP-11/Distributions/research/1972_stuff/unix_2nd_edition_manual.pdf
sevan [Sun, 16 Oct 2016 22:22:46 +0000 (22:22 +0000)]
MFC r306611:
Amend history to mention predecessor originated from 386BSD[1] & current implementation from NetBSD[2].
Reword history since the utility was renamed once more in FreeBSD 5.0.
Separate out author & historical information regarding character code conversion.
Add AUTHORS section.