hselasky [Tue, 15 Nov 2016 09:00:01 +0000 (09:00 +0000)]
MFC r308416:
Add timer to watch the RQ when we are out of mbufs.
The firmware/hardware does not generate additional completion
events unless we post new buffers. Use a timer to try to post
more buffers in case we are temporarily out of mbufs. Else
the receive schedule completely stops.
mav [Sat, 12 Nov 2016 23:58:07 +0000 (23:58 +0000)]
MFC r308173:
Fix ZIL records ordering when ZVOL opened both with and without FSYNC.
Before this an earlier writes to a ZVOL opened without FSYNC could get to
ZIL after later writes to the same ZVOL opened with FSYNC. Fix this by
replicating functionality of ZPL (zv_sync_cnt equivalent to z_sync_cnt),
marking all log records sync if anybody opened the ZVOL with FSYNC.
mav [Sat, 12 Nov 2016 23:40:40 +0000 (23:40 +0000)]
MFC r308055: Add vdev_reopening support to vdev_geom.
It allows to avoid extra GEOM providers flapping without significant need.
Since GEOM got resize support, we don't need to reopen provider to get new
size. If provider was orphaned and no longer valid, ZFS should already
know that, and in such case reopen should be done in full as expected.
mav [Sat, 12 Nov 2016 23:38:04 +0000 (23:38 +0000)]
MFC r308051: Matching GUIDs, handle possible race on vdev detach.
In case of vdev detach, causing top level mirror vdev destruction, leaf
vdev changes its GUID to one of the destroyed mirror, that creates race
condition when GUID in vdev label may not match one in the pool config.
This change replicates logic nuance of vdev_validate() by adding special
exception, matching the vdev GUID against the top level vdev GUID.
Since this exception is not completely reliable (may give false positives
if we fail to erase label on detached vdev), use it only as last resort.
Quick way to reproduce this scenario now is detach vdev from a pool with
enabled autoextend. During vdev detach autoextend logic tries to reopen
remaining vdev, that always fails now since in-memory configuration is
already updated, while on-disk labels are not yet.
mav [Sat, 12 Nov 2016 23:32:00 +0000 (23:32 +0000)]
MFC r307318: MFV r307314:
6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates
Using a benchmark which creates 2 million files in one TXG, I observe
that the thread running spa_sync() is on CPU almost the entire time we
are syncing, and therefore can be a performance bottleneck. About 50% of
the time in spa_sync() is in dmu_objset_do_userquota_updates().
The problem is that dmu_objset_do_userquota_updates() calls
zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was
modified (or created). In this benchmark, all the files are owned by the
same user/group, so all 2 million calls to zap_increment_int() are
modifying the same entry in the zap. The same issue exists for the
DMU_GROUPUSED_OBJECT.
We should keep an in-memory map from user to space delta while we are
syncing, and when we finish, iterate over the in-memory map and modify
the ZAP once per entry. This reduces the number of calls to
zap_increment_int() from "number of objects modified" to "number of
owners/groups of modified files".
This reduced the time spent in spa_sync() in the file create benchmark
by ~33%, from 11 seconds to 7 seconds.
Closes #107
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com>
Author: Matthew Ahrens <mahrens@delphix.com>
hselasky [Sat, 12 Nov 2016 17:30:55 +0000 (17:30 +0000)]
MFC r308437 and r308461:
Range check the jitter values to avoid bogus sample rate adjustments.
The expected deviation should not be more than 1Hz per second. The USB
v2.0 specification also mandates this requirement. Refer to chapter
5.12.4.2 about feedback.
Allow higher sample rates to have more jitter than lower ones.
rmacklem [Tue, 8 Nov 2016 21:47:00 +0000 (21:47 +0000)]
MFC: r307891
Fix the man page to reflect the change done by r307890 to mountd.c
so that the "-n" option uses the sysctl for the correct NFS server.
This is a content change.
rmacklem [Tue, 8 Nov 2016 21:39:15 +0000 (21:39 +0000)]
MFC: r307890
mountd(8) was erroneously setting the sysctl for the old NFS server
when the new/default NFS server was running, for the "-n" option.
This patch fixes the problem for stable/10 and stable/9.
Since the new NFS server uses vfs.nfsd.nfs_privport == 0 by default,
there wouldn't have been many users affected by the code not setting
it to 0 when the "-n" option was specified.
hselasky [Mon, 7 Nov 2016 09:19:04 +0000 (09:19 +0000)]
MFC r307518:
Fix device delete child function.
When detaching device trees parent devices must be detached prior to
detaching its children. This is because parent devices can have
pointers to the child devices in their softcs which are not
invalidated by device_delete_child(). This can cause use after free
issues and panic().
Device drivers implementing trees, must ensure its detach function
detaches or deletes all its children before returning.
While at it remove now redundant device_detach() calls before
device_delete_child() and device_delete_children(), mostly in
the USB controller drivers.
Tested by: Jan Henrik Sylvester <me@janh.de>
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D8070
avos [Sun, 6 Nov 2016 16:44:33 +0000 (16:44 +0000)]
MFC r288990:
Fix regression from r248371. We need to copy packet header to new
mbuf. Unlike in the pre-r248371 code, assert that M_PKTHDR is set
only on a first mbuf.
avos [Sun, 6 Nov 2016 13:50:54 +0000 (13:50 +0000)]
MFC r283636:
- Don't request BUS_DMA_ALLOCNOW for dma tags, that requires enormous
amount of memory.
- Don't request segsize of BUS_SPACE_MAXSIZE_32BIT, when maxsize is
MCLBYTES.
With this change bwi_attach() can succeed on i386.
Sources from the "current" build tree and generated sources in the
object tree should be used instead of sources and headers from the
already installed source tree on the build host.
This was noticed while addressing issues in the upcoming amd update.
r307801:
Align whitespace.
r307801 is related to r307800 however it was a separate commit to
HEAD in order to maintain a separation between the functional change
and a correction of style.
jhb [Fri, 4 Nov 2016 22:03:41 +0000 (22:03 +0000)]
MFC 302313:
cxgbe(4): Avoid a NULL dereference while dumping the L2 table. Entries
used by switching filters that rewrite L2 information do not have any
associated ifnet.
301516:
cxgbetool: Allow max-rate > 10Gbps for rate-limited traffic.
301520:
cxgbe(4): Create a reusable struct type for scheduling class parameters.
301531:
cxgbe(4): Break up set_sched_class. Validate the channel number and
min/max rates against their actual limits (which are chip and port
specific) instead of hardcoded constants.
301535:
cxgbe(4): Track the state of the hardware traffic schedulers in the
driver. This works as long as everyone uses set_sched_class_params
to program them.
301540:
cxgbe(4): Provide information about traffic classes in the sysctl mib.
301542:
cxgbe(4): A couple of fixes to set_sched_queue.
- Validate the scheduling class against the actual limit (which is chip
specific) instead of a magic number.
- Return an error if an attempt is made to manipulate the tx queues of a
VI that hasn't been initialized.
301628:
cxgbe(4): Add a sysctl to manage the binding of a txq to a traffic class.
jhb [Fri, 4 Nov 2016 21:48:22 +0000 (21:48 +0000)]
MFC 297875: cxgbe(4): Always read the entire mailbox into the reply buffer.
The size of the reply can be different from the size of the command in
case a debug firmware asserts. fw_asrt() needs the entire reply in
order to decode the location of the assert.
jhb [Fri, 4 Nov 2016 21:43:10 +0000 (21:43 +0000)]
MFC 297776,297777,297779: Add DDB commands to cxgbe(4).
297776:
Add a function to lookup a device_t object by name.
This just walks the global list of devices looking for one with the
requested name. The one use case outside of devctl2's implementation
is for DDB commands that wish to lookup devices by name.
297777:
Add a 'show t4 tcb <nexus> <tid>' command to dump a TCB from DDB.
This allows the contents of a TCB to be extracted from a T4/T5 card in
DDB after a panic.
297779:
Add a 'show t4 devlog <nexus>' DDB command.
This command displays the adapter's firmware device log similar to the
dev.<nexus>.misc.devlog sysctl.
jhb [Fri, 4 Nov 2016 21:02:33 +0000 (21:02 +0000)]
MFC 297194:
cxgbe(4): Be consistent and call ETHER_BPF_MTAP before writing anything
to the descriptor ring no matter what path the frame takes within the
driver's tx.
jhb [Fri, 4 Nov 2016 20:56:28 +0000 (20:56 +0000)]
MFC 296975: cxgbe(4): Tidy up PAUSE frame accounting.
Figure out if the chip is counting PAUSE frames in the "normal" stats
and take them out if it is. This fixes a bug in the tx stats because
the default hardware behavior is different for Tx and Rx but the driver
was treating both the same way. The result was that OPACKETS, OBYTES,
and OMCASTS were under-reported (if tx_pause > 0) before this change.
Note that the mac_stats sysctl still gives you the raw value of these
statistics straight from the device registers.
jhb [Fri, 4 Nov 2016 20:38:26 +0000 (20:38 +0000)]
MFC 296950,296951: Configuration updates.
296950:
cxgbe(4): Update some register settings in the default configuration
files to match the "uwire" configuration.
296951:
cxgbe(4): Enable additional capabilities in the default configuration
files. All features with FreeBSD drivers of some kind are now in the
default configuration.
296735:
Fix the following gcc warnings on sparc64, when TCP_OFFLOAD is not
defined:
sys/dev/cxgbe/t4_main.c:7474: warning: 'sysctl_tp_tick' defined but not used
sys/dev/cxgbe/t4_main.c:7505: warning: 'sysctl_tp_dack_timer' defined but not used
sys/dev/cxgbe/t4_main.c:7519: warning: 'sysctl_tp_timer' defined but not used
This just adds a bunch of #ifdef TCP_OFFLOAD in the right places.
296949:
cxgbe(4): Remove a couple of pointless assignments in sysctl_meminfo.
Do not display range if start = stop (this is a workaround for some
unused regions).
jhb [Fri, 4 Nov 2016 19:07:12 +0000 (19:07 +0000)]
MFC 296552,296596,296603,296624,296627: Fixes related to memory windows.
296552:
cxgbe(4): Rename regwin_lock to reg_lock. It is used to protect access
to indirect registers only.
296596:
cxgbe(4): Allow the addr/len pair that is being validated in
validate_mem_range to span multiple memory types. Update
validate_mt_off_len to use validate_mem_range.
296603:
cxgbe(4): Add general purpose routines that offer safe access to the
chip's memory windows. Convert existing users of these windows to the
new routines.
296624:
cxgbe(4): Fix bug in r296603. The memory window needs to be
repositioned if the start address isn't in the window already. One
of the bounds check used the end address instead.
296627:
cxgbe(4): Improvements to the code that deals with the firmware's log.
- Query the location of the log very early during attach. Refresh the
location later after establishing contact with the firmware.
- Save the log's location as a flat address in devlog_params.
- Use a memory window instead of backdoor access to the EDC/MC to read
the log.
jhb [Fri, 4 Nov 2016 18:45:06 +0000 (18:45 +0000)]
MFC 295778,296249,296333,296383,296471,296478,296481,296485,296488-296491,
296493-296496,296544,296710-296711,297863,299685: Catch up to changes to
the internal shared code.
Note that this merge includes two different firmware updates, but the
effective change is to update to the last version (1.15.37.0). As such,
I've trimmed the log message of the first update (1.15.28.0).
In addition, the M_WAIT macro added in t4_regs.h had to be renamed to
CXGBE_M_WAIT to avoid a collision on 10.x that is not present on 11.
295778:
cxgbe: catch up with the latest hardware-related definitions.
296249:
cxgbe(4): Update T5 and T4 firmwares to 1.15.28.0.
296333:
cxgbe(4): First of many changes to reduce diffs with internal shared
code:
- Rename some CamelCase variables.
- s/t4_link_start/t4_link_l1cfg/g
- Pull in t4_get_port_type_description.
- Move t4_wait_op_done to t4_hw.c.
- Flip the order of the RDMA stats.
- Remove unsused function t4_iq_start_stop.
- Move t4_wait_op_done and t4_wait_op_done_val to t4_hw.c
296383:
cxgbe(4): Very basic T6 awareness. This is part of ongoing work to
update to the latest internal shared code.
- Add a chip_params structure to keep track of hardware constants for
all generations of Terminators handled by cxgbe.
- Update t4_hw_pci_read_cfg4 to work with T6.
- Update the hardware debug sysctls (hidden within dev.<tNnex>.<n>.misc.*) to
work with T6. Most of the changes are in the decoders for the CIM
logic analyzer and the MPS TCAM.
- Acquire the regwin lock around indirect register accesses.
296471:
cxgbe(4): Updated register dumps.
- Get the list of registers to read during a regdump from the shared
code instead of the OS specific code. This follows a similar move
internally. The shared code includes the list for T6.
- Update cxgbetool to be able to decode T5 VF, T6, and T6 VF register
dumps (and catch up with some updates to T4 and T5 register decode).
296478:
cxgbe(4): Add a struct sge_params to store per-adapter SGE parameters.
Move the code that reads all the parameters to t4_init_sge_params in the
shared code. Use these per-adapter values instead of globals.
296481:
cxgbe(4): Overhaul the shared code that deals with the chip's TP block,
which is responsible for filtering and RSS.
Add the ability to use filters that match on PF/VF (aka "VNIC id") while
here. This is mutually exclusive with filtering on outer VLAN tag with
Q-in-Q.
296485:
cxgbe(4): Update the interrupt handlers for hardware errors.
296488:
cxgbe(4): Updates to mailbox routines in the shared code.
296489:
cxgbe(4): Updates to the shared routines that deal with the serial EEPROM,
flash, and VPD.
296490:
cxgbe(4): Remove __devinit and SPEED_<foo> as part of catch up with
internal shared code.
296491:
cxgbe(4): Updates to shared routines that get/set various parameters via
the firmware.
296493:
cxgbe(4): Use t4_link_down_rc_str in shared code to decode the reason
the link is down, instead of doing it in OS specific code.
296494:
cxgbe(4): Many new functions in the shared code, unused at this time.
296495:
cxgbe(4): Fix t4_tp_get_rdma_stats.
296496:
cxgbe(4): Minor updates to the shared routines that deal with firmware images.
296544:
cxgbe(4): Reshuffle and rototill t4_hw.c, solely to reduce diffs with
the internal shared code.
296710:
cxgbe(4): Catch up with the latest list of card capabilities as reported
by the firmware.
296711:
cxgbe(4): Fix typo in previous commit.
297863:
Rename the 'M_B' macro in t4_regs.h to 'CXGBE_M_B'.
This fixes a conflict with the M_B macro in powerpc's
<machine/db_machdep.h> exposed by the recent addition of DDB commands
to the cxgbe driver.
299685:
cxgbe(4): Update T5 and T4 firmwares to 1.15.37.0.
These firmwares were obtained from the "Chelsio T5/T4 Unified Wire
v2.12.0.3 for Linux" release. Changes since 1.14.4.0 (which is the
firmware in -STABLE branches) are in the "Release Notes" accompanying
the Unified Wire release and are copy-pasted here as well.
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
ETH
- Fixed a link flap issue on T580-CR.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
- Fixed an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host. (DCBX CEE)
- Fixed an issue where apply bit set for APP id was affecting the ETS and PFC
settings.(DCBX IEEE)
- Fixed an issue where app priority values are not handled correctly in fw.
(DCBX IEEE)
- Fixed an issue where enable/disable dcbx can cause crash. (DCBX CEE,DCBX IEEE)
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host.(DCBX CEE)
- Fixed an issue where enable/disable dcbx can cause crash in firmware.(DCBX CEE)
FOiSCSI
- Fixes an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
jhb [Fri, 4 Nov 2016 18:16:00 +0000 (18:16 +0000)]
MFC 287297,296236: Cleanups to cxgbetool.
287297:
- Replace N(a)/N(i)/N(T)/LEN(a)/ARRAY_SIZE(a) with nitems()
- Add missing <err.h> for err() and <sys/sysctl.h> for sysctlbyname()
- NULL -> 0 for 5th parameter of sysctlbyname()
Note, the original commit touched several files under tools/tools, but
this commit only includes changes to cxgbetool.
296236:
Fix some whitespace nits in cxgbetool.c. No functional change.
trasz [Fri, 4 Nov 2016 14:06:21 +0000 (14:06 +0000)]
MFC r297207:
Make the autofs(5) -hosts map more robust, primarily to make it correctly
handle NFS shares containing whitespace. This also adds the -E parameter
to showmount(8).
jhb [Fri, 4 Nov 2016 04:01:59 +0000 (04:01 +0000)]
MFC 301932: Use sbused() instead of sbspace() to avoid signed issues.
Inserting a full mbuf with an external cluster into the socket buffer
resulted in sbspace() returning -MLEN. However, since sb_hiwat is
unsigned, the -MLEN value was converted to unsigned in comparisons. As a
result, the socket buffer was never autosized. Note that sb_lowat is signed
to permit direct comparisons with sbspace(), but sb_hiwat is unsigned.
Follow suit with what tcp_output() does and compare the value of sbused()
with sb_hiwat instead.
Note: Since stable/10 does not include sbused(), this uses sb->sb_cc
instead.
jhb [Fri, 4 Nov 2016 03:49:53 +0000 (03:49 +0000)]
MFC 290175,290633,299206,300895,301898: Various TOE fixes.
290175:
cxgbe/tom: decide whether to shove segments or not only if there is
payload to transmit.
290633:
cxgbe/t4_tom: add a knob to the default configuration file to tune
the TOE for LAN operation. It is possible to set this to other values
(cluster for networks with little loss and really tight RTTs, and wan
for relatively large RTTs and/or lossy networks) depending on the
environment in which the TOE is being used.
None of this affects plain NIC operation in any way.
299206:
Set the correct vnet in TOE event handlers.
300895:
cxgbe/t4_tom: Exempt RDMA connections from a TCP sanity test for now, to
avoid panicking debug kernels.
t4_tom does not keep track of a connection once it switches to ULP mode
iWARP. If the connection falls out of ULP mode the driver/hardware seq#
etc. are out of sync. A better fix would be to figure out what the
current seq# are, update the driver's state, and perform all sanity
checks as usual.
301898:
cxgbe/t4_tom: Fix inverted assertion in r300895. It is RDMA
connections and not others that are allowed to fail the receive window
check.
jhb [Fri, 4 Nov 2016 03:25:34 +0000 (03:25 +0000)]
MFC 277763,280146,287631: Various fixes to DDP.
277763:
Lock the socket buffer before jumping to the 'out' label if sblock()
fails in t4_soreceive_ddp().
280146:
Move special DDP handling for closing a connection into a new
handle_ddp_close() function in t4_ddp.c as the logic is similar
to handle_ddp_data(). This allows all knowledge of the special
DDP mbufs to be private to t4_ddp.c as well.
287631:
Add a comment to clarify how to determine the amount of received DDP
data.
jch [Thu, 3 Nov 2016 19:58:12 +0000 (19:58 +0000)]
MFC r307966:
Remove an extraneous call to soisconnected() in syncache_socket(),
introduced with r261242. The useful and expected soisconnected()
call is done in tcp_do_segment().
Has been found as part of unrelated PR:212920 investigation.
Improve slightly (~2%) the maximum number of TCP accept per second.