hselasky [Mon, 7 Aug 2017 13:25:57 +0000 (13:25 +0000)]
MFC r321782:
Remove some dead statistics related code and a structure field from the
mlx4en driver which is used by its Linux counterpart, but not under
FreeBSD.
hselasky [Mon, 7 Aug 2017 12:58:31 +0000 (12:58 +0000)]
MFC r321986:
Change reject message type when destroying cm_id in ibore.
This patch fixes an interopability issue between FreeBSD and non-FreeBSD
systems when the connection establishment is aborted. Refer to the
initial commit in Linux, drivers/infiniband/core/cm.c,
for a more detailed description.
Obtained from: Linux
Sponsored by: Mellanox Technologies
hselasky [Mon, 7 Aug 2017 12:49:30 +0000 (12:49 +0000)]
MFC r312882, r321983 and r321984:
Use the busdma API to allocate all DMA-able memory.
The MLX5 driver has four different types of DMA allocations which are
now allocated using busdma:
1) The 4K firmware DMA-able blocks. One busdma object per 4K allocation.
2) Data for firmware commands use the 4K firmware blocks split into four 1K blocks.
3) The 4K firmware blocks are also used for doorbell pages.
4) The RQ-, SQ- and CQ- DMA rings. One busdma object per allocation.
After this patch the mlx5en driver can be used with DMAR enabled in
the FreeBSD kernel.
hselasky [Mon, 7 Aug 2017 12:45:26 +0000 (12:45 +0000)]
MFC r312881:
Add support for device surprise removal and other PCI errors.
- When device disappears from PCI indicate error device state and:
1) Trigger command completion for all pending commands
2) Prevent new commands from executing and return:
- success for modify and remove/cleanup commands
- failure for create/query commands
3) When reclaiming pages for a device in error state don't ask FW to
return all given pages, just release the allocated memory
hselasky [Mon, 7 Aug 2017 12:29:41 +0000 (12:29 +0000)]
MFC r312877 and r312878:
Minor code refactor as a preparation step for suprise removal of CX-4
PCI device(s), changes:
- alloc_entry() now clears bit for page slot entry aswell
- update of cmd->ent_arr[] is now under cmd->alloc_lock
- complete command if alloc_entry() fails
mav [Mon, 7 Aug 2017 07:40:00 +0000 (07:40 +0000)]
MFC r321794: Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately. It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.
The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes. There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.
sephe [Mon, 7 Aug 2017 02:15:13 +0000 (02:15 +0000)]
MFC 321762
hyperv: Add VF bringup scripts and devd rules.
How network VF works with hn(4) on Hyper-V in non-transparent mode:
- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
address.
- Once the network VF is up, e.g. ifconfig VF up:
o All of the transmission should go through the network VF.
o Most of the reception goes through the network VF.
o Small amount of reception may go through the cooresponding hn(4).
This reception will happen, even if the the cooresponding hn(4) is
down. The cooresponding hn(4) will change the reception interface
to the network VF, so that network layer and application layer will
be tricked into thinking that these packets were received by the
network VF.
o The cooresponding hn(4) pretends the physical link is down.
- Once the network VF is down or detached:
o All of the transmission should go through the cooresponding hn(4).
o All of the reception goes through the cooresponding hn(4).
o The cooresponding hn(4) fallbacks to the original physical link
detection logic.
All these features are mainly used to help live migration, during which
the network VF will be detached, while the network communication to the
VM must not be cut off. In order to reach this level of live migration
transparency, we use failover mode lagg(4) with the network VF and the
cooresponding hn(4) attached to it.
To ease user configuration for both network VF and non-network VF, the
lagg(4) will be created by the following rules, and the configuration
of the cooresponding hn(4) will be applied to the lagg(4) automatically.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11635
marius [Sun, 6 Aug 2017 16:12:56 +0000 (16:12 +0000)]
MFC: r321589
- Check the slot type capability, set SDHCI_SLOT_{EMBEDDED,NON_REMOVABLE}
for embedded slots. Fail in the sdhci(4) initialization for slot type
shared, which is completely unsupported by this driver at the moment. [1]
For Intel eMMC controllers, taking the embedded slot type into account
obsoltes setting SDHCI_QUIRK_ALL_SLOTS_NON_REMOVABLE so remove these quirk
entries.
- Hide the 1.8 V VDD capability when the slot is detected as non-embedded,
as the SDHCI specification explicitly states that 1.8 V VDD is applicable
to embedded slots only. [2]
- Define some easy bits of the SDHCI specification v4.20. [3]
- Don't leak bus_dma(9) resources in failure paths of sdhci_init_slot().
o Use SDHCI_CAN_DRIVE_TYPE_{A,C,D} to check for driver type support in
SDHCI_CAPABILITIES2 instead of SDHCI_CTRL2_DRIVER_TYPE_{A,C,D} which
are meant for setting the driver type in SDHCI_HOST_CONTROL2.
o Correct a typo in the comment part of r320577 (MFCed to stable/10 in
r320899).
o Add support for eMMC HS200 and HS400 bus speed modes at 200 MHz to
sdhci(4), mmc(4) and mmcsd(4).
On the system where the addition of DDR52 support increased the read
throughput to ~80 MB/s (from ~45 MB/s at high speed), HS200 yields
~154 MB/s and HS400 ~187 MB/s, i. e. performance now has more than
quadrupled compared to pre-r315598 (pre-r318495 in stable/10).
However, in fact this isn't a feature-only change; there are boards
based on Intel Bay Trail where DDR52 is problematic and the suggested
workaround is to use HS200 mode instead. So far exact details are
unknown, however, i. e. whether that's due to a defect in these SoCs
or on the boards.
Moreover, due to the above changes requiring to be aware of possible
MMC siblings in the fast path of mmc(4), corresponding information
now is cached in mmc_softc. As a side-effect, mmc_calculate_clock(),
now longer will trigger a panic in low memory situations and all of
mmc(4) operate on the same set of child devices.
o Fix a bug in the failure reporting of mmcsd_delete() that could lead
to a panic.
o Fix 2 bugs on resume, one in mmcsd(4) that could lead to a panic and
another one in mmc(4) that could lead to devices no longer working.
o Fix a memory leak in mmcsd_ioctl() in case copyin(9) fails. [1]
o Fix missing variable initialization in mmc_switch_status(). [2]
o Fix R1_SWITCH_ERROR detection in mmc_switch_status(). [3]
o Handle the case of device_add_child(9) failing, for example due to
a memory shortage, gracefully in mmc(4) and sdhci(4), including not
leaking memory for the instance variables in case of mmc(4), also
fixing [4].
o Correctly use the size of a pointer rather than that of a pointer to
a pointer (this bug was present in head r321385 only, i. e. not in a
stable branch). [5]
o Handle the case of an unknown SD CSD version in mmc_decode_csd_sd()
gracefully instead of calling panic(9).
o Again, check and handle the return values of some additional function
calls in mmc(4) instead of assuming that everything went right or mark
non-fatal errors by casting the return value to void.
o Correct a typo in the Linux IOCTL compatibility; it should have been
MMC_IOC_MULTI_CMD rather than MMC_IOC_CMD_MULTI.
o Now that we are reaching ever faster speeds (more improvement in this
regard is to be expected when adding ADMA support to sdhci(4)), apply
a few micro-optimizations to mmc(4), mmcsd(4) and sdhci(4).
o Correct confusing and error prone mix-ups between "br" or "bridge" in
mmc(4) and mmcsd(4) where - according to the terminology outlined in
comments of bridge.h and mmcbr_if.m around since their addition in
r163516 - the bus is meant and used instead.
o Remove comment lines from bridge.h incorrectly suggesting that there
would be a MMC bridge base class driver.
o Update comments in bridge.h regarding the star topology of SD and SDIO;
since version 3.00 of the SDHCI specification, for eSD and eSDIO bus
topologies are actually possible in form of so called "shared buses"
(in some subcontext later on renamed to "embedded" buses).
mav [Sun, 6 Aug 2017 08:15:21 +0000 (08:15 +0000)]
MFC r321720, r321856: Attach ichwd(4) only to ISA bus of the LPC bridge.
Resource allocation for parent device does not look good by itself, but
attempt to allocate them for unrelated device just does not end up good.
On Asus X99-E WS/USB3.1 system reporting ISA bridge via both PCI and ACPI
this reported to cause kernel panic on shutdown due to messed resources:
https://bugs.freenas.org/issues/25237.
ngie [Sat, 5 Aug 2017 16:55:07 +0000 (16:55 +0000)]
MFC r320702,r320703:
r320702:
Formalize LEAPSECONDS and OLDTIMEZONES in share/zoneinfo/... as
`MK_ZONEINFO_LEAPSECONDS_SUPPORT == yes` and
`MK_ZONEINFO_OLD_TIMEZONES_SUPPORT == yes`.
Keep `LEAPSECONDS` and `OLDTIMEZONES` for backwards compatibility,
but print out a warning notifying users that they should use the new
variables, in an effort to migrate them to the variables. This is being
done mostly for automated build tools, etc, that might rely on these
variables being set. The variables will be removed in the future on
^/head, e.g., after ^/stable/12 is cut.
Relnotes: yes
r320703:
Add tests to help verify Links functionality for .../contrib/tzdata/backwards
marius [Sat, 5 Aug 2017 14:49:40 +0000 (14:49 +0000)]
Fix a stable/10-specific mismerge in r322096; the MK_NCURSESW handling
should be within the MK_DIALOG block as libncurses{,w} isn't required
when building tzsetup(8) without dialog(3) support.
marius [Sat, 5 Aug 2017 12:54:07 +0000 (12:54 +0000)]
MFC: r274394, r274399, r307802
- Default `bsdconfig timezone' and `tzsetup' to `-s' in a VM.
- Hide dialog specific code behind HAVE_DIALOG. It allows to build a
stripped down version (missing the dialog UI) but perfectly function
tzsetup when world is built WITHOUT_DIALOG.
As in r315225, discard 3072 bytes of RC4 bytestream instead of 1024.
(This implementation of arc4rand(9) is used by the userland ipftest
utility as it approximates ipfilter kernelspace in userspace.)
PR: 217920
Submitted by: codarren@hackers.mu
Reviewed by: emaste, cem
Approved by: so (implicit, in r315225)
Differential Revision: D11747
Patterned after: r315225
hselasky [Thu, 3 Aug 2017 14:14:13 +0000 (14:14 +0000)]
MFC r312872:
Add support for reading advanced diagnostic counters.
By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.
To enable reading of diagnostic counters set one or more of the
following sysctls to one:
hselasky [Thu, 3 Aug 2017 14:09:36 +0000 (14:09 +0000)]
MFC r312865:
Enforce reading the consumer and producer counters once to ensure
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.
hselasky [Thu, 3 Aug 2017 14:05:03 +0000 (14:05 +0000)]
MFC r312536:
Allow transmit packet bufring in software to be disabled.
- Add new sysctl node to control the transmit packet bufring.
- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.
- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.
hselasky [Thu, 3 Aug 2017 13:57:17 +0000 (13:57 +0000)]
MFC r312527:
Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.
marius [Wed, 2 Aug 2017 20:27:30 +0000 (20:27 +0000)]
Apply the other half of merges to sdhci_imx(4) and sdhci_fsl(4) that
somehow stayed local when committing r318198, probably due to the
associated tree conflicts.
While at it, register the dependency of sdhci_fsl(4) on sdhci(4) and
allow the former to be built.
Fix probing FC targets with hard addressing turned on.
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.
The intent of that change was to keep loop IDs persistent across
chip reinits.
The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address. It also turned off soft addressing on 2400
(4Gb) and newer controllers.
The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on. So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.
The solution is to largely revert that change. I left some elements
in place that are related to virtual ports, since they were new.
This does work with IBM tape drives with hard and soft addressing
turned on. I have tested it with 4Gb, 8Gb, and 16Gb controllers.
sys/dev/isp.c:
Largely revert FreeBSD SVN change 289937. I left the
ispmbox.h changes in place.
Don't use the PREV_ADDRESS bit on initialization. It tells
the chip to not participate if it can't get the requested
loop ID.
Do use soft addressing on 2400 and newer chips.
Use hard addressing when the user has requested a specific
initiator ID. (hint.isp.X.iid=N in /boot/loader.conf)
Leave some of the virtual port options from that change in
place, but don't turn on the PREV_ADDRESS bit.
gavin [Wed, 2 Aug 2017 15:13:13 +0000 (15:13 +0000)]
Merge r316113,316184,316413 from head:
- Remove #define PCIS_SERIALBUS_SMBUS_PROGIF, unused since r200091
- Switch device_probe() from large case statement to a lookup table
- Add several missing SMBus controllers
trasz [Tue, 1 Aug 2017 15:51:16 +0000 (15:51 +0000)]
MFC r320359:
Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.
The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.
MFC r320409:
Revert part of r320359, as suggested by rmacklem@. That case is only used
for nfsuserd -manage-gids and shouldn't depend on sysctl.
MFC r321196:
Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.
marius [Mon, 31 Jul 2017 22:19:39 +0000 (22:19 +0000)]
Update stable/10 from 10.3-STABLE to 10.4-PRERELEASE as part of
the 10.4 release cycle, also belatedly marking the official start
of the code slush.
Set the default mdoc(7) version to 10.4, and update the clang(1)
TARGET_TRIPLE to reflect 10.4. While at it, add missing FreeBSD
major versions to mdoc(7).
This status will be reported if the backend NIC is wireless; it's not
useful. Due to the high frequency of the reporting, this could be
pretty annoying; ignore it.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11651
MFC Note: ${FILES} documentation change omitted since r299094 will
not be MFCed to ^/stable/10 .
MFC r320442:
share/examples/tests/{atf,plain}/Makefile: tweak example Makefile snippets
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
manipulated.
- Update documentation to note that ${FILES} is installed via bsd.progs.mk,
not bsd.prog.mk.
ethernet: Add ethernet interface attached event and devctl notification.
ifnet_arrival_event may not be adequate under certain situation; e.g.
when the LLADDR is needed. So the ethernet ifattach event is announced
after all necessary bits are setup.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11617
dim [Fri, 28 Jul 2017 18:35:29 +0000 (18:35 +0000)]
MFC r321305:
Fix printf format warning in zfs_module.c
Clang 5.0.0 got better warnings about print format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:
sys/boot/efi/boot1/zfs_module.c:186:18: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'off_t' (aka 'long long') [-Werror,-Wformat]
"(%lu)\n", st.st_size, spa->spa_name, filepath, EFI_ERROR_CODE(status));
^~~~~~~~~~
Fix this by casting off_t arguments to intmax_t, and using %jd instead.
share/examples/tests/Makefile: clean up example snippets/documentation
- TESTSDIR doesn't need to be specified after r289158.
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
manipulated.
- TESTS_SUBDIRS should be written out in an append format, one entry
per line, to provide a better, more conflict resistant example.
MFC: r321314
r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.
r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.
MFC: r321248
Update the nfsv4 man page to reflect recent changes to support
the newer RFCs (5661 and 7530). The main man changes are for the
case of "numbers in strings" for user/groups that RFC7530 allows
and avoids use of nfsuserd(8).
MFC 321233
Raise the watchdog timer interval to 2 ticks, there by guaranteeing
that it fires between 1ms and 2ms. `
Treat two consecutive occurrences of Heartbeat failures as a legitimate
Heartbeat failure
MFC r320077
Change blist_alloc()'s allocation policy from first-fit to next-fit so
that disk writes are more likely to be sequential. This change is
beneficial on both the solid state and mechanical disks that I've
tested. (A similar change in allocation policy was made by DragonFly
BSD in 2013 to speed up Poudriere with "stressful memory parameters".)
Increase the width of blst_meta_alloc()'s parameter "skip" and the local
variables whose values are derived from it to 64 bits. (This matches the
width of the field "skip" that is stored in the structure "blist" and
passed to blst_meta_alloc().)
Eliminate a pointless check for a NULL blist_t.
Simplify blst_meta_alloc()'s handling of the ALL-FREE case.
Address nearby style errors.
MFC r320417
Address the remaining integer overflow issues with the "skip" parameters
and "next_skip" variables. The "skip" value in struct blist has long been
a 64-bit quantity but various functions have implicitly truncated this
value to 32 bits. Now, all arithmetic involving the "skip" value is 64
bits wide. (This should allow us to relax the size limit on a swap device
in the swap pager.)
Maintain the ability to test this allocator as a user-space application by
including <stdbool.h>.
Remove an unused variable from blst_radix_print().
MFC r320527
Change blst_leaf_alloc() to handle a cursor argument, and to improve
performance.
To find in the leaf bitmap all ranges of sufficient length, use a doubling
strategy with shift-and-and until each bit still set represents a bit
sequence of length 'count', or until the bitmask is zero. In the latter
case, update the hint based on the first bit sequence length not found to
be available. For example, seeking an interval of length 12, the set bits
of the bitmap would represent intervals of length 1, then 2, then 3, then
6, then 12. If no bits are set at the point when each bit represents an
interval of length 6, then the hint can be updated to 5 and the search
terminated.
If long-enough intervals are found, discard those before the cursor. If
any remain, use binary search to find the position of the first of them,
and allocate that interval.
Fix spurious timeouts on commands sent to mps(4) and mpr(4) controllers.
mps_wait_command() and mpr_wait_command() were using getmicrotime() to
determine elapsed time when checking for a timeout in polled mode.
getmicrotime() isn't guaranteed to monotonically increase, and that
caused spurious timeouts occasionally.
Switch to using getmicrouptime(), which does increase monotonically.
This fixes the spurious timeouts in my test case.