sephe [Mon, 7 Aug 2017 02:49:26 +0000 (02:49 +0000)]
MFC 321762
hyperv: Add VF bringup scripts and devd rules.
How network VF works with hn(4) on Hyper-V in non-transparent mode:
- Each network VF has a corresponding hn(4).
- The network VF and its corresponding hn(4) have the same hardware
address.
- Once the network VF is up, e.g. ifconfig VF up:
o All of the transmission should go through the network VF.
o Most of the reception goes through the network VF.
o A small amount of reception may go through the corresponding hn(4).
This reception will happen even if the corresponding hn(4) is
down. The corresponding hn(4) will change the reception interface
to the network VF, so that the network layer and application layer will
be tricked into thinking that these packets were received by the
network VF.
o The corresponding hn(4) pretends the physical link is down.
- Once the network VF is down or detached:
o All of the transmission should go through the corresponding hn(4).
o All of the reception goes through the corresponding hn(4).
o The corresponding hn(4) falls back to the original physical link
detection logic.
All these features are mainly used to help live migration, during which
the network VF will be detached, while the network communication to the
VM must not be cut off. In order to reach this level of live migration
transparency, we use failover mode lagg(4) with the network VF and the
corresponding hn(4) attached to it.
To ease user configuration for both network VF and non-network VF, the
lagg(4) will be created by the following rules, and the configuration
of the corresponding hn(4) will be applied to the lagg(4) automatically.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11635
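The transmit and receive rules listed in this entry can be modeled in a
few lines. This is only an illustrative Python sketch, not the hn(4) or
lagg(4) code; the Port class, the function names and the interface names
"hn0" and "vf0" are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    name: str
    up: bool = False


def tx_port(vf: Optional[Port], hn: Port) -> Port:
    # VF detached (None) or down: all transmission goes through hn(4).
    if vf is None or not vf.up:
        return hn
    # VF up: all transmission goes through the VF.
    return vf


def rx_ifname(arrived_on: Port, vf: Optional[Port], hn: Port) -> str:
    # While the VF is up, the few packets that still arrive on hn(4)
    # are reported as if they had been received by the VF.
    if vf is not None and vf.up and arrived_on is hn:
        return vf.name
    return arrived_on.name
```

With hn = Port("hn0", up=True) and vf = Port("vf0", up=True),
tx_port(vf, hn) selects the VF; once vf.up is cleared or the VF is passed
as None, the same call selects hn0.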
marius [Sun, 6 Aug 2017 16:12:46 +0000 (16:12 +0000)]
MFC: r321589
- Check the slot type capability, set SDHCI_SLOT_{EMBEDDED,NON_REMOVABLE}
for embedded slots. Fail in the sdhci(4) initialization for slot type
shared, which is completely unsupported by this driver at the moment. [1]
For Intel eMMC controllers, taking the embedded slot type into account
obsoletes setting SDHCI_QUIRK_ALL_SLOTS_NON_REMOVABLE so remove these quirk
entries.
- Hide the 1.8 V VDD capability when the slot is detected as non-embedded,
as the SDHCI specification explicitly states that 1.8 V VDD is applicable
to embedded slots only. [2]
- Define some easy bits of the SDHCI specification v4.20. [3]
- Don't leak bus_dma(9) resources in failure paths of sdhci_init_slot().
marius [Sun, 6 Aug 2017 16:07:25 +0000 (16:07 +0000)]
MFC: r319350, r321385, r321490, r321588, r321948
o Use SDHCI_CAN_DRIVE_TYPE_{A,C,D} to check for driver type support in
SDHCI_CAPABILITIES2 instead of SDHCI_CTRL2_DRIVER_TYPE_{A,C,D} which
are meant for setting the driver type in SDHCI_HOST_CONTROL2.
o Add support for eMMC HS200 and HS400 bus speed modes at 200 MHz to
sdhci(4), mmc(4) and mmcsd(4).
On the system where the addition of DDR52 support increased the read
throughput to ~80 MB/s (from ~45 MB/s at high speed), HS200 yields
~154 MB/s and HS400 ~187 MB/s, i. e. performance now has more than
quadrupled compared to pre-r315598 (pre-r318494 in stable/11).
However, in fact this isn't a feature-only change; there are boards
based on Intel Bay Trail where DDR52 is problematic and the suggested
workaround is to use HS200 mode instead. So far exact details are
unknown, however, i. e. whether that's due to a defect in these SoCs
or on the boards.
Moreover, due to the above changes requiring to be aware of possible
MMC siblings in the fast path of mmc(4), corresponding information
now is cached in mmc_softc. As a side-effect, mmc_calculate_clock()
no longer will trigger a panic in low memory situations and all of
mmc(4) operates on the same set of child devices.
o Fix a bug in the failure reporting of mmcsd_delete() that could lead
to a panic.
o Fix 2 bugs on resume, one in mmcsd(4) that could lead to a panic and
another one in mmc(4) that could lead to devices no longer working.
o Fix a memory leak in mmcsd_ioctl() in case copyin(9) fails. [1]
o Fix missing variable initialization in mmc_switch_status(). [2]
o Fix R1_SWITCH_ERROR detection in mmc_switch_status(). [3]
o Handle the case of device_add_child(9) failing, for example due to
a memory shortage, gracefully in mmc(4) and sdhci(4), including not
leaking memory for the instance variables in case of mmc(4), also
fixing [4].
o Correctly use the size of a pointer rather than that of a pointer to
a pointer (this bug was present in head r321385 only, i. e. not in a
stable branch). [5]
o Handle the case of an unknown SD CSD version in mmc_decode_csd_sd()
gracefully instead of calling panic(9).
o Again, check and handle the return values of some additional function
calls in mmc(4) instead of assuming that everything went right or mark
non-fatal errors by casting the return value to void.
o Correct a typo in the Linux IOCTL compatibility; it should have been
MMC_IOC_MULTI_CMD rather than MMC_IOC_CMD_MULTI.
o Now that we are reaching ever faster speeds (more improvement in this
regard is to be expected when adding ADMA support to sdhci(4)), apply
a few micro-optimizations to mmc(4), mmcsd(4) and sdhci(4).
o Correct confusing and error prone mix-ups between "br" or "bridge" in
mmc(4) and mmcsd(4) where - according to the terminology outlined in
comments of bridge.h and mmcbr_if.m around since their addition in
r163516 - the bus is meant and used instead.
o Remove comment lines from bridge.h incorrectly suggesting that there
would be a MMC bridge base class driver.
o Update comments in bridge.h regarding the star topology of SD and SDIO;
since version 3.00 of the SDHCI specification, for eSD and eSDIO bus
topologies are actually possible in form of so called "shared buses"
(in some subcontext later on renamed to "embedded" buses).
mav [Sun, 6 Aug 2017 08:14:46 +0000 (08:14 +0000)]
MFC r321720, r321856: Attach ichwd(4) only to ISA bus of the LPC bridge.
Allocating resources on behalf of the parent device is questionable in
itself, but attempting to allocate them for an unrelated device ends badly.
On an Asus X99-E WS/USB3.1 system, which reports the ISA bridge via both
PCI and ACPI, this was reported to cause a kernel panic on shutdown due to
messed-up resources: https://bugs.freenas.org/issues/25237.
ngie [Sat, 5 Aug 2017 16:44:31 +0000 (16:44 +0000)]
MFC r320702,r320703:
r320702:
Formalize LEAPSECONDS and OLDTIMEZONES in share/zoneinfo/... as
`MK_ZONEINFO_LEAPSECONDS_SUPPORT == yes` and
`MK_ZONEINFO_OLD_TIMEZONES_SUPPORT == yes`.
Keep `LEAPSECONDS` and `OLDTIMEZONES` for backwards compatibility,
but print out a warning notifying users that they should use the new
variables, in an effort to migrate them to the variables. This is being
done mostly for automated build tools, etc, that might rely on these
variables being set. The variables will be removed in the future on
^/head, e.g., after ^/stable/12 is cut.
Relnotes: yes
r320703:
Add tests to help verify Links functionality for .../contrib/tzdata/backwards
trasz [Sat, 5 Aug 2017 09:40:56 +0000 (09:40 +0000)]
MFC r319798:
Switch the example name for variables controlling loading memory images
in /boot/defaults/loader.conf to something that's actually commonly used,
"mdroot". It's arbitrary, but it's easier to find this way.
trasz [Sat, 5 Aug 2017 09:32:03 +0000 (09:32 +0000)]
MFC r318182:
Improve build(7): add missing "buildkernel" and "installkernel"
to the example, change the architectures to something more common,
and improve description of defaults for TARGET.
As in r315225, discard 3072 bytes of RC4 bytestream instead of 1024.
(This implementation of arc4rand(9) is used by the userland ipftest
utility as it approximates ipfilter kernelspace in userspace.)
PR: 217920
Submitted by: codarren@hackers.mu
Reviewed by: emaste, cem
Approved by: so (implicit, in r315225)
Differential Revision: D11747
Patterned after: r315225
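Discarding the start of the RC4 byte stream can be illustrated as
follows. This is a minimal Python sketch of textbook RC4 with a "drop"
parameter, not the arc4rand(9) implementation; the function name and the
key used in the usage note are hypothetical.

```python
def rc4_keystream(key: bytes, nbytes: int, drop: int = 0) -> bytes:
    """Return nbytes of RC4 keystream after discarding the first drop bytes."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA).
    out = bytearray()
    i = j = 0
    for n in range(drop + nbytes):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        k = s[(s[i] + s[j]) & 0xFF]
        if n >= drop:          # the first `drop` bytes are thrown away
            out.append(k)
    return bytes(out)
```

Calling rc4_keystream(b"Key", 16, drop=3072) yields the same bytes as
taking 3072 + 16 bytes with no drop and keeping only the last 16; the
larger discard simply moves consumers further from the biased start of
the stream.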
hselasky [Thu, 3 Aug 2017 14:12:23 +0000 (14:12 +0000)]
MFC r312872:
Add support for reading advanced diagnostic counters.
By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.
To enable reading of diagnostic counters set one or more of the
following sysctls to one:
hselasky [Thu, 3 Aug 2017 14:08:37 +0000 (14:08 +0000)]
MFC r312865:
Enforce reading the consumer and producer counters once to ensure
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.
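The idea of reading each shared counter exactly once can be sketched as
below. This is a hedged Python model, not the mlx5en(4) code: the
SendQueue fields and the free-slot formula are assumptions made for
illustration, and in C the reads would additionally have to be kept from
being repeated by the compiler.

```python
from dataclasses import dataclass


@dataclass
class SendQueue:
    size_mask: int   # ring size minus one; ring size is a power of two
    cc: int = 0      # consumer counter, advanced by the completion path
    pc: int = 0      # producer counter, advanced by the transmit path


def sq_has_room_for(sq: SendQueue, n: int) -> bool:
    # Snapshot both counters once, so the two tests below are evaluated
    # against the same values even if another thread updates them.
    cc = sq.cc
    pc = sq.pc
    return ((cc - pc) & sq.size_mask) >= n or cc == pc
```

Without the local copies, the expression would read sq.cc and sq.pc more
than once and could mix values from before and after a concurrent
update.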
hselasky [Thu, 3 Aug 2017 14:03:48 +0000 (14:03 +0000)]
MFC r312536:
Allow transmit packet bufring in software to be disabled.
- Add new sysctl node to control the transmit packet bufring.
- Add optimised version of the transmit routine which outputs packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of task switches, which in
turn influences the overall system CPU usage, depending on the
workload.
- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about acquiring duplicate locks having the same name.
hselasky [Thu, 3 Aug 2017 13:55:39 +0000 (13:55 +0000)]
MFC r312527:
Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.
hselasky [Thu, 3 Aug 2017 13:45:26 +0000 (13:45 +0000)]
MFC r320773:
Implement fix for BULK IN-token retry mechanism. When the hardware is
programmed for infinite IN token retry after NAK, the SAF1761
hardware, however, does not retry the IN-token. This problem is
described in the SAF1761 errata, section 18.1.1.
While at it:
- Add some minor chip specific initialization for RTEMS.
- Add debug print for status registers in the interrupt filter.
Submitted by: Christian Mauderer <christian.mauderer@embedded-brains.de>
Fix probing FC targets with hard addressing turned on.
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.
The intent of that change was to keep loop IDs persistent across
chip reinits.
The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address. It also turned off soft addressing on 2400
(4Gb) and newer controllers.
The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on. So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.
The solution is to largely revert that change. I left some elements
in place that are related to virtual ports, since they were new.
This does work with IBM tape drives with hard and soft addressing
turned on. I have tested it with 4Gb, 8Gb, and 16Gb controllers.
sys/dev/isp.c:
Largely revert FreeBSD SVN change 289937. I left the
ispmbox.h changes in place.
Don't use the PREV_ADDRESS bit on initialization. It tells
the chip to not participate if it can't get the requested
loop ID.
Do use soft addressing on 2400 and newer chips.
Use hard addressing when the user has requested a specific
initiator ID. (hint.isp.X.iid=N in /boot/loader.conf)
Leave some of the virtual port options from that change in
place, but don't turn on the PREV_ADDRESS bit.
gavin [Wed, 2 Aug 2017 15:11:06 +0000 (15:11 +0000)]
Merge r316113,316184,316413 from head:
- Remove #define PCIS_SERIALBUS_SMBUS_PROGIF, unused since r200091
- Switch device_probe() from large case statement to a lookup table
- Add several missing SMBus controllers
mav [Wed, 2 Aug 2017 14:45:22 +0000 (14:45 +0000)]
MFC r320683: Add naive benchmark for SSDs in ZFS SLOG role.
ZFS SLOGs have a very specific access pattern with many cache flushes,
which none of the benchmarks I know of can simulate. Since SSD vendors
rarely specify cache flush time, this measurement can be useful to explain
why some ZFS pools are slower than expected. This test writes data chunks
of different sizes, each followed by a cache flush, similar to what ZFS
SLOG does, and measures the average time.
To illustrate, here is the result for a 6-year-old SATA Intel 710 Series SSD:
While the first one obviously has maximal throughput limitations, the
second one has such high cache flush latency (about 2 milliseconds) that
it is almost useless in the SLOG role, despite its good throughput
numbers. Power loss protection is out of scope of this test, but I
suspect it can be related.
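The measurement loop described in this entry, a write of a given chunk
size followed by a flush, timed and averaged, can be approximated in
user space. This is a rough Python sketch, not the committed benchmark:
it writes to a temporary regular file and uses fsync(2) as a stand-in
for the device cache flush, and the function names, chunk sizes and
iteration count are arbitrary assumptions.

```python
import os
import tempfile
import time


def avg_sync_write_time(path: str, chunk_size: int, count: int = 16) -> float:
    """Average seconds per write of chunk_size bytes followed by fsync()."""
    buf = b"\0" * chunk_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
            os.fsync(fd)   # stand-in for the cache flush after each chunk
        return (time.perf_counter() - start) / count
    finally:
        os.close(fd)


def run(sizes=(512, 4096, 65536)):
    """Measure every chunk size once and return {size: average seconds}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "slogbench.tmp")
        for size in sizes:
            results[size] = avg_sync_write_time(path, size)
    return results
```

Numbers obtained this way include filesystem overhead, so they are only
comparable with each other, not with raw-device figures.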
trasz [Tue, 1 Aug 2017 14:25:27 +0000 (14:25 +0000)]
MFC r320359:
Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, e.g. when rerooting.
The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.
MFC r320409:
Revert part of r320359, as suggested by rmacklem@. That case is only used
for nfsuserd -manage-gids and shouldn't depend on sysctl.
MFC r321196:
Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.
mav [Tue, 1 Aug 2017 13:03:06 +0000 (13:03 +0000)]
MFC r320604, r320865:
Switch fabric scans from GID_FT to GID_PT+GFF_ID/GFT_ID.
Instead of using GID_FT SNS request to get list of registered FCP ports,
use GID_PT to get list of all Nx_Ports, and then use GFF_ID and/or GFT_ID
requests to find whether they are FCP and target capable.
The problem with the old approach is that GID_FT does not report ports
without an FC-4 type registered. In particular, it was impossible to boot
an OS from a FreeBSD FC target using the QLogic FC BIOS, since it does not
register an FC-4 type even on new cards and so was ignored by the old code
as incompatible.
As a side bonus this allows initiator to skip pointless logins to other
initiators by fetching that information from SNS instead.
In case some switches do not implement GFF_ID/GFT_ID correctly, add sysctls
to disable that functionality. I handled broken GFF_ID of my Brocade 200E,
but there may be other switches with different bugs.
Linux also uses GID_PT, but GFF_ID is disabled by default there, and GFT_ID
is not supported.
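The scan order described in this entry can be sketched roughly as
follows. This is a Python model under stated assumptions, not the isp(4)
logic: the three callables stand in for the GID_PT, GFF_ID and GFT_ID
name server requests, passing None for a callable models the sysctl that
disables it, and the way the two answers are combined here is a guess.

```python
from typing import Callable, Iterable, List, Optional

Query = Callable[[int], Optional[bool]]   # True/False, or None for no answer


def scan_fabric(
    gid_pt: Callable[[], Iterable[int]],
    gff_id: Optional[Query] = None,
    gft_id: Optional[Query] = None,
) -> List[int]:
    """Return the port IDs still worth logging in to as FCP targets."""
    candidates = []
    for portid in gid_pt():              # every Nx_Port on the fabric
        verdict = None
        if gff_id is not None:
            verdict = gff_id(portid)
        if verdict is None and gft_id is not None:
            verdict = gft_id(portid)
        if verdict is not False:         # keep ports nobody ruled out
            candidates.append(portid)
    return candidates
```

A port is skipped only when a query positively reports that it is not an
FCP target; ports with no answer at all remain candidates, which mirrors
the goal of not missing ports that never registered an FC-4 type.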
This status will be reported if the backend NIC is wireless; it's not
useful. Due to the high frequency of the reporting, this could be
pretty annoying; ignore it.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11651
ethernet: Add ethernet interface attached event and devctl notification.
ifnet_arrival_event may not be adequate in certain situations, e.g.
when the LLADDR is needed. So the ethernet ifattach event is announced
after all necessary bits are set up.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11617
Only filter out the PF ioctls if we're building without pf support.
Until now those were always filtered out, so truss did not show symbolic
names for pf ioctls.
dim [Fri, 28 Jul 2017 19:10:34 +0000 (19:10 +0000)]
MFC r321342:
Pull in r295886 from upstream clang trunk (by Richard Smith):
PR32034: Evaluate _Atomic(T) in-place when T is a class or array type.
This is necessary in order for the evaluation of an _Atomic
initializer for those types to have an associated object, which an
initializer for class or array type needs.
This fixes an assertion when building recent versions of LinuxCNC.
dim [Fri, 28 Jul 2017 18:47:04 +0000 (18:47 +0000)]
MFC r321306:
Fix printf format warning in iflib.c
Clang 5.0.0 got better warnings about printf format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:
sys/net/iflib.c:1517:8: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
^~~~~~~~~~~~~~~~~~~~
sys/net/iflib.c:1517:41: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'bus_size_t' (aka 'unsigned long') [-Werror,-Wformat]
sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
^~~~~~~~~~~~~~~~~~~~~~~
Fix this by casting bus_size_t arguments to uintmax_t, and using %ju
instead.
dim [Fri, 28 Jul 2017 18:35:29 +0000 (18:35 +0000)]
MFC r321305:
Fix printf format warning in zfs_module.c
Clang 5.0.0 got better warnings about printf format strings using %zd,
and this leads to the following -Werror warning on e.g. arm:
sys/boot/efi/boot1/zfs_module.c:186:18: error: format specifies type 'ssize_t' (aka 'int') but the argument has type 'off_t' (aka 'long long') [-Werror,-Wformat]
"(%lu)\n", st.st_size, spa->spa_name, filepath, EFI_ERROR_CODE(status));
^~~~~~~~~~
Fix this by casting off_t arguments to intmax_t, and using %jd instead.
MFC r314319 (by oshogbo):
Don't try to open devices in the gettc() function, which will always
fail in Capability mode. Instead silently fall back to the syscall
method, which is done for example in the gettimeofday(2) function.
MFC r314320 (by oshogbo):
Remove unneeded variable initialization from r314319.
share/examples/tests/Makefile: clean up example snippets/documentation
- TESTSDIR doesn't need to be specified after r289158.
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
manipulated.
- TESTS_SUBDIRS should be written out in an append format, one entry
per line, to provide a better, more conflict resistant example.
r320442:
share/examples/tests/{atf,plain}/Makefile: tweak example Makefile snippets
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
manipulated.
- Update documentation to note that ${FILES} is installed via bsd.progs.mk,
not bsd.prog.mk.
MFC: r321314
r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.
r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.
MFC r320152 (by avg): fstyp: move sys/ include path after zfs include paths
The reason is that FreeBSD refcount.h shadows ZFS refcount.h and that
will lead to a build error after a planned import of the ARC buf data
scatter-ization.
It's possible that some day we will have an opposite problem where
a ZFS header would shadow an essential FreeBSD header.
So, we need to think about a better long term solution.
The upstream change was made before we started to import upstream commits
individually. It was imported into the illumos vendor area as r242733.
That commit was MFV-ed in r260138, but as the commit message says
vdev_file.c was left intact.
This commit actually implements the parallel I/O for vdev_file using a
taskqueue with multiple threads. This implementation does not depend on
the illumos or FreeBSD bio interface at all, but uses zio_t to pass
around all the relevant data. So, the code looks a bit different from
the upstream.
This commit also incorporates ZoL commit
zfsonlinux/zfs/bc25c9325b0e5ced897b9820dad239539d561ec9 that fixed
https://github.com/zfsonlinux/zfs/issues/2270
We need to use a dedicated taskqueue for exactly the same reason as ZoL
as we do not implement TASKQ_DYNAMIC.
MFC r320239: MFV r319950:
5220 L2ARC does not support devices that do not provide 512B access
FreeBSD note: the actual change has been in FreeBSD since r297848. This
commit accounts for integration of that change with subsequent changes,
especially r320156 (MFV of r318946) and r314274.
https://www.illumos.org/issues/5220
There are disk devices that have logical sector size larger than 512B, for
example 4KB. That is, their physical sector size is larger than 512B and they
do not provide emulation for 512B sector sizes. For such devices both a data
offset and a data size must be properly aligned. L2ARC should arrange that
because it uses physical I/O.
zio_vdev_io_start() performs a necessary transformation if io_size is not
aligned to vdev_ashift, but that is done only for logical I/O. Something
similar should be done in L2ARC code.
* a temporary write buffer should be allocated if the original buffer is
not going to be compressed and its size is not aligned
* size of a temporary compression buffer should be ashift aligned
* for the reads, if a size of a target buffer is not sufficiently large and
it is not aligned then a temporary read buffer should be allocated
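The alignment rules in the list above reduce to rounding sizes up to a
multiple of 1 << ashift. This is a small Python sketch of that
arithmetic, not the L2ARC code; the helper names are hypothetical and
only the write-buffer rule from the first bullet is modeled.

```python
def is_aligned(value: int, ashift: int) -> bool:
    # True when value is a multiple of the device sector size (1 << ashift).
    return (value & ((1 << ashift) - 1)) == 0


def roundup_ashift(size: int, ashift: int) -> int:
    # Round size up to the next multiple of 1 << ashift.
    align = 1 << ashift
    return (size + align - 1) & ~(align - 1)


def needs_temp_write_buffer(size: int, ashift: int, will_compress: bool) -> bool:
    # First bullet: an uncompressed buffer whose size is not aligned has
    # to be copied into an aligned temporary buffer before the write.
    return (not will_compress) and not is_aligned(size, ashift)
```

For a device with 4KB sectors (ashift 12), roundup_ashift(512, 12) gives
4096, so a 512-byte uncompressed buffer would need a 4096-byte temporary
buffer.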
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/8056
The send size estimate for a zvol can be too low, if the size of the record
headers (dmu_replay_record_t's) is a significant portion of the size.
This is typically the case when the data is highly compressible, especially
with embedded blocks.
The problem is that dmu_adjust_send_estimate_for_indirects() assumes that
blocks are the size of the "recordsize" property (128KB).
However, for zvols, the blocks are the size of the "volblocksize" property
(8KB). Therefore, we estimate that there will be 16x fewer record headers than
there really will be.
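The factor of 16 follows directly from the two block sizes. The Python
below is only a worked calculation; the function name and the 1 GiB
volume size are made up for the example.

```python
KIB = 1024


def record_header_count(data_bytes: int, block_size: int) -> int:
    # One record header per block, rounding up for a partial last block.
    return -(-data_bytes // block_size)


zvol_bytes = 1024 * 1024 * KIB                            # hypothetical 1 GiB of data
assumed = record_header_count(zvol_bytes, 128 * KIB)      # "recordsize" guess
actual = record_header_count(zvol_bytes, 8 * KIB)         # "volblocksize"
ratio = actual // assumed
```

Here assumed is 8192 headers and actual is 131072 headers, so the
estimate is low by a factor of 16.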
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>