hselasky [Tue, 15 Aug 2017 13:37:04 +0000 (13:37 +0000)]
MFC r322250:
Count drop events due to lack of PCI bandwidth as queue drops and not as
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.
hselasky [Tue, 15 Aug 2017 09:21:46 +0000 (09:21 +0000)]
MFC r322248:
Fix for mlx4en(4) to properly call m_defrag().
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectivly disabled the mbuf defrag mechanism and such packets
simply got dropped.
This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.
Add a per-ring rounter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.
The counters are per-ring to avoid excessive cache misses in the
TX path.
hselasky [Mon, 14 Aug 2017 12:59:14 +0000 (12:59 +0000)]
MFC r314878:
Add support for constant pointer constructs to READ_ONCE() in the
LinuxKPI. When the type of the argument is constant the temporary
variable cannot be assigned after the barrier. Instead assign the
temporary variable by initialization.
Approved by: re (kib)
Sponsored by: Mellanox Technologies
tuexen [Sun, 13 Aug 2017 07:40:05 +0000 (07:40 +0000)]
MFC r317244:
Represent "a syncache overflow hasn't happend yet" by using
-(SYNCOOKIE_LIFETIME + 1) instead of INT64_MIN, since it is
good enough and works when time_t is int32 or int64.
marius [Fri, 11 Aug 2017 00:41:43 +0000 (00:41 +0000)]
MFC: r322209
- If available, use TRIM instead of ERASE for implementing BIO_DELETE.
This also involves adding a quirk table as TRIM is broken for some
Kingston eMMC devices, though. Compared to ERASE (declared "legacy"
in the eMMC specification v5.1), TRIM has the advantage of operating
on write sectors rather than on erase sectors, which typically are
of a much larger size. Thus, employing TRIM, we don't need to fiddle
with coalescing BIO_DELETE requests that are also of (write) sector
units into erase sectors, which might not even add up in all cases.
- For some SanDisk iNAND devices, the CMD38 argument, e. g. ERASE,
TRIM etc., has to be specified via EXT_CSD[113], which now is also
handled via a quirk.
- My initial understanding was that for eMMC partitions, the granularity
should be used as erase sector size, e. g. 128 KB for boot partitions.
However, rereading the relevant parts of the eMMC specification v5.1,
this isn't actually correct. So drop the code which used partition
granularities for delmaxsize and stripesize. For the most part, this
change is a NOP, though, because a) for ERASE, mmcsd_delete() used
the erase sector size unconditionally for all partitions anyway and
b) g_disk_limit() doesn't actually take the stripesize into account.
- Take some more advantage of mmcsd_errmsg() in mmcsd(4) for making
error codes human readable.
marius [Thu, 10 Aug 2017 21:39:22 +0000 (21:39 +0000)]
MFC: r322097, r322203
- Since r301131 (MFCed to stable/10 in r321895), /etc/localtime is also
installed when selecting UTC in interactive configurations. Thus, the
code added in r220172 which treats a NULL zone file name as UTC and
removes /etc/localtime in that case can go again.
- Consistently refer to "could not delete" (as chosen by the oldest such
code in here) when unlink(2) fails instead of a to mixture of "delete"
and "unlink" in error messages.
Add IBM TS1155 density codes to libmt and the mt(1) man page.
These are taken directly from the density report from a TS1155
tape drive. (Using mt getdensity)
lib/libmt/mtlib.c:
Add 3592B5 encrypted/unencrypted density codes, and bpmm/bpi
values. The bpmm/bpi values are the same as TS1150, but
there are 50% more tracks.
usr.bin/mt/mt.1:
Add 3592B5 encrypted/unencrypted density codes, bpmm/bpi
values and number of tracks. Bump the man page date.
Sponsored by: Spectra Logic
------------------------------------------------------------------------
r322016 | ken | 2017-08-03 09:04:54 -0600 (Thu, 03 Aug 2017) | 6 lines
delphij [Thu, 10 Aug 2017 06:36:37 +0000 (06:36 +0000)]
Apply upstream fix:
Skip passwords longer than 1k in length so clients can't
easily DoS sshd by sending very long passwords, causing it to spend CPU
hashing them. feedback djm@, ok markus@.
Brought to our attention by tomas.kuthan at oracle.com, shilei-c at
360.cn and coredump at autistici.org
tuexen [Wed, 9 Aug 2017 13:26:12 +0000 (13:26 +0000)]
MFC r317208:
Syncoockies can be used in combination with the syncache. If the cache
overflows, syncookies are used.
This patch restricts the usage of syncookies in this case: accept
syncookies only if there was an overflow of the syncache recently.
This mitigates a problem reported in PR217637, where is syncookie was
accepted without any recent drops.
Thanks to glebius@ for suggesting an improvement.
jkim [Mon, 7 Aug 2017 22:30:18 +0000 (22:30 +0000)]
MFC: r322076
Detect hypervisor early so that we set lower hz on it.
> Description of fields to fill in above: 76 columns --|
> PR: If and which Problem Report is related.
> Submitted by: If someone else sent in the change.
> Reported by: If someone else reported the issue.
> Reviewed by: If someone else reviewed your modification.
> Approved by: If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after: N [day[s]|week[s]|month[s]]. Request a reminder email.
> MFH: Ports tree branch name. Request approval for merge.
> Relnotes: Set to 'yes' for mention in release notes.
> Security: Vulnerability reference (one per line) or description.
> Sponsored by: If the change was sponsored by an organization.
> Differential Revision: https://reviews.freebsd.org/D### (*full* phabric URL needed).
> Empty fields above will be automatically removed.
_M .
M sys/amd64/amd64/machdep.c
M sys/amd64/include/md_var.h
M sys/i386/i386/machdep.c
M sys/i386/include/md_var.h
M sys/x86/x86/identcpu.c
hselasky [Mon, 7 Aug 2017 13:25:57 +0000 (13:25 +0000)]
MFC r321782:
Remove some dead statistics related code and a structure field from the
mlx4en driver which is used by its Linux counterpart, but not under
FreeBSD.
hselasky [Mon, 7 Aug 2017 12:58:31 +0000 (12:58 +0000)]
MFC r321986:
Change reject message type when destroying cm_id in ibore.
This patch fixes an interopability issue between FreeBSD and non-FreeBSD
systems when the connection establishment is aborted. Refer to the
initial commit in Linux, drivers/infiniband/core/cm.c,
for a more detailed description.
Obtained from: Linux
Sponsored by: Mellanox Technologies
hselasky [Mon, 7 Aug 2017 12:49:30 +0000 (12:49 +0000)]
MFC r312882, r321983 and r321984:
Use the busdma API to allocate all DMA-able memory.
The MLX5 driver has four different types of DMA allocations which are
now allocated using busdma:
1) The 4K firmware DMA-able blocks. One busdma object per 4K allocation.
2) Data for firmware commands use the 4K firmware blocks split into four 1K blocks.
3) The 4K firmware blocks are also used for doorbell pages.
4) The RQ-, SQ- and CQ- DMA rings. One busdma object per allocation.
After this patch the mlx5en driver can be used with DMAR enabled in
the FreeBSD kernel.
hselasky [Mon, 7 Aug 2017 12:45:26 +0000 (12:45 +0000)]
MFC r312881:
Add support for device surprise removal and other PCI errors.
- When device disappears from PCI indicate error device state and:
1) Trigger command completion for all pending commands
2) Prevent new commands from executing and return:
- success for modify and remove/cleanup commands
- failure for create/query commands
3) When reclaiming pages for a device in error state don't ask FW to
return all given pages, just release the allocated memory
hselasky [Mon, 7 Aug 2017 12:29:41 +0000 (12:29 +0000)]
MFC r312877 and r312878:
Minor code refactor as a preparation step for suprise removal of CX-4
PCI device(s), changes:
- alloc_entry() now clears bit for page slot entry aswell
- update of cmd->ent_arr[] is now under cmd->alloc_lock
- complete command if alloc_entry() fails
mav [Mon, 7 Aug 2017 07:40:00 +0000 (07:40 +0000)]
MFC r321794: Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately. It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.
The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes. There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.
sephe [Mon, 7 Aug 2017 02:15:13 +0000 (02:15 +0000)]
MFC 321762
hyperv: Add VF bringup scripts and devd rules.
How network VF works with hn(4) on Hyper-V in non-transparent mode:
- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
address.
- Once the network VF is up, e.g. ifconfig VF up:
o All of the transmission should go through the network VF.
o Most of the reception goes through the network VF.
o Small amount of reception may go through the cooresponding hn(4).
This reception will happen, even if the the cooresponding hn(4) is
down. The cooresponding hn(4) will change the reception interface
to the network VF, so that network layer and application layer will
be tricked into thinking that these packets were received by the
network VF.
o The cooresponding hn(4) pretends the physical link is down.
- Once the network VF is down or detached:
o All of the transmission should go through the cooresponding hn(4).
o All of the reception goes through the cooresponding hn(4).
o The cooresponding hn(4) fallbacks to the original physical link
detection logic.
All these features are mainly used to help live migration, during which
the network VF will be detached, while the network communication to the
VM must not be cut off. In order to reach this level of live migration
transparency, we use failover mode lagg(4) with the network VF and the
cooresponding hn(4) attached to it.
To ease user configuration for both network VF and non-network VF, the
lagg(4) will be created by the following rules, and the configuration
of the cooresponding hn(4) will be applied to the lagg(4) automatically.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11635
marius [Sun, 6 Aug 2017 16:12:56 +0000 (16:12 +0000)]
MFC: r321589
- Check the slot type capability, set SDHCI_SLOT_{EMBEDDED,NON_REMOVABLE}
for embedded slots. Fail in the sdhci(4) initialization for slot type
shared, which is completely unsupported by this driver at the moment. [1]
For Intel eMMC controllers, taking the embedded slot type into account
obsoltes setting SDHCI_QUIRK_ALL_SLOTS_NON_REMOVABLE so remove these quirk
entries.
- Hide the 1.8 V VDD capability when the slot is detected as non-embedded,
as the SDHCI specification explicitly states that 1.8 V VDD is applicable
to embedded slots only. [2]
- Define some easy bits of the SDHCI specification v4.20. [3]
- Don't leak bus_dma(9) resources in failure paths of sdhci_init_slot().
o Use SDHCI_CAN_DRIVE_TYPE_{A,C,D} to check for driver type support in
SDHCI_CAPABILITIES2 instead of SDHCI_CTRL2_DRIVER_TYPE_{A,C,D} which
are meant for setting the driver type in SDHCI_HOST_CONTROL2.
o Correct a typo in the comment part of r320577 (MFCed to stable/10 in
r320899).
o Add support for eMMC HS200 and HS400 bus speed modes at 200 MHz to
sdhci(4), mmc(4) and mmcsd(4).
On the system where the addition of DDR52 support increased the read
throughput to ~80 MB/s (from ~45 MB/s at high speed), HS200 yields
~154 MB/s and HS400 ~187 MB/s, i. e. performance now has more than
quadrupled compared to pre-r315598 (pre-r318495 in stable/10).
However, in fact this isn't a feature-only change; there are boards
based on Intel Bay Trail where DDR52 is problematic and the suggested
workaround is to use HS200 mode instead. So far exact details are
unknown, however, i. e. whether that's due to a defect in these SoCs
or on the boards.
Moreover, due to the above changes requiring to be aware of possible
MMC siblings in the fast path of mmc(4), corresponding information
now is cached in mmc_softc. As a side-effect, mmc_calculate_clock(),
now longer will trigger a panic in low memory situations and all of
mmc(4) operate on the same set of child devices.
o Fix a bug in the failure reporting of mmcsd_delete() that could lead
to a panic.
o Fix 2 bugs on resume, one in mmcsd(4) that could lead to a panic and
another one in mmc(4) that could lead to devices no longer working.
o Fix a memory leak in mmcsd_ioctl() in case copyin(9) fails. [1]
o Fix missing variable initialization in mmc_switch_status(). [2]
o Fix R1_SWITCH_ERROR detection in mmc_switch_status(). [3]
o Handle the case of device_add_child(9) failing, for example due to
a memory shortage, gracefully in mmc(4) and sdhci(4), including not
leaking memory for the instance variables in case of mmc(4), also
fixing [4].
o Correctly use the size of a pointer rather than that of a pointer to
a pointer (this bug was present in head r321385 only, i. e. not in a
stable branch). [5]
o Handle the case of an unknown SD CSD version in mmc_decode_csd_sd()
gracefully instead of calling panic(9).
o Again, check and handle the return values of some additional function
calls in mmc(4) instead of assuming that everything went right or mark
non-fatal errors by casting the return value to void.
o Correct a typo in the Linux IOCTL compatibility; it should have been
MMC_IOC_MULTI_CMD rather than MMC_IOC_CMD_MULTI.
o Now that we are reaching ever faster speeds (more improvement in this
regard is to be expected when adding ADMA support to sdhci(4)), apply
a few micro-optimizations to mmc(4), mmcsd(4) and sdhci(4).
o Correct confusing and error prone mix-ups between "br" or "bridge" in
mmc(4) and mmcsd(4) where - according to the terminology outlined in
comments of bridge.h and mmcbr_if.m around since their addition in
r163516 - the bus is meant and used instead.
o Remove comment lines from bridge.h incorrectly suggesting that there
would be a MMC bridge base class driver.
o Update comments in bridge.h regarding the star topology of SD and SDIO;
since version 3.00 of the SDHCI specification, for eSD and eSDIO bus
topologies are actually possible in form of so called "shared buses"
(in some subcontext later on renamed to "embedded" buses).
mav [Sun, 6 Aug 2017 08:15:21 +0000 (08:15 +0000)]
MFC r321720, r321856: Attach ichwd(4) only to ISA bus of the LPC bridge.
Resource allocation for parent device does not look good by itself, but
attempt to allocate them for unrelated device just does not end up good.
On Asus X99-E WS/USB3.1 system reporting ISA bridge via both PCI and ACPI
this reported to cause kernel panic on shutdown due to messed resources:
https://bugs.freenas.org/issues/25237.
ngie [Sat, 5 Aug 2017 16:55:07 +0000 (16:55 +0000)]
MFC r320702,r320703:
r320702:
Formalize LEAPSECONDS and OLDTIMEZONES in share/zoneinfo/... as
`MK_ZONEINFO_LEAPSECONDS_SUPPORT == yes` and
`MK_ZONEINFO_OLD_TIMEZONES_SUPPORT == yes`.
Keep `LEAPSECONDS` and `OLDTIMEZONES` for backwards compatibility,
but print out a warning notifying users that they should use the new
variables, in an effort to migrate them to the variables. This is being
done mostly for automated build tools, etc, that might rely on these
variables being set. The variables will be removed in the future on
^/head, e.g., after ^/stable/12 is cut.
Relnotes: yes
r320703:
Add tests to help verify Links functionality for .../contrib/tzdata/backwards
marius [Sat, 5 Aug 2017 14:49:40 +0000 (14:49 +0000)]
Fix a stable/10-specific mismerge in r322096; the MK_NCURSESW handling
should be within the MK_DIALOG block as libncurses{,w} isn't required
when building tzsetup(8) without dialog(3) support.
marius [Sat, 5 Aug 2017 12:54:07 +0000 (12:54 +0000)]
MFC: r274394, r274399, r307802
- Default `bsdconfig timezone' and `tzsetup' to `-s' in a VM.
- Hide dialog specific code behind HAVE_DIALOG. It allows to build a
stripped down version (missing the dialog UI) but perfectly function
tzsetup when world is built WITHOUT_DIALOG.
As in r315225, discard 3072 bytes of RC4 bytestream instead of 1024.
(This implementation of arc4rand(9) is used by the userland ipftest
utility as it approximates ipfilter kernelspace in userspace.)
PR: 217920
Submitted by: codarren@hackers.mu
Reviewed by: emaste, cem
Approved by: so (implicit, in r315225)
Differential Revision: D11747
Patterned after: r315225
hselasky [Thu, 3 Aug 2017 14:14:13 +0000 (14:14 +0000)]
MFC r312872:
Add support for reading advanced diagnostic counters.
By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.
To enable reading of diagnostic counters set one or more of the
following sysctls to one:
hselasky [Thu, 3 Aug 2017 14:09:36 +0000 (14:09 +0000)]
MFC r312865:
Enforce reading the consumer and producer counters once to ensure
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.
hselasky [Thu, 3 Aug 2017 14:05:03 +0000 (14:05 +0000)]
MFC r312536:
Allow transmit packet bufring in software to be disabled.
- Add new sysctl node to control the transmit packet bufring.
- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.
- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.
hselasky [Thu, 3 Aug 2017 13:57:17 +0000 (13:57 +0000)]
MFC r312527:
Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.
marius [Wed, 2 Aug 2017 20:27:30 +0000 (20:27 +0000)]
Apply the other half of merges to sdhci_imx(4) and sdhci_fsl(4) that
somehow stayed local when committing r318198, probably due to the
associated tree conflicts.
While at it, register the dependency of sdhci_fsl(4) on sdhci(4) and
allow the former to be built.
Fix probing FC targets with hard addressing turned on.
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.
The intent of that change was to keep loop IDs persistent across
chip reinits.
The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address. It also turned off soft addressing on 2400
(4Gb) and newer controllers.
The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on. So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.
The solution is to largely revert that change. I left some elements
in place that are related to virtual ports, since they were new.
This does work with IBM tape drives with hard and soft addressing
turned on. I have tested it with 4Gb, 8Gb, and 16Gb controllers.
sys/dev/isp.c:
Largely revert FreeBSD SVN change 289937. I left the
ispmbox.h changes in place.
Don't use the PREV_ADDRESS bit on initialization. It tells
the chip to not participate if it can't get the requested
loop ID.
Do use soft addressing on 2400 and newer chips.
Use hard addressing when the user has requested a specific
initiator ID. (hint.isp.X.iid=N in /boot/loader.conf)
Leave some of the virtual port options from that change in
place, but don't turn on the PREV_ADDRESS bit.
gavin [Wed, 2 Aug 2017 15:13:13 +0000 (15:13 +0000)]
Merge r316113,316184,316413 from head:
- Remove #define PCIS_SERIALBUS_SMBUS_PROGIF, unused since r200091
- Switch device_probe() from large case statement to a lookup table
- Add several missing SMBus controllers
trasz [Tue, 1 Aug 2017 15:51:16 +0000 (15:51 +0000)]
MFC r320359:
Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.
The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.
MFC r320409:
Revert part of r320359, as suggested by rmacklem@. That case is only used
for nfsuserd -manage-gids and shouldn't depend on sysctl.
MFC r321196:
Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.