arybchik [Thu, 14 Jan 2016 15:16:24 +0000 (15:16 +0000)]
MFC r291924
sfxge: switch to TxQ creation specific flags
It is better do not mix TxQ creation and receive event flags since only
checksum flags are applicable to TxQ.
Also it will allow to add a new TxQ creation specific flags.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:35:46 +0000 (14:35 +0000)]
MFC r291747
sfxge: [EF10] support RxQ scattering control
If, for example, a VF is configured to use a 1500 byte MTU, but the port
it is attached to is set to 9000 bytes, overlength frames can be received
by the VF. As Huntington scatters by default, these overlength packets
would be scattered across several descriptors, with all except the last
having the CONT bit set.
To avoid this, disable scatter when creating RXQs if the firmware
supports doing so, which all recent versions do. Then we only get
a single descriptor from an overlength frame. This will have the CONT
bit set to indicate it was truncated, so we can discard it.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:27:20 +0000 (14:27 +0000)]
MFC r291590
sfxge: retry VF vAdaptor allocation if it fails because of no EVB port yet
After an MC reboot, a VF driver may reset before the PF driver has
finished bringing everything back up. This includes the VFs EVB port.
MC_CMD_VADAPTOR_ALLOC is the first MCDI call after an MC reboot to
require the EVB port, so if it fails with MC_CMD_ERR_NO_EVB_PORT,
retry the command a few times after waiting a while.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
It is really observed in the case of VLAN over sfxge interface.
Also this change makes total value equal to 35 which is default assumed
by the kernel for if_hw_tsomaxsegcount.
garga [Thu, 14 Jan 2016 14:18:10 +0000 (14:18 +0000)]
MFC r293312:
Make cap_mkdb and services_mkdb file operations sync
Similar fix was done for passwd and group operations in r285050. When a
temporary file is created and then renamed to replace official file there
are no checks to make sure data was written to disk and if a power cycle
happens at this time, system can end up with a 0 length file
arybchik [Thu, 14 Jan 2016 14:16:26 +0000 (14:16 +0000)]
MFC r291436
sfxge: add prefast annotation to common code return types
Using a typedef for common code return types (rather than "int")
allows the Prefast static analyser to understand when a function
has been successful (and thus when its postconditions must hold).
This greatly reduces then number of false positives reported by
prefast for error paths in common code functions.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:14:00 +0000 (14:14 +0000)]
MFC r291432
sfxge: modify nvram update functions for uio platform to support
RFID-selectable presets
Dynamic config partitions on boards that support RFID are divided into
a number of segments, each formatted like a partition, with header,
trailer and end tags. The first segment is the current active
configuration.
The segments are initialised by manftest and each contain a different
configuration e.g. firmware variant. The firmware can be instructed
via RFID to copy a segment over the first segment, hence changing the
active configuration. This allows ops to change the configuration of
a board prior to shipment using RFID.
Changes to the dynamic config may need to be written to all segments (in
particular firmware versions written by manftest) or just the first
segment (changes to the active configuration). See SF-111324-SW.
If only the first segment is written the code still needs to be aware of
the possible presence of subsequent segments as writing to a segment may
cause its size to increase, which would overwrite the subsequent
segments and invalidate them.
Boards that do not support RFID will only have one segment in their
dynamic config partition.
Submitted by: Paul Fox <pfox at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:13:13 +0000 (14:13 +0000)]
MFC r291398
sfxge: cleanup: report error on failure path in efx_vpd_hunk_verify
If the VPD is corrupt and contains an 'RV' keyword before the
END tag, then this function could return without setting the
return code to report the error.
Found by prefast.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:10:28 +0000 (14:10 +0000)]
MFC r291396
sfxge: fix prefast warning in falconsiena_tx_qcreate
Keep prefast happy by returning the initial queue index
from falconsiena_tx_qcreate(). No change in behaviour, as
etxo_qcreate already zeros *addedp before the call.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:08:13 +0000 (14:08 +0000)]
MFC r291393
sfxge: infer external port numbering for Pavia
Adjust external port mapping table to distinguish Pavia from Monza.
Now the presence of any 40G mode implies at least 2 outputs per
external port. So Pavia 4x10G ports are now mapped to 1,2,3,4;
Monza 4x10G ports map to 1,1,2,2 as before.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
arybchik [Thu, 14 Jan 2016 14:04:08 +0000 (14:04 +0000)]
MFC r291391
sfxge: do not use unnamed union in siena_mc_combo_rom_hdr_t
GCC 4.2.1 used on FreeBSD 8 and 9 branches does not like unnamed
union member in the structure. It is not strictly required in head,
but nice to have to minimize difference with out-of-tree driver.
glebius [Thu, 14 Jan 2016 09:11:42 +0000 (09:11 +0000)]
o Fix SCTP ICMPv6 error message vulnerability. [SA-16:01.sctp]
o Fix Linux compatibility layer incorrect futex handling. [SA-16:03.linux]
o Fix Linux compatibility layer setgroups(2) system call. [SA-16:04.linux]
o Fix TCP MD5 signature denial of service. [SA-16:05.tcp]
o Fix insecure default bsnmpd.conf permissions. [SA-16:06.bsnmpd]
allanjude [Thu, 14 Jan 2016 01:42:09 +0000 (01:42 +0000)]
MFC: r287473
Add the new sesutil(8) utility for managing SCSI Enclosure Services (SES) device.
MFC: r287493
Fix iteration bug
MFC: r287485, r287494, r287992
Please the angry gcc 4.2 gods
MFC: r287988
Improve and expand sesutil(8)
Return an error if no matching device is found
Locate can address a slot, in addition to a drive
Added fault, similar to locate but blinks a different LED
Added the map command, lists all devices connected to the SES controller
Added the status command, overall status of the SES controller
MFC: r292092
sesutil: fix map not printing the status of the LED device in an array
MFC: r292093
sesutil: pass the correct element type when printing the SES map
MFC: r292121
sesutil: Add extra information specific to some SES devices to sesutil map
MFC: r292122
Fix sesutil locate when a sesid is passed to locate command
MFC: r292262
Show the enclosure name and id in sesutil map
Relnotes: yes
Sponsored by: Gandi.net
Sponsored by: ScaleEngine Inc.
marius [Wed, 13 Jan 2016 21:38:52 +0000 (21:38 +0000)]
MFC: r292943, r292960
- (Ab)use udivx for dividing the u_int pc_cpuid when implementing
CPU_ISSET(), CPU_SET() etc. in sparc64 asm. This approach has the
benefit of not clobbering %y, allowing to revert r222827 and
partially r222828.
- In r222828, CATR() already was changed to use the equivalent of
PCPU_GET(cpuid) instead of the MD module ID for KTR_MASK, so
belatedly also catch up with KTR_CPU and the C side of ktr(9).
Originally, in r203838 CATR() was moved away from directly reading
the module ID or equivalent as that became impractical with other
CPU types than USI/II supported. With r222828 in place, per-CPU
data generally is set up soon enough, though, that employing
PCPU things in ktr(9) also for use during early stages works.
- Unfortunately, an exception to the latter is the ktr(9) use
in pmap_bootstrap(), which actually is run so early that even
checking for bootverbose being set via the loader doesn't work.
Consequently, replace the ktr(9) use in pmap_bootstrap() with
OF_printf(9) and put it under #ifdef DIAGNOSTIC instead.
delphij [Wed, 13 Jan 2016 08:22:53 +0000 (08:22 +0000)]
MFC r292861:
hyperv: vmbus: run non-blocking message handlers in vmbus_msg_swintr()
We'll remove the per-channel control_work_queue because it can't properly
do serialization of message handling, e.g., when there are 2 NIC devices,
vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition:
for an SMP VM, vmbus_channel_process_offer() can run concurrently on
different CPUs and if the second NIC's
vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs
first, the second NIC's name will be hn0 and the first NIC's name will
be hn1!
We can fix the race condition by removing the per-channel control_work_queue
and run all the message handlers in the global
hv_vmbus_g_connection.work_queue -- we'll do this in the next patch.
With the coming next patch, we have to run the non-blocking handlers
directly in the kernel thread vmbus_msg_swintr(), because the special
handling of sub-channel: when a sub-channel (e.g., of the storvsc driver)
is received and being handled in vmbus_channel_on_offer() running on the
global hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer()
invokes channel->sc_creation_callback, i.e., storvsc_handle_sc_creation,
and the callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message
and expect a further reply from the host, but the handling of the further
messag can't be done because the current message's handling hasn't finished
yet; as result, hv_vmbus_channel_open() -> sema_timedwait() will time out
and th device can't work.
Also renamed the handler type from hv_pfn_channel_msg_handler to
vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D4596
MFC r292859:
hyperv: vmbus: remove the per-channel control_work_queue
Now vmbus_channel_on_offer() -> vmbus_channel_process_offer() can
safely run on the global hv_vmbus_g_connection.work_queue now.
We remove the per-channel control_work_queue to achieve the proper
serialization of the message handling.
I removed the bogus TODO in vmbus_channel_on_offer(): a vmbus offer
can only come from the parent partition, i.e., the host.
PR: kern/205156
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Howard Su <howard0su gmail com>, delphij
Differential Revision: https://reviews.freebsd.org/D4597
kevlo [Wed, 13 Jan 2016 01:32:04 +0000 (01:32 +0000)]
MFC r293491:
- Add the definition of CHARCLASS_NAME_MAX, as per POSIX.1-2001.
- Avoid namespace pollution and move definitions of _POSIX2_CHARCLASS_NAME_MAX
and _POSIX2_COLL_WEIGHTS_MAX into the .2001 section.
With input from bde.
davidcs [Tue, 12 Jan 2016 22:58:46 +0000 (22:58 +0000)]
MFC r292638
Check for packet_length is greater than 60 bytes as well as packet_length is
greater than len_on_bd, before invoking the routine to handle jumbo over SGL
(bxe_service_rxsgl()).
Add counters for number of jumbo_over_SGL packets (rx_bxe_service_rxsgl) and
erroneous jumbo_over_SGL packets (rx_erroneous_jumbo_sge_pkts)
dim [Tue, 12 Jan 2016 19:33:43 +0000 (19:33 +0000)]
MFC r292950:
Drop the clang patch which adds recognition of 'CC' suffixes as aliases
for --driver-mode=g++, since this was never upstreamed. For backwards
compatibility, add a wrapper shell script.
allanjude [Tue, 12 Jan 2016 16:38:09 +0000 (16:38 +0000)]
MFC: r284589
Add the ability to detect ZFS and GELI encrypted file systems to fstyp(8)
MFC: r284644
Fix GCC Warnings
MFC: r284728
Only build ZFS support in absense of WITHOUT_ZFS
MFC: r285426
Remove excess copyrights
MFC: r286569
Use GELI sentinel constant
MFC: r287937
Eliminate unneeded copying of vdev data, goto, etc. and add a note
that checksum of vdev label should be checked (which is not done
currently).
No functional change.
While I'm there, raise WARNS to 2.
MFC: r292757
Fix order of includes in usr.sbin/fstyp/zfs.c
trasz [Tue, 12 Jan 2016 14:18:54 +0000 (14:18 +0000)]
Hide the "unmount of /dev failed (BUSY)" warning at shutdown or reboot,
introduced with r293742, just like it was hidden before that commit.
This is a direct commit to 10-STABLE; this special case is not needed
in 11-CURRENT, because devfs supports forced unmounts there. The forced
unmount could be MFC-ed, but there are some LORs at shutdown, and I have
a weird feelings about it.
trasz [Tue, 12 Jan 2016 10:14:57 +0000 (10:14 +0000)]
MFC r290548:
Userspace part of reroot support. This makes it possible to change
the root filesystem without full reboot, using "reboot -r". This can
be used to to eg. boot from a temporary md_image preloaded by loader(8),
setup an iSCSI session, and continue booting from rootfs mounted over
iSCSI.
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3693
trasz [Tue, 12 Jan 2016 10:11:29 +0000 (10:11 +0000)]
MFC r287964:
Kernel part of reroot support - a way to change rootfs without reboot.
Note that the mountlist manipulations are somewhat fragile, and not very
pretty. The reason for this is to avoid changing vfs_mountroot(), which
is (obviously) rather mission-critical, but not very well documented,
and thus hard to test properly. It might be possible to rework it to use
its own simple root mount mechanism instead of vfs_mountroot().
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2698
trasz [Tue, 12 Jan 2016 10:09:03 +0000 (10:09 +0000)]
MFC r287107:
Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467
trasz [Tue, 12 Jan 2016 09:27:01 +0000 (09:27 +0000)]
MFC r289110:
Make geom_nop(4) collect statistics on all types of BIOs, not just
reads and writes.
PR: kern/198405
Submitted by: Matthew D. Fuller <fullermd at over-yonder dot net>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3679
gjb [Tue, 12 Jan 2016 02:12:40 +0000 (02:12 +0000)]
MFC r293188:
Prevent memstick installation medium from attempting to mount
the root filesystem read-write. This causes problems booting
the memstick installation medium from write-protected USB flash
drives.
PR: 187161, 205886
Sponsored by: The FreeBSD Foundation
asomers [Mon, 11 Jan 2016 21:12:49 +0000 (21:12 +0000)]
MFC r292218
Don't retry SAS commands in response to protocol errors
sys/dev/mpr/mpr_sas_lsi.c
sys/dev/mps/mps_sas_lsi.c
When mp[rs]sas_get_sata_identify returns
MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR, don't bother retrying. Protocol
errors aren't likely to be fixed by sleeping.
Without this change, a system that generated may protocol errors due
to signal integrity issues was taking more than an hour to boot, due
to all the retries.
asomers [Mon, 11 Jan 2016 20:25:41 +0000 (20:25 +0000)]
MFC r292019
When iostat(8) receives SIGINT while running with "-w" or "-c", it will now
print statistics one more time before exiting. Also, it now implements the
wait using setitimer instead of sleep, so the waits will be more consistent
when the system is heavily loaded.
asomers [Mon, 11 Jan 2016 20:24:56 +0000 (20:24 +0000)]
MFC r292020
Increase devd's client socket buffer size to 256KB. This is not as large as
it looks, because we'll hit the sockbuf's mbuf limit long before hitting its
data limit. A 256KB data limit allows creating a ZFS pool on about 450
drives without overflowing the client socket buffers.
trasz [Mon, 11 Jan 2016 20:10:14 +0000 (20:10 +0000)]
MFC r287396:
It's 2015, and some people are still trying to use fdisk and then
go asking what debug flags to set for GEOM to make it work. Advice
them to use gpart(8) instead.
Something similar should probably done with disklabel,
but I need to rewrite the disklabel examples first.
jimharris [Mon, 11 Jan 2016 17:32:56 +0000 (17:32 +0000)]
MFC r293352:
nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
jimharris [Mon, 11 Jan 2016 17:31:18 +0000 (17:31 +0000)]
MFC r293328:
nvme: do not revert to single I/O queue when per-CPU queues not available
Previously nvme(4) would revert to a single I/O queue if it could not
allocate enought interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.