Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of the
self-snoop feature. Some recent Intel GPUs, in some modes, are not
coherent, and dirty lines in the CPU cache must be flushed before the
pages are transferred to the GPU domain.
Reviewed by: alc (previous version)
Tested by: pho (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
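A user-space sketch of the decision the new argument controls. The helper name `cache_lines_to_flush()`, its parameters, and the fixed 64-byte line size are hypothetical, not the kernel's actual `pmap_invalidate_cache_range()` signature; the sketch only counts the cache lines a CLFLUSH loop would touch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; real code would query CPUID. */
#define SKETCH_CACHE_LINE	64u

/*
 * Hypothetical model of the range-flush decision: return how many cache
 * lines in [sva, eva) would be flushed.  With self-snoop the flush is
 * normally skipped; 'force' requests it anyway, e.g. before handing the
 * pages to a non-coherent GPU.
 */
static unsigned long
cache_lines_to_flush(uintptr_t sva, uintptr_t eva, bool self_snoop, bool force)
{
	unsigned long n = 0;

	if (self_snoop && !force)
		return (0);
	/* Align the start down to a line boundary, then walk line by line. */
	for (uintptr_t va = sva & ~(uintptr_t)(SKETCH_CACHE_LINE - 1);
	    va < eva; va += SKETCH_CACHE_LINE)
		n++;	/* a real implementation would issue CLFLUSH here */
	return (n);
}
```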
John Baldwin [Wed, 8 Oct 2014 16:22:59 +0000 (16:22 +0000)]
Add schedgraph traces for callout handlers. Specifically, a callwheel logs
a running event each time it executes a callout function. The event
includes the function pointer, argument, and whether or not it was run from
hardware interrupt context. The callwheel is marked idle when each handler
completes. This effectively logs the duration of each callout routine in
the graph.
Michael Tuexen [Wed, 8 Oct 2014 15:30:59 +0000 (15:30 +0000)]
Ensure that the list of streams sent in a stream reset parameter fits
in an mbuf-cluster.
Thanks to Peter Bostroem for drawing my attention to this part of the code.
Michael Tuexen [Wed, 8 Oct 2014 15:29:49 +0000 (15:29 +0000)]
Ensure that the number of streams reported in srs_number_streams is
consistent with the amount of data provided in the SCTP_RESET_STREAMS
socket option.
Thanks to Peter Bostroem from Google for drawing my attention to
this part of the code.
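The consistency check can be sketched as follows. The struct mirrors `struct sctp_reset_streams` from RFC 6458, with `sctp_assoc_t` replaced by `uint32_t` so the sketch compiles without SCTP headers; the struct and helper names are hypothetical and this is not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of RFC 6458 struct sctp_reset_streams (assoc id type simplified). */
struct sketch_reset_streams {
	uint32_t	srs_assoc_id;
	uint16_t	srs_flags;
	uint16_t	srs_number_streams;
	uint16_t	srs_stream_list[];	/* srs_number_streams entries */
};

/*
 * Return nonzero if an option buffer of 'optlen' bytes really contains
 * the srs_number_streams stream ids it claims to.
 */
static int
reset_streams_len_ok(const struct sketch_reset_streams *srs, size_t optlen)
{
	size_t need;

	if (optlen < sizeof(*srs))
		return (0);	/* header itself is truncated */
	need = sizeof(*srs) +
	    (size_t)srs->srs_number_streams * sizeof(uint16_t);
	return (optlen >= need);
}
```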
Kashyap D Desai [Wed, 8 Oct 2014 09:37:47 +0000 (09:37 +0000)]
In the passthru IOCTL path, the mfi command pool could be accessed any number
of times concurrently, whereas the pool holds only a limited number (32) of
mfi commands.
The mfi command pool is now restricted to 27 simultaneous accesses by using
a counting semaphore while calling the passthru function.
The mrsas_cam.c source file had a function named mrsas_poll(), the same
name as the mrsas_poll() implemented in mrsas.c for the polling interface.
To clearly distinguish the two by usage, the former has been renamed
to mrsas_cam_poll().
In the passthru function, if an mfi command was obtained from the pool but
one of the DMA function calls then failed, the mfi command leaked: the error
path returned directly without freeing the occupied mfi command.
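A counting semaphore initialized to 27 caps concurrent passthru calls. The FreeBSD kernel offers sema(9) for this; the sketch below uses POSIX semaphores instead so it runs in user space, and the `passthru_*` helper names are hypothetical.

```c
#include <errno.h>
#include <semaphore.h>

/* Limit taken from the text: at most 27 concurrent passthru commands. */
#define PASSTHRU_MAX	27

static sem_t passthru_sem;

static int
passthru_init(void)
{
	/* pshared = 0: shared only between threads of this process. */
	return (sem_init(&passthru_sem, 0, PASSTHRU_MAX));
}

/* Non-blocking variant: 0 on success, EBUSY if all slots are taken. */
static int
passthru_try_enter(void)
{
	if (sem_trywait(&passthru_sem) == 0)
		return (0);
	return (errno == EAGAIN ? EBUSY : errno);
}

/* Give the slot back once the passthru command has completed. */
static void
passthru_leave(void)
{
	sem_post(&passthru_sem);
}
```

A blocking caller would use `sem_wait()` instead of `sem_trywait()`; the non-blocking form just makes the limit easy to observe.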
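The leak fix follows the usual single-exit error path pattern. In this sketch `get_cmd()`, `release_cmd()`, and `passthru()` are hypothetical stand-ins, not the mrsas(4) functions; the point is that every failure after the command is taken goes through the label that returns it to the pool.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's command pool. */
struct mfi_cmd {
	int	in_use;
};

static struct mfi_cmd pool[32];
static int pool_outstanding;

static struct mfi_cmd *
get_cmd(void)
{
	for (size_t i = 0; i < sizeof(pool) / sizeof(pool[0]); i++)
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			pool_outstanding++;
			return (&pool[i]);
		}
	return (NULL);
}

static void
release_cmd(struct mfi_cmd *cmd)
{
	cmd->in_use = 0;
	pool_outstanding--;
}

/* Passthru path: every exit after get_cmd() goes through 'out'. */
static int
passthru(int dma_fails)
{
	struct mfi_cmd *cmd;
	int error = 0;

	if ((cmd = get_cmd()) == NULL)
		return (ENOMEM);
	if (dma_fails) {
		error = ENOMEM;	/* e.g. a DMA setup call failed */
		goto out;	/* a bare 'return (error);' would leak cmd */
	}
	/* ... issue the command and wait for completion ... */
out:
	release_cmd(cmd);
	return (error);
}
```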
Kashyap D Desai [Wed, 8 Oct 2014 09:35:52 +0000 (09:35 +0000)]
The d_poll() callback function is the entry point for the poll system call
from applications. It is meant to notify applications that are waiting for
controller events to occur.
Kashyap D Desai [Wed, 8 Oct 2014 09:34:25 +0000 (09:34 +0000)]
Extended MSI-X vector support for Invader and Fury (12Gb/s HBA).
The driver creates multiple MSI-X vectors depending on what the FW exposes.
Currently, 12Gb/s MR controllers (Invader and Fury) expose 96 MSI-X vectors,
and 6Gb/s MR controllers (Thunderbolt) expose 16 MSI-X vectors.
Kashyap D Desai [Wed, 8 Oct 2014 09:19:35 +0000 (09:19 +0000)]
This feature allows 32-bit Linux binaries to run on a 64-bit FreeBSD
machine, for which 32-bit compatibility code has been added.
As Linux uses only one device entry to fire IOCTL commands,
a new device entry, megaraid_sas_ioctl_node, has been added solely for this
purpose.
From this single device node, i.e. megaraid_sas_ioctl_node, the driver has to
find the controller instance in the multi-controller case, for which a
management info structure has been added.
Kashyap D Desai [Wed, 8 Oct 2014 08:48:18 +0000 (08:48 +0000)]
Current MegaRAID firmware, and hence the driver, supports only 64 VDs.
E.g. if the user wants to create more than 64 VDs on a controller,
it is not possible with the current firmware/driver.
Supporting up to 256 VDs is a new feature and requirement for which the
firmware, driver, and apps need changes.
In addition, the new driver must be backward compatible with older
firmware, and vice versa.
The RAID map is the interface between the driver and FW to fetch all required
fields (attributes) for each virtual drive.
In the earlier design the driver used the FW copy of the RAID map, whereas
in the new design the driver keeps its own copy of the RAID map, on which
it operates for any RAID map access in the fast path.
The local driver RAID map copy provides ease of access throughout the code
and a generic interface for future FW RAID map changes.
For backward compatibility, the driver notifies the FW that it supports
256 VDs in the driver capability field.
Based on the controller property returned by the FW, the driver knows
whether it supports 256 VDs and copies the RAID map accordingly.
At any given time, the driver always has either the old or the new RAID map.
Reviewed by : ambrisko
MFC after : 2 weeks
Sponsored by: AVAGO Technologies
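The driver-private map idea can be sketched as a copy from whichever firmware layout is in use into one structure sized for the larger layout. All type and function names here are hypothetical, and `struct vd_info` is a placeholder for the much richer per-VD attributes a real RAID map carries; only the 64 and 256 limits come from the text.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Limits from the text: legacy firmware supports 64 VDs, new firmware 256. */
#define LEGACY_MAX_VD	64
#define EXT_MAX_VD	256

/* Hypothetical per-VD attributes. */
struct vd_info {
	uint32_t	size_blocks;
	uint8_t		raid_level;
};

/* Driver-private map, always sized for the larger layout. */
struct drv_raid_map {
	int		nvd;
	struct vd_info	vd[EXT_MAX_VD];
};

/*
 * Copy the firmware's map into the driver-private copy.  'fw_ext' reflects
 * the controller property returned by the FW (256-VD support or not).
 */
static void
sync_raid_map(struct drv_raid_map *drv, const struct vd_info *fw_vd,
    int fw_nvd, bool fw_ext)
{
	int limit = fw_ext ? EXT_MAX_VD : LEGACY_MAX_VD;

	if (fw_nvd < 0)
		fw_nvd = 0;
	if (fw_nvd > limit)
		fw_nvd = limit;	/* never trust a count beyond the layout */
	memset(drv, 0, sizeof(*drv));
	memcpy(drv->vd, fw_vd, (size_t)fw_nvd * sizeof(*fw_vd));
	drv->nvd = fw_nvd;
}
```

The fast path then reads only `struct drv_raid_map`, so a future firmware layout change would touch just the copy routine.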
Pyun YongHyeon [Wed, 8 Oct 2014 05:47:01 +0000 (05:47 +0000)]
Add support for QAC AR816x/AR817x Gigabit/Fast Ethernet controllers.
These controllers appear to have the same features as AR813x/AR815x, plus
improved RSS support (4 TX queues and 8 RX queues). alc(4) supports
all hardware features except RSS. I didn't implement RX checksum
offloading for AR816x/AR817x because I couldn't get
confirmation from the vendor whether AR816x/AR817x fixed its
predecessor's RX checksum offloading bug on fragmented packets.
This change adds support for the following controllers.
o AR8161 PCIe Gigabit Ethernet controller
o AR8162 PCIe Fast Ethernet controller
o AR8171 PCIe Gigabit Ethernet controller
o AR8172 PCIe Fast Ethernet controller
o Killer E2200 Gigabit Ethernet controller
Tested by: Many
Relnotes: yes
MFC after: 2 weeks
HW donated by: Qualcomm Atheros Communications, Inc.
Pyun YongHyeon [Wed, 8 Oct 2014 05:34:39 +0000 (05:34 +0000)]
Add a new quirk, PCI_QUIRK_MSI_INTX_BUG, to pci(4).
QAC AR816x/E2200 controllers have a silicon bug: MSI interrupts are not
asserted if the PCIM_CMD_INTxDIS bit of the command register is set.
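A sketch of how such a quirk might affect the command register value, assuming (as the quirk implies) that the bus code normally sets INTxDIS once MSI is in use. The helper `cmd_for_msi()` is hypothetical; 0x0400 is bit 10 of the PCI command register, the Interrupt Disable bit that pcireg.h names PCIM_CMD_INTxDIS.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI command register bit 10: INTx emulation disable (PCIM_CMD_INTxDIS). */
#define SKETCH_PCIM_CMD_INTxDIS	0x0400u

/*
 * Hypothetical helper: the command register value to program when the
 * device uses MSI.  Normally INTx is disabled; a device with the MSI/INTx
 * silicon bug must keep the bit clear or its MSI messages never fire.
 */
static uint16_t
cmd_for_msi(uint16_t cmd, bool has_msi_intx_bug)
{
	if (has_msi_intx_bug)
		return (cmd & (uint16_t)~SKETCH_PCIM_CMD_INTxDIS);
	return (cmd | SKETCH_PCIM_CMD_INTxDIS);
}
```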
Pyun YongHyeon [Wed, 8 Oct 2014 01:03:32 +0000 (01:03 +0000)]
Fix a long standing bug in MAC statistics register access. One
additional register was erroneously added in the MAC register set
such that 7 TX statistics counters were wrong.
Sean Bruno [Tue, 7 Oct 2014 21:50:28 +0000 (21:50 +0000)]
Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources. If we encounter a network where ICMP is blocked,
the Needs Frag indicator may not propagate back to us. In that case,
attempt to downshift the MSS once to a preconfigured value.
Default this feature to off for now while we do not have a full PLPMTUD
implementation in our stack.
Adds the following new sysctls for control:
net.inet.tcp.pmtud_blackhole_detection -- turns this feature on/off
net.inet.tcp.pmtud_blackhole_mss -- MSS to try for IPv4
net.inet.tcp.v6pmtud_blackhole_mss -- MSS to try for IPv6
Adds the following new sysctls for monitoring:
-- Number of times the code was activated to attempt an MSS downshift
net.inet.tcp.pmtud_blackhole_activated
-- Number of times the blackhole MSS was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_min_activated
-- Number of times that we failed to connect after we downshifted the MSS
net.inet.tcp.pmtud_blackhole_failed
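The "downshift once" behaviour can be modelled as below. This is a simplified sketch, not the FreeBSD TCP code: the struct and function names are hypothetical, the text does not say how many timeouts trigger the downshift (the sketch uses the first one), and `min_activated` is left out because its exact trigger is not spelled out here. The value 1200 in the usage is only an example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Knobs and counters named after the sysctls above (hypothetical model). */
struct blackhole_cfg {
	bool		detection;	/* net.inet.tcp.pmtud_blackhole_detection */
	uint16_t	mss;		/* net.inet.tcp.pmtud_blackhole_mss */
};

struct blackhole_stats {
	unsigned	activated;	/* net.inet.tcp.pmtud_blackhole_activated */
	unsigned	failed;		/* net.inet.tcp.pmtud_blackhole_failed */
};

struct conn {
	uint16_t	maxseg;
	bool		downshifted;
};

/* Called on a retransmission timeout: drop the MSS at most once. */
static void
on_rexmt_timeout(struct conn *c, const struct blackhole_cfg *cfg,
    struct blackhole_stats *st)
{
	if (!cfg->detection || c->downshifted || c->maxseg <= cfg->mss)
		return;
	c->maxseg = cfg->mss;
	c->downshifted = true;
	st->activated++;
}

/* Called when the connection is given up. */
static void
on_conn_failed(const struct conn *c, struct blackhole_stats *st)
{
	if (c->downshifted)
		st->failed++;	/* the smaller MSS did not help either */
}
```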
Gavin Atkinson [Tue, 7 Oct 2014 19:07:50 +0000 (19:07 +0000)]
Support the Vodafone R215 LTE USB dongle, which is apparently a rebadged
E5372 with different product IDs.
Interestingly, the standard E5372 IDs (12d1:1506) are currently listed in
u3g.c and are the same as the E3131. However, the R215/E5372 is an NCM
device and works well with cdce(4), whereas the E3131 does not. More work
may be needed to better identify the other device IDs.
Bjoern A. Zeeb [Tue, 7 Oct 2014 18:00:34 +0000 (18:00 +0000)]
Since introducing the extra mapping in r250103 for architectural performance
events we have actually counted 'Branch Instruction Retired' when people
asked for 'Unhalted core cycles' using the 'unhalted-core-cycles' event mask
mnemonic.
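Why a wrong mapping silently miscounts: architectural events are selected by an (event, unit mask) pair, and a table entry pointing at the wrong pair yields a different counter. The codes below come from Intel's architectural performance events table in the SDM; apart from `unhalted-core-cycles`, the mnemonics and the table layout are hypothetical and need not match hwpmc(4)'s.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Intel architectural performance events: (event select, unit mask). */
struct arch_event {
	const char	*mnemonic;
	uint8_t		 event;
	uint8_t		 umask;
};

static const struct arch_event arch_events[] = {
	{ "unhalted-core-cycles",	0x3C, 0x00 },
	{ "instruction-retired",	0xC0, 0x00 },
	{ "unhalted-reference-cycles",	0x3C, 0x01 },
	{ "llc-reference",		0x2E, 0x4F },
	{ "llc-misses",			0x2E, 0x41 },
	{ "branch-instruction-retired",	0xC4, 0x00 },
	{ "branch-misses-retired",	0xC5, 0x00 },
};

/* Look up a mnemonic; NULL if it is not an architectural event. */
static const struct arch_event *
lookup_arch_event(const char *name)
{
	for (size_t i = 0; i < sizeof(arch_events) / sizeof(arch_events[0]); i++)
		if (strcmp(arch_events[i].mnemonic, name) == 0)
			return (&arch_events[i]);
	return (NULL);
}
```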
Andriy Gapon [Tue, 7 Oct 2014 16:08:21 +0000 (16:08 +0000)]
l2arc_write_buffers: reduce headroom value
FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS
data lists (currently both are 16) while illumos has just a single list
of each kind.
headroom determines how much data is scanned on a single list
during each run of the l2arc feed thread.
Because FreeBSD has more lists we proportionally decrease the limit.
Andriy Gapon [Tue, 7 Oct 2014 14:30:24 +0000 (14:30 +0000)]
reduce L2ARC_WRITE_SIZE on FreeBSD
FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS
data lists (currently both are 16) while illumos has just a single list
of each kind.
L2ARC_WRITE_SIZE determines the default value of l2arc_write_max which
defines limits on how much data is scanned and written to a cache device
during each run of the l2arc feed thread. The limits are applied on the
per buffer list basis.
Because FreeBSD has more lists we proportionally reduce the limits.
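The proportional reduction is plain division. This sketch assumes the per-list limit is the single-list (illumos) limit divided by the number of lists of one kind, 16 per the text; the helper name and the 8 MiB input used below are illustrative assumptions, not values quoted from the commit.

```c
#include <stdint.h>

/* From the text: FreeBSD currently has 16 lists of each kind. */
#define SKETCH_NUM_LISTS	16

/*
 * Hypothetical: scale a limit designed for illumos' single list so that
 * a pass over all FreeBSD lists covers about the same amount of data.
 */
static uint64_t
per_list_limit(uint64_t single_list_limit)
{
	return (single_list_limit / SKETCH_NUM_LISTS);
}
```

For example, an 8 MiB single-list limit becomes 512 KiB per list, so one pass over 16 lists still scans about 8 MiB rather than 128 MiB.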
POSIX treats negative time_t values as undefined (i.e. they may also be valid,
depending on a system policy we don't have), and we don't set EOVERFLOW
in mktime/timegm, which POSIX requires so that a -1 return that is a valid
negative time can be reliably distinguished from a -1 error return.
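From the caller's side, a widely used heuristic for telling a valid `(time_t)-1` (one second before the Epoch in UTC) from an error relies on mktime() normalizing `tm_wday` on success. This is a caller-side sketch, not the libc change; `mktime_checked()` is a hypothetical name, and since the C standard leaves `*tm` unspecified on failure, the sentinel check is a heuristic rather than a guarantee.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Return true if mktime() succeeded.  mktime() ignores tm_wday on input
 * and sets it to 0..6 on success, so a sentinel that survives the call
 * together with a -1 result indicates failure.
 */
static bool
mktime_checked(struct tm *tm, time_t *out)
{
	tm->tm_wday = -1;	/* sentinel */
	*out = mktime(tm);
	return (!(*out == (time_t)-1 && tm->tm_wday == -1));
}
```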
Mark Johnston [Mon, 6 Oct 2014 21:52:40 +0000 (21:52 +0000)]
Treat D keywords as identifiers in certain postfix expressions. This allows
one to, for example, access the "provider" field of a struct g_consumer,
even though "provider" is a D keyword.
Neel Natu [Mon, 6 Oct 2014 20:48:01 +0000 (20:48 +0000)]
Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.
The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting
CPUID.01H:ECX[3] to 0, so the guest should not expect these instructions to
be present anyway.
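Hiding the capability amounts to clearing one bit in the guest-visible CPUID value. FreeBSD's specialreg.h names this bit CPUID2_MON; the filter function below is a hypothetical illustration, not bhyve's actual CPUID handler.

```c
#include <stdint.h>

/* CPUID leaf 01H, ECX bit 3: MONITOR/MWAIT support. */
#define CPUID2_MON	(1u << 3)

/* Hypothetical filter applied to the ECX value returned to the guest. */
static uint32_t
guest_cpuid1_ecx(uint32_t host_ecx)
{
	return (host_ecx & ~CPUID2_MON);
}
```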
John Baldwin [Mon, 6 Oct 2014 18:16:45 +0000 (18:16 +0000)]
Properly set the timeout in a query_state. The global query_timeout
configuration value is an integer count of seconds, not a timeval.
Using memcpy() to copy a timeval from it put garbage into the tv_usec
field.
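The fix amounts to building the timeval field by field. `query_timeout` and `set_query_timeout()` below are hypothetical stand-ins for the daemon's configuration value and setter.

```c
#include <sys/time.h>

/* Hypothetical configuration value: whole seconds, as described above. */
static int query_timeout = 5;

static void
set_query_timeout(struct timeval *tv)
{
	/*
	 * Wrong: memcpy(tv, &query_timeout, sizeof(*tv)) copies a whole
	 * timeval out of an int, reading past it into tv_usec.
	 */
	tv->tv_sec = query_timeout;
	tv->tv_usec = 0;
}
```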
Luigi Rizzo [Mon, 6 Oct 2014 15:48:28 +0000 (15:48 +0000)]
Add netmap support to libpcap. Tcpdump and other native pcap applications can now
run directly on netmap ports using netmap:foo or valeXX:YY device names.
Modifications to existing code are small and trivial; the netmap-specific
code is all in a new file.
Please be aware that in netmap mode the physical interface is disconnected from
the host stack, so libpcap will steal the traffic, not just make a copy.
For the full version of the code (including linux and autotools support) see
https://code.google.com/p/netmap-libpcap/
Craig Rodrigues [Mon, 6 Oct 2014 14:43:02 +0000 (14:43 +0000)]
MFV:
use calloc in get_line() when allocating line to ensure it is fully initialized,
fixes a later uninitialized value in copy_param() (FreeBSD #193499).
PR: 193499
Submitted by: Thomas E. Dickey <tom@invisible-island.net>
Alexander Motin [Mon, 6 Oct 2014 12:20:46 +0000 (12:20 +0000)]
Add support for the MaxBurstLength and Expected Data Transfer Length parameters.
Before this change, the target could send an R2T request for a write transfer of
any size, which could violate the iSCSI RFC: it allows the initiator to limit the
maximum R2T size by negotiating the MaxBurstLength connection parameter.
Also report an error in case of write underflow, when the initiator provides
less data than initiator expects. Previously, in such a case our target
sent an R2T request for non-existent data, violating the RFC and confusing
some initiators. SCSI specs don't explicitly define how write underflows
should be handled and there are different opinions, but reporting an error
is hopefully better than violating the iSCSI RFC with unpredictable results.
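Honoring MaxBurstLength means splitting a write into several R2Ts, none larger than the negotiated value. This hypothetical `plan_r2ts()` sketch ignores unsolicited and immediate data, which would reduce how much the R2Ts need to cover; it is not the ctl(4) implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split a write of 'total' bytes into R2T bursts of at most 'max_burst'
 * bytes.  Stores each burst length in 'lens' (at most 'max' entries) and
 * returns the number of R2Ts planned.
 */
static size_t
plan_r2ts(uint32_t total, uint32_t max_burst, uint32_t *lens, size_t max)
{
	size_t n = 0;
	uint32_t off = 0;

	if (max_burst == 0)
		return (0);	/* invalid negotiation result */
	while (off < total && n < max) {
		uint32_t len = total - off;

		if (len > max_burst)
			len = max_burst;
		lens[n++] = len;
		off += len;
	}
	return (n);
}
```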
We can't easily predict (in the current parsing model)
whether a keyword is an ipfw(8) reserved keyword or a port name.
Checking the protocol database via getprotobyname() consumes a lot of
CPU and leads to tens of seconds for parsing a large ruleset.
Use a list of reserved keywords and check it as a prerequisite
before calling getprotobyname().
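The idea is a cheap string comparison before the expensive database lookup. The keyword list below is a small hypothetical subset chosen for illustration, not ipfw(8)'s actual reserved-word table, and both helper names are hypothetical.

```c
#include <netdb.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of ipfw(8) reserved words, for illustration only. */
static const char *const reserved_words[] = {
	"any", "me", "from", "to", "in", "out", "via", "keep-state",
};

static bool
is_reserved_word(const char *s)
{
	for (size_t i = 0; i < sizeof(reserved_words) / sizeof(reserved_words[0]); i++)
		if (strcmp(reserved_words[i], s) == 0)
			return (true);
	return (false);
}

/* Consult the slow protocol database only for non-reserved words. */
static bool
is_protocol_name(const char *s)
{
	if (is_reserved_word(s))
		return (false);
	return (getprotobyname(s) != NULL);
}
```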
Alexander Motin [Mon, 6 Oct 2014 10:58:54 +0000 (10:58 +0000)]
Use the r271207 optimization only for MSI-enabled HBAs.
It was found that VirtualBox's AHCI does not allow the interrupt to be cleared
before the interrupt status register is read, causing an interrupt storm.
The AHCI specification allows skipping this register when multi-vector MSI
is enabled, because the interrupting port is then known. For single-vector
MSI that is not stated explicitly, but if there is only one port, it is
obviously known too.
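The condition above can be written as a predicate. This sketch assumes multi-vector MSI gives each port its own vector, which folds both cases into "at least one vector per port"; `can_skip_is_read()` is a hypothetical name, not the ahci(4) code.

```c
#include <stdbool.h>

/*
 * The interrupting port is known without reading the global interrupt
 * status register when every port has its own MSI vector.  This covers
 * multi-vector MSI and single-vector MSI on a one-port controller.
 * Legacy INTx (msi_vectors == 0) always needs the register read.
 */
static bool
can_skip_is_read(int msi_vectors, int nports)
{
	return (msi_vectors >= 1 && msi_vectors >= nports);
}
```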
Xin LI [Mon, 6 Oct 2014 06:04:10 +0000 (06:04 +0000)]
5162 zfs recv should use loaned arc buffer to avoid copy
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <Bayard.Bell@nexenta.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
Xin LI [Mon, 6 Oct 2014 06:00:50 +0000 (06:00 +0000)]
5178 zdb -vvvvv on old-format pool fails in dump_deadlist()
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
Xin LI [Mon, 6 Oct 2014 05:54:39 +0000 (05:54 +0000)]
5176 lock contention on godfather zio
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Bayard Bell <Bayard.Bell@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
Xin LI [Mon, 6 Oct 2014 05:42:20 +0000 (05:42 +0000)]
MFV r272500:
Don't inherit flags other than DS_FLAG_CI_DATASET and DS_FLAG_INCONSISTENT
when cloning. This prevents DS_FLAG_DEFER_DESTROY from being inherited from
a snapshot that is marked for deferred destroy, which caused snapshots of the
clone to be destroyed when taking a hold or clone.
Illumos issue:
5150 zfs clone of a defer_destroy snapshot causes strangeness
Acquire the lock in read mode when it is only needed to ensure the stability
of the keg list. The UMA lock may be held for a long time (relatively
speaking) in uma_reclaim() on machines with lots of zones/kegs. If
uma_timeout() fires during that period, subsequent callouts on that
CPU may be significantly delayed.
On error, sbuf_bcat() returns -1. Some callers returned this -1 to
the upper layers, which interpret it as an errno value, which happens to
be ERESTART. The result was spurious restarts of sysctls in a loop,
e.g. kern.proc.proc, instead of returning ENOMEM to the caller.
Convert -1 from sbuf_bcat() to ENOMEM when returning to callers
expecting an errno value.
In collaboration with: pho
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
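The conversion is a one-line adapter at each such call site, e.g. `error = sbuf_rv_to_errno(sbuf_bcat(sb, buf, len));`. The helper name is hypothetical; it is shown standalone so it compiles outside the kernel.

```c
#include <errno.h>

/*
 * sbuf_bcat() reports failure as -1, but sysctl handlers must return an
 * errno value.  Passing -1 through would be read as ERESTART (-1 in the
 * FreeBSD kernel) and restart the request instead of failing it.
 */
static int
sbuf_rv_to_errno(int rv)
{
	return (rv == -1 ? ENOMEM : 0);
}
```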
Andrew Turner [Sun, 5 Oct 2014 11:06:22 +0000 (11:06 +0000)]
Merge the big-endian ARM targets together, and the little-endian ARM
targets. With this we assume any ARM target containing eb is big-endian,
otherwise it is little-endian.
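The naming rule can be expressed as a small predicate. The build system applies it as a make pattern; this C rendering and its function name are hypothetical illustrations of the rule as stated.

```c
#include <stdbool.h>
#include <string.h>

/* Rule from the text: an ARM target whose name contains "eb" is big-endian. */
static bool
arm_target_is_big_endian(const char *machine_arch)
{
	if (strncmp(machine_arch, "arm", 3) != 0)
		return (false);	/* not an ARM target; the rule does not apply */
	return (strstr(machine_arch, "eb") != NULL);
}
```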
1) For the %Z format, understand the "UTC" name too.
2) Return NULL if timegm() fails, because it means we can't convert
what we have in GMT to the needed local time.