Move the SCTP syscalls to netinet with the rest of the SCTP code. The
syscalls themselves are tightly coupled with the network stack and
therefore should not be in the generic socket code.
The following four syscalls have been marked as NOSTD so they can be
dynamically registered in sctp_syscalls_init() function:
sys_sctp_peeloff
sys_sctp_generic_sendmsg
sys_sctp_generic_sendmsg_iov
sys_sctp_generic_recvmsg
The syscalls are also set up to be dynamically registered when COMPAT32
option is configured.
As a side effect of moving the SCTP syscalls, getsock_cap needs to be
made available outside of the uipc_syscalls.c source file. A proper
prototype has been added to the sys/socketvar.h header file.
API tests from the SCTP reference implementation have been run to ensure
compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout)
Submitted by: Steve Kiernan <stevek@juniper.net>
Reviewed by: tuexen, rrs
Obtained from: Juniper Networks, Inc.
Add sysctl knob to disable port power on a specific USB HUB. You need
to reset the USB HUB using "usbconfig -d X.Y reset" or boot having the
setting in /boot/loader.conf before it activates.
Alexander Motin [Thu, 9 Oct 2014 09:12:08 +0000 (09:12 +0000)]
Make iSCSI connection close somewhat less aggressive.
It allows to push out some final data from the send queue to the socket
before its close. In particular, it increases chances for logout response
to be delivered to the initiator.
Add CROSS_TOOLCHAIN macro select pre seeded external toolchain configuration files
The goal is to provide pre seeded toolchain configurations withing the ports tree
to allow the use of an external toolchain in a simple way:
make CROSS_TOOLCHAIN=powerpc64-gcc TARGET=powerpc TARGET_ARCH=powerpc64 buildworld
This will look for the external toolchain definition in /usr/local/share/mk/powerpc64-gcc.mk
While here add the notion of X_COMPILER_TYPE to the external toolchain framework to allow
to deal with differences between gcc and clang in regards of cross building
Xin LI [Thu, 9 Oct 2014 06:02:53 +0000 (06:02 +0000)]
MFV r272802:
- Limit ARC for zdb at 256MB. zdb do not typically revisit data
in the ARC.
- Increase default max_inflight from 200 to 1000 (can be overriden
by -I) so we can queue more I/Os when doing scrubbing.
- Print status while loading meataslabs for leak detection.
Illumos issues:
5169 zdb should limit its ARC size
5170 zdb -c should create more scrub i/os by default
5171 zdb should print status while loading metaslabs for leak detection
Xin LI [Thu, 9 Oct 2014 05:53:04 +0000 (05:53 +0000)]
3693 restore_object uses at least two transactions to restore an object
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andriy Gapon <andriy.gapon@hybridcluster.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Xin LI [Thu, 9 Oct 2014 05:50:23 +0000 (05:50 +0000)]
5175 implement dmu_read_uio_dbuf() to improve cached read performance
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Xin LI [Thu, 9 Oct 2014 05:48:06 +0000 (05:48 +0000)]
5169 zdb should limit its ARC size
5170 zdb -c should create more scrub i/os by default
5171 zdb should print status while loading metaslabs for leak detection
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Bayard Bell <Bayard.Bell@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Adrian Chadd [Thu, 9 Oct 2014 05:33:25 +0000 (05:33 +0000)]
Add a bus method to fetch the VM domain for the given device/bus.
* Add a bus_if.m method - get_domain() - returning the VM domain or
ENOENT if the device isn't in a VM domain;
* Add bus methods to print out the domain of the device if appropriate;
* Add code in srat.c to save the PXM -> VM domain mapping that's done and
expose a function to translate VM domain -> PXM;
* Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute
and if so map it to the VM domain;
* (.. yes, this works recursively.)
* Have the pci bus glue print out the device VM domain if present.
Note: this is just the plumbing to start enumerating information -
it doesn't at all modify behaviour.
Check for mbuf copy failure when there are multiple multicast sockets
This partitular case is the only path where the mbuf could be NULL.
udp_append() checked for a NULL mbuf only after invoking the tunneling
callback. Our only in tree tunneling callback - SCTP - assumed a non
NULL mbuf, and it is a bit odd to make the callbacks responsible for
checking this condition.
This also reduces the differences between the IPv4 and IPv6 code.
The M_FLOWID flag should be propagated to the new mbuf pkthdr in
m_move_pkthdr() and m_dup_pkthdr(). The new mbuf already got the
existing flowid value, but would be ignored since the flag was
not set.
Fix draining in ttydev_leave():
1. ERESTART is not only returned when the revoke count changed. It
is also returned when a signal is received. While a change in
the revoke count should be ignored, a signal should not.
2. Waiting until the output queue is entirely drained can cause a
hang when the underlying device is stuck or broken.
Have tty_drain() take care of this by telling it when we're leaving.
When leaving, tty_drain() will use a timed wait to address point 2
above and it will check the revoke count to handle point 1 above.
The timeout is set to 1 second, which is arbitrary and long enough
to expect a change in the output queue.
Properly NUL-terminate the on-stack buffer for reading /boot.config
or /boot/config. In qemu, on a warm boot, the stack is not all zeroes
and we parse beyond the file's contents.
John Baldwin [Wed, 8 Oct 2014 21:53:24 +0000 (21:53 +0000)]
Rewrite timeout(9) to be callout(9)-centric instead. Move the description
of timeout(9) to the end and mark it prominently as deprecated. Document
somewhat how times are specified for the 'sbt' variants. Better explain
how using callout_init_*() to associate a lock with a callout resolves
common races.
When tunneling interface is going to insert mbuf into netisr queue after stripping
outer header, consider it as new packet and clear the protocols flags.
This fixes problems when IPSEC traffic goes through various tunnels and router
doesn't send ICMP/ICMPv6 errors.
Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature. Some recent Intel GPUs in some modes are not
coherent, and dirty lines in CPU cache must be flushed before the
pages are transferred to GPU domain.
Reviewed by: alc (previous version)
Tested by: pho (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
John Baldwin [Wed, 8 Oct 2014 16:22:59 +0000 (16:22 +0000)]
Add schedgraph traces for callout handlers. Specifically, a callwheel logs
a running event each time it executes a callout function. The event
includes the function pointer, argument, and whether or not it was run from
hardware interrupt context. The callwheel is marked idle when each handler
completes. This effectively logs the duration of each callout routine in
the graph.
Michael Tuexen [Wed, 8 Oct 2014 15:30:59 +0000 (15:30 +0000)]
Ensure that the list of streams sent in a stream reset parameter fits
in an mbuf-cluster.
Thanks to Peter Bostroem for drawing my attention to this part of the code.
Michael Tuexen [Wed, 8 Oct 2014 15:29:49 +0000 (15:29 +0000)]
Ensure that the number of stream reported in srs_number_streams is
consistent with the amount of data provided in the SCTP_RESET_STREAMS
socket option.
Thanks to Peter Bostroem from Google for drawing my attention to
this part of the code.
Kashyap D Desai [Wed, 8 Oct 2014 09:37:47 +0000 (09:37 +0000)]
In the passthru IOCTL path, the mfi command pool was freely accessible N times
where as there are limited number(32) of mfi commands in the pool.
The mfi command pool is now restricted to 27 simultaneous accesses by using
a counting semaphore while calling the passthru function.
In the mrsas_cam.c source file there was a same function name mrsas_poll(),
which was same as the mrsas_poll() implemented in the mrsas.c file for the
polling interface.
To clearly distinguish the functionality by usage we have renamed the former
as mrsas_cam_poll().
In the passthru function let's say it has got an mfi command from the pool
but it has failed in one of the DMA function call which will lead to leak
an mfi command because in the ERROR case it directly returns and not freeing up
the occupied mfi command.
Kashyap D Desai [Wed, 8 Oct 2014 09:35:52 +0000 (09:35 +0000)]
d_poll() callback function is the entry point for poll system call for the application.
It is meant to notify the applications which will be waiting for some
controller events to be occured.
Kashyap D Desai [Wed, 8 Oct 2014 09:34:25 +0000 (09:34 +0000)]
Extended MSI-x vectors support for Invader and Fury(12Gb/s HBA).
This Driver will create multiple MSI-x vector depending upon what FW expose.
As of now 12 Gbp/s MR controller (Invader and Fury) expose 96 msix vector.
As of now 6 Gbp/s MR controller (Thunderbolt) expose 16 msix vector.
Kashyap D Desai [Wed, 8 Oct 2014 09:19:35 +0000 (09:19 +0000)]
This is a feature provided to run 32-bit linux binaries on FreeBSD 64bit
machine, for which 32bit compatibilty code has been added.
As in linux there is only one device entry that is used to fire IOCTL commands,
a new device entry megaraid_sas_ioctl_node is added for solely this
purpose.
From one dev node i.e mrgaraid_sa_ioctl_node we have to find out the
controller instance in case of multicontroller, for which one management info
structure has been added.
Kashyap D Desai [Wed, 8 Oct 2014 08:48:18 +0000 (08:48 +0000)]
Current MegaRAID firmware and hence the driver only supported 64VDs.
E.g: If the user wants to create more than 64VD on a controller,
it is not possible on current firmware/driver.
New feature and requirement to support upto 256VD, firmware/driver/apps need changes.
In addition to that, there must be a backward compatibility of the new driver with the
older firmware and vice versa.
RAID map is the interface between Driver and FW to fetch all required
fields(attributes) for each Virtual Drives.
In the earlier design driver was using the FW copy of RAID map where as
in the new design the Driver will keep the RAID map copy of its own; on which
it will operate for any raid map access in fast path.
Local driver raid map copy will provide ease of access through out the code
and provide generic interface for future FW raid map changes.
For the backward compatibility driver will notify FW that it supports 256VD
to the FW in driver capability field.
Based on the controller properly returned by the FW, the Driver will know
whether it supports 256VD or not and will copy the RAID map accordingly.
At any given time, driver will always have old or new Raid map.
Reviewed by : ambrisko
MFC after : 2 weeks
Sponsored by: AVAGO Technologies
Pyun YongHyeon [Wed, 8 Oct 2014 05:47:01 +0000 (05:47 +0000)]
Add support for QAC AR816x/AR817x Gigabit/Fast Ethernet controllers.
These controllers seem to have the same feature of AR813x/AR815x and
improved RSS support(4 TX queues and 8 RX queues). alc(4) supports
all hardware features except RSS. I didn't implement RX checksum
offloading for AR816x/AR817x just because I couldn't get
confirmation from the Vendor whether AR816x/AR817x corrected its
predecessor's RX checksum offloading bug on fragmented packets.
This change adds supports for the following controllers.
o AR8161 PCIe Gigabit Ethernet controller
o AR8162 PCIe Fast Ethernet controller
o AR8171 PCIe Gigabit Ethernet controller
o AR8172 PCIe Fast Ethernet controller
o Killer E2200 Gigabit Ethernet controller
Tested by: Many
Relnotes: yes
MFC after: 2 weeks
HW donated by: Qualcomm Atheros Communications, Inc.
Pyun YongHyeon [Wed, 8 Oct 2014 05:34:39 +0000 (05:34 +0000)]
Add new quirk PCI_QUIRK_MSI_INTX_BUG to pci(4).
QAC AR816x/E2200 controller has a silicon bug that MSI interrupt
does not assert if PCIM_CMD_INTxDIS bit of command register is set.
Pyun YongHyeon [Wed, 8 Oct 2014 01:03:32 +0000 (01:03 +0000)]
Fix a long standing bug in MAC statistics register access. One
additional register was erroneously added in the MAC register set
such that 7 TX statistics counters were wrong.
Sean Bruno [Tue, 7 Oct 2014 21:50:28 +0000 (21:50 +0000)]
Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources. If we encounter a network where ICMP is blocked
the Needs Frag indicator may not propagate back to us. Attempt to
downshift the mss once to a preconfigured value.
Default this feature to off for now while we do not have a full PLPMTUD
implementation in our stack.
Adds the following new sysctl's for control:
net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature
net.inet.tcp.pmtud_blackhole_mss -- mss to try for ipv4
net.inet.tcp.v6pmtud_blackhole_mss -- mss to try for ipv6
Adds the following new sysctl's for monitoring:
-- Number of times the code was activated to attempt a mss downshift
net.inet.tcp.pmtud_blackhole_activated
-- Number of times the blackhole mss was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_min_activated
-- Number of times that we failed to connect after we downshifted the mss
net.inet.tcp.pmtud_blackhole_failed
Gavin Atkinson [Tue, 7 Oct 2014 19:07:50 +0000 (19:07 +0000)]
Support the Vodafone R215 LET USB dongle, which is apparently a rebadged
E5372 with different product IDs.
Interestingly, the standard E5372 IDs (12d1:1506) are currently listed in
u3g.c and are the same as the E3131. However, the R215/E5372 is an NCM
device and works well with cdce(4) whereas the E3131 isn't. More work
may be needed to better identify the other device IDs.
Bjoern A. Zeeb [Tue, 7 Oct 2014 18:00:34 +0000 (18:00 +0000)]
Since introducing the extra mapping in r250103 for architectural performance
events we have actually counted 'Branch Instruction Retired' when people
asked for 'Unhalted core cycles' using the 'unhalted-core-cycles' event mask
mnemonic.
Andriy Gapon [Tue, 7 Oct 2014 16:08:21 +0000 (16:08 +0000)]
l2arc_write_buffers: reduce headroom value
FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS
data lists (currently both are 16) while illumos has just a single list
of each kind.
headroom determines how much data is scanned on a single list
during each run of the l2arc feed thread.
Because FreeBSD has more lists we proportionally decrease the limit.
Andriy Gapon [Tue, 7 Oct 2014 14:30:24 +0000 (14:30 +0000)]
reduce L2ARC_WRITE_SIZE on FreeBSD
FreeBSD has ARC_BUFC_NUMMETADATALISTS metadata lists and ARC_BUFC_NUMDATALISTS
data lists (currently both are 16) while illumos has just a single list
of each kind.
L2ARC_WRITE_SIZE determines the default value of l2arc_write_max which
defines limits on how much data is scanned and written to a cache device
during each run of the l2arc feed thread. The limits are applied on the
per buffer list basis.
Because FreeBSD has more lists we proportionally reduce the limits.