yongari [Tue, 1 Mar 2011 00:04:34 +0000 (00:04 +0000)]
MFC r218289:
Disable TX IP checksum offloading for RTL8168C controllers. The
controller in question generates frames with bad IP checksum value
if packets contain IP options. For instance, packets generated by
ping(8) with record route option have wrong IP checksum value. The
controller correctly computes checksum for normal TCP/UDP packets
though.
There are two known RTL8168/8111C variants in market and the issue
I observed happened on RL_HWREV_8168C_SPIN2. I'm not sure
RL_HWREV_8168C also has the same issue but it would be better to
assume it has the same issue since they shall share same core.
RTL8102E which is supposed to be released at the time of
RTL8168/8111C announcement does not have the issue.
Tested by: Konstantin V. Krotov ( kkv <> insysnet dot ru )
yongari [Tue, 1 Mar 2011 00:01:34 +0000 (00:01 +0000)]
MFC r217911:
Add support for RTL8105E PCIe Fast Ethernet controller. It seems
the controller has a kind of embedded controller/memory and vendor
applies a large set of magic code via undocumented PHY registers in
device initialization stage. I guess it's a firmware image for the
embedded controller in RTL8105E since the code is too big compared
to other DSP fixups. However I have no idea what that magic code
does and what's purpose of the embedded controller. Fortunately
driver seems to still work without loading the firmware.
While I'm here change device description of RTL810xE controller.
yongari [Mon, 28 Feb 2011 23:41:27 +0000 (23:41 +0000)]
MFC r217902:
Do not use interrupt taskqueue on controllers with MSI/MSI-X
capability. One of reason using interrupt taskqueue in re(4) was
to reduce number of TX/RX interrupts under load because re(4)
controllers have no good TX/RX interrupt moderation mechanism.
Basic TX interrupt moderation is done by hardware for most
controllers but RX interrupt moderation through undocumented
register showed poor RX performance so it was disabled in r215025.
Using taskqueue to handle RX interrupt greatly reduced number of
interrupts but re(4) consumed all available CPU cycles to run the
taskqueue under high TX/RX network load. This can happen even with
RTL810x fast ethernet controller and I believe this is not
acceptable for most systems.
To mitigate the issue, use one-shot timer register to moderate RX
interrupts. The timer register provides programmable one-shot timer
and can be used to suppress interrupt generation. The timer runs at
125MHZ on PCIe controllers so the minimum time allowed for the
timer is 8ns. Data sheet says the register is 32 bits but
experimentation shows only lower 13 bits are valid so maximum time
that can be programmed is 65.528us. This yields theoretical maximum
number of RX interrupts that could be generated per second is about
15260. Combined with TX completion interrupts re(4) shall generate
less than 20k interrupts. This number is still slightly high
compared to other intelligent ethernet controllers but system is
very responsive even under high network load.
Introduce sysctl variable dev.re.%d.int_rx_mod that controls amount
of time to delay RX interrupt processing in units of us. Value 0
completely disables RX interrupt moderation. To provide old
behavior for controllers that have MSI/MSI-X capability, introduce
a new tunable hw.re.intr_filter. If the tunable is set to non-zero
value, driver will use interrupt taskqueue. The default value of
the tunable is 0. This tunable has no effect on controllers that
has no MSI/MSI-X capability or if MSI/MSI-X is explicitly disabled
by administrator.
While I'm here cleanup interrupt setup/teardown since re(4) uses
single MSI/MSI-X message at this moment.
rwatson [Mon, 28 Feb 2011 23:28:35 +0000 (23:28 +0000)]
Merge userspace DTrace support from head to stable/8:
r209721:
Merge from vendor-sys/opensolaris:
* add fasttrap files
r209731:
Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user
segment descriptor hi and lo values. Idea from Solaris.
Reviewed by: kib
r209763:
Fix style issues with the previous commit, namely
use-tab-instead-of-space and don't use underscores in macro variables.
Pointed out by: bde
r210292:
Fix typo in comment.
r210357:
MFamd64:
Add USD_GETBASE(), USD_SETBASE(), USD_GETLIMIT() and USD_SETLIMIT().
r210611:
Bump the witness pendlist to 768 to accomodate the increased number of
spinlocks.
r211553:
Add sysname to struct opensolaris_utsname. This is needed by one DTrace
test.
r211566:
Add a sysname char * to struct opensolaris_utsname.
r211606:
Add the FreeBSD definition for the fasttrap ioctls.
r211607:
Add a function compatibility function dtrace_instr_size_isa() that on
FreeBSD does the same as dtrace_dis_isize().
r211608:
Kernel DTrace support for:
o uregs (sson@)
o ustack (sson@)
o /dev/dtrace/helper device (needed for USDT probes)
r211610:
Add more compatibility structure members needed by the upcoming fasttrap
DTrace device.
r211611:
Destroy the helper device when unloading.
r211613:
Fix style issues.
r211614:
Bump KDTRACE_THREAD_ZERO and use M_ZERO as a malloc flag instead of
calling bzero.
r211615:
Remove an elif and add an or-clause.
r211616:
Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.
Add userland SDT probes definitions to sys/sdt.h.
r211617:
Call the systrace_probe_func() when the error value.
r211618:
Port this to FreeBSD. We miss some suword functions, so we use copyout.
r211738:
Port the fasttrap provider to FreeBSD. This provider is responsible for
injecting debugging probes in the userland programs and is the basis for
the pid provider and the usdt provider.
r211744:
MD fasttrap implementation.
r211745:
Replace a pksignal() call with tdksignal().
Pointed out by: kib
r211746:
Update for the recent location of the fasttrap code.
r211747:
Replace structure assignments with explicity memcpy calls. This allows
Clang to compile this file: it was using the builtin memcpy and we want
to use the memcpy defined in gptboot.c. (Clang can't compile boot2 yet).
Submitted by: Dimitry Andric <dimitry at andric.com>
Reviewed by: jhb
r211751:
Add a trap code for DTrace induced traps.
r211752:
Add two DTrace trap type values. Used by fasttrap.
r211753:
Enable fasttrap and make dtraceall depend on fasttrap when building i386
or amd64.
r211804:
Call the necessary DTrace function pointers when we have different kinds
of traps.
r211813:
Add the necessary DTrace function pointers.
r211839:
Sync DTrace bits with amd64 and fix the build.
r211924:
Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done on a sequent commit.
r211925:
Replace a memory barrier with a mutex barrier.
r211926:
Add the path necessary to find fasttrap_isa.h to CFLAGS.
r211929:
Remove debugging.
r212004:
When DTrace is enabled, make sure we don't overwrite the IDT_DTRACE_RET
entry with an IRQ for some hardware component.
Reviewed by: jhb
r212093:
Make the /dev/dtrace/helper node have the mode 0660. This allows
programs that refuse to run as root (pgsql) to install probes when their
user is part of the wheel group.
r212357:
Fix two bugs in DTrace:
* when the process exits, remove the associated USDT probes
* when the process forks, duplicate the USDT probes.
r212465:
Avoid a LOR (sleepable after non-sleepable) in
fasttrap_tracepoint_enable().
r212494:
Revamp locking a bit. This fixes three problems:
* processes now can't go away while we are inserting probes (fixes a panic)
* if a trap happens, we won't be holding the process lock (fixes a hang)
* fix a LOR between the process lock and the fasttrap bucket list lock
Thanks to kib for pointing some problems.
r212568:
Bump __FreeBSD_version to reflect the userland DTrace changes
Sponsored by: The FreeBSD Foundation
Userspace DTrace work by: rpaulo
yongari [Mon, 28 Feb 2011 21:21:24 +0000 (21:21 +0000)]
MFC r217857:
Prefer MSI-X to MSI on controllers that support MSI-X. All
recent PCIe controllers(RTL8102E or later and RTL8168/8111C or
later) supports either 2 or 4 MSI-X messages. Unfortunately vendor
did not publicly release RSS related information yet. However
switching to MSI-X is one-step forward to support RSS.
ken [Mon, 28 Feb 2011 16:39:15 +0000 (16:39 +0000)]
MFC: r219036
Silence 'out of chain frames' warnings and bump the number of frames.
mps.c: Hide the 'out of chain frames' warning behind MPS_INFO.
mps_sas.c: Hide the SIM queue freeze/unfreeze messages behind MPS_INFO.
mpsvar.h: Bump the number of chain frames from 1024 to 2048. From
testing, it looks like this makes it less likely that we'll
run out of chain frames, and it doesn't cost much memory
(32K).
alc [Sat, 26 Feb 2011 21:27:41 +0000 (21:27 +0000)]
MFC r217453
For some time now, the kernel and kmem objects have been ordinary
OBJT_PHYS objects. Thus, there is no need for handling them specially
in vm_fault(). In fact, this special case handling would have led to
an assertion failure just before the call to pmap_enter().
alc [Sat, 26 Feb 2011 21:08:09 +0000 (21:08 +0000)]
MFC r216090
Correct an error in the allocation of the vm_page_dump array in
vm_page_startup(). Specifically, the dump_avail array should be used
instead of the phys_avail array to calculate the size of vm_page_dump.
ume [Sat, 26 Feb 2011 20:04:14 +0000 (20:04 +0000)]
MFC r217642:
- Hide the internal scope address representation of the KAME IPv6
stack from the output of `netstat -ani'.
- The node-local multicast address in the output of `netstat -rn'
should be handled as well.
brucec [Sat, 26 Feb 2011 11:46:06 +0000 (11:46 +0000)]
MFC r218985:
Use the cprd_mem field when setting the start and length for a memory
resource - the layout of cprd_port is identical but using cprd_mem
makes the code easier to understand.
PR: kern/118493
Submitted by: Weongyo Jeong <weongyo.jeong at gmail.com>
netchild [Fri, 25 Feb 2011 15:32:44 +0000 (15:32 +0000)]
MFC r216591:
Suggest to run the delete-old target after the second mergemaster. If you run
it before, your rc scripts may still reference old files/directories and
if you are in the unlucky situation to have triggered a reboot (intentionally
or not) between the delete-old run and the mergemaster, your system may not
start anymore.
While I'm here, give a hint about delete-old-libs.
Noticed by: bcr (luckily in a discussion and not by getting hit by
this)
brucec [Thu, 24 Feb 2011 11:03:16 +0000 (11:03 +0000)]
MFC r218910:
The FD_FORM ioctl used to ignore errors from the floppy controller; now when
it encounters an error it returns an error from the ioctl.
Ignore any errors when using the FD_FORM ioctl.
hrs [Wed, 23 Feb 2011 19:07:50 +0000 (19:07 +0000)]
Import and update relnotes items for 8.2R:
fix SA table's cell width[1],
alq(4) improvement in details, TCP reassembly improved[2],
xz rewording[3], and
various grammer fixes[4].
yongari [Tue, 22 Feb 2011 21:24:36 +0000 (21:24 +0000)]
MFC r218710:
Fix a regression introduced in r215906. The change made in r215906
caused link re-negotiation whenever application joins or leaves a
multicast group. If driver is running, it would have established a
link so there is no need to start re-negotiation. The re-negotiation
broke established link which in turn stopped multicast application
working while re-negotiation is in progress.
brucec [Tue, 22 Feb 2011 17:37:13 +0000 (17:37 +0000)]
MFC r218839:
In the distribution list, 'A' is listed as the key to press to select both
'All' and 'Minimal'. Update the keys for Minimal and Custom to avoid the
conflict.
PR: bin/153809
Submitted by: Janne Snabb <snabb at epipe.com>
ken [Mon, 21 Feb 2011 18:11:56 +0000 (18:11 +0000)]
MFC: r218812
Fix several issues with the mps(4) driver.
When the driver ran out of DMA chaining buffers, it kept the timeout for
the I/O, and I/O would stall.
The driver was not freezing the device queue on errors.
mps.c: Pull command completion logic into a separate
function, and call the callback/wakeup for commands
that are never sent due to lack of chain buffers.
Add a number of extra diagnostic sysctl variables.
Handle pre-hardware errors for configuration I/O.
This doesn't panic the system, but it will fail the
configuration I/O and there is no retry mechanism.
So the device probe will not succeed. This should
be a very uncommon situation, however.
mps_sas.c: Freeze the SIM queue when we run out of chain
buffers, and unfreeze it when more commands
complete.
Freeze the device queue when errors occur, so that
CAM can insure proper command ordering.
Report pre-hardware errors for task management
commands. In general, that shouldn't be possible
because task management commands don't have S/G
lists, and that is currently the only error path
before we get to the hardware.
Handle pre-hardware errors (like out of chain
elements) for SMP requests. That shouldn't happen
either, since we should have enough space for two
S/G elements in the standard request.
For commands that end with
MPI2_IOCSTATUS_SCSI_IOC_TERMINATED and
MPI2_IOCSTATUS_SCSI_EXT_TERMINATED, return them
with CAM_REQUEUE_REQ to retry them unconditionally.
These seem to be related to back end, transport
related problems that are hopefully transient. We
don't want to go through the retry count for
something that is not a permanent error.
Keep track of the number of outstanding I/Os.
mpsvar.h: Track the number of free chain elements.
Add variables for the number of outstanding I/Os,
and I/O high water mark.
Add variables to track the number of free chain
buffers and the chain low water mark, as well as
the number of chain allocation failures.
ken [Mon, 21 Feb 2011 16:55:53 +0000 (16:55 +0000)]
MFC: r218811:
In the MPS driver, during device removal processing, don't assume that
the controller firmware will return all of our commands. Instead, keep
track of outstanding I/Os and return them to CAM once device removal
processing completes.
mpsvar.h: Declare the new "io_list" in the mps_softc.
mps.c: Initialize the new "io_list" in the mps softc.
mps_sas.c: o Track SCSI I/O requests on the io_list from the
time of mpssas_action() through mpssas_scsiio_complete().
o Zero out the request structures used for device
removal commands prior to filling them out.
o Once the target reset task management function completes
during device removal processing, assume any SCSI I/O
commands that are still oustanding will never return
from the controller, and process them manually.
yongari [Mon, 21 Feb 2011 01:19:09 +0000 (01:19 +0000)]
MFC r217524,217766:
Change model names of controller RTL_HWREV_8168_SPIN[123] to real ones.
s/RL_HWREV_8168_SPIN1/RL_HWREV_8168B_SPIN1/g
s/RL_HWREV_8168_SPIN2/RL_HWREV_8168B_SPIN2/g
s/RL_HWREV_8168_SPIN3/RL_HWREV_8168B_SPIN3/g
No functional changes.
r217766:
Apply TX interrupt moderation to all RTL810xE PCIe Fast Ethernet
controllers. Experimentation with RTL8102E, RTL8103E and RTL8105E
showed dramatic decrement of TX completion interrupts under high TX
load(e.g. from 147k interrupts/second to 10k interrupts/second)
With this change, TX interrupt moderation is applied to all
controllers except RTL8139C+.
yongari [Mon, 21 Feb 2011 01:08:13 +0000 (01:08 +0000)]
MFC r217499:
Implement initial jumbo frame support for RTL8168/8111 C/D/E PCIe
GbE controllers. It seems these controllers no longer support
multi-fragmented RX buffers such that driver have to allocate
physically contiguous buffers.
o Retire RL_FLAG_NOJUMBO flag and introduce RL_FLAG_JUMBOV2 to
mark controllers that use new jumbo frame scheme.
o Configure PCIe max read request size to 4096 for standard frames
and reduce it to 512 for jumbo frames.
o TSO/checksum offloading is not supported for jumbo frames on
these controllers. Reflect it to ioctl handler and driver
initialization.
o Remove unused rl_stats_no_timeout in softc.
o Embed a pointer to structure rl_hwrev into softc to keep track
of controller MTU limitation and remove rl_hwrev in softc since
that information is available through a pointer to structure
rl_hwrev.
Special thanks to Realtek for donating sample hardwares which made
this possible.
yongari [Mon, 21 Feb 2011 00:58:50 +0000 (00:58 +0000)]
MFC r217247,217381-217382,217384-217385:
r217247:
When driver is not running, do not send DUMP command to controller
and just show old (cached) values. Controller will not respond to
the command unless MAC is enabled so DUMP request for down
interface caused request timeout.
r217381:
Allow TX/RX checksum offloading to be configured independently.
r217382:
re_reset() should be called only after setting device specific
features.
r217384:
Make sure to check validity of dma maps before destroying.
r217385:
If driver is not able to allocate RX buffer, do not start driver.
While I'm here move RX buffer allocation and descriptor
initialization up to not touch hardware registers in case of RX
buffer allocation failure.
yongari [Mon, 21 Feb 2011 00:47:39 +0000 (00:47 +0000)]
MFC r217246,217832:
r217246:
Implement TSO on RealTek RTL8168/8111 C or later controllers.
RealTek changed TX descriptor format for later controllers so these
controllers require MSS configuration in different location of TX
descriptor. TSO is enabled by default for controllers that use new
descriptor format.
For old controllers, TSO is still disabled by default due to broken
frames under certain conditions but users can enable it.
Special thanks to Hayes Wang at RealTek.
r217832:
Disable TSO for all Realtek controllers. Experimentation showed
RTL8111C generated corrupted frames where TCP option header was
broken. All other sample controllers I have did not show such
problem so it could be RTL8111C specific issue. Because there are
too many variants it's hard to tell how many controllers have such
issue. Just disable TSO by default but have user override it.
yongari [Sun, 20 Feb 2011 01:24:59 +0000 (01:24 +0000)]
MFC r215432:
MCP55 is the only NVIDIA controller that supports VLAN tag
insertion/stripping and it also supports TSO over VLAN. Implement
TSO over VLAN support for MCP55 controller.
While I'm here clean up SIOCSIFCAP ioctl handler. Since nfe(4)
sets ifp capabilities based on various hardware flags in device
attach, there is no need to check hardware flags again in
SIOCSIFCAP ioctl handler. Also fix a bug which toggled both TX and
RX checksum offloading even if user requested either TX or RX
checksum configuration change.
Tested by: Rob Farmer ( rfarmer <> predatorlabs dot net )
yongari [Sun, 20 Feb 2011 01:15:26 +0000 (01:15 +0000)]
MFC r218141:
alc_rev was used without initialization such that it failed to
apply AR8152 v1.0 specific initialization code. Fix this bug by
explicitly reading PCI device revision id via PCI accessor.
Reported by: Gabriel Linder ( linder.gabriel <> gmail dot com )
yongari [Sun, 20 Feb 2011 01:02:11 +0000 (01:02 +0000)]
MFC r217353:
- Add a locked variant of jme_start() and invoke it directly while holding
the lock instead of queueing it to a task.
- Do not invoke jme_rxintr() to reclaim any unprocessed but received
packets when shutting down the interface. Instead, just drop these
packets to match the behavior of other drivers.
- Hold the driver lock in the interrupt handler to avoid races with
ioctl requests to down the interface.
kib [Fri, 18 Feb 2011 09:47:58 +0000 (09:47 +0000)]
MFC r218550:
For UIO_NOCOPY case of reading request on zfs vnode, which has vm object
attached, activate the page after the successful read, and free the page
if read was unsuccessfull.
brucec [Wed, 16 Feb 2011 21:41:44 +0000 (21:41 +0000)]
MFC r218620:
If the pf.conf(5) example file is copied when setting up a firewall it's
easy to forget about icmp. Update the file to show allowing icmp through
the firewall.
zack [Tue, 15 Feb 2011 20:53:01 +0000 (20:53 +0000)]
MFC: 217336
In the experimental NFS server, when converting an open-owner to a lock-owner,
start at sequence id 1 instead of 0, to match up with both Solaris and Linux.
keramida [Tue, 15 Feb 2011 06:33:35 +0000 (06:33 +0000)]
MFC 217746 from /head/usr.bin/top
Touch up the sample memory usage numbers a bit, to avoid wrapping
on terminal boundary. While here add definition for 'G' and fix
the indentation of 'K' units.
bz [Mon, 14 Feb 2011 16:54:03 +0000 (16:54 +0000)]
MFC r216466:
Bring back (most of) NATM to avoid further bitrot after r186119.
Keep three lines disabled which I am unsure if they had been used at all.
This will allow us to seek testers and possibly bring it all back.
ed [Sun, 13 Feb 2011 19:37:05 +0000 (19:37 +0000)]
Partially merge a change made in r200166.
It seems the utmpx fixes for who(1) also contained a small change to
make it work properly with the pts/%u naming. Unfortunately, this change
was never merged to FreeBSD 8. Properly remove the /dev/ part of the TTY
name instead of stripping until the last /.
Reported by: Eivind E <eivinde terraplane org>
Tested by: uqs@
jpaetzel [Sun, 13 Feb 2011 15:15:47 +0000 (15:15 +0000)]
MFC 218523:
Netgear renamed the WG311 to the WG311v1 after they released a second
version of it. There is also a WG311v3 which uses a chipset covered by
malo(4). Along the way add the WG311T to the list which is also an
atheros chipset.
simon [Sun, 13 Feb 2011 10:22:43 +0000 (10:22 +0000)]
MFC 218625:
Fix Incorrectly formatted ClientHello SSL/TLS handshake messages could
cause OpenSSL to parse past the end of the message.
Note: Applications are only affected if they act as a server and call
SSL_CTX_set_tlsext_status_cb on the server's SSL_CTX. This includes
Apache httpd >= 2.3.3, if configured with "SSLUseStapling On".
The very quick MFC is done to get this fix into 7.4 / 8.2.
Discussed with: re
Approved by: so (simon, for "instant" MFC)
Obtained from: OpenSSL CVS
Security: http://www.openssl.org/news/secadv_20110208.txt
Security: CVE-2011-0014
ae [Fri, 11 Feb 2011 05:56:14 +0000 (05:56 +0000)]
MFC r218278:
vdev's sectorsize should not be greater than 8 Kbytes and also
it should be power of 2. This prevents non-aligned access while
probing vdev's labels.
ae [Fri, 11 Feb 2011 05:37:05 +0000 (05:37 +0000)]
MFC r218014:
Add new user-friendly aliases for partition types for the MBR and
EBR schemes: fat32, ebr, linux-data, linux-raid, linux-swap and
linux-lvm. Add bios-boot GUID and alias for the GPT scheme. It used by
GRUB 2 loader. Also do sorting definitions of types in diskmbr.h
and in g_part.c.
marius [Tue, 8 Feb 2011 22:08:00 +0000 (22:08 +0000)]
MFC: r216961
Reserve INTR_MD[1-4] similarly to what BUS_DMA_BUS[1-4] are intended for
and switch sparc64 to use the first one for bus error filter handlers of
bridge drivers instead of (ab)using INTR_FAST for that so we eventually
can get rid of the latter.