New netgraph node ng_patch(4). It performs data modification of packets
passing through. Modifications are restricted to a subset of C language
operations on unsigned integers of 8, 16, 32 or 64 bit size.
These are: set to new value (=), addition (+=), subtraction (-=),
multiplication (*=), division (/=), negation (= -), bitwise AND (&=),
bitwise OR (|=), bitwise eXclusive OR (^=), shift left (<<=),
shift right (>>=). Several operations are all applied to a packet
sequentially in order they were specified by user.
Matt Jacob [Tue, 8 Jun 2010 22:46:44 +0000 (22:46 +0000)]
Rearrange how things are done to avoid dereferencing stale pointers in
the case of immediate unconfigure after configure. Hold the periph an
extra count while we have the task to create sysctl context outstanding
so that the periph doesn't go away unexpectedly.
Reorganize the code in bdwrite() which handles move of dirtiness
from the buffer pages to buffer. Combine the code to set buffer
dirty range (previously in vfs_setdirty()) and to clean the pages
(vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now
the vm object lock is acquired only once.
Drain the VPO_BUSY bit of the buffer pages before setting valid
and clean bits in vfs_clean_pages_dirty_buf() with new helper
vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not
busy.
In vfs_busy_pages(), move the wait for draining of VPO_BUSY before
the dirtyness handling, to follow the structure of
vfs_clean_pages_dirty_buf().
Reported and tested by: pho
Suggested and reviewed by: alc
MFC after: 2 weeks
John Baldwin [Tue, 8 Jun 2010 17:08:13 +0000 (17:08 +0000)]
- Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers. This
closes a race where ioapic_program_intpin() could use a stale value of
the masked state to compute the masked bit in the register.
John Baldwin [Tue, 8 Jun 2010 16:17:47 +0000 (16:17 +0000)]
Fix a sign bug that caused adaptive spinning in sx_xlock() to not work
properly. Among other things it did not drop Giant while spinning
leading to livelocks.
Reviewed by: rookie, kib, jmallett
MFC after: 3 days
Matt Jacob [Tue, 8 Jun 2010 16:17:25 +0000 (16:17 +0000)]
Implement the usage of Report Luns as part of SCSI probing for SCP3 or
better devices. This can be disabled on a per-device basis using quirks as
well.
This also handles the case where there is actually no connected LUN 0
(which can definitely be the case for storage arrays).
Randall Stewart [Tue, 8 Jun 2010 03:39:31 +0000 (03:39 +0000)]
2 Bugs:
1) Only use both mapping arrays when NR sack is off. This
way we can hold off moving the cumack (not the best but
workable) when NR-sack is on.
2) We must make sure to just return on the move of the
bit to the NR array if the cum-ack as already went
past the TSN. This prevents marking a bit behind the
array and hitting the invariant code that panic's us.
A number of netfront fixes and stability improvements:
- Re-enable TSO. This was broken previously due to CSUM_TSO clearing the
CSUM_TCP flag, so our checksum flags were incorrectly set going to the
netback driver. That was fixed in r206844 in tcp_output.c, so we can
turn TSO back on here.
- Fix the way transmit slots are calculated, so that we can't overfill
the ring.
- Avoid sending packets with more fragments/segments than netback can
handle. The Linux netback code can only handle packets of
MAX_SKB_FRAGS, which turns out to be 18 on machines with 4K pages. We
can easily generate packets with 32 or so fragments with TSO turned on.
Right now the solution is just to drop the packets (since netback
doesn't seem to handle it gracefully), but we should come up with a way
to allow a driver to tell the TCP stack the maximum number of fragments
it can handle in a single packet.
- Fix the way the consumer is tracked in the receive path. It could get
out of sync fairly easily.
- Use standard Xen ring macros to make it clearer how netfront is using
the rings.
- Get rid of Linux-ish negative errno return values.
Randall Stewart [Mon, 7 Jun 2010 18:29:10 +0000 (18:29 +0000)]
This fixes a BUG in the handling of the cum-ack calculation.
We were only paying attention to the nr-mapping-array. Which
seems to make sense on the surface, by definition things
up to the cum-ack should be deliverable thus in the nr-mapping-array.
However (there is always a gotcha) thats not true when it
comes to large messages. The stack may hold the message
while re-assembling it not not deliver it based on several
thresholds. If that happens (which it would for smaller
large messages) then the cum-ack is figured wrong. We
now properly use both arrays in the cum-ack calculation.
Matt Jacob [Mon, 7 Jun 2010 17:39:36 +0000 (17:39 +0000)]
Fix XPT_GET_TRAN_SETTING for FC which has been broken for while so that
it will figure out the correct target to handle index and be able to find
things like WWPN, etc.
Navdeep Parhar [Mon, 7 Jun 2010 08:23:16 +0000 (08:23 +0000)]
cxgb(4): add an 'nfilters' tunable that lets the user place an upper
limit on the number of hardware filters (and thus the amount of TCAM
reserved for filtering).
Randall Stewart [Sun, 6 Jun 2010 20:34:17 +0000 (20:34 +0000)]
1) Optimize the cleanup and don't always depend on
the timer. This is done by considering the locks
we will destroy and if they are contended we consider
it the same as a reference count being up. Fixing this
appears to cleanup another crash that was appearing with
all the timers where the socket buf lock got corrupted.
2) Fix the sysctl code to take a lot more care when looking
at INP's that are in the GONE or ALLGONE state.
Randall Stewart [Sun, 6 Jun 2010 19:24:32 +0000 (19:24 +0000)]
Ok, yet another bug in killing off all the hundreds
of apitesters.. Basically we end up with attempting
to destroy a lock thats contended on. A cookie echo
arrives at the same time that the close is happening.
The close gets the lock but the cookie echo has already
passed the check for the gone flag and is then locked
waiting on the create lock.. when we go to destroy it
bam. For now we do the timer destroy for all calls
to close.. We can probably optimize this later so that
we check whats being contended on and if there is contention
then do the timer thing. but this is probably safest since
the inp has been removed from all lists and references and
only the timer can find it.. once the locks are released all
other places will instantly see the GONE flag and bail (thats
what the change in sctp_input is one place that was lacking
the bail code).
Randall Stewart [Sun, 6 Jun 2010 16:11:16 +0000 (16:11 +0000)]
1) Further enhance the INVARIANT lock validation (no locks) are
held by checking the create and inp locks as well.
2) Fix a bug in that when a socket is closed an INIT-ACK
is returned, we do NOT unlock the locked_tcb unless its
different (an unlikely scenario). If we blindly unlock as
we were doing before we can end up unlocking the actual
stcb thats about to be sent down to the free function which
requires the lock be held.
Randall Stewart [Sun, 6 Jun 2010 16:09:12 +0000 (16:09 +0000)]
Fix a bug in the sctp_inpcb_free. Basically if the socket
was setup to do an abortive close an association that was
in the accept_queue could get stuck and never freed. Now
we properly start the kill timer on the socket and turn
off the flag (same thing we do for the graceful close method).
MFC after: 1 week
Randall Stewart [Sun, 6 Jun 2010 16:07:40 +0000 (16:07 +0000)]
Fix a bug in sctp_abort_assoc(). DON'T call the sctp_inpcb_free
when the gone flag is set. You don't know what locks the
caller has set and there is already a kill timer running.
Robert Watson [Sun, 6 Jun 2010 15:27:08 +0000 (15:27 +0000)]
Rework tcpp output so that it generates a comma-delimited list of values,
optionally with a header if "-h" is passed. Toast CPU time measurement
in the server for now. Remove -C and -T, since we now always report
both connections/sec and Gb/sec.
Some revisions of the Serverworks K2 SATA controller have a data
corruption bug where if an ATA command is issued before DMA is started,
data will become available to the controller before it knows what to do
with it. This results in either data corruption or a controller crash.
This patch remedies the problem by adopting the workaround employed
by Linux and Darwin: starting the DMA engine prior to sending the ATA
command.
Alan Cox [Sun, 6 Jun 2010 06:07:44 +0000 (06:07 +0000)]
Don't set PG_WRITEABLE in init_pte_prot() (and thus pmap_enter()) unless
the page is managed.
Don't set the machine-independent layer's dirty field for the page being
mapped in init_pte_prot(). (The dirty field is only supposed to set when
a mapping is removed or write-protected and the page was managed and
modified.)
Determine whether or not to perform dirty bit emulation based on whether
or not the page is managed, i.e., pageable, not based on whether the page
is being mapped into the kernel address space. Nearly all of the kernel
address space consists of unmanaged pages, so this has neglible impact on
the overhead of dirty bit emulation for the kernel address space. However,
there can also exist unmanaged pages in the user address space. Previously,
dirty bit emulation was unnecessarily performed on these pages.
Pyun YongHyeon [Sat, 5 Jun 2010 23:29:24 +0000 (23:29 +0000)]
Fix a bug introduced in r199011. When bge(4) reuses loaded RX
buffers it should also reinitialize RX descriptors otherwise some
stale data could be passed to controller. This could end up with
mbuf double free or unexpected NULL pointer dereference in upper
stack. To fix the issue, save loaded buffer's length and
reinitialize RX descriptors with the saved value whenever bge(4)
reuses the loaded RX buffers.
While I'm here, increase the number of RX buffers to 512 from 256.
This simplifies RX buffer handling as well as giving more RX
buffers. Controller supports just fixed number of RX buffers
(i.e. 512) and bge(4) used to rely on hope that our CPU is fast
enough to keep up with the controller. With this change, bge(4)
will use 1MB for RX buffers but I don't think it would cause
problems in these days.
Randall Stewart [Sat, 5 Jun 2010 21:27:43 +0000 (21:27 +0000)]
This change does the following:
1) Fix the alignment of a comment.
2) Fix a BUG where we were NOT paying attention
to the RESEND marking on retransmitting control
chunks.. and worse we were not decrementing the
retran count that could cause us to loop forever.
3) Add in the valdiate_no_lock function on invariants
so that we will really check all ways out to be sure
a lock does not slip out locked.
Randall Stewart [Sat, 5 Jun 2010 21:20:28 +0000 (21:20 +0000)]
This does two changes:
1) Makes it so that the INVARIANT function validate nolocks is
available anywhere.
2) Fixes a BUG where a close has been done on a collision socket
and the cookie processing would return leaving a lock held.
MFC after: 1 week
Partially revert r208162 while waiting for review on a more comprehensive
fix. On Apple OpenPICs, the low/high bit of the interrupt sense is only
respected for interrupt 0. We currently erroneously program all OpenPIC
interrupts level high instead of level low by default, which only matters
for some G5 systems where the SATA controllers use IRQ 0.
This change is a quick fix that will be reverted once the effect of
changing the default interrupt sense on embedded systems is known.
Make sure that interrupt sense settings set after interrupts are enabled
are respected. This fixes loading the Apple onboard audio driver
(snd_ai2s) as a module after boot, which would previously cause a panic.
Use the fpu_kern_enter() interface to properly separate usermode FPU
context from in-kernel execution of padlock instructions and to handle
spurious FPUDNA exceptions that sometime are raised when doing padlock
calculations.
Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches. There is also a
facility to allow the kernel thread to use pcb save_area.
Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.
Don't try to copy a socket after "xxx is a socket (not copied)." message.
Previously, it would either try to copy it anyway and fail (without -R),
or create fifo instead of the socket (with -R).
Found with: Coverity Prevent
CID: 5623
MFC after: 2 weeks
Matt Jacob [Sat, 5 Jun 2010 00:55:21 +0000 (00:55 +0000)]
I was getting panics in sleepq_add for the second sleep in isp_kthread.
I don't know why- but it occurred to me in looking at the second sleep
is that all I want is a pause- not an actual sleep. So do that instead.