melifaro [Sun, 31 Mar 2013 10:17:39 +0000 (10:17 +0000)]
Merge r248070.
Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.
mckusick [Sat, 30 Mar 2013 20:57:35 +0000 (20:57 +0000)]
MFC of 246876, 246877, and 247387:
MFC reviewed by: kib
MFC 246876:
Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceeded it before any future blocks may be written to the drive.
Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).
Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensure that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.
Reviewed by: kib
Tested by: Peter Holm
MFC 246877:
The UFS2 filesystem allocates new blocks of inodes as they are needed.
When a cylinder group runs short of inodes, a new block for inodes is
allocated, zero'ed, and written to the disk. The zero'ed inodes must
be on the disk before the cylinder group can be updated to claim them.
If the cylinder group claiming the new inodes were written before the
zero'ed block of inodes, the system could crash with the filesystem in
an unrecoverable state.
Rather than adding a soft updates dependency to ensure that the new
inode block is written before it is claimed by the cylinder group
map, we just do a barrier write of the zero'ed inode block to ensure
that it will get written before the updated cylinder group map can
be written. This change should only slow down bulk loading of newly
created filesystems since that is the primary time that new inode
blocks need to be created.
Reported by: Robert Watson
Reviewed by: kib
Tested by: Peter Holm
MFC 247387:
An inode block must not be blockingly read while cg block is owned.
The order is inode buffer lock -> snaplk -> cg buffer lock, reversing
the order causes deadlocks.
Inode block must not be written while cg block buffer is owned. The
FFS copy on write needs to allocate a block to copy the content of the
inode block, and the cylinder group selected for the allocation might
be the same as the owned cg block. The reserved block detection code
in the ffs_copyonwrite() and ffs_bp_snapblk() is unable to detect the
situation, because the locked cg buffer is not exposed to it.
In order to maintain the dependency between initialized inode block
and the cg_initediblk pointer, look up the inode buffer in
non-blocking mode. If succeeded, brelse cg block, initialize the inode
block and write it. After the write is finished, reread cg block and
update the cg_initediblk.
If inode block is already locked by another thread, let the another
thread initialize it. If another thread raced with us after we
started writing inode block, the situation is detected by an update of
cg_initediblk. Note that double-initialization of the inode block is
harmless, the block cannot be used until cg_initediblk is incremented.
Sponsored by: The FreeBSD Foundation
In collaboration with: pho
Reviewed by: mckusick
X-MFC-note: after r246877
mckusick [Sat, 30 Mar 2013 00:22:26 +0000 (00:22 +0000)]
MFC of 246289:
For UFS2 i_blocks is unsigned. The current "sanity" check that it
has gone below zero after the blocks in its inode are freed is a
no-op which the compiler fails to warn about because of the use of
the DIP macro. Change the sanity check to compare the number of
blocks being freed against the value i_blocks. If the number of
blocks being freed exceeds i_blocks, just set i_blocks to zero.
tijl [Fri, 29 Mar 2013 13:23:43 +0000 (13:23 +0000)]
MFC r248256:
- Fix two possible overflows when testing if ELF program headers are on
the first page:
1. Cast uint16_t operands in a multiplication to unsigned int because
otherwise the implicit promotion to int results in a signed
multiplication that can overflow and the behaviour on integer
overflow is undefined.
2. Replace (offset + size > PAGE_SIZE) with (size > PAGE_SIZE - offset)
because the sum may overflow.
- Use the same tests to see if the path to the interpreter is on the first
page. There's no overflow here because size is already limited by
MAXPATHLEN, but the compiler optimises the new tests better. Also fix an
off-by-one error.
- Simplify tests to see if an ELF note program header is on the first page.
This also fixes an off-by-one error.
bryanv [Fri, 29 Mar 2013 02:09:46 +0000 (02:09 +0000)]
MFC 247870:
Remove the virtio dependency entry for the VirtIO device drivers. This
will prevent the kernel from linking if the device driver are included
without the virtio module. Remove pci and scbus for the same reason.
Also explain the relationship and necessity of the virtio and virtio_pci
modules. Currently in FreeBSD, we only support VirtIO PCI, but it could
be replaced with a different interface (like MMIO) and the device
(network, block, etc) will still function.
yongari [Fri, 29 Mar 2013 00:22:43 +0000 (00:22 +0000)]
MFC r248226:
r241438 broke IPMI access on Sun Fire X2200 M2(BCM5715).
Fix the IPMI regression by sending BGE_FW_DRV_STATE_UNLOAD to
ASF/IPMI firmware in driver attach phase. Sending heartheat to
ASF/IPMI is enabled only after upping interface so
setting driver state to BGE_FW_DRV_STATE_START in attach phase
broke IPMI access.
While I'm here, add NVRAM arbitration lock before performing
controller reset. ASF/IPMI firmware may be able to access the NVRAM
while controller reset is in progress. Without the arbitration
lock before resetting the controller, ASF/IPMI may not initialize
properly.
sbruno [Thu, 28 Mar 2013 17:27:46 +0000 (17:27 +0000)]
MFC r247279
The 5300 series ciss(4) board does not work in performant mode with our
currnet initialization sequence. Set it to simple mode only so that
systems can be updated from stable/7 to newer installations.
At some point, we should figure out why we cannot initialize performant
mode on this board.
gavin [Thu, 28 Mar 2013 09:03:15 +0000 (09:03 +0000)]
When r241373 was merged, one file appears to have been missed from the
commit. Merge it:
Remove undefined behavior from sranddev() and
srandomdev(). This doesn't actually work
with any modern C compiler:
In particular, both clang and modern gcc
verisons silently elide any xor operation
with 'junk'.
No mergeinfo changes with this commit as r241475 already updated the
mergeinfo.
delphij [Thu, 28 Mar 2013 05:35:46 +0000 (05:35 +0000)]
MFC r248788 (erwin):
Update BIND to 9.8.4-P2
Removed the check for regex.h in configure in order
to disable regex syntax checking, as it exposes
BIND to a critical flaw in libregex on some
platforms. [RT #32688]
gjb [Tue, 19 Mar 2013 19:49:06 +0000 (19:49 +0000)]
- Revert r248483, keeping devel/subversion as a port dependency
for the doc/ build.
- Remove logic in release/Makefile that modifies local src/ tree.[1]
Submitted by: delphij [1]
Approved by: re (jpaetzel)
delphij [Sat, 16 Mar 2013 01:16:57 +0000 (01:16 +0000)]
Diff reduction against stable/9: this is a pure mechanical comment
change that eliminates the unwanted diff caused by different
Pod::Simple version and have no runtime impact.
We do this in the hope of minimizing variants of patches that would
need to be published, should an update is required during the
remaining lifetime of the stable/8 branch.
lstewart [Mon, 11 Mar 2013 08:21:43 +0000 (08:21 +0000)]
MFC r247906:
The hashmask returned by hashinit() is a valid index in the returned
hash array. Fix a siftr(4) potential memory leak and INVARIANTS
triggered kernel panic in hashdestroy() by ensuring the last array index
in the flow counter hash table is flushed of entries.
rstone [Fri, 8 Mar 2013 21:07:31 +0000 (21:07 +0000)]
MFC r244631
Correct a series of errors in the hand-rolled locking for drace_debug.c:
- Use spinlock_enter()/spinlock_exit() to prevent a thread holding a
debug lock from being preempted to prevent other threads waiting
on that lock from starvation.
- Handle the possibility of CPU migration in between the fetch of curcpu
and the call to spinlock_enter() by saving curcpu in a local variable.
- Use memory barriers to prevent reordering of loads and stores of the
data protected by the lock outside of the critical section
- Eliminate false sharing of the locks by moving them into the structures
that they protect and aligning them to a cacheline boundary.
- Record the owning thread in the lock to make debugging future problems
easier.
rstone [Fri, 8 Mar 2013 19:36:44 +0000 (19:36 +0000)]
MFC r241009
Ensure that all cases that enqueue a netgraph item for delivery by a
ngthread properly set the item's depth to 1. In particular, prior to this
change if ng_snd_item failed to acquire a lock on a node, the item's depth
would not be set at all. This fix ensures that the error code from rcvmsg/
rcvdata is properly passed back to the apply callback. For example, this
fixes a bug where an error from rcvmsg/rcvdata would not previously
propagate back to a libnetgraph consumer when the message was queued.
rstone [Fri, 8 Mar 2013 19:02:45 +0000 (19:02 +0000)]
MFC r240923
Some aac(4) adapters will always report that a direct access device is
offline in response to a INQUIRY command that does not retreive vital
product data(I personally have observed the behaviour on an Adaptec 2405
and a 5805). Force the peripheral qualifier to "connected" so that upper
layers correctly recognize that a disk is present.
This bug was uncovered by r216236. Prior to that fix, aac(4) was
accidentally clearing the peripheral qualifier for all inquiry commands.
This fixes an issue where passthrough devices were not created for
disks behind aac(4) controllers suffering from the bug. I have
verified that if a disk is not present that we still properly detect
that and not create the passthrough device.
marius [Fri, 8 Mar 2013 12:58:25 +0000 (12:58 +0000)]
- Complete r231621 (MFC'ed to stable/8 in r232093) by also blacklisting the
bridge used by VMware for PCIe devices. While at it, update the comment
now that we know that MSI-X doesn't work with ESXi 5.1 for Intel 82576
either and the underlying issue is a bug in the MSI-X allocation code of
the hypervisor.
Reported by: Harald Schmalzbauer
- Make the nomatch table const.
marius [Fri, 8 Mar 2013 12:50:31 +0000 (12:50 +0000)]
MFC: r247600, r247620
- While Netra X1 generally show no ill effects when registering a power
fail interrupt handler, there seems to be either a broken batch of them
or a tendency to develop a defect which causes this interrupt to fire
inadvertedly. Given that apart from this problem these machines work
just fine, add a tunable allowing the setup of the power fail interrupt
to be disabled.
While at it, remove the DEBUGGER_ON_POWERFAIL compile time option and
make that behavior also selectable via the newly added tunable.
- Correct an incorrect argument to shutdown_nice(9).
- Mark unused parameters as such.
- Use NULL instead of 0 for pointers.
marius [Fri, 8 Mar 2013 12:44:40 +0000 (12:44 +0000)]
MFC: r223951
Partially merge r223648 and r223949 from gem(4) (committed to stable/8 in
r224367):
- Consistently use the newly introduced sc_mac_rxcfg throughout the driver
instead of reading the old content of CAS_MAC_RX_CONF.
- Increment if_iqdrops instead of if_ierrors in case of RX buffer allocation
failure.
- According to the Cassini datasheet the RX MAC should also be disabled in
cas_setladrf() before changing its configuration.
- Add error messages to gem_disable_{r,t}x() and take advantage of these
throughout the driver instead of duplicating their functionality all over
the place.
MFC: r247579
- Move reporting of failures to disable RX/TX MAC under bootverbose as at
least the Saturn chips of 501-6738 cards may fail to do so the first
time, which isn't fatal though.
Reported by: Paul Keusemann
- Explain why we don't enable infinite bursts on sparc64.
- Given that these chips support memory write invalidate, make sure that
it's enabled in the command register. Also make sure that PERR# and
SERR# assertion is enabled.
marius [Fri, 8 Mar 2013 12:11:41 +0000 (12:11 +0000)]
MFC: r247573
- Remove an unused header.
- Use NULL instead of 0 for pointers.
- Let ofw_pcib_probe() return BUS_PROBE_DEFAULT instead of 0 so specialized
PCI-PCI-bridge drivers may attach instead.
- Add WARs for PLX Technology PEX 8114 bridges and PEX 8532 switches.
Ideally, these should live in MI code but at least for the latter we're
missing the necessary infrastructure there.
marius [Fri, 8 Mar 2013 11:41:52 +0000 (11:41 +0000)]
MFC: r247571
- Apparently, r186520 was just wrong and the clock of Oxford OX16PCI958 is
neither DEFAULT_RCLK * 2 nor DEFAULT_RCLK * 10 but plain DEFAULT_RCLK
and there's no (open) source indicating otherwise. This was tested with
an EXSYS EX-41098-2, whose clock is not configurable and identifies as:
puc0@pci0:5:1:0: class=0x070200 card=0x06711415 chip=0x95381415 rev=0x01 hdr=0x00
vendor = 'Oxford Semiconductor Ltd'
class = simple comms
subclass = multiport serial
Note that this exactly matches the card mentioned in PR 129665 so no
sub-device/sub-vendor based quirking of the latter is possible. So maybe
we should grow some sort of tunable, in case non-default cards such as
the latter aren't configurable either (this also wouldn't be the first
time an allegedly tested commit turns out to be wrong though).
- Make the TiMedia tables const.
marius [Fri, 8 Mar 2013 11:33:02 +0000 (11:33 +0000)]
MFC: r247565, r247590
- Make tables, device ID strings etc const.
- Use NULL instead of 0 for pointers.
- Remove redundant bzero(9)'ing of the softc.
- Remove redundant/unused softc members.
- Don't allocate MSI/MSI-X as RF_SHAREABLE.
- Re-use bus accessor macros instead of duplicating them.
- In bce_miibus_{read,write}_reg(), remove superfluous limiting of the PHY
address (missed in r213893, which was MFC'ed to stable/8 in r214909).
tuexen [Fri, 8 Mar 2013 01:33:07 +0000 (01:33 +0000)]
MFC r247412:
Fix a potential race in returning setting errno when an
association goes down.
Reported by Mozilla in
https://bugzilla.mozilla.org/show_bug.cgi?id=845513
tuexen [Fri, 8 Mar 2013 01:29:05 +0000 (01:29 +0000)]
MFC r246674:
Don't send kernel provided information in the User Initiated
ABORT cause, since the user can also provide this kind of
information. So the receiver doesn't know who provided the
information.
While there: Fix a bug where the stack would send a malformed
ABORT chunk when using a send() call with SCTP_ABORT|SCT_SENDALL
flags.
tuexen [Fri, 8 Mar 2013 01:10:33 +0000 (01:10 +0000)]
MFC r244021:
Ensure that the padding of the last parameter of an INIT chunk
is not included in the chunk length as required by RFC 4960.
While there, cleanup sctp_send_initiate().
tuexen [Fri, 8 Mar 2013 00:48:15 +0000 (00:48 +0000)]
MFC r243157:
Get the accounting working. We now have counters how many
chunks for each SCTP outgoing stream are in the send and
sent queue.
While there, improve the naming of NR-SACK related constants
recently introduced.
tuexen [Fri, 8 Mar 2013 00:46:05 +0000 (00:46 +0000)]
MFC r242714:
Add per outgoing stream accounting for chunks in the send
and sent queue. This provides no functional change, but is
a preparation for an upcoming stream reset improvement.
Done with rrs@.
tuexen [Fri, 8 Mar 2013 00:41:20 +0000 (00:41 +0000)]
MFC r242627:
Move from early SSN assignment to late SSN assignment.
This doesn't change functionality, but makes upcoming change
much easier.
Developed with rrs@ at the IETF 85.
tuexen [Thu, 7 Mar 2013 23:53:11 +0000 (23:53 +0000)]
MFC r238790:
Fix the sctp_sockstore union such that userland programs don't depend
on INET and/or INET6 to be defined and in-tune with how the kernel
was compiled.