Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.
- amd on 6.x is a non-starter (without this change). Using amd under
heavy load results in a deadlock (with cascading vnode locks all the
way to the root) very quickly.
- This change should also fix the more general problem of cascading
vnode deadlocks when an NFS server goes down.
Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example isn't
ready for shared vnode lock lookups, crashing pretty quickly.
This change is the result of discussions with Stephan Uphoff (ups@).
Introduce a spinlock for synchronizing access to the video output hardware
in syscons. This replaces a simple access semaphore that was assumed to be
protected by Giant but often was not. If two threads that were otherwise
SMP-safe called printf at the same time, there was a high likelyhood that
the semaphore would get corrupted and result in a permanently frozen video
console. This is similar to what is already done in the serial console
drivers.
Back out one of the Giant removals from revision 1.272. Giant was not here to
protect the vnode, it was present to synchronize access to TTY session
information between exit(2) and the TTY code. While we are here, note that
Giant is required for TTY protection.
Fix synchronization in gmirror and graid3 which I broken. Synchronization
request can still have bio_to set to sc_provider (this is READ part of a
synchronization request) and in this case g_{mirror,raid3}_sync() wasn't
called as it should be.
andre [Wed, 13 Sep 2006 13:08:27 +0000 (13:08 +0000)]
Rewrite of TCP syncookies to remove locking requirements and to enhance
functionality:
- Remove a rwlock aquisition/release per generated syncookie. Locking
is now integrated with the bucket row locking of syncache itself and
syncookies no longer add any additional lock overhead.
- Syncookie secrets are different for and stored per syncache buck row.
Secrets expire after 16 seconds and are reseeded on-demand.
- The computational overhead for syncookie generation and verification
is one MD5 hash computation as before.
- Syncache can be turned off and run with syncookies only by setting the
sysctl net.inet.tcp.syncookies_only=1.
This implementation extends the orginal idea and first implementation
of FreeBSD by using not only the initial sequence number field to store
information but also the timestamp field if present. This way we can
keep track of the entire state we need to know to recreate the session in
its original form. Almost all TCP speakers implement RFC1323 timestamps
these days. For those that do not we still have to live with the known
shortcomings of the ISN only SYN cookies. The use of the timestamp field
causes the timestamps to be randomized if syncookies are enabled.
The idea of SYN cookies is to encode and include all necessary information
about the connection setup state within the SYN-ACK we send back and thus
to get along without keeping any local state until the ACK to the SYN-ACK
arrives (if ever). Everything we need to know should be available from
the information we encoded in the SYN-ACK.
A detailed description of the inner working of the syncookies mechanism
is included in the comments in tcp_syncache.c.
dd a series of regression tests to validate that privilege requirements are
implemented properly for a number of kernel subsystems. In general, they
try to exercise the privilege first as the root user, then as a test user,
in order to determine when privilege is being checked.
Currently, these tests do not compare inside/outside jail, and probably
should be enhanced to do that.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point
exists to allow the mandatory access control policy to properly initialize
mbufs generated by the firewall. An example where this might happen is keep
alive packets, or ICMP error packets in response to other packets.
This takes care of kernel panics associated with un-initialize mbuf labels
when the firewall generates packets.
[1] I modified this patch from it's original version, the initial patch
introduced a number of entry points which were programmatically
equivalent. So I introduced only one. Instead, we should leverage
mac_create_mbuf_netlayer() which is used for similar situations,
an example being icmp_error()
This will minimize the impact associated with the MFC
- Revert making bus_generic_add_child() the default for BUS_ADD_CHILD().
Instead, we want busses to explicitly specify an add_child routine if they
want to support identify routines, but by default disallow having outside
drivers add devices.
- Give smbus(4) an explicit bus_add_child() method.
Add note about identify routines needing to use BUS_ADD_CHILD rather
than device_add_child. The latter is only for busses adding children,
not children drivers telling a bus that they have an instance...
Minor overhaul of SMBus support:
- Change smbus_callback() to pass a void * rather than caddr_t.
- Change smbus_bread() to pass a pointer to the count and have it be an
in/out parameter. The input is the size of the buffer (same as before),
but on return it will contain the actual amount of data read back from
the bus. Note that this value may be larger than the input value. It
is up to the caller to treat this as an error if desired.
- Change the SMB_BREAD ioctl to write out the updated struct smbcmd which
will contain the actual number of bytes read in the 'count' field. To
preserve the previous ABI, the old ioctl value is mapped to SMB_OLD_BREAD
which doesn't copy the updated smbcmd back out to userland. I doubt anyone
actually used the old BREAD anyway as it was rediculous to do a bulk-read
but not tell the using program how much data was actually read.
- Make the smbus driver and devclass public in the smbus module and
push all the DRIVER_MODULE()'s for attaching the smbus driver to
various foosmb drivers out into the foosmb modules. This makes all
the foosmb logic centralized and allows new foosmb modules to be
self-contained w/o having to hack smbus.c everytime a new smbus driver
is added.
- Add a new SMB_EINVAL error bit and use it in place of EINVAL to return
an error for bad arguments (such as invalid counts for bread and bwrite).
- Map SMB bus error bits to EIO in smbus_error().
- Make the smbus driver call bus_generic_probe() and require child drivers
such as smb(4) to create device_t's via identify routines. Previously,
smbus just created one anonymous device during attach, and if you had
multiple drivers that could attach it was just random chance as to which
driver got to probe for the sole device_t first.
- Add a mutex to the smbus(4) softc and use it in place of dummy splhigh()
to protect the 'owner' field and perform necessary synchronization for
smbus_request_bus() and smbus_release_bus().
- Change the bread() and bwrite() methods of alpm(4), amdpm(4), and
viapm(4) to only perform a single transaction and not try to use a
loop of multiple transactions for a large request. The framing and
commands to use for a large transaction depend on the upper-layer
protocol (such as SSIF for IPMI over SMBus) from what I can tell, and the
smb(4) driver never allowed bulk read/writes of more than 32-bytes
anyway. The other smb drivers only performed single transactions.
- Fix buffer overflows in the bread() methods of ichsmb(4), alpm(4),
amdpm(4), amdsmb(4), intpm(4), and nfsmb(4).
- Use SMB_xxx errors in viapm(4).
- Destroy ichsmb(4)'s mutex after bus_generic_detach() to avoid problems
from child devices making smb upcalls that would use the mutex during
their detach methods.
Actually hook up the IPI_INVLCACHE IDT vectors backing
pmap_invalidate_cache() in the SMP case so pmap_mapdev() in multiuser
doesn't panic with a trap 30. I broke this many months ago when I
added pmap_invalidate_cache() as early parts of the PAT work.
andre [Mon, 11 Sep 2006 19:56:10 +0000 (19:56 +0000)]
Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the
validity of ro->ro_rt first. This prevents crashing on any non-normally
routed IP packet.
Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)
Add a default method for BUS_ADD_CHILD() that just calls
device_add_child_ordered(). Previously, a device driver that wanted to
add a new child device in its identify routine had to know if the parent
driver had a custom bus_add_child method and use BUS_ADD_CHILD() in that
case, otherwise use device_add_child(). Getting it wrong in either
direction would result in panics or failure to add the child device. Now,
BUS_ADD_CHILD() always works isolating child drivers from having to know
intimate details about the parent driver.
- Fix rman_manage_region() to be a lot more intelligent. It now checks
for overlaps, but more importantly, it collapses adjacent free regions.
This is needed to cope with BIOSen that split up ports for system devices
(like IPMI controllers) across multiple system resource entries.
- Now that rman_manage_region() is not so dumb, remove extra logic in the
x86 nexus drivers to populate the IRQ rman that manually coalesced the
regions.
null commit to provide commit message to previous
at the request of Sam Leffler: The previous commit
established min and maxtags for VMware pseudo disks
to fix a submitted PR.
The run_filter() procedure is a means of working around DMA engine bugs in
old/broken hardware. Unfortunately, it adds cache pressure and possible
mispredicted branches to the fast path of the bus_dmamap_load collection of
functions. Since it's meant for slow path exception processing, de-inline
it and allow its conditions to be pre-computed at tag_create time and thus
short-circuited at runtime.
While here, cut down on the size of _bus_dmamap_load_buffer() by pushing the
bounce page logic into a non-inlined function. Again, this helps with
cache pressure and mispredicted branches.
According to the TSC, this shaves off a few cycles on average. Unfortunately,
the data varies quite a bit due to interrupts and preemption, so it's hard to
get a good measurement. Real world measurements of network PPS are welcomed.
A merge to amd64 and other arches is pending more testing.
Add a knob for compiling the tree -DNDEBUG. This turns off all the
asserts and makes binaries smaller. The binaries also become
repeatable again. As it was, without this md5's of binaries built
with different paths differed.
make use of the host route's mtu for processing. This means we can now
support a network w/ split mtu's by assigning each host route the correct
mtu. an aspiring programmer could write a daemon to probe hosts and find
out if they support a larger mtu.
Fix locking race in ttymodem(). The locking of the proctree happens too late
and opens a small race window before tp->t_session->s_leader is accessed. In case
tp->t_session has just been set to NULL elsewhere, we get a panic().
This fix is a bandaid until someone else fixes the whole locking in the tty subsystem.
Definitly more work needs to be done.
Delay an orphan event if provider has still in-flight I/O requests.
This way GEOM classes can safely detach from provider when an orphan
event is received. This fixes 'detach with active requests' panic for
gstripe/gconcat under load.
Remove slightly oddly placed suser() call from the KTR/ALQ setup sysctl:
it was present only in the enable path, not the disable path, which one
presumes would be equally of interest. Either way, it was not needed,
as the sysctl framework already calls suser() if the operation is a
write operation, which configuration requests are.