Bruce M Simpson [Mon, 22 Jan 2007 11:45:25 +0000 (11:45 +0000)]
Docuemnt exactly which functions access which NSS databases.
Point out that FreeBSD libc has compat stubs for GNU glibc NSS
modules which access NSDB_PASSWD/NSDB_GROUP, but not NSDB_HOSTS;
based on painful experience porting nss_mdns.
Below is slightly edited description of the LOR by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode, that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.
A simplified scenario:
root fs var fs
/ A / (/var) D
/var B /log (/var/log) E
vfs lock C vfs lock F
Within each file system, the lock order is clear: C->A->B and F->D->E
When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:
L1: C->A->B
|
+->F->D->E
L2: F->D->E
|
+->C->A->B
The lookup() process for namei("/var") mixes those two lock orders:
VOP_LOOKUP() obtains B while A is held
vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
violates L2)
vput() releases lock on B
VOP_UNLOCK() releases lock on A
VFS_ROOT() obtains lock on D while shared lock on F is held
vfs_unbusy() releases shared lock on F
vn_lock() obtains lock on A while D is held (violates L1, follows L2)
dounmount() follows L1 (B is locked while F is drained).
Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).
With unmount, you can get 4 processes in a deadlock:
p1: holds D, want A (in lookup())
p2: holds shared lock on F, want D (in VFS_ROOT())
p3: holds B, want drain lock on F (in dounmount())
p4: holds A, want B (in VOP_LOOKUP())
You can have more than one instance of p2.
The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.
- Tor Egge
To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.
Idea by: ups
Reviewed by: tegge, ups, jeff, rwatson (mac interaction)
Tested by: Peter Holm
MFC after: 2 weeks
Warner Losh [Mon, 22 Jan 2007 04:34:03 +0000 (04:34 +0000)]
Add quirk for EasyMP3 EM732X usb 2.0 flash mp3 player.
(It appears that the quirk proceedures link has disappeared and that
this PR complied with it, if there's a problem, please contact me).
Marius Strobl [Sun, 21 Jan 2007 19:32:51 +0000 (19:32 +0000)]
Change the remainder of the drivers for DMA'ing devices enabled in the
sparc64 GENERIC and the sound device drivers known working on sparc64
to use bus_get_dma_tag() to obtain the parent DMA tag so we can get rid
of the sparc64_root_dma_tag kludge eventually. Except for ath(4), sk(4),
stge(4) and ti(4) these changes are runtime tested (unless I booted up
the wrong kernels again...).
Marius Strobl [Sat, 20 Jan 2007 17:14:12 +0000 (17:14 +0000)]
Quiet GCC4 warnings regarding the width of printf()-arguments not
matching the format. While at it limit the format to unsigned int as
we're only interested in the 11 least significant bits anyway.
Jeff Roberson [Sat, 20 Jan 2007 17:03:33 +0000 (17:03 +0000)]
- We do need to IPI the idlethread on some systems. It may be stuck in
a power saving mode otherwise.
- If the thread is already bound in sched_bind() unbind it before
re-binding it to a new cpu. I don't like these semantics but they are
expected by some code in the tree. Patch by jkoshy.
Dont expose em->shared to the outside world before its properly
initialized. Might not affect anything but its at least a better
coding style.
Dont expose em via p->p_emuldata until its properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are workin on a local copy.
In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].
Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)
Change 2 comments (in the new code) to comply to style(9).
Marius Strobl [Sat, 20 Jan 2007 14:06:01 +0000 (14:06 +0000)]
- Use bus_get_dma_tag() to obtain the parent DMA tag so dma(4) will
work when we start requiring this.
- Don't specify an alignment when creating our own parent DMA tag;
the supported DMA engines require no alignment constraint (f.e. the
LANCE child does though) and it's no inherited by the child DMA
tags anyway (which probably is a bug though).
- Fix whitespace nits.
Marius Strobl [Sat, 20 Jan 2007 13:37:15 +0000 (13:37 +0000)]
- For the sake of completeness mention back-end support for the ILACC
and add a list of known-working PCI devices.
- For consistency throughout this man page also talk about C-Bus and
ISA adapters rather than cards.
- Add missing .Tn.
- Mention ifconfig(8) along with listing selectable media types.
- Add/un-comment hardware notes for the newly supported 'lebuffer'
variants (the transition from P/N 501-1860 to 501-1869 isn't a typo).
Marius Strobl [Sat, 20 Jan 2007 12:53:30 +0000 (12:53 +0000)]
Add front-ends for the 'lebuffer' variants found on some SBus cards.
These are shared-memory variants based on Am79C90-compatible chips
that apart from the missing DMA engine are similar to the 'ledma'
variant including using a (pseudo-)bus/device for the buffer that
the actual LANCE device hangs off from. The performance of these is
close to that of the 'ledma' one, like expected at a few times the
CPU load though.
Mike Pritchard [Sat, 20 Jan 2007 12:28:15 +0000 (12:28 +0000)]
Quota system cleanup.
1) Do not account for uids/gids that appear negative to prevent
the creation of 131GB+ quota files. This is the same as the kernel
now determines which files to provide quota accounting for.
Related to PR kern/38156. This should also prevent boots from
hanging if a negative uid appears in the file systems.
2) Do not count system files in the usage counts. These currently are
file system snapshot and quota data files. This is how the kernel
now handles those files.
3) Correctly generate new quota data files if the current files
do not exist or are zero length in size. PR kern/30958.
It should now be possible to newfs / mount / touch quota.{user,group}
and quotaon a file system and have everything work.
4) Change some diagnostics to report the file system and type of
id (uid or gid) that is being reported.
5) Truncate the quota data files if possible, instead of letting
them grow to a big enough size to hold the largest UID/GID on
the system (typically "nobody"). The kernel should now be able to
grow the files as needed without deadlocking the system.
Mike Pritchard [Sat, 20 Jan 2007 11:58:32 +0000 (11:58 +0000)]
Quota system cleanup.
1) Do not do quota accounting for the actual quota data files
or for file system snapshot files ("system" files). This
prevents a deadlock descibed in PR kern/30958 if the kernel
ever has to grow the quota file. Snapshot files were already
exempt from the quota checks, but this change generalized the check.
2) Fix a cast that caused extremely large uids/gids to incorrectly
write the quota information to the data file at a truncated
value for a uint_t32 id value. The incorrect cast caused quota
files in this case to be around 4GB in size, with the correct cast
they can now be 131GB in size. Also related to PR kern/30958.
3) Check for what appear to be negative UIDs/GIDs and not account
for them. This prevents the quota files from becoming 131GB in
size and causing quotacheck to run forever at bootup. This could
also cause the kernel to try and expand the quota file, which might
deadlock due to the issue in #1. kern/30958 and kern/38156
(and some much older closed PR's).
4) With the deadlock problems gone, the kernel can now expand the
size of the quota database files if it needs to.
5) Pass in the i-node count change value to chkiq and chkiqchg as an
int, like it used to be before the common routine was split up
into 2 different routines to increase / decrease the i-node in-use
count. Prevents an underflow on the i-node count. Related
to PR kern/89247.
6) Prevent the block usage from growing slowly if a file system is
full and the write was denied due to that fact. PR kern/89247.
Some of these changes require an updated quotacheck to prevent
the creation of huge (131GB) quota data files (item #3).
#1/#4 probably fixes a lot of the random hangs when quotas are enabled,
possibly some of the jail hangs.
Marius Strobl [Sat, 20 Jan 2007 10:47:16 +0000 (10:47 +0000)]
For setting the port PCnet chips must be powered down or stopped and
unlike documented may not take effect without an initialization. So
don't invoke (*sc_mediachange) directly in lance_mediachange() but
go through lance_init_locked(). It's suboptimal to impose this for
all chips but given that besides the affected PCI bus front-end the
only other front-end which supports media selection is and likely
ever will be the 'ledma' front-end I see not enough reason to break
the in-driver API for this (though one could argue both ways here).
Jeff Roberson [Sat, 20 Jan 2007 09:03:43 +0000 (09:03 +0000)]
- In tdq_transfer() always set NEEDRESCHED when necessary regardless of
the ipi settings. If NEEDRESCHED is set and an ipi is later delivered
it will clear it rather than cause extra context switches. However, if
we miss setting it we can have terrible latency.
- In sched_bind() correctly implement bind. Also be slightly more
tolerant of code which calls bind multiple times. However, we don't
change binding if another call is made with a different cpu. This
does not presently work with hwpmc which I believe should be changed.
Matt Jacob [Sat, 20 Jan 2007 04:00:21 +0000 (04:00 +0000)]
MFP4: Move default setting to the end of isp_reset instead of the
front of isp_init so we can read NVRAM even if we're role ISP_NONE.
Prepare for reintroduction of channels (for FC) for N-Port
Virtualization.
Fix a botch in handle assignment that caused us to nuke one device
when a new one arrives and end up with two devices with the same
identity in the virtual target mapping table.
Marius Strobl [Sat, 20 Jan 2007 00:56:49 +0000 (00:56 +0000)]
- Display the media instance numbers and allow the user to set the active
one. This is based on NetBSD but unlike NetBSD this implementation prints
the instance number for all media instances and doesn't skip it for the
first one as I don't see a reason to suppress it except for the vague
reason to preserve the output for single-instance configurations.
- Fix some whitespace nits.
Marius Strobl [Sat, 20 Jan 2007 00:55:03 +0000 (00:55 +0000)]
- In miibus_attach() remove IFM_IMASK from the dontcare_mask of the
ifmedia_init() invocation. IFM_IMASK makes only sense here when all of
the maxium of 32 PHYs on each one MII bus support disjoint sets of media,
which generally isn't the case (though it would be nice if we had a way
to let NIC drivers indicate that for the few card models where the PHY
configuration is known/fixed and IFM_IMASK actually makes sense).
- Add and use a miibus_print_child() for the bus_print_child method which
additionally prints the PHY number (which actually is the PHY address)
so one can figure out the media instance <-> PHY number mapping from the
PHY driver attach output. This is intented to be usefull in situations
where the addresses of the PHYs on the bus are known (f.e. of internal/
integrated PHYs) so one can feed the appropriate media instance number
to ifconfig(8) (with the upcoming change for ifconfig(8)).
This is more or less inspired by the NetBSD mii_print().
Marius Strobl [Sat, 20 Jan 2007 00:52:29 +0000 (00:52 +0000)]
- Don't set MIIF_NOISOLATE so ukphy(4) can be used in configurations with
multiple PHYs. In case some PHYs currently driven by ukphy(4) exhibit
problems when isolating due to incomplete implementations or silicon bugs
we'll need to add specific drivers for these. Looking at NetBSD and
OpenBSD I don't expect problems here though (quite the contrary; we still
seem to set MIIF_NOISOLATE without good reason in a bunch of PHY drivers).
- Fix a style(9) whitespace nit.
John Baldwin [Fri, 19 Jan 2007 22:37:52 +0000 (22:37 +0000)]
- Change the PCI-X registers constants to be relative to the PCI-X PCI
capability rather than hardcoded offsets for a particular card. While
I'm here, expand the constants some.
- Change the ahd(4) driver to use pci_find_extcap() to locate the PCI-X
capability to keep up with the first change.
Jeff Roberson [Fri, 19 Jan 2007 21:56:08 +0000 (21:56 +0000)]
Major revamp of ULE's cpu load balancing:
- Switch back to direct modification of remote CPU run queues. This added
a lot of complexity with questionable gain. It's easy enough to
reimplement if it's shown to help on huge machines.
- Re-implement the old tdq_transfer() call as tdq_pickidle(). Change
sched_add() so we have selectable cpu choosers and simplify the logic
a bit here.
- Implement tdq_pickpri() as the new default cpu chooser. This algorithm
is similar to Solaris in that it tries to always run the threads with
the best priorities. It is actually slightly more complex than
solaris's algorithm because we also tend to favor the local cpu over
other cpus which has a boost in latency but also potentially enables
cache sharing between the waking thread and the woken thread.
- Add a bunch of tunables that can be used to measure effects of different
load balancing strategies. Most of these will go away once the
algorithm is more definite.
- Add a new mechanism to steal threads from busy cpus when we idle. This
is enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The
threshold is the required length of a tdq's run queue before another
cpu will be able to steal runnable threads. This prevents most queue
imbalances that contribute the long latencies.
Marius Strobl [Fri, 19 Jan 2007 11:15:34 +0000 (11:15 +0000)]
Convert the remainder of the low hanging fruits regarding including
headers in .S directly rather than getting to their macros through
genassym.c/assym.s so there are less headers genassym.c has to be
kept in sync with.
While at it fix some stytle(9) bugs (indentation, prototype format,
sort headers, etc) and remove trailing whitespace.
Warner Losh [Fri, 19 Jan 2007 01:16:35 +0000 (01:16 +0000)]
On FreeBSD/arm, any value > 50 bits will result in a rediculously huge
number being returned for mktime and timegm calls. Choose 48 because
that works well. This does reduce the dynamic range of tm_year from
about 2 billion years down to "only" about 9 million years. Please
contact me if this restriction poses a problem.
Due to the complexity of the code, I admit that I didn't trace down
what, exactly, was overflowing with longer bits. This fixes software
that we run on the embedded systems we have.
Marius Strobl [Thu, 18 Jan 2007 22:01:19 +0000 (22:01 +0000)]
- Add a uart_rxready() and corresponding device-specific implementations
that can be used to check whether receive data is ready, i.e. whether
the subsequent call of uart_poll() should return a char, and unlike
uart_poll() doesn't actually receive data.
- Remove the device-specific implementations of uart_poll() and implement
uart_poll() in terms of uart_getc() and the newly added uart_rxready()
in order to minimize code duplication.
- In sunkbd(4) take advantage of uart_rxready() and use it to implement
the polled mode part of sunkbd_check() so we don't need to buffer a
potentially read char in the softc.
- Fix some mis-indentation in sunkbd_read_char().
Marius Strobl [Thu, 18 Jan 2007 18:32:26 +0000 (18:32 +0000)]
- Rename UPA_BUS_SPACE to NEXUS_BUS_SPACE; besides an UPA bus, nexus(4)
may also reflect a Fireplane/Safari or JBus bus (or a virtual bus which
in turn reflects a JBus bus or something like that...).
- In the both the sparc64 and sun4v bus_machdep.c use __FBSDID.
- Spell SBus the official way in comments.
- Replace hardcoded function names (all of which were actually outdated)
in panic and status strings with __func__.
- Fix whitespace nits.
Gleb Smirnoff [Thu, 18 Jan 2007 13:55:21 +0000 (13:55 +0000)]
Revise the ng_ppp(4) node, so that code flow is more clear. All non-link
hooks get their per hook rcvdata methods, and all functions are organized
corresponding to protocol stack model.
Submitted by: Alexander Motin <mav alkar.net>
Reviewed by: archie, julian
Marius Strobl [Thu, 18 Jan 2007 13:52:44 +0000 (13:52 +0000)]
Remove the compat shims for the ISA old-stlye in{b,w,l}()/out{b,w,l}()
and friends along with all hacks required to implement them. None of
the drivers currently built (as part of GENERIC, LINT or modules) on
sparc64 or sun4v and none of those we might want to use there in
future uses them, AFAICT there actually never was a driver hooked up
to the sparc64 or sun4v build that correctly used these functions
(and it looks like that due to a bug read{b,w,l}()/write{b,w,l}() and
the other functions working on a memory handle never actually worked on
sun4v). All they ever were good for on sparc64 and sun4v was erroneously
dragging in dependencies on isa(4) in drivers like f.e. dpt(4), si(4)
and syscons(4) in source files that supposedly were bus-neutral and
hiding issues with drivers like f.e. ng_bt3c(4) that used these
functions with busses other than isa(4) and therefore couldn't work on
these platforms.
Marius Strobl [Thu, 18 Jan 2007 13:33:36 +0000 (13:33 +0000)]
Wrap the EISA-specific parts of the dpt(4) and si(4) back-ends in
the newly added DEV_EISA. This is done so that these back-ends can
be compiled on platforms not providing in{b,w,l}()/out{b,w,l}() and
friends (but may wish to use them together with bus front-ends other
than the EISA one).
Marius Strobl [Thu, 18 Jan 2007 13:08:08 +0000 (13:08 +0000)]
On sparc64 also use the fillw() this header provides for ia64 so
the sparc64 MD code doesn't need to provide a memsetw() along with
the ISA compat cruft.
Randall Stewart [Thu, 18 Jan 2007 09:58:43 +0000 (09:58 +0000)]
- most all includes (#include <>) migrate to the sctp_os_bsd.h file
- Finally all splxx() are removed
- Count error fixed in mapping array which might
cause a wrong cumack generation.
- Invariants around panic for case D + printf when no invariants.
- one-to-one model race condition fixed by using
a pre-formed connection and then completing the
work so accept won't happen on a non-formed
association.
- Some additional paranoia checks in sctp_output.
- Locks that were missing in the accept code.
Add support for LINUX_O_DIRECT, LINUX_O_DIRECT and LINUX_O_NOFOLLOW flags
to open() [1].
Improve locking for accessing session control structures [2].
Try to document (most likely harmless) races in the code [3].
Based on submission by: Intron (intron at intron ac) [1]
Reviewed by: jhb [2]
Discussed with: netchild, rwatson, jhb [3]
Correct errors in previous commit. I didn't realize that ${CPUTYPE} is
passed unmodified to gcc. Therefore, "prescott" should be used for Prescott,
Nocona, Core and Core 2 CPUs when building 32-bit code, and "nocona" should
be used for Prescott, Nocona and Core 2 CPUs when building 64-bit code.
On i386, make "prescott" an alias for "nocona" (instead of the other way
around), and introduce "core", along with the alias "core2". All of these
enable SSE3.
Add 3436 file system regression tests in 184 files.
Almost all regression tests are based on very flexible fstest tool.
They verify correctness (POSIX conformance) of almost all file
system-related system calls.
The motivation behind this work is my ZFS port and POSIX, who doesn't
provide free test suites.
Runs on: FreeBSD/UFS, FreeBSD/ZFS, Solaris/UFS, Solaris/ZFS
To try it out:
# cd fstest
# make
# find tests/* -type d | xargs prove
Olivier Houchard [Wed, 17 Jan 2007 00:58:25 +0000 (00:58 +0000)]
Create bus dma tags for both the PCI bus and the IXP425 root bus. Set the
PCI bus' one as the default one, and explicitely use the other one for
non-PCI devices.
This is needed because the PCI bus can only address 64MB of RAM, while some
IXP425 boards have 128MB or more, and most of the PCI drivers do not bother
providing the parent dma tag.
Olivier Houchard [Wed, 17 Jan 2007 00:53:05 +0000 (00:53 +0000)]
- Add bounce pages for arm, largely based on the i386 implementation.
- Add a default parent dma tag, similar to what has been done for sparc64.
- Before invalidating the dcache in POSTREAD, save the bits which are in the
same cachelines than our buffers, but not part of it, and restore them after
the invalidation.
Tom Rhodes [Tue, 16 Jan 2007 23:43:14 +0000 (23:43 +0000)]
Add a 3rd entry in the cache, which keeps the end position
from just before extending a file. This has the desired effect
of keeping the write speed constant. And yes, that helps a lot
copying large files always at full speed now, and I have seen
improvements using benchmarks/bonnie.
Marius Strobl [Tue, 16 Jan 2007 22:08:27 +0000 (22:08 +0000)]
Resurrect upa(4), now used for the subordinate/slave UPA bridge and
bus hanging off from the Fireplane/Safari bus in some USIII machines.
This is part 3/4 of allowing creator(4) to work in these machines.
The little info needed on how to configure the bridge and to work
around the incorrect values contained in the `interrupts' properties
of its children were obtained form OpenSolaris.
Marius Strobl [Tue, 16 Jan 2007 21:08:22 +0000 (21:08 +0000)]
- Merge sys/sparc64/creator/creator_upa.c into sys/dev/fb/creator.c.
The separate bus front-end was inherited from the OpenBSD creator(4),
which at that time had a mainbus(4) (for USI/II machines, which use
an UPA interconnection bus as the nexus) and an upa(4) (for USIII
machines, which use a subordinate/slave UPA bus hanging off from the
Fireplane/Safari interconnection bus) front-end. With FreeBSD and
newbus there is/will be no need to have two separate bus front-ends
for these busses, so we can easily coallapse the shared front-end
and the back-end into a single source file (note that the FreeBSD
creator_upa.c was misnomer anyway; based on what it actually attached
to that should have been creator_nexus.c), actually OpenBSD meanwhile
also has moved to a shared front-end and a single source file. Due
to the low-level console support creator.c also wasn't free from bus
related things before.
While at it, also split sys/sparc64/creator/creator.h into a
sys/dev/fb/creatorreg.h that only contains register macros and move
the structures to the top of sys/dev/fb/creator.c as suggested by
style(9) so creator(4) is no longer scattered over two directories.
- Use OF_decode_addr()/sparc64_fake_bustag() to obtain the bus tags and
handles for the low-level console support instead of hardcoding
support for AFB/FFB hanging off from nexus(4) only. This is part 2/4
of allowing creator(4) to work in USIII machines (which have a UPA
bus hanging off from the Fireplane/Safari bus reflected by the nexus),
which already makes it work as the low-level console there.
- Allocate resources in the bus attach routine regardless of whether
creator(4) is used as for the low-level console and thus the required
bus tags and handles have been already obtained or not so the resources
are marked as taken in the respective RMAN.
- For both obtaining the bus tags and handles for the low-level console
support as well as allocating the corresponding resources in the
regular bus attach routine don't bother to get all for the maximum of
24 register banks but only (for) the two tag/handle pairs required for
providing the video interface for syscons(4) support. If we can't
allocate the rest of them just limit the memory range accessible via
creator_fb_mmap() accordingly.
- Sanity check the memory range spanned by the first and last resources
and the resources in between as far as possible, as the XFree86/Xorg
sunffb(4) expects to be able to access the whole region, even though
the backing resources are actually non-continuous. Limit and check
the memory range accessible via creator_fb_mmap() accordingly.
- Reduce the size of buffers for OFW properties to what they actually
need to hold.
- Rename some tables to creator_<foo> for consistency.
- Also for the sizes in the creator_fb_mmap() mapping table entries use
macros for consistency, add macros for the remaining register banks
for completeness.
Marius Strobl [Tue, 16 Jan 2007 20:42:21 +0000 (20:42 +0000)]
Teach OF_decode_addr() about the bus space used for devices on the
nexus (which might or might not reflect an UPA interconnection bus;
accordingly UPA_BUS_SPACE should be renamed to NEXUS_BUS_SPACE at a
later point) and subordinate/slave UPA busses. This is part 1/4 of
allowing creator(4) to work in USIII machines (which have a UPA bus
hanging off from the Fireplane/Safari bus reflected by the nexus).
Marius Strobl [Tue, 16 Jan 2007 20:35:23 +0000 (20:35 +0000)]
o In re_newbuf() and re_encap() if re_dma_map_desc() aborts the mapping
operation as it ran out of free descriptors or if there are too many
segments in the first place, call bus_dmamap_unload() in order to
unload the already loaded segments.
For trying to map the defragmented mbuf (chain) in re_encap() this
introduces re_dma_map_desc() setting arg.rl_maxsegs to 0 as a new
failure mode. Previously we just ignored this case, corrupting our
view of the TX ring.
o In re_txeof():
- Don't clear IFF_DRV_OACTIVE unless there are at least 4 free TX
descriptors. Further down the road re_encap() will bail if there
aren't at least 4 free TX descriptors, causing re_start() to
abort and prepend the dequeued mbuf again so it makes no sense
to pretend we could process mbufs again when in fact we won't.
While at it replace this magic 4 with a macro RL_TX_DESC_THLD
throughout this driver.
- Don't cancel the watchdog timeout as soon as there's at least one
free TX descriptor but instead only if all descriptors have been
handled. It's perfectly normal, especially in the DEVICE_POLLING
case, that re_txeof() is called when only a part of the enqueued
TX descriptors have been handled, causing the watchdog to be
disarmed prematurely.
o In re_encap():
- If m_defrag() fails just drop the packet like other NIC drivers
do. This should only happen when there's a mbuf shortage, in which
case it was possible to end up with an IFQ full of packets which
couldn't be processed as they couldn't be defragmented as they
were taking up all the mbufs themselves. This includes adjusting
re_start() to not trying to prepend the mbuf (chain) if re_encap()
has freed it.
- Remove dupe initialization of members of struct rl_dmaload_arg to
values that didn't change since trying to process the fragmented
mbuf chain.
While at it remove an unused member from struct rl_dmaload_arg.
o In re_start() remove a abandoned, banal comment. The corresponding
code was moved to re_attach() some time ago.
With these changes re(4) now survives one day (until stopped) of
hammering out packets here.
John Baldwin [Tue, 16 Jan 2007 17:04:42 +0000 (17:04 +0000)]
Fix the subvendor ID for PCI-PCI bridges.
- Retire the PCI_SUB*_1 constants and don't try to read a subvendor ID out
of them. There isn't a standard subvendor ID field for PCI-PCI bridges.
Instead, the dword at offset 0x34 is actually mostly reserved except for
the LSB which is the capabilities pointer.
- Add support for the PCI-PCI bridge subvendor ID capability (13) and use
it to set the subvendor ID for PCI-PCI bridges.
When we try to set set-gid bit with chmod(2) on a file, which we own, but our
effective group ID (and any of our group) doesn't match the group ID of the
file, we get EPERM. This doesn't conform POSIX. POSIX requires that we should
return 0, but silently clear the set-gid bit.