Bring busdmafied sk(4) to all architectures.
- MPSAFE. No more recursive lock required.
- bus_dma(9) conversion. I think it should work on all architectures.
- optimized Rx handler for each normal and jumbo frames. Previously
sk(4) used jumbo frame management code to handle normal sized
frames. As the handler needs an additional lock to protect jumbo
frame management structure from races, it used two lock operations
for each received packet. Now sk(4) uses single lock operation for
normal frame.(Jumbo frame still needs two lock operations as before.)
The hardware supports DMA scatter operations for Rx descriptors such
that it's possible to take advantagee of m_cljget(9) for jumbo frames.
However, due to a unknown reasons it resulted in poor performance on
sparc64. So I dropped m_cljget(9) approach. This should be revisited
since it would reduce one lock operation for jumbo frame handling.
- Tx TCP/Rx IP checksum offload support. According to the data sheet
of SK-NET GENESIS the hardware supports Rx IP/TCP/UDP offload.
But I couldn't make it work on my Yukon hardware. So Rx TCP/UDP was
disabled at the moment. It seems that newer Yukon chips can support
Tx UDP checksum offload too. But I need more documentation first.
- Added more wait time in reading VPD data. It seems that ASUS LOM
takes a very long time to respond VPD read signal.
- Added an additional lock for MII register access callbacks.
- Added more strict received packet validation routine. Previously it
passed corrupted packets to upper layers under certain conditions.
- A new function sk_yukon_tick() to handle auto-negotiation properly.
- Interrupt handler now checks shared interrupt source and protects
the interrupt handler from NULL pointer dereference which was caused
by odd status word value. The status word can returns 0xffffffff if
cable is unplugged while Rx/Tx/auto-negotiation is in progress.
- suspend/resume support(not tested).
- Added Rx/Tx FIFO flush routine for Yukon
- Activate Tx descriptor poll timer in order to protect possible loss
of SK_TXBMU_TX_START command. Previously the driver continuously issued
SK_TXBMU_TX_START when it notices pending Tx descriptors not processed
yet in interrupt handler. That approach would add additional PCI
write access overhead under high Tx load situations and it might fail
if the first SK_TXBMU_TX_START was lost and no interrupt is generated
from the first SK_TXBMU_TX_START command.
- s/printf/if_printf/, s/printf/device_printf/, Axe sk_unit in softc.
- Setting multicast/station address is now safe on strict-alignment
architectures.
- Fix long standing bug in VLAN header length setup.
- Added/corrected register definitions for Yukon.
(Register information from Linux skge driver.)
- Added Rx status definition for Marvell Yukon/XaQti XMAC.
(Rx status register information from Linux skge driver.)
- Update if_oerrors if we encounter watchdog error.
- callout(9) conversion
Special thanks to jkim who let me know RX status differences between
Yukon and XaQti XMAC.
It seems that there is still occasional watchdog timeout error but I
couldn't reproduce it and need more information to analyze it from
users.
Tested by: bz(amd64), me(i386, sparc64), current ML
Frank Behrens frank ! pinky ( sax $ de
In the case when reset via keyboard controller doesn't work for some reason
(i.e. no keyboard controller present), try two other common methods for
resetting i386 machine - pci reset and port 0x92 fast reset. Only if neither
works warn user and resort to "unmap entire address space and hope for good"
hack. This makes my MacBook Pro rebooting just fine and should also help
other legacy-free hardware out there.
Also, disable interrupts unconditionally in cpu_reset_real(), since we don't
want any interference.
The size of I/O ranges can be anything from 16 bytes to 2G bytes.
Lower the minimum for memory mapped I/O from 32 bytes to 16 bytes.
This fixes bus enumeration on ia64 now that the Diva auxiliary
serial port is attached to.
Peter Wemm [Wed, 26 Apr 2006 21:49:20 +0000 (21:49 +0000)]
MFamd64: shrink pv entries from 24 bytes to about 12 bytes. (336 pv entries
per page = effectively 12.19 bytes per pv entry after overheads).
Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq
nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a
time per process. This provides 336 pv entries per process (actually, per
pmap address space) and eliminates one of the 8-byte tailq entries since
we now can track per-process pv entries implicitly. The pointer to
the pmap can be eliminated by doing address arithmetic to find the metadata
on the page headers to find a single pointer shared by all 336 entries.
There is an 11-int bitmap for the freelist of those 336 entries.
This is mostly a mechanical conversion from amd64, except:
* i386 has to allocate kvm and map the pages, amd64 has them outside of kvm
* native word size is smaller, so bitmaps etc become 32 bit instead of 64
* no dump_add_page() etc stuff because they are in kvm always.
* various pmap internals tweaks because pmap uses direct map on amd64 but
on i386 it has to use sched_pin and temporary mappings.
Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now
dynamic sysctls. Like on amd64, i386 can now tune the pv entry limits
without a recompile or reboot.
This is important because of the following scenario. If you have a 1GB
file (262144 pages) mmap()ed into 50 processes, that requires 13 million
pv entries. At 24 bytes per pv entry, that is 314MB of ram and kvm, while
at 12 bytes it is 157MB. A 157MB saving is significant.
Sam Leffler [Wed, 26 Apr 2006 16:02:36 +0000 (16:02 +0000)]
intercept public safety channels and do explicit mapping of freq->ieee
channel number since we're not ready at the net80211 layer to deal with them;
note this mapping has to match what's done in ieee80211_mhz2ieee
Sam Leffler [Wed, 26 Apr 2006 16:00:37 +0000 (16:00 +0000)]
back out public safety-specific channel number mapping; we can't do
it until we know it should be applied as otherwise we can map 11a
channels into the 2.4G range and choose the wrong item from the
chanenl array
Robert Watson [Wed, 26 Apr 2006 14:18:55 +0000 (14:18 +0000)]
Reconstitute struct mac_policy_ops by breaking out individual function
pointer prototypes from it into their own typedefs. No functional or
ABI change. This allows policies to declare their own function
prototypes based on a common definition from mac_policy.h rather than
duplicating these definitions.
Use the same method for detecting actual presence of AT-style keyboard
controller as we use in boot blocks (querying status register until
bit 1 goes off). If that doesn't happed during reasonable period assume
that the hardware doesn't have AT-style keyboard controller. This makes
FreeBSD working almost OOB on MacBook Pro (still there are issues with
putting second CPU core on-line, but since installation CD comes with
UP kernel with this change one should be able to install FreeBSD without
playing tricks with hints). Other legacy-free hardware (e.g. IBM NetVista
S40) should benefit from this as well, but since I don't have any I can't
verify.
It should make no difference on the ordinary i386 hardware (since in
that case that hardware already would be having an issues with A20
routines in boot blocks). I don't know much about AT-style keyboard
controller on other platforms (and don't have dedicated access to one),
therefore, the code is restricted to i386 for now. I suspect that amd64
may need this as well, but I would rather leave this decision to someone
who knows better about the platform(s) in question.
I have tested this change on as many "ordinary i386 boxes" as I can get
my hands on, and it doesn't create any false negatives on hardware with
AT-style keyboard present.
John Baldwin [Tue, 25 Apr 2006 20:34:04 +0000 (20:34 +0000)]
- Overhaul the 'ps' command in ddb to be mostly readable again. :) It is
now back to using fixed-size columns for output and each line of output
should fit in 80 columns on both 32-bit and 64-bit architectures. In
general the output is close to that of the userland ps(1) with the
exception that the 'wmesg' field is mostly similar to the "state" field
in top(1) in that it will show either a wmesg, a lock name (prefixed with
an *), "CPU xx" (for a running thread), or nothing if none of those three
conditions are true. It also respects td_name when listing threads in
a multithreaded process. There is a somewhat evilly-defined PTR64 macro
I use to make account for the change in the size of the 'wchan' column
in the formatted output (wchan is now the only pointer in the ps output
and is available so it can be passed to 'show sleepq', 'show turnstile',
or 'show lock').
- Add two new commands "show proc [process]" and "show thread [thread]"
that show details about the specified process or thread (specified
either by pid/tid or pointer), respectively. If an address it not
specified, it uses the current kdb thread.
John Baldwin [Tue, 25 Apr 2006 20:28:17 +0000 (20:28 +0000)]
Add some new commands to hopefully make it easier to diagnose lock-related
problems in ddb:
- "show threadchain [thread]" will start with the specified thread (or the
current kdb thread by default) and show it's state. If it is blocked on
a lock, it will find the owner of the lock and show its state, etc.
- "show allchains" will find all of the threads that are blocked on a
lock (but do not have any threads blocked on a lock they hold) and show
the resulting thread chain.
- "show lockchain <lock>" takes a pointer to a lock_object (such as a
mutex or rwlock). If there is a turnstile for that lock, then it will
display all the threads blocked on the lock. In addition, for each
thread blocked on the lock, it will display any contested locks they
hold, and recurse on those locks to show any threads blocked on those
locks, etc.
John Baldwin [Tue, 25 Apr 2006 20:24:23 +0000 (20:24 +0000)]
Use db_lookup_thread() to lookup the thread for the passed in address
and change 'show locks' to only list the locks for a given thread
rather than for all the threads in the process containing a specified
thread.
John Baldwin [Tue, 25 Apr 2006 20:22:48 +0000 (20:22 +0000)]
Add two helper functions: db_lookup_thread() and db_lookup_proc(). They
take the addr value passed to a ddb command and attempt to use it to
lookup a struct thread * or struct proc *, respectively. Each function
first reparses the passed in value as if it was an ID entered in base 10.
For threads the ID is treated as a thread ID, for proceses the ID is
treated as a PID. If a thread or proc matching the ID is found, it is
returned. For db_lookup_thread(), if the check_pid argument is true and
it didn't find a thread with a matching thread ID, it will treat the ID as
a PID and look for a matching process. If it finds one it returns the
first thread in the process. If none of the ID lookups succeeded, then
the functions assume that the passed in address is a thread or proc
pointer, respectively. This allows one to use tids, pids, or structure
pointers interchangeably in ddb functions that want to lookup threads or
processes if desired.
o Set to zero engine_type, engine_id and pad (cisco calls it
sampling_interval) fields in netflow v5 header. We do not use
them but some netflow tools show garbage.
John Baldwin [Tue, 25 Apr 2006 18:42:22 +0000 (18:42 +0000)]
Use PTOV() to convert physical addresses to appropriate virtual addresses
in the loader when searching for the ACPI RSDP. (The loader runs in a flat
mode with va 0 == pa 0xa000.)
Robert Watson [Tue, 25 Apr 2006 17:38:08 +0000 (17:38 +0000)]
Rename 'last' to 'inp' in udp_append(): the name 'last' is due to
the fact that the loop through inpcb's in udp_input() tracks the
last inpcb while looping. We keep that name in the calling loop
but not in the delivery routine itself.
Robert Watson [Tue, 25 Apr 2006 11:48:16 +0000 (11:48 +0000)]
Extend getsock() to return the struct file flags read while holding the
file lock, in the style of fgetsock().
Modify accept1() to use getsock() instead of fgetsock(), relying on the
file descriptor reference rather than an acquired socket reference to
prevent the listen socket from being destroyed during accept(). This
avoids additional reference count operations, which should improve
performance, and also avoids accept1() operating on a socket whose file
descriptor has been torn down, which may have resulted in protocol
shutdown starting.
Robert Watson [Tue, 25 Apr 2006 11:17:35 +0000 (11:17 +0000)]
Abstract inpcb drop logic, previously just setting of INP_DROPPED in TCP,
into in_pcbdrop(). Expand logic to detach the inpcb from its bound
address/port so that dropping a TCP connection releases the inpcb resource
reservation, which since the introduction of socket/pcb reference count
updates, has been persisting until the socket closed rather than being
released implicitly due to prior freeing of the inpcb on TCP drop.
Bump up the NFS server dupreq cache limit to 2K (from 64). With a small
duplicate request cache, under heavy load a lot of non-idempotent requests
were getting served again, resulting in errors.
Jung-uk Kim [Tue, 25 Apr 2006 00:06:37 +0000 (00:06 +0000)]
Check if reported HTT cores are physical cores. This commit does not
affect AMD CPUs at all because HTT bit is disabled earlier. Intel
multicore CPUs and ULE scheduler may be affected.
o Move ISA specific code from ppc.c to ppc_isa.c -- a bus front-
end for isa(4).
o Add a seperate bus frontend for acpi(4) and allow ISA DMA for
it when ISA is configured in the kernel. This allows acpi(4)
attachments in non-ISA configurations, as is possible for ia64.
o Add a seperate bus frontend for pci(4) and detect known single
port parallel cards.
o Merge PC98 specific changes under pc98/cbus into the MI driver.
The changes are minor enough for conditional compilation and
in this form invites better abstraction.
o Have ppc(4) usabled on all platforms, now that ISA specifics
are untangled enough.
Colin Percival [Mon, 24 Apr 2006 21:17:01 +0000 (21:17 +0000)]
Adjust dangerous-shared-cache-detection logic from "all shared data
caches are dangerous" to "a shared L1 data cache is dangerous". This
is a compromise between paranoia and performance: Unlike the L1 cache,
nobody has publicly demonstrated a cryptographic side channel which
exploits the L2 cache -- this is harder due to the larger size, lower
bandwidth, and greater associativity -- and prohibiting shared L2
caches turns Intel Core Duo processors into Intel Core Solo processors.
As before, the 'machdep.hyperthreading_allowed' sysctl will allow even
the L1 data cache to be shared.
Discussed with: jhb, scottl
Security: See FreeBSD-SA-05:09.htt for background material.
Robert Watson [Mon, 24 Apr 2006 08:20:02 +0000 (08:20 +0000)]
Instead of calling tcp_usr_detach() from tcp_usr_abort(), break out
common pcb tear-down logic into tcp_detach(), which is called from
either. Invoke tcp_drop() from the tcp_usr_abort() path rather than
tcp_disconnect(), as we want to drop it immediately not perform a
FIN sequence. This is one reason why some people were experiencing
panics in sodealloc(), as the netisr and aborting thread were
simultaneously trying to tear down the socket. This bug could often
be reproduced using repeated runs of the listenclose regression test.
MFC after: 3 months
PR: 96090
Reported by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris
Tested by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris
David Malone [Sun, 23 Apr 2006 17:06:18 +0000 (17:06 +0000)]
Add some new options to mac_bsdestended. We can now match on:
subject: ranges of uid, ranges of gid, jail id
objects: ranges of uid, ranges of gid, filesystem,
object is suid, object is sgid, object matches subject uid/gid
object type
We can also negate individual conditions. The ruleset language is
a superset of the previous language, so old rules should continue
to work.
These changes require a change to the API between libugidfw and the
mac_bsdextended module. Add a version number, so we can tell if
we're running mismatched versions.
Update man pages to reflect changes, add extra test cases to
test_ugidfw.c and add a shell script that checks that the the
module seems to do what we expect.
Robert Watson [Sun, 23 Apr 2006 15:33:38 +0000 (15:33 +0000)]
Move handling of SQ_COMP exception case in sofree() to the top of the
function along with the remainder of the reference checking code. Move
comment from body to header with remainder of comments. Inclusion of a
socket in a completed connection queue counts as a true reference, and
should not be handled as an under-documented edge case.
Robert Watson [Sun, 23 Apr 2006 15:23:31 +0000 (15:23 +0000)]
Update natm PCB debugging code:
- Depend on opt_ddb.h, since npcb_dump() is ifdef'd DDB.
- Include ddb/ddb.h so we can call db_printf() and use DB_SHOW_COMMAND().
- Don't test results of malloc() under DIAGNOSTIC, let the memory allocator
take care of its own invariants.
Robert Watson [Sun, 23 Apr 2006 15:06:16 +0000 (15:06 +0000)]
Modify in6_pcbpurgeif0() to accept a pcbinfo structure rather than a pcb
list head structure; this improves congruence to IPv4, and also allows
in6_pcbpurgeif0() to lock the pcbinfo. Modify in6_pcbpurgeif0() to lock
the pcbinfo before iterating the pcb list, use queue(9)'s LIST_FOREACH()
for the iteration, and to lock individual inpcb's while manipulating
them.
MFother arches :
date: 2006/04/12 04:22:50; author: alc; state: Exp; lines: +14 -41
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
Robert Watson [Sat, 22 Apr 2006 19:23:24 +0000 (19:23 +0000)]
Introduce a new TCP mutex, isn_mtx, which protects the initial sequence
number state, rather than re-using pcbinfo. This introduces some
additional mutex operations during isn query, but avoids hitting the TCP
pcbinfo lock out of yet another frequently firing TCP timer.
Robert Watson [Sat, 22 Apr 2006 19:10:02 +0000 (19:10 +0000)]
Remove pcbinfo locking from in_setsockaddr() and in_setpeeraddr();
holding the inpcb lock is sufficient to prevent races in reading
the address and port, as both the inpcb lock and pcbinfo lock are
required to change the address/port.
Improve consistency of spelling in assertions about inp != NULL.
Ariff Abdullah [Sat, 22 Apr 2006 09:44:00 +0000 (09:44 +0000)]
Add support for (latest) VIA VT8251 (rev. 0x07) audio controller.
A slight difference of this chip from its previous siblings is that
it need a gentle "wake up" on every (full) DMA buffer completion to
avoid stalled interrupt handler.
Thanks to George Hartzell for permission on doing remote debugging.
Prime MFC candidate for 6.1-RELEASE. Please reply to this commit if
there are any objections (so I won't bug re@), since the changes
are too small and only specific to VT8251.
PR: i386/95949
Tested by: [1] George Hartzel
myself (remotely)
MFC after: 3 days
Paul Saab [Fri, 21 Apr 2006 19:26:21 +0000 (19:26 +0000)]
Don't try to kill embryonic processes in killpg1(). This prevents
a race condition between fork() and kill(pid,sig) with pid < 0 that
can cause a kernel panic.