jhb [Tue, 20 Mar 2007 21:53:31 +0000 (21:53 +0000)]
Add a new apic0 psuedo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system. Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.
jhb [Tue, 20 Mar 2007 21:08:39 +0000 (21:08 +0000)]
Add a new ram0 pseudo-device that claims memory resouces for physical
addresses corresponding to system RAM. On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions. On i386 ram0 uses the
dump_avail[] array. Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.
jkim [Tue, 20 Mar 2007 20:22:45 +0000 (20:22 +0000)]
- Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capticalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.
jhb [Tue, 20 Mar 2007 20:21:44 +0000 (20:21 +0000)]
Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related psuedo-devices are added at an order of 5. acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.
bms [Tue, 20 Mar 2007 13:15:20 +0000 (13:15 +0000)]
Increase default size of raw IP send and receive buffers to the same as
udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB)
are unnecessarily fragmented.
A common use case for this is OSPF link-state database synchronization
during adjacency bringup on a high speed network with a large MTU.
It is not possible to auto-tune this setting until a socket is bound to
a given interface, and because the laddr part of the inpcb tuple may be
overridden, it makes no sense to do so. Applications may request a larger
socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options.
Certain applications such as Quagga ospfd do not probe for interface MTU
and therefore do not increase SO_SENDBUF in this use case.
XORP is not affected by this problem as it preemptively uses SO_SENDBUF
and SO_RECVBUF to account for any possible additional latency in XRL IPC.
PR: kern/108375
Requested by: Vladimir Ivanov
MFC after: 1 week
rrs [Tue, 20 Mar 2007 10:23:11 +0000 (10:23 +0000)]
- window update sacks sent incorrectly after
shutdown which caused extra abort from peer.
- RTT time calculation was not being done in
express sack handling since it refered to an unused
variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.
kmacy [Tue, 20 Mar 2007 06:21:47 +0000 (06:21 +0000)]
cxgb_stop is only called from cxgb_ioctl so:
- don't acquire port lock, already held in ioctl
- rename to cxgb_stop_locked
- switch callout_drain to callout_stop to avoid a hang from having the port lock held
jasone [Tue, 20 Mar 2007 03:44:10 +0000 (03:44 +0000)]
Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to
avoid substantial potential bloat for static binaries that do not
otherwise use any printf(3)-family functions. [1]
Rearrange arena_run_t so that the region bitmask can be minimally sized
according to constraints related to each bin's size class. Previously,
the region bitmask was the same size for all run headers, which wasted
a measurable amount of memory.
Rather than making runs for small objects as large as possible, make
runs as small as possible such that header overhead stays below a
certain bound. There are two exceptions that override the header
overhead bound:
1) If the bound is impossible to honor, it is relaxed on a
per-size-class basis. Since there is one bit of header
overhead per object (plus a constant), it is impossible to
achieve a header overhead less than or equal to 1/(# of bits
per object). For the current setting of maximum 0.5% header
overhead, this relaxation comes into play for {2, 4, 8,
16}-byte objects, for which header overhead is (on 64-bit
systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
2) There is still a cap on small run size, still set to 64kB.
This comes into play for {1024, 2048}-byte objects, for which
header overhead is {1.6, 3.1}%, respectively.
In practice, this reduces the run sizes, which makes worst case
low-water memory usage due to fragmentation less bad. It also reduces
worst case high-water run fragmentation due to non-full runs, but this
is only a constant improvement (most important to small short-lived
processes).
Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that
the external fragmentation reduction makes 1MB the new sweet spot (as
small as possible without adversely affecting performance).
njl [Tue, 20 Mar 2007 00:58:19 +0000 (00:58 +0000)]
If we got an OBE/IBF event, we failed to re-enable the GPE. This would
cause the EC to stop handling future events because the GPE stayed masked.
Set a flag when queueing a GPE handler since it will ultimately re-enable
the GPE. In all other cases, re-enable it ourselves. I reworked the
patch from the submitter.
bms [Tue, 20 Mar 2007 00:36:10 +0000 (00:36 +0000)]
Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.
This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.
With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.
Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month
jkim [Mon, 19 Mar 2007 23:17:39 +0000 (23:17 +0000)]
Revert couple of changes from 1.51 and 1.52. Reading link status with BMSR
is okay for most of the chipsets but BCM5701 PHY does not seem to like it.
Set media to IFM_NONE if link is not up instead of the previous value.
brian [Mon, 19 Mar 2007 18:51:02 +0000 (18:51 +0000)]
When we write extended attributes, assert that the inode hasn't
already been deleted. The assertion is important to show that
we won't end up accounting for extended attribute blocks (using
fs_pendingblocks) in our subsequent call to fs_alloc().
bms [Mon, 19 Mar 2007 18:39:36 +0000 (18:39 +0000)]
Clean up the ether_input() path by using the M_PROMISC flag.
Main points of this change:
* Drop frames immediately if the interface is not marked IFF_UP.
* Always trim off the frame checksum if present.
* Always use M_VLANTAG in preference to passing 802.1Q frames
to consumers.
* Use __func__ consistently for KASSERT().
* Use the M_PROMISC flag to detect situations where ether_input()
may reenter itself on the same call graph with the same mbuf which
was promiscuously received on behalf of subsystems such as
netgraph, carp, and vlan.
* 802.1P frames (that is, VLAN frames with an ID of 0) will now be
passed to layer 3 input paths.
* Deal with the special case for CARP in a sane way.
This is a significant rewrite of code on the critical path. Please report
any issues to me if they arise. Frames will now only pass through dummynet
if M_PROMISC is cleared, to avoid problems with re-entry.
The handling of CARP needs to be revisited architecturally. The M_PROMISC
flag may potentially be demoted to a link-layer flag only as it is in
NetBSD, where the idea originated.
Discussed on: net
Idea from: NetBSD
Reviewed by: yar
MFC after: 1 month
andre [Mon, 19 Mar 2007 18:35:13 +0000 (18:35 +0000)]
Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.
rrs [Mon, 19 Mar 2007 11:11:16 +0000 (11:11 +0000)]
Adds a hash table to speed local address lookup
on a per VRF basis (BSD has only one VRF currently).
Hash table is sized to 16 but may need to be adjusted
for machines with large numbers of addresses.
Reviewed by: gnn
rrs [Mon, 19 Mar 2007 06:53:02 +0000 (06:53 +0000)]
- errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.
rik [Sun, 18 Mar 2007 23:28:53 +0000 (23:28 +0000)]
Give a chance for packet to appear with a correct input interfaces
in case of multiple interfaces with the same MAC in the same bridge.
This commit do not solve the entire problem. Only case where packet
arrived from such interface.
PR: kern/109815
MFC after: 7 days
Submitted by: Eygene Ryabinkin and rik@
Discussed with: bms@, thompsa@, yar@
njl [Sun, 18 Mar 2007 01:03:03 +0000 (01:03 +0000)]
Disable burst mode by default. Testing has shown that while it works on
most systems, it causes the EC not to respond for some Acer and Compaq/HP
laptops. This is the default value for Linux also. For systems that need
it, burst mode can be enabled via the tunable/sysctl:
debug.acpi.ec.burst="1"
jeff [Sat, 17 Mar 2007 18:18:08 +0000 (18:18 +0000)]
- Turn all explicit giant acquires into conditional VFS_LOCK_GIANTs.
Only ops which used namei still remained.
- Implement a scheme for reducing the overhead of tracking which vops
require giant by constantly reducing the number of recursive giant
acquires to one, leaving us with only one vfslocked variable.
- Remove all NFSD lock acquisition and release from the individual nfs
ops. Careful examination has shown that they are not required. This
greatly simplifies the code.
Sponsored by: Isilon Systems, Inc.
Discussed with: rwatson
Tested by: kkenn
Approved by: re
jeff [Sat, 17 Mar 2007 18:13:32 +0000 (18:13 +0000)]
- Cast the intermediate value in priority computtion back down to
unsigned char. Weirdly, casting the 1 constant to u_char still produces
a signed integer result that is then used in the % computation. This
avoids that mess all together and causes a 0 pri to turn into 255 % 64
as we expect.
kmacy [Sat, 17 Mar 2007 06:40:09 +0000 (06:40 +0000)]
Fix the most obvious of the bugs introduced by recent syncache changes
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is
being changed anyway
Now the question is, why does it think an ipv4 connection is an ipv6 connection?
xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
rwatson [Fri, 16 Mar 2007 19:18:49 +0000 (19:18 +0000)]
Revert/re-make previous commit in a manner that maintains hyphenation of
extended attributes. I'm not sure I like it, but it is grammatically more
correct.
ariff [Fri, 16 Mar 2007 17:19:03 +0000 (17:19 +0000)]
[stage: 9/9]
- SWAPLR quirk for (unknown, luckily it is mine) broken uaudio stick.
Fixing by rewiring is impossible without damaging it. Luckily,
we can fix it using "other" methods :) .
- Add uaudio_get_vendor(), _product() and _release() in uaudio.c
(currently used by uaudio_pcm quirk).
- Implement CHANNEL_SETFRAGMENTS().
- Drop channel locking in few places where it is about to sleep
somewhere. This should help eliminating illegal locking acquisition
where the current thread is about to sleep, and also few deadlock
cases. Dropping it right here is quite safe since it is already
protected by CHN_F_BUSY flag and other threads won't bother to touch it.
Solving other illegal locking issues are quite tricky without converting
most usbd_do_request() calls to its equivalent _async() calls,
which I intend to do it later after getting full test report from
other people with different uaudio hardwares.
- Fix memory leak issues during detach. This seems common to any drivers
(notably emu10kx, csapcm?) with bridge functions.
ariff [Fri, 16 Mar 2007 17:18:17 +0000 (17:18 +0000)]
[stage: 8/9]
Implement CHANNEL_SETFRAGMENTS() for snd_atiixp, snd_es137x, snd_hda
and snd_via8233. CHANNEL_SETBLOCKSIZE() will basically call
CHANNEL_SETFRAGMENTS() internally using conservative blocksize /
blockcount hints. Other drivers will be converted later.
ariff [Fri, 16 Mar 2007 17:16:56 +0000 (17:16 +0000)]
[stage: 6/9]
- Disable stray buffer management, since sample size aligned buffering
are pretty much guaranteed through out the entire feeder_* chain
processes.
- Few style(9) cleanups.
ariff [Fri, 16 Mar 2007 17:16:24 +0000 (17:16 +0000)]
[stage: 5/9]
channel.c/channel_if.m:
- Macros cleanups, prefer inlined min() over MIN().
- Rework chn_read()/chn_write() for better dead interrupt detection
policy. Reduce scheduling overhead by doing pure 5 seconds sleep
before giving up, instead of several cycle of brute micro sleeping.
- Avoid calling wakeup_one() for non-sleeping channel (for example,
vchan parent channel).
- EWOULDBLOCK -> EAGAIN.
- Fix possible divide-by-zero panic on chn_sync().
- Re-enforce ^2 blocksize policy, since there are too many broken
userland apps that blindly assume it without even trying to do
serious calculations.
- New channel method - CHANNEL_SETFRAGMENTS(), a refined version of
CHANNEL_SETBLOCKSIZE(). It accept _both_ blocksize and blockcount
arguments, so the driver internals will have better hints for
buffering and timing calculations.
- Hook FEEDER_SWAPLR into feederchain building process.
feeder_fmt.c:
- Unified version of various filters, avoiding duplications.
- malloc()less feeder_fmt. Informations can be retrieved dynamically
by doing table lookup on static data. For cases such as converting
from stereo to mono or reducing bit depth where input data is larger
than output, cycle remaining available free space until it has been
exhausted and start kicking 8 bytes reservoir space from there to
complete the remaining requested count.
- Introduce FEEDER_SWAPLR. Few super broken hardwares (found on several
extremely cheap uaudio stick, possibly others) mistakenly wired left
and right channels wrongly, screwing output or input.
ariff [Fri, 16 Mar 2007 17:15:33 +0000 (17:15 +0000)]
[stage: 4/9]
- Rearrange FEEDER_* constants starting from 0 to 31, so the future
additions will be much easier and consistent.
- Introduce FEEDER_SWAPLR. Few super broken hardwares (found on several
extremely cheap uaudio stick, possibly others) mistakenly wired left
and right channels wrongly, screwing output or input.
ariff [Fri, 16 Mar 2007 17:14:41 +0000 (17:14 +0000)]
[stage: 3.2/9]
malloc()less feeder_vchan. Informations can be retrieved dynamically
by doing table lookup on static data. Reduce mixing overhead by
doing direct copy on first channel. Mixing process will begin starting
from second channel onwards.
ariff [Fri, 16 Mar 2007 17:14:19 +0000 (17:14 +0000)]
[stage: 3.1/9]
malloc()less feeder_volume. Informations can be retrieved dynamically
by doing table lookup on static data. Increase resolution from 6bit
to PCM_FXSHIFT (8bit) for better resolution and finer volume changes.
ariff [Fri, 16 Mar 2007 17:13:12 +0000 (17:13 +0000)]
[stage: 1/9]
- Convert sx lock to plain mutex. Since the access of /dev/sndstat
is pretty much exclusive and protected by toggling sndstat_isopen,
plain mutex is more than enough.
- Enable SBUF_AUTOEXTEND to avoid buffer truncation.
delphij [Fri, 16 Mar 2007 03:50:53 +0000 (03:50 +0000)]
Mention a limitation that was inherted from RFC1952, making
it impossible to obtain correct file size from a file that
is larger than 4GB before compression.
pjd [Fri, 16 Mar 2007 03:23:32 +0000 (03:23 +0000)]
Pass special device to the ufs_disk_fillout() function, instead of mount
point path. This way we properly handle the case when file system listed
in /etc/fstab was unmounted and another file system was mounted on the
same mount point.
pjd [Fri, 16 Mar 2007 03:13:28 +0000 (03:13 +0000)]
The ufs_disk_fillout(3) can take special device name (with or without /dev/
prefix) as an argument and mount point path. At the end it has to find
device name file system is stored on, which means when mount point path is
given, it tries to look into /etc/fstab and find special device
corresponding to the given mount point. This is not perfect, because it
doesn't handle the case when file system is mounted by hand and mount point
is given as an argument.
I found this problem while trying to use snapinfo(8), which passes mount
points to the ufs_disk_fillout(3) function, but I had file system mounted
manually, so snapinfo(8) was exiting with the error below:
ufs_disk_fillout: No such file or directory
I modified libufs(3) to handle those arguments (the order is important):
1. special device with /dev/ prefix
2. special device without /dev/ prefix
3. mount point listed in /etc/fstab, directory exists
4. mount point listed in /etc/fstab, directory doesn't exist
5. mount point of a file system mounted by hand