John Baldwin [Mon, 6 Jul 2009 18:23:00 +0000 (18:23 +0000)]
After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector, causing trap 30 faults.
Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.
Jack F Vogel [Mon, 6 Jul 2009 17:23:48 +0000 (17:23 +0000)]
The new method of reading the MAC address from the
RAR(0) register does not work on this old adapter;
provide a local routine that does it the older way.
- pkg_install is maintained by portmgr.
- BSD.x11{,-4}.dist aren't used anymore and BSD.local.dist now lives
in ports/Templates/. Most people apparently missed that move and still
commit to the src copy, so I'll have to remove it eventually but for
now, the MAINTAINERS line can go.
In the current code, rdlock_count is not correctly handled for some cases.
The most notable is that it is not bumped in rwlock_rdlock_common() when
the hard path (__thr_rwlock_rdlock()) returns successfully.
This can lead to deadlocks in libthr when rwlock recursion in read mode
occurs.
Fix the affected parts by correctly handling rdlock_count.
PR: threads/136345
Reported by: rink
Tested by: rink
Reviewed by: jeff
Approved by: re (kib)
MFC: 2 weeks
Tim Kientzle [Mon, 6 Jul 2009 02:02:45 +0000 (02:02 +0000)]
This addresses some issues with my earlier -R fix that
were pointed out by Brooks Davis and Alexey Dokuchaev:
* It now tries to look up arguments as names first, then tries
to parse them as numbers. In particular, this makes the
behavior consistent with POSIX conventions when usernames
consist entirely of digits.
* It now uses strtoul() for the numeric parsing.
Finally, I've included an update to the test harness
to exercise the new numeric cases for -R.
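The name-first lookup order described above can be sketched roughly as follows. This is an illustration only, not the bsdcpio code: resolve_owner_id(), lookup_user() and the two-entry sample_users table are hypothetical stand-ins for a real passwd lookup, chosen so that the all-digits username case is visible.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user table standing in for the passwd database. */
struct user_entry {
	const char *name;
	unsigned long uid;
};

static const struct user_entry sample_users[] = {
	{ "root", 0 },
	{ "1234", 5000 },	/* a username made entirely of digits */
};

/* Return 0 and store the uid if `name` is a known username, else -1. */
static int
lookup_user(const char *name, unsigned long *uid)
{
	size_t i;

	for (i = 0; i < sizeof(sample_users) / sizeof(sample_users[0]); i++) {
		if (strcmp(sample_users[i].name, name) == 0) {
			*uid = sample_users[i].uid;
			return (0);
		}
	}
	return (-1);
}

/*
 * Resolve an argument to a numeric id: try it as a name first and only
 * then parse it as a decimal number with strtoul().  Returns 0 on
 * success, -1 if the argument is neither a known name nor a number.
 */
static int
resolve_owner_id(const char *arg, unsigned long *id)
{
	char *end;
	unsigned long value;

	if (lookup_user(arg, id) == 0)
		return (0);

	/* strtoul() accepts signs and leading blanks; insist on a digit. */
	if (!isdigit((unsigned char)arg[0]))
		return (-1);
	errno = 0;
	value = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0')
		return (-1);
	*id = value;
	return (0);
}
```

With this ordering, "1234" resolves to the uid of the user named 1234 (5000 in the sample table) rather than to uid 1234, while "42" falls through to the numeric path.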
Alan Cox [Sun, 5 Jul 2009 21:40:21 +0000 (21:40 +0000)]
PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM. This
requirement is met by defining an UMA zone with a custom back-end
allocator function. This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig(). This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags. (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.
The back-end allocator function in the Xen pmap is dead code. I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.
Sam Leffler [Sun, 5 Jul 2009 18:17:37 +0000 (18:17 +0000)]
Add ieee80211_ageq; a facility for staging packets that require
long-term work before they can be serviced. Packets are tagged and
assigned an age (in seconds) at the point they are added to the
queue. If a packet is not retrieved before its age expires it is
reclaimed. Tagging can take two forms: a reference to an ieee80211_node
(as happens in the tx path) or an opaque token in cases where there
is no reference or the node structure is not stable (i.e. it's going
to be destroyed).
o add ic_stageq to replace the per-node wds staging queue used for
dynamic wds
o add ieee80211_mac_hash for building ageq tokens; this computes a
32-bit hash from an 802.11 mac address (copied from the bridge)
o while here fix a stray ';' noticed in IEEE80211_PSQ_INIT
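The staging behaviour described above (tag, age in seconds, reclaim on expiry) can be pictured with a small userland sketch. It is not the net80211 implementation: the fixed-size array, the struct layout and the names ageq_append(), ageq_remove() and ageq_age() are assumptions made for illustration, and an int payload stands in for the packet.

```c
#include <stddef.h>

#define AGEQ_MAX 8	/* hypothetical fixed capacity for the sketch */

struct staged_pkt {
	const void *tag;	/* node reference or opaque token */
	int age;		/* seconds left before the entry is reclaimed */
	int payload;		/* stand-in for the packet itself */
	int in_use;
};

struct ageq {
	struct staged_pkt slot[AGEQ_MAX];
};

/* Stage a packet under `tag` with an age; -1 if the queue is full. */
static int
ageq_append(struct ageq *q, const void *tag, int age, int payload)
{
	size_t i;

	for (i = 0; i < AGEQ_MAX; i++) {
		if (!q->slot[i].in_use) {
			q->slot[i].tag = tag;
			q->slot[i].age = age;
			q->slot[i].payload = payload;
			q->slot[i].in_use = 1;
			return (0);
		}
	}
	return (-1);
}

/* Retrieve the first packet staged under `tag`; -1 if there is none. */
static int
ageq_remove(struct ageq *q, const void *tag, int *payload)
{
	size_t i;

	for (i = 0; i < AGEQ_MAX; i++) {
		if (q->slot[i].in_use && q->slot[i].tag == tag) {
			*payload = q->slot[i].payload;
			q->slot[i].in_use = 0;
			return (0);
		}
	}
	return (-1);
}

/*
 * Account for `quanta` elapsed seconds.  Entries whose remaining age is
 * used up are reclaimed; the number of reclaimed entries is returned.
 */
static int
ageq_age(struct ageq *q, int quanta)
{
	size_t i;
	int reclaimed = 0;

	for (i = 0; i < AGEQ_MAX; i++) {
		if (!q->slot[i].in_use)
			continue;
		if (q->slot[i].age > quanta) {
			q->slot[i].age -= quanta;
		} else {
			q->slot[i].in_use = 0;
			reclaimed++;
		}
	}
	return (reclaimed);
}
```

A zero-initialized struct ageq is an empty queue. A periodic timer would call ageq_age() with the elapsed seconds, and the consumer would call ageq_remove() with the same tag used when staging.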
Ariff Abdullah [Sun, 5 Jul 2009 18:15:06 +0000 (18:15 +0000)]
- Increase dynamic range of filter coefficients from 28bit to 30bit.
This has a dramatic effect on overall precision and conversion quality
by pushing down most aliasing artifacts to around -180 dB.
- Guard against possible 64bit overflow during the accumulation process by
slightly normalizing and saturating the sample and coefficient
multiplication. Overflow is possible during extreme 32bit downsampling
(e.g. 380KHz -> 8KHz) with a custom preset that requires a filter of more
than ~7000 taps (which is overkill).
- Add knobs through FEEDER_RATE_PRESETS to set dynamic range of filter
coefficients/accumulator and preferred polynomial interpolator:
COEFFICIENT_BIT:X
(where 1 <= X <= 30, default: 30)
ACCUMULATOR_BIT:X
(where 32 <= X <= 64, default: 58)
Sam Leffler [Sun, 5 Jul 2009 17:59:19 +0000 (17:59 +0000)]
Revamp 802.11 action frame handling:
o add a new facility for components to register send+recv handlers
o ieee80211_send_action and ieee80211_recv_action now use the registered
handlers to dispatch operations
o rev ieee80211_send_action api to enable passing arbitrary data
o rev ieee80211_recv_action api to pass the 802.11 frame header as it may
be difficult to locate
o update existing IEEE80211_ACTION_CAT_BA and IEEE80211_ACTION_CAT_HT handling
o update mwl for api rev
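The register-then-dispatch idea in the list above can be sketched as a table of handlers indexed by action category. This is a simplified illustration, not the net80211 API: the handler signature, ACTION_CAT_MAX and the function names below are hypothetical, and only the receive side is shown.

```c
#include <stddef.h>

#define ACTION_CAT_MAX 8	/* hypothetical number of categories */

/* Hypothetical handler type: category, action code, opaque data. */
typedef int action_handler(int cat, int act, void *arg);

/* One receive handler per category; NULL means "not registered". */
static action_handler *recv_handlers[ACTION_CAT_MAX];

/* Register the receive handler for a category; -1 if out of range. */
static int
action_recv_register(int cat, action_handler *fn)
{
	if (cat < 0 || cat >= ACTION_CAT_MAX)
		return (-1);
	recv_handlers[cat] = fn;
	return (0);
}

/* Dispatch a received action frame to whichever handler is registered. */
static int
action_recv_dispatch(int cat, int act, void *arg)
{
	if (cat < 0 || cat >= ACTION_CAT_MAX || recv_handlers[cat] == NULL)
		return (-1);
	return (recv_handlers[cat](cat, act, arg));
}

/* Example handler: records the action code it was called with. */
static int
example_ba_recv(int cat, int act, void *arg)
{
	(void)cat;
	*(int *)arg = act;
	return (0);
}
```

A component calls action_recv_register() once at attach time; the generic receive path then only needs action_recv_dispatch() and no longer has to know about individual categories.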
Sam Leffler [Sun, 5 Jul 2009 17:45:48 +0000 (17:45 +0000)]
Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent
Reviewed by: bde, imp, marcel
Approved by: re (kensmith)
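For readers unfamiliar with ALIGNED_POINTER, the two flavours mentioned above can be sketched like this. The _STRICT and _RELAXED suffixes and the u32_aligned_at() helper are hypothetical names used only so both variants can live in one file; the strict form assumes sizeof(t) is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Strict-alignment style definition: true when p is suitably aligned
 * for an object of type t (assumes sizeof(t) is a power of two).
 */
#define	ALIGNED_POINTER_STRICT(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Definition for platforms with no alignment restriction. */
#define	ALIGNED_POINTER_RELAXED(p, t)	1

/* Helper for the example: is the address base + off aligned for uint32_t? */
static int
u32_aligned_at(const unsigned char *base, size_t off)
{
	return (ALIGNED_POINTER_STRICT(base + off, uint32_t));
}
```

Code that parses packed network headers typically checks the macro before dereferencing a wider type and copies the data to an aligned buffer when the check fails; with the relaxed definition that copy path is never taken.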
Add a new option (-s) that, when specified, skips the question about
adjusting the clock to UTC.
This avoids writing to /etc/wall_cmos_clock, which is useful in some
cases (example: host user in a jail).
Sponsored by: Sandvine Incorporated
Initially submitted by: Matt Koivisto <mkoivisto at sandvine dot com>
Approved by: re (kib)
When forking a vm space that has wired map entries, do not forget to
charge the objects created by vm_fault_copy_entry. The object charge
was set, but the reserve was not incremented.
Rui Paulo [Fri, 3 Jul 2009 21:12:37 +0000 (21:12 +0000)]
acpi_hp.c:
- sysctl dev.acpi_hp.0.verbose to toggle debug output
- A modification so this can deal with different array lengths
when reading the CMI BIOS - now it works ok on HP Compaq nx7300
as well.
- Change behaviour to query only max_instance-1 CMI BIOS instances,
because all HPs seen so far are broken in that respect
(or there is a fundamental misunderstanding on my side, possible
as well). This way a disturbing ACPI Error Field exceeds Buffer
message is avoided.
- New bit to set on dev.acpi_hp.0.cmi_detail (0x8) to
also query the highest guid instance of CMI bios
acpi_hp.4:
- Document dev.acpi_hp.0.verbose sysctl in man page
- Document new bit for dev.acpi_hp.0.cmi_detail
- Add a section to manpage about hardware that has been reported
to work ok
Submitted by: Michael Gmelin <freebsdusb at bindone.de>
Approved by: re (kib)
MFC after: 2 weeks
Tim Kientzle [Fri, 3 Jul 2009 17:54:33 +0000 (17:54 +0000)]
This fixes bsdcpio's -R option to accept numeric
user or group Ids as well as user or group names.
In particular, this fixes freesbie2, which uses
-R 0:0 to copy a bunch of files so that the result
will be owned by root.
Also fixes a related bug that mixed up the uid
and gid specified by -R when in passthrough mode.
Thanks to Dominique Goncalves for reporting this
regression.
In vn_vget_ino() and its inline equivalents, mnt_ref() the mount point
around the sequence that drops the vnode lock and then busies the mount
point. Not having a locked vnode or a direct reference to the mp allows a
forced unmount to proceed, leaving mp unmounted or reused.
Tested by: pho
Reviewed by: jeff
Approved by: re (kensmith)
MFC after: 2 weeks
Ariff Abdullah [Thu, 2 Jul 2009 10:02:10 +0000 (10:02 +0000)]
Slightly increase the bandwidth of the resampling filter for
feeder_rate_quality=3. This has the benefit of reducing aliasing
artifacts due to alias masking.
Spectrogram analysis:
o Old preset (100:36:0.90):
http://people.freebsd.org/~ariff/z_comparison/z_q3_old.png
o New preset (100:36:0.92):
http://people.freebsd.org/~ariff/z_comparison/z_q3_new.png
Robert Watson [Thu, 2 Jul 2009 09:15:30 +0000 (09:15 +0000)]
Clean up a number of aspects of token generation from audit arguments to
system calls:
- Centralize generation of argument tokens for VM addresses in a macro,
ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments.
- Fix up argument numbers across a large number of syscalls so that they
match the numeric argument into the system call.
- Don't audit the address argument to ioctl(2) or ptrace(2), but do keep
generating tokens for mmap(2) and minherit(2), since they relate to passing
object access across execve(2).
Jeff Roberson [Wed, 1 Jul 2009 20:43:46 +0000 (20:43 +0000)]
- Use fd_lastfile + 1 as the upper bound on nd. This is more correct than
using the size of the descriptor array.
- A lock is not needed to fetch fd_lastfile. The results are stale the
instant it is dropped.
- Use a private mutex pool for select since the pool mutex is not used
as a leaf.
- Fetch the si_mtx pointer first before resorting to hashing to compute
the mutex address.
Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.
DPCPU area was not properly mapped into kernel VA space, which caused page
fault on the first DPCPU access. This patch fixes the problem by mapping DPCPU
area into kernel VA space.
Submitted by: Michal Hajduk, Piotr Ziecik
Reviewed by: cognet, stas
Approved by: re (kib)
Obtained from: Semihalf
John Baldwin [Wed, 1 Jul 2009 17:20:07 +0000 (17:20 +0000)]
Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicking.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.
Robert Watson [Wed, 1 Jul 2009 16:56:56 +0000 (16:56 +0000)]
When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.
Capture flags to unmount(2) via an argument token.
Approved by: re (audit argument blanket)
MFC after: 3 days
Rick Macklem [Wed, 1 Jul 2009 16:42:03 +0000 (16:42 +0000)]
When unmounting an NFS mount using sec=krb5[ip], the umount system
call could get hung sleeping on "gsssta" if the credentials for a user
that had been accessing the mount point have expired. This happened
because rpc_gss_destroy_context() would end up calling itself when the
"destroy context" RPC was attempted, trying to refresh the credentials.
This patch just checks for this case in rpc_gss_refresh() and returns
without attempting the refresh, which avoids the recursive call to
rpc_gss_destroy_context() and the subsequent hang.
Reviewed by: dfr
Approved by: re (Ken Smith), kib (mentor)
Rick Macklem [Wed, 1 Jul 2009 16:38:18 +0000 (16:38 +0000)]
Make sure that cr_error is set to ESHUTDOWN when closing the connection.
This is normally done by a loop in clnt_dg_close(), but requests that aren't
in the pending queue at the time of closing don't get it set. This avoids a
panic in xdrmbuf_create() when it is called with a NULL cr_mrep if
cr_error doesn't get set to ESHUTDOWN while closing.
Reviewed by: dfr
Approved by: re (Ken Smith), kib (mentor)
With NFSv4 ACLs, it is possible that applying a mode to an ACL which
is identical to the mode computed from that ACL will modify the ACL.
For example, mode computed from the following ACL is 0600:
In the chmod(1) utility, there is an optimisation which makes it not
call chmod(2) if the mode of the file is the same as the new mode.
Disable that optimisation for files which may have NFSv4 ACLs.
Stanislav Sedov [Wed, 1 Jul 2009 13:07:02 +0000 (13:07 +0000)]
- Fix a bug where write(2) was called with incorrect parameters, resulting
in writes always starting from the start of the packet.
- Fix usage string (multiple addresses can be specified).
- Make the source more style(9) compliant.
- Improve error reporting (do not silently fail if something goes
wrong).
- Make functions static.
- Use warns level 6.
Approved by: re (kib)
Discussed with: Marc Balmer <marc@msys.ch>, brian, mbr
Ed Maste [Tue, 30 Jun 2009 13:38:49 +0000 (13:38 +0000)]
Add FIONSPACE from NetBSD. FIONSPACE is provided so that programs may
easily determine how much space is left in the send queue; they do not
need to know the send queue size.
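A caller might query the remaining send-queue space along these lines. This is a sketch under assumptions: the send_space_left() wrapper is hypothetical, the result is taken to be an int as with FIONREAD, and the fallback branch exists only so the example still builds on systems whose headers do not define FIONSPACE.

```c
#include <sys/ioctl.h>
#include <errno.h>

/*
 * Store the free space in fd's send queue in *space.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
send_space_left(int fd, int *space)
{
#ifdef FIONSPACE
	return (ioctl(fd, FIONSPACE, space));
#else
	(void)fd;
	(void)space;
	errno = ENOTSUP;	/* this platform does not define FIONSPACE */
	return (-1);
#endif
}
```

A program could call this before a large write to decide whether the data will fit without blocking, without having to know the configured send queue size.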
Stanislav Sedov [Tue, 30 Jun 2009 12:35:47 +0000 (12:35 +0000)]
- Add support to atomically set/clear individual bits of a MSR register
via cpuctl(4) driver. Two new CPUCTL_MSRSBIT and CPUCTL_MSRCBIT ioctl(2)
calls treat the data field of the argument struct passed as a mask
and set/clear bits of the MSR register according to the mask value.
- Allow user to perform atomic bitwise AND and OR operations on MSR registers
via cpucontrol(8) utility. Two new operations ("&=" and "|=") have been
added. The first one applies bitwise AND operation between the current
contents of the MSR register and the mask, and the second performs bitwise
OR. The argument can be optionally prefixed with "~" inversion operator.
This allows one to mimic the "clear bit" behavior by using a command
like this:
cpucontrol -m 0x10&=~0x02 # clear the second bit of TSC MSR
The inversion operator is supported in all modes (assignment, OR, AND).
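The arithmetic behind the three modes and the "~" prefix can be modelled in a few lines. This sketch covers only the value computation, not the atomicity or the cpuctl(4) ioctl path; the enum and the msr_apply() function are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical names for the three cpucontrol(8) modes. */
enum msr_op { MSR_OP_ASSIGN, MSR_OP_AND, MSR_OP_OR };

/*
 * Compute the new MSR value from the current one.  `invert` models the
 * optional "~" prefix, which complements the argument before use.
 */
static uint64_t
msr_apply(uint64_t current, enum msr_op op, uint64_t arg, int invert)
{
	if (invert)
		arg = ~arg;
	switch (op) {
	case MSR_OP_AND:
		return (current & arg);
	case MSR_OP_OR:
		return (current | arg);
	case MSR_OP_ASSIGN:
	default:
		return (arg);
	}
}
```

In this model the "&=~0x02" form corresponds to msr_apply(current, MSR_OP_AND, 0x02, 1), which clears bit 1 and leaves every other bit unchanged.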
For SU mounts, softdep_fsync() might drop vnode lock, allowing other
threads to put dirty buffers on the vnode bufobj list. For regular files
and synchronous fsync requests, check for the condition and restart the
fsync vop if a new dirty buffer arrived.
Softdep_fsync() may need to lock parent directory of the synced vnode.
Use inlined (due to FFSV_FORCEINSMQ) version of vn_vget_ino() to prevent
mountpoint from being unmounted and freed while no vnodes are locked.
Rui Paulo [Tue, 30 Jun 2009 09:51:41 +0000 (09:51 +0000)]
acpi_wmi_if:
- Document different semantics for ACPI_WMI_PROVIDES_GUID_STRING_METHOD
acpi_wmi.c:
- Modify acpi_wmi_provides_guid_string_method to return absolute number of
instances known for the given GUID.
acpi_hp.c:
- sysctl dev.acpi_hp.0.verbose to toggle debug output
- A modification so this can deal with different array lengths
when reading the CMI BIOS - now it works ok on HP Compaq nx7300
as well.
- Change behaviour to query only max_instance-1 CMI BIOS instances,
because all HPs seen so far are broken in that respect
(or there is a fundamental misunderstanding on my side, possible
as well). This way a disturbing ACPI Error Field exceeds Buffer
message is avoided.
- New bit to set on dev.acpi_hp.0.cmi_detail (0x8) to
also query the highest guid instance of CMI bios
acpi_hp.4:
- Document dev.acpi_hp.0.verbose sysctl in man page
- Document new bit for dev.acpi_hp.0.cmi_detail
- Add a section to manpage about hardware that has been reported
to work ok
Submitted by: Michael Gmelin, freebsdusb at bindone.de
Approved by: re (kib)
MFC after: 2 weeks
Bjoern A. Zeeb [Tue, 30 Jun 2009 05:21:00 +0000 (05:21 +0000)]
If we cannot queue a packet because the queue limit has been reached,
retain the semantics netisr_queue() always had and free the mbuf along
with returning the error.
Stacey Son [Mon, 29 Jun 2009 20:19:19 +0000 (20:19 +0000)]
Dynamically allocate the gidset field in audit record.
This fixes a problem created by the recent change that allows a large
number of groups per user. The gidset field in struct kaudit_record
is now dynamically allocated to the size needed rather than statically
(using NGROUPS).
Sam Leffler [Mon, 29 Jun 2009 18:42:54 +0000 (18:42 +0000)]
Update to 3.6.2.2 firmware (latest w/o host-based power save support):
o new tx ack queue (not used right now)
o proxy-sta related changes (no proxy sta in driver)
o explicit dwds ena/dis (needed only with proxy sta)
o cleanup BA policy handling
o new ampdu aggressive mode support
o CFEnd use now controllable
Jack F Vogel [Mon, 29 Jun 2009 18:17:10 +0000 (18:17 +0000)]
Fix a type problem seen when FreeBSD runs in a virtualized environment:
when the RX index wrapped, it was converted into some
sort of gibberish and written into the RDT register, effectively
killing the RX side of the thing :)
Attilio Rao [Mon, 29 Jun 2009 16:03:18 +0000 (16:03 +0000)]
Don't assume a default (currently 15) value for preloaded klds when
loading hwpmc, but calculate at runtime and allocate the necessary space.
Also the current logic is wrong as it can lead to an endless loop.
Sponsored by: Sandvine Incorporated
Reported by: Ryan Stone <rstone at sandvine dot com>
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
Approved by: re (kib)
Pyun YongHyeon [Mon, 29 Jun 2009 05:12:21 +0000 (05:12 +0000)]
Disable Rx checksum offload until I find out why it breaks
under certain environments. However, give users a chance to override
this when they are sure their hardware works with Rx checksum
offload.
Reported by: Ulrich Spoerlein ( uqs <> spoerlein dot net )
MFC after: 1 week
Approved by: re (kensmith)
Alexander Kabaev [Sun, 28 Jun 2009 23:51:39 +0000 (23:51 +0000)]
Eliminate .text relocations in shared libraries compiled with stack protector.
Use libssp_nonshared library to pull __stack_chk_fail_local symbol into
each library that needs it instead of pulling it from libc. GCC generates
local calls to this function, which result in absolute relocations being put
into the position-independent code segment, making the dynamic loader do extra
work every time a given shared library is relocated and making the affected
text pages non-shareable.
Marius Strobl [Sun, 28 Jun 2009 22:42:51 +0000 (22:42 +0000)]
- Work around the broken loader behavior of not demapping no longer
used kernel TLB slots when unloading the kernel or modules, which
results in havoc when loading a kernel and modules which take up
fewer TLB slots afterwards as the unused but locked ones aren't
accounted for in virtual_avail. Eventually this should be fixed
in the loader, which isn't straightforward though, and the kernel
should be robust against this anyway. [1]
- Ensure that the addresses allocated directly from phys_avail[] by
pmap_bootstrap_alloc() are always colored properly. This implicit
assumption was broken in r194784 as unlike the other consumers the
DPCPU area allocated for the BSP isn't a multiple of PAGE_SIZE *
DCACHE_COLORS. [2]
- Remove the no longer used global msgbuf_phys.
- Remove the redundant ekva parameter of pmap_bootstrap_alloc().
- Correct some outdated function names in ktr(9) invocations.
Stanislav Sedov [Sun, 28 Jun 2009 21:49:43 +0000 (21:49 +0000)]
- Turn the third (islocked) argument of the knote call into flags parameter.
Introduce the new flag KNF_NOKQLOCK to allow event callers to be called
without KQ_LOCK mtx held.
- Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required
for ZFS as its getattr implementation may sleep.
There are a number of ways an application can check if there are
inbound data waiting on a file descriptor, such as a pipe or a socket,
for instance by using select(2), poll(2), kqueue(2), ioctl(FIONREAD)
etc.
But we have no way of finding out if written data have yet to be
disposed of, for instance, transmitted (and ack'ed!) to some remote
host, or read by the application at the far end of the pipe.
The closest we get is calling shutdown(2) on a TCP socket in
non-blocking mode, but this has the undesirable side effect of
preventing future communication.
Add a complement to FIONREAD, called FIONWRITE, which returns the
number of bytes not yet properly disposed of. Implement it for
all sockets.
Background:
An HTTP server will want to time out connections, if no new request
arrives within a certain period after the last transmitted response
has actually been sent (and ack'ed).
For a busy HTTP server, this timeout can be of subsecond duration.
In order to signal to a load-balancer that the connection is truly
dead, TCP_RST will be the preferred method, as this avoids the need
for an RTT delay for FIN handshaking, with a client which, surprisingly
often, is no longer at the remote IP number.
If a slow, distant client is being served a response which is big
enough to fill the window, but small enough to fit in the socket
buffer, the write(2) call will return immediately.
If the session timeout is armed at that time, all bytes in the
response may not have been transmitted by the time it fires.
FIONWRITE allows the timeout to check that no data is outstanding
on the connection, before it TCP_RST's it.
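The timeout check described above could look roughly like this. It is a sketch under assumptions, not code from the commit: connection_drained() and reset_if_drained() are hypothetical helpers, the FIONWRITE result is taken to be an int as with FIONREAD, the reset is produced with the common SO_LINGER zero-timeout idiom, and the #else branch exists only so the example builds where FIONWRITE is not defined.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns 1 if no written data is outstanding on fd, 0 if some still
 * is, and -1 on error.
 */
static int
connection_drained(int fd)
{
#ifdef FIONWRITE
	int pending;

	if (ioctl(fd, FIONWRITE, &pending) == -1)
		return (-1);
	return (pending == 0);
#else
	(void)fd;
	errno = ENOTSUP;	/* this platform does not define FIONWRITE */
	return (-1);
#endif
}

/*
 * Called when the idle timeout fires.  If everything written has been
 * disposed of, close the socket with a zero linger time so that TCP
 * sends a reset instead of going through the FIN handshake.
 * Returns 1 if the connection was reset, 0 if data is still pending,
 * -1 on error.
 */
static int
reset_if_drained(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };
	int drained = connection_drained(fd);

	if (drained != 1)
		return (drained);
	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == -1)
		return (-1);
	return (close(fd) == 0 ? 1 : -1);
}
```

When reset_if_drained() returns 0, the server would simply re-arm the timer and check again later rather than cutting off a response that is still in flight.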
Marc Fonvieille [Sun, 28 Jun 2009 08:59:46 +0000 (08:59 +0000)]
- release/* update to use freebsd-doc-* packages instead of building
the FreeBSD docset during 'make release'; this will speed up release
builds;
- sysinstall(8) has also been updated to use these packages with a new
menu allowing people to choose what localized doc to install;
- mention in UPDATING that docs from the FreeBSD Documentation project
are now installed in /usr/local/share/doc/freebsd instead of
/usr/share/doc.
Alan Cox [Sat, 27 Jun 2009 21:37:36 +0000 (21:37 +0000)]
Correct a long-standing performance bug in cluster_rbuild(). Specifically,
in the case of a file system with a block size that is less than the page
size, cluster_rbuild() looks at too many of the page's valid bits.
Consequently, it may terminate prematurely, resulting in poor performance.
Reported by: bde
Reviewed by: tegge
Approved by: re (kib)
Sam Leffler [Sat, 27 Jun 2009 20:06:56 +0000 (20:06 +0000)]
Add HAL_RX_FILTER_BSSID support (to disable bssid match):
o add HAL_CAP_BSSIDMATCH to identify parts that have the support for
disabling bssid match
o honor capability for set/get rx filter
o use HAL_CAP_BSSIDMATCH in driver to decide whether to use the bssid
match disable or fall back to promisc mode