Robert Watson [Tue, 6 Oct 2009 20:35:41 +0000 (20:35 +0000)]
Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.
Roman Divacky [Tue, 6 Oct 2009 20:19:16 +0000 (20:19 +0000)]
Fix tcsh losing history when tcsh terminates because the pty beneath it
is closed.
Diagnosed by Ted Anderson:
New signal queuing logic was introduced in 6.15 and allows the signal handlers
to be run explicitly by calling handle_pending_signals, instead of
immediately when the signal is delivered. This function is called at
various places, typically when receiving a EINTR from a slow system call
such as read or write. In the pty exit case, it was called from xwrite,
called from flush, while printing the "exit" message after receiving EOF
when reading from the pty (note that the read did not return EINTR but
zero bytes, indicating EOF). The SIGHUP handler, phup(), called
rechist, which opened the history file and began writing the merged
history to it. This process invoked flush recursively to actually write
the data. In this case, however, the flush noticed it was being called
recursively and decided fail by calling stderror.
My conclusion was that the signal was being handled at a bad time. But
whether to fix flush not to care about the recursive call, or to handle
the signal some other time and when to handle it, was unclear to me.
However, by adding an extra call to handle_pending_signals, just after
process() returns to main(), I was able to avoid the truncated history
after network outages and similar failures. I verified this fix in
version 6.17.
Robert Watson [Tue, 6 Oct 2009 17:14:39 +0000 (17:14 +0000)]
In rtld's map_object(), use pread(..., 0) rather than read() to read the
ELF header from the front of the file. As all other I/O on the binary
is done using mmap(), this avoids the need for seek privileges on the
file descriptor during run-time linking.
Robert Watson [Tue, 6 Oct 2009 14:05:57 +0000 (14:05 +0000)]
Add basename_r(3) to complement basename(3). basename_r(3) which accepts
a caller-allocated buffer of at least MAXPATHLEN, rather than using a
global buffer.
Attilio Rao [Tue, 6 Oct 2009 13:45:49 +0000 (13:45 +0000)]
Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used. GCC, however, does that aggressively, even in
presence of volatile operands. The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).
Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.
Reported by: jhb
Reviewed by: jhb
Tested by: rdivacky, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
Robert Watson [Mon, 5 Oct 2009 22:24:13 +0000 (22:24 +0000)]
In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order. In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.
MFC after: 3 days
Reported by: Kamigishi Rei <spambox at haruhiism.net>
Reviewed by: bz
Xin LI [Mon, 5 Oct 2009 21:11:04 +0000 (21:11 +0000)]
fts_open() requires that the list passed as argument to contain at least
one path. When the list is empty (contain only a NULL pointer), return
EINVAL instead of pretending to succeed, which will cause a NULL pointer
deference in a later fts_read() call.
Noticed by: Christoph Mallon (via rdivacky@)
MFC after: 2 weeks
Fix NFSv4 ACLs on sparc64. Turns out that fuword(9) fetches 64 bits
instead of sizeof(int), and on sparc64 that resulted in fetching wrong
value for acl_maxcnt, which in turn caused __acl_get_link(2) to fail
with EINVAL.
PR: sparc64/139304
Submitted by: Dmitry Afanasiev <KOT at MATPOCKuH.Ru>
Jung-uk Kim [Mon, 5 Oct 2009 16:26:54 +0000 (16:26 +0000)]
- Revert r191568 partially. Forcing AHCI mode by changing device subclass
and progif is evil. It doesn't work reliably[1] and we should honor BIOS
configuration by the user.
- If the SATA controller is enbled but combined mode is disabled, mask off
the emulated IDE channel on the legacy IDE controller.
Robert Watson [Mon, 5 Oct 2009 14:49:16 +0000 (14:49 +0000)]
First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain
sockets. This allows for reliable bi-directional datagram communication
over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or
SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing
UNIX domain socket code. This allows applications requiring record-
oriented semantics to do so reliably via local IPC.
Some implementation notes (also present in XXX comments):
- Currently we lack an sbappend variant able to do datagrams and control
data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR.
Adding a new variant will solve this problem.
- UNIX domain sockets on FreeBSD provide back-pressure/flow control
notification for stream sockets by manipulating the send socket
buffer's size during pru_send and pru_rcvd. This trick works less well
for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to
manage blocking, but also to determine maximum datagram size. Fixing
this requires rethinking how back-pressure is done for SOCK_SEQPACKET;
in the mean time, it's possible to get EMSGSIZE when buffers fill,
instead of blocking.
John Baldwin [Mon, 5 Oct 2009 14:13:16 +0000 (14:13 +0000)]
When the timeout backoff hits the maximum value, leave it capped at the
maximum value rather than setting it to the result of a boolean expression
that is always true.
Stanislav Sedov [Mon, 5 Oct 2009 10:08:58 +0000 (10:08 +0000)]
- Drop unused pmap_use_l1 function and comment out currently unused
pmap_dcache_wbinv_all/pmap_copy_page functions which we might want
to take advatage of later. This fixes the build with PMAP_DEBUG
defined.
Edwin Groothuis [Mon, 5 Oct 2009 07:13:15 +0000 (07:13 +0000)]
Modified locale(1) to be able to show the altmon_X fields and the [cxX]_fmt's.
Also modify the "-k list" option to display only fields with a certain prefix.
Edwin Groothuis [Mon, 5 Oct 2009 07:11:19 +0000 (07:11 +0000)]
Modified locale(1) to be able to show the altmon_X fields and the [cxX]_fmt's.
Also modify the "-k list" option to display only fields with a certain prefix.
Matt Jacob [Mon, 5 Oct 2009 01:31:16 +0000 (01:31 +0000)]
The cylinder group tag cg_initediblk needs to match the number of inodes
actually initialized. In the growfs case for UFS2, no inodes were actually
being initialized and the number of inodes noted as initialized was the
number of inodes per group. This created a filesystem that was deemed
corrupted because the inodes thus added were full of garbage.
David Schultz [Sun, 4 Oct 2009 19:43:36 +0000 (19:43 +0000)]
Better glibc compatibility for getline/getdelim:
- Tolerate applications that pass a NULL pointer for the buffer and
claim that the capacity of the buffer is nonzero.
- If an application passes in a non-NULL buffer pointer and claims the
buffer has zero capacity, we should free (well, realloc) it
anyway. It could have been obtained from malloc(0), so failing to
free it would be a small memory leak.
Attilio Rao [Sat, 3 Oct 2009 15:02:55 +0000 (15:02 +0000)]
When releasing a lockmgr held in shared way we need to use a write memory
barrier in order to avoid, on architectures which doesn't have strong
ordered writes, CPU instructions reordering.
Bjoern A. Zeeb [Sat, 3 Oct 2009 11:57:21 +0000 (11:57 +0000)]
Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.
The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.
Fix RTS/CTS flow control, broken by the TTY overhaul. The new TTY
interface is fairly simple WRT dealing with flow control, but
needed 2 new RX buffer functions with "get-char-from-buf" separated
from "advance-buf-pointer" so that the pointer could be advanced
only when ttydisc_rint() succeeded.
Bjoern A. Zeeb [Fri, 2 Oct 2009 17:48:51 +0000 (17:48 +0000)]
Add a mitigation feature that will prevent user mappings at
virtual address 0, limiting the ability to convert a kernel
NULL pointer dereference into a privilege escalation attack.
If the sysctl is set to 0 a newly started process will not be able
to map anything in the address range of the first page (0 to PAGE_SIZE).
This is the default. Already running processes are not affected by this.
You can either change the sysctl or the tunable from loader in case
you need to map at a virtual address of 0, for example when running
any of the extinct species of a set of a.out binaries, vm86 emulation, ..
In that case set security.bsd.map_at_zero="1".
Superseeds: r197537
In collaboration with: jhb, kib, alc
Hiroki Sato [Fri, 2 Oct 2009 07:00:20 +0000 (07:00 +0000)]
Enable adding a link-local address even if ND6_IFF_IFDISABLED.
Note that when the interface has ND6_IFF_IFDISABLED, a newly-added
address is always marked as IN6_IFF_TENTATIVE so that the interface
can perform DAD after the ND6_IFF_IFDISABLED is cleared.
Hiroki Sato [Fri, 2 Oct 2009 06:19:34 +0000 (06:19 +0000)]
Revert the previous afexists() change. Knobs configured explicitly by
the user should not be ignored if possible even if the kernel does not
support the prerequisite feature.
Qing Li [Fri, 2 Oct 2009 01:45:11 +0000 (01:45 +0000)]
Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.
Qing Li [Fri, 2 Oct 2009 01:34:55 +0000 (01:34 +0000)]
Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.
Ed Maste [Thu, 1 Oct 2009 21:44:30 +0000 (21:44 +0000)]
In fill_kinfo_thread, copy the thread's name into struct kinfo_proc even
if it is empty. Otherwise the previous thread's name would remain in the
struct and then be reported for this thread.
Jilles Tjoelker [Thu, 1 Oct 2009 21:40:08 +0000 (21:40 +0000)]
sh: Disallow mismatched quotes in backticks (`...`).
Due to the amount of code removed by this, it seems that allowing unmatched
quotes was a deliberate imitation of System V sh and real ksh. Most other
shells do not allow unmatched quotes (e.g. bash, zsh, pdksh, NetBSD /bin/sh,
dash).
Jung-uk Kim [Thu, 1 Oct 2009 20:56:15 +0000 (20:56 +0000)]
Compile ACPI debugger and disassembler for kernel modules unconditionally.
These files will generate almost empty object files without ACPI_DEBUG/DDB
options. As a result, size of acpi.ko will increase slightly.
Qing Li [Thu, 1 Oct 2009 20:32:29 +0000 (20:32 +0000)]
The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.
Andrew Thompson [Thu, 1 Oct 2009 18:37:16 +0000 (18:37 +0000)]
EHCI Hardware BUG workaround
The EHCI HW can use the qtd_next field instead of qtd_altnext when a short
packet is received. This contradicts what is stated in the EHCI datasheet.
Also the total-bytes field in the status field of the following TD gets
corrupted upon reception of a short packet! We work this around in software by
not queueing more than one job/TD at a time of up to 16Kbytes! The bug has been
seen on multiple INTEL based EHCI chips. Other vendors have not been tested
yet.
- Applications using /dev/usb/X.Y.Z, where Z is non-zero are affected, but not
applications using LibUSB v0.1, v1.2 and v2.0.
- Mass Storage (umass) is affected.
Submitted by: Hans Petter Selasky
MFC after: 3 days
Correct the pthread stub prototype for pthread_mutexattr_settype to allow for
the type argument. This is known to fix some pthread_mutexattr_settype()
invocations, especially when it comes to pulseaudio.
Approved by: kib
deischen (threads)
MFC after: 3 days
Provide default implementation for VOP_ACCESS(9), so that filesystems which
want to provide VOP_ACCESSX(9) don't have to implement both. Note that
this commit makes implementation of either of these two mandatory.
As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.
XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.
Tested by: csjp
Reviewed by: alc
MFC after: 3 days
John Baldwin [Wed, 30 Sep 2009 17:05:26 +0000 (17:05 +0000)]
Split the 'video' ACPI lock up into two locks to resolve a LOR with the
sysctl lock. The 'video' lock now protects the 'bus' of video output
devices attached to a graphics adapter. It is used when iterating over
the list of outputs, etc. The 'video_output' lock is used to lock the
output-specific data similar to a driver lock for the individual video
outputs.
Don't do an IPv6 operation when the kernel doesn't have
an IPv6 support.
Reported by: Alexander Best <alexbestms__at__math.uni-muenster.de>
Confirmed by: Paul B. Mahol <onemda__at__gmail.com>,
Alexander Best <alexbestms__at__math.uni-muenster.de>
Andrew Gallatin [Wed, 30 Sep 2009 14:42:06 +0000 (14:42 +0000)]
Two more mxge watchdog fixes:
1) Restore the PCI Express control register after a watchdog
reset. This is required because the device will come out
of watchdog reset with the pectl reg at its default state,
and important BIOS configuration (like max payload size)
could be lost.
2) Call mxge_start_locked() for every tx queue before dropping
the lock in the watchdog handler. This is required, as
the queue's buf ring may have filled during the reset.
Coleman Kane [Wed, 30 Sep 2009 14:28:38 +0000 (14:28 +0000)]
Correct a bug that could lead to a kernel panic if a user attempted to
perform 802.11 operations directly on the ndis0 interface before the
first VAP (wlan0) had been created. This would lead to a NULL-pointer
dereference in the kernel.
Submitted by: Paul B. Mahol <onemda@gmail.com>
MFC after: 3 days
When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which doesn't have strong ordered
writes, CPU instructions reordering.
Diagnosed by: fabio
Reviewed by: jhb
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>