kib [Sat, 10 Oct 2009 21:17:30 +0000 (21:17 +0000)]
Refine r195509, instead of checking that vnode type is VBAD, that is
set quite late in the revocation path, properly verify that vnode is
not doomed before calling VOP.
Reported and tested by: Harald Schmalzbauer <h.schmalzbauer omnilan de>
MFC after: 3 days
kib [Sat, 10 Oct 2009 15:31:24 +0000 (15:31 +0000)]
Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.
Discussed with: bz
Reviewed by: kan
Tested by: bz (i386, amd64), bsam (linux)
MFC after: some time
kib [Sat, 10 Oct 2009 15:27:10 +0000 (15:27 +0000)]
Calculate relocation base for the main object, and apply the relocation
adjustment for all virtual addresses encoded into the ELF structures of
it. PIE binary could and should be loaded at non-zero mapbase.
For sym_zero pseudosymbol used as a return value from find_symdef()
for undefined weak symbols, st_value also should be adjusted, since
_rtld_bind corrects symbol values by relocbase.
Discussed with: bz
Reviewed by: kan
Tested by: bz (i386, amd64), bsam (linux)
MFC after: some time
kib [Sat, 10 Oct 2009 14:56:34 +0000 (14:56 +0000)]
Postpone dropping fp till both kq_global and kqueue mutexes are
unlocked. fdrop() closes file descriptor when reference count goes to
zero. Close method for vnodes locks the vnode, resulting in "sleepable
after non-sleepable". For pipes, pipe mutex is before kqueue lock,
causing LOR.
attilio [Fri, 9 Oct 2009 15:51:40 +0000 (15:51 +0000)]
atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.
Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.
In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.
Requested by: jhb
Reviewed by: jhb
Tested by: Giovanni Trematerra <giovanni dot trematerra at
gmail dot com>
pjd [Fri, 9 Oct 2009 09:42:22 +0000 (09:42 +0000)]
If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.
During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there. This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).
Reported by: des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by: des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with: phk, marcel
Reviewed by: marcel
MFC after: 3 days
jkim [Thu, 8 Oct 2009 17:41:53 +0000 (17:41 +0000)]
Clean up amd64 suspend/resume code.
- Allocate memory for wakeup code after ACPI bus is attached. The early
memory allocation hack was inherited from i386 but amd64 does not need it.
- Exclude real mode IVT and BDA explicitly. Improve comments about memory
allocation and reason for the exclusions. It is a no-op in reality, though.
- Remove an unnecessary CLD from wakeup code and re-align.
We could look at these features in the future (if people consider them
to be important enough), but we'd better discard them now. This fixes
some artifacts people reported when using TERM=xterm.
pjd [Wed, 7 Oct 2009 20:56:15 +0000 (20:56 +0000)]
On FreeBSD it is enough to report provider removal when orphan event is
received, we don't have to do it on every ENXIO error in I/O path.
Solaris has no GEOM so they have to handle it in a less clean way.
rwatson [Wed, 7 Oct 2009 20:20:51 +0000 (20:20 +0000)]
Add a new errno, ENOTCAPABLE, to be returned when a process requests an
operation on a file descriptor that is not authorized by the descriptor's
capability flags.
zml [Wed, 7 Oct 2009 19:50:14 +0000 (19:50 +0000)]
Handle GRANTED_RES messages more gracefully: Send along a grant cookie
to reference the lock, look up the grant cookie when the GRANTED_RES
comes back. Properly handle the case of an error on the grant. Add a
short expiration window so that granted locks are not freed immediately.
stas [Wed, 7 Oct 2009 13:12:43 +0000 (13:12 +0000)]
- Add support for new BGE chips (5761, 5784 and 57780). These chips uses new
BGE_PCI_PRODID_ASICREV register to store the chip identifier and its revision.
- Add new grouping macro for 7575+ chips (BGE_IS_5755_PLUS).
- Add IDs for Fujitsu-branded Broadcom adapters.
pjd [Wed, 7 Oct 2009 12:38:19 +0000 (12:38 +0000)]
Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().
PR: kern/118320
Submitted by: Mark Thompson <info-gentoo@mark.thompson.bz>
MFC after: 3 days
attilio [Tue, 6 Oct 2009 23:48:28 +0000 (23:48 +0000)]
- All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
the kernel and offer this way the support for modules (and
compatibility among the UP case and SMP case).
Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
and specifying a template. Note that the new DEFINE_CMPSET_GEN()
template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.
[1] Reported by: Paul B. Mahol <onemda at gmail dot com>
jilles [Tue, 6 Oct 2009 22:00:14 +0000 (22:00 +0000)]
sh: Send the "xyz: not found" message to redirected fd 2.
This also fixes that trying to execute a non-regular file with a command
name without '/' returns 127 instead of 126.
The fix is rather simplistic: treat CMDUNKNOWN as if the command were found
as an external program. The resulting fork is a bit wasteful but executing
unknown commands should not be very frequent.
rwatson [Tue, 6 Oct 2009 20:35:41 +0000 (20:35 +0000)]
Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.
rdivacky [Tue, 6 Oct 2009 20:19:16 +0000 (20:19 +0000)]
Fix tcsh losing history when tcsh terminates because the pty beneath it
is closed.
Diagnosed by Ted Anderson:
New signal queuing logic was introduced in 6.15 and allows the signal handlers
to be run explicitly by calling handle_pending_signals, instead of
immediately when the signal is delivered. This function is called at
various places, typically when receiving a EINTR from a slow system call
such as read or write. In the pty exit case, it was called from xwrite,
called from flush, while printing the "exit" message after receiving EOF
when reading from the pty (note that the read did not return EINTR but
zero bytes, indicating EOF). The SIGHUP handler, phup(), called
rechist, which opened the history file and began writing the merged
history to it. This process invoked flush recursively to actually write
the data. In this case, however, the flush noticed it was being called
recursively and decided fail by calling stderror.
My conclusion was that the signal was being handled at a bad time. But
whether to fix flush not to care about the recursive call, or to handle
the signal some other time and when to handle it, was unclear to me.
However, by adding an extra call to handle_pending_signals, just after
process() returns to main(), I was able to avoid the truncated history
after network outages and similar failures. I verified this fix in
version 6.17.
rwatson [Tue, 6 Oct 2009 17:14:39 +0000 (17:14 +0000)]
In rtld's map_object(), use pread(..., 0) rather than read() to read the
ELF header from the front of the file. As all other I/O on the binary
is done using mmap(), this avoids the need for seek privileges on the
file descriptor during run-time linking.
rwatson [Tue, 6 Oct 2009 14:05:57 +0000 (14:05 +0000)]
Add basename_r(3) to complement basename(3). basename_r(3) which accepts
a caller-allocated buffer of at least MAXPATHLEN, rather than using a
global buffer.
attilio [Tue, 6 Oct 2009 13:45:49 +0000 (13:45 +0000)]
Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used. GCC, however, does that aggressively, even in
presence of volatile operands. The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).
Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.
Reported by: jhb
Reviewed by: jhb
Tested by: rdivacky, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
rwatson [Mon, 5 Oct 2009 22:24:13 +0000 (22:24 +0000)]
In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order. In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.
MFC after: 3 days
Reported by: Kamigishi Rei <spambox at haruhiism.net>
Reviewed by: bz
delphij [Mon, 5 Oct 2009 21:11:04 +0000 (21:11 +0000)]
fts_open() requires that the list passed as argument to contain at least
one path. When the list is empty (contain only a NULL pointer), return
EINVAL instead of pretending to succeed, which will cause a NULL pointer
deference in a later fts_read() call.
Noticed by: Christoph Mallon (via rdivacky@)
MFC after: 2 weeks
trasz [Mon, 5 Oct 2009 19:56:56 +0000 (19:56 +0000)]
Fix NFSv4 ACLs on sparc64. Turns out that fuword(9) fetches 64 bits
instead of sizeof(int), and on sparc64 that resulted in fetching wrong
value for acl_maxcnt, which in turn caused __acl_get_link(2) to fail
with EINVAL.
PR: sparc64/139304
Submitted by: Dmitry Afanasiev <KOT at MATPOCKuH.Ru>
jkim [Mon, 5 Oct 2009 16:26:54 +0000 (16:26 +0000)]
- Revert r191568 partially. Forcing AHCI mode by changing device subclass
and progif is evil. It doesn't work reliably[1] and we should honor BIOS
configuration by the user.
- If the SATA controller is enbled but combined mode is disabled, mask off
the emulated IDE channel on the legacy IDE controller.
rwatson [Mon, 5 Oct 2009 14:49:16 +0000 (14:49 +0000)]
First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain
sockets. This allows for reliable bi-directional datagram communication
over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or
SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing
UNIX domain socket code. This allows applications requiring record-
oriented semantics to do so reliably via local IPC.
Some implementation notes (also present in XXX comments):
- Currently we lack an sbappend variant able to do datagrams and control
data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR.
Adding a new variant will solve this problem.
- UNIX domain sockets on FreeBSD provide back-pressure/flow control
notification for stream sockets by manipulating the send socket
buffer's size during pru_send and pru_rcvd. This trick works less well
for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to
manage blocking, but also to determine maximum datagram size. Fixing
this requires rethinking how back-pressure is done for SOCK_SEQPACKET;
in the mean time, it's possible to get EMSGSIZE when buffers fill,
instead of blocking.
jhb [Mon, 5 Oct 2009 14:13:16 +0000 (14:13 +0000)]
When the timeout backoff hits the maximum value, leave it capped at the
maximum value rather than setting it to the result of a boolean expression
that is always true.
stas [Mon, 5 Oct 2009 10:08:58 +0000 (10:08 +0000)]
- Drop unused pmap_use_l1 function and comment out currently unused
pmap_dcache_wbinv_all/pmap_copy_page functions which we might want
to take advatage of later. This fixes the build with PMAP_DEBUG
defined.
edwin [Mon, 5 Oct 2009 07:13:15 +0000 (07:13 +0000)]
Modified locale(1) to be able to show the altmon_X fields and the [cxX]_fmt's.
Also modify the "-k list" option to display only fields with a certain prefix.
edwin [Mon, 5 Oct 2009 07:11:19 +0000 (07:11 +0000)]
Modified locale(1) to be able to show the altmon_X fields and the [cxX]_fmt's.
Also modify the "-k list" option to display only fields with a certain prefix.
mjacob [Mon, 5 Oct 2009 01:31:16 +0000 (01:31 +0000)]
The cylinder group tag cg_initediblk needs to match the number of inodes
actually initialized. In the growfs case for UFS2, no inodes were actually
being initialized and the number of inodes noted as initialized was the
number of inodes per group. This created a filesystem that was deemed
corrupted because the inodes thus added were full of garbage.
das [Sun, 4 Oct 2009 19:43:36 +0000 (19:43 +0000)]
Better glibc compatibility for getline/getdelim:
- Tolerate applications that pass a NULL pointer for the buffer and
claim that the capacity of the buffer is nonzero.
- If an application passes in a non-NULL buffer pointer and claims the
buffer has zero capacity, we should free (well, realloc) it
anyway. It could have been obtained from malloc(0), so failing to
free it would be a small memory leak.
attilio [Sat, 3 Oct 2009 15:02:55 +0000 (15:02 +0000)]
When releasing a lockmgr held in shared way we need to use a write memory
barrier in order to avoid, on architectures which doesn't have strong
ordered writes, CPU instructions reordering.
Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.
The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.
marcel [Fri, 2 Oct 2009 22:30:44 +0000 (22:30 +0000)]
Fix RTS/CTS flow control, broken by the TTY overhaul. The new TTY
interface is fairly simple WRT dealing with flow control, but
needed 2 new RX buffer functions with "get-char-from-buf" separated
from "advance-buf-pointer" so that the pointer could be advanced
only when ttydisc_rint() succeeded.
Add a mitigation feature that will prevent user mappings at
virtual address 0, limiting the ability to convert a kernel
NULL pointer dereference into a privilege escalation attack.
If the sysctl is set to 0 a newly started process will not be able
to map anything in the address range of the first page (0 to PAGE_SIZE).
This is the default. Already running processes are not affected by this.
You can either change the sysctl or the tunable from loader in case
you need to map at a virtual address of 0, for example when running
any of the extinct species of a set of a.out binaries, vm86 emulation, ..
In that case set security.bsd.map_at_zero="1".
Superseeds: r197537
In collaboration with: jhb, kib, alc