Ed Schouten [Wed, 3 Oct 2012 13:51:03 +0000 (13:51 +0000)]
Fix faulty error code handling in read(2) on TTYs.
When performing a non-blocking read(2), on a TTY while no data is
available, we should return EAGAIN. But if there's a modem disconnect,
we should return 0. Right now we only return 0 when doing a blocking
read, which is wrong.
Optimize prev. commit for speed.
1) Don't iterate the loop from the environment array beginning each time,
iterate it under the last place we deactivate instead.
2) Call __rebuild_environ() not on each iteration but once, only at the end
of whole loop (of course, only in case if something is changed).
rpc: convert all uid and gid variables of the type uid_t and gid_t.
As part of the previous commit, uses of xdr_int() were replaced
with xdr_u_int(). This has undesired effects as the second
argument doesn't match exactly uid_t or gid_t. It also breaks
assumptions in the size of the provided types.
To work around those issues we revert back to the use of xdr_int()
but provide proper casting so the behaviour doesn't change.
While here fix a style issue in the affected lines.
Devin Teske [Wed, 3 Oct 2012 02:32:47 +0000 (02:32 +0000)]
Import sysutils/sysrc from the ports tree (current version 5.1). Importing
disconnected under the WITH_BSDCONFIG flag (a good idea since this version of
sysrc(8) indeed requires the `sysrc.subr' module installed by bsdconfig(8)).
Multiple reasons sysrc should not simply continue to live in ports. The most
important being that it is tightly coupled with the base.
Alexander Motin [Tue, 2 Oct 2012 22:03:21 +0000 (22:03 +0000)]
Implement SATA revision (speed) control for legacy SATA controller for
both boot (via loader tunables) and run-time (via `camcontrol negotiate`).
Tested to work at least on NVIDIA MCP55 chipset.
When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.
This change matches the reference (OpenSolaris) implementation.
__rpc_getconfip is supposed to return the first netconf
entry supporting tcp or udp, respectively. The code will
currently return the *last* entry, plus it will leak
memory when there is more than one such entry.
This change matches the reference (OpenSolaris)
implementation.
Adrian Chadd [Tue, 2 Oct 2012 17:45:19 +0000 (17:45 +0000)]
Migrate the power-save functions to be overridable VAP methods.
This turns ieee80211_node_pwrsave(), ieee80211_sta_pwrsave() and
ieee80211_recv_pspoll() into methods.
The intent is to let drivers override these and tie into the power save
management pathway.
For ath(4), this is the beginning of forcing a node software queue to
stop and start as needed, as well as supporting "leaking" single frames
from the software queue to the hardware.
Right now, ieee80211_recv_pspoll() will attempt to transmit a single frame
to the hardware (whether it be a data frame on the power-save queue or
a NULL data frame) but the driver may have hardware/software queued frames
queued up. This initial work is an attempt at providing the hooks required
to implement correct behaviour.
Allowing ieee80211_node_pwrsave() to be overridden allows the ath(4)
driver to pause and unpause the entire software queue for a given node.
It doesn't make sense to transmit anything whilst the node is asleep.
Please note that there are other corner cases to correctly handle -
specifically, setting the MORE data bit correctly on frames to a station,
as well as keeping the TIM updated. Those particular issues can be
addressed later.
Using putenv() and later direct pointer contents modification it is possibe
to craft environment variables with similar names like that:
a=1
a=2
...
unsetenv("a") should remove them all to make later getenv("a") impossible.
Fix it to do so (this is GNU autoconf test #3 failure too).
John Baldwin [Tue, 2 Oct 2012 12:25:30 +0000 (12:25 +0000)]
Rename the module for 'device enc' to "if_enc" to avoid conflicting with
the CAM "enc" peripheral (part of ses(4)). Previously the two modules
used the same name, so only one was included in a linked kernel causing
enc0 to not be created if you added IPSEC to GENERIC. The new module
name follows the pattern of other network interfaces (e.g. "if_loop").
Gleb Smirnoff [Tue, 2 Oct 2012 12:03:02 +0000 (12:03 +0000)]
There is a complex race in in_pcblookup_hash() and in_pcblookup_group().
Both functions need to obtain lock on the found PCB, and they can't do
classic inter-lock with the PCB hash lock, due to lock order reversal.
To keep the PCB stable, these functions put a reference on it and after PCB
lock is acquired drop it. If the reference was the last one, this means
we've raced with in_pcbfree() and the PCB is no longer valid.
This approach works okay only if we are acquiring writer-lock on the PCB.
In case of reader-lock, the following scenario can happen:
- 2 threads locate pcb, and do in_pcbref() on it.
- These 2 threads drop the inp hash lock.
- Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock,
does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which
doesn't free the pcb due to two references on it. Then it unlocks the pcb.
- 2 aforementioned threads acquire reader lock on the pcb and run
in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues,
second gets 0 and considers pcb freed, returns.
- The thread that got 1 continutes working with detached pcb, which later
leads to panic in the underlying protocol level.
To plumb that problem an additional INPCB flag introduced - INP_FREED. We
check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend
that that was the last reference.
Discussed with: rwatson, jhb
Reported by: Vladimir Medvedkin <medved rambler-co.ru>
Eitan Adler [Tue, 2 Oct 2012 00:30:15 +0000 (00:30 +0000)]
Correct the tip about finding all the directories on the system
Add a tip about clearing the screen.
Make things more consistent by removing quotes around 'make search'
Rick Macklem [Mon, 1 Oct 2012 12:28:58 +0000 (12:28 +0000)]
Attila Bogar and Herbert Poeckl both reported similar problems
w.r.t. a Linux NFS client doing a krb5 NFS mount against the
FreeBSD server. We determined this was a Linux bug:
http://www.spinics.net/lists/linux-nfs/msg32466.html, however
the mount failed to work, because the Destroy operation with a
bogus encrypted checksum destroyed the authenticator handle.
This patch changes the rpcsec_gss code so that it doesn't
Destroy the authenticator handle for this case and, as such,
the Linux mount will work.
Tested by: Attila Bogar and Herbert Poeckl
MFC after: 2 weeks
- Enforce CAP_MKFIFO on mkfifoat(2), not on mknodat(2). Without this change
mkfifoat(2) was not restricted.
- Introduce CAP_MKNOD and enforce it on mknodat(2).
Sponsored by: FreeBSD Foundation
MFC after: 2 weeks
Inherit USB mode from RootHUB port where the USB device is connected.
Only RootHUB ports can be dual mode. Disallow OTG ports on external HUBs.
This simplifies some checks in the USB controller drivers.
Andrew Turner [Mon, 1 Oct 2012 05:12:17 +0000 (05:12 +0000)]
Fix the clobber list on the atomic operators that do comparisons. Without
this some compilers will place a cmp instruction before the atomic operation
and expect to be able to use the result afterwards. By adding "cc" to the
list of used registers we tell the compiler to not do this.
The USB Bluetooth driver should only grab its own interfaces. This allows the
USB bluetooth driver to co-exist with other USB device classes and drivers.
- Simplify the implementation of atomic_compare_exchange_strong_explicit.
- Evaluate the memory order argument in atomic_fetch_*_explicit macros.
- Implement atomic_store_explicit using atomic_exchange_explicit instead
of a plain assignment.
Simplify send out queue code:
- Write method of a queue now is void,length of item is taken
as queue property.
- Write methods don't need to know about mbud, supply just buf
to them.
- No need for safe queue iterator in pfsync_sendout().
Alan Cox [Sat, 29 Sep 2012 17:20:16 +0000 (17:20 +0000)]
Add support for mincore(). Specifically, this is an adaptation of the
pmap_mincore() implementation that was added to the original arm pmap
in r235717.
Almost each time when loader opens a file, this leads to calling
disk_open(). Very often this is called several times for one file.
This leads to reading partition table metadata for each call. To
reduce the number of disk I/O we have a simple block cache, but it
is very dumb and more than half of I/O operations related to reading
metadata, misses this cache.
Introduce new cache layer to resolve this problem. It is independent
and doesn't need initialization like bcache, and will work by default
for all loaders which use the new DISK API. A successful disk_open()
call to each new disk or partition produces new entry in the cache.
Even more, when disk was already open, now opening of any nested
partitions does not require reading top level partition table.
So, if without this cache, partition table metadata was read around
20-50 times during boot, now it reads only once. This affects the booting
from GPT and MBR from the UFS.
Steve Kargl [Sat, 29 Sep 2012 16:40:12 +0000 (16:40 +0000)]
* src/math_private.h:
. Change the API for the LD80C by removing the explicit passing
of the sign bit. The sign can be determined from the last
parameter of the macro.
. On i386, load long double by bit manipulations to work around
at least a gcc compiler issue. On non-i386 ld80 architectures,
use a simple assignment.
* ld80/s_expl.c:
. Update the only consumer of LD80C.
Disable splitfs support, since we aren't support floppies for a long
time. This slightly reduces an overhead, when loader tries to open
file that doesn't exist.
libc: Use O_CLOEXEC for various internal file descriptors.
This fixes a race condition where another thread may fork() before CLOEXEC
is set, unintentionally passing the descriptor to the child process.
This commit only adds O_CLOEXEC flags to open() or openat() calls where no
fcntl(fd, F_SETFD, FD_CLOEXEC) follows. The separate fcntl() call still
leaves a race window so it should be fixed later.
Allow deferred word-splitting via f_sysrc_get() by allowing $IFS in the
"clean-room" environment used to query rc.conf(5) parameters.
This brings bsdconfig(8)'s sysrc.subr in-line with both the sysrc(8) manual
[provided by sysutils/sysrc] and sysrc(8)'s own sysrc.subr (now identical to
bsdconfig(8)'s sysrc.subr as of this patch).
Finally, this will allow a clean import of sysutils/sysrc (sans sysrc.subr,
already provided here).
Reviewed by: jilles
Approved by: adrian (co-mentor)
Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().
Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.
The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorythm that saves references
rules. A referenced rule was freed, and this later lead to a crash.
The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.
Alexander Motin [Fri, 28 Sep 2012 12:13:34 +0000 (12:13 +0000)]
Change queue overflow checks from DIAGNOSTIC+panic() to KASSERT() to make
them enabled on HEAD by default. It is probably better to do single compare
then hunt for unexpected memory corruption.
John Baldwin [Fri, 28 Sep 2012 11:59:32 +0000 (11:59 +0000)]
- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
defined by standards ($PIR table, SMAP entries, etc.) available to
userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
and smbios(4) in <machine/pc/bios.h> and make them available to
userland.
Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.
Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.
Make the loader a bit smarter, when it tries to open disk and the slice
number is not exactly specified. When the disk has MBR, also try to read
BSD label after ptable_getpart() call. When the disk has GPT, also set
d_partition to 255. Mostly, this is how it worked before.
Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.
The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.
Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.
libc/fts: Use O_CLOEXEC for internal file descriptors.
Because fts keeps internal file descriptors open across calls, making such
descriptors close-on-exec helps not only multi-threaded applications but
also single-threaded applications.
In particular, this prevents passing a temporary file descriptor for saving
the current directory to processes created via find -exec.
Ryan Stone [Thu, 27 Sep 2012 20:12:51 +0000 (20:12 +0000)]
Ensure that all cases that enqueue a netgraph item for delivery by a
ngthread properly set the item's depth to 1. In particular, prior to this
change if ng_snd_item failed to acquire a lock on a node, the item's depth
would not be set at all. This fix ensures that the error code from rcvmsg/
rcvdata is properly passed back to the apply callback. For example, this
fixes a bug where an error from rcvmsg/rcvdata would not previously
propagate back to a libnetgraph consumer when the message was queued.