ae [Tue, 14 Jun 2011 07:20:16 +0000 (07:20 +0000)]
Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure
to the check_uidgid() function, since it contains all needed arguments
and also pointer to mbuf and now it is possible use in_pcblookup_mbuf()
function.
Since i can not test it for the non-FreeBSD case, i keep this ifdef
unchanged.
Tested by: Alexander V. Chernikov
MFC after: 3 weeks
ken [Mon, 13 Jun 2011 22:08:24 +0000 (22:08 +0000)]
Instead of using an atomic operation to determine whether the devstat(9)
device node has been created, pass MAKEDEV_CHECKNAME in so that the devfs
code will do the check.
Use a regular static variable as before, that's good enough to keep us from
calling into devfs most of the time.
gibbs [Mon, 13 Jun 2011 21:21:02 +0000 (21:21 +0000)]
Fix a couple of race conditions in devstat(9) initialization.
In devstat_new_entry(), there is no need to initialize the queue
and the mutex in this function. There are ways to do static
initialization on both, so use STAILQ_HEAD_INITIALIZER and
MTX_SYSINIT to initialize the queue and the mutex.
In devstat_alloc(), use an atomic test and set routine to guard
making our entry in /dev. Using just a plain static variable
creates a race condition on multiprocessor machines. If you
attempt to create a second entry in devfs, the kernel will panic.
jilles [Mon, 13 Jun 2011 21:03:27 +0000 (21:03 +0000)]
sh: Fix duplicate prototypes for builtins.
Have mkbuiltins write the prototypes for the *cmd functions to builtins.h
instead of builtins.c and include builtins.h in more .c files instead of
duplicating prototypes for *cmd functions in other headers.
gibbs [Mon, 13 Jun 2011 20:36:29 +0000 (20:36 +0000)]
Several enhancements to the Xen block back driver.
sys/dev/xen/blkback/blkback.c:
o Implement front-end request coalescing. This greatly improves the
performance of front-end clients that are unaware of the dynamic
request-size/number of requests negotiation available in the
FreeBSD backend driver. This required a large restructuring
in how this driver records in-flight transactions and how those
transactions are mapped into kernel KVA. For example, the driver
now includes a mini "KVA manager" that allocates ranges of
contiguous KVA to patches of requests that are physically
contiguous in the backing store so that a single bio or UIO
segment can be used to represent the I/O.
o Refuse to open any backend files or devices if the system
has yet to mount root. This avoids a panic.
o Properly handle "onlined" devices. An "onlined" backend
device stays attached to its backing store across front-end
disconnections. This feature is intended to reduce latency
when a front-end does a hand-off to another driver (e.g.
PV aware bootloader to OS kernel) or during a VM reboot.
o Harden the driver against a pathological/buggy front-end
by carefully vetting front-end XenStore data such as the
front-end state.
o Add sysctls that report the negotiated number of
segments per-request and the number of requests that
can be concurrently in flight.
bz [Mon, 13 Jun 2011 20:11:28 +0000 (20:11 +0000)]
Add a new option -P to suppress getservbyport(3) calls when printing rules.
This allows one to force consistent printing of numeric port numbers like
we do with -n for other tools like netstat (just that -n was already taken)
rather than the service names.
-P is currently unused in OpenBSD so the change is eligible for upstreaming.
jhb [Mon, 13 Jun 2011 15:38:31 +0000 (15:38 +0000)]
Advance the advertised window (rcv_adv) to the currently received data
(rcv_nxt) if we advertising a zero window. This can be true when ACK'ing
a window probe whose one byte payload was accepted rather than dropped
because the socket's receive buffer was not completely full, but the
remaining space was smaller than the window scale.
This ensures that window probe ACKs satisfy the assumption made in r221346
and closes a window where rcv_nxt could be greater than rcv_adv.
jilles [Sun, 12 Jun 2011 23:06:04 +0000 (23:06 +0000)]
sh: Save/restore changed variables in optimized command substitution.
In optimized command substitution, save and restore any variables changed by
expansions (${var=value} and $((var=assigned))), instead of trying to
determine if an expansion may cause such changes.
If $! is referenced in optimized command substitution, do not cause jobs to
be remembered longer.
This fixes $(jobs $!) again, simplifies the man page and shortens the code.
mckusick [Sun, 12 Jun 2011 18:46:48 +0000 (18:46 +0000)]
Disable the soft updates journaling after a filesystem is successfully
downgraded to read-only. It will be restarted if the filesystem is
upgraded back to read-write.
rmacklem [Sat, 11 Jun 2011 21:14:22 +0000 (21:14 +0000)]
Make three one line changes to the rc scripts so that
they work with the new NFS client being the default,
since the new NFS client's module name is nfscl and
not nfsclient.
kib [Sat, 11 Jun 2011 20:15:19 +0000 (20:15 +0000)]
Assert that page is VPO_BUSY or page owner object is locked in
vm_page_undirty(). The assert is not precise due to VPO_BUSY owner
to tracked, so assertion does not catch the case when VPO_BUSY is
owned by other thread.
joel [Sat, 11 Jun 2011 09:08:46 +0000 (09:08 +0000)]
Enable sound support by default on i386 and amd64.
The generic sound driver has been added, along with enough
device-specific drivers to support the most common audio
chipsets.
We've discussed enabling it from time to time over the years
and we've received numerous requests from users, so we decided
that shipping 9.0 with working audio by default would be the
best thing to do.
Bug reports should be sent to the multimedia@ mailing list, as
usual.
gibbs [Sat, 11 Jun 2011 04:59:01 +0000 (04:59 +0000)]
Monitor and emit events for XenStore changes to XenBus trees
of the devices we manage. These changes can be due to writes
we make ourselves or due to changes made by the control domain.
The goal of these changes is to insure that all state transitions
can be detected regardless of their source and to allow common
device policies (e.g. "onlined" backend devices) to be centralized
in the XenBus bus code.
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
Add a new method for XenBus drivers "localend_changed".
This method is invoked whenever a write is detected to
a device's XenBus tree. The default implementation of
this method is a no-op.
sys/xen/xenbus/xenbus_if.m:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkback/blkback.c:
Change the signature of the "otherend_changed" method.
This notification cannot fail, so it should return void.
sys/xen/xenbus/xenbusb_back.c:
Add "online" device handling to the XenBus Back Bus
support code. An online backend device remains active
after a front-end detaches as a reconnect is expected
to occur in the near future.
sys/xen/interface/io/xenbus.h:
Add comment block further explaining the meaning and
driver responsibilities associated with the XenBus
Closed state.
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
sys/xen/xenbus/xenbusb_back.c:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusb_if.m:
o Register a XenStore watch against the local XenBus tree
for all devices.
o Cache the string length of the path to our local tree.
o Allow the xenbus front and back drivers to hook/filter both
local and otherend watch processing.
o Update the device ivar version of "state" when we detect
a XenStore update of that node.
sys/dev/xen/control/control.c:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstorevar.h:
Allow clients of the XenStore watch mechanism to attach
a single uintptr_t worth of client data to the watch.
This removes the need to carefully place client watch
data within enclosing objects so that a cast or offsetof
calculation can be used to convert from watch to enclosing
object.
jeff [Fri, 10 Jun 2011 22:48:35 +0000 (22:48 +0000)]
Implement fully asynchronous partial truncation with softupdates journaling
to resolve errors which can cause corruption on recovery with the old
synchronous mechanism.
- Append partial truncation freework structures to indirdeps while
truncation is proceeding. These prevent new block pointers from
becoming valid until truncation completes and serialize truncations.
- On completion of a partial truncate journal work waits for zeroed
pointers to hit indirects.
- softdep_journal_freeblocks() handles last frag allocation and last
block zeroing.
- vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it
is only implemented in one place.
- Block allocation failure handling moved up one level so it does not
proceed with buf locks held. This permits us to do more extensive
reclaims when filesystem space is exhausted.
- softdep_sync_metadata() is broken into two parts, the first executes
once at the start of ffs_syncvnode() and flushes truncations and
inode dependencies. The second is called on each locked buf. This
eliminates excessive looping and rollbacks.
- Improve the mechanism in process_worklist_item() that handles
acquiring vnode locks for handle_workitem_remove() so that it works
more generally and does not loop excessively over the same worklist
items on each call.
- Don't corrupt directories by zeroing the tail in fsck. This is only
done for regular files.
- Push a fsync complete record for files that need it so the checker
knows a truncation in the journal is no longer valid.
jeff [Fri, 10 Jun 2011 22:18:25 +0000 (22:18 +0000)]
- If the fsync in ufs_direnter fails SUJ can later panic because we have
partially added a name. Allow ufs_direnter() to continue in the
hopes that it is a transient error. If it is not, the directory
is corrupted already from IO errors and writing this new block
is not likely to make things worse.
attilio [Fri, 10 Jun 2011 20:23:56 +0000 (20:23 +0000)]
- Fix races on detach handling of AAC_IFFLAGS_* mask
- Fix races on setting AAC_AIFFLAGS_ALLOCFIBS
- Remove some unused AAC_IFFLAGS_* bits.
Please note that the kthread still makes a difference between the
total mask and AAC_AIFFLAGS_ALLOCFIBS because more flags may be
added in the future to aifflags.
gibbs [Fri, 10 Jun 2011 20:10:30 +0000 (20:10 +0000)]
Remove C constructs that are incompatible with C++ from various
OpenSolaris and ZFS header files. These changes are sufficient
to allow a C++ program to use the libzfs library.
Note: The majority of these files already included 'extern "C"'
declarations, so the intention of providing C++ compatibility
already existed even if it wasn't provided.
cddl/compat/opensolaris/include/assert.h:
Wrap our compatibility assert implementation in
'extern "C"'. Since this is a compatibility header
I matched the Solaris style of doing this explicitly
rather than rely on FreeBSD's __BEGIN/END_DECLS macro.
sys/cddl/compat/opensolaris/sys/kstat.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h:
Rename parameters in function declarations that conflict
with C++ keywords. This was the solution preferred by
members of the Illumos community.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h:
In C, nested structures are visible in the global namespace,
but in C++, they take on the namespace of the structure in
which they are contained. Flatten nested structure
definitions within struct zfs_cmd so these structures are
visible in the global namespace when compiled in both
languages.
jilles [Fri, 10 Jun 2011 13:47:11 +0000 (13:47 +0000)]
skel/.shrc: Improve commented CDPATH example for POSIX requirements.
POSIX says an empty entry in CDPATH shall not result in the new directory
being printed, while any non-empty entry shall result in the new directory
being printed, including ".". Therefore, the value of CDPATH should almost
always start with a colon, not dot and colon.
Our sh does not print the name for empty entries as well as "." entries.
jilles [Thu, 9 Jun 2011 23:12:23 +0000 (23:12 +0000)]
sh: Do parameter expansion before printing PS4 (set -x).
The function name expandstr() and the general idea of doing this kind of
expansion by treating the text as a here document without end marker is from
dash.
All variants of parameter expansion and arithmetic expansion also work (the
latter is not required by POSIX but it does not take extra code and many
other shells also allow it).
Command substitution is prevented because I think it causes too much code to
be re-entered (for example creating an unbounded recursion of trace lines).
Unfortunately, our LINENO is somewhat crude, otherwise PS4='$LINENO+ ' would
be quite useful.
nwhitehorn [Thu, 9 Jun 2011 19:47:30 +0000 (19:47 +0000)]
Add -Wa,-many to CFLAGS on PowerPC. This aids in building a kernel using
clang, which would otherwise complain about some 64-bit bridge mode
instructions.
mav [Thu, 9 Jun 2011 16:30:13 +0000 (16:30 +0000)]
Intel NM10 chipset's SATA controller has same PCI ID and revision as ICH7's,
but has only 2 SATA ports instead of 4. The worst part is that SStatus and
SError registers for missing ports are not implemented and return wrong
values (0xffffffff), that caused infinite reset loop.
Just ignore that SError value while I found no better way to identify them.
jkim [Wed, 8 Jun 2011 23:44:59 +0000 (23:44 +0000)]
Tidy up r222866.
- Re-add accidentally removed atomic op. for sysctl(9) handler.
- Remove a period(`.') at the end of a debugging message.
- Consistently spell "low" for "TSC-low" timecounter throughout.
davidch [Wed, 8 Jun 2011 21:18:14 +0000 (21:18 +0000)]
- Major reorganization of mbuf handling throughout the driver to
increase robustness (no more calls to panic(9)) and simplify
code.
- Allocate RX/TX data structures as a single buffer rather than
an array of 4KB pages to simplify code.
- Fixed LRO (aka TPA) code. Removed kernel module parameter and
support enabling disabling LRO through ifconfig(8) command line.
LRO is still disabled by default but should be enabled for best
performance on an endpoint device.
- Fixed statistcs code and removed kernel module parameter (stats
should just work).
- Added many software counters to help identify the cause of some
performance issues.
- Streamlined adapter internal init/stop code paths.
- Fiddled with debug code (adding some here, removing some there).
- Continued style(9) adjustments.
jkim [Wed, 8 Jun 2011 20:08:06 +0000 (20:08 +0000)]
Increase quality of TSC (or TSC-low) timecounter to 1000 if it is P-state
invariant. For SMP case (TSC-low), it also has to pass SMP synchronization
test and the CPU vendor/model has to be white-listed explicitly. Currently,
all Intel CPUs and single-socket AMD Family 15h processors are listed here.
jkim [Wed, 8 Jun 2011 19:38:31 +0000 (19:38 +0000)]
Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal
TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or
multiple CPUs are present. The "TSC-low" frequency is always lower than a
preset maximum value and derived from TSC frequency (by being halved until
it becomes lower than the maximum). Note the maximum value for SMP case is
significantly lower than UP case because we want to reduce (rare but known)
"temporal anomalies" caused by non-serialized RDTSC instruction. Normally,
it is still higher than "ACPI-fast" timecounter frequency (which was default
timecounter hardware for long time until r222222) to be useful.
attilio [Wed, 8 Jun 2011 19:28:59 +0000 (19:28 +0000)]
In the current code, a double panic condition may lead to dumps
interleaving.
Signal dumping to happen only for the first panic which should be the
most important.
Sponsored by: Sandvine Incorporated
Submitted by: Nima Misaghian (nmisaghian AT sandvine DOT com)
MFC after: 2 weeks
andreast [Wed, 8 Jun 2011 16:00:30 +0000 (16:00 +0000)]
- Improve error handling.
- Add retry loops in the i2c read/write functions.
- Combied the ADC channel selection and readout of the value into
one iicbus_transfer to avoid possible races.
nwhitehorn [Wed, 8 Jun 2011 13:23:35 +0000 (13:23 +0000)]
Compile RTLD with global dot symbols on 64-bit PowerPC, as a crutch for
GDB's ability to locate r_debug_state (which is actually the only function
that need be compiled this way).
Add the missing call to ip6_ipsec_filtertunnel() to be able to control
whether decapsulated IPsec packets will be passed to pfil again depending
on the setting of the net.ip6.ipsec6.filtertunnel sysctl.
avg [Wed, 8 Jun 2011 08:12:15 +0000 (08:12 +0000)]
remove code for dynamic offlining/onlining of CPUs on x86
The code has definitely been broken for SCHED_ULE, which is a default
scheduler. It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt devilery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup. See the UPDATING entry for details.
Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all availble CPUs and then
the disabled CPUs should be "subtracted" from it. That doesn't work
well if the resulting topology becomes non-uniform.
This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.
This also replaces the local fix in r219209 that made .Ac emit
ASCII angle quotes with an official fix. In the official fix,
ASCII quotes are output when using the .Aq, .Ao and .Ac calls,
but only when nested into the .An macro.
marius [Tue, 7 Jun 2011 23:15:21 +0000 (23:15 +0000)]
- For the case when tl1_align(_trap) is used to call rsf_fatal via
RSF_FATAL we need to switch to alternate globals for KSTACK_CHECK just
like tl1_data_excptn(_trap) does. This is more or less cosmetic because
in case RSF_FATAL is called we're already heading south.
- Correct an END().
- Read the window state from the correct register for a CATR().
For the moment document the possible problem introduced with dynamic address
family detection in world, mostly noticed by ifconfig(8), when running with
an old kernel.
Reported by: Andrzej Tobola (ato iem.pw.edu.pl)
Reported by: gcooper