Ed Schouten [Fri, 29 May 2009 06:41:23 +0000 (06:41 +0000)]
Last minute TTY API change: remove mutex argument from tty_alloc().
I don't want people to override the mutex when allocating a TTY. It has
to be there, to keep drivers like syscons happy. So I'm creating a
tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex()
should eventually be removed.
The advantage of this approach, is that we can just remove a function,
without breaking the regular API in the future.
Attilio Rao [Fri, 29 May 2009 01:49:27 +0000 (01:49 +0000)]
Reverse the logic for ADAPTIVE_SX option and enable it by default.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.
This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
Rick Macklem [Thu, 28 May 2009 20:28:13 +0000 (20:28 +0000)]
Change the "-4" argument for nfsd and mountd to "-e" to avoid
confusion, since it does not refer to IPv4 nor NFSv4, but to
running the experimental server instead of the regular one.
Rick Macklem [Thu, 28 May 2009 19:45:11 +0000 (19:45 +0000)]
Add the kernel build glue for the experimental NFS subsystem that
includes support for NFSv4. The subsystem can optionally be linked
into the kernel using the two options:
NFSCL - the client
NFSD - the server
It is also built as three modules:
nfscl - the client
nfsd - the server
nfscommon - functions shared by the client and server
Doug Rabson [Thu, 28 May 2009 08:22:36 +0000 (08:22 +0000)]
Some of the boot loader code only works on a ufs file system, but it
uses the generic struct dirent, which happens to look identical to UFS's
struct direct. If BSD ever changes dirent then this will be a problem.
Submitted by: matthew dot fleming at isilon dot com
Kip Macy [Thu, 28 May 2009 08:18:12 +0000 (08:18 +0000)]
MFdevbranch 192944
- add FreeBSD implementation of xdrmem_control needed by zfs
- have zfs define xdr_ops using FreeBSD's definition
- remove solaris xdr files from zfs compile
Brian Somers [Thu, 28 May 2009 07:43:06 +0000 (07:43 +0000)]
Update this script so that it handles different ruleset failures
differently. The output now shows the ruleset and shortens to
slightly different text (using $daily_status_mail_rejects_shorten),
but it should be more descriptive.
PR: 35018
Inspired by: Mikhail Teterin - mi at aldan dot algebra dot com
MFC after: 3 weeks
Alan Cox [Thu, 28 May 2009 06:52:14 +0000 (06:52 +0000)]
Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically,
rather than unconditionally making partially dirty pages fully dirty, only
make partially dirty pages fully dirty if the pmap says that the page has
been modified.
(This change is also a small optimization. It eliminate an unnecessary call
to pmap_is_modified() on pages that are mapped read only.)
Adrian Chadd [Thu, 28 May 2009 04:17:05 +0000 (04:17 +0000)]
Say hello to a very basic, read-only, Xen Hypervisor RTC.
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.
This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.
Adrian Chadd [Thu, 28 May 2009 04:03:16 +0000 (04:03 +0000)]
Don't call the watch callback if its NULL.
I'm not sure what series of events is leading up to this watch event
being received with no callback info and it should be investigated.
I'm triggering it somehow by registering an RTC device (which will
show up in a subsequent commit.)
Rick Macklem [Wed, 27 May 2009 22:02:54 +0000 (22:02 +0000)]
Modify mountd to handle the experimental nfs server as well as the
regular one. It now takes a "-4" command line argument to force it
to use the experimental server. Otherwise it will use the regular
server unless the experimental server is the only one linked into
the kernel. A third kind of line has been added to /etc/exports,
which is specific to NFSv4 and defines where the NFSv4 tree root is
and can be used to limit access to NFSv4 state handling operations
that do not use any file handle.
Rick Macklem [Wed, 27 May 2009 19:56:51 +0000 (19:56 +0000)]
Add support for the experimental nfs client to mount_nfs. The
experimental client is used when the fstype is "newnfs" or the "nfsv4"
option is specified. It includes the addition of the option:
gssname - to specify a client side initiator host based principal name
which is specific to NFSv4.
It also includes a change to mount.c, so that it knows about
mount_newnfs, but not mount_nfs4.
Rick Macklem [Wed, 27 May 2009 19:41:29 +0000 (19:41 +0000)]
Fix handling of NFSv4 Close operations in ncl_inactive(). Only
do them for NFSv4 and flush writes to the server before doing
the Close(s), as required. Also, use the a_td argument instead of
curthread.
Ed Schouten [Wed, 27 May 2009 19:28:04 +0000 (19:28 +0000)]
Rename the queue macros I introduced last year.
Last year I added SLIST_REMOVE_NEXT and STAILQ_REMOVE_NEXT, to remove
entries behind an element in the list, using O(1) time. I recently
discovered NetBSD also has a similar macro, called SLIST_REMOVE_AFTER.
In my opinion this approach is a lot better:
- It doesn't have the unused first argument of the list pointer. I added
this, mainly because OpenBSD also had it.
- The _AFTER suffix makes a lot more sense, because it is related to
SLIST_INSERT_AFTER. _NEXT is only used to iterate through the list.
The reason why I want to rename this now, is to make sure we don't
release a major version with the badly named macros.
Andrew Thompson [Wed, 27 May 2009 19:27:29 +0000 (19:27 +0000)]
Add support for the Apple MacBook Pro keyboard
- add key mappings for fn keys
- byte swapping for certain models
- Fix leds for keyboards which require an ID byte for the HID output structures
Bruce M Simpson [Wed, 27 May 2009 18:57:13 +0000 (18:57 +0000)]
Merge final round of MLD changes from p4:
ip6_input.c, in6.h:
* Add netinet6-specific mbuf flag M_RTALERT_MLD, shadowing M_PROTO6.
* Always set this flag if HBH Router Alert option is present for MLD,
even when not forwarding.
icmp6.c:
* In icmp6_input(), spell m->m_pkthdr.rcvif as ifp to be consistent.
* Use scope ID for verifying input. Do not apply SSM filters here, no inpcb.
* Check for M_RTALERT_MLD when validating MLD traffic, as we can't see
IPv6 hop options outside of ip6_input().
in6_mcast.c:
* Use KAME scope/zone ID in in6_multi.
* Update net.inet6.ip6.mcast.filters implementation to use scope IDs
for comparisons.
* Fix scope ID treatment in multicast socket option processing.
Scope IDs passed in from userland will be ignored as other less
ambiguous APIs exist for specifying the link.
* Tighten userland input checks in IPv6 SSM delta and full-state ops.
* Source filter embedded scope IDs need to be revisited, for now
just clear them and ignore them on input.
* Adapt KAME behaviour of looking up the scope ID in the default zone
for multicast leaves, when the interface is ambiguous.
mld6.c:
* Tighten origin checks on MLD traffic as per RFC3810 Section 6.2:
* ip6_src MAY be the unspecified address for MLDv1 reports.
* ip6_src MAY have link-local address scope for MLDv1 reports,
MLDv1 queries, and MLDv2 queries.
* Perform address field validation *before* accepting queries.
* Use KAME scope/zone ID in query/report processing.
* Break const correctness for mld_v1_input_report(), mld_v1_input_query()
as we temporarily modify the input mbuf chain.
* Clear the scope ID before handoff to userland MLD daemon.
* Fix MLDv1 old querier present timer processing.
With the protocol defaults, hosts should revert to MLDv2 after 260s.
* Add net.inet6.mld.v1enable sysctl, default to on.
ifmcstat.c:
* Use sysctl by default; -K requests kvm(3) if so compiled.
Rink Springer [Wed, 27 May 2009 18:12:27 +0000 (18:12 +0000)]
ia64: Move MCA information retrieval to a per-CPU kthread
Once AP's are launched, their MCA state information is stored and later obtainable using a sysctl. Since the size of the MCA state information is unknown, it will be malloc'ed as needed. However, when 'ia64_ap_startup' runs, it's not yet safe to call malloc and this may cause 'panic: blockable sleep lock (sleep mutex) 8192 @ /usr/src/sys/vm/uma_core.c'. This commit avoids this issue by scheduling a separate kthread to obtain this information, which immediately terminates afterwards.
Ed Schouten [Wed, 27 May 2009 17:27:03 +0000 (17:27 +0000)]
Update ee(1) in the base system to version 1.5.0.
This version is now licensed under a 2-clause BSD license, instead of
the Artistic license. I've reverted a lot of local modifications we made
to ee, because they have been integrated upstream as well.
Only local modifications include:
- $FreeBSD$ ID.
- Pathname to init.ee.
- catopen() call, to honor LC_MESSAGES instead of LANG.
To keep SVN happy, I'm putting an application/octet-stream mime type on
the KOI8 translations.
Zachary Loafman [Wed, 27 May 2009 16:36:54 +0000 (16:36 +0000)]
fail(9) support:
Add support for kernel fault injection using KFAIL_POINT_* macros and
fail_point_* infrastructure. Add example fail point in vfs_bio.c to
simulate VM buf pressure.
Andriy Gapon [Wed, 27 May 2009 15:23:12 +0000 (15:23 +0000)]
linux_ioctl_cdrom: reduce stack usage
... by moving two ~2KB structures from stack to heap allocation.
I experienced stack overflow in linux emulation on i386 (8K stack)
when LINUX_DVD_READ_STRUCT ioctl was performed on atapicam cd
device and there was an error that resulted in additional quite
heavy stack use in cam layer.
Rick Macklem [Wed, 27 May 2009 15:16:56 +0000 (15:16 +0000)]
Add a function to the experimental nfs subsystem that tests to see
if a local file system supports NFSv4 ACLs. This allows the
NFSHASNFS4ACL() macro to be correctly implemented. The NFSv4 ACL
support should now work when the server exports a ZFS volume.
Jamie Gritton [Wed, 27 May 2009 14:30:26 +0000 (14:30 +0000)]
Add support for the arbitrary named jail parameters used by jail_set(2)
and jail_get(2). Jail(8) can now create jails using a "name=value"
format instead of just specifying a limited set of fixed parameters; it
can also modify parameters of existing jails. Jls(8) can display all
parameters of jails, or a specified set of parameters. The available
parameters are gathered from the kernel, and not hard-coded into these
programs.
Small patches on killall(1) and jexec(8) to support jail names with
jail_get(2).
Jamie Gritton [Wed, 27 May 2009 14:11:23 +0000 (14:11 +0000)]
Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.
Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.
Don't discard packets with 'Destination Unreachable' at the beginning
of ip_forward(), if the IPSEC is compiled in. It is possible that there
is an SPD that this packets will go through, even if there is no matching
route. If not, ICMP will be sent anyway, after ip_output().
This is somewhat similar in purpose to r191621, except that one was
for the packets sent from the host, while this one is for packets
being forwarded by the host.
Reviewed by: bz@
Sponsored by: Wheel Sp. z o.o. (http://www.wheel.pl)
Adrian Chadd [Wed, 27 May 2009 02:49:08 +0000 (02:49 +0000)]
Ensure that there are enough TX mbuf ring slots available before beginning
to dequeue a packet.
The tx path was trying to ensure that enough Xenbus TX ring slots existed but
it didn't check to see whether the mbuf TX ring slots were also available.
They get freed in xn_txeof() which occurs after transmission, rather than earlier
on in the process. (The same happens under Linux too.)
Due to whatever reason (CPU use, scheduling, memory constraints, whatever) the
mbuf TX ring may not have enough slots free and would allocate slot 0. This is
used as the freelist head pointer to represent "free" mbuf TX ring slots; setting
this to an actual mbuf value rather than an id crashes the code.
This commit introduces some basic code to track the TX mbuf ring use and then
(hopefully!) ensures that enough slots are free in said TX mbuf ring before it
enters the actual work loop.
A few notes:
* Similar logic needs to be introduced to check there are enough actual slots
available in the xenbuf TX ring. There's some logic which is invoked earlier
but it doesn't hard-check against the number of available ring slots.
Its trivial to do; I'll do it in a subsequent commit.
* As I've now commented in the source, it is likely possible to deadlock the
driver under certain conditions where the rings aren't receiving any changes
(which I should enumerate) and thus Xen doesn't send any further software
interrupts. I need to make sure that the timer(s) are running right and
the queues are periodically kicked.
Adrian Chadd [Wed, 27 May 2009 01:45:23 +0000 (01:45 +0000)]
Add in some INVARIANT checks in the TX mbuf descriptor "freelist" management code.
Slot 0 must always remain "free" and be a pointer to the first free entry in the
mbuf descriptor list. It is thus an error to have code allocate or push slot 0
back into the list.
Jilles Tjoelker [Tue, 26 May 2009 22:33:10 +0000 (22:33 +0000)]
Fix various cases with 3 or 4 parameters in test(1) to be POSIX compliant.
More precisely, this gives precedence to an interpretation not using the
'(', ')', '-a' and '-o' in their special meaning, if possible. For example,
it is now safe to write [ "$a" = "$b" ] and assume it compares the two
strings.
The man page already says that test(1) works this way, so does not need to
be changed.
Interpretation of input with more parameters tries a bit harder to find a
valid parse in some cases.
Rick Macklem [Tue, 26 May 2009 22:21:53 +0000 (22:21 +0000)]
Fix the experimental nfs subsystem so that it builds with the
current NFSv4 ACLs, as defined in sys/acl.h. It still needs a
way to test a mount point for NFSv4 ACL support before it will
work. Until then, the NFSHASNFS4ACL() macro just always returns 0.
Stacey Son [Tue, 26 May 2009 21:39:09 +0000 (21:39 +0000)]
Add the ksyms(4) pseudo driver. The ksyms driver allows a process to
get a quick snapshot of the kernel's symbol table including the symbols
from any loaded modules (the symbols are all merged into one symbol
table). Unlike like other implementations, this ksyms driver maps
memory in the process memory space to store the snapshot at the time
/dev/ksyms is opened. It also checks to see if the process has already
a snapshot open and won't allow it to open /dev/ksyms it again until it
closes first. This prevents kernel and process memory from being
exhausted. Note that /dev/ksyms is used by the lockstat(1) command.
Ed Schouten [Tue, 26 May 2009 21:06:51 +0000 (21:06 +0000)]
Merge local changes to ee(1) into contrib space.
The source file, manual page and English translation are now directly
obtained from the contrib/ directory. This makes it a lot easier to
merge a newer version of ee(1) into the tree.