Sam Leffler [Sat, 30 May 2009 19:57:31 +0000 (19:57 +0000)]
o assert TDMA_MAXSLOTS is 2 so noone tries to blindly increase it
o add safety belt in vdetach for failed state block allocation
o fix dynamic change to tdma config; ERESTART may not result in
kicking the state machine so we need to explicitly mark the
beacon for update
Adrian Chadd [Sat, 30 May 2009 15:20:25 +0000 (15:20 +0000)]
Even though I'm not quite sure that the call_func stuff will work properly
in all the places/cases IPI messages will be generated, at least be consistent
with how the call_data pointer is assigned and cleared (ie, all done inside
the spinlock.
Ensure that its NULL before continuing, just to try and identify situations
where things are going horribly wrong.
Attilio Rao [Sat, 30 May 2009 15:14:44 +0000 (15:14 +0000)]
When user_frac in the polling subsystem is low it is going to busy the
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.
In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).
Bump __FreeBSD_version in order to signal such situation.
Add VOP_ACCESSX, which can be used to query for newly added V*
permissions, such as VWRITE_ACL. For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.
Randall Stewart [Sat, 30 May 2009 11:14:41 +0000 (11:14 +0000)]
Adds missing sysctl to manage the vtag_time_wait time. This will
even allow disabling time-wait all together if you set the value
to 0 (not advisable actually). The default remains the same
i.e. 60 seconds.
Adrian Chadd [Sat, 30 May 2009 08:53:13 +0000 (08:53 +0000)]
Make ipi_cpu() function as intended.
IPI's in Xen are implemented through hypervisor event channels.
The MP code creates a pair of IRQs for each base IPI per CPU
(one for IPI function dispatch calls, one for IPI bitmap dispatch calls.)
Using PCPU_GET() was returning the IRQ of the IPI handler for the
current CPU; thus calls to ipi_cpu() were sending itself a message.
Instead, looking up the IPI in the target CPU ipi-to-irq map is needed.
Note: This doesn't fix Xen SMP (far from it!) but it at least
sends IPI's to the right places. Next - sending IPIs..
Xin LI [Fri, 29 May 2009 22:19:45 +0000 (22:19 +0000)]
Code cleanup for nfs4 utilities:
- Mark internal routines as static;
- Eliminate unused parameters where possible, mark __unused for others;
- Remove unused variables;
- Use %jd for int64_t values in printf();
- Add appropriate %d for printf to match its parameter;
- Rename a variable to resolve conflict with revoke(2);
Reviewed by: rmacklem
Tested with: make universe (bugs are mine)
Jamie Gritton [Fri, 29 May 2009 21:27:12 +0000 (21:27 +0000)]
Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.
The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.
- Move from mount(2) to nmount(2). This should allow to convert MNT_SNAPSHOT
flag from a mount flag to FS-specific flag.
- Simplify usage. Instead of 'mksnap_ffs /mnt/foo /mnt/foo/snap' allow to
give only one argument: 'mksnap_ffs /mnt/foo/snap'. Old usage is also
accepted for now.
- Add an example of how to mount a snapshot.
Alan Cox [Fri, 29 May 2009 18:35:51 +0000 (18:35 +0000)]
Modify vm_hold_load_pages() to allocate pages using VM_ALLOC_NOOBJ rather
than using the kernel object. This allows the elimination of page queues
locking from vm_hold_free_pages().
Stanislav Sedov [Fri, 29 May 2009 16:24:23 +0000 (16:24 +0000)]
- Prevent buffer overflow in IPFilter's load_http function used to load
ipfilter tables via http by the user-level ippool utility. Previously
the 1024-byte buffer used to store a http request coudld easily overflow
if the length of the hostname part of the url passes exceeded 496 bytes. [1]
- Use snprintf to prevent possieble buffer overflows in future. [2]
- Do not try to close the descriptor twice on failure. [2]
There is only one spare MNT_ flag left, and I want to use it for NFSv4 ACLs.
Make room for additional filesystem flags now, to avoid breaking ABI later.
John Baldwin [Fri, 29 May 2009 14:03:34 +0000 (14:03 +0000)]
Remove extra cpu_spinwait() invocations. This should really only be used
in tight spin loops, not in these edge cases where we restart a much
larger loop only a few times.
Adrian Chadd [Fri, 29 May 2009 13:43:21 +0000 (13:43 +0000)]
Fix the Xen TOD update when the hypervisor wall clock is nudged.
The "wall clock" in the current code is actually the hypervisor start time.
The time of day is the "start time" plus the hypervisor "uptime".
Large enough bumps in the dom0 clock lead to a hypervisor "bump" which is
implemented as a bump in the start time, not the uptime. The clock.c routines
were reading in the hypervisor start time and then using this as the TOD.
This meant that any hypervisor time bump would cause the FreeBSD DomU to
set its TOD to the hypervisor start time, rather than the actual TOD.
This fix is a bit hacky and some reshuffling should be done later on
to clarify what is going on. I've left the wall clock code alone.
(The code which updates shadow_tv and shadow_tv_version.)
A new routine adds the uptime to the shadow_tv, which is then used to
update the TOD.
I've included some debugging so it is obvious when the clock is nudged.
Robert Watson [Fri, 29 May 2009 10:52:37 +0000 (10:52 +0000)]
Make the rmlock(9) interface a bit more like the rwlock(9) interface:
- Add rm_init_flags() and accept extended options only for that variation.
- Add a flags space specifically for rm_init_flags(), rather than borrowing
the lock_init() flag space.
- Define flag RM_RECURSE to use instead of LO_RECURSABLE.
- Define flag RM_NOWITNESS to allow an rmlock to be exempt from WITNESS
checking; this wasn't possible previously as rm_init() always passed
LO_WITNESS when initializing an rmlock's struct lock.
- Add RM_SYSINIT_FLAGS().
- Rename embedded mutex in rmlocks to make it more obvious what it is.
- Update consumers.
- Update man page.
Let vfs_lookup() return ENOTDIR if the path has a trailing slash and
the last component is a symlink to something that isn't a directory.
We introduce a new namei flag, TRAILINGSLASH, which is set by lookup()
if the last component is followed by a slash. The trailing slash is
then stripped, as before. If the final component is a symlink,
lookup() will return to namei(), which will expand the symlink and
call lookup() with the new path. When all symlinks have been
resolved, lookup() checks if the TRAILINGSLASH flag is set, and if it
is, and the vnode it ended up with is not a directory, it returns
ENOTDIR.
Ed Schouten [Fri, 29 May 2009 06:41:23 +0000 (06:41 +0000)]
Last minute TTY API change: remove mutex argument from tty_alloc().
I don't want people to override the mutex when allocating a TTY. It has
to be there, to keep drivers like syscons happy. So I'm creating a
tty_alloc_mutex() which can be used in those cases. tty_alloc_mutex()
should eventually be removed.
The advantage of this approach, is that we can just remove a function,
without breaking the regular API in the future.
Attilio Rao [Fri, 29 May 2009 01:49:27 +0000 (01:49 +0000)]
Reverse the logic for ADAPTIVE_SX option and enable it by default.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.
This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
Rick Macklem [Thu, 28 May 2009 20:28:13 +0000 (20:28 +0000)]
Change the "-4" argument for nfsd and mountd to "-e" to avoid
confusion, since it does not refer to IPv4 nor NFSv4, but to
running the experimental server instead of the regular one.
Rick Macklem [Thu, 28 May 2009 19:45:11 +0000 (19:45 +0000)]
Add the kernel build glue for the experimental NFS subsystem that
includes support for NFSv4. The subsystem can optionally be linked
into the kernel using the two options:
NFSCL - the client
NFSD - the server
It is also built as three modules:
nfscl - the client
nfsd - the server
nfscommon - functions shared by the client and server
Doug Rabson [Thu, 28 May 2009 08:22:36 +0000 (08:22 +0000)]
Some of the boot loader code only works on a ufs file system, but it
uses the generic struct dirent, which happens to look identical to UFS's
struct direct. If BSD ever changes dirent then this will be a problem.
Submitted by: matthew dot fleming at isilon dot com
Kip Macy [Thu, 28 May 2009 08:18:12 +0000 (08:18 +0000)]
MFdevbranch 192944
- add FreeBSD implementation of xdrmem_control needed by zfs
- have zfs define xdr_ops using FreeBSD's definition
- remove solaris xdr files from zfs compile
Brian Somers [Thu, 28 May 2009 07:43:06 +0000 (07:43 +0000)]
Update this script so that it handles different ruleset failures
differently. The output now shows the ruleset and shortens to
slightly different text (using $daily_status_mail_rejects_shorten),
but it should be more descriptive.
PR: 35018
Inspired by: Mikhail Teterin - mi at aldan dot algebra dot com
MFC after: 3 weeks
Alan Cox [Thu, 28 May 2009 06:52:14 +0000 (06:52 +0000)]
Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically,
rather than unconditionally making partially dirty pages fully dirty, only
make partially dirty pages fully dirty if the pmap says that the page has
been modified.
(This change is also a small optimization. It eliminate an unnecessary call
to pmap_is_modified() on pages that are mapped read only.)
Adrian Chadd [Thu, 28 May 2009 04:17:05 +0000 (04:17 +0000)]
Say hello to a very basic, read-only, Xen Hypervisor RTC.
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.
This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.
Adrian Chadd [Thu, 28 May 2009 04:03:16 +0000 (04:03 +0000)]
Don't call the watch callback if its NULL.
I'm not sure what series of events is leading up to this watch event
being received with no callback info and it should be investigated.
I'm triggering it somehow by registering an RTC device (which will
show up in a subsequent commit.)
Rick Macklem [Wed, 27 May 2009 22:02:54 +0000 (22:02 +0000)]
Modify mountd to handle the experimental nfs server as well as the
regular one. It now takes a "-4" command line argument to force it
to use the experimental server. Otherwise it will use the regular
server unless the experimental server is the only one linked into
the kernel. A third kind of line has been added to /etc/exports,
which is specific to NFSv4 and defines where the NFSv4 tree root is
and can be used to limit access to NFSv4 state handling operations
that do not use any file handle.
Rick Macklem [Wed, 27 May 2009 19:56:51 +0000 (19:56 +0000)]
Add support for the experimental nfs client to mount_nfs. The
experimental client is used when the fstype is "newnfs" or the "nfsv4"
option is specified. It includes the addition of the option:
gssname - to specify a client side initiator host based principal name
which is specific to NFSv4.
It also includes a change to mount.c, so that it knows about
mount_newnfs, but not mount_nfs4.