MFC r231265:
Introduce the "ruleset=number" option for devfs(5) mounts.
Add support for updating the devfs mount (currently only changing the
ruleset number is supported).
Check mnt_optnew with vfs_filteropt(9).
This new option sets the specified ruleset number as the active ruleset
of the new devfs mount and applies all its rules at mount time. If the
specified ruleset doesn't exist, a new empty ruleset is created.
MFC r231267 [1]:
Add support for mounting devfs inside jails.
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.
MFC r232059 [1]:
To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:
allow.mount.devfs:
allow mounting the devfs filesystem inside a jail
allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail
Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.
MFC r232186:
Analogous to r232059, add a parameter for the ZFS file system:
allow.mount.zfs:
allow mounting the zfs filesystem inside a jail
This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.
Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.
MFC r232247:
mdoc(7) stype - start new sentences on new line
MFC r232278 [1]:
Add procfs to jail-mountable filesystems.
MFC r232291:
Bump .Dd to reflect latest update
MFC r232307:
Add "export" to devfs_opts[] and return EOPNOTSUPP if called with it.
Fixes mountd warnings.
MFC r232342 (jamie) [2]:
Handle the case where a boolean parameter is also a node.
mav [Fri, 9 Mar 2012 05:43:08 +0000 (05:43 +0000)]
MFC r232380:
Fix names of some Marvell SATA chips. It looks like chips with proprietary
interface supported by mvs(4) are 88SX, while AHCI-like chips are 88SE.
kib [Thu, 8 Mar 2012 12:54:26 +0000 (12:54 +0000)]
MFC r232048:
Allow the parent to gather the exit status of the children reparented
to the debugger. When reparenting for debugging, keep the child in
the new orphan list of old parent. When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.
MFC r232104:
Restore the return statement erronously removed in the r232048.
In order to keep stable/9 KBI, the p_dbg_child member of struct proc
was replaced with padding.
rmacklem [Thu, 8 Mar 2012 03:02:48 +0000 (03:02 +0000)]
MFC: r2323467
The name caching changes of r230394 exposed an intermittent bug
in the new NFS server for NFSv4, where it would report ENOENT
when the file actually existed on the server. This turned out
to be caused by not initializing ni_topdir before calling lookup()
and there was a rare case where the value on the stack location
assigned to ni_topdir happened to be a pointer to a ".." entry,
such that "dp == ndp->ni_topdir" succeeded in lookup().
This patch initializes ni_topdir to fix the problem.
kib [Wed, 7 Mar 2012 08:05:12 +0000 (08:05 +0000)]
MFC r232303:
In null_reclaim(), assert that reclaimed vnode is fully constructed,
instead of accepting half-constructed vnode. Previous code cannot decide
what to do with such vnode anyway, and although processing it for hash
removal, paniced later when getting rid of nullfs reference on lowervp.
While there, remove initializations from the declaration block.
kib [Wed, 7 Mar 2012 08:02:43 +0000 (08:02 +0000)]
MFC r232301:
Always request exclusive lock for the lower vnode in nullfs_vget().
The null_nodeget() requires exclusive lock on lowervp to be able to
insmntque() new vnode.
kib [Wed, 7 Mar 2012 07:59:30 +0000 (07:59 +0000)]
MFC r232299:
Move the code to destroy half-contructed nullfs vnode into helper
function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.
Lock the vnode interlock around reassigning the v_vnlock.
MFC r232383:
Do not expose unlocked unconstructed nullfs vnode on mount list.
Lock the native nullfs vnode lock before switching the locks.
Add regression tests scripts for multi-IP FIBs exercising the send,
receive and forward path tagging packets with both the ifconfig fib
option or using ipfw, running ICMP6, TCP/v6 and UDP/v6 tests and
testing both setfib(2) as well as the SO_SETFIB socket option.
At 16 FIBs a total of over 64k return codes/replies/stati are checked,
sometimes multiple times (in different ways, e.g. the reflected request
as well as ipfw counter values).
The scripts need two or three machines to run and are thus not added
to the tools/regression framework but only to tools/test.
MFC r232114:
Update scripts to work around two sh(1) bugs found in stable/8:
1) _x=$((_x + 1)) does not work while x=$((x + 1)) does.
2) Parameter Expansion, esp. "${x%%bar}" does not work if quoted.
Correct typos and improve some details forwarding.sh already
had in initiator, esp. related to ipfw accepting if the default
is deny.
Add an extra stat call to the "delay" function in addition to the
touch which together is still a lot faster than sleep 1 but seems
to help a lot more to mitigate the unrelated kernel race seen.
Add regression tests for the setsockopt(2) SO_SETFIB socket option.
Check that the expected domain(9) families all handle the socket option
correctly and do proper bounds checks. This would catch bugs as fixed
in (r230938,)r230981.
ken [Mon, 5 Mar 2012 18:54:28 +0000 (18:54 +0000)]
MFC 232411:
Fix a problem that was causing the mpt(4) driver to attach to MegaRAID
cards that should be handled by the mfi(4) driver.
The root of the problem is that the mpt(4) driver was masking off the
bottom bit of the PCI device ID when deciding which cards to attach to.
It appears that a number of the mpt(4) Fibre Channel cards had a LAN
variant whose PCI device ID was just one bit off from the FC card's device
ID. The FC cards were even and the LAN cards were odd.
The problem was that this pattern wasn't carried over on the SAS and
parallel SCSI mpt(4) cards. Luckily the SAS and parallel SCSI PCI device
IDs were either even numbers, or they would get masked to a supported
adjacent PCI device ID, and everything worked well.
Now LSI is using some of the odd-numbered PCI device IDs between the 3Gb
SAS device IDs for their new MegaRAID cards. This is causing the mpt(4)
driver to attach to the RAID cards instead of the mfi(4) driver.
The solution is to stop masking off the bottom bit of the device ID, and
explicitly list the PCI device IDs of all supported cards.
This change should be a no-op for mpt(4) hardware. The only intended
functional change is that for the 929X, the is_fc variable gets set. It
wasn't being set previously, but needs to be because the 929X is a Fibre
Channel card.
kib [Mon, 5 Mar 2012 11:45:19 +0000 (11:45 +0000)]
MFC r232239:
Fix a race in top non-interactive mode. Use plain sleep(3) call instead
of arming timer and then pausing. If SIGALRM is delivered before pause(3)
is entered, top hangs.
delphij [Mon, 5 Mar 2012 05:18:58 +0000 (05:18 +0000)]
MFC r232202:
Drop setuid status while doing file operations to prevent potential
information leak. This changeset is intended to be a minimal one
to make backports easier.
MFC r231754:
Add additional check to EBR probe and create methods:
don't try probe and create EBR scheme when parent partition type
is not "ebr". This fixes error messages about corrupted EBR for
some partitions where is actually another partition scheme.
NOTE: if you have EBR on the partition with different than "ebr"
(0x05) type, then you will lost access to partitions until it will be
changed.
MFC r231928:
Add alias for the partition type 0x0f. Now "ebr" name is used for both
types 0x05 and 0x0f, but 0x05 is preferred and used when partition is
created with "gpart add -t ebr ...".
This should keep EBR partitions accessible after r231754 for those,
who have EBR on the partition with type 0x0f.
raj [Sun, 4 Mar 2012 17:53:40 +0000 (17:53 +0000)]
MFC r230865:
Adjust mvs(4) to handle interrupt cause reg depending on the actual number of
channels available
- current code treats bits 4:7 in 'SATAHC interrupt mask' and 'SATAHC
interrupt cause' as flags for SATA channels 2 and 3
- for embedded SATA controllers (SoC) these bits have been marked as reserved
in datasheets so far, but for some new and upcoming chips they are used for
purposes other than SATA
raj [Sun, 4 Mar 2012 17:00:46 +0000 (17:00 +0000)]
MFC r228504, r228530.
r228504:
Make *intr{cnt,names} on ARM reside in data section, similar to other arches.
sintrnames and sintrcnt are initialized with non-zero values, which were
discarded by the .bss directive, so consumers like "vmstat -i" were not
getting correct data.
remko [Sun, 4 Mar 2012 10:37:26 +0000 (10:37 +0000)]
Add an ifconfig carp option that enables users to set
the state of the carp cluster.
This is a direct commit to stable/9 because -HEAD's
code is very different. I discussed this with Gleb
and the reason for this is that since we do not
touch the kernel itself and are not adding very
weird or confusing things, we can commit this to the
stable branch directly.
The options 'master' and 'backup' are now available,
which enables the administrator to force a node into
the backup or master state on the cluster. Ofcourse
preempt has to be disabled otherwise the master node
will become master again.
One can do that with:
sysctl net.inet.carp.preempt=0
After that one can schedule maintenance on the node
normally running as the master and such.
delphij [Sat, 3 Mar 2012 02:35:45 +0000 (02:35 +0000)]
MFC r231888:
Put the signal trap output to standard error instead of standard output.
Without this change, pressing ^T could result in rc.d script putting
junk strings like:
Script <filename> running
in configuration files when redirecting standard output to these files.
nwhitehorn [Sat, 3 Mar 2012 02:19:33 +0000 (02:19 +0000)]
MFC r230123,230139:
Rework SLB trap handling so that double-faults into an SLB trap handler are
possible, and double faults within an SLB trap handler are not. The result
is that it possible to take an SLB fault at any time, on any address, for
any reason, at any point in the kernel.
This lets us do two important things. First, it removes the (soft) 16 GB RAM
ceiling on PPC64 as well as any architectural limitations on KVA space.
Second, it lets the kernel tolerate poorly designed hypervisors that
have a tendency to fail to restore the SLB properly after a hypervisor
context switch.
Now that we can tolerate LPAR context switches on the PS3 hypervisor, going
to hypervisor-idle on both threads will not hang the kernel.
Set read buffer size to multiple of sizeof(struct futx).
If the utmpx database gets updated while an application is reading it,
there is a chance the reading application processes partially
overwritten entries. To solve this, make sure we always read a multiple
of sizeof(struct futx) at a time.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.
kib [Fri, 2 Mar 2012 11:47:34 +0000 (11:47 +0000)]
MFC r231868:
Fetch the aux vector for the static libc, and use the entries to
initialize the cache of the system information as it was done for the
dynamic libc. This removes several sysctls from the static binary
startup.
Use the aux vector to fill the single struct dl_phdr_info describing
the static binary itself, to implement dl_iterate_phdr(3) for the
static binaries.
kib [Fri, 2 Mar 2012 11:32:47 +0000 (11:32 +0000)]
MFC r231885:
Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.
emaste [Fri, 2 Mar 2012 00:21:07 +0000 (00:21 +0000)]
MFC r232267:
Workaround for PCIe 4GB boundary issue
Enforce a boundary of no more than 4GB - transfers crossing a 4GB
boundary can lead to data corruption due to PCIe limitations. This
change is a less-intrusive workaround that can be quickly merged back
to older branches; a cleaner implementation will arrive in HEAD later
but may require KPI changes.
dim [Thu, 1 Mar 2012 17:51:15 +0000 (17:51 +0000)]
MFC r231982:
When building with clang, disable -Wformat-security for
sys/dev/hpt27xx/osm_bsd.c, since it gets the following warnings:
sys/dev/hpt27xx/osm_bsd.c:1180:25: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
S_IRUSR | S_IWUSR, driver_name);
^~~~~~~~~~~
@/dev/hpt27xx/hpt27xx_config.h:46:21: note: expanded from:
#define driver_name hpt27xx_driver_name
^~~~~~~~~~~~~~~~~~~
Since 'hpt27xx_driver_name' is a constant string symbol (coming from the
proprietary hpt27xx_lib.o file), there is no security problem.
Because this driver is provided by the vendor, and applying changes
requires re-certification and other bureaucratic exercises, just disable
the warning for now.
gibbs [Wed, 29 Feb 2012 18:41:59 +0000 (18:41 +0000)]
MFC r231883
===========
Fix regression in the handling of blkback close events for
devices that are unplugged via QEMU.
sys/dev/xen/blkback/blkback.c:
Toolstack initiated closures change the frontend's state
to Closing. The backend must change to Closing as well,
even if we can't actually close yet, in order for the
frontend to notice and start the closing process.
davidxu [Wed, 29 Feb 2012 06:19:00 +0000 (06:19 +0000)]
MFC 230857:
If multiple threads call kevent() to get AIO events on same kqueue fd,
it is possible that a single AIO event will be reported to multiple
threads, it is not threading friendly, and the existing API can not
control this behavior.
Allocate a kevent flags field sigev_notify_kevent_flags for AIO event
notification in sigevent, and allow user to pass EV_CLEAR, EV_DISPATCH
or EV_ONESHOT to AIO kernel code, user can control whether the event
should be cleared once it is retrieved by a thread. This change should
be comptaible with existing application, because the field should have
already been zero-filled, and no additional action will be taken by
kernel.
PR: kern/156567
MFC 231006:
Add 32-bit compat code for AIO kevent flags introduced in revision 230857.
MFC 231724:
Add notes about sigev_notify_kevent_flags introduced in revision 230857
which enables thread-friendly polling on same fd for AIO events.
thompsa [Wed, 29 Feb 2012 00:52:56 +0000 (00:52 +0000)]
MFC r232008,232010,232080,232089
Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.
delphij [Tue, 28 Feb 2012 23:30:19 +0000 (23:30 +0000)]
MFC r228924:
In POSIX.1-2008:
P_tmpdir [OB XSI] Default directory prefix for tempnam().
This macro is used in a lot of places in legacy applications,
and is why we see a lot of programs written for e.g. Linux
store volatile temporary files in /var/tmp and not /tmp.
rmacklem [Tue, 28 Feb 2012 15:52:01 +0000 (15:52 +0000)]
MFC: r232050
hrs@ reported a panic to freebsd-stable@ under the subject line
"panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused
by use of a mix of tsleep() and msleep() calls on the same event
in the new NFS server DRC code. It did "mtx_unlock(); tsleep();"
in two places, which kib@ noted introduced a slight risk that the
wakeup() would occur before the tsleep(), resulting in a 10sec
delay before waking up. This patch fixes the problem by replacing
"mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also
changes a nfsmsleep() call to mtx_sleep() so that the code uses
mtx_sleep() consistently within the file.
jkim [Mon, 27 Feb 2012 18:28:18 +0000 (18:28 +0000)]
MFC: r231843, r232061, r232063, r232065, r232069
- Set the initial mode for the adapter after executing VESA BIOS POST.
- Probe supported states for save/restore function.
- Defer to VGA methods if no state is supported.
sbruno [Mon, 27 Feb 2012 17:29:42 +0000 (17:29 +0000)]
MFC r231860
During work to port isci(4) to stable/7 I noted that the maxio portion of
struct ccb_pathinq from sys/cam/cam_ccb.h wasn't added to stable/7 at all
and didn't appear in stable/8 until svn R195534. Since __FreeBSD_version
did not get bumped until svn R195634, assume that maxio is valid at 800102
or higher.
brueffer [Sat, 25 Feb 2012 10:10:43 +0000 (10:10 +0000)]
MFC: r231871
Switch the license boilerplates to our standard one.
Advantages:
- Reduces the number of different license versions in the tree
- Eliminates a typo
- Removes some incorrect author attributions due to c/p
- Removes c/p error potential for future pmc manpages
marius [Sat, 25 Feb 2012 00:35:19 +0000 (00:35 +0000)]
MFC: r231913
- Probe BCM57780.
- In case the parent is bge(4), don't set the Jumbo frame settings unless
the MAC actually is Jumbo capable as otherwise the PHY might not have the
corresponding registers implemented. This is also in line with what the
Linux tg3 driver does.
PR: 165032
Submitted by: Alexander Milanov
Obtained from: OpenBSD
marius [Fri, 24 Feb 2012 00:47:14 +0000 (00:47 +0000)]
MFC: r231621
- As it turns out, MSI-X is broken for at least LSI SAS1068E when passed
through by VMware so blacklist their PCI-PCI bridge for MSI/MSI-X here.
Note that besides currently there not being a quirk type that disables
MSI-X only and there's no evidence that MSI doesn't work with the VMware
pass-through, it's really questionable whether MSI generally works in
that setup as VMware only mention three know working devices [1, p. 4].
Also not that this quirk entry currently doesn't affect the devices
emulated by VMware in any way as these don't claim support MSI/MSI-X to
begin with. [2]
While at it, make the PCI quirk table const and static.
- Remove some duplicated empty lines.
- Use DEVMETHOD_END.
jkim [Thu, 23 Feb 2012 22:34:44 +0000 (22:34 +0000)]
MFC: r231781
Some BIOSes are known for corrupting low 64KB between suspend and resume.
Mask off the first 16 pages unless we appear to be running in a VM. This
address may be overridden by 'hw.physmem.start' tunable from loader.
jkim [Thu, 23 Feb 2012 22:26:14 +0000 (22:26 +0000)]
MFC: r231161
- Give all clocks and timers on acpi0 the equal probing order.
- Increase probing order for ECDT table to match HID-based probing.
- Decrease probing order for HPET table to match HID-based probing.
- Decrease probing order for CPUs and system resources.
- Fix ACPI_DEV_BASE_ORDER to reflect the reality.
jkim [Thu, 23 Feb 2012 22:20:52 +0000 (22:20 +0000)]
MFC: r231797
Clean up RFLAG and CR3 register handling and nearby comments. For BSP, use
spinlock_enter()/spinlock_exit() to save/restore RFLAGS. We know interrupt
is disabled when returning from S3. For AP, we do not have to save/restore
it because IRET will do it for us any way. Do not save CR3 locally because
savectx() does it and BSP does not have to switch to kernel map for amd64.
Change contigmalloc(9) flag while I am in the neighborhood.
jkim [Thu, 23 Feb 2012 22:03:20 +0000 (22:03 +0000)]
MFC: r231791, r231840
Set up an event handler to turn off speaker if user requested it. Speaker
will stop beeping after all device drivers are resumed. Use proper API to
"acquire" and "release" PIC timer2 for consistency and correctness.