Add CPU percentage limit enforcement to RCTL. The resouce name is "pcpu".
It was implemented by Rudolf Tomori during Google Summer of Code 2012.
MFC r242957:
Don't divide by zero.
MFC r243070:
Fix kassert that's not really valid for %CPU accounting. The problem
here is race between decaying the resource usage in containers, and updating
per-process usage; basically, the former may cause per-container usage
to get smaller than per-process usage.
MFC r243088:
Improve KASSERT messages in racct, to make it clear which resource
caused the problem.
MFC r248298:
Accessing td_state requires thread lock to be held.
MFC r248300:
When throttling a process to enforce RACCT limits, do not use neither
PBDRY (which simply doesn't make any sense) nor PCATCH (which could
be used by a malicious process to work around the PCPU limit).
MFC r249163:
If filter of the interrupt event is not null, print it, in addition to
the handler address. Add a mark to distinguish between filter and
handler.
MFC r232385 by ru: Remove 3 syscalls from opendir().
Finally removed the stat() and fstat() calls from the opendir() code.
They were made excessive in r205424 by opening with O_DIRECTORY.
Also eliminated the fcntl() call used to set FD_CLOEXEC by opening
with O_CLOEXEC.
(fdopendir() still checks that the passed descriptor is a directory,
and sets FD_CLOEXEC on it.)
The necessary kernel support for O_DIRECTORY and O_CLOEXEC was already in
9.0-RELEASE.
Add a conditional sleep 1 in case we add any IPv6 addresses to interfaces.
Do this per jail started, not per address. This will allow DAD to complete
and services to properly start. Before we have seen problems with services
trying to start before the IPv6 address was available to use and thus
erroring and failing to start.
MFC r249062:
Since ATA_CAM mode has no implemented support for serializing access to the
different ATA channels, required for acard and pc98 ATA controllers, block
access to second channels of both, hoping that one working channel is better
then none. I have an idea how that support could be implemented, but I have
no hardware to work on that.
MFC r248800:
On SIM destruction free associated CCBs, preallocated inside xpt_get_ccb().
Before this change they were just leaked. Fortunately USB sticks now use
only one CCB, and so leak was only 2KB per detach, while other bigger SIMs
with much more allocated CCBs are rarely detached.
Update the manual page to reflect reality. With r138509 and r152355,
"nostrictjoliet" option for mount_cd9660(8) was completely replaced with
"brokenjoliet" somehow.
hrStorageSize and hrStorageUsed are 32 bit integers, reporting a fs
size and usage in hrStorageAllocationUnits. If the file system has
more than 2^31 allocations it can not be shown correctly and the
meters are useless.
In such cases follow net-snmp behaviour and increase
hrStorageAllocationUnits so the values fit under INT_MAX.
dim [Mon, 8 Apr 2013 07:08:29 +0000 (07:08 +0000)]
MFC r248991:
Follow up to r247960 and rr247960 by also amending ctfmerge. For the
only other case where STT_FILE symbols are used, in symit_next() in
cddl/contrib/opensolaris/tools/ctf/cvt/input.c, save the basename of the
symbol, instead of the full pathname.
MFC r230998,r233792: sh: Use vfork in a few common cases.
This uses vfork() for simple commands and command substitutions containing a
single simple command, invoking an external program under certain conditions
(no redirections or variable assignments, non-interactive shell, no job
control). These restrictions limit the amount of code executed in a vforked
child.
Various incarnations of this patch have been shown to bring performance
improvements:
http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037581.html
The use of vfork() can be disabled by setting a variable named
SH_DISABLE_VFORK.
- Add support for 'memsync' mode. This is the fastest replication mode that's
why it will now be the default.
- Bump protocol version to 2 and add backward compatibility for version 1.
- Allow to specify hosts by kern.hostid as well (in addition to hostname and
kern.hostuuid) in configuration file.
------------------------------------------------------------------------
r245228 | ken | 2013-01-09 10:02:08 -0700 (Wed, 09 Jan 2013) | 43 lines
Make CTL work a little better with loading and unloading drivers.
Previously CTL would leave individual LUNs enabled in the target
driver, whether or not the port as a whole was enabled. It would
also leave the wildcard LUN enabled indefinitely.
This change means that CTL will enable and disable any active LUNs,
as well as the wildcard LUN, when enabling and disabling a port.
Also, fix a bug that could crop up due to an uninitialized CCB
type.
ctl.c: Before calling ctl_frontend_online(), run through
the LUN list and enable all active LUNs.
After calling ctl_frontend_offline(), run through
the LUN list and disble all active LUNs.
scsi_ctl.c: Before bringing a port online, allocate the
wildcard peripheral for that bus. And after taking
a port offline, invalidate the wildcard peripheral
for that bus.
Make sure that we hold the SIM lock around all
calls to xpt_action() and other transport layer
interfaces that require it.
Use CAM_SIM_{LOCK|UNLOCK} consistently to acquire
and release the SIM lock.
Update a number of outdated comments. Some of
these should have been fixed long ago.
Actually do LUN disbables now. The newer drivers
in the tree work correctly for this as far as I
know.
Initialize the CCB type to CTLFE_CCB_DEFAULT to
avoid a panic due to uninitialized memory.
MFC r247161:
Hide SEMB port of the SiI3826 Port Multiplier by default to avoid extra
errors while it tries to talk via I2C to usually missing external SEP.
There is tunable to enable it back when needed.
MFC r247154:
Add DA_Q_NO_PREVENT quirk for Kingston DataTraveler G3 1.00 USB flash.
PREVENT ALLOW MEDIUM REMOVAL commands return errors on these devices
without returning sense data. In some cases unrelated following commands
start to return errors too, that makes device to be dropped by CAM.
MFC r245306:
Do not schedule periph for payload/TUR requests if reprobe is in progress
to avoid sending extra READ CAPACITY requests by dastart(). Schedule periph
again on reprobe completion, or otherwise it may stuck indefinitely long.
This should fix USB explore thread hanging on device unplug, waiting for
periph destruction.
MFC r245253 (by smh):
Changed scsi_da device requests to use the sysctl tunable value for retry_count
and da_default_timeout where their current hardcoded values matched the current
default value for said tunables.
MFC r245252 (by smh):
Updates delete_method sysctl changes to always maintain disk d_flags
DISKFLAG_CANDELETE. While this change makes this layer consistent
other layers such as UFS and ZFS BIO_DELETE support may not notice
any change made manually via these device sysctls until the device
is reopened via a mount.
MFC r238886, r238892:
Implement media change notification for DA and CD removable media devices.
It includes three parts:
1) Modifications to CAM to detect media media changes and report them to
disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes
Asynchronous Notification mechanism to receive events from hardware.
Active polling with TEST UNIT READY commands with 3 seconds period is used
for incapable hardware. After that both CD and DA drivers work the same way,
detecting two conditions: "NOT READY: Medium not present" after medium was
detected previously, and "UNIT ATTENTION: Not ready to ready change, medium
may have changed". First one reported to disk(9) as media removal, second
as media insert/change. To reliably receive second event new
AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by
generic error handling code in cam_periph_error().
2) Modifications to GEOM core to handle media remove and change events.
Media removal handled by spoiling all consumers attached to the provider.
Media change event also schedules provider retaste after spoiling to probe
new media. New flag G_CF_ORPHAN was added to consumers to reflect that
consumer is in process of destruction. It allows retaste to create new
geom instance of the same class, while previous one is still dying.
3) Modifications to some GEOM classes: DEV -- to report media change
events to devd; PART class already handles spoiling alike to orphan.
MFC r244716 (by pjd):
Reset provider-specific fields when resending I/O request in low memory
conditions. This fixes assertion which checks those fields when kernel is
compiled with DIAGNOSTIC.
MFC r240822, r241022 (by pjd):
Use the topology lock to protect list of providers while withering them.
It is possible that provider is destroyed while we are iterating over the
list.
Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.
The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.
Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.
MFC r238198 (by trasz):
Fix orphan() methods of several GEOM classes to not assume that there
is an error set on the provider. With GEOM resizing, class can become
orphaned when it doesn't implement resize() method and the provider size
decreases.
MFC r228204:
Close race between geom destruction on g_vfs_close() when softc destroyed
and g_vfs_orphan() call that tries to access softc, intruced at r227015.
MFC r227015:
Add mutex and two flags to make orphan() call properly asynchronous:
- delay consumer closing and detaching on orphan() until all I/Os complete;
- prevent new I/Os submission after orphan() called.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.
MFC r226998, r227004:
Refactor disk disconnection and geom destruction handling sequences.
Do not close/destroy opened consumer directly in case of disconnect. Instead
keep it existing until it will be closed in regular way in response to
upstream provider destruction. Delay geom destruction in the same way.
Previous implementation could destroy consumers still having active
requests and worked only because of global workaround made on GEOM level.
MFC r237689 (by imp):
Add a sysctl to set the cdrom timeout. Data recovery operations from
a CD or DVD drive with a damaged disc often benefit from a shorter
timeout. Also, when retries are set to 0, an application is expecting
errors and recovering them so do not print the error into the log.
The number of expected errors can literally be in the hundreds of
thousands which significantly slows data recovery.
MFC r237478:
Add scsi_extract_sense_ccb() -- wrapper around scsi_extract_sense_len().
It allows to remove number of duplicate checks from several places.
MFC r248567:
Do not call vnode_pager_setsize() while a NFS node mutex is
locked. vnode_pager_setsize() might sleep waiting for the page after
EOF be unbusied.
Call vnode_pager_setsize() both for the regular and directory vnodes.
MFC r248581:
Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.
dim [Wed, 3 Apr 2013 06:48:47 +0000 (06:48 +0000)]
MFC r248802:
Similar to r239870 and r239872, teach the other binutils tools about the
DW_FORM_flag_present dwarf attribute, so they do not print errors or
warnings on files that contain it. (This attribute can be emitted by
newer versions of clang and gcc.)
melifaro [Fri, 29 Mar 2013 16:24:20 +0000 (16:24 +0000)]
Merge 248070.
Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.
yongari [Fri, 29 Mar 2013 00:21:36 +0000 (00:21 +0000)]
MFC r248226:
r241438 broke IPMI access on Sun Fire X2200 M2(BCM5715).
Fix the IPMI regression by sending BGE_FW_DRV_STATE_UNLOAD to
ASF/IPMI firmware in driver attach phase. Sending heartheat to
ASF/IPMI is enabled only after upping interface so
setting driver state to BGE_FW_DRV_STATE_START in attach phase
broke IPMI access.
While I'm here, add NVRAM arbitration lock before performing
controller reset. ASF/IPMI firmware may be able to access the NVRAM
while controller reset is in progress. Without the arbitration
lock before resetting the controller, ASF/IPMI may not initialize
properly.
emaste [Thu, 28 Mar 2013 20:48:40 +0000 (20:48 +0000)]
MFC r244183 by glebius:
Fix problem in r238990 (MFC'd in r240313). The LLE_LINKED flag should be
tested prior to entering llentry_free(), and in case if we lose the race,
we should simply perform LLE_FREE_LOCKED(). Otherwise, if the race is
lost by the thread performing arptimer(), it will remove two references
from the lle instead of one.
kib [Thu, 28 Mar 2013 06:20:13 +0000 (06:20 +0000)]
MFC r248276:
Rewrite the vfs_bio_clrbuf(9) to not access the b_data for B_VMIO
buffers directly, use pmap_zero_page_area(9) for each zeroing page
region instead.
delphij [Thu, 28 Mar 2013 05:39:45 +0000 (05:39 +0000)]
MFC r248788 (erwin):
Update BIND to 9.8.4-P2
Removed the check for regex.h in configure in order
to disable regex syntax checking, as it exposes
BIND to a critical flaw in libregex on some
platforms. [RT #32688]
mav [Wed, 27 Mar 2013 14:23:50 +0000 (14:23 +0000)]
MFC r245266:
Remove not very useful printf, that can be too chatty.
ASUS P8Z77-V board reports _AC2, _AC3 and _AC4 setpoints as 0C. With active
cooling already automatically set to _AC2, that still caused driver to print
two useless lines about temperature above _AC3 and _AC4 every ten seconds.
Three setponts of 0C is probably a board bug, but the same spam could happen
also in correct case if system is runnign not with the lowest cooling level.
dim [Wed, 27 Mar 2013 07:58:29 +0000 (07:58 +0000)]
MFC r248672:
Compile contrib/tzcode/stdtime/localtime.c with -fwrapv, since it relies
on signed integer overflow wrapping. Otherwise mktime(3) and timegm(3)
can hang, in case the timestamp passed in struct tm is not representable
in a time_t. Specifically, any timestamp after 2038-01-19 03:14:07, in
combination with a 32-bit time_t.
Note that it would be better to change the code to not rely on undefined
behaviour, but it is contributed code, and it is not entirely trivial to
fix the issue properly.
jilles [Sun, 24 Mar 2013 12:35:12 +0000 (12:35 +0000)]
MFC r248446: find: Include nanoseconds when comparing timestamps of files.
When comparing to the timestamp of a given file using -newer, -Xnewer and
-newerXY (where X and Y are one of m, c, a, B), include nanoseconds in the
comparison.
The primaries that compare a timestamp of a file to a given value (-Xmin,
-Xtime, -newerXt) continue to compare times in whole seconds.
Note that the default value 0 of vfs.timestamp_precision almost always
causes the nanoseconds part to be 0. However, touch -d can set a timestamp
to the microsecond regardless of that sysctl.
mckusick [Sat, 23 Mar 2013 21:56:19 +0000 (21:56 +0000)]
MFC of 246876 and 246877
MFC: 246876:
Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceeded it before any future blocks may be written to the drive.
Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).
Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensure that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.
Reviewed by: kib
Tested by: Peter Holm
MFC 246877:
The UFS2 filesystem allocates new blocks of inodes as they are needed.
When a cylinder group runs short of inodes, a new block for inodes is
allocated, zero'ed, and written to the disk. The zero'ed inodes must
be on the disk before the cylinder group can be updated to claim them.
If the cylinder group claiming the new inodes were written before the
zero'ed block of inodes, the system could crash with the filesystem in
an unrecoverable state.
Rather than adding a soft updates dependency to ensure that the new
inode block is written before it is claimed by the cylinder group
map, we just do a barrier write of the zero'ed inode block to ensure
that it will get written before the updated cylinder group map can
be written. This change should only slow down bulk loading of newly
created filesystems since that is the primary time that new inode
blocks need to be created.
Reported by: Robert Watson
Reviewed by: kib
Tested by: Peter Holm
jilles [Sat, 23 Mar 2013 15:50:34 +0000 (15:50 +0000)]
MFC r246641: fts: Use O_DIRECTORY when opening name that might be changed by
attacker.
There are uncommon cases where fts_safe_changedir() may be called with a
non-NULL name that is not "..". Do not block or worse if an attacker put (a
symlink to) a fifo or device where a directory used to be.
mckusick [Fri, 22 Mar 2013 22:40:16 +0000 (22:40 +0000)]
MFS of 246289:
For UFS2 i_blocks is unsigned. The current "sanity" check that it
has gone below zero after the blocks in its inode are freed is a
no-op which the compiler fails to warn about because of the use of
the DIP macro. Change the sanity check to compare the number of
blocks being freed against the value i_blocks. If the number of
blocks being freed exceeds i_blocks, just set i_blocks to zero.
mm [Fri, 22 Mar 2013 07:57:28 +0000 (07:57 +0000)]
MFC r240870 (pjd):
It is possible to recursively destroy snapshots even if the snapshot
doesn't exist on a dataset we are starting from. For example if we
have the following configuration:
tank
tank/foo
tank/foo@snap
tank/bar
tank/bar@snap
We can execute:
# zfs destroy -t tank@snap
eventhough tank@snap doesn't exit.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans
cannot open 'tank@snap': dataset does not exist
...until now. This change allows to recursively rename snapshots even if
snapshot doesn't exist on the starting dataset.
tijl [Thu, 21 Mar 2013 16:15:34 +0000 (16:15 +0000)]
- Fix two possible overflows when testing if ELF program headers are on
the first page:
1. Cast uint16_t operands in a multiplication to unsigned int because
otherwise the implicit promotion to int results in a signed
multiplication that can overflow and the behaviour on integer
overflow is undefined.
2. Replace (offset + size > PAGE_SIZE) with (size > PAGE_SIZE - offset)
because the sum may overflow.
- Use the same tests to see if the path to the interpreter is on the first
page. There's no overflow here because size is already limited by
MAXPATHLEN, but the compiler optimises the new tests better. Also fix an
off-by-one error.
- Simplify tests to see if an ELF note program header is on the first page.
This also fixes an off-by-one error.