kib [Tue, 7 May 2013 09:48:42 +0000 (09:48 +0000)]
MFC r249811:
Literally follow POSIX:
If the bs= expr operand is specified and no conversions other than sync,
noerror, or notrunc are requested, the data returned from each input
block shall be written as a separate output block.
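The POSIX rule quoted above can be sketched as follows. This is a minimal model, not dd's actual code: `copy_blockwise` and `ShortReader` are hypothetical names, and the short-reading input stands in for something like a pipe that returns less than bs bytes per read.

```python
import io

def copy_blockwise(read_block, write_block, bs):
    """Write the data returned by each input read as its own output block."""
    sizes = []
    while True:
        data = read_block(bs)   # may return fewer than bs bytes (short read)
        if not data:
            break
        write_block(data)       # one write per read, no re-blocking to bs
        sizes.append(len(data))
    return sizes

class ShortReader:
    """Hypothetical input that returns at most `limit` bytes per read."""
    def __init__(self, payload, limit):
        self._buf = io.BytesIO(payload)
        self._limit = limit

    def read(self, n):
        return self._buf.read(min(n, self._limit))

out_blocks = []
sizes = copy_blockwise(ShortReader(b"x" * 10, 3).read, out_blocks.append, bs=4)
```

With bs=4 and an input that delivers 3 bytes at a time, the output consists of short blocks that mirror the reads instead of being regrouped into 4-byte blocks.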
dim [Mon, 6 May 2013 19:59:13 +0000 (19:59 +0000)]
MFC r215137:
Revert r103230, which depended on ld preserving the __start_xxx and
__stop_xxx symbols for custom sections, even when these were not
referenced (at link time). This behaviour was changed again in binutils
commit 0b8ed435c3fe8bd09a08c23920e65bfb03251221.
This time, put the __GLOBL macro definition in cdefs.h, so it can be
reused in a few other places where it will be needed.
Reviewed by: kib
MFC r215138:
Use the same treatment as in linker_set.h for the __start and __stop
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.
Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.
dim [Wed, 1 May 2013 18:06:53 +0000 (18:06 +0000)]
MFC r249846:
When rebooting (exiting) from the BTX loader, make sure to restore the
GDT from the correct segment, otherwise a triple fault would be caused.
In some virtual environments (VMware, VirtualBox, etc) this could lead
to an unhandled error or hang in the guest emulation software.
Thanks to avg and jhb for a few hints in the right direction.
Noticed by: Jeremy Chadwick <jdc@koitsu.org> (and many others)
lstewart [Wed, 1 May 2013 08:57:45 +0000 (08:57 +0000)]
MFC r245783:
Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited"
logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are
unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could
therefore potentially corrupt the result (although under normal operation,
neither variable should legitimately exceed 32 bits).
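The truncation described above can be modelled in a few lines. This is a sketch only: Python integers are unbounded, so the 32-bit narrowing is simulated with a mask, and `min_u32`, `min_ulong` and the sample values are illustrative assumptions rather than kernel code.

```python
MASK32 = 0xFFFFFFFF

def min_u32(a, b):
    """Model of a min() whose parameters are 32-bit unsigned: both truncate first."""
    a &= MASK32
    b &= MASK32
    return a if a < b else b

def min_ulong(a, b):
    """Model of the same comparison carried out at full 64-bit width."""
    return a if a < b else b

snd_cwnd = (1 << 32) + 10   # hypothetical value that does not fit in 32 bits
snd_wnd = 5000

truncated = min_u32(snd_cwnd, snd_wnd)   # compares 10 against 5000
correct = min_ulong(snd_cwnd, snd_wnd)   # compares the real values
```

The truncated comparison picks the wrong operand as soon as either value exceeds 32 bits, which is the corruption the commit message refers to.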
- antarctica: AusAQ and ATAQ have been removed.
- Antarctica/Macquarie has been moved to australasia file and AU.
- Asia/Hebron, Palestine updated for 2013.
- Paraguay stays with DST for the whole year.
Notify CAM on state change to a logical volume, not status change. This resolves
the issues reported regarding camcontrol devlist not showing the rebuild
states of volumes unless an explicit camcontrol rescan was executed.
mm [Fri, 19 Apr 2013 10:35:45 +0000 (10:35 +0000)]
MFC r240870 (pjd):
It is possible to recursively destroy snapshots even if the snapshot
doesn't exist on a dataset we are starting from. For example if we
have the following configuration:
tank
tank/foo
tank/foo@snap
tank/bar
tank/bar@snap
We can execute:
# zfs destroy -r tank@snap
even though tank@snap doesn't exist.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans
cannot open 'tank@snap': dataset does not exist
...until now. This change allows recursively renaming snapshots even if
the snapshot doesn't exist on the starting dataset.
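The behaviour described above can be illustrated with a toy model. This is not ZFS code: `rename_recursive` and the dictionary layout are hypothetical, and the sketch only shows that a missing snapshot on the starting dataset does not stop the descent.

```python
def rename_recursive(snapshots, root, old, new):
    """Rename snapshot `old` to `new` on `root` and every descendant that has it.

    `snapshots` maps a dataset name to the set of its snapshot names.
    Datasets without `old` (including `root` itself) are skipped, not errors.
    """
    renamed = []
    for dataset in sorted(snapshots):
        in_subtree = dataset == root or dataset.startswith(root + "/")
        if in_subtree and old in snapshots[dataset]:
            snapshots[dataset].remove(old)
            snapshots[dataset].add(new)
            renamed.append(dataset)
    return renamed

pool = {
    "tank": set(),            # no tank@snap on the starting dataset
    "tank/foo": {"snap"},
    "tank/bar": {"snap"},
}
renamed = rename_recursive(pool, "tank", "snap", "pans")
```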
The code in clear_remove() and clear_inodedeps() skips one entry
in the pagedep and inodedep hash tables. An entry in the table is
skipped because 'pagedep_hash' and 'inodedep_hash' hold the size
of the hash tables - 1.
It is extremely unlikely that this would cause any operational
failure. These functions only need to find a single entry and are
only called when there are too many entries. The chance that they
would fail because all the entries are on the single skipped hash
chain is remote.
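The off-by-one described above can be sketched as follows. The loop shape is an assumption (a scan bounded by the stored value, which is the table size minus one); `find_entry` and the sample table are hypothetical.

```python
def find_entry(table, hash_mask, inclusive):
    """Return the first entry found while scanning the hash chains.

    `hash_mask` is the table size - 1. Bounding the scan by `hash_mask`
    itself (inclusive=False) never visits the last chain.
    """
    upper = hash_mask + 1 if inclusive else hash_mask
    for i in range(upper):
        if table[i]:
            return table[i][0]
    return None

table = [[], [], [], ["dep"]]    # every entry sits on the last chain
hash_mask = len(table) - 1       # the stored value: size - 1

missed = find_entry(table, hash_mask, inclusive=False)
found = find_entry(table, hash_mask, inclusive=True)
```

Only the case where every entry hashes to the final chain is affected, which is why the failure is described as remote.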
dim [Tue, 16 Apr 2013 06:52:32 +0000 (06:52 +0000)]
MFC r249316:
Ensure make -j N universe works correctly, by checking for an up-to-date
make before starting the universe targets themselves. Otherwise, all of
the targets would attempt to build make simultaneously, overwriting each
other's copies of the make object files and executable. This could lead
to strange errors, for example when partially-written make executables
are invoked.
Also amend r216620, to make the rest of universe wait properly until the
upgrade_checks target is finished, by adding universe_${target}_prologue
to the .ORDER target. Otherwise, make will be too smart for its own
good, and start building the universe targets simultaneously with the
prologues anyway.
FreeBSD 8.0 introduced inpcb reference counting, and FreeBSD 8.1 began using
that reference count to protect inpcb stability in udp_pcblist() and other
monitoring functions, preventing the inpcb from being garbage collected
across potentially sleeping copyout() operations despite the inpcb zone
becoming shrinkable.
However, this introduced a race condition in which inp->inp_socket might
become NULL as a result of the socket being freed, but before the inpcb was
removed from the global list of connections, allowing it to be exposed to a
third thread invoking udp_input() or udp6_input() which would try to
indirect through inp_socket without testing it for NULL. This might occur
with particular regularity on systems that frequently run netstat, or which
use SNMP for connection monitoring.
Later FreeBSD releases use a different reference/destruction model, but
stable/8 remained affected in FreeBSD 8.2 and 8.3; the problem could be
spotted on very high-load UDP services, such as top-level name servers.
An Errata Note for 8.x branches under continuing support might be
appropriate. Regardless, this fix should be merged to releng/8.4 prior to
8.4-RELEASE.
PR: 172963
Submitted by: Vincent Miller <vmiller@verisign.com>
Submitted by: Julien Charbon <jcharbon@verisign.com>
Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com>
Add a conditional sleep 1 in case we add any IPv6 addresses to interfaces.
Do this per jail started, not per address. This will allow DAD to complete
and services to properly start. Previously we saw problems with services
trying to start before the IPv6 address was available to use, and thus
erroring out and failing to start.
MFC r248800:
On SIM destruction free associated CCBs, preallocated inside xpt_get_ccb().
Before this change they were just leaked. Fortunately USB sticks now use
only one CCB, so the leak was only 2KB per detach, while other bigger SIMs
with many more allocated CCBs are rarely detached.
Update the manual page to reflect reality. With r138509 and r152355,
"nostrictjoliet" option for mount_cd9660(8) was completely replaced with
"brokenjoliet" somehow.
hrStorageSize and hrStorageUsed are 32 bit integers, reporting a fs
size and usage in hrStorageAllocationUnits. If the file system has
more than 2^31 allocations it can not be shown correctly and the
meters are useless.
In such cases follow net-snmp behaviour and increase
hrStorageAllocationUnits so the values fit under INT_MAX.
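One way to make the values fit is sketched below. It assumes the allocation unit is doubled while the counts are halved until the size is at most INT_MAX; the commit message does not spell out the exact scaling, and `scale_storage` is a hypothetical name.

```python
INT_MAX = 2**31 - 1

def scale_storage(size_units, used_units, unit_bytes):
    """Grow the allocation unit until the size fits in a signed 32-bit integer.

    Each step doubles the unit and halves both counts, so the reported
    byte totals stay the same apart from rounding in the halved counts.
    """
    while size_units > INT_MAX:
        size_units >>= 1
        used_units >>= 1
        unit_bytes <<= 1
    return size_units, used_units, unit_bytes

# Hypothetical file system: 2**33 blocks of 512 bytes, half of them in use.
size, used, unit = scale_storage(2**33, 2**32, 512)
```

The trade-off is precision: every doubling of the unit discards the low bit of both counts.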
- Add support for 'memsync' mode. This is the fastest replication mode, which
is why it will now be the default.
- Bump protocol version to 2 and add backward compatibility for version 1.
- Allow specifying hosts by kern.hostid as well (in addition to hostname and
kern.hostuuid) in the configuration file.
- enable_all_pool_feat will be unset if version is specified. Use it as
a flag instead of testing the props nvlist;
- Allow the user to use -d -o feature@...=enable to create a v5000 pool when
desired. Without this, the implicit -o version=28 would make the utility
complain about feature@ and version being in conflict, which is confusing.
Per Matthew Ahrens, version 5000 should not be exposed to user and there is
a problem with my first revision, namely, specifying -d -o feature@...=enable
will still bail out with:
'feature@' and 'version' properties cannot be specified together.
Because zpool create -o version=5000 will not likely be supported by other
ZFS implementations (including ours on -CURRENT and 9-STABLE), remove the
hack that makes that work. Users who want feature flags support can still
do an explicit 'zpool upgrade' after creating a pool.
The current ZFS version in 8-STABLE supports feature flags, which
enables many new features but makes it impossible to import pools
created on it into earlier FreeBSD 9.x releases, including 9.0
and 9.1-RELEASE, where the feature flags are not yet supported
because they predate the merge (r243674), and 9.2-RELEASE will
not be released before 8.4-RELEASE.
To avoid surprises when users "upgrade" to 9.1-RELEASE, limit the
creation version to 28 by default on stable/8. The user will still
be able to upgrade the pool by using "zpool upgrade" or at create
time by explicitly specifying "zpool create -o version=5000".
This is a direct commit to stable/8 because it's not applicable to
-HEAD, and can be reverted once 9.2-RELEASE is released.
dim [Wed, 3 Apr 2013 16:26:58 +0000 (16:26 +0000)]
MFC r248802:
Similar to r239870 and r239872, teach the other binutils tools about the
DW_FORM_flag_present dwarf attribute, so they do not print errors or
warnings on files that contain it. (This attribute can be emitted by
newer versions of clang and gcc.)
Oops, r240972 (Add DEBUG kernel distribution) forgot to make said distribution
optional (such as the long-standing "local" distribution; also optional). This
fixes a regression in the install process when the user selects "All" as the
distribution-set.
melifaro [Sun, 31 Mar 2013 10:17:39 +0000 (10:17 +0000)]
Merge r248070.
Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.
mckusick [Sat, 30 Mar 2013 20:57:35 +0000 (20:57 +0000)]
MFC of 246876, 246877, and 247387:
MFC reviewed by: kib
MFC 246876:
Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceded it before any future blocks may be written to the drive.
Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).
Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensures that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.
Reviewed by: kib
Tested by: Peter Holm
MFC 246877:
The UFS2 filesystem allocates new blocks of inodes as they are needed.
When a cylinder group runs short of inodes, a new block for inodes is
allocated, zero'ed, and written to the disk. The zero'ed inodes must
be on the disk before the cylinder group can be updated to claim them.
If the cylinder group claiming the new inodes were written before the
zero'ed block of inodes, the system could crash with the filesystem in
an unrecoverable state.
Rather than adding a soft updates dependency to ensure that the new
inode block is written before it is claimed by the cylinder group
map, we just do a barrier write of the zero'ed inode block to ensure
that it will get written before the updated cylinder group map can
be written. This change should only slow down bulk loading of newly
created filesystems since that is the primary time that new inode
blocks need to be created.
Reported by: Robert Watson
Reviewed by: kib
Tested by: Peter Holm
MFC 247387:
An inode block must not be blockingly read while cg block is owned.
The order is inode buffer lock -> snaplk -> cg buffer lock, reversing
the order causes deadlocks.
Inode block must not be written while cg block buffer is owned. The
FFS copy on write needs to allocate a block to copy the content of the
inode block, and the cylinder group selected for the allocation might
be the same as the owned cg block. The reserved block detection code
in the ffs_copyonwrite() and ffs_bp_snapblk() is unable to detect the
situation, because the locked cg buffer is not exposed to it.
In order to maintain the dependency between initialized inode block
and the cg_initediblk pointer, look up the inode buffer in
non-blocking mode. If that succeeds, brelse cg block, initialize the inode
block and write it. After the write is finished, reread cg block and
update the cg_initediblk.
If the inode block is already locked by another thread, let the other
thread initialize it. If another thread raced with us after we
started writing inode block, the situation is detected by an update of
cg_initediblk. Note that double-initialization of the inode block is
harmless, the block cannot be used until cg_initediblk is incremented.
Sponsored by: The FreeBSD Foundation
In collaboration with: pho
Reviewed by: mckusick
X-MFC-note: after r246877
mckusick [Sat, 30 Mar 2013 00:22:26 +0000 (00:22 +0000)]
MFC of 246289:
For UFS2 i_blocks is unsigned. The current "sanity" check that it
has gone below zero after the blocks in its inode are freed is a
no-op which the compiler fails to warn about because of the use of
the DIP macro. Change the sanity check to compare the number of
blocks being freed against the value i_blocks. If the number of
blocks being freed exceeds i_blocks, just set i_blocks to zero.
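Why the old check was a no-op, and what the replacement does, can be modelled as below. This is a sketch: unsigned 64-bit arithmetic is simulated with a mask, and `old_check` and `new_check` are hypothetical names, not the kernel functions.

```python
MASK64 = (1 << 64) - 1

def old_check(i_blocks, freed):
    """Subtract first, then test for 'below zero' on an unsigned value."""
    result = (i_blocks - freed) & MASK64   # unsigned subtraction wraps around
    if result < 0:                         # never true for an unsigned value
        result = 0
    return result

def new_check(i_blocks, freed):
    """Compare the number of blocks being freed against i_blocks first."""
    if freed > i_blocks:
        return 0
    return i_blocks - freed

wrapped = old_check(5, 8)    # wraps to a huge block count
clamped = new_check(5, 8)    # clamps to zero
```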
tijl [Fri, 29 Mar 2013 13:23:43 +0000 (13:23 +0000)]
MFC r248256:
- Fix two possible overflows when testing if ELF program headers are on
the first page:
1. Cast uint16_t operands in a multiplication to unsigned int because
otherwise the implicit promotion to int results in a signed
multiplication that can overflow and the behaviour on integer
overflow is undefined.
2. Replace (offset + size > PAGE_SIZE) with (size > PAGE_SIZE - offset)
because the sum may overflow.
- Use the same tests to see if the path to the interpreter is on the first
page. There's no overflow here because size is already limited by
MAXPATHLEN, but the compiler optimises the new tests better. Also fix an
off-by-one error.
- Simplify tests to see if an ELF note program header is on the first page.
This also fixes an off-by-one error.
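The two overflow cases listed above can be checked with a small model. It assumes a 32-bit int and 32-bit unsigned offsets and sizes, simulated with a mask; the function names and the PAGE_SIZE value are illustrative and not taken from the kernel source.

```python
PAGE_SIZE = 4096
INT_MAX = 2**31 - 1
U32 = 0xFFFFFFFF

def product_overflows_int(phnum, phentsize):
    """uint16_t operands promote to int: True if the product exceeds INT_MAX."""
    return phnum * phentsize > INT_MAX

def fits_naive(offset, size):
    """Model of !(offset + size > PAGE_SIZE) where the sum wraps at 32 bits."""
    return not (((offset + size) & U32) > PAGE_SIZE)

def fits_safe(offset, size):
    """Model of !(size > PAGE_SIZE - offset), with offset <= PAGE_SIZE checked."""
    return offset <= PAGE_SIZE and not (size > PAGE_SIZE - offset)

# The largest uint16_t product overflows int but still fits in unsigned int.
overflow = product_overflows_int(0xFFFF, 0xFFFF)
fits_unsigned = 0xFFFF * 0xFFFF <= U32

# A huge offset makes the wrapped sum look small, so the naive test passes.
naive = fits_naive(0xFFFFFFF0, 0x20)
safe = fits_safe(0xFFFFFFF0, 0x20)
```

Rearranging the comparison keeps every intermediate value within range, so no wraparound can hide an out-of-page header.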
bryanv [Fri, 29 Mar 2013 02:09:46 +0000 (02:09 +0000)]
MFC 247870:
Remove the virtio dependency entry for the VirtIO device drivers. This
will prevent the kernel from linking if the device drivers are included
without the virtio module. Remove pci and scbus for the same reason.
Also explain the relationship and necessity of the virtio and virtio_pci
modules. Currently in FreeBSD, we only support VirtIO PCI, but it could
be replaced with a different interface (like MMIO) and the device
(network, block, etc) will still function.
yongari [Fri, 29 Mar 2013 00:22:43 +0000 (00:22 +0000)]
MFC r248226:
r241438 broke IPMI access on Sun Fire X2200 M2(BCM5715).
Fix the IPMI regression by sending BGE_FW_DRV_STATE_UNLOAD to
ASF/IPMI firmware in the driver attach phase. Sending heartbeat to
ASF/IPMI is enabled only after bringing the interface up, so
setting driver state to BGE_FW_DRV_STATE_START in the attach phase
broke IPMI access.
While I'm here, add NVRAM arbitration lock before performing
controller reset. ASF/IPMI firmware may be able to access the NVRAM
while controller reset is in progress. Without the arbitration
lock before resetting the controller, ASF/IPMI may not initialize
properly.
sbruno [Thu, 28 Mar 2013 17:27:46 +0000 (17:27 +0000)]
MFC r247279
The 5300 series ciss(4) board does not work in performant mode with our
current initialization sequence. Set it to simple mode only so that
systems can be updated from stable/7 to newer installations.
At some point, we should figure out why we cannot initialize performant
mode on this board.
gavin [Thu, 28 Mar 2013 09:03:15 +0000 (09:03 +0000)]
When r241373 was merged, one file appears to have been missed from the
commit. Merge it:
Remove undefined behavior from sranddev() and
srandomdev(). This doesn't actually work
with any modern C compiler:
In particular, both clang and modern gcc
verisons silently elide any xor operation
with 'junk'.
No mergeinfo changes with this commit as r241475 already updated the
mergeinfo.
delphij [Thu, 28 Mar 2013 05:35:46 +0000 (05:35 +0000)]
MFC r248788 (erwin):
Update BIND to 9.8.4-P2
Removed the check for regex.h in configure in order
to disable regex syntax checking, as it exposes
BIND to a critical flaw in libregex on some
platforms. [RT #32688]
gjb [Tue, 19 Mar 2013 19:49:06 +0000 (19:49 +0000)]
- Revert r248483, keeping devel/subversion as a port dependency
for the doc/ build.
- Remove logic in release/Makefile that modifies local src/ tree.[1]
Submitted by: delphij [1]
Approved by: re (jpaetzel)