marius [Sat, 28 Jan 2012 23:45:31 +0000 (23:45 +0000)]
MFC: r206451, r206453
Add sbbc(4), a driver for the BootBus controller found in Serengeti and
StarCat systems which provides time-of-day services for both as well as
console service for Serengeti, i.e. Sun Fire V1280. While the latter is
described with a device type of serial in the OFW device tree, it isn't
actually a UART. Nevertheless the console service is handled by uart(4)
as this allowed re-using quite a bit of MD and MI code. Actually, this
idea is stolen from Linux which interfaces the sun4v hypervisor console
with the Linux counterpart of uart(4).
marius [Sat, 28 Jan 2012 23:26:55 +0000 (23:26 +0000)]
MFC: r225891
Re-reading the Schizo errata suggests that it's actually tolerable to
also use the streaming buffer of pre version 5/revision 2.3 hardware as
long as we stay away from context flushes (which iommu(4) so far doesn't
take advantage of). OpenSolaris does the same.
marius [Sat, 28 Jan 2012 23:25:31 +0000 (23:25 +0000)]
MFC: r225890
- Add protective parentheses to macros as far as possible.
- Move {r,w,}mb() to the top of this file where they live on most of the
other architectures.
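As a generic illustration of what protective parentheses guard against (the macros below are hypothetical and not taken from the sparc64 headers):

```c
/*
 * Hypothetical macros: the bare version pastes its argument in without
 * grouping, so operator precedence leaks across the expansion.
 */
#define	SQUARE_BARE(x)	x * x
#define	SQUARE_SAFE(x)	((x) * (x))

int
square_bare(int a, int b)
{
	return (SQUARE_BARE(a + b));	/* expands to a + b * a + b */
}

int
square_safe(int a, int b)
{
	return (SQUARE_SAFE(a + b));	/* expands to ((a + b) * (a + b)) */
}
```

With a = 1 and b = 2, square_bare() yields 1 + 2 * 1 + 2 = 5 instead of 9.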
marius [Sat, 28 Jan 2012 23:24:08 +0000 (23:24 +0000)]
MFC: r225889, r228222
In Total Store Order (TSO) mode, which we use for running the kernel and
all of the userland, atomic operations behave as if they were followed by
a CPU memory barrier, so there's no need to include ones in the acquire
variants of atomic(9); it's sufficient to just use compiler memory barriers
to satisfy the requirements of atomic(9). Removing the CPU memory barriers
results in a small performance improvement; specifically, this is
sufficient to compensate for the performance loss seen in the worldstone
benchmark when using SCHED_ULE instead of SCHED_4BSD.
This change is inspired by Linux, which did the equivalent, even more
radically, some time ago.
Thanks go to Peter Jeremy for additional testing.
marius [Sat, 28 Jan 2012 23:15:04 +0000 (23:15 +0000)]
MFC: r225886
- Right-justify backslashes as suggested by style(9).
- Rename ATOMIC_INC_ULONG to ATOMIC_INC_LONG in order to be consistent with
the names of the other macros in this file and adjust accordingly.
marius [Sat, 28 Jan 2012 23:13:00 +0000 (23:13 +0000)]
MFC: r228022, r228026
For sparc64 also adjust the geometry of da(4) driven disks to not overflow
the 16-bit cylinders field of the VTOC8 disk label (at around 502GB). The
geometry chosen for disks above that limit allows using disks up to 2TB,
which is the limit of the extended VTOC8 format. The geometry used for
disks smaller than the 16-bit cylinders limit stays the same as used by
cam_calc_geometry(9) for extended translation.
Thanks to Hans-Joerg Sirtl for providing hardware for testing this change.
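A sketch of the arithmetic behind the ~502GB figure, assuming the 255 heads x 63 sectors per track geometry used for extended translation and 512-byte sectors. The helper names are hypothetical, and the geometry da(4) actually picks above the limit is not reproduced here:

```c
#include <stdint.h>

#define	VTOC8_MAXCYL	0xffffULL	/* largest value of the 16-bit field */

/* Cylinders needed for a disk of the given size with this geometry. */
uint64_t
vtoc8_cylinders(uint64_t sectors, unsigned heads, unsigned secs)
{
	return (sectors / ((uint64_t)heads * secs));
}

/* Non-zero if the cylinder count fits the 16-bit VTOC8 field. */
int
vtoc8_geometry_fits(uint64_t sectors, unsigned heads, unsigned secs)
{
	return (vtoc8_cylinders(sectors, heads, secs) <= VTOC8_MAXCYL);
}

/* Largest disk size in bytes (512-byte sectors) this geometry can describe. */
uint64_t
vtoc8_max_bytes(unsigned heads, unsigned secs)
{
	return (VTOC8_MAXCYL * heads * secs * 512);
}
```

vtoc8_max_bytes(255, 63) is 65535 * 255 * 63 * 512 = 539,043,724,800 bytes, roughly 502GB, which is why larger disks need a geometry with more sectors per cylinder.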
rmacklem [Sat, 28 Jan 2012 02:18:50 +0000 (02:18 +0000)]
MFC: r230100
Tai Horgan reported via email that there were two places in
the new NFSv4 server where the code follows the wrong list.
Fortunately, for these fairly rare cases, the lc_stateid[]
lists are normally empty. This patch fixes the code to
follow the correct list.
dumbbell [Thu, 26 Jan 2012 22:01:05 +0000 (22:01 +0000)]
MFC r228259:
Support domain-search in dhclient(8)
The "domain-search" option (option 119) allows a DHCP server to publish
a list of implicit domain suffixes used during name lookup. This option
is described in RFC 3397.
For instance, if the domain-search option says:
".example.org .example.com"
and one wants to resolve "foobar", the resolver will try:
1. "foobar.example.org"
2. "foobar.example.com"
The file /etc/resolv.conf is updated with a "search" directive if the
DHCP server provides "domain-search".
A regression test suite is included in this patch under
tools/regression/sbin/dhclient.
MFC r229000:
Invalid Domain Search option isn't considered a fatal error
In the original Domain Search option patch, an invalid option value
would cause the whole lease to be rejected. However, DHCP servers that
emit such an invalid value are more common than I thought. With this new
patch, just the option is rejected, not the entire lease.
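A minimal sketch of decoding the option, assuming the RFC 3397 encoding: a sequence of RFC 1035 domain names in which compression pointers refer to earlier offsets within the option data. The function names are hypothetical and this is not the dhclient(8) code; splitting of long options across several option instances is not modeled. A -1 return is where, after r229000, only the option would be dropped rather than the lease.

```c
#include <stddef.h>
#include <string.h>

/* Append len bytes to out at *op, always keeping room for a final NUL. */
static int
append(char *out, size_t outlen, size_t *op, const void *src, size_t len)
{
	if (*op + len + 1 > outlen)
		return (-1);
	memcpy(out + *op, src, len);
	*op += len;
	out[*op] = '\0';
	return (0);
}

/* Decode one (possibly compressed) name starting at *pos. */
static int
decode_name(const unsigned char *data, size_t len, size_t *pos,
    char *out, size_t outlen, size_t *op)
{
	size_t p = *pos, limit = *pos;
	int jumped = 0, first = 1;

	for (;;) {
		if (p >= len)
			return (-1);
		unsigned char c = data[p];
		if (c == 0) {			/* root label ends the name */
			if (!jumped)
				*pos = p + 1;
			return (0);
		}
		if ((c & 0xc0) == 0xc0) {	/* compression pointer */
			if (p + 1 >= len)
				return (-1);
			size_t off = ((size_t)(c & 0x3f) << 8) | data[p + 1];
			if (off >= limit)	/* must point strictly backwards */
				return (-1);
			if (!jumped)
				*pos = p + 2;
			jumped = 1;
			limit = p = off;
			continue;
		}
		if ((c & 0xc0) != 0 || p + 1 + c > len)
			return (-1);		/* reserved label type or overrun */
		if ((!first && append(out, outlen, op, ".", 1) != 0) ||
		    append(out, outlen, op, data + p + 1, c) != 0)
			return (-1);
		first = 0;
		p += 1 + c;
	}
}

/* Produce a space-separated search list in out; -1 if malformed. */
int
decode_domain_search(const unsigned char *data, size_t len,
    char *out, size_t outlen)
{
	size_t pos = 0, op = 0;

	if (outlen == 0)
		return (-1);
	out[0] = '\0';
	while (pos < len) {
		if (op > 0 && append(out, outlen, &op, " ", 1) != 0)
			return (-1);
		if (decode_name(data, len, &pos, out, outlen, &op) != 0)
			return (-1);
	}
	return (0);
}
```

Requiring every pointer to move strictly backwards is one simple way to rule out pointer loops in hostile input.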
ae [Thu, 26 Jan 2012 10:33:19 +0000 (10:33 +0000)]
MFC r223666:
Add new rule actions "call" and "return" to ipfw. They make it
possible to organize rules into subroutines.
The "call" action saves the current rule number in the internal
stack, and rules processing continues from the first rule with the
specified number (similar to the skipto action). If a rule with the
"return" action is encountered later, processing returns to the first
rule whose number is equal to or higher than the "call" rule number
saved in the stack plus one.
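A toy model of the call/return semantics described above, with a simplified, hypothetical rule structure rather than ipfw's opcodes. The stack depth, failing closed on overflow, and treating "return" with an empty stack as a no-op are all assumptions of this sketch:

```c
#include <stddef.h>

enum action { ACT_COUNT, ACT_CALL, ACT_RETURN, ACT_ALLOW, ACT_DENY };

struct rule {
	int		num;	/* rule number, ascending */
	enum action	act;
	int		arg;	/* target rule number for ACT_CALL */
};

#define	CALL_STACK_DEPTH	16	/* assumed stack size */

/* Index of the first rule numbered >= num, or n if there is none. */
static size_t
find_rule(const struct rule *r, size_t n, int num)
{
	size_t i = 0;

	while (i < n && r[i].num < num)
		i++;
	return (i);
}

/* Evaluate the ruleset and return the terminating action. */
enum action
run_rules(const struct rule *r, size_t n)
{
	int stack[CALL_STACK_DEPTH];
	int sp = 0;
	size_t i = 0;

	while (i < n) {
		switch (r[i].act) {
		case ACT_ALLOW:
		case ACT_DENY:
			return (r[i].act);
		case ACT_CALL:
			if (sp == CALL_STACK_DEPTH)
				return (ACT_DENY);	/* assumption: fail closed */
			stack[sp++] = r[i].num;		/* save caller's number */
			i = find_rule(r, n, r[i].arg);	/* like skipto */
			break;
		case ACT_RETURN:
			if (sp == 0) {			/* assumption: no-op */
				i++;
				break;
			}
			i = find_rule(r, n, stack[--sp] + 1);
			break;
		default:
			i++;
			break;
		}
	}
	return (ACT_DENY);	/* fell off the end: default deny */
}
```

Because "return" resumes at the saved number plus one, a subroutine placed at high rule numbers can be shared by several callers.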
ae [Thu, 26 Jan 2012 09:28:09 +0000 (09:28 +0000)]
MFC r222279:
Do not truncate available disk space to the closest track boundary.
MFC r222341:
Some partitioning tools may have a different opinion about disk
geometry, and partitions may start within the first track.
If such partitions are found, do not reserve the space of the
first track, only the first sector.
ae [Thu, 26 Jan 2012 09:14:51 +0000 (09:14 +0000)]
MFC r216132 (by ivoras):
Add a note about the magic number 20. Actually, 22.75 entries fit in
a 512 byte sector but when choosing magic numbers, 20 looks nicer.
MFC r223332:
Change the way we update bootcode for the BSD scheme.
Since the only parameter that we check is the size of the bootcode,
allow only two sizes: the size of boot1 and the size of /boot/boot.
This partially protects users from losing the ability to boot if
incorrect bootcode is specified.
ae [Thu, 26 Jan 2012 08:47:29 +0000 (08:47 +0000)]
MFC r221788:
Add basic metadata integrity check. If a partition table was probed
and read successfully but contains invalid values (e.g. overlapping
partitions, or an offset or size out of bounds), then the table
will be rejected.
MFC r221972:
Add a sysctl kern.geom.part.check_integrity for those who have corrupt
partition tables and lost the ability to boot after r221788.
Also unhide an error message from bootverbose; this helps to
determine the problem more easily.
MFC r221984:
Add diagnostic messages for integrity checks.
MFC r221992:
Make diagnostic messages more specific. With bootverbose, print out
all integrity inconsistencies in the partition table, not only the
first one found.
MFC r222642:
Add diagnostic message about unaligned partitions.
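A sketch of the kind of checks described above, on a simplified, hypothetical partition entry: bounds against the usable LBA range and pairwise overlap, stopping at the first problem unless verbose is set (standing in for bootverbose). The kern.geom.part.check_integrity sysctl and the alignment check are not modeled:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct part {
	uint64_t	start;	/* first LBA */
	uint64_t	end;	/* last LBA, inclusive */
};

/* Count inconsistencies; report all of them only when verbose is set. */
int
check_integrity(const struct part *p, size_t n, uint64_t first,
    uint64_t last, int verbose)
{
	int errors = 0;

	for (size_t i = 0; i < n; i++) {
		if (p[i].start > p[i].end || p[i].start < first ||
		    p[i].end > last) {
			errors++;
			if (!verbose)
				return (errors);
			fprintf(stderr, "partition %zu: out of bounds\n", i);
		}
		for (size_t j = i + 1; j < n; j++) {
			/* Inclusive ranges overlap if each starts before the other ends. */
			if (p[i].start <= p[j].end && p[j].start <= p[i].end) {
				errors++;
				if (!verbose)
					return (errors);
				fprintf(stderr, "partitions %zu and %zu overlap\n",
				    i, j);
			}
		}
	}
	return (errors);
}
```

A non-zero result is where the table would be rejected.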
ae [Thu, 26 Jan 2012 07:51:51 +0000 (07:51 +0000)]
MFC r226880 (modified version):
Our geom withering function could take some time before the geom with
its providers and consumers is destroyed. Before taking any action
on a geom, check that it is not being destroyed at the moment.
ae [Thu, 26 Jan 2012 07:42:54 +0000 (07:42 +0000)]
MFC r215118:
Move the code that searches for an existing geom into the
g_part_find_geom function and use this function instead of
g_part_parm_geom in g_part_ctl_create.
rmacklem [Wed, 25 Jan 2012 02:22:16 +0000 (02:22 +0000)]
MFC: r229956
jwd@ reported via email that the "CacheSize" field reported by "nfsstat -e -s"
would go negative after using the "-z" option to zero out the stats.
This patch fixes that by not zeroing out the srvcache_size field
for "-z", since it is the size of the cache and not a counter.
pluknet [Tue, 24 Jan 2012 10:32:02 +0000 (10:32 +0000)]
MFC r230256:
Fix the "lock &zrl->zr_mtx already initialized" assertion by initializing
the allocated memory before calling mtx_init(9) on mtx pointing to it.
Otherwise, random contents of uninitialized memory might occasionally
trigger the assertion.
Reported by: Pavel Polyakov <bsd kobyla org>
Reviewed by: pjd
gavin [Sun, 22 Jan 2012 21:25:47 +0000 (21:25 +0000)]
Merge r229085 from head:
Default to not performing the early-boot memory tests when we detect we
are booting inside a VM. There are three reasons to disable this:
o It causes the VM host to believe that all the tested pages of RAM are
in use. This in turn may force the host to page out pages of RAM
belonging to other VMs, or otherwise cause problems with fair resource
sharing on the VM cluster.
o It adds significant time to the boot process (around 1 second per GB
in testing).
o It is unnecessary - the host should have already verified that the
memory is functional etc.
Note that this simply changes the default when in a VM - it can still be
overridden using the hw.memtest.tests tunable.
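A userland sketch of "default off in a VM, tunable still wins". tunable_ulong_fetch() is a hypothetical stand-in for TUNABLE_ULONG_FETCH() that reads the environment via getenv(3); like the kernel macro, it leaves the value untouched if the tunable is not set. How the kernel detects a VM is not modeled; in_vm is simply passed in:

```c
#include <stdlib.h>

/* Hypothetical stand-in: only overwrite *valp if a valid value is set. */
static int
tunable_ulong_fetch(const char *name, unsigned long *valp)
{
	const char *s = getenv(name);
	char *end;
	unsigned long v;

	if (s == NULL || *s == '\0')
		return (0);
	v = strtoul(s, &end, 0);
	if (*end != '\0')
		return (0);	/* not a number: keep the default */
	*valp = v;
	return (1);
}

/* Skip the memory tests by default inside a VM; the tunable overrides. */
unsigned long
memtest_default(int in_vm)
{
	unsigned long memtest = in_vm ? 0 : 1;

	tunable_ulong_fetch("hw.memtest.tests", &memtest);
	return (memtest);
}
```

Because the fetch only writes on success, no temporary variable or if check is needed around it.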
rmacklem [Sun, 22 Jan 2012 06:00:50 +0000 (06:00 +0000)]
MFC: r229802
opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.
alc [Sat, 21 Jan 2012 18:38:57 +0000 (18:38 +0000)]
MFC r228746
The Xen pmap doesn't support superpages. So, there is no point in it
initializing structures, like the pv table, that are only used to
implement superpages. In fact, some of the unnecessary code in
pmap_init() was actually doing harm. It was preventing the kernel from
booting on virtual machines with more than 768 MB of memory.
Note: The change to pmap_page_is_mapped() differs slightly from r228746
because of differences in how the page queues lock is used in
FreeBSD 8.x.
rmh [Sat, 21 Jan 2012 18:21:44 +0000 (18:21 +0000)]
MFC r227827
Define __FreeBSD_kernel__ macro in sys/param.h.
__FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD,
which by definition is always true on FreeBSD. This macro is also defined
on other systems that use the kernel of FreeBSD, such as GNU/kFreeBSD.
It is tempting to use this macro in userland code when we want to enable
kernel-specific routines, and in fact it's fine to do this in code that
is part of FreeBSD itself. However, be aware that as this macro is not
yet widespread (e.g. older FreeBSD versions, 3rd party compilers, etc.),
checking for it in external applications without also checking for
__FreeBSD__ as an alternative is STRONGLY DISCOURAGED.
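The check recommended above for external applications might look like this; HAVE_FREEBSD_KERNEL and have_freebsd_kernel() are hypothetical names used only for illustration:

```c
/*
 * Accept either macro: __FreeBSD_kernel__ is missing on older FreeBSD
 * versions and some third-party compilers, while GNU/kFreeBSD defines it
 * without __FreeBSD__.
 */
#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
#define	HAVE_FREEBSD_KERNEL	1
#else
#define	HAVE_FREEBSD_KERNEL	0
#endif

int
have_freebsd_kernel(void)
{
	return (HAVE_FREEBSD_KERNEL);
}
```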
alc [Sat, 21 Jan 2012 07:21:44 +0000 (07:21 +0000)]
MFC r226163, r228317, and r228324
Fix the handling of an empty kmem map by sysctl_kmem_map_free().
Eliminate the possibility of 32-bit arithmetic overflow in the
calculation of vm_kmem_size that may occur if the system
administrator has specified a vm.vm_kmem_size tunable value that
exceeds the hard cap.
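A generic illustration, not the kern_malloc.c code, of how a 32-bit overflow in a "twice physical memory" cap can arise and be avoided. The 4 KB page size and both helper names are assumptions of this sketch:

```c
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumed page size */

/* Naive: 2 * mem_pages * page size wraps to 0 at 2 GB of RAM. */
uint32_t
kmem_clamp_naive(uint32_t kmem_size, uint32_t mem_pages)
{
	uint32_t cap = 2u * mem_pages * SKETCH_PAGE_SIZE;

	return (kmem_size > cap ? cap : kmem_size);
}

/*
 * Overflow-safe: scale the requested size down for the comparison.  The
 * cap is only computed when it is below kmem_size, so it cannot wrap.
 * Rounding down lets sizes less than 8 KB over the cap pass unchanged.
 */
uint32_t
kmem_clamp_safe(uint32_t kmem_size, uint32_t mem_pages)
{
	if (kmem_size / 2 / SKETCH_PAGE_SIZE > mem_pages)
		return (2u * mem_pages * SKETCH_PAGE_SIZE);
	return (kmem_size);
}
```

With 2 GB of RAM (524288 pages) the naive cap wraps to 0 and clamps a 1 GB request to nothing, while the safe form leaves it alone.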
lstewart [Sat, 21 Jan 2012 04:22:19 +0000 (04:22 +0000)]
MFC r229898:
Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.
Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.
Whilst here, also:
- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
exist in the list with a NULL ifnet pointer.
- Except when INVARIANTS is in the kernel config, silently ignore the case where
no bpf_if referencing the specified ifnet is found, as it is harmless and does
not require user attention.
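A sketch of removing every matching entry rather than only the first, on a simplified, hypothetical list type standing in for struct bpf_if. It uses a pointer-to-pointer walk instead of the kernel's queue macros and ignores locking:

```c
#include <stdlib.h>

struct bif {
	const void	*ifp;	/* interface this entry refers to */
	int		 dlt;	/* data link type */
	struct bif	*next;
};

/* Unlink and free every entry referencing ifp; return how many. */
int
detach_all(struct bif **head, const void *ifp)
{
	struct bif **pp = head, *bp;
	int n = 0;

	while ((bp = *pp) != NULL) {
		if (bp->ifp == ifp) {
			*pp = bp->next;	/* unlink; pp stays on the same link */
			free(bp);
			n++;
		} else
			pp = &bp->next;
	}
	return (n);
}
```

An ifnet attached once per DLT leaves several entries behind, and all of them are released here.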
emaste [Fri, 20 Jan 2012 00:20:00 +0000 (00:20 +0000)]
MFC r216269:
Don't warn if a partition appears not to be aligned on a track boundary.
Modern disks use LBA and create a fake CHS geometry that doesn't have any
relation to the on-disk layout of data.
gnn [Thu, 19 Jan 2012 19:39:41 +0000 (19:39 +0000)]
MFC: 229965
Fix for PR 138526.
Add the ability for /dev/null and /dev/zero to accept
being set into non-blocking mode via fcntl(). This
brings the code into compliance with IEEE Std 1003.1-2001,
as referenced in another PR, 94729.
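The userland operation this enables is the usual F_GETFL/F_SETFL sequence from fcntl(2); open_nonblocking() is a hypothetical helper:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path and switch it to non-blocking mode; -1 on failure. */
int
open_nonblocking(const char *path)
{
	int fd, flags;

	if ((fd = open(path, O_RDWR)) == -1)
		return (-1);
	if ((flags = fcntl(fd, F_GETFL)) == -1 ||
	    fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```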
truckman [Wed, 18 Jan 2012 21:50:59 +0000 (21:50 +0000)]
MFC: r229984
Pass the arguments to mtx_init() in the correct order. There should be
no change to the binary because the value of MTX_DEF is zero and there
is a visible function prototype.
bz [Tue, 17 Jan 2012 22:08:58 +0000 (22:08 +0000)]
MFC r225048:
In this branch, as no further checks are done, there is no reason to
use the temporary variable and an if check, as TUNABLE_*_FETCH does not
alter the value unless the tunable was successfully found.
bz [Tue, 17 Jan 2012 22:02:11 +0000 (22:02 +0000)]
MFC r224516:
Introduce a tunable to disable the time consuming parts of bootup
memtesting, which can easily save seconds to minutes of boot time.
The tunable name is kept general to allow reusing the code in
alternate frameworks.
glebius [Fri, 13 Jan 2012 23:25:58 +0000 (23:25 +0000)]
Merge r228463, which explicitly uses a 255.0.0.0 mask for the temporary
prefix. This change isn't actually needed in stable/8, but let it be here
in case anyone tries to run a stable/8 world on a head/ kernel.
jhb [Fri, 13 Jan 2012 20:35:43 +0000 (20:35 +0000)]
MFC 221891,229665,229672,229700:
Remove the assertion from tcp_input() that rcv_nxt is always greater
than or equal to rcv_adv and fix tcp_twstart() to handle this case by
assuming the last window was zero rather than a negative value.
The code in tcp_input() already safely handled this case. It can happen
due to delayed ACKs along with a remote sender that sends data beyond
the window we previously advertised. If we have room in our socket buffer
for the extra data beyond the advertised window, we will accept it.
However, if the ACK for that segment is delayed, then we will not
effectively fix up rcv_adv to account for that extra data until the
next segment arrives and forces out an ACK. When that next segment
arrives, rcv_nxt will be beyond rcv_adv.
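A sketch of the "assume the last window was zero" handling, assuming 32-bit sequence numbers and an SEQ_GT() comparison modeled on the BSD TCP headers. last_window() is a hypothetical helper, not the tcp_twstart() code:

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-aware "a is after b" in sequence space. */
#define	SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/*
 * Window last advertised, scaled.  If the peer sent past rcv_adv,
 * rcv_nxt is beyond it and the difference would be negative, so
 * report a zero window instead.
 */
uint32_t
last_window(tcp_seq rcv_adv, tcp_seq rcv_nxt, int rcv_scale)
{
	if (!SEQ_GT(rcv_adv, rcv_nxt))
		return (0);
	return ((rcv_adv - rcv_nxt) >> rcv_scale);
}
```

The signed difference keeps the comparison correct across sequence-number wraparound.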
jhb [Fri, 13 Jan 2012 20:25:56 +0000 (20:25 +0000)]
MFC 228960:
Cap the priority calculated from the current thread's running tick count
at SCHED_PRI_RANGE to prevent overflows in the priority value. This can
happen due to irregularities with clock interrupts under certain
virtualization environments.
jhb [Fri, 13 Jan 2012 20:15:49 +0000 (20:15 +0000)]
MFC 229429:
Some small fixes to CPU accounting for threads:
- Only initialize the per-cpu switchticks and switchtime in sched_throw()
for the very first context switch on APs during boot. This avoids a
small gap between the middle of thread_exit() and sched_throw() where
time is not accounted to any thread.
- In thread_exit(), update the timestamp bookkeeping to track the changes
to mi_switch() introduced by td_rux so that the code once again matches
the comment claiming it is mimicking mi_switch(). Specifically, only
update the per-thread stats directly and depend on ruxagg() to update
p_rux rather than adjusting p_rux directly. While here, move the
timestamp bookkeeping as late in the function as possible.
jhb [Fri, 13 Jan 2012 19:51:15 +0000 (19:51 +0000)]
MFC 229390,229420,229479:
Fix some races in the multicast code by removing places where we would
drop the IF_ADDR_LOCK while walking an interface's multicast address list:
- Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that
do not modify the queues they iterate over.
- When cancelling multicast timers on an interface, don't release the
reference on a group in the leaving state while iterating over the loop.
Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach()
of placing the groups to free on a pending release list and then releasing
the references after dropping the IF_ADDR_LOCK.
- Use the mli_relinmhead list normally used to defer calls to
in6m_release_locked() to defer calls to mld_v1_transmit_report() until
after the IF_ADDR_LOCK is dropped.
jhb [Fri, 13 Jan 2012 19:20:33 +0000 (19:20 +0000)]
MFC 229414,229476,229477:
Various fixes to the SIOC[DG]LIFADDR ioctl handlers:
- Grab a reference on any matching interface address (ifa) before dropping
the IF_ADDR_LOCK() and release the reference after using it to prevent a
potential use-after-free.
- Fix the IPv4 ioctl handlers in in_lifaddr_ioctl() to work with IPv4
interface addresses rather than IPv6.
- Add missing interface address list locking in the IPv4 handlers.
jhb [Fri, 13 Jan 2012 19:13:43 +0000 (19:13 +0000)]
MFC 215605,215606,222952,229400:
Various improvements to the 'cscope' target:
- Add x86 to ALL_ARCH.
- Add lex and yacc sources to things cscope'd.
- Include sys/xen in cscope tag file generation.
- Improve the cscope target's handling of MD directories. Automatically
include the MACHINE_ARCH directory if it differs from MACHINE when
building an index for a single machine. Also, include the 'x86'
directory when building an index for i386, pc98, or amd64.
jhb [Fri, 13 Jan 2012 18:54:10 +0000 (18:54 +0000)]
MFC 228849, 229727:
Add post-VOP hooks for VOP_DELETEEXTATTR() and VOP_SETEXTATTR() and use
these to trigger a NOTE_ATTRIB EVFILT_VNODE kevent when the extended
attributes of a vnode are changed.
jhb [Fri, 13 Jan 2012 18:49:28 +0000 (18:49 +0000)]
MFC 228738:
Allow boot0cfg to force a PXE boot via boot0 on the next boot.
- Fix boot0 to check for PXE when using the pre-set setting for the
preferred slice.
- Update boot0cfg to use slice 6 to select PXE. Accept a 'pxe' argument
instead of a number for the 's' option as a way to select PXE as well.
mckusick [Fri, 13 Jan 2012 07:10:52 +0000 (07:10 +0000)]
MFC: 226520
The current /etc/dumpdates file restricts device names to 32 characters.
With the addition of various GEOM layers some device names now exceed
this length, for example /dev/mirror/encrypted.elig.journal. This
change expands the field to 53 bytes which brings the /etc/dumpdates
lines to 80 characters. Exceeding 80 characters makes the /etc/dumpdates
file much less human readable. A check is added to dump to verify
that the device name will fit in the 53 character field, failing the
dump if it is too long.
This change has been checked to verify that its /etc/dumpdates file
is compatible with older versions of dump.
Reported by: Martin Sugioarto <martin@sugioarto.com>
PR: kern/160678
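A sketch of the width arithmetic, assuming a line made of the padded device name, a one-digit dump level and a 24-character ctime(3) date, separated by single spaces: 53 + 1 + 1 + 1 + 24 = 80. The exact format string used by dump(8) is not reproduced here, and format_dumpdates_line() is hypothetical:

```c
#include <stdio.h>
#include <string.h>

#define	DUMPNAMELEN	53	/* device name field width from this change */

/* Format one line; fail rather than overflow the name field. */
int
format_dumpdates_line(char *buf, size_t len, const char *dev, int level,
    const char *date)
{
	if (strlen(dev) > DUMPNAMELEN)
		return (-1);
	return (snprintf(buf, len, "%-*s %d %s", DUMPNAMELEN, dev, level,
	    date));
}
```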
mav [Thu, 12 Jan 2012 15:57:03 +0000 (15:57 +0000)]
MFC r228461:
Fix a few bugs in isp(4) target mode support:
- in destroy_lun_state() assert hold == 1 instead of 0, as it should
receive the hold taken by create_lun_state() or get_lun_statep() before;
- fix a hold count leak inside rls_lun_statep() that also fired the above assert;
- in destroy_lun_state() use the SIM bus number instead of the SIM path id for
ISP_GET_PC_ADDR(), as it was before r196008;
- make isp_disable_lun() set the status in the CCB;
- make isp_target_mark_aborted() set the status in the proper CCB.
mav [Thu, 12 Jan 2012 15:02:51 +0000 (15:02 +0000)]
MFC r228808, r228847, 229395:
r228808, r228847:
Make the cd driver handle Audio CDs, reporting their 2352-byte sectors to
GEOM and using the READ CD command for reading data, as the acd driver does.
Audio CDs are identified by checking the respective bit of the control field
of the first track in the TOC.
229395:
Add support for the CDRIOCGETBLOCKSIZE and CDRIOCSETBLOCKSIZE IOCTLs to
control the sector size, as the acd driver does. Together with r228808 and
r228847, this allows the existing multimedia/vlc port to play Audio CDs via
the CAM cd driver.
mckusick [Wed, 11 Jan 2012 19:12:29 +0000 (19:12 +0000)]
MFC: 226265
When unmounting a filesystem always wait for the vfs_busy lock to clear
so that if no vnodes in the filesystem are actively in use the unmount
will succeed rather than failing with EBUSY.
Reported by: Garrett Cooper
Reviewed by: Attilio Rao and Kostik Belousov
Tested by: Garrett Cooper
PR: kern/161016
mav [Wed, 11 Jan 2012 18:14:22 +0000 (18:14 +0000)]
MFC r228726, r228727:
Cast some vendor-specific spell on VIA VT1708S codecs to:
- make analog input loopback work;
- get access to the mics boost controls.
rmacklem [Wed, 11 Jan 2012 01:58:49 +0000 (01:58 +0000)]
MFC: r228827
During investigation of an NFSv4 client crash reported by glebius@,
jhb@ spotted that nfscl_getstateid() might modify credentials when
called from nfsrpc_read() for the case where p != NULL, whereas
nfsrpc_read() only did a crdup() to get new credentials for p == NULL.
This bug was introduced by r195510, since pre-r195510 nfscl_getstateid()
only modified credentials for the p == NULL case. This patch modifies
nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case.
It is conceivable that this bug caused the crash reported by glebius@, but
that will not be determined for some time, since the crash occurred after
about 1month of operation.
rmacklem [Tue, 10 Jan 2012 02:55:43 +0000 (02:55 +0000)]
MFC: r228757
jwd@ reported a problem via email where the old NFS client would
get a reply of EEXIST from an NFS server when a Mkdir RPC was retried,
for an NFS over UDP mount.
Upon investigation, it was found that the client was retransmitting
the Mkdir RPC request over UDP, but with a different xid. As such,
the retransmitted message would miss the Duplicate Request Cache
in the server, causing it to reply EEXIST. The kernel client side
UDP rpc code has two timers. The first one causes a retransmit using
the same xid and socket and was set to a fixed value of 3 seconds.
(The default can be overridden via CLSET_RETRY_TIMEOUT.)
The second one creates a new socket and xid and should be larger
than the first. However, both NFS clients were setting the second
timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to
1 second, so the first timer would never time out.
This patch fixes both NFS clients so that they set the first timer
using nm_timeo and makes the second timer larger than the first one.
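A toy model (not the kernel RPC code) of why the ordering of the two timers matters: it counts how many retransmits reuse the same xid, i.e. how many k >= 1 satisfy k * retry_ms < total_ms, before the second timer switches to a new socket and xid. The function name is hypothetical:

```c
/* Same-xid retransmits before the second timer fires. */
unsigned
same_xid_retransmits(unsigned retry_ms, unsigned total_ms)
{
	if (retry_ms == 0 || total_ms == 0)
		return (0);
	return ((total_ms - 1) / retry_ms);
}
```

With the old settings (3000 ms first timer, 1000 ms second timer) this gives 0, so every retry carried a new xid and missed the server's Duplicate Request Cache; with the first timer below the second, same-xid retransmits happen again.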