Interface addresses are refcounted as packets move through the stack,
and there's garbage collection tied to it so that address changes can
safely propagate while traffic is flowing. In our setup, we weren't
changing or deleting any addresses, but the refcounting logic in
ip6_input() was wrong and caused a reference leak on every inbound
V6 packet. This eventually caused a 32bit overflow, and the resulting
0 value caused the garbage collection to run on the active address.
That then snowballed into the panic.
dim [Tue, 5 Jun 2012 06:43:45 +0000 (06:43 +0000)]
MFC r236444:
Install libcxxrt's C++ ABI and unwind headers. This is done in libc++'s
Makefile, so these headers go into the same destination directory as
libc++'s own headers, currently /usr/include/c++/v1.
dougb [Mon, 4 Jun 2012 22:14:33 +0000 (22:14 +0000)]
Upgrade to 9.8.3-P1, the latest from ISC. This version contains
a critical bugfix:
Processing of DNS resource records where the rdata field is zero length
may cause various issues for the servers handling them.
Processing of these records may lead to unexpected outcomes. Recursive
servers may crash or disclose some portion of memory to the client.
Secondary servers may crash on restart after transferring a zone
containing these records. Master servers may corrupt zone data if the
zone option "auto-dnssec" is set to "maintain". Other unexpected
problems that are not listed here may also be encountered.
All BIND users are strongly encouraged to upgrade.
mav [Mon, 4 Jun 2012 07:12:36 +0000 (07:12 +0000)]
MFC r232740:
Make kern.sched.idlespinthresh default value adaptive depending of HZ.
Otherwise with HZ above 8000 CPU may never skip timer ticks on idle.
mav [Mon, 4 Jun 2012 07:03:56 +0000 (07:03 +0000)]
MFC r235333:
Add two functions xpt_batch_start() and xpt_batch_done() to the CAM SIM KPI
to allow drivers to handle request completion directly without passing
them to the CAM SWI thread removing extra context switch.
Modify all ATA/SATA drivers to use them.
bapt [Sun, 3 Jun 2012 18:16:17 +0000 (18:16 +0000)]
MFC: 234851, 235059
Add two special directives to libmap.conf:
include <file>:
Parse the contents of file before continuing with the current file.
includedir <dir>:
Parse the contents of every file in dir that ends in .conf before continuing
with the current file.
Any file or directory encountered while processing include or includedir
directives will be parsed exactly once, even if it is encountered multiple
times.
rea [Sun, 3 Jun 2012 18:00:38 +0000 (18:00 +0000)]
OpenSSH: allow VersionAddendum to be used again
Prior to this, setting VersionAddendum will be a no-op: one will
always have BASE_VERSION + " " + VERSION_HPN for VersionAddendum
set in the config and a bare BASE_VERSION + VERSION_HPN when there
is no VersionAddendum is set.
HPN patch requires both parties to have the "hpn" inside their
advertized versions, so we add VERSION_HPN to the VERSION_BASE
if HPN is enabled and omitting it if HPN is disabled.
VersionAddendum now uses the following logics:
* unset (default value): append " " and VERSION_ADDENDUM;
* VersionAddendum is set and isn't empty: append " "
and VersionAddendum;
* VersionAddendum is set and empty: don't append anything.
jilles [Sat, 2 Jun 2012 20:18:34 +0000 (20:18 +0000)]
MFC r236193: libfetch: Avoid SIGPIPE on network connections.
To avoid unexpected process termination from SIGPIPE when writing to a
closed network connection, enable SO_NOSIGPIPE on all network connections.
The POSIX standard MSG_NOSIGNAL is not used since it requires modifying all
send calls to add this flag. This is particularly nasty for SSL connections.
marius [Sat, 2 Jun 2012 19:16:09 +0000 (19:16 +0000)]
MFC: r236328
Try to finally get the point in time at which bge_add_sysctls() is called
right; it needs to be called before bge_can_use_msi() but in turn requires
bge_flags to be properly set.
marius [Sat, 2 Jun 2012 19:11:44 +0000 (19:11 +0000)]
MFC: r236156
- Fix some typos in mmc_acquire_bus() and mmc_send_csd().
- Fix some math errors in mmc_decode_csd_sd().
- Fix incorrect arguments to mmc_send_app_op_cond() in mmc_go_discovery().
- Add reporting of CSD for debug purposes.
- Add detection (and skipping) of password-locked cards.
- Add setting of block length on card if necessary.
marius [Sat, 2 Jun 2012 18:57:13 +0000 (18:57 +0000)]
MFC: r236061
- When creating the DMA tag for user data, don't ask for more segments
than required for handling MAXPHYS and report the resulting maximum
I/O size to CAM instead of implicitly limiting it to DFLTPHYS.
- Move the variables of sym_action2() out of nested scope as required
by style(9) and remove extraneous curly braces.
- Replace a magic value for PCIR_COMMAND with the appropriate macro.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
bapt [Sat, 2 Jun 2012 15:13:28 +0000 (15:13 +0000)]
MFC: 228545,229572
Modify pw_copy:
- if pw is NULL and oldpw is not NULL then the oldpw is deleted
- if pw->pw_name != oldpw->pw_name but pw->pw_uid == oldpw->pw_uid
then it renames the user
add new gr_* functions so now gr_util API is similar to pw_util API,
this allow to manipulate groups in a safe way.
Add new pw_make_v7 to make a passwd line (in v7 format) out of a struct passwd
kib [Fri, 1 Jun 2012 14:40:16 +0000 (14:40 +0000)]
MFC r235266:
According to SUSv4, realpath(3) must fail if
[ENOENT] A component of file_name does not name an existing file or
file_name points to an empty string.
[ENOTDIR] A component of the path prefix is not a directory, or the
file_name argument contains at least one non- <slash> character
and ends with one or more trailing <slash> characters and the last
pathname component names an existing file that is neither a
directory nor a symbolic link to a directory.
Add checks for the listed conditions, and set errno accordingly.
Update the realpath(3) manpage to mention SUS behaviour. Remove the
requirement to include sys/param.h before stdlib.h.
rstone [Wed, 30 May 2012 23:22:52 +0000 (23:22 +0000)]
MFC r235459 and r235471
r235459:
Implement the DTrace sched provider. This implementation aims to be
compatible with the sched provider implemented by Solaris and its open-
source derivatives. Full documentation of the sched provider can be found
on Oracle's DTrace wiki pages.
Note that for compatibility with scripts originally written for Solaris,
serveral probes are defined that will never fire. These probes are defined
to fire when Solaris-specific features perform certain actions. As these
features are not present in FreeBSD, the probes can never fire.
Also, I have added a two probes that are not defined in Solaris, lend-pri
and load-change. These probes have been added to make it possible to
collect schedgraph data with DTrace.
Finally, a few probes are defined in Solaris to take a cpuinfo_t *
argument. As it was not immediately clear to me how to translate that to
FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *.
Sponsored by: Sandvine Incorporated
r235471:
Fix typo in function name SDT_PROBE4 and unbreak 4BSD UP.
This brings stdatomic.h into -STABLE and includes improvements to tgmath.h that
significantly decrease compile times on a C11 compiler (such as the clang in
the base system).
jhb [Wed, 30 May 2012 16:30:51 +0000 (16:30 +0000)]
MFC 235833:
Only check to see if a memory resource is a PCI ROM BAR when activating
and deactivating PCI resources. Previously, if a device had more than
48 MSI interrupts, then activating message 48 (which has a rid == PCIR_BIOS)
would incorrectly try to enable the PCI ROM BAR.
jhb [Wed, 30 May 2012 16:13:10 +0000 (16:13 +0000)]
MFC 234501:
The amr(4) firmware contains a rather dubious "feature" where it
assumes for small buffers (< 64k) that the OS driver is actually using
a buffer rounded up to the next power of 2. It also assumes that the
buffer is at least 4k in size. Furthermore, there is at least one
known instance of megarc sending a request with a 12k buffer where the
firmware writes out a 24k-ish reply.
To workaround the data corruption triggered by this "feature", ensure
that buffers for user commands use a minimum size of 32k, and that
buffers between 32k and 64k use a 64k buffer.
emax [Wed, 30 May 2012 00:38:24 +0000 (00:38 +0000)]
MFC r235854
Tweak condition for disabling allocation from per-CPU buckets in
low memory situation. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused massive performance drop.
jimharris [Tue, 29 May 2012 23:10:10 +0000 (23:10 +0000)]
MFC r234106, r235751:
Queue CCBs internally instead of using CAM_REQUEUE_REQ status. This fixes
problem where userspace apps such as smartctl fail due to CAM_REQUEUE_REQ
status getting returned when tagged commands are outstanding when smartctl
sends its I/O using the pass(4) interface.
trasz [Tue, 29 May 2012 15:08:35 +0000 (15:08 +0000)]
MFC r235787:
Fix panic with RACCT that could occur in low memory (or out of swap)
situations, due to fork1() calling racct_proc_exit() without calling
racct_proc_fork() first.
fabient [Tue, 29 May 2012 14:50:21 +0000 (14:50 +0000)]
MFC r233628, r234598, r235229, r235831, r226986.
Add software PMC support.
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).
Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.
yongari [Tue, 29 May 2012 04:44:51 +0000 (04:44 +0000)]
MFC r235821:
Don't force max payload size to 128. Root complex and Endpoint will
negotiate with each other on the TLP payload size so blindly
forcing the size to 128 can cause a completion error which in turn
will stop device.
yongari [Tue, 29 May 2012 04:30:02 +0000 (04:30 +0000)]
MFC r235816:
Make IPMI work in the bce driver even when the interface is
configured down. Formerly, IPMI communication was lost whenever the
interface was not up. The reason was that the BCE_EMAC_MODE
register was not configured with the correct media settings. There
are two parts to the fix.
First, resetting the chip in bce_reset() causes the BCE_EMAC_MODE
register to be initialized to a default value that does not
necessarily correspond to the actual media settings. The fix
implemented here is a bit of a hack. Ideally, at the end of
bce_reset() we would poll the PHY to determine the negotiated media,
and then we would set the BCE_EMAC_MODE register accordingly. That
is difficult, since the PHY is abstracted behind the MII layer and is
not supposed to be queried directly from the MAC driver. Instead,
we read the BCE_EMAC_MODE register at the beginning of bce_reset()
and then restore its media bits to their original values before
returning. If IPMI is up and running, then the link is already
established and the BCE_EMAC_MODE register is already set appropriately
when bce_reset() is called. If IPMI is not running, no harm is
done by preserving the BCE_EMAC_MODE settings. The driver will set
the register properly once the interface is configured up and link
is established.
Second, bce_miibus_statchg() is sometimes called when the link is
down. In that case, the reported media settings are invalid.
Formerly, the driver used them anyway to setup the BCE_EMAC_MODE
register. We now avoid changing any MAC registers unless link is
active and the reported media settings are valid.
alc [Mon, 28 May 2012 23:31:48 +0000 (23:31 +0000)]
MFC r230120
Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.
Free invalid pages after an I/O error. There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)
vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.
There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.
Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spams your console if data corruption is
imminent.)
alc [Mon, 28 May 2012 22:58:13 +0000 (22:58 +0000)]
MFC r229821
Correct an error of omission in the implementation of the truncation
operation on POSIX shared memory objects and tmpfs. Previously, neither of
these modules correctly handled the case in which the new size of the object
or file was not a multiple of the page size. Specifically, they did not
handle partial page truncation of data stored on swap. As a result, stale
data might later be returned to an application.
Interestingly, a data inconsistency was less likely to occur under tmpfs
than POSIX shared memory objects. The reason being that a different mistake
by the tmpfs truncation operation helped avoid a data inconsistency. If the
data was still resident in memory in a PG_CACHED page, then the tmpfs
truncation operation would reactivate that page, zero the truncated portion,
and leave the page pinned in memory. More precisely, the benevolent error
was that the truncation operation didn't add the reactivated page to any of
the paging queues, effectively pinning the page. This page would remain
pinned until the file was destroyed or the page was read or written. With
this change, the page is now added to the inactive queue.
MFC r230180
When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.
rstone [Sun, 27 May 2012 18:55:23 +0000 (18:55 +0000)]
MFC r234691
Implement the D "cpu" variable, which returns curcpu. I have chosen not
to follow the example of OpenSolaris and its descendants, which implemented
cpu as an inline that took a value out of curthread. At certain points in
the FreeBSD scheduler curthread->td_oncpu will no longer be valid (in
particular, just before the thread gets descheduled) so instead I have
implemented this as its own built-in variable.
rstone [Sun, 27 May 2012 14:48:14 +0000 (14:48 +0000)]
MFC r233552
Instead of only iterating over the set of known SDT probes when sdt.ko is
loaded and unloaded, also have sdt.ko register callbacks with kern_sdt.c
that will be called when a newly loaded KLD module adds more probes or
a module with probes is unloaded.
This fixes two issues: first, if a module with SDT probes was loaded after
sdt.ko was loaded, those new probes would not be available in DTrace.
Second, if a module with SDT probes was unloaded while sdt.ko was loaded,
the kernel would panic the next time DTrace had cause to try and do
anything with the no-longer-existent probes.
This makes it possible to create SDT probes in KLD modules, although there
are still two caveats: first, any SDT probes in a KLD module must be part
of a DTrace provider that is defined in that module. At present DTrace
only destroys probes when the provider is destroyed, so you can still
panic the system if a KLD module creates new probes in a provider from a
different module(including the kernel) and then unload the the first module.
Second, the system will panic if you unload a module containing SDT probes
while there is an active D script that has enabled those probes.
rmacklem [Sun, 27 May 2012 01:24:08 +0000 (01:24 +0000)]
MFC: r234740
Fix a leak of namei lookup path buffers that occurs when a
ZFS volume is exported via the new NFS server. The leak occurred
because the new NFS server code didn't handle the case where
a file system sets the SAVENAME flag in its VOP_LOOKUP() and
ZFS does this for the DELETE case.
des [Sat, 26 May 2012 23:42:52 +0000 (23:42 +0000)]
Make yacc a bootstrap tool if building on head, since the new yacc is
not 100% backward compatible. Also, build yacc before lex, because lex
requires yacc but not vice versa.
mdf [Sat, 26 May 2012 20:13:24 +0000 (20:13 +0000)]
MFC r235297, r235316:
Add a -v and -N option to kenv(1), so it can be more easily used in
scripts the way sysctl(8) is. The -N option, like in sysctl(8),
displays only the kenv names, not their values. The -v option prints an
individual kenv variable name with its value as name="value". This is
the inverse of sysctl(8)'s -n flag, since the default behaviour of
kenv(1) is already like sysctl(8) -n.