Justin T. Gibbs [Thu, 16 Feb 2012 21:58:47 +0000 (21:58 +0000)]
Fix a bug in the calculation of the maximum I/O request size.
The previous code did not limit the I/O request size based on
the maximum number of segments supported by the back-end. In
current practice, since the only back-end supporting chained
requests is the FreeBSD implementation, this limit was never
exceeded.
sys/dev/xen/blkfront/block.h:
Add two macros, XBF_SEGS_TO_SIZE() and XBF_SIZE_TO_SEGS(),
to centralize the logic of reserving a segment to deal with
non-page-aligned I/Os.
sys/dev/xen/blkfront/blkfront.c:
o When negotiating transfer parameters, limit the
max_request_size we use and publish, if it is greater
than the maximum, unaligned, I/O we can support with
the number of segments advertised by the backend.
o Don't unilaterally reduce the I/O size published to
the disk layer by a single page. max_request_size
is already properly limited in the transfer parameter
negotiation code.
o Fix typos in printf strings:
"max_requests_segments" -> "max_request_segments"
"specificed" -> "specified"
Gleb Smirnoff [Thu, 16 Feb 2012 19:10:01 +0000 (19:10 +0000)]
Refactor the name hash and the ID hash, that are used to address nodes:
- Make hash sizes growable, to satisfy users running large mpd
installations, having thousands of nodes.
- NG_NAMEHASH() proved to give a very bad distribution in real life
name sets, while generic hash32_str(name, HASHINIT) proved to give
an even one, so you the latter for name hash.
- Do not store unnamed nodes in slot 0 of name hash, no reason for that.
- Use the ID hash in cases when we need to run through all nodes: the
NGM_LISTNODES command and in the vnet_netgraph_uninit().
- Implement NGM_LISTNODES and NGM_LISTNAMES as separate code, the former
iterates through the ID hash, and the latter through the name hash.
- Keep count of all nodes and of named nodes, so that we don't need
to count nodes in NGM_LISTNODES and NGM_LISTNAMES. The counters are
also used to estimate whether we need to grow hashes.
- Close a race between two threads running ng_name_node() assigning same
name to different nodes.
Alan Cox [Thu, 16 Feb 2012 06:45:51 +0000 (06:45 +0000)]
When vm_mmap() is used to map a vm object into a kernel vm_map, it
makes no sense to check the size of the kernel vm_map against the
user-level resource limits for the calling process.
Jung-uk Kim [Wed, 15 Feb 2012 23:33:22 +0000 (23:33 +0000)]
Clean up RFLAG and CR3 register handling and nearby comments. For BSP, use
spinlock_enter()/spinlock_exit() to save/restore RFLAGS. We know interrupt
is disabled when returning from S3. For AP, we do not have to save/restore
it because IRET will do it for us any way. Do not save CR3 locally because
savectx() does it and BSP does not have to switch to kernel map for amd64.
Change contigmalloc(9) flag while I am in the neighborhood.
Luigi Rizzo [Wed, 15 Feb 2012 23:13:29 +0000 (23:13 +0000)]
(This commit only touches code within the DEV_NETMAP blocks)
Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).
On passing, make the code and comments more uniform across the
various drivers.
Jung-uk Kim [Wed, 15 Feb 2012 22:49:25 +0000 (22:49 +0000)]
Set up an event handler to turn off speaker if user requested it. Speaker
will stop beeping after all device drivers are resumed. Use proper API to
"acquire" and "release" PIC timer2 for consistency and correctness.
Jung-uk Kim [Wed, 15 Feb 2012 21:32:05 +0000 (21:32 +0000)]
Some BIOSes are known for corrupting low 64KB between suspend and resume.
Mask off the first 16 pages unless we appear to be running in a VM. This
address may be overridden by 'hw.physmem.start' tunable from loader.
Note Linux used to have a BIOS quirk table for this issue but it seems they
made it default recently.
Luigi Rizzo [Wed, 15 Feb 2012 18:59:26 +0000 (18:59 +0000)]
reduce the differences between these three files.
The three drivers (em, lem and igb) are extremely similar, too bad
that the structures use different names and we cannot share the code.
Bjoern A. Zeeb [Wed, 15 Feb 2012 16:09:56 +0000 (16:09 +0000)]
Fix PAWS (Protect Against Wrapped Sequence numbers) in cases when
hz >> 1000 and thus getting outside the timestamp clock frequenceny of
1ms < x < 1s per tick as mandated by RFC1323, leading to connection
resets on idle connections.
Always use a granularity of 1ms using getmicrouptime() making all but
relevant callouts independent of hz.
Use getmicrouptime(), not getmicrotime() as the latter may make a jump
possibly breaking TCP nfsroot mounts having our timestamps move forward
for more than 24.8 days in a second without having been idle for that
long.
Add additional check to EBR probe and create methods:
don't try probe and create EBR scheme when parent partition type
is not "ebr". This fixes error messages about corrupted EBR for
some partitions where is actually another partition scheme.
NOTE: if you have EBR on the partition with different than "ebr"
(0x05) type, then you will lost access to partitions until it will be
changed.
Justin T. Gibbs [Wed, 15 Feb 2012 06:45:49 +0000 (06:45 +0000)]
Enhance documentation, improve interoperability, and fix defects in
FreeBSD's front and back Xen blkif interface drivers.
sys/dev/xen/blkfront/block.h:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkback/blkback.c:
Replace FreeBSD specific multi-page ring impelementation with
support for both the Citrix and Amazon/RedHat versions of this
extension.
sys/dev/xen/blkfront/blkfront.c:
o Add a per-instance sysctl tree that exposes all negotiated
transport parameters (ring pages, max number of requests,
max request size, max number of segments).
o In blkfront_vdevice_to_unit() add a missing return statement
so that we properly identify the unit number for high numbered
xvd devices.
sys/dev/xen/blkback/blkback.c:
o Add static dtrace probes for several events in this driver.
o Defer connection shutdown processing until the front-end
enters the closed state. This avoids prematurely tearing
down the connection when buggy front-ends transition to the
closing state, even though the device is open and they
veto the close request from the tool stack.
o Add nodes for maximum request size and the number of active
ring pages to the exising, per-instance, sysctl tree.
o Miscelaneous style cleanup.
sys/xen/interface/io/blkif.h:
o Add extensive documentation of the XenStore nodes used to
implement the blkif interface.
o Document the startup sequence between a front and back driver.
o Add structures and documenatation for the "discard" feature
(AKA Trim).
o Cleanup some definitions related to FreeBSD's request
number/size/segment-limit extension.
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkback/blkback.c:
sys/xen/xenbus/xenbusvar.h:
Add the convenience function xenbus_get_otherend_state() and
use it to simplify some logic in both block-front and block-back.
Robert Watson [Tue, 14 Feb 2012 20:34:25 +0000 (20:34 +0000)]
When initialising the CP0 status register during boot on 64-bit MIPS,
set all three of the kernel, supervisor, and user-mode 64-bit mode
flags. While FreeBSD does not currently use the supervisor ring (and
hence this is effectively a NOP on most systems), doing this avoids
triggering an exception on 64-bit MIPS CPUs that don't support 32-bit
compatibility mode, and therefore don't allow clearing the SX bit.
Reviewed by: gonzo
MFC after: 3 days
Sponsored by: DARPA, SRI International
Add options for program (-p) and to turn off waiting (-w) which is now
on by default.
The default is to wait after each counter is tested. Since the prompt
would go to stdout you won't see it if you're redirecting the output
of the executed sub-program to /dev/null, so just press return to
continue or Ctrl-D to stop.
Tijl Coosemans [Tue, 14 Feb 2012 12:50:20 +0000 (12:50 +0000)]
Change some headers such that lang/gcc* ports no longer patch them.
The lang/gcc* ports patch headers where they think something is
non-standard. These patched headers override the system headers which means
you have to rebuild these ports whenever you do installworld to make sure
they contain the latest changes.
David Chisnall [Tue, 14 Feb 2012 12:03:23 +0000 (12:03 +0000)]
Cleanup of xlocale:
- Address performance regressions encountered by das@ by caching per-thread
data in TLS where available.
- Add a __NO_TLS flag to cdefs.h to indicate where not available.
- Reorganise the xlocale.h definitions into xlocale/*.h so that they can be
included from multiple places.
- Export the POSIX2008 subset of xlocale when POSIX2008 says it should be
exported, independently of whether xlocale.h is included.
- Fix the bug where programs using ctype functions always assumed ASCII unless
recompiled.
- Fix some style(9) violations.
Reviewed by: brooks (mentor)
Approved by: dim (mentor)
Michael Tuexen [Tue, 14 Feb 2012 12:00:34 +0000 (12:00 +0000)]
Fix a bug where the wrong protocol overhead was used. This can lead
to a deadlock of an association when an IPv6 socket was used to
communcate with IPv4 and an ICMPv4 fragmentation needed message
was received.
While there, simplify the code a bit.
Doug Barton [Tue, 14 Feb 2012 10:51:24 +0000 (10:51 +0000)]
Fix various issues with the NFS and RPC related scripts:
1. Add new functionality to the force_depend method to incorporate the
tests for whether the service is enabled and/or already running.
2. Add a new option to bypass checking only that the service is enabled
at boot time, and always check if it is running.
3. Use this new functionality to greatly simplify the rc.d scripts that
use force_depend.
4. Add a force_depend for statd in lockd
5. Remove the check that either nfs_server or nfs_client is _enable'd
from statd and lockd. This was always overkill, and prevented using
the {one|force}start options, as well as stop'ing on the command line.
6. The yp* scripts had some of their arguments in various weird orders.
Bring them into line with the model.
7. If mountd fails to create /var/db/mountdtab, err out.
Ideas, suggestions, and/or review from delphij and jilles.
Pointy hats are completely my responsibility however.
Alexander Motin [Tue, 14 Feb 2012 09:19:30 +0000 (09:19 +0000)]
Do not handle MOD_SHUTDOWN equally to MOD_UNLOAD in sound kernel module.
MOD_SHUTDOWN is not an end of existence, and there is a life after it.
In particular, code previously called on MOD_SHUTDOWN grabbed lock and
deallocated unit numbering. That caused infinite wait loop if snd_uaudio
tried to destroy its PCM device after that point.
Add a rudimentary test to run through all the available counters on a
system and then execute a program with pmcstat in counting mode.
The program will verify that all counters fire and that the code neither
panics the system nor locks it up. This should be considered a first pass
conformance test for new sets of counters being added to hwpmc(4).
Marius Strobl [Tue, 14 Feb 2012 00:18:35 +0000 (00:18 +0000)]
- As it turns out, MSI-X is broken for at least LSI SAS1068E when passed
through by VMware so blacklist their PCI-PCI bridge for MSI/MSI-X here.
Note that besides currently there not being a quirk type that disables
MSI-X only and there's no evidence that MSI doesn't work with the VMware
pass-through, it's really questionable whether MSI generally works in
that setup as VMware only mention three know working devices [1, p. 4].
Also not that this quirk entry currently doesn't affect the devices
emulated by VMware in any way as these don't claim support MSI/MSI-X to
begin with. [2]
While at it, make the PCI quirk table const and static.
- Remove some duplicated empty lines.
- Use DEVMETHOD_END.
Ed Maste [Mon, 13 Feb 2012 16:48:49 +0000 (16:48 +0000)]
Add a sysctl to report the firmware build number.
Some older firmware versions have issues that can be worked around by
avoiding certain operations. Add a sysctl dev.aac.#.firmware_build to
make it easy for scripts or userland tools to detect the firmware
version.
Ed Schouten [Mon, 13 Feb 2012 11:59:59 +0000 (11:59 +0000)]
Polish diff against upstream.
- Revert unneeded whitespace changes.
- Revert modifications to loginrec.c, as the upstream version already
does the right thing.
- Fix indentation and whitespace of local changes.
Adrian Chadd [Mon, 13 Feb 2012 07:47:36 +0000 (07:47 +0000)]
Correct the 802.11s mesh configuration structure and related tidbits.
* Change the mesh IE size to be IEEE80211_MESH_CONF_SZ rather than the
size of the structure;
* conf_cap is now a uint8_t rather than a uint16_t (uint16_t in D3.0,
uint8_t in the amendment spec);
* Update mesh config capability bits - earlier bits were from draft X,
current is amendment spec;
* Update the following to be an enum rather than #define and added
a VENDOR entry too from the amendment spec;
IEEE80211_MESHCONF_PATH_*
IEEE80211_MESHCONF_METRIC_*
IEEE80211_MESHCONF_CC_*
IEEE80211_MESHCONF_SYNC_*
IEEE80211_MESHCONF_AUTH_*
* Kept IEEE80211_MESHCONF_FORM_* and IEEE80211_MESHCONF_CAP_* as
defines because they are defined in a way that we need to mask in/out
information;
* In IEEE80211_MESHCONF_CAP_* IEEE80211_MESHCONF_CAP_TBTTA is removed
and 0x80 is made reserved as defined in the amendment spec.
Ed Maste [Mon, 13 Feb 2012 01:44:12 +0000 (01:44 +0000)]
Fix panic after "WARNING - ATA_IDENTIFY taskqueue timeout"
When performing a firmware upgrade via atacontrol[1] the subsequent
command may time out producing the error message above. When this
happens the callout could still be active, and the system would then
panic due to a destroyed semaphore.
Instead, ensure that the callout is done first, via callout_drain.
Note that this fix applies to the "old" ata(4) and so isn't applicable
to the default configuration in HEAD. It is still applicable to
stable/8.
Adrian Chadd [Mon, 13 Feb 2012 00:28:41 +0000 (00:28 +0000)]
Attempt to address some potential vap->iv_bss race conditions.
There are unfortunately a number of situations where vap->iv_bss is changed
or freed by some code in net80211. Because multiple threads can concurrently
be doing work (and the vap->iv_bss access isn't at all done behind any kind
of lock), it's quite possible that:
* a change will occur in one thread - eg, by a call through
ieee80211_sta_join1();
* a state change occurs in another thread - eg an RX is scheduled
in the ath tasklet and it calls ieee80211_input_mimo_all(), which
does dereference vap->iv_bss;
* these two executing concurrently, causing things to explode.
Another instance is ath_beacon_alloc() which takes an ieee80211_node *.
It's called with the vap->iv_bss node from ath_newstate(). If the node has
changed in the meantime (say it's been freed elsewhere) the reference
that it grabbed _before_ refcounting it may be stale.
I would _prefer_ that these sorts of things were serialised somewhere but
that may be a bit much to ask. Instead, the best we can (currently) hope
is that the underlying bss node is still (somewhat) valid.
There is a related PR (kern/164382) described by the first case above.
That should be fixed by properly serialising the RX path and reset path
so an RX can't occur at the same time as the vap free/shutdown path.
This is inspired by some related fixes in r212127.
Brooks Davis [Sun, 12 Feb 2012 23:18:05 +0000 (23:18 +0000)]
Prevent periodic scripts that run longer than the expected period from
starting up before the previous script finishes. This prevents an
infinite number of them from piling up and slowing a system down.
Since all the refactoring to make this happen required churning the
indenting of most of this file, make the indentation more consistent.
Ed Schouten [Sun, 12 Feb 2012 18:29:56 +0000 (18:29 +0000)]
Globally replace u_int*_t from (non-contributed) man pages.
The reasoning behind this, is that if we are consistent in our
documentation about the uint*_t stuff, people will be less tempted to
write new code that uses the non-standard types.
I am not going to bump the man page dates, as these changes can be
considered style nits. The meaning of the man pages is unaffected.
Andriy Gapon [Sun, 12 Feb 2012 14:58:50 +0000 (14:58 +0000)]
start watchdogd before most of other daemons/servers
The main benefit is that watchdogd would shutdown after most of other
daemons/servers and thus, for example, would remedy a system hang caused
by unlucky X server shutdown.