Jung-uk Kim [Tue, 5 Jul 2011 18:42:10 +0000 (18:42 +0000)]
Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take
%rcx as "extensions" in long mode. If any unused bit is set in %rcx, these
instructions cause a general protection fault. Fix style nits and synchronize
i386 with amd64.
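As a minimal sketch of the constraint described above, the checks below test whether a value is safe to place in %rcx. The bit layout is an assumption taken from the Intel SDM rather than from this commit: MONITOR defines no extensions, and MWAIT defines only bit 0 ("treat interrupts as break events"). The function names and MWAIT_ECX_INTRBREAK are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption (Intel SDM, not this commit): MONITOR defines no %rcx
 * extensions, and MWAIT defines only bit 0. Any other set bit raises #GP.
 * MWAIT_ECX_INTRBREAK is a hypothetical name used only in this sketch.
 */
#define MWAIT_ECX_INTRBREAK 0x1ULL

/* True if 'rcx' may be passed as MONITOR's extensions operand. */
static bool
monitor_extensions_ok(uint64_t rcx)
{
	return (rcx == 0);
}

/* True if 'rcx' may be passed as MWAIT's extensions operand. */
static bool
mwait_extensions_ok(uint64_t rcx)
{
	return ((rcx & ~MWAIT_ECX_INTRBREAK) == 0);
}
```

The last case in the checks (bit 32 set) shows why the whole 64-bit register matters in long mode: a stray upper bit is just as fatal as a low one.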
Marius Strobl [Tue, 5 Jul 2011 18:40:37 +0000 (18:40 +0000)]
Call pmap_qremove() before freeing or unwiring the pages, otherwise
there's a window during which a page can be re-used before its previous
mapping is removed.
Follow Linux by unconditionally stripping the RX vlan tag from incoming
packets. It turns out that all firmware versions insert it, whether or not
they support VLAN tagging.
Submitted by: glevand <geoffrey.levand at mail dot ru>
o Eliminate flow6_hash_entry in favor of flow_hash_entry. We don't need
a separate struct to start a slist of semi-opaque structs. This
makes some code more compact.
o Rewrite ng_netflow_flow_show() and its API/ABI:
- Support for IPv6 is added.
- Request and response now use the same struct. The structure specifies
version (6 or 4), index of last retrieved hash, and also index
of last retrieved entry in the hash entry.
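The request/response structure works as a resumable cursor: each call continues from the stored hash index and entry index. The sketch below illustrates that idea on a toy table. The structure, its field names and the table contents are hypothetical and do not reflect the real ng_netflow ABI; a real implementation would use the version field to pick the IPv4 or IPv6 table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request/response structure; not the real ng_netflow ABI. */
#define NBUCKETS   4
#define MAXENTRIES 3	/* entries returned per request */

struct flow_show {
	int	 version;	/* 4 or 6 (unused by this toy table) */
	uint32_t hash_id;	/* bucket to resume from */
	uint32_t list_id;	/* entry within that bucket to resume from */
	int	 nentries;	/* out: entries filled in */
	int	 entries[MAXENTRIES];
};

/* Toy table: bucket b holds counts[b] entries with values b * 100 + i. */
static const int counts[NBUCKETS] = { 2, 0, 3, 1 };

/* Fill 'req' starting at its cursor; return 1 if more entries remain. */
static int
flow_show(struct flow_show *req)
{
	req->nentries = 0;
	for (; req->hash_id < NBUCKETS; req->hash_id++, req->list_id = 0) {
		for (; req->list_id < (uint32_t)counts[req->hash_id];
		    req->list_id++) {
			if (req->nentries == MAXENTRIES)
				return (1);	/* cursor points at next entry */
			req->entries[req->nentries++] =
			    (int)req->hash_id * 100 + (int)req->list_id;
		}
	}
	return (0);
}
```

Because the same structure is passed back unchanged, the caller needs no separate state to page through the whole table.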
Ed Schouten [Tue, 5 Jul 2011 14:12:48 +0000 (14:12 +0000)]
Only print entries for which ut_line points to a character device.
Now that we use utmpx, we more often have entries for which the ut_line
is left blank. To prevent us from returning struct stat for "/dev/",
check that the resulting stat structure belongs to a character device.
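A minimal sketch of the check, assuming the device path is built as "/dev/" followed by ut_line. The helper name and the 64-byte buffer are hypothetical. An empty ut_line yields "/dev/" itself, and stat() succeeds on it because it is a directory, so S_ISCHR() is what filters it out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: true only if "/dev/<line>" exists and is a
 * character device. The 64-byte buffer size is an assumption.
 */
static bool
line_is_chardev(const char *line, struct stat *sb)
{
	char path[64];

	if (snprintf(path, sizeof(path), "/dev/%s", line) >= (int)sizeof(path))
		return (false);		/* name too long: treat as invalid */
	if (stat(path, sb) != 0)
		return (false);
	return (S_ISCHR(sb->st_mode));	/* rejects the "/dev/" directory */
}
```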
This new version of _fget() requires new parameters:
- cap_rights_t needrights
the rights that we expect the capability's rights mask to include
(e.g. CAP_READ if we are going to read from the file)
- cap_rights_t *haverights
used to return the capability's rights mask (ignored if NULL)
- u_char *maxprotp
the maximum mmap() rights (e.g. VM_PROT_READ) that can be permitted
(only used if we are going to mmap the file; ignored if NULL)
- int fget_flags
FGET_GETCAP if we want to return the capability itself, rather than
the underlying object which it wraps
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
Rick Macklem [Mon, 4 Jul 2011 23:32:09 +0000 (23:32 +0000)]
The algorithm used by nfscl_getopen() could have resulted in
multiple instances of the same lock_owner when a process both
inherited an open file descriptor and opened the same file itself.
Since some NFSv4 servers cannot handle multiple instances of
the same lock_owner string, this patch changes the algorithm
used by nfscl_getopen() in the new NFSv4 client to keep that
from happening. The new algorithm is simpler, since there is
no longer any need to ascend the process's parentage tree because
all NFSv4 Closes for a file are done at VOP_INACTIVE()/VOP_RECLAIM(),
making the Opens indistinct w.r.t. use with Lock Ops.
This problem was discovered at the recent NFSv4 interoperability
Bakeathon.
Jeff Roberson [Mon, 4 Jul 2011 22:08:04 +0000 (22:08 +0000)]
- Speed up pendingblock processing again. Having too much delay between
ffs_blkfree() and the pending adjustment causes all kinds of
space related problems.
Jeff Roberson [Mon, 4 Jul 2011 20:53:55 +0000 (20:53 +0000)]
- It is impossible to run request_cleanup() while doing a copyonwrite.
This will most likely cause new block allocations which can recurse
into request cleanup.
- While here optimize the ufs locking slightly. We need only acquire and
drop once.
- process_removes() and process_truncates() are also only needed once.
- Attempt to flush each item on the worklist once, but do not loop forever
if some cannot be completed.
pf(4) tags now store the state key, but tcp_respond tries to reuse an mbuf as an optimization.
This makes pf find the wrong state and report errors about state mismatches.
Clear the cached state link on the pf(4) tag to avoid the state mismatches.
cap_funwrap() and cap_funwrap_mmap() unwrap capabilities, exposing the
underlying object. Attempting to unwrap a capability with an inadequate
rights mask (e.g. calling cap_funwrap(fp, CAP_WRITE | CAP_MMAP, &result)
on a capability whose rights mask is CAP_READ | CAP_MMAP) will result in
ENOTCAPABLE.
Unwrapping a non-capability is effectively a no-op.
These functions will be used by Capsicum-aware versions of _fget(), etc.
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
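The rights test can be sketched as a subset check: unwrapping succeeds only if every needed right is present in the capability's mask. The structure, the function name and the CAP_* bit values below are hypothetical stand-ins, not the real Capsicum definitions; the checks reproduce the CAP_WRITE | CAP_MMAP versus CAP_READ | CAP_MMAP example from the message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOTCAPABLE
#define ENOTCAPABLE 93	/* placeholder when the host does not define it */
#endif

/* Hypothetical rights bits; the real values differ. */
typedef uint64_t cap_rights_t;
#define CAP_READ  0x1ULL
#define CAP_WRITE 0x2ULL
#define CAP_MMAP  0x4ULL

/* Hypothetical stand-in for struct file. */
struct toyfile {
	int		 is_capability;
	cap_rights_t	 rights;	/* meaningful only for capabilities */
	struct toyfile	*underlying;	/* wrapped object, for capabilities */
};

/* Unwrap 'fp' if its rights include all of 'needrights'. */
static int
toy_funwrap(struct toyfile *fp, cap_rights_t needrights,
    struct toyfile **fpp)
{
	if (!fp->is_capability) {
		*fpp = fp;		/* not a capability: effectively a no-op */
		return (0);
	}
	if ((fp->rights & needrights) != needrights)
		return (ENOTCAPABLE);	/* at least one needed right is missing */
	*fpp = fp->underlying;
	return (0);
}
```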
- Remove the now unused CPU_NAND_ATOMIC()
- Add a comment explaining that CPU_OR_ATOMIC() and
CPU_COPY_STORE_REL() are special wrappers used to cater to particular
cases.
With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
Remove them and replace their usage with custom pc_cpuid magic (as,
at the moment, pc_cpumask can be easily represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
This change is not targeted for MFC because it removes struct pcpu members
and depends on the cpumask_t retirement.
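The expressions above are single-word shorthand. With a multi-word CPU set, the same values have to be computed per word from the CPU id. The sketch below uses a hypothetical toy set type (toy_cpuset_t, two 64-bit words) rather than the real cpuset_t macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical multi-word CPU set; the real cpuset_t differs. */
#define TOY_MAXCPU 128
#define WORDBITS   64
#define NWORDS     (TOY_MAXCPU / WORDBITS)

typedef struct { uint64_t bits[NWORDS]; } toy_cpuset_t;

/* Equivalent of pc_cpumask: a set containing only 'cpu'. */
static void
toy_cpu_setof(int cpu, toy_cpuset_t *set)
{
	for (int i = 0; i < NWORDS; i++)
		set->bits[i] = 0;
	set->bits[cpu / WORDBITS] = 1ULL << (cpu % WORDBITS);
}

/* Equivalent of pc_other_cpus: all_cpus with 'cpuid' removed. */
static void
toy_other_cpus(const toy_cpuset_t *all, int cpuid, toy_cpuset_t *other)
{
	*other = *all;
	other->bits[cpuid / WORDBITS] &= ~(1ULL << (cpuid % WORDBITS));
}

static bool
toy_cpu_isset(int cpu, const toy_cpuset_t *set)
{
	return (((set->bits[cpu / WORDBITS] >> (cpu % WORDBITS)) & 1) != 0);
}
```

Computing these on demand costs a shift and a mask, which is cheaper than keeping two full cpuset_t copies in every struct pcpu.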
- Use refcount(9) API to manage node and hook refcounting.
- Make ng_unref_node() void, since the caller shouldn't be
interested in whether the node is still valid after the call;
that can't be guaranteed anyway. [1]
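A minimal sketch of the pattern, using C11 atomics to mimic refcount(9)-style semantics, where the release operation reports whether the last reference was dropped. All names are hypothetical. The unref function is void because, once it returns, another thread may already have freed the node, so any "still valid" answer would be stale.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcount(9)-like helpers built on C11 atomics. */
static void toy_refcount_init(atomic_uint *c, unsigned v) { atomic_init(c, v); }
static void toy_refcount_acquire(atomic_uint *c) { atomic_fetch_add(c, 1); }

/* Returns true when the caller dropped the last reference. */
static bool
toy_refcount_release(atomic_uint *c)
{
	return (atomic_fetch_sub(c, 1) == 1);
}

struct toy_node {
	atomic_uint	 refs;
	int		*freed_flag;	/* test hook: set when the node is freed */
};

/* Void, like the new ng_unref_node(): the caller must not use 'node' after. */
static void
toy_node_unref(struct toy_node *node)
{
	if (toy_refcount_release(&node->refs)) {
		*node->freed_flag = 1;
		free(node);
	}
}
```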
The ARP code reuses the mbuf from an ARP request to make a reply, but it
does not reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) assumes
that ARP replies were received on the specified interface.
Reset rcvif to NULL for ARP replies to fix this issue.
Rick Macklem [Sun, 3 Jul 2011 21:44:26 +0000 (21:44 +0000)]
Modify the new NFSv4 client so that it appends a file handle
to the lock_owner4 string that goes on the wire. Also, add
code to do a ReleaseLockOwner Op on the lock_owner4 string
before a Close. Apparently not all NFSv4 servers handle multiple
instances of the same lock_owner4 string, at least not in a
compatible way. This patch avoids having multiple instances,
except for one unusual case, which will be fixed by a future commit.
Found at the recent NFSv4 interoperability Bakeathon.
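Appending the file handle makes the owner string differ per file, so one process locking two files presents two distinct lock_owner4 strings. A minimal sketch, assuming a simple "owner bytes followed by file handle bytes" layout and the NFSv4 1024-byte opaque limit; the helper name and layout are hypothetical, not the client's actual encoding.

```c
#include <stddef.h>
#include <string.h>

#define TOY_OPAQUE_LIMIT 1024	/* NFSv4 NFS4_OPAQUE_LIMIT */

/*
 * Hypothetical helper: out = owner || fh. Returns the total length,
 * or 0 if the result would not fit in 'outlen' or the opaque limit.
 */
static size_t
toy_build_lock_owner(unsigned char *out, size_t outlen,
    const unsigned char *owner, size_t ownerlen,
    const unsigned char *fh, size_t fhlen)
{
	if (ownerlen + fhlen > outlen || ownerlen + fhlen > TOY_OPAQUE_LIMIT)
		return (0);
	memcpy(out, owner, ownerlen);
	memcpy(out + ownerlen, fh, fhlen);
	return (ownerlen + fhlen);
}
```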
Ed Schouten [Sun, 3 Jul 2011 20:59:57 +0000 (20:59 +0000)]
Improve portability of config(8).
- Use strlen(dp->d_name) instead of the unportable dp->d_namlen. Rename
i to len to make it slightly more descriptive and prevent negative
indexing of the array.
- Replace index() by strchr().
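The two substitutions can be illustrated with a pair of hypothetical helpers (not config(8)'s actual code): strlen() replaces the non-portable d_namlen, an explicit length check before indexing keeps the offset from going negative, and strchr() replaces the legacy index().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: does 'name' end with 'suffix'? */
static bool
has_suffix(const char *name, const char *suffix)
{
	size_t len = strlen(name);	/* portable instead of dp->d_namlen */
	size_t slen = strlen(suffix);

	if (len < slen)			/* avoid indexing before the start */
		return (false);
	return (strcmp(name + len - slen, suffix) == 0);
}

/* Hypothetical: text after the first '.', or NULL; strchr() not index(). */
static const char *
after_dot(const char *name)
{
	const char *dot = strchr(name, '.');

	return (dot != NULL ? dot + 1 : NULL);
}
```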
Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by: julian
MFC after: 2 weeks
Alan Cox [Sat, 2 Jul 2011 23:34:47 +0000 (23:34 +0000)]
Initialize marker pages as held rather than fictitious/wired. Marking the
page as held is more useful as a safety precaution in case someone forgets
to check for PG_MARKER.
Fix a problem with the USB MIDI TX data format: some devices only accept
a maximum of 4 bytes (one command) per short-terminated USB transfer.
Optimise the TX case by sending multiple USB frames.
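One way to respect the 4-byte limit without losing throughput is to put one 4-byte USB-MIDI event packet in each frame and queue several frames per transfer. The sketch below assumes that layout; the structure and the frame limit (TOY_MAX_FRAMES) are hypothetical, not the driver's actual values.

```c
#include <stddef.h>
#include <string.h>

#define MIDI_EVENT_SIZE 4	/* USB-MIDI event packet: header + 3 MIDI bytes */
#define TOY_MAX_FRAMES  8	/* hypothetical per-transfer frame limit */

struct toy_xfer {
	size_t		nframes;
	unsigned char	frame[TOY_MAX_FRAMES][MIDI_EVENT_SIZE];
};

/* Place one event per frame; return how many events were consumed. */
static size_t
fill_tx_frames(struct toy_xfer *xfer, const unsigned char *events,
    size_t nevents)
{
	size_t n = nevents < TOY_MAX_FRAMES ? nevents : TOY_MAX_FRAMES;

	for (size_t i = 0; i < n; i++)
		memcpy(xfer->frame[i], events + i * MIDI_EVENT_SIZE,
		    MIDI_EVENT_SIZE);
	xfer->nframes = n;
	return (n);
}
```

Events beyond the frame limit are simply left for the next transfer.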
Ed Schouten [Sat, 2 Jul 2011 13:54:20 +0000 (13:54 +0000)]
Reintroduce the cioctl() hook in the TTY layer for digi(4).
The cioctl() hook can be used by drivers to add ioctls to the *.init and
*.lock devices. This commit breaks the ttydevsw ABI, since this
structure didn't provide any padding. To prevent ABI breakage in the
future, add a tsw_spare.
Submitted by: Peter Jeremy <peter jeremy alcatel lucent com>
Obtained from: kern/152254 (slightly modified)
Marius Strobl [Sat, 2 Jul 2011 12:56:03 +0000 (12:56 +0000)]
UltraSPARC-IV CPUs seem to be affected by a not publicly documented
erratum causing them to trigger stray vector interrupts accompanied by a
state in which they even fault on locked TLB entries. Just retrying the
instruction in that case gets the CPU back on track though. OpenSolaris
also just ignores a certain number of stray vector interrupts.
While at it, implement the stray vector interrupt handling for SPARC64-VI,
which uses these to indicate uncorrectable errors in interrupt packets.
Marius Strobl [Sat, 2 Jul 2011 11:14:54 +0000 (11:14 +0000)]
- For Cheetah- and Zeus-class CPUs, don't flush all unlocked entries from
the TLBs in order to get rid of the user mappings; instead, traverse
them and flush only the latter, as we already do for the Spitfire-class.
Also flushing the unlocked kernel entries can cause instant faults, which,
when called from within cpu_switch(), are handled with the scheduler lock
held, which in turn can cause timeouts on the acquisition of the lock by
other CPUs. This was easily seen with a 16-core V890 but occasionally
also happened with 2-way machines.
While at it, move the SPARC64-V support code entirely to zeus.c. This
causes a little bit of duplication but is less confusing than partially
using Cheetah-class bits for these.
- For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024-
entry, 2-way set associative TLB.
- In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure
that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back.
Tested by: Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)
- Fix typo in check_for_nested_with_variably_modified present
- Implement -Wvariable-decl.
- Port -Wtrampolines support from gcc3.
(all three also via OpenBSD)
Marius Strobl [Fri, 1 Jul 2011 18:31:59 +0000 (18:31 +0000)]
Fix r223695 to compile on architectures which don't use the MBR scheme; wrap
the MBR support in the common part of the loader in #ifdef's and enable it
only for userboot for now.
Marcel Moolenaar [Thu, 30 Jun 2011 20:34:55 +0000 (20:34 +0000)]
Change the management of nested faults by switching to physical
addressing while reading or writing the trap frame. It's not
possible to guarantee that the one translation cache entry that
we depend on is not going to get purged by the CPU. We already
know that global shootdowns (ptc.g and/or ptc.ga) can (and will)
cause multiple TC entries to get purged, and we initially tried
to handle that by serializing kernel entry with these operations.
However, we need to serialize kernel exit as well.
But even if we can serialize, it appears that CPU threads within
a core can affect each other's TC entries beyond the global
shootdown. This would mean serializing any and all translation
cache updates with the threads in a core with the kernel entry
and exit of any thread in that core. This is just too painful
and complicated.
Since we already properly coded for the 2 nested faults that we
can get, all we need to do is use those to obtain the physical
address of the trap frame, switch to physical mode and in that
way eliminate any further faults. Since the trap frame is already
aligned to a 1KB boundary to make sure we don't cross a page
boundary, this is safe to do.
We still need to serialize ptc.g or ptc.ga across CPUs because
the platform can only have 1 such operation outstanding at the
same time. We can now use a regular (spin) lock for this.
Also, it has been observed that we can get nested TLB faults
for region 7 virtual addresses. This was unexpected. For now,
we enhance the nested TLB fault handler to deal with those as
well, but it needs to be understood.
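The alignment argument can be checked arithmetically. The sketch below assumes the trap frame is at most 1KB and the page size is a power-of-two multiple of 1KB (8KB in the checks); under those assumptions a 1KB-aligned frame always lies within a single page, while an unaligned one can straddle a boundary. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if [addr, addr + size) touches more than one page.
 * Assumes size > 0 and that pagesize is a power of two.
 */
static bool
crosses_page(uintptr_t addr, size_t size, size_t pagesize)
{
	uintptr_t mask = ~((uintptr_t)pagesize - 1);

	return ((addr & mask) != ((addr + size - 1) & mask));
}
```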
Doug Rabson [Thu, 30 Jun 2011 16:08:56 +0000 (16:08 +0000)]
Add a version of the FreeBSD bootloader which can run in userland, packaged
as a shared library. This is intended to be used by BHyVe to load FreeBSD
kernels into new virtual machines.
When Capsicum starts creating capabilities to wrap existing file
descriptors, we will want to allocate a new descriptor without installing
it in the FD array.
Split falloc() into falloc_noinstall() and finstall(), and rewrite
falloc() to call them with appropriate atomicity.
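A single-threaded sketch of the split, using a hypothetical fixed-size descriptor table: the first step creates a file object that no descriptor refers to yet, the second makes it visible in the lowest free slot, and the combined call undoes the allocation if installation fails. In the kernel the install step runs under the descriptor table lock, which this sketch omits. That the table takes its own reference is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NFILES 4

struct toyfile { int refs; };
struct toy_fdtable { struct toyfile *ofiles[TOY_NFILES]; };

/* Step 1: allocate a file object without installing it. */
static int
toy_falloc_noinstall(struct toyfile **fpp)
{
	struct toyfile *fp = calloc(1, sizeof(*fp));

	if (fp == NULL)
		return (ENOMEM);
	fp->refs = 1;			/* the caller's reference */
	*fpp = fp;
	return (0);
}

/* Step 2: install into the lowest free slot. */
static int
toy_finstall(struct toy_fdtable *fdt, struct toyfile *fp, int *fdp)
{
	for (int fd = 0; fd < TOY_NFILES; fd++) {
		if (fdt->ofiles[fd] == NULL) {
			fdt->ofiles[fd] = fp;
			fp->refs++;	/* the table's reference (assumption) */
			*fdp = fd;
			return (0);
		}
	}
	return (EMFILE);
}

/* falloc() rebuilt from the two halves. */
static int
toy_falloc(struct toy_fdtable *fdt, struct toyfile **fpp, int *fdp)
{
	int error = toy_falloc_noinstall(fpp);

	if (error != 0)
		return (error);
	error = toy_finstall(fdt, *fpp, fdp);
	if (error != 0) {
		free(*fpp);
		*fpp = NULL;
	}
	return (error);
}
```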
Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.
Sergey Kandaurov [Thu, 30 Jun 2011 09:20:26 +0000 (09:20 +0000)]
Fix quota(1) output.
- Fix calculation of 1024-byte sized blocks from disk blocks shown when -h
option isn't specified. It was broken with quota64 integration.
- In prthumanval(): limit the size of a buffer passed to humanize_number()
to a width of 5 bytes but allow a shorter length if requested. That's what
users expect.
Alan Cox [Wed, 29 Jun 2011 16:40:41 +0000 (16:40 +0000)]
Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.
This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this
change.
John Baldwin [Wed, 29 Jun 2011 16:20:52 +0000 (16:20 +0000)]
- Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
change a per-instance variable (which is used to control AIM) rather
than having all of the per-instance sysctls operate on a single global
variable.
Adrian Chadd [Wed, 29 Jun 2011 13:21:52 +0000 (13:21 +0000)]
Fix a corner case in STA beacon processing when a CSA is received but
the AP doesn't transmit beacons.
If the AP requests a CSA (i.e., a channel switch) and then enters CAC
(channel availability check) for 60 seconds, it doesn't send beacons
and it just listens for radar events (and other things which we don't
do yet.)
Now, ath_newstate() was not resetting the beacon timer config on
a transition to the RUN state when in STA mode - it was setting
sc_syncbeacon, which simply updates the beacon config from the
contents of the next received beacon.
This means the STA never generates beacon miss events.
If the AP goes into CAC for 60 seconds and recovers, the STA will
happily receive the first beacon and reconfigure timers.
But if it gets a radar event after that, it'll change channel
again without notifying the station that it has changed channel.
Since the station is happily waiting for the first beacon
to configure the beacon timer details from, it won't ever
generate a beacon miss interrupt and it'll sit there forever
(or until the AP appears on that channel once again.)
This change forces the last known beacon timer config to be
written to hardware on a transition from CSA->RUN in STA mode.
This forces bmiss events to occur and the STA will eventually
(after a handful of beacon miss events) begin scanning for
another access point.
We may split today's CAPABILITIES into CAPABILITY_MODE (which has
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.
Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.
Bjoern A. Zeeb [Wed, 29 Jun 2011 13:01:10 +0000 (13:01 +0000)]
In case ntp cannot resolve a hostname on startup it will queue the entry
for resolving by a child process that, upon success, will add the entry
to the config of the running parent process.
Unfortunately there are a couple of bugs with this, fixed in various
later versions of upstream in potentially different ways due to other
code changes:
1) With "server [-46] <FQDN>", the [-46] flag was used as the FQDN for
later resolving, which does not work. Make sure we always pass the name (or IP) there.
2) The intermediate file to carry the information to the child process
does not know about -4/-6 restrictions, so that a dual-stacked host
could resolve to an IPv6 address that might be unreachable (see
r223626), leading to no working synchronization while the IPv4 record is ignored.
Thus alter the intermediate format to also pass the address family
(AF_UNSPEC (default), AF_INET or AF_INET6) to the child process
depending on -4 or -6.
3) Make the child process parse the new intermediate file format and
save the address family for the getaddrinfo() hints.
4) Change child to always reload resolv.conf calling res_init() before
trying to resolve names. This will pick up resolv.conf changes or
a new resolv.conf if one did not exist, or was empty or
unusable, at ntp startup. This fix is more conditional in upstream
versions but given FreeBSD has res_init there is no need for the
configure logic as well.
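Items 2) and 3) amount to carrying the address family through to the getaddrinfo() hints. A minimal sketch with hypothetical helper names; the child would additionally call res_init() before resolving, as item 4) describes, which is left out here. The checks use AI_NUMERICHOST so no DNS lookup is needed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical: map the server line's -4/-6 flag to an address family. */
static int
family_from_flag(const char *flag)
{
	if (flag != NULL && strcmp(flag, "-4") == 0)
		return (AF_INET);
	if (flag != NULL && strcmp(flag, "-6") == 0)
		return (AF_INET6);
	return (AF_UNSPEC);		/* default: either family */
}

/* Resolve 'host' restricted to 'family'; returns getaddrinfo()'s result. */
static int
resolve_with_family(const char *host, int family, int extra_flags)
{
	struct addrinfo hints, *res = NULL;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = family;	/* AF_UNSPEC, AF_INET or AF_INET6 */
	hints.ai_socktype = SOCK_DGRAM;
	hints.ai_flags = extra_flags;
	error = getaddrinfo(host, "123", &hints, &res);
	if (error == 0)
		freeaddrinfo(res);
	return (error);
}
```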
Add new rule actions "call" and "return" to ipfw. They make it
possible to organize rules into subroutines.
The "call" action saves the current rule number on an internal
stack, and rule processing continues from the first rule with the
specified number (similar to the skipto action). If a rule with the
"return" action is encountered later, processing returns to the first
rule numbered higher than the "call" rule saved on the stack.
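The control flow can be sketched with a toy evaluator over a sorted rule list. All names are hypothetical, and the behavior on stack overflow and on a "return" with an empty stack are assumptions of the sketch, not ipfw's documented policy.

```c
#include <stddef.h>

/* Hypothetical toy rules; not ipfw's data structures. */
enum toy_action { ACT_COUNT, ACT_CALL, ACT_RETURN, ACT_ACCEPT };

struct toy_rule {
	int		num;
	enum toy_action	act;
	int		arg;	/* target rule number for ACT_CALL */
};

#define STACK_MAX 16

/* Index of the first rule numbered >= num (rules sorted), or n if none. */
static size_t
find_rule(const struct toy_rule *r, size_t n, int num)
{
	size_t i = 0;

	while (i < n && r[i].num < num)
		i++;
	return (i);
}

/* Run the rules, recording visited rule numbers; returns trace length. */
static size_t
ipfw_toy_run(const struct toy_rule *r, size_t n, int *trace, size_t tmax)
{
	int stack[STACK_MAX];
	size_t sp = 0, t = 0, i = 0;

	while (i < n && t < tmax) {
		trace[t++] = r[i].num;
		switch (r[i].act) {
		case ACT_CALL:
			if (sp == STACK_MAX)
				return (t);	/* overflow: stop (assumption) */
			stack[sp++] = r[i].num;	/* save the "call" rule number */
			i = find_rule(r, n, r[i].arg);
			break;
		case ACT_RETURN:
			if (sp == 0) {
				i++;		/* empty stack: fall through (assumption) */
				break;
			}
			i = find_rule(r, n, stack[--sp] + 1);
			break;
		case ACT_ACCEPT:
			return (t);
		default:
			i++;
			break;
		}
	}
	return (t);
}
```

With rules 100, 200 (call 1000), 300 (accept), 1000 and 1100 (return), the visiting order is 100, 200, 1000, 1100, 300.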
Improve error reporting. Use the corresponding error message when the file to be
preprocessed is missing. Also suggest using an absolute pathname if the -p option
is specified.
Initialize elements of state array when creating the GPT table.
This fixes a problem where the secondary GPT header was not erased when the
partition table was destroyed. Move the common operations from g_part_gpt_create
and g_part_gpt_recover into a separate function, g_gpt_set_defaults.
Rick Macklem [Tue, 28 Jun 2011 22:52:38 +0000 (22:52 +0000)]
Fix the new NFSv4 client so that it doesn't fill the cached
mode attribute in as 0 when doing writes. The change adds
the Mode attribute plus the others except Owner and Owner_group
to the list requested by the NFSv4 Write Operation. This fixed
a problem where an executable file built by "cc" would get mode
0111 instead of 0755 for some NFSv4 servers.
Found at the recent NFSv4 interoperability Bakeathon.
Mikolaj Golub [Tue, 28 Jun 2011 21:01:32 +0000 (21:01 +0000)]
Check the returned value of activemap_write_complete() and update metadata on
disk if needed. This should fix a potential case when extents are cleared in
activemap but metadata is not updated on disk.
Mikolaj Golub [Tue, 28 Jun 2011 20:57:54 +0000 (20:57 +0000)]
Make activemap_write_start/complete check the keepdirty list when
determining whether we need to update the activemap on disk. This makes
keepdirty serve its purpose: to reduce the number of metadata updates.
Marius Strobl [Tue, 28 Jun 2011 16:16:43 +0000 (16:16 +0000)]
- In gem_reset_rx() also reset the RX MAC which is necessary in order to
get it out of a stuck condition that can be caused by GEM_MAC_RX_OVERFLOW.
- In gem_reset_rxdma() call gem_setladrf() in order to reprogram the RX
filter and restore the previous content of GEM_MAC_RX_CONFIG. While at it
consistently use the newly introduced sc_mac_rxcfg throughout the driver
instead of reading its old content.
- Increment if_iqdrops instead of if_ierrors in case of RX buffer allocation
failure.
- According to the GEM datasheet the RX MAC should also be disabled in
gem_setladrf() before changing its configuration.
- Add error messages to gem_disable_{r,t}x() and take advantage of these
throughout the driver instead of duplicating their functionality all over
the place.
Bjoern A. Zeeb [Tue, 28 Jun 2011 09:46:25 +0000 (09:46 +0000)]
Compare port numbers correctly. They are stored by SRCPORT()
in host byte order, so we need to compare them as such.
Properly compare IPv6 addresses as well.
This allows the badaddrs slots (8 per address family by default)
to work correctly, so that sendto() errors are printed only once.
The change is no longer applicable to the latest upstream versions.
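A minimal sketch of the comparison, assuming a stored record holding the address family, a host-order port (as SRCPORT() produces) and the address. The record layout and function name are hypothetical. The sockaddr's network-order port is converted with ntohs() before comparing, and IPv6 addresses are compared byte-wise over the full 16 bytes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stored entry; the port is kept in host byte order. */
struct badaddr {
	int		family;
	unsigned short	port;
	union {
		struct in_addr	v4;
		struct in6_addr	v6;
	} addr;
};

static bool
badaddr_match(const struct badaddr *b, const struct sockaddr *sa)
{
	if (b->family != sa->sa_family)
		return (false);
	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		return (b->port == ntohs(sin->sin_port) &&
		    b->addr.v4.s_addr == sin->sin_addr.s_addr);
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;

		return (b->port == ntohs(sin6->sin6_port) &&
		    memcmp(&b->addr.v6, &sin6->sin6_addr,
		    sizeof(struct in6_addr)) == 0);
	}
	return (false);
}
```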