jonathan [Mon, 4 Jul 2011 14:40:32 +0000 (14:40 +0000)]
Add kernel functions to unwrap capabilities.
cap_funwrap() and cap_funwrap_mmap() unwrap capabilities, exposing the
underlying object. Attempting to unwrap a capability with an inadequate
rights mask (e.g. calling cap_funwrap(fp, CAP_WRITE | CAP_MMAP, &result)
on a capability whose rights mask is CAP_READ | CAP_MMAP) will result in
ENOTCAPABLE.
Unwrapping a non-capability is effectively a no-op.
These functions will be used by Capsicum-aware versions of _fget(), etc.
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
- Remove the now unused CPU_NAND_ATOMIC()
- Add a comment explaining that CPU_OR_ATOMIC() and
CPU_COPY_STORE_REL() are special wrappers used to cater particular
cases.
With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.
- Use refcount(9) API to manage node and hook refcounting.
- Make ng_unref_node() void, since caller shouldn't be
interested in whether node is valid after call or not,
since it can't be guaranteed to be valid. [1]
ARP code reuses mbuf from ARP request to make a reply, but it does not
reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) supposes that ARP
replies were received on specified interface.
Reset rcvif to NULL for ARP replies to fix this issue.
Modify the new NFSv4 client so that it appends a file handle
to the lock_owner4 string that goes on the wire. Also, add
code to do a ReleaseLockOwner Op on the lock_owner4 string
before a Close. Apparently not all NFSv4 servers handle multiple
instances of the same lock_owner4 string, at least not in a
compatible way. This patch avoids having multiple instances,
except for one unusual case, which will be fixed by a future commit.
Found at the recent NFSv4 interoperability Bakeathon.
- Use strlen(dp->d_name) instead of the unportable dp->d_namlen. Rename
i to len to make it slightly more descriptive and prevent negative
indexing of the array.
- Replace index() by strchr().
Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by: julian
MFC after: 2 weeks
Initialize marker pages as held rather than fictitious/wired. Marking the
page as held is more useful as a safety precaution in case someone forgets
to check for PG_MARKER.
Fix problem about USB MIDI TX data format, that some devices only accept
a maximum of 4 bytes (one command) per short terminated USB transfer.
Optimise the TX case by sending multiple USB frames.
Reintroduce the cioctl() hook in the TTY layer for digi(4).
The cioctl() hook can be used by drivers to add ioctls to the *.init and
*.lock devices. This commit breaks the ttydevsw ABI, since this
structure didn't provide any padding. To prevent ABI breakage in the
future, add a tsw_spare.
Submitted by: Peter Jeremy <peter jeremy alcatel lucent com>
Obtained from: kern/152254 (slightly modified)
marius [Sat, 2 Jul 2011 12:56:03 +0000 (12:56 +0000)]
UltraSPARC-IV CPUs seem to be affected by a not publicly documented
erratum causing them to trigger stray vector interrupts accompanied by a
state in which they even fault on locked TLB entries. Just retrying the
instruction in that case gets the CPU back on track though. OpenSolaris
also just ignores a certain number of stray vector interrupts.
While at it, implement the stray vector interrupt handling for SPARC64-VI
which use these for indicating uncorrectable errors in interrupt packets.
marius [Sat, 2 Jul 2011 11:14:54 +0000 (11:14 +0000)]
- For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from
the TLBs in order to get rid of the user mappings but instead traverse
them an flush only the latter like we also do for the Spitfire-class.
Also flushing the unlocked kernel entries can cause instant faults which
when called from within cpu_switch() are handled with the scheduler lock
held which in turn can cause timeouts on the acquisition of the lock by
other CPUs. This was easily seen with a 16-core V890 but occasionally
also happened with 2-way machines.
While at it, move the SPARC64-V support code entirely to zeus.c. This
causes a little bit of duplication but is less confusing than partially
using Cheetah-class bits for these.
- For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024-
entry, 2-way set associative TLB.
- In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure
that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back.
Tested by: Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)
- Fix typo in check_for_nested_with_variably_modified present
- Implement -Wvariable-decl.
- Port -Wtrampolines support from gcc3.
(all three also via OpenBSD)
marius [Fri, 1 Jul 2011 18:31:59 +0000 (18:31 +0000)]
Fix r223695 to compile on architectures which don't use the MBR scheme; wrap
the MBR support in the common part of the loader in #ifdef's and enable it
only for userboot for now.
marcel [Thu, 30 Jun 2011 20:34:55 +0000 (20:34 +0000)]
Change the management of nested faults by switching to physical
addressing while reading or writing the trap frame. It's not
possible to guarantee that the one translation cache entry that
we depend on is not going to get purged by the CPU. We already
know that global shootdowns (ptc.g and/or ptc.ga) can (and will)
cause multiple TC entries to get purged and we initialize tried
to handle that by serializing kernel entry with these operations.
However, we need to serialize kernel exit as well.
But even if we can serialize, it appears that CPU threads within
a core can affect each other's TC entries beyond the global
shootdown. This would mean serializing any and all translatation
cache updates with the threads in a core with the kernel entry
and exit of any thread in that core. This is just too painful
and complicated.
Since we already properly coded for the 2 nested faults that we
can get, all we need to do is use those to obtain the physical
address of the trap frame, switch to physical mode and in that
way eliminate any further faults. The trap frame is already
aligned to 1KB boundaries to make sure we don't cross the page
boundary, this is safe to do.
We still need to serialize ptc.g or ptc.ga across CPUs because
the platform can only have 1 such operation outstanding at the
same time. We can now use a regular (spin) lock for this.
Also, it has been observed that we can get a nested TLB faults
for region 7 virtual addresses. This was unexpected. For now,
we enhance the nested TLB fault handler to deal with those as
well, but it needs to be understood.
dfr [Thu, 30 Jun 2011 16:08:56 +0000 (16:08 +0000)]
Add a version of the FreeBSD bootloader which can run in userland, packaged
as a shared library. This is intended to be used by BHyVe to load FreeBSD
kernels into new virtual machines.
jonathan [Thu, 30 Jun 2011 15:22:49 +0000 (15:22 +0000)]
When Capsicum starts creating capabilities to wrap existing file
descriptors, we will want to allocate a new descriptor without installing
it in the FD array.
Split falloc() into falloc_noinstall() and finstall(), and rewrite
falloc() to call them with appropriate atomicity.
jonathan [Thu, 30 Jun 2011 10:56:02 +0000 (10:56 +0000)]
Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.
pluknet [Thu, 30 Jun 2011 09:20:26 +0000 (09:20 +0000)]
Fix quota(1) output.
- Fix calculation of 1024-byte sized blocks from disk blocks shown when -h
option isn't specified. It was broken with quota64 integration.
- In prthumanval(): limit the size of a buffer passed to humanize_number()
to a width of 5 bytes but allow a shorter length if requested. That's what
users expect.
alc [Wed, 29 Jun 2011 16:40:41 +0000 (16:40 +0000)]
Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.
This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this
change.
jhb [Wed, 29 Jun 2011 16:20:52 +0000 (16:20 +0000)]
- Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
change a per-instance variable (which is used to control AIM) rather
than having all of the per-instance sysctls operate on a single global
variable.
adrian [Wed, 29 Jun 2011 13:21:52 +0000 (13:21 +0000)]
Fix a corner case in STA beacon processing when a CSA is received but
the AP doesn't transmit beacons.
If the AP requests a CSA (ie, a channel switch) and then enters CAC
(channel availability check) for 60 seconds, it doesn't send beacons
and it just listens for radar events (and other things which we don't
do yet.)
Now, ath_newstate() was not resetting the beacon timer config on
a transition to the RUN state when in STA mode - it was setting
sc_syncbeacon, which simply updates the beacon config from the
contents of the next received beacon.
This means the STA never generates beacon miss events.
If the AP goes into CAC for 60 seconds and recovers, the STA will
happily receive the first beacon and reconfigure timers.
But if it gets a radar event after that, it'll change channel
again, not notify the station that it's changed channel..
and since the station is happily waiting for the first beacon
to configure the beacon timer details from, it won't ever
generate a beacon miss interrupt and it'll sit there forever
(or until the AP appears on that channel once again.)
This change forces the last known beacon timer config to be
written to hardware on a transition from CSA->RUN in STA mode.
This forces bmiss events to occur and the STA will eventually
(after a handful of beacon miss events) begin scanning for
another access point.
jonathan [Wed, 29 Jun 2011 13:03:05 +0000 (13:03 +0000)]
We may split today's CAPABILITIES into CAPABILITY_MODE (which has
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.
Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.
bz [Wed, 29 Jun 2011 13:01:10 +0000 (13:01 +0000)]
In case ntp cannot resolve a hostname on startup it will queue the entry
for resolving by a child process that, upon success, will add the entry
to the config of the running running parent process.
Unfortunately there are a couple of bugs with this, fixed in various
later versions of upstream in potentially different ways due to other
code changes:
1) Upon server [-46] <FQDN> the [-46] are used as FQDN for later resolving
which does not work. Make sure we always pass the name (or IP there).
2) The intermediate file to carry the information to the child process
does not know about -4/-6 restrictions, so that a dual-stacked host
could resolve to an IPv6 address but that might be unreachable (see
r223626) leading to no working synchronization ignoring a IPv4 record.
Thus alter the intermediate format to also pass the address family
(AF_UNSPEC (default), AF_INET or AF_INET6) to the child process
depending on -4 or -6.
3) Make the child process to parse the new intermediate file format and
save the address family for getaddrinfo() hints flags.
4) Change child to always reload resolv.conf calling res_init() before
trying to resolve names. This will pick up resolv.conf changes or
new resolv.confs should they have not existed or been empty or
unusable on ntp startup. This fix is more conditional in upstream
versions but given FreeBSD has res_init there is no need for the
configure logic as well.
ae [Wed, 29 Jun 2011 10:06:58 +0000 (10:06 +0000)]
Add new rule actions "call" and "return" to ipfw. They make
possible to organize subroutines with rules.
The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.
ae [Wed, 29 Jun 2011 06:45:44 +0000 (06:45 +0000)]
Improve error reporting. Use corresponding error message when file to be
preprocessed is missing. Also suggest to use absolute pathname if -p option
is specified.
ae [Wed, 29 Jun 2011 05:41:14 +0000 (05:41 +0000)]
Initialize elements of state array when creating the GPT table.
This fixes the problem, when the secondary GPT header is not erased when
partition table destroyed. Move equal operations from g_part_gpt_create
and g_part_gpt_recover to the separate function g_gpt_set_defaults.
rmacklem [Tue, 28 Jun 2011 22:52:38 +0000 (22:52 +0000)]
Fix the new NFSv4 client so that it doesn't fill the cached
mode attribute in as 0 when doing writes. The change adds
the Mode attribute plus the others except Owner and Owner_group
to the list requested by the NFSv4 Write Operation. This fixed
a problem where an executable file built by "cc" would get mode
0111 instead of 0755 for some NFSv4 servers.
Found at the recent NFSv4 interoperability Bakeathon.
trociny [Tue, 28 Jun 2011 21:01:32 +0000 (21:01 +0000)]
Check the returned value of activemap_write_complete() and update matadata on
disk if needed. This should fix a potential case when extents are cleared in
activemap but metadata is not updated on disk.
trociny [Tue, 28 Jun 2011 20:57:54 +0000 (20:57 +0000)]
Make activemap_write_start/complete check the keepdirty list, when
stating if we need to update activemap on disk. This makes keepdirty
serve its purpose -- to reduce number of metadata updates.
marius [Tue, 28 Jun 2011 16:16:43 +0000 (16:16 +0000)]
- In gem_reset_rx() also reset the RX MAC which is necessary in order to
get it out of a stuck condition that can be caused by GEM_MAC_RX_OVERFLOW.
- In gem_reset_rxdma() call gem_setladrf() in order to reprogram the RX
filter and restore the previous content of GEM_MAC_RX_CONFIG. While at it
consistently use the newly introduced sc_mac_rxcfg throughout the driver
instead of reading the its old content.
- Increment if_iqdrops instead of if_ierrors in case of RX buffer allocation
failure.
- According to the GEM datasheet the RX MAC should also be disabled in
gem_setladrf() before changing its configuration.
- Add error messages to gem_disable_{r,t}x() and take advantage of these
throughout the driver instead of duplicating their functionality all over
the place.
osa [Tue, 28 Jun 2011 12:32:24 +0000 (12:32 +0000)]
Remove needless file due to Russia scraps DST in 2011.
> Description of fields to fill in above: 76 columns --|
> PR: If a GNATS PR is affected by the change.
> Submitted by: If someone else sent in the change.
> Reviewed by: If someone else reviewed your modification.
> Approved by: If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after: N [day[s]|week[s]|month[s]]. Request a reminder email.
> Security: Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.
M calendars/ru_RU.KOI8-R/calendar.all
D calendars/ru_RU.KOI8-R/calendar.msk
bz [Tue, 28 Jun 2011 09:46:25 +0000 (09:46 +0000)]
Compare port numbers correctly. They are stored by SRCPORT()
in host byte order, so we need to compare them as such.
Properly compare IPv6 addresses as well.
This allows the, by default, 8 badaddrs slots per address
family to work correctly and only print sendto() errors once.
The change is no longer applicable to any latest upstream versions.
pluknet [Tue, 28 Jun 2011 08:41:44 +0000 (08:41 +0000)]
Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.
PR: kern/158369
Reported by: Paul Procacci <pprocacci att gmail com>
MFC after: 1 week
mm [Tue, 28 Jun 2011 07:52:01 +0000 (07:52 +0000)]
Add a new "REFCOMPRESSRATIO" property.
For snapshots, this is the same as COMPRESSRATIO, but for
filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie,
includes blocks in children, but not blocks shared with the origin).
This is needed to figure out how much space a filesystem would use if it
were not compressed (ignoring snapshots).
In userland, sign extend the offset for JA instructions.
We currently use that to implement "ip6 protochain", and "pc" might be
wider than "pc->k", in which case we need to arrange that "pc->k" be
sign-extended, by casting it to bpf_int32.
yongari [Mon, 27 Jun 2011 21:27:12 +0000 (21:27 +0000)]
Disable microcode loading for 82550 and 82550C controllers. Loading
the microcode caused SCB timeouts. Linux driver does not allow
microcode loading for these controllers and jfv also confirmed that
there is no need to do and it shouldn't.
jhb [Mon, 27 Jun 2011 13:58:24 +0000 (13:58 +0000)]
- Remove the fake BPB from zfsldr. zfsldr doesn't support booting from
floppies, so it will not be used as the start of an emulated floppy
image on a bootable CD which is what the fake BPB was used for.
- Only check that EDD packet mode is available once at the start of
zfsldr rather than for each disk sector now that we read data in one
sector at a time. As a result, collapse the remaining bits of read
up into nread and rename nread to read.
- Restore a return at the end of putstr that I removed in the previous
revision.
Tested by: Henri Hennebert (earlier version)
MFC after: 1 week
ae [Mon, 27 Jun 2011 12:42:48 +0000 (12:42 +0000)]
EBR could contain an early stage of boot code. But we do not support it.
Remove message about non empty bootcode, we can not break something
while GEOM_PART_EBR_COMPAT is defined.
But without GEOM_PART_EBR_COMPAT any changes in EBR are allowed and we
can accidentally wipe the boot code. To do not break anything save
the first EBR chunk and keep it untouched each time when we are
changing EBR. Note that we are still not support boot code for EBR.
ae [Mon, 27 Jun 2011 10:42:06 +0000 (10:42 +0000)]
MS Windows NT+ uses 4 bytes at offset 0x1b8 in the MBR to identify
disk drive. The boot0cfg(8) utility preserves these 4 bytes when is
writing bootcode to keep a multiboot ability.
Change gpart's bootcode method to keep DSN if it is not zero. Also
do not allow writing bootcode with size not equal to MBRSIZE.
pjd [Mon, 27 Jun 2011 09:10:48 +0000 (09:10 +0000)]
Log a warning if we cannot sandbox using capsicum, but only under debug level 1.
It would be too noisy to log it as a proper warning as CAPABILITIES are not
compiled into GENERIC by default.
dim [Sun, 26 Jun 2011 19:03:33 +0000 (19:03 +0000)]
For some reason, contrib/traceroute/traceroute.c ensures MAXHOSTNAMELEN
is defined, but then proceeds to use a hardcoded maximum hostname length
of 64 anyway. Fix this by checking against MAXHOSTNAMELEN instead.