Commit various fixes for the SolarFlare drivers, in particular
this set of patches fixes support for systems with > 32 cores.
Details include
sfxge: RXQ index (not label) comes from FW in flush done/failed events
Change the second argument name of the efx_rxq_flush_done_ev_t and
efx_rxq_flush_failed_ev_t prototypes to highlight that RXQ index (not label)
comes from FW in flush done and failed events.
sfxge: TXQ index (not label) comes from FW in flush done event
Change the second argument name of the efx_txq_flush_done_ev_t prototype to
highlight that TXQ index (not label) comes from FW in flush done event.
sfxge: use TXQ type as label to support more than 32 TXQs
There are 3 TXQs in event queue 0 and 1 TXQ (with TCP/UDP checksum offload)
in all other event queues.
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
Mark Johnston [Mon, 14 Apr 2014 00:24:04 +0000 (00:24 +0000)]
Fix some off-by-one errors. The kve_end and rdl_eaddr fields contain the
first address after the end of the map entry and should therefore be
excluded.
Mark Johnston [Mon, 14 Apr 2014 00:23:18 +0000 (00:23 +0000)]
Ensure that all eight syscall arguments are available to dtrace_probe(),
rather than just the first five. This is done by calling dtrace_probe()
through a function pointer, as in illumos.
Mark Johnston [Mon, 14 Apr 2014 00:22:42 +0000 (00:22 +0000)]
DTrace's pid provider works by inserting breakpoint instructions at probe
sites and installing a hook at the kernel's trap handler. The fasttrap code
will emulate the overwritten instruction in some common cases, but otherwise
copies it out into some scratch space in the traced process' address space
and ensures that it's executed after returning from the trap.
In Solaris and illumos, this (per-thread) scratch space comes from some
reserved space in TLS, accessible via the fs segment register. This
approach is somewhat unappealing on FreeBSD since it would require some
modifications to rtld and jemalloc (for static TLS) to ensure that TLS is
executable, and would thus introduce dependencies on their implementation
details. I think it would also be impossible to safely trace static binaries
compiled without these modifications.
This change implements the functionality in a different way, by having
fasttrap map pages into the target process' address space on demand. Each
page is divided into 64-byte chunks for use by individual threads, and
fasttrap's process descriptor struct has been extended to keep track of
any scratch space allocated for the corresponding process.
With this change it's possible to trace all libc functions in a program,
e.g. with
Always install pkg.conf. Don't depend on MK_PKGBOOTSTRAP.
This file is used by pkg(8) from ports as well. Someone may
choose to not install pkg(7) but still want to consume
official packages by building or installing pkg(8) manually.
Discussed with: bapt
MFC after: 1 day (Working on EN)
realpath(): Properly fail "." or ".." components after non-directories.
If realpath() is called on pathnames like "/dev/null/." or "/dev/null/..",
it should fail with [ENOTDIR]. Pathnames like "/dev/null/" already failed as
they should.
Also, put the check for non-directories after lstatting the previous
component instead of when the empty component (consecutive or trailing
slashes) is detected, saving an lstat() call and some lines of code.
Apparently some of the i386 boot blocks are so close to full that adding
single lines to ufsread.c spills them over. Duplicate a whole bunch of
code to get file sizes into boot1.efi/boot1.c rather than modifying
ufsread.c.
Julio Merino [Sun, 13 Apr 2014 11:35:42 +0000 (11:35 +0000)]
Document how to install the test suite.
As part of this, install the tests(7) manual page unconditionally (not only
when WITH_TESTS=yes) so that users that have not yet enabled the build of
the test suite can read details on how to do so.
Add my copyright here. Most of this is unmodified from the original sparc64
version, but at least some indication of changes that postdate the actual
invention of EFI is probably a good idea.
Warner Losh [Sun, 13 Apr 2014 05:21:56 +0000 (05:21 +0000)]
NO_MAN= has been deprecated in favor of MAN= for some time, go ahead
and finish the job. ncurses is now the only Makefile in the tree that
uses it since it wasn't a simple mechanical change, and will be
addressed in a future commit.
Warner Losh [Sun, 13 Apr 2014 05:21:48 +0000 (05:21 +0000)]
Don't apply ctf conversions in POSIX mode. These can't happen there
because they pollute the POSIX environment, which doens't allow
for these extentions. ctf conversions are really only relevant when
used in coordination with the rest of the bsd*.mk system anyway.
Leave them in place for the normal, non-posix enviornment since
they are quite useful there.
Warner Losh [Sun, 13 Apr 2014 05:21:35 +0000 (05:21 +0000)]
We no longer support upgrading from FreeBSD 4, so we don't need the
NOMAN and NOSHARED defines here. They have been obsolete for almost a
decade anyway.
Warner Losh [Sun, 13 Apr 2014 05:21:30 +0000 (05:21 +0000)]
Up the minimum system to build FreeBSD current to 8.0-RELEASE. The
issues with vendors that needed 7.x support have been resolved. Many
vendors are still using 8.x build platforms, however, so bumping this
up to 9.0 will have to wait until that is resolved. Actual support for
building from 8.x still relies on those vendors fixing bugs that are
present as most developers have moved onto 9.x or newer platforms.
Warner Losh [Sun, 13 Apr 2014 05:21:22 +0000 (05:21 +0000)]
Determine whether to build clang and its bootstrap tools the same
way. This allows a clang bootstrap to happen, even when WITHOUT_CLANG
is defined. This is a minimal version of a more extensive change which
can be MFC'd more easily. However, we have to also test to see if
we're building clang as not cc, since the bootstrap for that needs
these cross tools and it is easier to build them in just one place.
Davide Italiano [Sun, 13 Apr 2014 01:15:37 +0000 (01:15 +0000)]
Fix a panic in zfs_rename().
this is due to a wrong dereference of a vnode when it's not locked and
can be (potentially) recycled. 'sdvp' cannot be locked on zfs_rename()
entry point because the VFS can't be sure that this scenario is
LOR-free (it might violate the parent->child lock acquisition rule).
Dereference 'tdvp' instead, which is already locked on entry, and access
'sdvp' fields only when it's safe, i.e. under ZFS_ENTER scope.
While at it, remove the usage of VOP_REALVP, as long as this is a NOP
on FreeBSD.
Add a simple EFI stub loader. This is a quick and dirty of boot1.chrp from
the PowerPC port with all the Open Firmware bits removed and replaced by
their EFI counterparts. On the whole, I think I prefer Open Firmware.
This code is supposed to be an immutable shim that sits on the EFI system
partition, loads /boot/loader.efi from UFS and tells the real loader what
disk/partition to look at. It finds the UFS root partition by the somewhat
braindead approach of picking the first UFS partition it can find. Better
approaches are called for, but this works for now. This shim loader will
also be useful for secure boot in the future, which will require some
rearchitecture.
Davide Italiano [Sat, 12 Apr 2014 23:29:29 +0000 (23:29 +0000)]
Hide internal details of sbintime_t implementation wrapping INT64_MAX into
SBT_MAX, to make it more robust in case internal type representation will
change in the future. All the consumers were migrated to SBT_MAX and
every new consumer (if any) should from now use this interface.
As per POSIX, the -exec ... {} + primary always returns true, but a non-zero
exit status causes find to return a non-zero exit status itself. GNU does
the same, and also for -execdir ... {} +.
It does not make much sense to return false from the primary only when the
child process happens to be run.
The behaviour for -exec/-execdir ... ; remains unchanged: the primary
returns true or false depending on the exit status, and find's exit status
is unaffected.
Align and round the partitionable disk space to 4K by default.
Since this would also apply when recovering, make sure not to
align or round when that would have a partition fall outside
the partitionable area.
Introduce RANLIBFLAGS to mirror ARFLAGS and add -D to both. This sets
all timestamps in static libraries to 0 so that consecutive builds
from the same source, even on different machines, produce identical
libraries.
Sean Bruno [Fri, 11 Apr 2014 20:19:01 +0000 (20:19 +0000)]
Fix insta-panic on assert of unlocked periph mtx in ciss(4) when
logical volume state changes.
Currently, I view this as a critical fix for users and will MFC this rapidly as
my testing has shown data loss when the disk is failed by removing it when
under some amount of write activity and this code panics the box.
Reviewed by: mav@ scottl@
MFC after: 3 days
Sponsored by: Yahoo! Inc.
Neel Natu [Fri, 11 Apr 2014 20:15:53 +0000 (20:15 +0000)]
There is no need to save and restore the host's return address in the
'struct vmxctx'. It is preserved on the host stack across a guest entry
and exit and just restoring the host's '%rsp' is sufficient.
Alan Cox [Fri, 11 Apr 2014 16:55:25 +0000 (16:55 +0000)]
Before calling mmap() on a shared library's text and data sections, rtld
first calls mmap() with the arguments PROT_NONE and MAP_ANON to reserve a
single, contiguous range of virtual addresses for the entire shared library.
Later, rtld calls mmap() with the the shared library's file descriptor
and the argument MAP_FIXED to place the text and data sections within the
reserved range. The rationale for mapping shared libraries in this way is
explained in the commit message for Revision 190885. However, this approach
does have an unintended, negative consequence. Since the first call to
mmap() specifies MAP_ANON and not the shared library's file descriptor, the
kernel has no idea what alignment the vm object backing the file prefers.
As a result, the reserved range's alignment is unlikely to be the same as
the vm object's, and so mapping with superpages becomes impossible. To
address this problem, this revision adds the argument MAP_ALIGNED_SUPER to
the first call to mmap() if the text section is larger than the smallest
superpage size.
To determine if the text section is larger than the smallest superpage
size, rtld must always fetch the page size information. As a result, the
private code for fetching the base page size in rtld's builtin malloc is
redundant. Eliminate it. Requested by: kib
Amend r263891, by making clang default to DWARF2 debug info format for
all FreeBSD versions, not just 10.x and earlier. Apparently too many
people seem to have trouble with post-1993 formats.
Also remove the related notes about messing with kernel configuration
files from UPDATING, which are now superfluous.
Add SRC_UPDATE_SKIP, DOC_UPDATE_SKIP, and PORTS_UPDATE_SKIP
variables. These are intended to allow bypassing the
'svn co /usr/{src,doc,ports}' step in the chroot when the
tree exists from external means.
The use case here is that /usr/src, /usr/doc, and /usr/ports
in the chroot exist as result of zfs dataset clones, so it
is possible (and happens quite often) that the included
distributions may not be consistent. (This is not the case
for -RELEASE builds, but does happen for snapshot builds.)
Tested on: stable/9@r264319
MFC After: 3 days
Sponsored by: The FreeBSD Foundation
John Baldwin [Fri, 11 Apr 2014 13:11:43 +0000 (13:11 +0000)]
Don't leak the TCP pcbinfo lock if a time wait connection is closed
in between grabbing a reference on the connection structure and obtaining
the pcbinfo lock.
Marius Strobl [Thu, 10 Apr 2014 21:03:46 +0000 (21:03 +0000)]
Refine r264257; given that I later on decided to nuke the wildcard for
the Sunix 0x1999 line of chips there actually is no need to explicitly
keep puc(4) from attaching to the single port version anymore.
Peter Grehan [Thu, 10 Apr 2014 19:15:58 +0000 (19:15 +0000)]
Rework r264179.
- remove redundant code
- remove erroneous setting of the error return
in vmmdev_ioctl()
- use style(9) initialization
- in vmx_inject_pir(), document the race condition
that the final conditional statement was detecting,
Bruce M Simpson [Thu, 10 Apr 2014 18:43:02 +0000 (18:43 +0000)]
In if_freemulti(), relax the paranoid KASSERT() on ifma->ifma_protospec.
This KASSERT() existed as a sanity check that upper layers in the network
stack (e.g. inet, inet6) had released their reference to the underlying
driver's multicast memberships (ifmultiaddr{}). However it assumes the
lifecycle of the driver membership corresponds to the lifecycle of the
network layer membership.
In the submitter's case, ieee80211_ioctl_updatemulti() attempts to
reprogram the (parent, physical) ifnet{} memberships in response
to a change in membership on the (child, virtual) VAP ifnet, using
a batched update mechanism. These updates happen independently from
the network layer, causing a "false negative" assertion failure.
There are possibly other use cases where this KASSERT() may be triggered
by other networking stack activity (e.g. where a nesting relationship
exists between multiple ifnet{} instances). This suggests that further
review of FreeBSD's approach to nested ifnet relationships is needed.
John Baldwin [Thu, 10 Apr 2014 18:15:35 +0000 (18:15 +0000)]
Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.
To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when
closing an expired connection. This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.
Ed Maste [Thu, 10 Apr 2014 16:53:21 +0000 (16:53 +0000)]
Fix EFI loader object tree creation on 9.x build hosts
Previously ${COMPILER_TYPE} was checked in sys/boot/amd64, and the efi
subdirectory was skipped altogether for gcc (since GCC does not support
a required attribute). However, during the early buildworld stages
${COMPILER_TYPE} is the existing system compiler (i.e., gcc on 9.x build
hosts), not the compiler that will eventually be used. This caused
"make obj" to skip the efi subdirectory. In later build stages
${COMPILER_TYPE} is "clang", and then the efi loader would attempt to
build in the source directory.
Alexander Motin [Thu, 10 Apr 2014 16:00:33 +0000 (16:00 +0000)]
Fix wrong sizes used to access PD_Type and PD_State DDF metadata fields.
This caused incorrect behavior of arrays with big-endian DDF metadata.
Little-endian (like used by Adaptec controllers) should not be harmed.
Add workaround should be enough to manage compatibility.
Alexander Motin [Wed, 9 Apr 2014 10:58:52 +0000 (10:58 +0000)]
Introduce new serialization type CTL_SERIDX_UNMAP.
Unfortunately we can't check range collisions for UNMAP commands alike
to writes, because they include multiple ranges, which are also passed
in data block, not in CDB. As result, UNMAP commands have to be treated
as colliding with any other command accessing the media.
From the other side all UNMAPs are equal (we don't support ANCHOR flag),
so we can execute several UNMAPs same time.
Alexander Motin [Wed, 9 Apr 2014 08:57:57 +0000 (08:57 +0000)]
Remove support of LUN-based CD changers from cd(4) driver.
This code was heavily broken few months ago during CAM locking changes.
Fixing it would require almost complete rewrite. Since there are no
known devices on market using this interface younger then ~15 years, and
they are CD, not even DVD, I don't see much reason to rewrite it.
This change does not mean those devices won't work. They will just work
slower due to inefficient disks load/unload schedule if several LUNs
accessed same time.