smh [Fri, 1 Aug 2014 23:16:48 +0000 (23:16 +0000)]
Don't return ZIO_PIPELINE_CONTINUE from vdev_op_io_start methods
This prevents recursion of vdev_queue_io_done as per r265321 but
using a different method as recommended on the openzfs list.
We now use zio_interrupt(zio) and return ZIO_PIPELINE_STOP instead
of returning ZIO_PIPELINE_CONTINUE from vdev_*_io_start methods.
zio_vdev_io_start now ASSERTs that vdev_op_io_start returns
ZIO_PIPELINE_STOP to ensure future changes don't reintroduce
ZIO_PIPELINE_CONTINUE returns.
Cleanup flow in vdev_geom_io_start while I'm here.
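The hand-off described above can be sketched roughly as follows. This is a minimal userland illustration, not the ZFS code: the sk_ names, the struct layout and the enum values are hypothetical stand-ins, and sk_zio_interrupt() only records that completion was queued.

```c
#include <assert.h>

/* Hypothetical stand-ins; values and fields are not the real ZFS ones. */
enum { SK_ZIO_PIPELINE_CONTINUE = 1, SK_ZIO_PIPELINE_STOP = 2 };

struct sk_zio {
	int error;		/* non-zero if the I/O cannot be issued */
	int interrupted;	/* set once completion has been handed off */
};

/* Stand-in for zio_interrupt(): defer completion to another context. */
static void
sk_zio_interrupt(struct sk_zio *zio)
{
	zio->interrupted = 1;
}

/* A start method that completes early hands off and returns STOP. */
static int
sk_vdev_op_io_start(struct sk_zio *zio)
{
	if (zio->error != 0) {
		sk_zio_interrupt(zio);
		return (SK_ZIO_PIPELINE_STOP);
	}
	/* ...the asynchronous I/O would be issued here... */
	return (SK_ZIO_PIPELINE_STOP);
}

/* The caller asserts that no start method returns CONTINUE. */
static int
sk_zio_vdev_io_start(struct sk_zio *zio)
{
	int ret = sk_vdev_op_io_start(zio);

	assert(ret == SK_ZIO_PIPELINE_STOP);
	return (ret);
}
```

Because the start method never continues the pipeline inline, completion always runs from the deferred context, which is what avoids the recursion.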
ian [Fri, 1 Aug 2014 22:56:41 +0000 (22:56 +0000)]
Add 64-bit atomic ops for armv4, only for kernel code, mostly so that we
don't need any #ifdef stuff to use atomic_load/store_64() elsewhere in
the kernel. For armv4 the atomics are trivial to implement for kernel
code (just disable interrupts), less so for user mode, so this only has
the kernel mode implementations for now.
delphij [Fri, 1 Aug 2014 22:33:23 +0000 (22:33 +0000)]
Split gethrtime() and gethrtime_waitfree() and make the former use
nanouptime() instead of getnanouptime(). nanouptime(9) provides a more
precise result at the expense of being slower.
In r269223, gethrtime() is used as the creation time of a dbuf, which in turn
acts as part of the lookup key to maintain the AVL invariant that there
cannot be duplicate items. Before this change, gethrtime() preferred
better execution time by sacrificing precision, which may lead to a panic
on busy systems with:
ian [Fri, 1 Aug 2014 22:28:36 +0000 (22:28 +0000)]
Add 64-bit atomic ops for armv6. The only safe way to access a 64-bit
value shared across multiple cores is with atomic_load_64() and
atomic_store_64(), because the normal 64-bit load/store instructions
are not atomic on 32-bit arm. Luckily the ldrexd/strexd instructions
that are atomic are fairly cheap on armv6. Because it's fairly simple
to do, this implements all the ops for 64-bit, not just load/store.
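For illustration only, the same idea expressed in portable userland C uses C11 <stdatomic.h> rather than the kernel's atomic_load_64()/atomic_store_64(). The sk_ names are hypothetical, and this shows the access pattern, not the ldrexd/strexd implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value shared between threads or cores. */
static _Atomic uint64_t sk_shared_counter;

/* Store the whole 64-bit value in one indivisible step. */
static void
sk_counter_store(uint64_t value)
{
	atomic_store(&sk_shared_counter, value);
}

/* Load the whole 64-bit value; never observes a half-written value. */
static uint64_t
sk_counter_load(void)
{
	return (atomic_load(&sk_shared_counter));
}

/* Atomic read-modify-write; returns the value after the addition. */
static uint64_t
sk_counter_add(uint64_t delta)
{
	return (atomic_fetch_add(&sk_shared_counter, delta) + delta);
}
```

A plain uint64_t assignment on a 32-bit CPU may be split into two 32-bit stores, so a reader on another core could see one new half and one old half. The carry from the low word into the high word is exactly the case the atomic forms protect.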
ian [Fri, 1 Aug 2014 20:30:24 +0000 (20:30 +0000)]
Teach as(1) to handle the arm .arch_extension pseudo-op, which accepts
the same values as the -march= command line option. Add support for the
"sec" extension (security extensions).
We've been getting away without support for the sec extension because
it's bogusly enabled even on arches where its presence is optional. This
support for .arch_extension is being added mainly so that we can use the
right directives in our source code, and that helps folks using external
toolchains (and will help us when we finally update our toolchain).
peter [Fri, 1 Aug 2014 19:32:20 +0000 (19:32 +0000)]
Like with /usr/lib + /usr/lib/compat, add the optional /usr/lib32/compat
to the ldconfig32 default path. /usr/lib32 holds the 32-bit versions of
*current* libraries, while old versions should be able to live in
/usr/lib32/compat, like with /usr/lib/compat. The separation is meant to
keep the compile time default search paths cleaner.
grehan [Fri, 1 Aug 2014 18:36:40 +0000 (18:36 +0000)]
Fix byte ordering in default RSS key.
The rss_key[] array in netinet/in_rss.c has the bytes in incorrect
order. This causes the RSS test vectors in the Microsoft RSS spec
and Intel NIC specs to give incorrect results, making it difficult
to verify correct hash operation when RSS functionality is added to
new NICs.
CR: https://phabric.freebsd.org/D516
Reviewed by: adrian
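To see why key byte order matters, here is a small bit-at-a-time Toeplitz hash sketch. It is not the in_rss.c code; the sk_ names are hypothetical, and the key bytes are the widely published 40-byte Microsoft key reproduced from memory, so check them against the spec before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed: the 40-byte key from the Microsoft RSS documentation. */
static const uint8_t sk_rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * For every set bit of the input (most significant bit first), XOR in
 * the 32-bit window of the key that starts at that bit position.
 * Requires keylen >= 4.
 */
static uint32_t
sk_toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t next_bit = 32;	/* next key bit to shift into the window */
	size_t i;
	int b;

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			uint32_t kbit = 0;

			if (data[i] & (1u << b))
				hash ^= window;
			if (next_bit / 8 < keylen)
				kbit = (key[next_bit / 8] >>
				    (7 - next_bit % 8)) & 1u;
			window = (window << 1) | kbit;
			next_bit++;
		}
	}
	return (hash);
}
```

Each input bit selects a window of key bits, so reordering the key bytes changes every window and therefore every hash. The published test vectors can only match when the key is stored in the documented order.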
ian [Fri, 1 Aug 2014 18:24:44 +0000 (18:24 +0000)]
Fix unwind-info errors in our hand-written arm assembler code.
We have functions nested within functions, and places where we start a
function then never end it, we just jump to the middle of something else.
We tried to express this with nested ENTRY()/END() macros (which result
in .fnstart and .fnend directives), but it turns out there's no way to
express that nesting in ARM EHABI unwind info, and newer tools treat
multiple .fnstart directives without an intervening .fnend as an error.
These changes introduce two new macros, EENTRY() and EEND(). EENTRY()
creates a global label you can call/jump to just like ENTRY(), but it
doesn't emit a .fnstart. EEND() is a no-op that just documents the
conceptual endpoint that matches up with the same-named EENTRY().
This is based on patches submitted by Stepan Dyatkovskiy, but I made some
changes and added the EEND() stuff, so blame any problems on me.
andrew [Fri, 1 Aug 2014 16:53:04 +0000 (16:53 +0000)]
Update the ARMv6 core clang targets to be an arm1176jzf-s. This brings us
in line with gcc in base as this makes llvm generate code for the armv6k
variant of the instruction set.
Add support for Chromebook2 -- next-generation 8-core
(4 in operation), 4GB ram (3.5 usable) ARM machine.
Support covers device drivers for:
- Serial Peripheral Interface (SPI)
- Chrome Embedded Controller (EC) - SPI-based version
- XHCI and USB 3.0 dual-role device PHY
Also:
- Add support for Exynos5420 in Pad module
- Move power-related functions to separate driver --
Power Management Unit (PMU)
- Enable XHCI for Chromebook1
Special thanks to grehan@ for hardware, and to
hselasky@ for r269139.
alc [Fri, 1 Aug 2014 01:48:41 +0000 (01:48 +0000)]
Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)
imp [Fri, 1 Aug 2014 00:00:54 +0000 (00:00 +0000)]
NANO_OBJ shouldn't end with a '/', so remove it here. This makes the
pathnames printed not have the dreaded // which makes it hard to cut
and paste into an emacs find file command...
Add pkgfs, a file system implementation for reading files out of a
compressed tarball, aka package. The file system assumes that the
files are laid out in the same order as needed to allow for the
package to be streamed. As such, it does not read an entire package
into memory first.
Some properties of the file system:
o Files that start with '+' are silently skipped. These are found
in FreeBSD package files.
o Files smaller than or equal to 4KB will be cached in memory and
as such allow for some flexibility in accessing files out of
order.
o Files with the .tgz suffix are assumed to be (sub-)packages and
signal the end for a directory scan.
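The three properties listed above could be expressed as a per-entry decision like the one below. This is only a sketch: the enum, the function names and the order in which the rules are checked are assumptions, and the name passed in is assumed to be the final path component.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes for one tarball entry. */
enum sk_pkg_action {
	SK_PKG_SKIP,		/* metadata entry, silently ignored */
	SK_PKG_END_OF_DIR,	/* sub-package: stop the directory scan */
	SK_PKG_CACHE,		/* small file, keep a copy in memory */
	SK_PKG_STREAM		/* large file, only readable in order */
};

#define SK_PKG_CACHE_LIMIT 4096	/* "smaller than or equal to 4KB" */

static int
sk_has_suffix(const char *name, const char *suffix)
{
	size_t nlen = strlen(name);
	size_t slen = strlen(suffix);

	return (nlen >= slen && strcmp(name + nlen - slen, suffix) == 0);
}

static enum sk_pkg_action
sk_pkg_classify(const char *name, size_t size)
{
	if (name[0] == '+')
		return (SK_PKG_SKIP);
	if (sk_has_suffix(name, ".tgz"))
		return (SK_PKG_END_OF_DIR);
	if (size <= SK_PKG_CACHE_LIMIT)
		return (SK_PKG_CACHE);
	return (SK_PKG_STREAM);
}
```

Caching only small files keeps memory use bounded while still allowing a few out-of-order accesses, such as reading a small configuration file after a later entry has already been streamed.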
Fix breakage introduced by r256843: removing the SA_CCB_WAITING bit
left some of the decisions based on its counterpart, SA_CCB_BUFFER_IO,
being random. As a result, propagation of the residual information
for the SPACE command was broken, so the number of filemarks
encountered during a SPACE operation was miscalculated. Consequently,
systems relying on properly tracked filemark counters (like Bacula)
fell apart.
The change also removes a switch/case in sadone() which r256843
degraded to a single remaining case label.
Define a setvar() function for platforms using a shell unlike FreeBSD's
sh(1) for `/bin/sh' (e.g., bash(1) which lacks a setvar definition).
This is to improve portability to other Operating Systems (e.g., Linux).
Various style(9) and related fixes.
Update the copyright to be more in line with the current version in
our tree.
Remove the ancient rcsid.
Add a proper return from the main function.
- Updated SYSCTL manual pages after recent changes to the kernel
SYSCTL code. Added description of new macros and functions.
- Merged dynamic and static SYSCTL related content into a single
manual page, since parameters and functionality are very much the same.
- Uppercased all occurrences of "OID".
- Updated all SYSCTL examples.
ian [Thu, 31 Jul 2014 16:54:54 +0000 (16:54 +0000)]
Export an mmc or sd card's serial number from the mmc layer as an ivar.
In the mmcsd layer use this value to populate disk->d_ident. Also set
disk->d_descr to the full set of card identification info (includes vendor,
model, manufacturing date, etc).
Ensure that IPs added to CARP always use the CARP MAC
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.
This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
Correct a defect in r268591. In the implementation of the new function
pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before
any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will
panic.
Add an example program to show how to use libpmc from a program.
This particular program attempts to use the TSC to measure how long
certain libpmc operations take. Depending on the quality of
the rdtsc() macro on a particular architecture this may work
more or less well.
Commands which encounter a fatal error shouldn't be marked as completed.
Furthermore, provide an indication of the current command so it can be
determined which one actually failed.
Bring in LSI's phase16 - phase18 changes
* Implements Start Stop Unit for SATA direct-attach devices in IR mode to avoid
data corruption.
* Use CAM_DEV_NOT_THERE instead of CAM_SEL_TIMEOUT and CAM_TID_INVALID
Provide a means for loaders to control which file system to use. This
is to counteract the default behaviour of always trying each and every
file system until one succeeds, or the open fails. The problem with the
loader is that we've implemented features based on this behavior. The
handling of compressed files is a good example of this. However, it is
in general highly undesirable to not have a one-time probe (or taste
in the geom lingo), followed by something similar to a mount whenever
we (first) read from a device. Every time we go to the same device, we
can reasonably assume it (still) has the same file system. For file
systems that need to do far more than a trivial read of a super block,
not having something similar to a mount operation is disastrous from
a performance (and thus usability) perspective.
But, again, since we've implemented features based on this stateless
approach, things can get complicated quickly if and when we want to
change this. And yet, we sometimes do need stateful behaviour.
For this reason, this change simply introduces exclusive_file_system.
When set to the fsops of the file system to use, the open call will
only try this file system. Setting it to NULL restores the default
behaviour. It's a low-cost (low-brow?) approach to provide enough
control without re-implementing the guts of the loader.
A good example of when this is useful is when we're trying to load
files out of a container (say, a software package) that itself lives
on a file system or is fetched over the network. While opening the
container can be done in the normal stateless manner, once it is
opened, subsequent opens should only consider the container.
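The behaviour described above can be pictured with the following sketch. The sk_ structures and stub file systems are hypothetical and far simpler than the loader's real fsops. Only the control flow is the point: try everything by default, or only one file system when the override is set.

```c
#include <stddef.h>

/* Hypothetical, cut-down file system operations table. */
struct sk_fs_ops {
	const char *fs_name;
	int (*fo_open)(const char *path);	/* returns 0 on success */
};

/* Two stub file systems: one never matches, one always does. */
static int sk_open_fail(const char *path) { (void)path; return (-1); }
static int sk_open_ok(const char *path) { (void)path; return (0); }

static struct sk_fs_ops sk_fs_a = { "fs_a", sk_open_fail };
static struct sk_fs_ops sk_fs_b = { "fs_b", sk_open_ok };
static struct sk_fs_ops *sk_file_systems[] = { &sk_fs_a, &sk_fs_b, NULL };

/* NULL means "probe every file system", the default behaviour. */
static struct sk_fs_ops *sk_exclusive_file_system;

/* Returns the file system that opened the path, or NULL on failure. */
static struct sk_fs_ops *
sk_open(const char *path)
{
	size_t i;

	if (sk_exclusive_file_system != NULL) {
		if (sk_exclusive_file_system->fo_open(path) == 0)
			return (sk_exclusive_file_system);
		return (NULL);
	}
	for (i = 0; sk_file_systems[i] != NULL; i++) {
		if (sk_file_systems[i]->fo_open(path) == 0)
			return (sk_file_systems[i]);
	}
	return (NULL);
}
```

A caller would set the override after opening the container and reset it to NULL afterwards, so that unrelated opens go back to probing every file system.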
When restoring a UFS dump onto a ZFS filesystem, an assertion in
restore was failing because ZFS was reporting a blocksize that was
not a multiple of 1024. Replace restore's failed assertion with
code that writes restored files in a blocksize that works for
restore (a multiple of 1024) despite being non-optimal for ZFS.
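One way to pick such a blocksize is to round the reported value up to the next multiple of 1024. The commit does not say which rounding restore actually uses, so the helper below is a hypothetical sketch of the idea only.

```c
/* restore works in 1024-byte units; hypothetical name for that unit. */
#define SK_RESTORE_UNIT 1024L

/*
 * Return a write blocksize that is a multiple of 1024, rounding the
 * blocksize reported by the file system up when it is not already one.
 */
static long
sk_restore_blksize(long fs_blksize)
{
	if (fs_blksize <= 0)
		return (SK_RESTORE_UNIT);
	return (((fs_blksize + SK_RESTORE_UNIT - 1) / SK_RESTORE_UNIT) *
	    SK_RESTORE_UNIT);
}
```

A reported size of 1536 would become 2048 here. That may not match the record size ZFS prefers, which is the "non-optimal" trade-off the message mentions.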
- Output a summary of optional VT-x features in dmesg similar to CPU
features. If bootverbose is enabled, a detailed list is provided;
otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
under a new hw.vmm.vmx.cap node. Move a few existing sysctls that
indicate the presence of optional capabilities under this node.
Make mmap() of the console device when using ofwfb work like other supported
framebuffer drivers. This lets ofwfb work with xf86-video-scfb and makes
the driver much more generic and less PCI-centric. This changes some
user-visible behavior and will require updates to the xorg-server port
on PowerPC when using ATI graphics cards.
Remove one-time use macros which check for the vnode lifecycle. Moreover,
some parts of the checks are in fact redundant in the surrounding
code, and it is clearer what the conditions are by direct testing
of the flags. Two of the three macros were only used in assertions.
In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself. Do the increment inline as
well instead of calling vholdl(); this allows the assertions in
vholdl()/vhold() to be made stricter.
In v_incr_usecount(), call vholdl() before incrementing other ref
counters. The change is a no-op, but it makes the vnode state less
surprising when seen in a debugger if interrupted inside v_incr_usecount().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Garbage collect a couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.
A new loader tunable, vfs.zfs.arc_average_blocksize, has been added which
allows users to override the default assumption of average (typical) block
size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).
Illumos issue:
5034 ARC's buf_hash_table is too small
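The effect of the tunable on the table size can be sketched with a simple doubling loop. This is an assumption-laden illustration: the function name and the starting size of 4096 buckets are hypothetical, and memory is given in bytes.

```c
#include <stdint.h>

/*
 * Grow the bucket count in powers of two until there is roughly one
 * bucket per average-sized block that could fit in physical memory.
 */
static uint64_t
sk_arc_hash_size(uint64_t physmem_bytes, uint64_t average_blocksize)
{
	uint64_t hsize = 1ULL << 12;	/* assumed minimum table size */

	if (average_blocksize == 0)
		return (hsize);
	while (hsize * average_blocksize < physmem_bytes)
		hsize <<= 1;
	return (hsize);
}
```

Under these assumptions, an 8 GiB machine gets 2^17 buckets with the old 64 KiB default and 2^20 buckets with the new 8 KiB default. That is eight times as many buckets, and correspondingly shorter chains when blocks are small.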
ian [Tue, 29 Jul 2014 02:37:48 +0000 (02:37 +0000)]
A while back, the array of segments used for a load/mapping operation was
moved from the stack into the tag structure. In retrospect that was a bad
idea, because nothing protects that array from concurrent access by
multiple threads.
This change moves the array to the map structure (actually it's allocated
following the structure, but all in a single malloc() call).
This also establishes a "sane" limit of 4096 segments per map. This is
mostly to prevent trying to allocate all of memory if someone accidentally
uses a tag with nsegments set to BUS_SPACE_UNRESTRICTED. If there's ever
a genuine need for more than 4096, don't hesitate to increase this (or
maybe make it tunable).
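A per-map segment array allocated together with the map could look like the sketch below. The sk_ types are hypothetical, calloc() stands in for the kernel malloc(), and rejecting oversized requests by returning NULL is a choice made for this sketch rather than a description of the real code.

```c
#include <stdint.h>
#include <stdlib.h>

#define SK_MAX_DMA_SEGMENTS 4096u	/* the "sane" limit */

struct sk_dma_segment {
	uint64_t ds_addr;
	uint64_t ds_len;
};

/* The segment array lives directly behind the map, in one allocation. */
struct sk_dma_map {
	unsigned int nsegments;
	struct sk_dma_segment segments[];
};

static struct sk_dma_map *
sk_map_alloc(unsigned int tag_nsegments)
{
	struct sk_dma_map *map;

	/* Catches an "unrestricted" (all bits set) nsegments value too. */
	if (tag_nsegments == 0 || tag_nsegments > SK_MAX_DMA_SEGMENTS)
		return (NULL);
	map = calloc(1, sizeof(*map) +
	    tag_nsegments * sizeof(map->segments[0]));
	if (map == NULL)
		return (NULL);
	map->nsegments = tag_nsegments;
	return (map);
}
```

Since each map owns its own array, two threads loading different maps created from the same tag no longer write to shared storage.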
ian [Tue, 29 Jul 2014 02:37:31 +0000 (02:37 +0000)]
We never need bounce pages for memory we allocate. We cleverly allocate
memory that matches all the constraints of the dma tag so that bouncing
will never be required.
ian [Tue, 29 Jul 2014 02:36:41 +0000 (02:36 +0000)]
Memory belonging to an mbuf, or allocated by bus_dmamem_alloc(), never
triggers a need to bounce due to cacheline alignment. These buffers
are always aligned to cacheline boundaries, and even when the DMA operation
starts at an offset within the buffer or doesn't extend to the end of the
buffer, it's safe to flush the complete cachelines that were only partially
involved in the DMA. This is because there's a very strict rule on these
types of buffers that there will not be concurrent access by the CPU and
one or more DMA transfers within the buffer.
ian [Tue, 29 Jul 2014 02:36:27 +0000 (02:36 +0000)]
The run_filter() function doesn't just run dma tag exclusion filter
functions, it has evolved to make a variety of decisions about whether
the DMA needs to bounce, so rename it to must_bounce(). Rewrite it to
perform checks outside of the ancestor loop if they're based on information
that's wholly contained within the original tag. Now the loop only checks
exclusion zones in ancestor tags.
Also, add a new function, might_bounce() which does a fast inline check
of flags within the tag and map to quickly eliminate the need to call
the more expensive must_bounce() for each page in the DMA operation.
Within the mapping loops, use map->pagesneeded != 0 as a proxy for all
the various checks on whether bouncing might be required. If no pages
were reserved for bouncing during the checks before the mapping loop,
then there's no need to re-check any of the conditions that can lead
to bouncing -- all those checks already decided there would be no bouncing.
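The split between a cheap gate and an expensive per-page check might be pictured as below. Everything here is a simplified, hypothetical model: the sk_ names, the single flag, and the "address above lowaddr" rule stand in for the real tag and map state, and the call counter exists only to show how often the expensive path runs.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE		4096u
#define SK_TAG_COULD_BOUNCE	0x1u	/* hypothetical tag flag */

struct sk_tag {
	unsigned int flags;
	uint64_t lowaddr;	/* pages above this address must bounce */
};

static unsigned int sk_must_bounce_calls;	/* instrumentation only */

/* Fast inline check on flags computed ahead of time. */
static inline int
sk_might_bounce(const struct sk_tag *tag)
{
	return ((tag->flags & SK_TAG_COULD_BOUNCE) != 0);
}

/* The expensive per-page decision. */
static int
sk_must_bounce(const struct sk_tag *tag, uint64_t paddr)
{
	sk_must_bounce_calls++;
	return (paddr > tag->lowaddr);
}

/* Count bounce pages; skip the per-page work when the gate says no. */
static size_t
sk_count_bounce_pages(const struct sk_tag *tag, uint64_t paddr,
    size_t npages)
{
	size_t i, needed = 0;

	if (!sk_might_bounce(tag))
		return (0);
	for (i = 0; i < npages; i++) {
		if (sk_must_bounce(tag, paddr + i * SK_PAGE_SIZE))
			needed++;
	}
	return (needed);
}
```

A result of zero from the counting pass plays the role of map->pagesneeded == 0: the mapping loop can then skip the bounce checks entirely.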
ian [Tue, 29 Jul 2014 02:35:44 +0000 (02:35 +0000)]
Correct the comparison logic when looking for intersections between
exclusion zones and physical memory. The phys_avail[i] entries are the
address of the first byte of ram in the region, and phys_avail[i+1]
entries are the address of the first byte of ram in the next region
(i.e., they're not included in the region that starts at phys_avail[i]).
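The half-open nature of the phys_avail pairs is easy to get wrong, so here is a small sketch of an intersection test. The function name and the representation of the exclusion zone as an inclusive [zone_first, zone_last] range are assumptions. The array is assumed to be a list of (start, end) pairs terminated by a zero end address.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * avail[] holds pairs: avail[i] is the first byte of a RAM region and
 * avail[i + 1] is the first byte past it, so the region is the
 * half-open range [avail[i], avail[i + 1]).
 * Returns 1 if the inclusive zone overlaps any region.
 */
static int
sk_zone_hits_ram(const uint64_t *avail, uint64_t zone_first,
    uint64_t zone_last)
{
	size_t i;

	for (i = 0; avail[i + 1] != 0; i += 2) {
		if (zone_first < avail[i + 1] && zone_last >= avail[i])
			return (1);
	}
	return (0);
}
```

Note the strict "<" against the end address: a zone that begins exactly at avail[i + 1] starts one byte past the region and does not intersect it.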
ian [Tue, 29 Jul 2014 02:34:32 +0000 (02:34 +0000)]
The exclusion_bounce() routine compares unchanging values in the tag with
unchanging values in the phys_avail array, so do the comparisons just once
at tag creation time and set a flag to remember the result.
ian [Tue, 29 Jul 2014 02:31:29 +0000 (02:31 +0000)]
Rename _bus_dma_can_bounce(), add new inline routines.
DMA on arm can bounce for several reasons, and _bus_dma_can_bounce() only
checks for the lowaddr/highaddr exclusion ranges in the dma tag, so now
it's named exclusion_bounce(). The other reasons for bouncing are checked
by the new functions alignment_bounce() and cacheline_bounce().
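The two new checks named above might look roughly like this. The bodies are guesses at the general technique, not the committed code: the sk_ names are hypothetical, the alignment is assumed to be a power of two, and a 32-byte cache line is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_CACHELINE_SIZE 32u	/* assumed data cache line size */
#define SK_CACHELINE_MASK (SK_CACHELINE_SIZE - 1)

/* Bounce if the address does not meet the tag's alignment (power of 2). */
static inline int
sk_alignment_bounce(uint64_t alignment, uint64_t addr)
{
	return ((addr & (alignment - 1)) != 0);
}

/*
 * Bounce if the buffer starts or ends in the middle of a cache line,
 * because a partial line may be shared with unrelated data.
 */
static inline int
sk_cacheline_bounce(uint64_t addr, size_t size)
{
	return (((addr | size) & SK_CACHELINE_MASK) != 0);
}
```

Keeping each reason in its own small predicate makes it possible to evaluate the ones that depend only on the tag once, and the address-dependent ones per buffer.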
If telldir() is called immediately after a call to seekdir(), POSIX
requires the return value of telldir() to equal the value passed to
seekdir(). The current seekdir code with SINGLEUSE enabled breaks
this case as each call to telldir() allocates a new cookie. Instead,
remove the SINGLEUSE code and change telldir() to look for an existing
cookie for the directory's current location rather than always creating
a new cookie.
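The cookie lookup can be sketched with a small table keyed by directory position. This is a hypothetical model: libc's real bookkeeping differs, the fixed-size array and the sk_ names are inventions for the example, and a position is reduced to a (seek offset, entry offset) pair.

```c
#include <stddef.h>

#define SK_MAX_COOKIES 16

/* A directory position: block seek offset plus offset within the block. */
struct sk_dirpos {
	long seek_off;
	long entry_off;
};

struct sk_cookie_table {
	struct sk_dirpos pos[SK_MAX_COOKIES];
	size_t count;
};

/*
 * Return the cookie for the current position, reusing an existing one
 * when the position was seen before. Returns -1 if the table is full.
 */
static long
sk_telldir(struct sk_cookie_table *t, long seek_off, long entry_off)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->pos[i].seek_off == seek_off &&
		    t->pos[i].entry_off == entry_off)
			return ((long)i);
	}
	if (t->count == SK_MAX_COOKIES)
		return (-1);
	t->pos[t->count].seek_off = seek_off;
	t->pos[t->count].entry_off = entry_off;
	return ((long)t->count++);
}
```

With this lookup, calling the tell operation twice at the same position yields the same cookie, so a telldir() right after seekdir(cookie) can hand back the value that was passed in.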