hselasky [Sat, 15 Aug 2015 09:00:36 +0000 (09:00 +0000)]
Fix race in USB PF which can happen if we stop tracing exactly when
the kernel is tapping an USB transfer. This leads to a NULL pointer
access. The solution is to only trace while the USB bus lock is
locked.
ed [Sat, 15 Aug 2015 08:42:33 +0000 (08:42 +0000)]
Stop parsing digits if the value already exceeds USHRT_MAX.
There is no need for us to support parsing values that are larger than
the maximum terminal window size. In this case that would be the maximum
of unsigned short.
The problem with parsing larger values is that they can cause integer
overflows when adjusting the cursor position, leading to all sorts of
failing assertions.
oshogbo [Sat, 15 Aug 2015 06:34:49 +0000 (06:34 +0000)]
Add support for the arrays in nvlist library.
- Add
nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist,
descriptor} functions.
- Add support for (un)packing arrays.
- Add the nvl_array_next field to the nvlist structure.
If an array is added by the nvlist_{move,add}_nvlist_array function
this field will contains next element in the array.
- Add the nitems field to the nvpair and nvpair_header structure.
This field contains number of elements in the array.
- Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of
an array.
- Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only
on packing/unpacking.
- Add new API for traversing arrays (nvlist_get_array_next).
- Add the nvlist_get_pararr function which combines the
nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in
the array it will return next element from array. If nvlist is last
element in array or it isn't in array it will return his
container (parent). This function should simplify traveling over nvlist.
- Add tests for new features.
- Add documentation for new functions.
- Add my copyright.
- Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file.
rpaulo [Fri, 14 Aug 2015 22:54:52 +0000 (22:54 +0000)]
Introduce a new make variable: NMFLAGS.
As the name indicates, these are flags to pass to nm(1). The newer
binutils have a plugin mechanism so, to build something with LLVM's
LTO, we need to pass flags to nm(1). This commit also extends
lorder(1) to pass NMFLAGS to nm(1).
rmacklem [Fri, 14 Aug 2015 22:02:14 +0000 (22:02 +0000)]
For the case where an NFSv4.1 ExchangeID operation has the client identifier
that already has a confirmed ClientID, the nfsrv_setclient() function would
not fill in the clientidp being returned. As such, the value of ClientID
returned would be whatever garbage was on the stack.
An NFSv4.1 client would not normally do this, but it appears that it can
happen for certain Linux clients. When it happens, the client persistently
retries the ExchangeID and Create_session after Create_session fails when
it uses the bogus clientid. With this patch, the correct clientid is replied.
This problem was identified in a packet trace supplied by
Ahmed Kamal via email.
jah [Fri, 14 Aug 2015 20:08:16 +0000 (20:08 +0000)]
Use pmap_quick_enter_page() to handle bouncing of unmapped buffers in the x86 busdma_bounce implementation. Also treat user buffers as unmapped.
This allows two things:
1. Sync'ing bounced maps in non-sleepable contexts. The physcopy* calls previously used could sleep on sf_buf operations in some cases.
2. Sync'ing user buffers outside the context of the owning process
alc [Fri, 14 Aug 2015 17:49:03 +0000 (17:49 +0000)]
Stop describing an acquire operation as a read barrier and a release
operation as a write barrier. That description has never been correct,
and it has caused confusion. An acquire operation orders writes as well
as reads, and a release operation orders reads as well as writes.
Also, explicitly say that a thread doesn't see its own accesses being
reordered. The reordering of a thread's accesses is only (potentially)
visible to another thread. Thus, memory barriers need only be used to
control the ordering of accesses between threads, not within a thread.
ian [Fri, 14 Aug 2015 16:48:07 +0000 (16:48 +0000)]
Use simple fixed name strings for these timecounters and eventimers which
are tied to fixed pieces of hardware; dynamic string formatting isn't needed.
pfg [Fri, 14 Aug 2015 14:58:04 +0000 (14:58 +0000)]
Remove a stale comment and clarify the original where it was taken from
The comment in the libc/sys symbol map referenced the generated symbols
for the syscall trampolines. Such comment was out of place in the secure
symbol map so remove the stale comment and attempt to clarify the old one
to avoid risks of confusion.
mav [Fri, 14 Aug 2015 13:10:30 +0000 (13:10 +0000)]
2618 arc.c mistypes in the comments
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bart Coddens <bart.coddens@gmail.com>
hselasky [Fri, 14 Aug 2015 12:57:53 +0000 (12:57 +0000)]
Improve the realtime properties of USB transfers for embedded systems
like RPI-B and RPI-2.
Description of problem:
USB transfers can process data in their callbacks sometimes causing
unacceptable latency for other USB transfers. Separate BULK completion
callbacks from CONTROL, INTERRUPT and ISOCHRONOUS callbacks, and give
BULK completion callbacks lesser execution priority than the
others. This way USB audio won't be interfered by heavy USB ethernet
usage for example.
Further serve USB transfer completion in a round robin fashion,
instead of only serving the most CPU hungry. This has been done by
adding a third flag to USB transfer queue structure which keeps track
of looping callbacks. The "command" callback function then decides
what to do when looping.
As a way to make it more difficult to introduce bugs into the ARC, and to
make it easier to diagnose issues when bugs do creep in, it would be
beneficial to change the type of the arc_state_t's arcs_size field to be
a refcount_t instead of a uint64_t. This would allow us to make stricter
checks when incrementing and decrementing the value with debugging enabled,
but still fallback to simple, fast atomic operations when debugging is
disabled.
https://www.illumos.org/issues/6033
When we're looking for the list containing oldest buffer we never
actually look at the MFU lists even when we try to evict from MFU.
looks like a copy paste error, the fix is here:
mav [Fri, 14 Aug 2015 09:31:07 +0000 (09:31 +0000)]
MFV r277431: 5497 lock contention on arcs_mtx
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
This patch attempts to reduce lock contention on the current arc_state_t
mutexes. These mutexes are used liberally to protect the number of LRU
lists within the ARC (e.g. ARC_mru, ARC_mfu, etc). The granularity at
which these locks are acquired has been shown to greatly affect the
performance of highly concurrent, cached workloads.
pfg [Fri, 14 Aug 2015 03:03:13 +0000 (03:03 +0000)]
Move the stack protector to a new "secure" directory
As part of the code refactoring to support FORTIFY_SOURCE we want
a new subdirectory "secure" to keep the files related to security.
Move the stack protector functions to this new directory.
edwin [Thu, 13 Aug 2015 23:57:44 +0000 (23:57 +0000)]
MFV of 286748,tzdata2015f
Update to tzdata2015f:
Changes affecting future time stamps
North Korea switches to +0830 on 2015-08-15. (Thanks to Steffen Thorsen.)
The abbreviation remains "KST". (Thanks to Robert Elz.)
Uruguay no longer observes DST. (Thanks to Steffen Thorsen and Pablo Camargo.)
Changes affecting past and future time stamps
Moldova starts and ends DST at 00:00 UTC, not at 01:00 UTC. (Thanks to Roman Tudos.)
emaste [Thu, 13 Aug 2015 17:50:47 +0000 (17:50 +0000)]
Roll WITHOUT_ELFTOOLCHAIN_TOOLS into WITHOUT_TOOLCHAIN
The option was added only to ease the transition from GNU Binutils to
ELF Tool Chain tools, and that process is now complete (for the viable
replacements). Noting the removal in UPDATING is sufficient as we have
not shipped a release with the option.
Reviewed by: brooks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3240
marcel [Thu, 13 Aug 2015 15:16:34 +0000 (15:16 +0000)]
Change md(4) to use weak symbols as start, end and size for the embedded
root disk. The embedded image is linked into the kernel in the .mfs
section.
Add rules and variables to kern.pre.mk and kern.post.mk that handle the
linking of the image. First objcopy is used to generate an object file.
Then, the object file is linked into the kernel.
Submitted by: Steve Kiernan <stevek@juniper.net>
Reviewed by: brooks@
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D2903
ian [Thu, 13 Aug 2015 14:43:25 +0000 (14:43 +0000)]
Constify the pointers to eventtimer and timecounter name strings.
The need for this appears as soon as you try to set the names to something
that isn't a "quoted literal". (I'm actually confused why quoted strings
aren't a problem as well, we must have some warning disabled.)
marcel [Thu, 13 Aug 2015 14:43:11 +0000 (14:43 +0000)]
Fix text mode operation.
We first map 64KB at 0xA0000 and then determine whether to work
in text or graphics mode. When graphics mode, the mapping is
precisely what we need and everything is fine. But text mode,
has the frame buffer relocated to 0xB8000. We didn't map that
much to safely add 0x18000 bytes to the base address.
Now we first check whether to work in text or graphics mode and
then map the frame buffer at the right address and with the
right size (0xA0000+64KB for graphics, 0xB8000+32KB for text).
melifaro [Thu, 13 Aug 2015 13:38:09 +0000 (13:38 +0000)]
Move lle update code from from gigantic ip_arpinput() to
separate bunch of functions. The goal is to isolate actual lle
updates to permit more fine-grained locking.
emaste [Thu, 13 Aug 2015 13:21:00 +0000 (13:21 +0000)]
arm64: turn unknown el0 exception into a SIGILL
It seems we get EXCP_UNKNOWN from QEMU when executing zeroed memory.
Print a register dump here and signal illegal instruction. Also print
a register dump for other invalid exceptions, before panic.
Reviewed by: andrew
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3370
mav [Thu, 13 Aug 2015 00:13:55 +0000 (00:13 +0000)]
MFV 286711: 6096 ZFS_SMB_ACL_RENAME needs to cleanup better
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
mav [Thu, 13 Aug 2015 00:10:36 +0000 (00:10 +0000)]
MFV 286709:
6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Dan McDonald <danmcd@omniti.com>
mav [Wed, 12 Aug 2015 23:59:17 +0000 (23:59 +0000)]
MFV 286707: 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
A ZFS feature flags (large blocks) tracks its refcounts as the number of
datasets that have ever used the feature. Several features of this type
are planned to be added (new checksum functions). This code should be made
common infrastructure rather than duplicating the code for each feature.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().
Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.
np [Wed, 12 Aug 2015 22:09:58 +0000 (22:09 +0000)]
Reinstate unify_tcp_port_space and associated code that was lost during
the last OFED update (r278886).
iWARP on FreeBSD is properly integrated with the network stack and the
iWARP drivers _never_ operate out of any private TCP port-space that is
invisible to the kernel. Instead, an iWARP connection shows up as a TCP
socket (which is what it is) fully visible to the kernel and standard
tools like netstat, sockstat, etc.
ian [Wed, 12 Aug 2015 20:50:20 +0000 (20:50 +0000)]
If a specific timecounter has been chosen via sysctl, and a new timecounter
with higher quality registers (presumably in a module that has just been
loaded), do not undo the user's choice by switching to the new timecounter.
Document that behavior, and also the fact that there is no way to unregister
a timecounter (and thus no way to unload a module containing one).
dim [Wed, 12 Aug 2015 20:16:13 +0000 (20:16 +0000)]
In gcc's libcpp, stop using the INTTYPE_MAXIMUM() macro, which relies on
undefined behavior. The code used this macro to avoid problems on some
broken systems which define SSIZE_MAX incorrectly, but this is not
needed on FreeBSD, obviously.
ian [Wed, 12 Aug 2015 19:40:32 +0000 (19:40 +0000)]
Remove all dregs of the old PPS driver from this code, in preparation for
redoing it as a separate driver. Now that each hardware timer is handled by
a separate instance of the timer driver, it no longer makes sense to bundle
the pps driver with the regular timecounter code. (When all 8 timers were
handled by one driver there was no choice about this.)
Split the hardware register definitions out to their own file, so that the
new pps driver (coming in a separate commit later) can share them.
With the PPS driver gone, the question of which hardware timer to use for
what purpose becomes much easier (some instances can't do the PPS capture).
Now we can just hardcore timer2 for eventtimer and timer3 for timecounter.
This also now only instantiates devices for the 2 hardware timers actually
used to implement eventtimer and timecounter. This is required so that
other drivers can come along and attach to other hardware timers to provide
other functionality. (In addition to PPS, this hardware can also do PWM
stuff, general pulse width and frequency measurements, etc. Maybe some
day we'll have drivers for those things.)
https://www.illumos.org/issues/5981
When dmu_objset_find_dp gets called with a read lock held, it fans out
the work to the task queue. Each task in turn acquires its own read
lock before calling the callback. If during this process anyone tries
to a acquire a write lock, it will stall all read lock requests.Thus
the tasks will never finish, the read lock of the caller will never
get freed and the write lock never acquired. deadlock.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Arne Jansen <jansen@webgods.de>
imp [Wed, 12 Aug 2015 19:00:47 +0000 (19:00 +0000)]
Document build-tools better. Add rescue back because it builds /bin/sh
which has a build-tools target (see commit for how build-tools and
cross-tools differ).
https://www.illumos.org/issues/5269
When importing a pool (at boot or with zpool import) with many
filesystem, the process can take minutes. It doesn't matter whether
the pool has been exported cleanly or uncleanly. The problem is that
each dataset has its own log chain. On import, all datasets have to be
checked if there are logs to replay. The idea is to speed up this
process by paralellizing it.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Arne Jansen <jansen@webgods.de>
mav [Wed, 12 Aug 2015 18:23:08 +0000 (18:23 +0000)]
MFV r286682: 5765 add support for estimating send stream size with
lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
Author: Max Grossman <max.grossman@delphix.com>
ed [Wed, 12 Aug 2015 17:46:26 +0000 (17:46 +0000)]
Perform cleanups in response to D3307.
- Document the kern_kevent_anonymous() function.
- Add assertions to ensure that we don't silently leave the kqueue
linked from a file descriptor table.
ed [Wed, 12 Aug 2015 17:42:20 +0000 (17:42 +0000)]
Add the last remaining system calls: send() and recv().
There is still one TODO item for these calls: add file descriptor
passing. The data structures are already prepared for this. It's just
the translation that's missing.
ian [Wed, 12 Aug 2015 17:23:15 +0000 (17:23 +0000)]
Add a routine to return the hardware instance/unit number from ti,hwmods,
given the hardware name.
The ti,hwmods property is used (among other things) to associate an fdt node
with a specific instance of some hardware. For example given a device node
that contains the property ti,hwmods = "timer3", if you call this passing
"timer" as the hwmod string to look for it would return 3.
https://www.illumos.org/issues/5695
In dmu_sync_ready(), a hole block pointer will have it's logical size
explicitly set as it's necessary for replay purposes. To "undo" this,
dmu_sync_done() will zero out any hole that it finds. This becomes a
problem when using the "hole_birth" feature, as this will also wipe out
any birth time that might have happened to be set on the hole.
...
As a fix, the logic to zero out a hole is only applied to old style
holes with a birth time of zero. Holes created with the "hole_birth"
feature enabled will have a non-zero birth time, and will be skipped
(thus preserving the ltime, type, and level information as well).
In addition, zdb was updated to also print the ltime, type, and level
information for these new style holes. Previously, only the logical
birth time would be printed.
Author: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
imp [Wed, 12 Aug 2015 17:19:52 +0000 (17:19 +0000)]
Why on earth have we been building rescue as a build tool for the past
12 years? Nothing downstream in the build uses it. Eliminate it as a
build tool.
andrew [Wed, 12 Aug 2015 17:06:22 +0000 (17:06 +0000)]
Set the counter-timer virtual offset to a know value, it may not have been
set by the boot code and are reset to an implementation defined value that
may be unknown.
imp [Wed, 12 Aug 2015 16:43:15 +0000 (16:43 +0000)]
Crunchgen needs to be bootstrapped to pick up the STRIP->STRIPBIN
changes to prevent the 'rescue: not found' errors from happening.
Bump FreeBSD_version to 1100078 since there's been no version bumps
since this change was made. Only people that installed since r284356
really need to do this bootstrapping, but since crunchgen needs to
bootstrap for other reasons, bumping the number was the simplest.
ed [Wed, 12 Aug 2015 16:17:00 +0000 (16:17 +0000)]
Properly return ENOTDIR when calling *at() on a non-vnode.
We already properly return ENOTDIR when calling *at() on a non-directory
vnode, but it turns out that if you call it on a socket, we see EINVAL.
Patch up namei to properly translate this to ENOTDIR.
marcel [Wed, 12 Aug 2015 15:26:32 +0000 (15:26 +0000)]
Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.
1. Delay calling cninit() until after pmap_bootstrap(). This makes
sure we have PMAP initialized enough to add translations. Keep
kdb_init() after cninit() so that we have console when we need
to break into the debugger on boot.
2. Unfortunately, the ATPIC code had be moved as well so as to
avoid a spurious trap #30. The reason for which is not known
at this time.
3. In pmap_mapdev_attr(), when we need to map a device prior to the
VM system being initialized, use virtual_avail as the KVA to map
the device at. In particular, avoid using the direct map on amd64
because we can't demote by virtue of not being able to allocate
yet. Keep track of the translation.
Re-use the translation after the VM has been initialized to not
waste KVA and to satisfy the assumption in uart(4) that the handle
returned for the low-level console is the same as later returned
when the device is probed and attached.
4. In pmap_unmapdev() remove the mapping from the table when called
pre-init. Otherwise keep the mapping. During bus probe and attach
device resources are mapped and unmapped multiple times, which
would have us destroy the mapping used by the low-level console.
5. In pmap_init(), set pmap_initialized to signal that we're not
pre-init anymore. On amd64, bring the direct map in sync with the
translations created at that time.
6. Implement bus_space_map() and bus_space_unmap() for real: when
the tag corresponds to memory space, call the corresponding
pmap_mapdev() and pmap_unmapdev() functions to construct and
actual handle.
7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply
call pmap_mapdev_attr() or bus_space_map() as desired.
Notes:
1. uart(4) already used bus_space_map() during low-level console
setup but since serial ports have traditionally been I/O port
based, the lack of a proper implementation for said function
was not a problem. It has always supported memory mapped UARTs
for low-level consoles by setting hw.uart.console accordingly.
2. The use of the direct map on amd64 without setting caching
attributes has been a bigger problem than previously thought.
This change has the fortunate (and unexpected) side-effect of
fixing various EFI frame buffer problems (though not all).
PR: 191564, 194952
Special thanks to:
1. XipLink, Inc -- generously donated an Intel Bay Trail E3800
based eval board (ADLE3800PC).
2. The FreeBSD Foundation, in particular emaste@ -- for UEFI
support in general and testing.
3. Everyone who tested the proposed for PR 191564.
4. jhb@ and kib@ for being a soundboard and applying a clue bat
if so needed.
ed [Wed, 12 Aug 2015 11:30:31 +0000 (11:30 +0000)]
Unignore signals when starting CloudABI processes.
As CloudABI processes cannot adjust their signal handlers, we need to
make sure that we start up CloudABI processes with consistent signal
masks. Though the POSIx standard signal behavior is all right, we do
need to make sure that we ignore SIGPIPE, as it would otherwise be
hard to interact with pipes and sockets.
Extend execsigs() to iterate over ps_sigignore and call sigdflt() for
each of the ignored signals.
kib [Wed, 12 Aug 2015 09:55:52 +0000 (09:55 +0000)]
In x2APIC mode, IPI generation is atomic because it is performed by
single ICR MSR write. This is in contrast with the xAPIC mode, where
we must read current ICR value, do bit fiddling and perform two 32-bit
register writes. As a consequence, there is no need to disable
interrupts around ICR value calculation and write.
Note that typical users of ipi_raw() and ipi_vectored() take spinlock,
which already disables interrupts. For them, the change removes
unneeded CLI and POPFL/Q instructions.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
kib [Wed, 12 Aug 2015 09:46:39 +0000 (09:46 +0000)]
Initialization of smp_tlb_wait does not require release semantic, no
data is synchronized by store/load to the variable. The
lapic_write_icr() function ensures that store buffers are flushed
before IPI command is issued.
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
ed [Wed, 12 Aug 2015 08:41:48 +0000 (08:41 +0000)]
Make blocking CloudABI futex operations work.
Blocking on locks and condition variables can be accomplished by polling
and using the special filters CONDVAR, LOCK_RDLOCK and LOCK_WRLOCK.
For now it wouldn't make sense to implement this functionality into
kqueue() itself, for the reason that they are CloudABI specific and
would require us to resize 'struct kevent' to hold all of the parameters
of interest.
Add a bandaid to the CloudABI poll system call to call into the futex
code directly if it detects specific combinations of events that are
used by the C library.