kib [Mon, 7 Dec 2015 12:20:26 +0000 (12:20 +0000)]
Add support for usermode (vdso-like) gettimeofday(2) and
clock_gettime(2) on ARMv7 and ARMv8 systems which have architectural
generic timer hardware. It is similar to how the RDTSC timer is used in
userspace on x86.
Fix a permission problem where generic timer access from EL0 (or
userspace on v7) was not properly initialized on APs.
For ARMv7, mark the stack non-executable. The shared page is added for
all arms (including ARMv8 64bit), and the signal trampoline code is
moved to the page.
Reviewed by: andrew
Discussed with: emaste, mmel
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4209
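As a rough illustration of the usermode time-keeping change above, the caller's code does not change: the same clock_gettime(2) call is made, and libc may answer it from the shared page instead of entering the kernel. The helper name monotonic_ns below is hypothetical and not part of the commit.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: read the monotonic clock and return it in
 * nanoseconds, or -1 on error.  Whether the call is served in userspace
 * or by a syscall is decided by libc and the kernel, not by the caller.
 */
static int64_t
monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return (-1);
    return ((int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec);
}
```

Two successive calls should return non-decreasing values, regardless of which path served them.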
kib [Mon, 7 Dec 2015 12:09:04 +0000 (12:09 +0000)]
Update ctime when atime or birthtime are updated.
Cleanup setting of ctime/mtime/birthtime: do not set IN_ACCESS or
IN_UPDATE, then clear them with ufs_itimes(), making transient
(possibly inconsistent) change to the times, and then copy
user-supplied times into the inode. Instead, directly clear IN_ACCESS
or IN_UPDATE when user supplied the time, and copy the value into the
inode.
Minor inconsistency left is that the inode ctime is updated even when
birthtime update attempt is performed on a UFS1 volume.
hselasky [Mon, 7 Dec 2015 11:04:50 +0000 (11:04 +0000)]
Add support for setting the TX moderation mode via a sysctl entry. TX
completion events can be moderated in the same way as RX completion
events. Expose this functionality by a sysctl variable.
imp [Mon, 7 Dec 2015 10:24:40 +0000 (10:24 +0000)]
Start to split apart the different image formats that we need to
make. Add support for generating powerpc64 qemu images. We
can generate them, but there's something wrong booting them.
This also simplifies the user config files a bit, and removes
bits no longer true.
arybchik [Mon, 7 Dec 2015 06:04:24 +0000 (06:04 +0000)]
sfxge: switch to TxQ creation specific flags
It is better not to mix TxQ creation and receive event flags since only
checksum flags are applicable to TxQ.
This also allows new TxQ creation specific flags to be added.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4389
arybchik [Mon, 7 Dec 2015 05:59:24 +0000 (05:59 +0000)]
sfxge: support PERMIT_SET_MAC_WHEN_FILTERS_INSTALLED flag
Use flag on vadapter alloc when reported as a supported capability.
Use the slow device reset only when the capability is missing.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4387
imp [Mon, 7 Dec 2015 04:02:52 +0000 (04:02 +0000)]
Fix up mtree with additional entries written to it by
nanobsd. Implement support for NanoBSD touching a file (and possibly
recording that fact) as well as replacing a directory with a symlink.
Also specify the default uname and gname for files and use that as a
/set command at the top of the generated METALOG file.
smh [Mon, 7 Dec 2015 02:56:08 +0000 (02:56 +0000)]
Fix panic on shutdown due to iscsi event priority
iscsi's shutdown_pre_sync prio was SHUTDOWN_PRI_FIRST which caused it to
run before other high priority handlers such as filesystems e.g. ZFS.
This meant the iscsi sessions were removed before the ZFS geom consumer
was closed, resulting in a panic from g_access calls on debug kernels
due to negative acr.
Instead use the same as the old iscsi_initiator SHUTDOWN_PRI_DEFAULT-1
which allows it to run before dashutdown etc but after filesystems.
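The ordering effect described in the iscsi shutdown fix can be sketched with a small priority-ordered handler list. This is a simplified model, not the kernel's eventhandler code: the PRI_* constants are illustrative stand-ins for SHUTDOWN_PRI_FIRST and SHUTDOWN_PRI_DEFAULT, and the priorities given to the "filesystem" and "dashutdown" handlers in the usage note are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define PRI_FIRST    0      /* illustrative stand-in for SHUTDOWN_PRI_FIRST */
#define PRI_DEFAULT  10000  /* illustrative stand-in for SHUTDOWN_PRI_DEFAULT */
#define MAX_HANDLERS 8

struct handler {
    const char *name;
    int pri;                /* lower value runs earlier */
};

struct handler_list {
    struct handler h[MAX_HANDLERS];
    size_t n;
};

/*
 * Insert before the first entry with a strictly greater priority, so
 * handlers with equal priority keep their registration order.
 */
static int
handler_register(struct handler_list *l, const char *name, int pri)
{
    size_t i, pos;

    if (l->n >= MAX_HANDLERS)
        return (-1);
    for (pos = 0; pos < l->n; pos++)
        if (pri < l->h[pos].pri)
            break;
    for (i = l->n; i > pos; i--)
        l->h[i] = l->h[i - 1];
    l->h[pos].name = name;
    l->h[pos].pri = pri;
    l->n++;
    return (0);
}

/* Return the run position of a named handler, or -1 if not registered. */
static int
handler_index(const struct handler_list *l, const char *name)
{
    size_t i;

    for (i = 0; i < l->n; i++)
        if (strcmp(l->h[i].name, name) == 0)
            return ((int)i);
    return (-1);
}
```

Registering "filesystem" at PRI_FIRST, "dashutdown" at PRI_DEFAULT and "iscsi" at PRI_DEFAULT - 1 yields the run order filesystem, iscsi, dashutdown, which is the ordering the fix aims for.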
cem [Sun, 6 Dec 2015 17:46:12 +0000 (17:46 +0000)]
vm_fault_hold: handle vm_page_rename failure
On vm_page_rename failure, fix a missing object unlock and a double free of
a page.
First remove the old page, then rename the other page into first_object,
then free the old page. This avoids the problem on rename failure. This is
a little ugly but seems to be the most straightforward solution.
Tested with:
$ sysctl debug.fail_point.uma_zalloc_arg="1%return"
$ kyua test -k /usr/tests/sys/Kyuafile
cem [Sun, 6 Dec 2015 17:39:13 +0000 (17:39 +0000)]
pmap_invalidate_range: For very large ranges, flush the whole TLB
Typical TLBs have 40-512 entries available. At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.
Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a heuristic at which point we flush TLB rather
than invalidating every single potential page.
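The cutoff described in the pmap_invalidate_range change can be sketched as a simple page-count comparison. This is not the pmap code itself: the 4 KiB page size, the names, and the use of ">=" for the comparison are assumptions made for illustration; only the 4096-entry figure comes from the commit message.

```c
#include <stdint.h>

#define SKETCH_PAGE_BYTES       4096u   /* assumed 4 KiB pages */
#define SKETCH_FLUSH_THRESHOLD  4096u   /* cutoff quoted in the commit message */

enum inval_action {
    INVAL_PER_PAGE,     /* issue one invalidation per page in the range */
    INVAL_FLUSH_ALL     /* flush the whole TLB and let it reload on demand */
};

/*
 * Hypothetical helper: choose a strategy for the range [sva, eva).
 * The caller is assumed to pass sva <= eva.
 */
static enum inval_action
choose_invalidation(uintptr_t sva, uintptr_t eva)
{
    uintptr_t npages;

    npages = (eva - sva + SKETCH_PAGE_BYTES - 1) / SKETCH_PAGE_BYTES;
    return (npages >= SKETCH_FLUSH_THRESHOLD ?
        INVAL_FLUSH_ALL : INVAL_PER_PAGE);
}
```

With these assumptions a 16 MiB range (4096 pages) is the smallest one that triggers a full flush.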
tuexen [Sun, 6 Dec 2015 16:17:57 +0000 (16:17 +0000)]
Fix the allocation of outgoing streams:
* When processing a cookie, use the number of
streams announced in the INIT-ACK.
* When sending an INIT-ACK for an existing
association, use the value from the association,
not from the end-point.
imp [Sat, 5 Dec 2015 17:40:11 +0000 (17:40 +0000)]
When building no-priv, chmod etc/defaults/rc.conf before appending to
it and then chmod back. There's no chmod -push / chmod -pop so hard
code 444 as the right permissions here.
Also, fix more stray detritus that crept in (out?) while re-arranging
the deck chairs.
arybchik [Sat, 5 Dec 2015 17:11:14 +0000 (17:11 +0000)]
sfxge: erase nvram partitions in chunks equal to their erase size
The erase size is reported by the nvram info command.
Submitted by: Paul Fox <pfox at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4386
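Erasing a partition in chunks equal to its erase size, as in the sfxge nvram change above, amounts to a loop like the one below. This is a sketch only: erase_fn, erase_partition and count_erase are hypothetical names, the driver's real interfaces are not shown, and handling of a trailing partial chunk is an assumption.

```c
#include <stddef.h>

/* Hypothetical callback that erases len bytes starting at offset. */
typedef int (*erase_fn)(size_t offset, size_t len, void *arg);

/*
 * Erase part_size bytes in chunks of erase_size.  Returns 0 on success,
 * -1 for a zero erase size, or the first non-zero callback result.
 */
static int
erase_partition(size_t part_size, size_t erase_size, erase_fn erase,
    void *arg)
{
    size_t off, chunk;
    int rc;

    if (erase_size == 0)
        return (-1);
    for (off = 0; off < part_size; off += chunk) {
        chunk = part_size - off;
        if (chunk > erase_size)
            chunk = erase_size;
        rc = erase(off, chunk, arg);
        if (rc != 0)
            return (rc);
    }
    return (0);
}

/* Example callback: count how many erase requests were issued. */
static int
count_erase(size_t offset, size_t len, void *arg)
{
    (void)offset;
    (void)len;
    (*(int *)arg)++;
    return (0);
}
```

For a 0x40000-byte partition with an erase size of 0x10000, the callback is invoked four times.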
cem [Sat, 5 Dec 2015 17:01:38 +0000 (17:01 +0000)]
style.9: Add a small blurb about allowing bool
It was allowed before, but make it very explicit it is allowed now. And
prefer 'bool' to older types that were used for the same purpose -- int and
boolean_t.
Like with the C99 fixed-width types, use common sense when changing old
code.
melifaro [Sat, 5 Dec 2015 09:50:37 +0000 (09:50 +0000)]
Remove LLE read lock from IPv4 fast path.
LLE structure is mostly unchanged during its lifecycle.
To be more specific, there are 2 things relevant for fast path
lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we re-send arp request to perform reachability verification instead of
expiring entry. The only signal that is needed from fast path is something
like binary yes/no.
The latter is solved by the following changes:
1) introduce special r_skip_req field which is read locklessly by fast path,
but updated under (new) req_mutex mutex. If this field is non-zero, then
fast path will acquire lock and set it back to 0.
2) introduce simple state machine: incomplete->reachable<->verify->deleted.
Before that we implicitly had incomplete->reachable->deleted state machine,
with V_arpt_keep between "reachable" and "deleted". Verification was performed
at runtime 5 seconds before V_arpt_keep expires.
This is changed to "change state to verify 5 seconds before V_arpt_keep,
set r_skip_req to non-zero value and check it every second". If the value
is zero - then send arp verification probe.
These changes do not introduce any significant control plane overhead:
typically lle callout timer would fire 1 time more each V_arpt_keep (1200s)
for used lles and up to arp_maxtries (5) for dead lles.
As a result, all packets towards "reachable" lle are handled by fast path without
acquiring lle read lock.
Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler
might keep LLE lock for a significant amount of time, which might not be feasible
for fast path locking (e.g. having rmlock as either AFDATA or lltable own lock).
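The r_skip_req handshake between the fast path and the timer described above can be sketched as follows. This is a single-threaded model with hypothetical names (lle_sketch, fast_path_use, timer_enter_verify, timer_verify_tick); the real locking is only indicated in comments, and the transitions out of the verify state are omitted.

```c
enum lle_state {
    LLE_INCOMPLETE,
    LLE_REACHABLE,
    LLE_VERIFY,
    LLE_DELETED
};

struct lle_sketch {
    enum lle_state state;
    int r_skip_req;     /* non-zero: timer is asking "is this entry used?" */
    int probes_sent;    /* stands in for sending an ARP verification probe */
};

/* Fast path, once per packet: a lockless read, a write only when asked. */
static void
fast_path_use(struct lle_sketch *lle)
{
    if (lle->r_skip_req != 0) {
        /* The real code would take req_mutex around this store. */
        lle->r_skip_req = 0;
    }
}

/* Timer: 5 seconds before V_arpt_keep, start asking for feedback. */
static void
timer_enter_verify(struct lle_sketch *lle)
{
    lle->state = LLE_VERIFY;
    lle->r_skip_req = 1;
}

/*
 * Timer: called every second while verifying.  A cleared r_skip_req
 * means the fast path used the entry, so a probe is sent.  Returns 1 if
 * a probe was sent, 0 otherwise.
 */
static int
timer_verify_tick(struct lle_sketch *lle)
{
    if (lle->state != LLE_VERIFY)
        return (0);
    if (lle->r_skip_req == 0) {
        lle->probes_sent++;
        return (1);
    }
    return (0);
}
```

An entry that sees no traffic while in the verify state never clears r_skip_req, so no probe is sent for it.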
kib [Sat, 5 Dec 2015 08:52:37 +0000 (08:52 +0000)]
It seems that at least some KVM versions advertise support for EOI
suppression, but the version of the IOAPIC reported is 0x11 and neither
IOAPIC EOIR nor the Linux trick of temporarily reprogramming the pin
to edge-trigger mode to issue EOI works.
Disable EOI suppression if KVM is detected. The mode can still be
forced with the tunable.
Reported and tested by: Roman Mamontov <mr.xanto@gmail.com>
Sponsored by: The FreeBSD Foundation
kib [Sat, 5 Dec 2015 08:46:41 +0000 (08:46 +0000)]
In the pmap_set_pg() function, which enables the global bit on the
ptes mapping the kernel on CPUs where global TLB entries are
supported, revert to flushing only non-global entries, i.e. to the
pre-r291688 state. There is no need to flush global TLB entries,
since only global entries created during the previous iterations of
the loop could exist at this moment.
arybchik [Sat, 5 Dec 2015 07:04:11 +0000 (07:04 +0000)]
sfxge: support for MCDI logging implemented
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4355
imp [Sat, 5 Dec 2015 04:43:56 +0000 (04:43 +0000)]
New config files for embedded boards.
Build with ../nanobsd.sh -c rpi.cfg, for example.
This can be done as a normal user.
This is a work in progress. It relies on the new nopriv
build stuff committed to nanobsd, but isn't complete yet.
Currently, one must copy files into the DOS partition
in the image. Also, ownership isn't preserved because
this doesn't use the new mtree-dedup.awk yet, but rather
some crazy mtree stuff. The image building bits will
move up into nanobsd when they are ready.
Also includes very preliminary support for building qemu
images for all platforms that we can for qemu. It is
missing aarch64, and we put the image on s2 instead of
s1 and mkimg can't mark s2 as active, so there's some
issues. Oh, and I didn't do it for arm.
imp [Sat, 5 Dec 2015 01:12:44 +0000 (01:12 +0000)]
Awk helper script that reads in a mtree METALOG file from installworld
(and soon augmented by nanobsd), performs the actions documented in
the script, and then spits out a new mtree file suitable for feeding
to makefs.
imp [Sat, 5 Dec 2015 01:10:04 +0000 (01:10 +0000)]
Setting NANO_NOPRIV_BUILD will now add -DNO_ROOT and METALOG=xxxx as
appropriate. First step in supporting a build w/o root. More to
follow as actions by customization scripts are not (yet) recorded in
the metalog, and duplicate entries in it aren't removed.
imp [Sat, 5 Dec 2015 00:54:43 +0000 (00:54 +0000)]
SRCCONF makes no sense in make.conf. Don't set it there. Rely on it
being in the environment. Also filter out the new SRC_ENV_CONF as
well. If you really need these set, set them in your config file,
not in the build environment used to launch nanobsd.
imp [Sat, 5 Dec 2015 00:15:04 +0000 (00:15 +0000)]
Minor cleanup in how we run make:
o Move SRCCONF and __MAKE_CONF into the environment to cope with
file paths with spaces in them better.
o Move the rest of the variable setting command line args into
__MAKE_CONF files.
o Trace the commands that we're using to build so they appear at the
top of the log.
o Be more consistent about quoting paths for cd and similar commands
to better cope with paths with spaces in them, though some more
work is likely needed.
o Add some comments about all this.
o Minor formatting tweaks in a couple places
jilles [Fri, 4 Dec 2015 16:32:29 +0000 (16:32 +0000)]
rc.subr: Check for running daemons before a custom start_cmd is executed.
Currently rc scripts implementing their own start_cmd do not enjoy the
benefits of rc.subr's own check for rc_pid.
As a result, around a third of ports with such a start_cmd do not check for
the process at all, and two thirds of ports re-implement this check
(sometimes wrongly).
This patch moves the check for rc_pid to before ${rc_arg}_cmd is executed.
bdrewery [Fri, 4 Dec 2015 07:54:19 +0000 (07:54 +0000)]
Fix 'install*' and many other missing targets with DIRDEPS_BUILD.
My changes in r291635 broke 'make install*' for DIRDEPS_BUILD but also
revealed that some other targets were not guaranteed to be created if
there was a SUBDIR defined. One example is that 'installfiles' was never
defined if SUBDIR was not empty.
arybchik [Fri, 4 Dec 2015 06:54:46 +0000 (06:54 +0000)]
sfxge: [EF10] support RxQ scattering control
If, for example, a VF is configured to use a 1500 byte MTU, but the port
it is attached to is set to 9000 bytes, overlength frames can be received
by the VF. As Huntington scatters by default, these overlength packets
would be scattered across several descriptors, with all except the last
having the CONT bit set.
To avoid this, disable scatter when creating RXQs if the firmware
supports doing so, which all recent versions do. Then we only get
a single descriptor from an overlength frame. This will have the CONT
bit set to indicate it was truncated, so we can discard it.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4354
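With scatter disabled as described in the RxQ scattering change above, the receive path only has to look at one descriptor per frame. A minimal sketch of that check follows; the descriptor layout, the RX_DESC_CONT bit value and the helper name are hypothetical and do not reflect the driver's actual structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_DESC_CONT 0x1u   /* hypothetical bit position for the CONT flag */

struct rx_desc {
    uint32_t flags;
    uint32_t len;
};

/*
 * With scatter disabled, a frame arrives in a single descriptor.  CONT
 * set on that descriptor means the frame was truncated, so it should be
 * discarded.
 */
static bool
rx_desc_usable(const struct rx_desc *d)
{
    return ((d->flags & RX_DESC_CONT) == 0);
}
```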
arybchik [Fri, 4 Dec 2015 06:51:37 +0000 (06:51 +0000)]
sfxge: add additional WRITESIZE value for NVRAM_INFO command
Submitted by: Paul Fox <pfox at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4353
mckusick [Fri, 4 Dec 2015 03:54:18 +0000 (03:54 +0000)]
We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).
Reported by: Rick Macklem
Fix from: Mateusz Guzik
PR: 204949
ken [Fri, 4 Dec 2015 03:38:35 +0000 (03:38 +0000)]
Fix g_disk_vlist_limit() to work properly with deletes.
Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() that will properly determine the maximum I/O size for a
delete or non-delete bio.