imp [Sat, 5 Dec 2015 17:40:11 +0000 (17:40 +0000)]
When building no-priv, chmod etc/defaults/rc.conf before appending to
it and then chmod back. There's no chmod -push / chmod -pop so hard
code 444 as the right permissions here.
Also, fix more stray detritus that crept in (out?) while re-arranging
the deck chairs.
arybchik [Sat, 5 Dec 2015 17:11:14 +0000 (17:11 +0000)]
sfxge: erase nvram partitions in chunks equal to their erase size
The erase size is reported by the nvram info command.
Submitted by: Paul Fox <pfox at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4386
cem [Sat, 5 Dec 2015 17:01:38 +0000 (17:01 +0000)]
style.9: Add a small blurb about allowing bool
It was allowed before, but make it very explicit it is allowed now. And
prefer 'bool' to older types that were used for the same purpose -- int and
boolean_t.
Like with the C99 fixed-width types, use common sense when changing old
code.
melifaro [Sat, 5 Dec 2015 09:50:37 +0000 (09:50 +0000)]
Remove LLE read lock from IPv4 fast path.
LLE structure is mostly unchanged during its lifecycle.
To be more specific, there are 2 things relevant for fast path
lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we re-send arp request to perform reachability verification instead of
expiring entry. The only signal that is needed from fast path is something
like binary yes/no.
The latter is solved by the following changes:
1) introduce special r_skip_req field which is read lockless by fast path,
but updated under (new) req_mutex mutex. If this field is non-zero, then
fast path will acquire lock and set it back to 0.
2) introduce simple state machine: incomplete->reachable<->verify->deleted.
Before that we implicitely had incomplete->reachable->deleted state machine,
with V_arpt_keep between "reachable" and "deleted". Verification was performed
in runtime 5 seconds before V_arpt_keep expire.
This is changed to "change state to verify 5 seconds before V_arpt_keep,
set r_skip_req to non-zero value and check it every second". If the value
is zero - then send arp verification probe.
These changes do not introduce any signifficant control plane overhead:
typically lle callout timer would fire 1 time more each V_arpt_keep (1200s)
for used lles and up to arp_maxtries (5) for dead lles.
As a result, all packets towards "reachable" lle are handled by fast path without
acquiring lle read lock.
Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler
might keep LLE lock for signifficant amount of time, which might not be feasible
for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock).
kib [Sat, 5 Dec 2015 08:52:37 +0000 (08:52 +0000)]
It seems that at least some KVM versions advertise support for EIO
suppression but the version of the IOAPIC reported is 0x11 and neither
IOAPIC EOIR nor the Linux trick of temporal reprogramming of the pin
to edge-trigger mode to issue EOI work.
Disable eoi suppression if KVM is detected. The mode can still be
forced with the tunable.
Reported and tested by: Roman Mamontov <mr.xanto@gmail.com>
Sponsored by: The FreeBSD Foundation
kib [Sat, 5 Dec 2015 08:46:41 +0000 (08:46 +0000)]
In the pmap_set_pg() function, which enables the global bit on the
ptes mapping the kernel on CPUs where global TLB entries are
supported, revert to flushing only non-global entries, i.e. to the
pre-r291688 state. There is no need to flush global TLB entries,
since only global entries created during the previous iterations of
the loop could exist at this moment.
arybchik [Sat, 5 Dec 2015 07:04:11 +0000 (07:04 +0000)]
sfxge: support for MCDI logging implemented
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4355
imp [Sat, 5 Dec 2015 04:43:56 +0000 (04:43 +0000)]
New config files for embedded boards.
Build with ../nanobsd.sh -c rpi.cfg, for example.
This can be done as a normal user.
This is a work in progress. It relies on the new nopriv
build stuff committed to nanobsd, but isn't complete yet.
Currently, one must copy files into the DOS partition
in the image. Also, ownership isn't preserved because
this doesn't use the new mtree-dedup.awk yet, but rather
some crazy mtree stuff. The image building bits will
move up into nanobsd when they are ready.
Also includes very preliminary support for building qemu
images for all platforms that we can for qemu. It is
missing aarch64, and we put the image on s2 instead of
s1 and mkimg can't mark s2 as active, so there's some
issues. Oh, and I didn't do it for arm.
imp [Sat, 5 Dec 2015 01:12:44 +0000 (01:12 +0000)]
Awk helper script that reads in a mtree METALOG file from installworld
(and soon augmented by nanobsd), performs the actions documented in
the script, and then spits out a new mtree file suitable for feeding
to makefs.
imp [Sat, 5 Dec 2015 01:10:04 +0000 (01:10 +0000)]
Setting NANO_NOPRIV_BUILD will now add -DNO_ROOT and METALOG=xxxx as
appropriate. First step in supporting a build w/o root. More to
follow as actions by customization scripts are not (yet) recorded in
the metalog, and duplicate entries in it aren't removed.
imp [Sat, 5 Dec 2015 00:54:43 +0000 (00:54 +0000)]
SRCCONF makes no sense in make.conf. Don't set it there. Rely on it
being in the environment. Also filter out the new SRC_ENV_CONF as
well. If you really need these set, set them in your config file,
not in the build environment used to launch nanobsd.
imp [Sat, 5 Dec 2015 00:15:04 +0000 (00:15 +0000)]
Minor cleanup in how we run make:
o Move SRCCONF and __MAKE_CONF into the environment to cope with
file paths with spaces in them better.
o Move the rest of the variable setting command line args into
__MAKE_CONF files.
o Trace the commands that we're using to build so they appear at the
top of the log.
o Be more consistent about quoting paths for cd and similar commands
to better cope with paths with spaces in them, though some more
work is likely needed.
o Add some comments about all this.
o Minor formatting tweaks in a couple places
jilles [Fri, 4 Dec 2015 16:32:29 +0000 (16:32 +0000)]
rc.subr: Check for running daemons before a custom start_cmd is executed.
Currently rc scripts implementing their own start_cmd do not enjoy the
benefits of rc.subr's own check for rc_pid.
This leads to around a third of ports with such a start_cmd not to check for
the process at all and two thirds of ports to re-implement this check
(sometimes wrongly).
This patch moves the check for rc_pid to before ${rc_arg}_cmd is executed.
bdrewery [Fri, 4 Dec 2015 07:54:19 +0000 (07:54 +0000)]
Fix 'install*' and many other missing targets with DIRDEPS_BUILD.
My changes in r291635 broke 'make install*' for DIRDEPS_BUILD but also
revealed that some other targets were not guaranteed to be created if
there was a SUBDIR defined. One example is 'installfiles' was never
defined if SUBDIR was not empty.
arybchik [Fri, 4 Dec 2015 06:54:46 +0000 (06:54 +0000)]
sfxge: [EF10] support RxQ scattering control
If, for example, a VF is configured to use a 1500 byte MTU, but the port
it is attached to is set to 9000 bytes, overlength frames can be received
by the VF. As Huntington scatters by default, these overlength packets
would be scattered across several descriptors, with all except the last
having the CONT bit set.
To avoid this, disable scatter when creating RXQs if the firmware
supports doing so, which all recent versions do. Then we only get
a single descriptor from an overlength frame. This will have the CONT
bit set to indicate it was truncated, so we can discard it.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4354
arybchik [Fri, 4 Dec 2015 06:51:37 +0000 (06:51 +0000)]
sfxge: add additional WRITESIZE value for NVRAM_INFO command
Submitted by: Paul Fox <pfox at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4353
mckusick [Fri, 4 Dec 2015 03:54:18 +0000 (03:54 +0000)]
We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).
Reported by; Rick Macklem
Fix from: Mateusz Guzik
PR: 204949
ken [Fri, 4 Dec 2015 03:38:35 +0000 (03:38 +0000)]
Fix g_disk_vlist_limit() to work properly with deletes.
Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() tha will properly determine the maximum I/O size for a
delete or non-delete bio.
bdrewery [Thu, 3 Dec 2015 22:39:42 +0000 (22:39 +0000)]
Revert r288966 as it is redundant and not right.
bsd.prog.mk and bsd.lib.mk already make OBJS depend on headers when there is
not .OBJDIR/.depend file, which is still true for the initial meta mode builds.
If there was something to benefit the meta mode build here then it should be
extended to the non-meta mode build as well.
Some of the problems here were just DPSRCS being hooked up wrongly, fixed in
r291330.
The logic itself is flawed as 'buildfiles' is in a different part of the
dependency tree than the objects and headers are, so the objects will still be
built independent from 'buildfiles'. 'buildfiles' is not ordered in the build
before objects.
ken [Thu, 3 Dec 2015 20:54:55 +0000 (20:54 +0000)]
Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.
CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl. User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.
While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists. This allows user applications to have more
flexibility in their data handling operations.
Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out. This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.
The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS. The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.
There are some things things would be good to add:
1. Come up with a way to do unmapped I/O on multiple buffers.
Currently the unmapped I/O interface operates on a struct bio,
which includes only one address and length. It would be nice
to be able to send an unmapped scatter/gather list down to
busdma. This would allow eliminating the copy we currently do
for data.
2. Add an ioctl to list currently outstanding CCBs in the various
queues.
3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
that.
4. Test physical address support. Virtual pointers and scatter
gather lists have been tested, but I have not yet tested
physical addresses or scatter/gather lists.
5. Investigate multiple queue support. At the moment there is one
queue of commands per pass(4) device. If multiple processes
open the device, they will submit I/O into the same queue and
get events for the same completions. This is probably the right
model for most applications, but it is something that could be
changed later on.
Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.
This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.
It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.
It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout. It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.
The I/O is done by two threads, one for the reader and one for the
writer. The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order. That could be modified later on for random I/O patterns
or slightly out of order I/O.
camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.
For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side. In addition to testing both
interfaces, this makes any potential reblocking of I/O easier. No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.
For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.
Things that would be nice to do for camdd(8) eventually:
1. Add support for I/O pattern generation. Patterns like all
zeros, all ones, LBA-based patterns, random patterns, etc. Right
Now you can always use /dev/zero, /dev/random, etc.
2. Add support for a "sink" mode, so we do only reads with no
writes. Right now, you can use /dev/null.
3. Add support for automatic queue depth probing, so that we can
figure out the right queue depth on the input and output side
for maximum throughput. At the moment it defaults to 6.
4. Add support for SATA device passthrough I/O.
5. Add support for random LBAs and/or lengths on the input and
output sides.
6. Track average per-I/O latency and busy time. The busy time
and latency could also feed in to the automatic queue depth
determination.
sys/cam/scsi/scsi_pass.h:
Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
and fetch asynchronous CAM CCBs respectively.
Although these ioctls do not have a declared argument, they
both take a union ccb pointer. If we declare a size here,
the ioctl code in sys/kern/sys_generic.c will malloc and free
a buffer for either the CCB or the CCB pointer (depending on
how it is declared). Since we have to keep a copy of the
CCB (which is fairly large) anyway, having the ioctl malloc
and free a CCB for each call is wasteful.
CAMIOQUEUE adds a CCB to the incoming queue. The CCB is
executed immediately (and moved to the active queue) if it
is an immediate CCB, but otherwise it will be executed
in passstart() when a CCB is available from the transport layer.
When CCBs are completed (because they are immediate or
passdone() if they are queued), they are put on the done
queue.
If we get the final close on the device before all pending
I/O is complete, all active I/O is moved to the abandoned
queue and we increment the peripheral reference count so
that the peripheral driver instance doesn't go away before
all pending I/O is done.
The new passcreatezone() function is called on the first
call to the CAMIOQUEUE ioctl on a given device to allocate
the UMA zones for I/O requests and S/G list buffers. This
may be good to move off to a taskqueue at some point.
The new passmemsetup() function allocates memory and
scatter/gather lists to hold the user's data, and copies
in any data that needs to be written. For virtual pointers
(CAM_DATA_VADDR), the kernel buffer is malloced from the
new pass(4) driver malloc bucket. For virtual
scatter/gather lists (CAM_DATA_SG), buffers are allocated
from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
Physical pointers are passed in unchanged. We have support
for up to 16 scatter/gather segments (for the user and
kernel S/G lists) in the default struct pass_io_req, so
requests with longer S/G lists require an extra kernel malloc.
The new passcopysglist() function copies a user scatter/gather
list to a kernel scatter/gather list. The number of elements
in each list may be different, but (obviously) the amount of data
stored has to be identical.
The new passmemdone() function copies data out for the
CAM_DATA_VADDR and CAM_DATA_SG cases.
The new passiocleanup() function restores data pointers in
user CCBs and frees memory.
Add new functions to support kqueue(2)/kevent(2):
passreadfilt() tells kevent whether or not the done
queue is empty.
passkqfilter() adds a knote to our list.
passreadfiltdetach() removes a knote from our list.
Add a new function, passpoll(), for poll(2)/select(2)
to use.
Add devstat(9) support for the queued CCB path.
sys/cam/ata/ata_da.c:
Add support for the BIO_VLIST bio type.
sys/cam/cam_ccb.h:
Add a new enumeration for the xflags field in the CCB header.
(This doesn't change the CCB header, just adds an enumeration to
use.)
sys/cam/cam_xpt.c:
Add a new function, xpt_setup_ccb_flags(), that allows specifying
CCB flags.
sys/cam/cam_xpt.h:
Add a prototype for xpt_setup_ccb_flags().
sys/cam/scsi/scsi_da.c:
Add support for BIO_VLIST.
sys/dev/md/md.c:
Add BIO_VLIST support to md(4).
sys/geom/geom_disk.c:
Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size
limiting code in g_disk_start() a bit.
sys/kern/subr_bus_dma.c:
Change _bus_dmamap_load_vlist() to take a starting offset and
length.
Add a new function, _bus_dmamap_load_pages(), that will load a list
of physical pages starting at an offset.
Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
Allow unmapped I/O to start at an offset.
sys/kern/subr_uio.c:
Add two new functions, physcopyin_vlist() and physcopyout_vlist().
sys/pc98/include/bus.h:
Guard kernel-only parts of the pc98 machine/bus.h header with
#ifdef _KERNEL.
This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.
sys/sys/bio.h:
Add a new bio flag, BIO_VLIST.
sys/sys/uio.h:
Add prototypes for physcopyin_vlist() and physcopyout_vlist().
share/man/man4/pass.4:
Document the CAMIOQUEUE and CAMIOGET ioctls.
usr.sbin/Makefile:
Add camdd.
usr.sbin/camdd/Makefile:
Add a makefile for camdd(8).
cem [Thu, 3 Dec 2015 17:22:55 +0000 (17:22 +0000)]
if_ntb: Don't roundup MW size to full BAR size unnecessarily
Note that the MW allocation still must be BAR *aligned*. So, this only
loosens the constraints on MW allocation slightly. BAR-aligned does not
play well with large (GB+) BAR sizes.
Going forward, if anyone cares about if_ntb on very large BARs, I
suggest they add functionality to allocate a smaller window than the BAR
size, and set the BAR range to cover a window much larger than the
allocated window. This will require negotiating a window offset and
limit for protocol traffic. None of this is implemented in this
revision.
cem [Thu, 3 Dec 2015 17:21:10 +0000 (17:21 +0000)]
Pull vm_object_scan_all_shadowed out of vm_object_backing_scan
These two functions were largely unrelated, they just used the same same
loop logic to walk through a backing object's memq. Pull out the
all_shadowed test as its own function and eliminate
OBSC_TEST_ALL_SHADOWED. Rename vm_object_backing_scan to
vm_object_collapse_scan.
dim [Thu, 3 Dec 2015 15:41:10 +0000 (15:41 +0000)]
In assembler mode, clang defaulted to DWARF3, if only -g was specified.
Change this to DWARF2, in the simplest way possible. (Upstream, this
was fixed in clang trunk r250173, but this was done along with a lot of
shuffling around of debug option handling, so it cannot be applied
as-is.)
hselasky [Thu, 3 Dec 2015 14:56:17 +0000 (14:56 +0000)]
Convert the mlxen driver to use the BUSDMA(9) APIs instead of
vtophys() when loading mbufs for transmission and reception. While at
it all pointer arithmetic and cast qualifier issues were fixed, mostly
related to transmission and reception.
hselasky [Thu, 3 Dec 2015 13:29:20 +0000 (13:29 +0000)]
Updated the mlx4 and mlxen drivers to the latest version, v2.1.6:
- Added support for dumping the SFP EEPROM content to dmesg.
- Fixed handling of network interface capability IOCTLs.
- Fixed race when loading and unloading the mlxen driver by applying
appropriate locking.
- Removed two unused C-files.