hselasky [Sun, 25 Dec 2016 19:49:09 +0000 (19:49 +0000)]
Improve LinuxKPI device support. Only delete own BSD devices and not
the ones obtained through devclass_get_device(). Some minor code
cleanups while at it.
tuexen [Sun, 25 Dec 2016 17:37:18 +0000 (17:37 +0000)]
Remove a KASSERT which is not always true.
In case of the empty queue tp->snd_holes and tcp_sackhole_insert()
failing due to memory shortage, tp->snd_holes will be empty.
This problem was hit when stress tests were performed by pho.
mav [Sun, 25 Dec 2016 09:40:44 +0000 (09:40 +0000)]
Improve third-party copy error reporting.
For EXTENDED COPY:
- improve parameters checking to report some errors before copy start;
- forward sense data from copy target as descriptor in case of error;
- report which CSCD reported error in sense key specific information.
For WRITE USING TOKEN:
- pass through real sense data from copy target instead of reporting
our copy error, since for the initiator it's a "simple" write, not a copy.
jamie [Sat, 24 Dec 2016 23:51:27 +0000 (23:51 +0000)]
Improve IP address list representation in libxo output.
Extract decision-making about special-case printing of certain
jail parameters into a function.
Refactor emitting of IPv4 and IPv6 address lists into a function.
Resulting user-facing changes:
XO_VERSION is bumped to 2.
In verbose mode (-v), IPv4 and IPv6-Addresses are now properly emitted
as separate lists.
This only affects the output in encoding styles, i.e. xml and json.
In -n mode, ip4.addr and ip6.addr are formatted in the encoding styles'
native list types, e.g. instead of comma-separated lists, JSON arrays
are printed.
mav [Sat, 24 Dec 2016 17:42:34 +0000 (17:42 +0000)]
Improve length handling when writing sense data.
- Allow maximal sense size limitation via Control Extension mode page.
- When sense size is limited, include descriptors atomically: whole or none.
- Set new SDAT_OVFL bit if some descriptors don't fit the limit.
- Report real written sense length instead of static maximal 252 bytes.
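The "whole or none" rule can be sketched in C as follows. This is an illustrative sketch only, not the CTL code: the function name sense_append_desc() and its parameters are hypothetical, and the overflow flag merely stands in for the SDAT_OVFL bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: append one descriptor to a size-limited buffer.
 * The descriptor is copied whole or not at all. Returns the new number
 * of bytes used, which is also the real written length to report.
 */
size_t
sense_append_desc(uint8_t *buf, size_t used, size_t limit,
    const uint8_t *desc, size_t desc_len, bool *overflow)
{
	assert(used <= limit);
	if (desc_len > limit - used) {
		*overflow = true;	/* stand-in for setting SDAT_OVFL */
		return (used);		/* nothing is written */
	}
	memcpy(buf + used, desc, desc_len);
	return (used + desc_len);
}
```

A descriptor that does not fit is skipped, so a later and smaller descriptor may still be appended.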
ngie [Sat, 24 Dec 2016 12:50:17 +0000 (12:50 +0000)]
Unbreak syslogd after r310494
Don't close all file descriptors greater than STDERR_FILENO (2) in
waitdaemon(..) -- only close fd (file descriptor for /dev/null used in
subsequent calls to dup2) if it's greater than STDERR_FILENO.
Reported by: subbsd@gmail.com, danny@cs.huji.ac.il
Pointyhat to: hrs
X-MFC with: r310494
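A minimal C sketch of the descriptor handling described above, assuming a hypothetical helper named redirect_stdio_to_devnull(); it is not the syslogd source. Only the temporary /dev/null descriptor is closed, and only when it is not one of the standard three.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: point stdin, stdout and stderr at /dev/null and
 * leave every other open descriptor alone.
 * Returns 0 on success, -1 on failure.
 */
int
redirect_stdio_to_devnull(void)
{
	int fd, rc;

	fd = open("/dev/null", O_RDWR);
	if (fd == -1)
		return (-1);
	rc = 0;
	if (dup2(fd, STDIN_FILENO) == -1 ||
	    dup2(fd, STDOUT_FILENO) == -1 ||
	    dup2(fd, STDERR_FILENO) == -1)
		rc = -1;
	/* Close the temporary descriptor only if it is not 0, 1 or 2. */
	if (fd > STDERR_FILENO)
		(void)close(fd);
	return (rc);
}
```

Descriptors that the daemon opened earlier, such as its sockets, stay open because nothing above STDERR_FILENO is touched except fd itself.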
ngie [Sat, 24 Dec 2016 11:41:16 +0000 (11:41 +0000)]
Be more strict about IpAddress type in snmp_value_parse(..)
- Use inet_pton with AF_INET instead of doing longhand with sscanf.
- Use gethostbyname2 with AF_INET to ensure that the hostname isn't
accidentally parsed with another address family, e.g. AF_INET6.
NB: IpAddress per RFC-2578 is IPv4 only. Work is in progress to add
the InetAddress type and friends documented in RFC-4001 and
elsewhere (which supports IPv4, IPv6, and more).
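The stricter parsing can be illustrated with a short C sketch. The helper name parse_ipaddress() and the four-octet output are assumptions made for the example, not the bsnmp code; only inet_pton(3) with AF_INET comes from the change itself.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: parse a dotted-quad string into the four octets
 * of an IPv4-only IpAddress. Returns 0 on success, -1 if the text is
 * not a valid IPv4 address.
 */
int
parse_ipaddress(const char *text, uint8_t octets[4])
{
	struct in_addr addr;

	/* inet_pton() returns 1 only for a well-formed AF_INET address. */
	if (inet_pton(AF_INET, text, &addr) != 1)
		return (-1);
	/* s_addr is in network byte order, so octets come out in order. */
	memcpy(octets, &addr.s_addr, 4);
	return (0);
}
```

Unlike a longhand sscanf parse, this rejects out-of-range octets such as "256.1.1.1" as well as IPv6 text such as "::1".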
ngie [Sat, 24 Dec 2016 11:22:28 +0000 (11:22 +0000)]
Warning message cleanup
- Use warn instead of warnx + strerror(errno)
- Remove unnecessary trailing newline from a warnx call
- Add missing spaces following "," in syslog and warn* calls
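For illustration, the first cleanup looks roughly like this in C. The function open_or_warn() is a hypothetical example, not code from the changed program; warn(3) appends ": " and the strerror(errno) text on its own.

```c
#include <err.h>
#include <stdio.h>

/* Hypothetical example: open a file and report a failure via warn(3). */
FILE *
open_or_warn(const char *path)
{
	FILE *fp;

	fp = fopen(path, "r");
	if (fp == NULL) {
		/* Before: warnx("cannot open %s: %s", path, strerror(errno)); */
		warn("cannot open %s", path);
	}
	return (fp);
}
```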
kib [Sat, 24 Dec 2016 09:57:31 +0000 (09:57 +0000)]
Fix argument type and microoptimize swp_pager_meta_free().
The natural type of the count argument is vm_pindex_t, but due to the loop
organization, it has to be a signed type to detect the termination
condition. Replace this logic by using a distinct counter for the
processed pages, and terminate the loop when the counter exceeds the
argument.
Completely process one swblock for all relevant indexes instead of
doing relookup in hash when incrementing page index on the loop step.
Do not drop hash mutex around iterations.
Noted and reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
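The loop shape described above can be sketched in C as follows. This is not the swap pager code: pindex_t stands in for vm_pindex_t, BLOCK_PAGES is a hypothetical block size, and the function only counts how many blocks a range touches. The point is that count stays unsigned and a separate counter decides termination.

```c
#include <stdint.h>

typedef uint64_t pindex_t;	/* stand-in for vm_pindex_t */

#define	BLOCK_PAGES	16u	/* hypothetical pages per block */

/*
 * Count the blocks covered by the page range [start, start + count).
 * All pages that fall in one block are handled in a single step, so a
 * real implementation would need only one lookup per block.
 */
unsigned
count_blocks_touched(pindex_t start, pindex_t count)
{
	pindex_t done, n;
	unsigned blocks = 0;

	for (done = 0; done < count; done += n) {
		/* Pages left in the block that contains start + done. */
		n = BLOCK_PAGES - ((start + done) % BLOCK_PAGES);
		if (n > count - done)
			n = count - done;
		blocks++;
	}
	return (blocks);
}
```

Because done never exceeds count, no subtraction can wrap around, which is what previously forced a signed type.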
gonzo [Sat, 24 Dec 2016 00:30:29 +0000 (00:30 +0000)]
[spigen] Fix spigen attaching as a driver for SPI devices nodes in FDT
Return BUS_PROBE_NOWILDCARD in probe method to make sure that spigen
attaches only to the device created in identify method.
Before this change spigen probe method used to return 0 which meant it
competed with other drivers to be attached to the devices created for
child nodes of SPI bus node in FDT.
rmacklem [Fri, 23 Dec 2016 23:14:53 +0000 (23:14 +0000)]
Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racy. With changes for the above problems it became more racy, so all
uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)
There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.
Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extensive testing he did to help isolate/fix
these problems.
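The allocation-size problem in the last bullet can be illustrated with a small C sketch. The structure and function below are hypothetical and use userland malloc(3); they are not the NFS client code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record with a NUL-terminated string stored inline. */
struct session_rec {
	unsigned	id;
	char		name[];
};

struct session_rec *
session_rec_alloc(unsigned id, const char *name)
{
	size_t len = strlen(name);
	struct session_rec *rec;

	/* The "+ 1" reserves the byte for the terminating NUL. */
	rec = malloc(sizeof(*rec) + len + 1);
	if (rec == NULL)
		return (NULL);
	rec->id = id;
	memcpy(rec->name, name, len + 1);
	return (rec);
}
```

Omitting the "+ 1" often goes unnoticed because allocators tend to round sizes up, which matches the observation in the commit message.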
jah [Fri, 23 Dec 2016 15:14:56 +0000 (15:14 +0000)]
Move the objects used to create temporary mappings for i386 pmap zero and copy
operations to the MD PCPU region. Change sysmap initialization to only
allocate KVA pages for CPUs that are actually present. As a minor
optimization, this also prevents false sharing between adjacent sysmap objects
since the pcpu struct is already cacheline-aligned.
While here, move pc_qmap_addr initialization for the BSP into
pmap_bootstrap(), which allows use of pmap_quick* functions during early boot.
ngie [Fri, 23 Dec 2016 06:56:48 +0000 (06:56 +0000)]
Group all loadable modules in the %default section
This will allow new users to uncomment the modules and have things work
with less head scratching, in the event they decide to uncomment any
of the section separators, e.g. %usm or %vcm, as the module loading is
only effective in the %default section.
ngie [Fri, 23 Dec 2016 05:07:28 +0000 (05:07 +0000)]
Clarify failure in snmp_output(..) with call to snmp_pdu_encode
- Explicitly test snmp_pdu_encode against SNMP_CODE_OK instead of assuming
any non-zero value is bad.
- Print out the code before calling abort() to give the end-user something
actionable to debug without having to recompile the binary, since the
core might not have these details.
jhb [Fri, 23 Dec 2016 03:27:11 +0000 (03:27 +0000)]
Teach DDB how to unwind across a kernel stack overflow.
Kernel stack overflows in MIPS call panic() directly from an assembly
handler after storing the interrupted context's registers in a
trapframe. Rather than inferring the location of ra, sp, and pc from
the instruction stream, recognize the pc of a kernel stack overflow
and pull the registers from the trapframe.
jhb [Fri, 23 Dec 2016 03:20:34 +0000 (03:20 +0000)]
MFamd64: Various fixes for MIPS minidumps.
- Honor PG_NODUMP by not dumping pages with this flag set.
- Pat the watchdog during dumps to avoid a watchdog reset while writing
out a dump.
- Reformat the output during a dump to update every 10% done rather than
every 2MB dumped.
- Include UMA small pages and pages holding PV entries in minidumps.
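The 10% progress reporting could be sketched in C like this. The helper progress_step() is hypothetical and is not the minidump code; it assumes done * 100 does not overflow a 64-bit value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the percentage (10, 20, ..., 100) that
 * should be printed now, or 0 if no new 10% boundary has been crossed
 * since the value stored in *last.
 */
int
progress_step(uint64_t done, uint64_t total, int *last)
{
	int pct;

	if (total == 0)
		return (0);
	/* Round the completed percentage down to a multiple of ten. */
	pct = (int)(done * 100 / total) / 10 * 10;
	if (pct <= *last)
		return (0);
	*last = pct;
	return (pct);
}
```

The caller starts with *last set to 0 and prints only when the return value is nonzero, so the output is bounded by ten lines regardless of dump size.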
hrs [Thu, 22 Dec 2016 23:39:11 +0000 (23:39 +0000)]
- Add -S option to specify the source address/port for UDP communication.
- Document -S option.
- Document that -h option supports AF_LOCAL.
- Split preparation of UDP sockets in logmessage() into socksetup().
ngie [Thu, 22 Dec 2016 22:30:42 +0000 (22:30 +0000)]
Revert r310138
Adding %b support to vfprintf for parity with kernel space requires
more discussion/review.
In particular, many parties were concerned over introducing a
non-standard format qualifier to *printf(3) which didn't already
exist in other OSes, e.g. Linux, thus making code which used %b
harder to port to other operating systems.
ian [Thu, 22 Dec 2016 21:11:42 +0000 (21:11 +0000)]
Use ${.OBJDIR} to refer to the kernel build object dir, instead of trying
to recreate it from ${MAKEOBJDIRPREFIX} and ${SRC_BASE} and ${KERNCONF},
the latter being especially problematic when KERNCONF is set to the names
of multiple kernel configs.
adrian [Thu, 22 Dec 2016 21:01:56 +0000 (21:01 +0000)]
[rsu] convert rsu to use the ieee80211_rx_stats struct to pass up RSSI, PHY and rate information.
I don't yet know which RX descriptor bits map to shortgi, long-gi,
short-preamble, long-preamble, STBC, LDPC, HT40, etc - so I can't
easily add those just yet.
There's apparently no per-frame RX RSSI information exposed so we
also just use the results from the previous calibration task.
This also tidies up how the per-mbuf RSSI is pushed into the frame -
now that it's attached to the mbuf via rx_stats, we don't have to
do any silly hijinx to get it out of the frame processing path.
jhb [Thu, 22 Dec 2016 20:28:06 +0000 (20:28 +0000)]
Fix dump_avail[] for MALTA platforms to include the kernel.
dump_avail[] is supposed to be a superset of phys_avail[] that
describes all of the memory ranges that should be included in a full
dump. minidumps don't consider pages described by dump_avail[] to be
valid and thus they are excluded via the is_dumpable() function. Most
MIPS platforms (including MALTA) set dump_avail[] to be identical to
phys_avail[]. In particular, phys_avail[] doesn't include the kernel
itself, so pages for the kernel and its global variables are not
considered dumpable and not included in the dump. Fix this by setting
dump_avail[0] to the first memory address (0) rather than the end of
the kernel.
Several other MIPS platforms have the same bug, though I am only able
to test malta in qemu. The correct fix is to set dump_avail[] to
describe RAM and in particular to not set dump_avail[0] to the end of
the kernel (kernel_kseg0_end).
julian [Thu, 22 Dec 2016 18:30:29 +0000 (18:30 +0000)]
If this is going to be run individually to make a new timezone set,
ensure the destination directories exist, especially if you define
OLDTIMEZONES, because the mtree pass doesn't create them for you.
jhb [Thu, 22 Dec 2016 18:05:22 +0000 (18:05 +0000)]
Replace passive voice with active voice and other tweaks.
- Drop uses of 'will'.
- Replace 'to use' with active voice.
- Tidy language around interrupt types and clarify that INTx doesn't
work on VFs.
- Drop leading articles from sysctl/tunable descriptions.
- Tweak the wording of several sysctl/tunable descriptions.
markj [Thu, 22 Dec 2016 17:51:44 +0000 (17:51 +0000)]
Revert part of r300109.
The removal of TAILQ_FOREACH_SAFE introduced a small race: when the last
thread on a sleepqueue is awoken, it reclaims the sleepqueue and may begin
executing on a different CPU before sleepq_resume_thread() returns. This
leaves a window during which it may go back to sleep and incorrectly be
awoken again by the caller of sleepq_broadcast().
Reported and tested by: pho
MFC after: 3 days
Sponsored by: Dell EMC Isilon
markj [Thu, 22 Dec 2016 17:44:27 +0000 (17:44 +0000)]
rtld: Fix a couple of bugs around the unloading of ELF filters.
- Pass the correct object to unload_filtees().
- Use a marker to restart iteration after unload_filtees() has returned.
It calls dlclose() and may recursively remove entries from the global
object list, so TAILQ_FOREACH_SAFE is not sufficient.
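The marker technique mentioned above can be sketched with the queue(3) macros. This is a generic illustration, not the rtld code: struct obj, visit_all() and drop_self_and_next() are hypothetical names, and the callback is assumed never to remove the marker itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct obj {
	TAILQ_ENTRY(obj) link;
	int	id;
	bool	marker;		/* true for placeholder entries */
};
TAILQ_HEAD(objlist, obj);

/* Example callback: unlink cur and the next real entry, if any. */
void
drop_self_and_next(struct objlist *list, struct obj *cur)
{
	struct obj *nxt;

	nxt = TAILQ_NEXT(cur, link);
	while (nxt != NULL && nxt->marker)
		nxt = TAILQ_NEXT(nxt, link);
	if (nxt != NULL)
		TAILQ_REMOVE(list, nxt, link);
	TAILQ_REMOVE(list, cur, link);
}

/* Visit every real entry; visit() may remove arbitrary real entries. */
int
visit_all(struct objlist *list,
    void (*visit)(struct objlist *, struct obj *))
{
	struct obj marker = { .id = -1, .marker = true };
	struct obj *cur, *next;
	int visited = 0;

	for (cur = TAILQ_FIRST(list); cur != NULL; cur = next) {
		if (cur->marker) {	/* skip placeholders */
			next = TAILQ_NEXT(cur, link);
			continue;
		}
		/* Park the marker right after cur before calling out. */
		TAILQ_INSERT_AFTER(list, cur, &marker, link);
		visit(list, cur);
		visited++;
		/* cur may be gone; resume from the marker instead. */
		next = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(list, &marker, link);
	}
	return (visited);
}
```

A saved "next" pointer, as TAILQ_FOREACH_SAFE keeps, would dangle if the callback removed that next entry; the marker stays linked, so the position after it is always valid.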
markj [Thu, 22 Dec 2016 17:41:32 +0000 (17:41 +0000)]
rtld: Ensure that dlopen() cannot obtain a reference on a doomed object.
rtld drops the bind lock to call fini functions in an object prior to
unmapping it. The new "doomed" state flag prevents the acquisition of new
references for an object while the lock is dropped.
royger [Thu, 22 Dec 2016 16:09:44 +0000 (16:09 +0000)]
xen: fix IPI setup with EARLY_AP_STARTUP
Current Xen IPI setup functions require that the caller provide a device in
order to obtain the name of the interrupt from it. With early AP startup this
device is no longer available at the point where IPIs are bound, and a KASSERT
would trigger:
panic: NULL pcpu device_t
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82233a20
vpanic() at vpanic+0x186/frame 0xffffffff82233aa0
kassert_panic() at kassert_panic+0x126/frame 0xffffffff82233b10
xen_setup_cpus() at xen_setup_cpus+0x5b/frame 0xffffffff82233b50
mi_startup() at mi_startup+0x118/frame 0xffffffff82233b70
btext() at btext+0x2c
Fix this by no longer requiring the presence of a device in order to bind IPIs,
and simply use the "cpuX" format where X is the CPU identifier in order to
describe the interrupt.
hrs [Thu, 22 Dec 2016 13:46:17 +0000 (13:46 +0000)]
- Fix a use-after-free bug when dq_timeout == 1 and
sending SIGTERM to the process failed. It is an
unusual situation but it can happen.
- Split deadq_remove() into deadq_remove() and
deadq_removebypid().
- Normalize variable names of struct deadq_entry *.
hrs [Thu, 22 Dec 2016 05:23:38 +0000 (05:23 +0000)]
- Simplify masklen->netmask conversion for AF_INET6.
- Use iov[N] by array index instead of using pointer v = &iov[0] to
make the compiler catch an out-of-range access of the array.
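One simple way to convert a prefix length into an AF_INET6 netmask is sketched below in C. The function name masklen_to_netmask6() is hypothetical and the code is not taken from syslogd; it only shows the general technique.

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: fill *mask with the netmask for a prefix of
 * masklen bits. Returns 0 on success, -1 if masklen is out of range.
 */
int
masklen_to_netmask6(int masklen, struct in6_addr *mask)
{
	int i;

	if (masklen < 0 || masklen > 128)
		return (-1);
	memset(mask, 0, sizeof(*mask));
	/* Whole bytes first. */
	for (i = 0; i < masklen / 8; i++)
		mask->s6_addr[i] = 0xff;
	/* Then the leading bits of the one partial byte, if any. */
	if (masklen % 8 != 0)
		mask->s6_addr[i] = (unsigned char)(0xff << (8 - masklen % 8));
	return (0);
}
```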
pfg [Wed, 21 Dec 2016 15:26:59 +0000 (15:26 +0000)]
pax(1): Fix a bug with archives smaller than 512 bytes.
The problem here is that the archive is too short (< 512 bytes). The
buffer routines try to read at least 512 bytes, even when we try to
determine what format file we have, which is wrong.
Obtained from: NetBSD (CVS rev 1.26)
MFC after: 5 days
mav [Wed, 21 Dec 2016 15:17:47 +0000 (15:17 +0000)]
Add support for SITUA bit in Logical Block Provisioning mode page.
VMware tries to enable this bit to avoid multiple threshold notifications
in case of multiple initiators connected to the same LUN. Unfortunately
their code sends MODE SELECT(6) request with parameter length hardcoded
for the page without any thresholds. Since we have four thresholds and our
page is bigger, this attempt fails, which is correct in my understanding.
So all we can do about this now is to report proper error code and hope
VMware fix their code one day.
emaste [Wed, 21 Dec 2016 14:06:44 +0000 (14:06 +0000)]
libunwind: make __{de,}register_frame compatible with libgcc API
The libgcc __register_frame and __deregister_frame functions take a
pointer to a set of FDE/CIEs, terminated by an entry where length is 0.
In Apple's libunwind implementation the pointer is taken to be to a
single FDE. I suspect this was just an Apple bug, compensated by Apple-
specific code in LLVM.
See lib/ExecutionEngine/RuntimeDyld/RTDyldMemoryManager.cpp and
http://lists.llvm.org/pipermail/llvm-dev/2013-April/061737.html
for more detail.
This change is based on the LLVM RTDyldMemoryManager.cpp. It should
later be changed to be alignment-safe.
Reported by: dim
Reviewed by: dim
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8869
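The libgcc-style layout, a run of length-prefixed CIE/FDE entries ended by a zero length, can be walked as in the C sketch below. This is a sketch under stated assumptions, not the libunwind change: count_fdes() is a hypothetical name, only 32-bit length fields are handled, and every entry is assumed to be at least 4 bytes long. It reads through memcpy(3) so that it does not depend on alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: count the FDEs in a zero-terminated sequence of
 * .eh_frame entries. A real __register_frame() would register each FDE
 * where this sketch increments the counter.
 */
size_t
count_fdes(const unsigned char *p)
{
	size_t nfde = 0;
	uint32_t length, id;

	for (;;) {
		memcpy(&length, p, sizeof(length));
		if (length == 0)
			break;		/* terminator entry */
		memcpy(&id, p + 4, sizeof(id));
		if (id != 0)
			nfde++;		/* nonzero CIE pointer: an FDE */
		p += 4 + length;	/* length excludes its own 4 bytes */
	}
	return (nfde);
}
```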
ed [Wed, 21 Dec 2016 08:32:20 +0000 (08:32 +0000)]
Add an example inetd(8) entry for the Prometheus sysctl exporter.
I went through the process of allocating a default port number for this
exporter, TCP 9124. This means that we can add an entry to the services
file as well.
List of Prometheus default port numbers:
https://github.com/prometheus/prometheus/wiki/Default-port-allocations
ed [Wed, 21 Dec 2016 08:29:44 +0000 (08:29 +0000)]
Add a Prometheus exporter for sysctl values.
Now that we have our sysctl tree annotated with aggregation labels,
let's go ahead and provide a very simple utility for exporting the
sysctl tree in Prometheus' format. It can either be used in conjunction
with the Prometheus node exporter or run through inetd(8).
The reason why I'm opting for having it in the base system is because it
has a pretty strong integration with some of sysctl's innards, such as
access to iterators, name lookups, metadata and type information. As I
am investigating whether we can add histograms as native types to sysctl
as well, this integration will only get stronger as we go along. That's
why it would be safer to oversee the development of this exporter
ourselves, as opposed to having it as an external project.
This exporter is remarkably compact, especially when compared to the
official Linux binary of the Prometheus node exporter (16 KB vs 12 MB).
I guess this could be an interesting aspect for monitoring embedded
FreeBSD-based systems.
hrs [Wed, 21 Dec 2016 06:42:30 +0000 (06:42 +0000)]
- Add fklog into struct socklist. Files and local/remote sockets are
now processed in struct socklist in a consistent manner.
- Add helper functions to add a new entry of struct socklist, filed, or peer.
- Use the same routine for -l, -p, and -S.
- Close /dev/klog when read(2) failed.
sephe [Wed, 21 Dec 2016 03:09:07 +0000 (03:09 +0000)]
hyperv/storvsc: The max channel in PDU actually means the max sub-chans.
Use proper names for local variables. The PDU fields' names are not changed yet.
While I'm here, make # of usable channels tunable. This eases further
testing.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8851
landonf [Wed, 21 Dec 2016 00:50:21 +0000 (00:50 +0000)]
bhnd(4): Use a stable sort key to produce deterministic nvram_map_gen.awk
output.
When ordering SROM layout entries, we now use the unique (var_id,
rev_start, rev_end) tuple as the sort key; this fixes the previously
non-deterministic output when sorting entries with overlapping var_ids.
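The idea of sorting on the full (var_id, rev_start, rev_end) tuple is shown below in C rather than awk. The struct layout and function names are hypothetical; only the tuple itself comes from the change. Because the tuple is unique per entry, no two entries compare equal and the result does not depend on the sort implementation.

```c
#include <stdlib.h>

/* Hypothetical layout entry carrying the three key fields. */
struct srom_entry {
	unsigned	var_id;
	unsigned	rev_start;
	unsigned	rev_end;
};

static int
cmp_unsigned(unsigned a, unsigned b)
{
	return ((a > b) - (a < b));
}

/* Compare by var_id, then rev_start, then rev_end. */
static int
srom_entry_cmp(const void *lhs, const void *rhs)
{
	const struct srom_entry *a = lhs, *b = rhs;
	int c;

	if ((c = cmp_unsigned(a->var_id, b->var_id)) != 0)
		return (c);
	if ((c = cmp_unsigned(a->rev_start, b->rev_start)) != 0)
		return (c);
	return (cmp_unsigned(a->rev_end, b->rev_end));
}

void
sort_srom_entries(struct srom_entry *v, size_t n)
{
	qsort(v, n, sizeof(*v), srom_entry_cmp);
}
```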
ken [Tue, 20 Dec 2016 21:17:07 +0000 (21:17 +0000)]
Turn on FC-Tape by default in the isp(4) driver.
FC-Tape provides additional link level error recovery, and is
highly recommended for tape devices. It will only be turned on for
a given target if the target supports it.
Without this setting, we default to whatever FC-Tape setting is in
NVRAM on the card.
This can be overridden by setting the following loader tunable, for
example for isp0:
hint.isp.0.nofctape=1
sys/conf/options:
Add a new kernel config option, ISP_FCTAPE_OFF, that
defaults the FC-Tape configuration to off.
sys/dev/isp/isp_pci.c:
If ISP_FCTAPE_OFF is defined, turn off FC-Tape. Otherwise,
turn it on if the card supports it.
share/man/man4/isp.4:
Add a description of FC-Tape to the isp(4) man page.
Add descriptions of the fctape and nofctape options, as well as the
ISP_FCTAPE_OFF kernel configuration option.
Add the ispfw module and kernel drivers to the suggested
configurations at the top of the man page so that users are less
likely to leave it out. The driver works well with the included
firmware, but may not work at all with whatever firmware the user
has flashed on their card.