Ed Schouten [Mon, 27 Jul 2015 13:17:57 +0000 (13:17 +0000)]
Make shutdown() return ENOTCONN as required by POSIX, part deux.
Summary:
Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right.
Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries.
I took a look at the XNU sources and they seem to test against both SS_ISCONNECTED, SS_ISCONNECTING and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seams reasonable, so let's do the same.
Test Plan:
This issue was uncovered while writing tests for shutdown() in CloudABI:
Marius Strobl [Mon, 27 Jul 2015 12:14:14 +0000 (12:14 +0000)]
- Probe UICLASS_CDC/UISUBCLASS_ABSTRACT_CONTROL_MODEL/0xff again. This
variant of Microsoft RNDIS, i. e. their unofficial version of CDC ACM,
has been disabled in r261544 for resolving a conflict with umodem(4).
Eventually, in r275790 that problem was dealt with in the right way.
However, r275790 failed to put probing of RNDIS devices in question
back.
- Initialize the device prior to querying it, as required by the RNDIS
specification. Otherwise already determining the MAC address may fail
rightfully.
- On detach, halt the device again.
- Use UCDC_SEND_ENCAPSULATED_{COMMAND,RESPONSE}. While these macros are
resolving to the same values as UR_{CLEAR_FEATURE,GET_STATUS}, the
former set is way more appropriate in this context.
- Report unknown - rather: unimplemented - events unconditionally and
not just in debug mode. This ensures that we'll get some hint of what
is going wrong instead of the driver silently failing.
- Deal with the Microsoft ActiveSync requirement of using an input buffer
the size of the expected reply or larger - except for variably sized
replies - when querying a device.
- Fix some pointless NULL checks, style bugs etc.
This changes allow urndis(4) to communicate with a Microsoft-certified
USB RNDIS test token.
Ed Schouten [Mon, 27 Jul 2015 10:07:29 +0000 (10:07 +0000)]
Add a futex implementation for CloudABI.
Summary:
CloudABI provides two different types of futex objects: read-write locks
and condition variables. There is no need to provide separate support
for once objects and thread joining, as these are efficiently simulated
by blocking on a read-write lock. Mutexes simply use read-write locks.
Condition variables always have a lock object associated to them. They
always know to which lock a thread needs to be migrated if woken up.
This allows us to implement requeueing. A broadcast on a condition
variable will never cause multiple threads to be woken up at once. They
will be woken up iteratively.
This implementation still has lots of room for improvement. Locking is
coarse and right now we use linked lists to store all of the locks and
condition variables, instead of using a hash table. The primary goal of
this implementation was to behave correctly. Performance will be
improved as we go.
Test Plan:
This futex implementation has been in use for the last couple of months
and seems to work pretty well. All of the cloudlibc and libc++ unit
tests seem to pass.
Ed Schouten [Mon, 27 Jul 2015 10:04:06 +0000 (10:04 +0000)]
Sync in latest upstream system call definitions.
Futex object scopes have been renamed from using their own constants to
simply reusing the existing CLOUDABI_MAP_{PRIVATE,SHARED} flags, as they
are more accurate in this context.
Change the dev argument from a full path to just the device
identification (e.g. isa:0x3f0 or pci0:2:1:0). In libbus,
the device is turned into a path name. For bus_space_map(),
the resource is now specified in a second argument.
Before:
bus.map('/dev/proto/pci0:2:1:0/pcicfg')
busdma.tag_create('/dev/proto/pci0:2:1:0/busdma', ...)
Now:
bus.map('pci0:2:1:0', 'pcicfg')
busdma.tag_create('pci0:2:1:0', ...)
Document r285679, bsdinstall(8) updates to workaround various
problematic BIOSes when booting from GPT, and partition scheme
selection in the UFS partition menu.
o make sure the boundary is a power of 2, when not zero.
o don't convert 0 to ~0 just so that we can use MIN. ~0 is not a
valid boundary. Introduce BNDRY_MIN that deals with 0 values
that mean no boundary.
Build debug version of rmlock's methods only when LOCK_DEBUG > 0.
Currently LOCK_DEBUG is always defined in sys/lock.h (0 or 1).
This means that debugging code always built. In addition the kernel
modules have always defined LOCK_DEBUG as 1. So, debugging rmlock code
is always used by kernel modules.
Pedro F. Giffuni [Sun, 26 Jul 2015 00:11:04 +0000 (00:11 +0000)]
Bump GCC max-inline-insns-single in libiconv_modules and grep
This is required by our FORTIFY_SOURCE implementation as it
does more inlining. As a rule of thumb, FORTIFY_SOURCE doubles
the number of inlines except that in grep inlining
blows up for some reason.
Revert r173708's modifications to vm_object_page_remove().
Assume that a vnode is mapped shared and mlocked(), and then the vnode
is truncated, or truncated and then again extended past the mapping
point EOF. Truncation removes the pages past the truncation point,
and if pages are later created at this range, they are not properly
mapped into the mlocked region, and their wiring count is wrong.
The revert leaves the invalidated but wired pages on the object queue,
which means that the pages are found by vm_object_unwire() when the
mapped range is munlock()ed, and reused by the buffer cache when the
vnode is extended again.
The changes in r173708 were required since then vm_map_unwire() looked
at the page tables to find the page to unwire. This is no longer
needed with the vm_object_unwire() introduction, which follows the
objects shadow chain.
Also eliminate OBJPR_NOTWIRED flag for vm_object_page_remove(), which
is now redundand, we do not remove wired pages.
Reported by: trasz, Dmitry Sivachenko <trtrmitya@gmail.com>
Suggested and reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Michael Tuexen [Sat, 25 Jul 2015 18:26:09 +0000 (18:26 +0000)]
Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows to build multistack with SCTP support.
With the removal of b_saveaddr in the r285819, b_data must be reset to
b_kvabase when the buffer is reclaimed. Otherwise, if b_data for the
mapped buffer was adjusted with the page-offset portion of b_offset,
nothing would re-adjust the b_data, which breaks buffer management
code which expects page-aligned b_data (see e.g. bpman_qenter(), which
skips partial pages).
Fix a minor issue with the GB_KVAALLOC requests, which could result in
returning the mapped buffer if the reused buffer is mapped and have
the right amount of KVA reserved.
Improve assertion in the vfs_buf_check_mapped() to catch unmapped
buffers which have their b_data incorrectly adjusted with offset.
Reported and tested by: pho (previous version)
Reviewed by: jeff (previous version)
Sponsored by: The FreeBSD Foundation
Synchronize PIN input/output modes with gnu/dts/include/dt-bindings/pinctrl/am33xx.h
gpio driver requires exact value to match SoC pin mode with GPIO pin direction
OF_getencprop_alloc shouldn't be used to get string value. If string
length + 1 is not divisible by 4 this function returns NULL property
value. Otherwise - string with each 4 letters inverted
Xin LI [Sat, 25 Jul 2015 00:21:29 +0000 (00:21 +0000)]
Document the fact that system(3) can easily be misused due to shell meta
characters are honored. While I'm there also mention posix_spawn in the
SEE ALSO section.
Alan Cox [Fri, 24 Jul 2015 19:43:18 +0000 (19:43 +0000)]
Add a comment discussing the appropriate use of the atomic_*() functions
with acquire and release semantics versus the *mb() functions on amd64
processors.
Ed Maste [Fri, 24 Jul 2015 17:46:43 +0000 (17:46 +0000)]
ar: add -U (unique) option to disable -D (deterministic) mode
This is required in order for us to support deterministic mode by
default. If multiple -D or -U options are specified on the command
line, the final one takes precedence. GNU ar also uses -U for this.
An equivalent change will be applied to ELF Tool Chain's version of ar.
PR: 196929
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3175
Marius Strobl [Fri, 24 Jul 2015 17:01:16 +0000 (17:01 +0000)]
- Since r253161, uart_intr() abuses FILTER_SCHEDULE_THREAD for signaling
uart_bus_attach() during its test that 20 iterations weren't sufficient
for clearing all pending interrupts, assuming this means that hardware
is broken and doesn't deassert interrupts. However, under pressure, 20
iterations also can be insufficient for clearing all pending interrupts,
leading to a panic as intr_event_handle() tries to schedule an interrupt
handler not registered. Solve this by introducing a flag that is set in
test mode and otherwise restores pre-r253161 behavior of uart_intr(). The
approach of additionally registering uart_intr() as handler as suggested
in PR 194979 is not taken as that in turn would abuse special pccard and
pccbb handling code of intr_event_handle(). [1]
- Const'ify uart_driver_name.
- Fix some minor style bugs.
Marius Strobl [Fri, 24 Jul 2015 16:00:35 +0000 (16:00 +0000)]
- In mpt_send_handshake_cmd(), use bus_space_write_stream_4(9) for writing
raw data to the doorbell offset in order to clarify the intent and for
avoiding unnecessarily converting the endianess back and forth.
Unfortunately, the same can't be done in mpt_recv_handshake_reply() as
16-bit data needs to be read using 32-bit bus accessors.
- In mpt_recv_handshake_reply(), get rid of a redundant variable.
Marius Strobl [Fri, 24 Jul 2015 15:13:21 +0000 (15:13 +0000)]
o Revert the other functional half of r239864, i. e. the merge of r134227
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs due to the following reasons:
- The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and direct dispatch IPIs such as the
cache invalidation etc. IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI by itself
while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
deadlock.
o As mentioned in the commit message of r245850, on least some sun4u platforms
concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
trying to be delivered to APs that in fact aren't up and running, yet.
While at it, move setting of the cpu_ipi_{selected,single}() pointers to
the appropriate delivery functions from mp_init() to cpu_mp_start() where
it's better suited and allows to get rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
the delays before completely disabling interrupts again in the CPU-specific
cross-trap delivery functions, previously giving other CPUs a window for
sending IPIs on their part. Actually, we now should be able to entirely get
rid of completely disabling interrupts in these functions. Such a change
needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
necessary for correctness, this avoids page faults when accessing the stack
of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
possible, avoiding jitter.
Randall Stewart [Fri, 24 Jul 2015 14:09:03 +0000 (14:09 +0000)]
Fix an issue with MAC OS locking and also optimize the case
where we are sending back a stream-reset and a sack timer is running, in
that case we should just send the SACK.
If apropos(1) and whatis(1) are not hardlinks to man(1) that means the system is
using mandocdb, then man -k should spawn apropos(1) and/or whatis(1) directly
Ed Schouten [Fri, 24 Jul 2015 07:46:02 +0000 (07:46 +0000)]
Implement the basic system calls that operate on pathnames.
Summary:
Unlike FreeBSD, CloudABI does not use null terminated strings for its
pathnames. Introduce a function called copyin_path() that can be used by
all of the filesystem system calls that use pathnames. This change
already implements the system calls that don't depend on any additional
functionality (e.g., conversion of struct stat).
Also implement the socket system calls that operate on pathnames, namely
the ones used by the C library functions bindat() and connectat(). These
don't receive a 'struct sockaddr_un', but just the pathname, meaning
they could be implemented in such a way that they don't depend on the
size of sun_path. For now, just use the existing interfaces.
Add a missing #include to cloudabi_syscalldefs.h to get this code to
build, as one of its macros depends on UINT64_C().
Test Plan:
These implementations have already been tested in the CloudABI branch on
GitHub. They pass all of the tests.
Jeff Roberson [Thu, 23 Jul 2015 19:13:41 +0000 (19:13 +0000)]
Refactor unmapped buffer address handling.
- Use pointer assignment rather than a combination of pointers and
flags to switch buffers between unmapped and mapped. This eliminates
multiple flags and generally simplifies the logic.
- Eliminate b_saveaddr since it is only used with pager bufs which have
their b_data re-initialized on each allocation.
- Gather up some convenience routines in the buffer cache for
manipulating buf space and buf malloc space.
- Add an inline, buf_mapped(), to standardize checks around unmapped
buffers.
Ed Schouten [Thu, 23 Jul 2015 11:11:01 +0000 (11:11 +0000)]
Allow cap_rights_{set,clear,is_set} to be called with no arguments.
In the CloudABI code I sometimes call into cap_rights_* without
providing any arguments. Though one could argue that this doesn't make
sense, in this specific case it's hard to avoid, as the rights that
should be tested against are forwarded by a couple of wrapper macros.
Jeff Roberson [Thu, 23 Jul 2015 02:20:41 +0000 (02:20 +0000)]
- Don't defeat the FIFO nature of the buffer cache by eliminating the
most recently used buffer when we are under paging pressure. This is
a perversion of the buffer and page replacement algorithms and recent
improvements to the page daemon have rendered it unnecessary. In the
event that low-memory deadlocks become an issue it would be possible
to make a daemon or event handler that performs a similar action on
the oldest buffers rather than the newest. Since the buf cache
is analogous to the page cache and some minimum working set is desired
another possibility is to simply shrink the minimum working set which
has less downside now that file pages are not directly mapped.
Sponsored by: EMC / Isilon
Reviewed by: alc, kib (with some minor objection)
Tested by: pho
Andrew Rybchenko [Wed, 22 Jul 2015 16:25:18 +0000 (16:25 +0000)]
sfxge: added fallbacks for pre 4.2.1 firmware support
Driver must be able to start against older firmware that is missing
recently added MCDI calls, otherwise firmware upgrade will not be
possible.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D3145
Randall Stewart [Wed, 22 Jul 2015 11:30:37 +0000 (11:30 +0000)]
Fix several problems with Stream Reset.
1) We were not handling (or sending) the IN_PROGRESS case if
the other side (or our side) was not able to reset (awaiting more data).
2) We would improperly send a stream-reset when we should not. Not
waiting until the TSN had been assigned when data was inqueue.
Ed Schouten [Wed, 22 Jul 2015 07:32:49 +0000 (07:32 +0000)]
Add Makefiles for CloudABI kernel modules.
Place all of the machine/pointer size independent code in a kernel
module called 'cloudabi'. All of the 64-bit specific code goes in a
separate module called 'cloudabi64'. The latter is only enabled on
amd64, as it is the only architecture supported.
John Baldwin [Wed, 22 Jul 2015 01:09:02 +0000 (01:09 +0000)]
Various changes to the registers displayed in DDB for x86.
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.
Mark Johnston [Tue, 21 Jul 2015 23:57:38 +0000 (23:57 +0000)]
Fix counter reads on platforms where sizeof(uint64_t) != sizeof(uint64_t *).
In the kernel, structs such as tcpstat are manipulated as an array of
counter_u64_t (uint64_t *), but made visible to userland as an array of
uint64_t. kread_counters() was previously copying the counter array into
user space and sequentially overwriting each counter with its value. This
mostly affects IPsec counters, as other counters are exported via sysctl.
PR: 201700
Tested by: Jason Unovitch
MFC after: 1 week
Mark Johnston [Tue, 21 Jul 2015 23:22:23 +0000 (23:22 +0000)]
Let the unwinder handle faults during function prologues or epilogues.
The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.
Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.
Mark Johnston [Tue, 21 Jul 2015 23:13:11 +0000 (23:13 +0000)]
Improve stack unwinding on i386 and amd64 after an IP fault.
If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed function is a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.
Mark Johnston [Tue, 21 Jul 2015 23:07:55 +0000 (23:07 +0000)]
Don't return undefined symbols to a DDB symbol lookup.
Undefined symbols have a value of zero, so it makes no sense to return
such a symbol when performing a lookup by value. This occurs for example
when unwinding the stack after calling a NULL function pointer, and we
confusingly report the faulting function as uart_sab82532_class() on
amd64.
Convert db_print_loc_and_inst() to only attempt disassembly if we managed
to find a symbol corresponding to the IP. Otherwise we may fault and
re-enter the debugger.
Mark Johnston [Tue, 21 Jul 2015 23:03:21 +0000 (23:03 +0000)]
Remove some dead code from DDB's amd64 stack unwinder.
The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.