ngie [Thu, 5 Nov 2015 07:48:48 +0000 (07:48 +0000)]
MFC r289913,r289916:
r289913:
Use 't' (bits) not 'i' (bytes) for describing MRIE (aka
"Method of Reporting Informational Exceptions") in the SCSI mode database as
the field described in X3T10/94-190 (revision 4; page 2, table 1) [1.] is
4 bits wide, not 4 bytes wide
Bug 200619
Reported by: Michael Baptist <mbaptist@isilon.com>
Submitted by: Lars Skodje <lskodje@isilon.com>
Sponsored by: EMC / Isilon Storage Division
r289916:
Limit RESOLUTION_MAX to INT_MAX, not UINT_MAX (all spelled out) so the
mode value isn't always clipped to -1 when (resolution * size) == 32, which
would have been the case with values => {4i,32b,32t}.
This seems to have been broken in r64382.
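The clipping effect described above can be illustrated with a minimal sketch. The helper names (field_max_old, field_max_new, clip_to_max) are hypothetical and are not taken from the camcontrol sources; the sketch assumes a 32-bit int and the common behaviour where UINT_MAX converted to int yields -1.

```c
#include <limits.h>

/*
 * Hypothetical helper: largest value for a field of `bits` bits (1..32),
 * returned as an int, using UINT_MAX for the 32-bit case (old behaviour).
 */
int
field_max_old(int bits)
{
	if (bits == 32)
		return ((int)UINT_MAX);	/* becomes -1 on common platforms */
	return ((int)((1u << bits) - 1u));
}

/* Same helper, but limited to INT_MAX for the 32-bit case (new behaviour). */
int
field_max_new(int bits)
{
	if (bits == 32)
		return (INT_MAX);
	return ((int)((1u << bits) - 1u));
}

/* Clip a value against a maximum, as a signed comparison. */
int
clip_to_max(int value, int max)
{
	return (value > max ? max : value);
}
```

With the old maximum, any non-negative value of a 32-bit field compares greater than -1 and is clipped to -1; with INT_MAX the value passes through unchanged.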
PR: 200619
Reported by: Michael Baptist
Submitted by: Lars Skodje
Sponsored by: EMC / Isilon Storage Division
hrs [Wed, 4 Nov 2015 01:00:42 +0000 (01:00 +0000)]
MFC r288600:
- Schedule DAD for IN6_IFF_TENTATIVE addresses in nd6_timer(). This
catches cases where DAD probes cannot be sent because of
IFF_UP && !IFF_DRV_RUNNING.
- nd6_dad_starttimer() now calls nd6_dad_ns_output(), instead of
calling it before nd6_dad_starttimer().
- Do not release an entry in dadq when a duplicate entry is being
added.
hselasky [Tue, 3 Nov 2015 10:24:54 +0000 (10:24 +0000)]
MFC r285914, r289029 and r289560:
- Move the remainder of host controller capability registers reading from
xhci_start_controller() to xhci_init(). These values don't change at
run-time so there's no point in acquiring them on every USB_HW_POWER_RESUME
instead of only once during initialization. In r276717, reading the first
couple of registers in question already had been moved as a prerequisite
for the changes in that revision.
- Identify ASMedia ASM1042A controllers.
- Use NULL instead of 0 for pointers.
- Add quirks for USB 3.0 PCI devices.
kib [Tue, 3 Nov 2015 08:31:01 +0000 (08:31 +0000)]
MFC r289660,r289664:
Do not allow ptrace(PT_TRACE_ME) to be executed when the process is
already traced or when there is no parent which can trace the process.
dteske [Mon, 2 Nov 2015 21:46:58 +0000 (21:46 +0000)]
MFC r287696:
The <arch>/mkisoimages.sh script in release knows how to add
extra bits from an "xtra-bits-dir". This feature is unusable
from release/Makefile. Add an XTRADIR setting to use it.
MFC r287697: Whitespace alignment
wollman [Fri, 30 Oct 2015 19:26:55 +0000 (19:26 +0000)]
Long-overdue MFC of r280930:
Fix overflow bugs in and remove obsolete limit from kernel RPC
implementation.
The kernel RPC code, which is responsible for the low-level scheduling
of incoming NFS requests, contains a throttling mechanism that
prevents too much kernel memory from being tied up by NFS requests
that are being serviced. When the throttle is engaged, the RPC layer
stops servicing incoming NFS sockets, resulting ultimately in
backpressure on the clients (if they're using TCP). However, this is
a very heavy-handed mechanism as it prevents all clients from making
any requests, regardless of how heavy or light they are. (Thus, when
engaged, the throttle often prevents clients from even mounting the
filesystem.) The throttle mechanism applies specifically to requests
that have been received by the RPC layer (from a TCP or UDP socket)
and are queued waiting to be serviced by one of the nfsd threads; it
does not limit the amount of backlog in the socket buffers.
The original implementation limited the total bytes of queued requests
to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB.
The former limit seems reasonable, since requests queued in the socket
buffers and replies being constructed to the requests in progress will
all require some amount of network memory, but the 45 MiB limit is
plainly ridiculous for modern memory sizes: when running 256 service
threads on a busy server, 45 MiB would result in just a single
maximum-sized NFS3PROC_WRITE queued per thread before throttling.
Removing this limit exposed integer-overflow bugs in the original
computation, and related bugs in the routines that actually account
for the amount of traffic enqueued for service threads. The old
implementation also attempted to reduce accounting overhead by
batching updates until each queue is fully drained, but this is prone
to livelock, resulting in repeated accumulate-throttle-drain cycles on
a busy server. Various data types are changed to long or unsigned
long; explicit 64-bit types are not used due to the unavailability of
64-bit atomics on many 32-bit platforms, but those platforms also
cannot support nmbclusters large enough to cause overflow.
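The limit computation and the overflow condition can be sketched roughly as follows. This is an illustration only, not the kernel RPC code: the function names and the SKETCH_* constants are hypothetical, and a 2048-byte cluster size is assumed for MCLBYTES.

```c
#include <limits.h>

#define	SKETCH_MCLBYTES	2048UL			/* assumed cluster size */
#define	SKETCH_OLD_CAP	(45UL * 1024 * 1024)	/* the removed 45 MiB cap */

/* True if nmbclusters * MCLBYTES would not fit in a signed int. */
int
product_overflows_int(unsigned long nmbclusters)
{
	return (nmbclusters > (unsigned long)INT_MAX / SKETCH_MCLBYTES);
}

/* Old-style limit: minimum of a quarter of cluster memory and 45 MiB. */
unsigned long
request_limit_old(unsigned long nmbclusters)
{
	unsigned long quarter = nmbclusters * SKETCH_MCLBYTES / 4;

	return (quarter < SKETCH_OLD_CAP ? quarter : SKETCH_OLD_CAP);
}

/* New-style limit: a quarter of cluster memory, computed in unsigned long. */
unsigned long
request_limit_new(unsigned long nmbclusters)
{
	return (nmbclusters * SKETCH_MCLBYTES / 4);
}
```

The unsigned long arithmetic can itself wrap on a 32-bit platform for very large inputs; the message above argues that such platforms cannot support an nmbclusters value that large.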
This code (in a 10.1 kernel) is presently running on production NFS
servers at CSAIL.
Summary of this revision:
* Removes 45 MiB limit on requests queued for nfsd service threads
* Fixes integer-overflow and signedness bugs
* Avoids unnecessary throttling by not deferring accounting for
completed requests
delphij [Thu, 29 Oct 2015 17:00:51 +0000 (17:00 +0000)]
MFC r289038,r289041:
Add encoding for mime-types.
Fix short month names and replace %b with %_m in date_fmt for Chinese
locales.
When using a Chinese locale, such as zh_TW.UTF-8 or zh_CN.UTF-8,
nl_langinfo(ABMON_*) only returned numbers. For instance,
nl_langinfo(ABMON_1) returns 1, nl_langinfo(ABMON_2) returns 2, and
so on.
This causes problems in applications that put the short month name
and the day of the month together. For example, 'Apr 14' in English
becomes '414日' in Chinese on the top bar of GNOME Shell.
This problem may be resolved by appending '月' to all short month
names and replacing %b with %_m in date_fmt. ja_JP.UTF-8 already
does this, and this matches the en_US.ISO8859-1 behavior, which
returns 'Oct'. The GNU C Library also returns values with '月'
appended.
PR: 199441
Submitted by: Ting-Wei Lan <lantw44 gmail com>
hiren [Thu, 29 Oct 2015 00:36:10 +0000 (00:36 +0000)]
MFC r289293:
Fix an unnecessarily aggressive behavior where mtu clamping begins on first
retransmission timeout (rto) when blackhole detection is enabled. Make
sure it only happens when the second attempt to send the same segment also fails
with rto.
Also make sure that each mtu probing stage (usually 1448 -> 1188 -> 524) follows
the same pattern and gets 2 chances (rto) before further clamping down.
Note: RFC4821 doesn't specify implementation details on how this situation
should be handled.
jhb [Thu, 29 Oct 2015 00:18:03 +0000 (00:18 +0000)]
MFC 278582:
MFi386:
When building some of the boot loaders with clang, and DEBUG_FLAGS or
CFLAGS having '-g' in it, clang outputs several assembly directives that
are too new for our version of binutils.
Therefore, assemble the resulting .s files with clang instead. A more
general solution can be implemented when a GNU as-compatible driver for
clang's integrated assembler appears.
gjb [Wed, 28 Oct 2015 13:30:14 +0000 (13:30 +0000)]
MFC r262957, r267591, r289634:
r262957 (marcel):
Change the terminal type/class for enabled serial lines to 3wire. This
allows us to change the uart(4) driver to not hardcode specific line
settings for the serial console.
A terminal type of 3wire makes sure the console still works when no DCD
signal is present, which preserves behaviour. When it is known that the
terminal server (or DCE in general) provides DCD, a terminal type/class
of std can be used. This has the effect of being logged out when one
disconnects from the console -- improving security overall.
r267591 (grehan):
Convert the potential console port over to using 3wire, for i386/amd64.
r289634:
Enable all callin ttys if the tty is an available console.
jhb [Tue, 27 Oct 2015 17:00:04 +0000 (17:00 +0000)]
MFC 271389,286330,286331,286358,286378,286380,286381,286383,286388,286848,
286849,286857,286860,286913,286914,286937-286940,286962,286963,288405,
288406,288424,288454-288456,288625,288626,288832,288834,288950,288997,
289080:
Merge most of the recent changes to truss in HEAD. The largest effects
are that fork following now uses a single truss process (and thus truss -c
reports counts for the entire tree of processes instead of separate dumps
for each process). truss -c also reports counts for all system calls
instead of only a subset. More system call arguments are also decoded.
System calls should now report the correct number of arguments (instead
of 6), and some platforms that did not properly decode arguments might
now do so (e.g. mips64).
Changes relative to the equivalent commits to HEAD include:
- The ia64 backend was refactored similar to the other backends.
- _umtx_lock/_umtx_unlock entries were updated similar to other system
call entries.
- 10 does not have futimens(), utimensat(), EVFILT_PROCDESC, EVFILT_SENDFILE,
RLIMIT_KQUEUES, O_VERIFY, NOTE_FILE_POLL, or EV_FORCEONESHOT.
271389:
Stop accessing the saved stack pointer by looking past the end of the
array of registers.
286330:
Whitespace fix: remove some spurious spaces before commas.
286331:
Rework get_string() to make it more robust when fetching strings of unknown
length. In particular, instead of blindly fetching 1k blocks, do an initial
fetch up to the end of the current page followed by page-sized fetches up to
the maximum size. Previously if the 1k buffer crossed a page boundary and
the second page was not valid, the entire operation would fail.
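The page-bounded fetch sizes can be sketched as below. The function name next_fetch_size is hypothetical and this is not the truss implementation; it only shows how a fetch can be sized so that it never crosses a page boundary, assuming the page size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Size of the next fetch starting at `addr`: up to the end of the page
 * containing `addr`, but never more than `remaining` bytes.
 * `pagesize` must be a power of two.
 */
size_t
next_fetch_size(uintptr_t addr, size_t remaining, size_t pagesize)
{
	size_t to_page_end = pagesize - (size_t)(addr & (pagesize - 1));

	return (to_page_end < remaining ? to_page_end : remaining);
}
```

For an unaligned start address the first call returns only the bytes left in the current page; once the address is page-aligned, each later call returns a full page until the maximum size is reached.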
286358:
Add recently added values of various flags and enumerations including
kevent filters, kevent flags, flags to mmap, seek locations, fcntl
operations, file flags, socket domains, open flags, resource limits, and
pathconf values.
286378:
Don't mark the fcntl flag argument as an output parameter so that it is
always decoded. Previously the argument was not decoded if fcntl() failed.
286380:
Decode the arguments to mkfifo() and fix an off-by-one error in the arguments
to mknod().
286381:
Decode the arguments passed to the *at() family of system calls. This is
especially useful now that libc's open() always calls openat(). While here,
fix a few other things:
- Decode the mode argument passed to access(), eaccess(), and faccessat().
- Decode the atfd parameter to pretty-print AT_FDCWD.
- Decode the special AT_* flags used with some of the *at() system calls.
- Decode arguments for fchmod(), lchmod(), fchown(), lchown(), eaccess(),
and futimens().
- Decode both of the timeval structures passed to futimes() instead of just
the first one.
286383:
Whitespace fixes to consistently use spaces before }'s and
wrap long lines.
286388:
Consistently use both leading and trailing spaces inside of the {}'s
when pretty-printing structures. Most structures used both spaces,
but some only used a trailing space and some used neither.
286848:
- Decode the arguments for several signal-related system calls: sigpending,
sigqueue, sigreturn, sigsuspend, sigtimedwait, sigwait, sigwaitinfo, and
thr_kill.
- Print signal sets as a structure (with {}'s) and in particular use this to
differentiate empty sets from a NULL pointer.
- Decode arguments for some other system calls: issetugid, pipe2, sysarch
(operations are only decoded for amd64 and i386), and thr_self.
286849:
Decode the optional SOCK_NONBLOCK and SOCK_CLOEXEC flags passed in a
socket type.
286857:
Tidy the linux_socketcall decoding:
- Don't exit if get_struct() fails, instead print the raw pointer value to
match all other argument decoding cases.
- Use an xlat table instead of a home-rolled switch for the operation name.
- Display the nested socketcall args structure as a structure instead of as
two inline arguments.
286860:
Use an xlat table and xlookup() instead of a home-rolled version for the
sigprocmask operation type.
286913:
Change the argument formatting function to use a stdio FILE object opened
with open_memstream() to build the string for each argument. This allows
for more complicated argument building without resorting to intermediate
malloc's, etc.
Related, the strsig*() functions no longer return allocated strings but
use a static global buffer instead.
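A minimal sketch of building a string through a stdio stream from open_memstream() follows. The function format_fd_pair and its output format are hypothetical and are not truss code; open_memstream() itself is the POSIX.1-2008 interface.

```c
#define	_POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/*
 * Build a string such as "{ 3, 4 }" via a memory stream.
 * Returns a malloc'd string that the caller must free, or NULL on failure.
 */
char *
format_fd_pair(int a, int b)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *fp;

	fp = open_memstream(&buf, &len);
	if (fp == NULL)
		return (NULL);
	fprintf(fp, "{ %d, %d }", a, b);
	/* fclose() flushes the stream and finalizes buf and len. */
	if (fclose(fp) != 0) {
		free(buf);
		return (NULL);
	}
	return (buf);
}
```

Because the stream grows its buffer as needed, more pieces can be appended with further fprintf() calls without sizing or reallocating intermediate buffers by hand.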
286914:
Expand the decoding of kevent structures.
- Print the ident value as decimal instead of hexadecimal for filter types
that use "small" values such as file descriptors and PIDs.
- Decode NOTE_* flags in the fflags field of kevents for several system
filter types.
286937:
Use nitems().
286938:
Various style and whitespace fixes.
286939:
Always use %j with an intmax_t cast to print time_t values. time_t is
longer than long on 32-bit platforms with a 64-bit time_t.
286940:
ino_t is unsigned, so use uintmax_t instead of intmax_t when printing it.
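The two printing conventions above can be sketched as follows. The helper names format_time and format_ino are hypothetical; the %jd and %ju conversions with intmax_t and uintmax_t casts are standard C99.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Print a time_t portably: cast to intmax_t and use %jd. */
int
format_time(char *buf, size_t len, time_t t)
{
	return (snprintf(buf, len, "%jd", (intmax_t)t));
}

/* Print an ino_t portably: it is unsigned, so cast to uintmax_t and use %ju. */
int
format_ino(char *buf, size_t len, ino_t ino)
{
	return (snprintf(buf, len, "%ju", (uintmax_t)ino));
}
```

The casts keep the format string and the argument type in agreement regardless of how wide time_t or ino_t is on a given platform.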
286962:
Rework the argv and env string fetching for execve to be more robust.
Previously, truss would fetch 100 string pointers and happily walk off the end
of the array if it never found a NULL. This also means for a short argv
list it could fail entirely if the 100 string pointers spanned into an
unmapped page.
Instead, fetch page-aligned blocks of string pointers in a loop fetching
each string until a NULL is found.
While here, make use of the open memstream file descriptor instead of
allocating a temporary array. This allows us to fetch each string once
instead of twice.
286963:
Handle the conditional decoding of execve() argument and environment
arrays generically rather than duplicating a hack in all of the backends.
- Add two new system call argument types and use them instead of StringArray
for the argument and environment arguments to execve and linux_execve.
- Honor the -a/-e flags in the handling of these new types.
- Instead of printing "<missing argument>" when the decoding is disabled,
print the raw pointer value.
288424:
Several changes to truss.
- Refactor the interface between the ABI-independent code and the
ABI-specific backends. The backends now provide smaller hooks to
fetch system call arguments and return values. The rest of the
system call entry and exit handling that was previously duplicated
among all the backends has been moved to one place.
- Merge the loop when waiting for an event with the loop for handling stops.
This also means not emulating a procfs-like interface on top of ptrace().
Instead, use a single event loop that fetches process events via waitid().
Among other things this allows us to report the full 32-bit exit value.
- Use PT_FOLLOW_FORK to follow new child processes instead of forking a new
truss process for each new child. This allows one truss process to monitor
a tree of processes and truss -c should now display one total for the
entire tree instead of separate summaries per process.
- Use the recently added fields to ptrace_lwpinfo to determine the current
system call number and argument count. The latter is especially useful
and fixes a regression since the conversion from procfs. truss now
generally prints the correct number of arguments for most system calls
rather than printing extra arguments for any call not listed in the
table in syscalls.c.
- Actually check the new ABI when processes call exec. The comments claimed
that this happened but it was not being done (perhaps this was another
regression in the conversion to ptrace()). If the new ABI after exec
is not supported, truss detaches from the process. If truss does not
support the ABI for a newly executed process the process is killed
before it returns from exec.
- Along with the refactor, teach the various ABI-specific backends to
fetch both return values, not just the first. Use this to properly
report the full 64-bit return value from lseek(). In addition, the
handler for "pipe" now pulls the pair of descriptors out of the
return values (which is the true kernel system call interface) but
displays them as an argument (which matches the interface exported by
libc).
- Each ABI handler adds entries to a linker set rather than requiring
a statically defined table of handlers in main.c.
- The arm and mips system call fetching code was changed to follow the
same pattern as amd64 (and the in-kernel handler) of fetching register
arguments first and then reading any remaining arguments from the
stack. This should fix indirect system call arguments on at least
arm.
- The mipsn32 and n64 ABIs will now look for arguments in A4 through A7.
- Use register %ebp for the 6th system call argument for Linux/i386 ABIs
to match the in-kernel argument fetch code.
- For powerpc binaries on a powerpc64 system, fetch the extra arguments
on the stack as 32-bit values that are then copied into the 64-bit
argument array instead of reading the 32-bit values directly into the
64-bit array.
288454:
- Remove extra integer argument from truncate() and ftruncate(). This is
probably fallout from the removal of the extra padding argument before
off_t in 7. However, that padding still exists for 32-bit powerpc, so
use QUAD_ALIGN.
- Fix QUAD_ALIGN to be zero for powerpc64. It should only be set to 1
for 32-bit platforms that add padding to align 64-bit arguments.
288455:
The id_t type used to pass IDs to wait6(2) and procctl(2) is a 64-bit
integer. Fix the argument decoding to treat this as a quad instead of an
int. This includes using QUAD_ALIGN and QUAD_SLOTS as necessary. To
continue printing IDs in decimal, add a new QuadHex argument type that
prints a 64-bit integer in hex, use QuadHex for the existing off_t arguments,
repurpose Quad to print a 64-bit integer in decimal, and use Quad for id_t
arguments.
This fixes the decoding of wait6(2) and procctl(2) on 32-bit platforms.
288456:
Rather than groveling around in a socket address structure for a socket
address's length (and then overriding it if it "looks wrong"), use the
next argument to the system call to determine the length. This is more
reliable since this is what the kernel depends on anyway and is also
simpler.
288625:
Add decoding for struct statfs.
288626:
Style fix.
288832:
Fix tracking of unknown syscalls for 'truss -c'.
This is done by changing get_syscall() to either lookup the known syscall
or add it into the list with the default handlers for printing.
This also simplifies some code to not have to check if the syscall variable
is set or NULL.
288834:
Add decoding for modfind(2)
288950:
Group the decoded system calls by ABI and sort the calls within each ABI.
288997:
Correct a comment.
289080:
Let -c imply -S (hide signal output).
Without this, the signals are shown seemingly randomly in the output before
the final summary is shown. This is especially noticeable when there is
not much output from the application being traced.
delphij [Tue, 27 Oct 2015 00:37:19 +0000 (00:37 +0000)]
MFC r289269:
Use chroot(2) instead of using prefixes for files.
Previously, the code prefixed the chroot path to actual file paths to
simulate the effect. This, however, will not work for tzset(3), which
expects the current system to have a working set of timezone data files,
and that is not always the case.
This changeset simplifies the handling of paths and uses an actual
chroot(2) call to implement the effect.
vangyzen [Mon, 26 Oct 2015 16:21:56 +0000 (16:21 +0000)]
Disable SSE in libthr
Clang emits SSE instructions on amd64 in the common path of
pthread_mutex_unlock. If the thread does not otherwise use SSE,
this usage incurs a context-switch of the FPU/SSE state, which
reduces the performance of multiple real-world applications by a
non-trivial amount (3-5% in one application).
Instead of this change, I experimented with eagerly switching the
FPU state at context-switch time. This did not help. Most of the
cost seems to be in the read/write of memory--as kib@ stated--and
not in the #NM handling. I tested on machines with and without
XSAVEOPT.
One counter-argument to this change is that most applications already
use SIMD, and the number of applications and amount of SIMD usage
are only increasing. This is absolutely true. I agree that--in
general and in principle--this change is in the wrong direction.
However, there are applications that do not use enough SSE to offset
the extra context-switch cost. SSE does not provide a clear benefit
in the current libthr code with the current compiler, but it does
provide a clear loss in some cases. Therefore, disabling SSE in
libthr is a non-loss for most, and a gain for some.
I refrained from disabling SSE in libc--as was suggested--because
I can't make the above argument for libc. It provides a wide variety
of code; each case should be analyzed separately.
ngie [Mon, 26 Oct 2015 00:08:40 +0000 (00:08 +0000)]
MFC r289450:
Set dev->fd to -1 when calling cam_close_spec_device with a valid dev->fd
descriptor to avoid trashing valid file descriptors that access dev->fd at a
later point in time
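The idea of invalidating the descriptor after closing it can be sketched as below. The structure and function names (sketch_dev, sketch_close_dev) are hypothetical stand-ins and are not the libcam definitions.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a device handle that owns a descriptor. */
struct sketch_dev {
	int fd;
};

/*
 * Close the descriptor if it is valid and mark it invalid afterwards,
 * so a later call cannot close an unrelated descriptor that happens to
 * have been assigned the same number in the meantime.
 */
void
sketch_close_dev(struct sketch_dev *dev)
{
	if (dev == NULL)
		return;
	if (dev->fd >= 0) {
		close(dev->fd);
		dev->fd = -1;
	}
}
```

With the field reset to -1, a second call on the same handle does nothing.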
ngie [Mon, 26 Oct 2015 00:06:04 +0000 (00:06 +0000)]
MFC r289332:
Fix test-fenv:test_dfl_env when run on some amd64 CPUs
Compare the fields that the AMD [1] and Intel [2] specs say will be
set once fnstenv returns.
Not all amd64 capable processors zero out the env.__x87.__other field
(example: AMD Opteron 6308). The AMD64/x64 specs aren't explicit on what the
env.__x87.__other field will contain after fnstenv is executed, so the values
in env.__x87.__other could be filled with arbitrary data depending on the
CPU-specific implementation of fnstenv.
Skip the B_flag testcase to stop blowing up freebsd-current@ with
"test failure emails" because kyua report-jenkins doesn't properly
escape non-printable chars
r288678:
Merge additional testcases and improvements to bin/ls/ls_tests from
^/user/ngie/more-tests.
- Additional testcases added:
-- ls -D
-- ls -F
-- ls -H
-- ls -L
-- ls -R
-- ls -S
-- ls -T
-- ls -b
-- ls -d
-- ls -f
-- ls -g
-- ls -h
-- ls -i
-- ls -k
-- ls -l
-- ls -m
-- ls -n
-- ls -o
-- ls -p
-- ls -q/ls -w
-- ls -r
-- ls -s
-- ls -t
-- ls -u
-- ls -y
- Socket file creation is limited to the ls -F testcase, greatly speeding up
the test process
- The ls -C testcase was made more robust by limiting the number of columns
via COLUMNS and by dynamically formulating the columns/lines.
- Add `atf_test_case` before all testcase `head` functions.
Sponsored by: EMC / Isilon Storage Division
r288905:
Add some more syncs to quiesce the filesystem after creating the
files to see if this fixes deterministic Jenkins failures
r288906:
Explicitly set BLOCKSIZE to 512 in the environment
r288907:
Call sync consistently using atf_check
Remove superfluous sync's
r289102:
Remove all of the syncs
They're unnecessary as shown by further testing on my VM
jilles [Sun, 25 Oct 2015 21:39:23 +0000 (21:39 +0000)]
MFC r288309: fnmatch(): Remove exponential behaviour as in sh r229201.
The old code was exponential in the number of asterisks in the pattern.
However, once a match has been found up to the next asterisk, the previous
asterisks are no longer relevant.
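The observation above is what allows a matcher to keep only a single backtrack point. The following is a minimal sketch of that technique for a reduced pattern language (literal characters, '?' and '*' only); star_match is a hypothetical function and not the libc fnmatch() code, which also handles brackets, escapes and flags.

```c
#include <stddef.h>

/*
 * Match `str` against `pat`, where '?' matches any single character and
 * '*' matches any run of characters. Only the most recent '*' is
 * remembered for backtracking, so earlier asterisks are never revisited.
 * Returns 1 on match, 0 otherwise.
 */
int
star_match(const char *pat, const char *str)
{
	const char *star_pat = NULL;	/* pattern position after last '*' */
	const char *star_str = NULL;	/* string position to retry from */

	while (*str != '\0') {
		if (*pat == '*') {
			/* Remember this asterisk; first try matching nothing. */
			star_pat = ++pat;
			star_str = str;
		} else if (*pat == '?' || *pat == *str) {
			pat++;
			str++;
		} else if (star_pat != NULL) {
			/* Let the last asterisk absorb one more character. */
			pat = star_pat;
			str = ++star_str;
		} else
			return (0);
	}
	/* Trailing asterisks match the empty remainder. */
	while (*pat == '*')
		pat++;
	return (*pat == '\0');
}
```

In this sketch the work is bounded by roughly the product of the pattern and string lengths, rather than growing exponentially with the number of asterisks.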
jilles [Sun, 25 Oct 2015 17:17:50 +0000 (17:17 +0000)]
MFC r288430: wordexp: Rewrite to make WRDE_NOCMD reliable.
Shell syntax is too complicated to detect command substitution and unquoted
operators reliably without implementing much of sh's parser. Therefore, have
sh do this detection.
While changing sh's support anyway, also read input from a pipe instead of
arguments to avoid {ARG_MAX} limits and improve privacy, and output count
and length using 16 instead of 8 digits.
The basic concept is:
execl("/bin/sh", "sh", "-c", "freebsd_wordexp ${1:+\"$1\"} -f \"$2\"",
"", flags & WRDE_NOCMD ? "-p" : "", <pipe with words>);
The WRDE_BADCHAR error is still implemented in libc. POSIX requires us to
fail strings containing unquoted braces with code WRDE_BADCHAR. Since this
is normally not a syntax error in sh, there is still a need for checking
code in libc, we_check().
The new we_check() is an optimistic check that all the characters
<newline> | & ; < > ( ) { }
are quoted. To avoid duplicating too much sh logic, such characters are
permitted when quoting characters are seen, even if the quoting characters
may themselves be quoted. This code reports all WRDE_BADCHAR errors; bad
characters that get past it and are a syntax error in sh return WRDE_SYNTAX.
Although many implementations of WRDE_NOCMD erroneously allow some command
substitutions (and ours even documented this), there appears to be code that
relies on its security (codesearch.debian.net shows quite a few uses).
Passing untrusted data to wordexp() still exposes a denial of service
possibility and a fairly large attack surface.
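From the caller's side, relying on WRDE_NOCMD looks roughly like the sketch below. The wrapper count_words_nocmd is hypothetical; wordexp(), wordfree() and WRDE_NOCMD are the POSIX interfaces. The sketch treats every non-zero return as a rejection rather than distinguishing WRDE_CMDSUB, WRDE_BADCHAR and WRDE_SYNTAX.

```c
#include <stddef.h>
#include <wordexp.h>

/*
 * Expand `input` with command substitution disabled.
 * Returns the number of resulting words, or -1 if wordexp() reports
 * any error (for example WRDE_CMDSUB for a command substitution).
 */
int
count_words_nocmd(const char *input)
{
	wordexp_t we;
	int error, n;

	error = wordexp(input, &we, WRDE_NOCMD);
	if (error != 0)
		return (-1);
	n = (int)we.we_wordc;
	wordfree(&we);
	return (n);
}
```

As the paragraph above notes, even with WRDE_NOCMD enforced, passing untrusted input still exposes the shell's expansion machinery, so callers should treat this as a mitigation rather than a sandbox.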
This is also a MFC of r286830 to reduce conflicts. I changed the code
somewhat to avoid changes from r286941; in particular, WRDE_BADVAL can still
only be returned if WRDE_UNDEF was passed.
Relnotes: yes
Security: fixes command execution with wordexp(untrusted, WRDE_NOCMD)