landonf [Sat, 27 Aug 2016 00:06:20 +0000 (00:06 +0000)]
[mips/broadcom]: Replace static frequency table with generic PMU clock
handling.
- Extended PWRCTL/PMU APIs to support querying clock frequency during very
early boot, prior to bus attach.
- Implement generic PMU-based calculation of UART rclk values.
- Replaced use of static frequency tables (bcm_socinfo) with
runtime-determined values.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7552
landonf [Sat, 27 Aug 2016 00:03:02 +0000 (00:03 +0000)]
bhnd(4): Initial PMU/PWRCTL power and clock management support.
- Added bhnd_pmu driver implementations for PMU and PWRCTL chipsets,
derived from Broadcom's ISC-licensed HND code.
- Added bhnd bus-level support for routing per-core clock and resource
power requests to the PMU device.
- Lift ChipCommon support out into the bhnd module, dropping
bhnd_chipc.
Reviewed by: mizhka
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7492
mm [Fri, 26 Aug 2016 23:50:44 +0000 (23:50 +0000)]
MFV r304866:
Sync libarchive with vendor including security fixes
Vendor issues fixed:
Issue #731: Reject tar entries >= INT64_MAX
Issue #744 (part of Issue #743): Enforce sandbox with very long pathnames
Issue #748: Zip decompression failure with highly-compressed data
Issue #767: Buffer overflow printing a filename
Issue #770: Zip read: be more careful about extra_length
ed [Fri, 26 Aug 2016 20:23:10 +0000 (20:23 +0000)]
Improve compatibility of calls to dirname() on constant strings.
As the xinstall(8) utility had to be patched up to work with the POSIXly
correct basename()/dirname() prototypes, we make it pretty hard to build
previous versions of FreeBSD on HEAD. xinstall(8) is part of the
bootstrap tools.
Add some logic to <libgen.h> to automatically detect bad calls to
dirname() based on the type of the argument. If the argument is of type
'const char *', we simply fall back to calling into dirname@FBSD_1.0
directly.
I'll also give basename() similar treatment when importing the
thread-safe version of that function.
landonf [Fri, 26 Aug 2016 20:16:02 +0000 (20:16 +0000)]
[mips/broadcom] Generic platform_reset() support.
This adds support for performing platform_reset() on all supported
devices, using early boot enumeration of chipc capabilities and
available cores.
- Added Broadcom-specific MIPS CP0 register definitions used by
BCM4785-specific reset handling.
- Added a bcm_platform structure for tracking chipc/pmu/cfe platform
data.
- Extended the BCMA EROM API to support early boot lookup of core info
(including port/region mappings).
- Extended platform_reset() to support PMU, PMU+AOB, and non-PMU
devices.
Reviewed by: mizhka
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7539
jhb [Fri, 26 Aug 2016 20:15:22 +0000 (20:15 +0000)]
Enable I/O MMU when PCI pass through is first used.
Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest. If the I/O MMU fails to initialize or is not
present, than fail the attempt to pass a PCI device through to a
guest.
The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot. However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.
np [Fri, 26 Aug 2016 17:38:13 +0000 (17:38 +0000)]
cxgbe/iw_cxgbe: Various fixes to the iWARP driver.
- Return appropriate error code instead of ENOMEM when sosend() fails in
send_mpa_req.
- Fix for problematic race during destroy_qp.
- Abortive close in the failure of send_mpa_reject() instead of normal close.
- Remove the unnecessary doorbell flowcontrol logic.
jhibbits [Fri, 26 Aug 2016 03:36:37 +0000 (03:36 +0000)]
Prevent BSS from being cleared twice on BookE
Summary:
First time BSS is cleared in booke_init(), Second time it's cleared in
powerpc_init(). Any variable initialized between two those guys gets wiped out
what is wrong. In particular it wipes tlb1_entries initialized by tlb1_init(),
which was fine when tlb1_init() was called a second time, but this was removed
in r304656.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7638
np [Thu, 25 Aug 2016 21:55:17 +0000 (21:55 +0000)]
cxgbe/cxgbei: Read the chip's configuration to determine the actual
hardware send and receive PDU limits. Report these limits to ICL and
take them into account when setting the socket's send and receive buffer
sizes. The driver used a single hardcoded limit everywhere prior to
this change.
ache [Thu, 25 Aug 2016 21:14:26 +0000 (21:14 +0000)]
Original fgetln() from 44lite return sucess for line tail errors,
i.e. partial line, but set __SERR and errno in the same time, which
is inconsistent.
Now both OpenBSD and NetBSD return failure, i.e. no line and set error
indicators for such case, so make our fgetln() and fgetwln()
(as its wide version) compatible with the rest of *BSD.
kp [Thu, 25 Aug 2016 19:40:25 +0000 (19:40 +0000)]
Add libifc, a library implementing core functionality that exists in ifconfig(8) today.
libifc (pronounced lib-ifconfig) aims to be a light abstraction layer between
programs and the kernel APIs for managing the network configuration.
This should hopefully make programs easier to maintain, and reduce code
duplication.
Work will begin on making ifconfig(8) use this library in the near future.
This code is still evolving. The interface should not be considered stable until
it is announced as such.
Submitted By: Marie Helene Kvello-Aune <marieheleneka@gmail.com>
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D7529
kib [Thu, 25 Aug 2016 19:15:02 +0000 (19:15 +0000)]
In both do_rw_wrlock() and do_rw_rdlock() after r304808, do not
obliterate possible error from sleep with errors from
umtxq_check_susp(), when looping to clear URWLOCK_{READ,WRITE}_WAITERS.
Noted and reviewed by: vangyzen
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ache [Thu, 25 Aug 2016 17:13:04 +0000 (17:13 +0000)]
Don't check for __SERR which may stick from one of any previous stdio
functions.
__SERR is for user and the rest of stdio code do not check it
for error sensing internally, only set it.
In vf(w)printf.c here it is more easy to save __SERR, clear and restore it.
ngie [Thu, 25 Aug 2016 17:07:43 +0000 (17:07 +0000)]
Add non-TRUSTEDBSD prefixed knobs for the _PC_ACL* and {CAP,INF,MAC}_PRESENT knobs
It's not necessarily intuitive that the variables to query contain TRUSTEDBSD
in the prefix. Add non-TRUSTEDBSD prefixed knobs for querying things like
"_PC_ACL_NFS4".
kib [Thu, 25 Aug 2016 16:35:42 +0000 (16:35 +0000)]
Prevent leak of URWLOCK_READ_WAITERS flag for urwlocks.
If there was some error, e.g. the sleep was interrupted, as in the
referenced PR, do_rw_rdlock() did not cleared URWLOCK_READ_WAITERS.
Since unlock only wakes up write waiters when there is no read
waiters, for URWLOCK_PREFER_READER kind of locks, the result was
missed wakeups for writers.
In particular, the most visible victims are ld-elf.so locks in
processes which loaded libthr, because rtld locks are urwlocks in
prefer-reader mode. Normal rwlocks fall into prefer-reader mode only
if thread already owns rw lock in read mode, which is not typical and
correspondingly less visible. In the PR, unowned rtld bind lock was
waited for in the process where only one thread was left alive.
Note that do_rw_wrlock() correctly clears URWLOCK_WRITE_WAITERS in
case of errors.
Reported and tested by: longwitz@incore.de
PR: 211947
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
bde [Thu, 25 Aug 2016 13:46:52 +0000 (13:46 +0000)]
Less-quick fix for locking fixes in r172250. r172250 added a second
syscons spinlock for the output routine alone. It is better to extend
the coverage of the first syscons spinlock added in r162285. 2 locks
might work with complicated juggling, but no juggling was done. What
the 2 locks actually did was to cover some of the missing locking in
each other and deadlock less often against each other than a single
lock with larger coverage would against itself. Races are preferable
to deadlocks here, but 2 locks are still worse since they are harder
to understand and fix.
Prefer deadlocks to races and merge the second lock into the first one.
Extend the scope of the spinlocking to all of sc_cnputc() instead of
just the sc_puts() part. This further prefers deadlocks to races.
Extend the kdb_active hack from sc_puts() internals for the second lock
to all spinlocking. This reduces deadlocks much more than the other
changes increases them. The s/p,10* test in ddb gets much further now.
Hide this detail in the SC_VIDEO_LOCK() macro. Add namespace pollution
in 1 nested #include and reduce namespace pollution in other nested
#includes to pay for this.
Move the first lock higher in the witness order. The second lock was
unnaturally low and the first lock was unnaturally high. The second
lock had to be above "sleepq chain" and/or "callout" to avoid spurious
LORs for visual bells in sc_puts(). Other console driver locks are
already even higher (but not adjacent like they should be) except when
they are missing from the table. Audio bells also benefit from the
syscons lock being high so that audio mutexes have chance of being
lower. Otherwise, console drviver locks should be as low as possible.
Non-spurious LORs now occur if the bell code calls printf() or is
interrupted (perhaps by an NMI) and the interrupt handler calls
printf(). Previous commits turned off many bells in console i/o but
missed ones done by the teken layer.
lstewart [Thu, 25 Aug 2016 13:33:32 +0000 (13:33 +0000)]
Pass the number of segments coalesced by LRO up the stack by repurposing the
tso_segsz pkthdr field during RX processing, and use the information in TCP for
more correct accounting and as a congestion control input. This is only a start,
and an audit of other uses for the data is left as future work.
cy [Thu, 25 Aug 2016 13:24:11 +0000 (13:24 +0000)]
Remove the gratuitous check for $FreeBSD$ and rename the function
to ntpd_init_leapfile, to ensure a copy exists in /var/db if a copy
isn't already there.
andrew [Thu, 25 Aug 2016 12:42:41 +0000 (12:42 +0000)]
Don't set *dev in the zfs root case, it may be NULL and will correctly be
set later in the function. This fixes a potential NULL pointer dereference
found on arm64.
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
andrew [Thu, 25 Aug 2016 10:53:03 +0000 (10:53 +0000)]
Map coherent memory in a non-coherent dma tag as uncached. This is similar
to what the 32-bit arm code does, with the exception that it always assumes
the tag is non-coherent.
Tested by: jmcneill
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
jmmv [Thu, 25 Aug 2016 10:28:47 +0000 (10:28 +0000)]
Make use of Kyua's work directories.
Change the vnode tests to use the current directory when creating temporary
files, which we can assume is a volatile work directory, and then make the
kqueue_test.sh driver _not_ abandon the directory created by Kyua.
This makes the various kqueue tests independent of each other, and ensures
the temporary file is cleaned up on failure.
sephe [Thu, 25 Aug 2016 05:50:19 +0000 (05:50 +0000)]
hyperv/storvsc: Increase queue depth and rework channel selection.
- Increasing queue depth gives ~100% performance improvement for
randwrite fio test in Azure.
- New channel selection, which takes LUN id and the current cpuid
into consideration, gives additional ~20% performance improvement
for ranwrite fio test in Azure.
Submitted by: Hongzhang Jiang <honzhan microsoft com>
Modified by: sephe
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7622
np [Thu, 25 Aug 2016 05:22:53 +0000 (05:22 +0000)]
Make the iSCSI parameter negotiation more flexible.
Decouple the send and receive limits on the amount of data in a single
iSCSI PDU. MaxRecvDataSegmentLength is declarative, not negotiated, and
is direction-specific so there is no reason for both ends to limit
themselves to the same min(initiator, target) value in both directions.
Allow iSCSI drivers to report their send, receive, first burst, and max
burst limits explicitly instead of using hardcoded values or trying to
derive all of them from the receive limit (which was the only limit
reported by the drivers prior to this change).
Display the send and receive limits separately in the userspace iSCSI
utilities.
cy [Thu, 25 Aug 2016 02:58:41 +0000 (02:58 +0000)]
Add logic to replace the working ntp leap-seconds file in /var/db
if it contains a $FreeBSD$ header. The header will cause the file
to fail checksum of the hash causing ntpd to ignore the file.
bde [Wed, 24 Aug 2016 18:59:24 +0000 (18:59 +0000)]
Flesh out the state and flags args to sccnopen(). Set state flags to
indicate (potentially partial) success of the open. Use these to
decide what to close in sccnclose(). Only grab/ungrab use open/close
so far.
Add a per-sc variable to count successful keyboard opens and use
this instead of the grab count to decide if the keyboad state has
been switched.
Start fixing the locking by using atomic ops for the most important
counter -- the grab level one. Other racy counting will eventually
be fixed by normal mutex or kdb locking in most cases.
Use a 2-entry per-sc stack of states for grabbing. 2 is just enough
to debug grabbing, e.g., for gets(). gets() grabs once and might not
be able to do a full (or any) state switch. ddb grabs again and has
a better chance of doing a full state switch and needs a place to
stack the previous state. For more than 3 levels, grabbing just
changes the count. Console drivers should try to switch on every i/o
in case lower levels of nesting failed to switch but the current level
succeeds, but then the switch (back) must be completed on every i/o
and this flaps the state unless the switch is null. The main point
of grabbing is to make it null quite often. Syscons grabbing also
does a carefully chosen screen focus that is not done on every i/o.
bde [Wed, 24 Aug 2016 17:26:11 +0000 (17:26 +0000)]
Reorganise a little to prepare for locking fixes:
- in sccnopen(), open the keyboard before the screen. The keyboard
currently requires Giant (although it must be spinlocked to work
correctly as a console), so the previous order would be a LOR if
it has any semblance of locking.
- add a (currently dummy) state arg to scgetc().
nwhitehorn [Wed, 24 Aug 2016 16:49:14 +0000 (16:49 +0000)]
Close a race when making the CPU idle under pHyp. If an interrupt occurs
between the beginning of the idle function and actually going idle, the
CPU could go to sleep with pending work.
tsoome [Wed, 24 Aug 2016 16:40:29 +0000 (16:40 +0000)]
Bug 212038 - svn commit: r304321 broken bhyve zvol VM bhyveload hang 100% WCPU
As the support for large blocks was enabled in loader zfs code, the
heap in userboot was left not changed, resulting with failure of detecting
and accessing zfs pools for bhyve virtual machines.
This fix does set the heap to use same amount of memory as the zfsloader
is using. To make it possible to test and verify loader functions, bhyve
is providing very useful option, but it also means, we like to keep feature
parity with [zfs]loader as close as possible.
tsoome [Wed, 24 Aug 2016 16:30:15 +0000 (16:30 +0000)]
Bug 212114 - loader: zio_checksum_verify() must test spa for NULL pointer
The issue was introduced with adding support for salted checksums, and
was revealed by bhyve userboot.so.
During pool discovery the loader is reading pool label from disks, and
at that time the spa structure is not yet set up, so the NULL pointer
is passed for spa. This condition must be checked to avoid the corruption
of the memory and NULL pointer dereference.
andrew [Wed, 24 Aug 2016 12:57:40 +0000 (12:57 +0000)]
Add support to promote and demote managed superpages. This is the last part
needed before enabling superpages on arm64. This code is based on the amd64
pmap with changes as needed to handle the differences between the two
architectures.
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
ed [Wed, 24 Aug 2016 11:35:49 +0000 (11:35 +0000)]
Add a Makefile for building the cloudabi32 kernel module.
Where the cloudabi64 kernel can be used to execute 64-bit CloudABI
binaries, this one should be used for 32-bit binaries. Right now it
works on i386 and amd64.
ed [Wed, 24 Aug 2016 10:51:33 +0000 (10:51 +0000)]
Make execution of 32-bit CloudABI executables work on amd64.
A nice thing about requiring a vDSO is that it makes it incredibly easy
to provide full support for running 32-bit processes on 64-bit systems.
Instead of letting the kernel be responsible for composing/decomposing
64-bit arguments across multiple registers/stack slots, all of this can
now be done in the vDSO. This means that there is no need to provide
duplicate copies of certain system calls, like the sys_lseek() and
freebsd32_lseek() we have for COMPAT_FREEBSD32.
This change imports a new vDSO from the CloudABI repository that has
automatically generated code in it that copies system call arguments
into a buffer, padding them to eight bytes and zero-extending any
pointers/size_t arguments. After returning from the kernel, it does the
inverse: extracting return values, in the process truncating
pointers/size_t values to 32 bits.
ed [Wed, 24 Aug 2016 10:36:52 +0000 (10:36 +0000)]
Remove an unused header file.
The native CloudABI data types header file used to be pulled in by the
vDSOs when they were still written in C. Since they are now all
rewritten in assembly, this can go away.
ed [Wed, 24 Aug 2016 10:13:18 +0000 (10:13 +0000)]
Convert pointers obtained from the threadattr_t structure with TO_PTR().
In all of these source files, the userspace pointer size corresponds
with the kernelspace pointer size, meaning that casting directly works.
As I'm planning on making 32-bit execution on 64-bit systems work as
well, use TO_PTR() here as well, so that the changes between source
files remain minimal.
jmmv [Wed, 24 Aug 2016 10:10:26 +0000 (10:10 +0000)]
Skip ls tests that use sparse files if these are not supported.
Some of the ls(1) tests create really large sparse files to validate
the number formatting features of ls(1). Unfortunately, those tests fail
if the underlying test file system does not support sparse files, as is the
case when /tmp is mounted on tmpfs.
Before running these tests, check if the test file system supports sparse
files by using getconf(1) and skip them if not. Note that the support for
this query was just added to getconf(1) in r304694.
tuexen [Wed, 24 Aug 2016 06:22:53 +0000 (06:22 +0000)]
When aborting an association, send the ABORT before notifying the upper
layer. For the kernel this doesn't matter, for the userland stack, it does.
While there, silence a clang warning when compiling it in userland.
bde [Wed, 24 Aug 2016 05:54:11 +0000 (05:54 +0000)]
Fix key delay and repeat, part 2.
Use sbintime_t timeouts with precision control to get very accurate
timing. It costs little to always ask for about 1% accuracy, and the
not so new event timer implementation usual delivers that, and when
it can't it gets much closer than our previous coarse timeouts and
buggy simple countdown.
The 2 fastest atkbd repeat rates have periods 34 and 38 msec, and ukbd
pretended to support rates in between these. This requires
sub-microsecond precision and accuracy even to handle the 4 msec
difference very well, but ukbd asked the timeout subsystem for timeouts
of 25 msec and the buggy simple countdown of this gave a a wide range
of precisions and accuracies depending on HZ and other timer
configuration (sometimes better than 25 msec but usually more like 50
msec). We now ask for and usually get precision and accuracy of about
1% for each repeat and much better on average.
The 1% accuracy is overkill. Rounding of 30 cps to 34 msec instead of
33 already gives an error of +2% instead of -1%, and ut AT keyboards on
PS/2 interfaces have similar errors.
A timeout is now scheduled for every keypress and release. This allows
some simplifications that are not done. It allows removing the timeout
scheduling for exiting polled mode where it was unsafe in ddb mode. This
is done. Exiting polled mode had some problems with extra repeats. Now
exiting polled mode lets an extra timeout fire and the state is fudged
so that the timeout handler does very little.
The sc->time_ms variable is unsigned to avoid overflow. Differences of
it need to be signed. Signed comparisons were emulated by testing an
emulated sign bits. This only works easily for '<' comparisonss, but
we now need a '<=' comparison. Change the difference variable to
signed and use a signed comparison. Using unsigned types here didn't
prevent overflow bugs but just reduced them. Overflow occurs with
n repeats at the silly repeat period of [U]INT_MAX / n. The old countdown
had an off by 1 error, and the simplifications would simply count down
1 to 0 and not need to accumulate possibly-large repeat repeats.
sephe [Wed, 24 Aug 2016 04:21:15 +0000 (04:21 +0000)]
hyperv/hn: Log a warning for RESET_CMPLT.
RESET is not used by the hn(4) at all, and RESET_CMPLT does not even
have a rid to match with the pending requests. So, let's put it
onto an independent switch branch and log a warning about it.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7602
jhibbits [Wed, 24 Aug 2016 03:51:40 +0000 (03:51 +0000)]
Fix system hang when large FDT is in use
Summary:
Kernel maps only one page of FDT. When FDT is more than one page in size, data
TLB miss occurs on memmove() when FDT is moved to kernel storage
(sys/powerpc/booke/booke_machdep.c, booke_init())
This introduces a pmap_early_io_unmap() to complement pmap_early_io_map(), which
can be used for any early I/O mapping, but currently is only used when mapping
the fdt.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7605