jhb [Thu, 14 Jun 2018 22:31:30 +0000 (22:31 +0000)]
Exit with an error if a linker hints file can't be found.
Continuing with a NULL hints variable just triggers a segfault later on.
The other error cases in this function all exit for an error rather than
warning.
brooks [Thu, 14 Jun 2018 21:27:25 +0000 (21:27 +0000)]
Name the implementation of brk and sbrk sys_break().
The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestage of a need for it to be called something else (eg obreak)
was removed in r225617 which consistantly prefixed all syscall
implementations.
regnode::enable_cnt is generally used to refcount regulator nodes. For
GPIOs, the refcount was done on the gpio_entry since more than one regulator
can share a GPIO.
GPIO regulators were not taking part in the node refcount, since they had
their own mechanism. This caused some fallout after manu started disabling
everybody's unused regulators in r331989.
rmacklem [Thu, 14 Jun 2018 20:36:55 +0000 (20:36 +0000)]
Add the "-p" and "-m" options to nfsd.c for the pNFS service.
The "-p" option specifies that the nfsd should run a pNFS service instead
of a regular NFS service. The "-m" option is only meaningful when used with
"-p" to specify that mirroring on the DSs should be done and on how many of
them.
This change requires the kernel changes committed as r334930.
The man page update will be committed as a separate commit soon.
kib [Thu, 14 Jun 2018 19:41:02 +0000 (19:41 +0000)]
Handle the race between fork/vm_object_split() and faults.
If fault started before vmspace_fork() locked the map, and then during
fork, vm_map_copy_entry()->vm_object_split() is executed, it is
possible that the fault instantiate the page into the original object
when the page was already copied into the new object (see
vm_map_split() for the orig/new objects terminology). This can happen
if split found a busy page (e.g. from the fault) and slept dropping
the objects lock, which allows the swap pager to instantiate
read-behind pages for the fault. Then the restart of the scan can see
a page in the scanned range, where it was already copied to the upper
object.
Fix it by instantiating the read-ahead pages before
swap_pager_getpages() method drops the lock to allocate pbuf. The
object scan would see the whole range prefilled with the busy pages
and not proceed the range.
Note that vm_fault rechecks the map generation count after the object
unlock, so that it restarts the handling if raced with split, and
re-lookups the right page from the upper object.
In collaboration with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kevans [Thu, 14 Jun 2018 18:34:02 +0000 (18:34 +0000)]
a10_ahci: Correct clock indices for new bindings
r329104 imported 4.15 DTS which brought CCU to a10/a20. In the process, they
swapped the ordering of 'clocks' for allwinner,sun4i-a10-ahci on both
sun4i-a10 and sun7i-a20 from PLL, Gate to Gate, PLL.
kevans [Thu, 14 Jun 2018 17:50:29 +0000 (17:50 +0000)]
aw_ccung: Add a10/a20 support
Note: At this time, this has only been tested on a single board from one of
the supported SoCs. This is enough to boot the board from MMC and have
functional USB- which is still an improvement over where we were at just
before with no functional clocks.
jhibbits [Thu, 14 Jun 2018 17:23:51 +0000 (17:23 +0000)]
Split the PowerISA 3.0 HPT implementation from historic
PowerISA 3.0 makes several changes to not only the format of the HPT but
also the behavior surrounding it. For instance, TLBIE no longer requires
serialization. Removing this lock cuts buildworld time in half on a
18-core/72-thread POWER9 system, demonstrating that this lock is highly
contended on such a system.
There was odd behavior observed trying to make this change in a
backwards-compatible manner in moea64_native.c, so the best option was to
fully split it, and largely revert the original changes adding POWER9
support to the original file.
manu [Thu, 14 Jun 2018 17:18:15 +0000 (17:18 +0000)]
arm timer: Add workaround for Allwinner A64 timer
The timer present in allwinner A64 SoC is unstable, value can jump backward
or forward.
It was found that when bit 11 and upper roll over the low bits can sometimes
being read as all as 1 or all as 0.
Simply ignore the values for those cases.
ken [Thu, 14 Jun 2018 17:08:44 +0000 (17:08 +0000)]
Fix da(4) locking when probing SMR drives.
Probing host aware and host managed SMR drives got broken in revision
330796.
The added cam_periph_lock() calls were in areas in dadone() where
the peripheral lock was already held.
Since then, dadone() has been split into separate functions that are
dedicated to each probe state.
The result is that when probing a host aware drive, I ran into a recursive
lock acquisition in dadone_probeatalogdir(). I would have run into the
same problem in dadone_probeataiddir(), and in dadone_probeatasup() and
dadone_probeatazone() in the error paths had the probe continued.
The solution is to take out all of the extra cam_periph_lock() calls. I
also added cam_periph_assert(periph, MA_OWNED) near the top of each of
the dadone_* calls. These make it clear to anyone coming along in the
the future that the lock is held in the probe done functions.
Also add a locking assert in daprobedone(), to make it clear that it must
be called with the periph lock held.
kevans [Thu, 14 Jun 2018 16:09:29 +0000 (16:09 +0000)]
devmatch: Address some rc nits
- devmatch_enable in rc.conf(5) was not gating the start of devmatch
- Use quietstart in devd/devmatch to suppress dozens of 'Cannot start'
messages and other spurious messages from rc.subr(8) that aren't
necessarily helpful.
jhibbits [Thu, 14 Jun 2018 16:01:11 +0000 (16:01 +0000)]
Fix CTR formatting for moea64_native bootstrap
On very large memory systems 'size' can become 2GB or larger, resulting in a
negative value being formatted. Also, moea64_pteg_count is already a long, so
format it as such.
ae [Thu, 14 Jun 2018 11:15:39 +0000 (11:15 +0000)]
In m_megapullup() use m_getjcl() to allocate 9k or 16k mbuf when requested.
It is better to try allocate a big mbuf, than just silently drop a big
packet. A better solution could be reworking of libalias modules to be
able use m_copydata()/m_copyback() instead of requiring the single
contiguous buffer.
kib [Thu, 14 Jun 2018 11:09:51 +0000 (11:09 +0000)]
Reorganize code flow in fpudna()/npxdna() to highlight the critical
section scope. Sprinkle __predict_false() for conditions known to
never occur or occur only on rare platforms.
kib [Thu, 14 Jun 2018 10:33:26 +0000 (10:33 +0000)]
Remove printf() in #NM handler.
Give up and remove the almost useless informational message reporting
that device not available exception occured while our state tracking
indicates the current CPU has FPU context loaded for the current
thread.
It seems that this is recurring bug with some VM monitors.
rmacklem [Thu, 14 Jun 2018 10:00:19 +0000 (10:00 +0000)]
Move four functions in nfscl.ko to nfscommon.ko.
Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom()
and nfsmnt_mdssession() are now called from within the nfsd.
As such, they needed to be moved from nfscl.ko to nfscommon.ko so that
nfsd.ko would load when nfscl.ko wasn't loaded.
ae [Thu, 14 Jun 2018 09:36:25 +0000 (09:36 +0000)]
Add NULL check like the rest of code has.
It is possible that ifma_protospec becomes NULL in this function for
some entry, but it is still referenced and thus it will not unlinked
from the list. Then "restart" condition triggers and this entry with
NULL ifma_protospec will lead to page fault.
manu [Thu, 14 Jun 2018 06:39:33 +0000 (06:39 +0000)]
rk_i2c: Add driver for the I2C controller present in RockChip SoC
This controller have a special mode for RX to help with smbus-like transfer
when the controller will automatically send the slave address, register address
and read the data. Use it when possible.
The same mode for TX is describe is the datasheet but is broken and have been
since ~10 years of presence of this controller in RockChip SoCs.
Attach this driver early at we need it to communicate with the PMIC early in the
boot.
Do not hook it to the kernel build for now.
manu [Thu, 14 Jun 2018 06:28:09 +0000 (06:28 +0000)]
if_dwc_rk: Add DesignWare driver for RockChip SoCs.
Add driver for the designware ethernet controller found in some RockChip SoCs.
The driver still rely on a lot of things setup by the bootloader like clocks
and phy mode.
But since netbooting is the only/easiest way to boot rockchip board at the
moment add the driver so other people can test/dev on thoses boards.
manu [Thu, 14 Jun 2018 05:46:57 +0000 (05:46 +0000)]
rk_armclk: Add the write mask to the register mux value
This was omitted in r334112 and r334996 which cause the PLL to not correctly
reparent, leaving the armclk to be derived from the APLL instead of the NPLL.
The arm core clock is now correctly set to 600Mhz via the assigned-clock present
in the DTB.
manu [Thu, 14 Jun 2018 05:43:45 +0000 (05:43 +0000)]
rk_pll: Add support for mode
RockChip PLL have two modes controlled by a register, a "slow mode" (the
default one) where the frequency is derived from the 24Mhz oscillator on the
board, and a "normal" one when the pll take it's input from the real PLL output.
manu [Thu, 14 Jun 2018 05:41:16 +0000 (05:41 +0000)]
rk_pinctrl: Only add gpio subnode
This is the only node we are interested in so do not waste time to test
creating device that will be either unused or fail as most of the nodes
don't have a compatible string.
rrs [Thu, 14 Jun 2018 03:27:42 +0000 (03:27 +0000)]
This fixes several bugs that Larry Rosenman helped me find in
Rack with respect to its handling of TCP Fast Open. Several
fixes all related to TFO are included in this commit:
1) Handling of non-TFO retransmissions
2) Building the proper send-map when we are doing TFO
3) Dealing with the ack that comes back that includes the
SYN and data.
It appears that with this commit TFO now works :-)
Thanks Larry for all your help!!
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15758
imp [Thu, 14 Jun 2018 01:15:19 +0000 (01:15 +0000)]
NVME support is only for x86 and powerpc64.
Implement MK_NVME now that the expression for where NVMe is
complicated. Default it to "yes" for x86 and powerpc64 and
no everywhere else. Use it in camcontrol to define WITH_NVME
for those platforms where we support nvme.
This should fix the newly introduced nvme files to camcontrol
which were building everywhere.
imp [Wed, 13 Jun 2018 22:00:08 +0000 (22:00 +0000)]
Make camcontrol identify work with nda devices
Both ATA and NVME have an identify command. They are completely
different, but to the user they are the same. Leverage nvmecontrol's
print_controller code to provide that functionality to camcontrol
identify. Query the path to see what kind of protocol it supports, and
send the most appropriate command down. Refactor nvme_print_dev a
little to make it easy to get the nvme cdata.
imp [Wed, 13 Jun 2018 22:00:02 +0000 (22:00 +0000)]
Make it possible to use print_controller from another program
Rename print_controller to nvme_print_controller. Put it in its
own file for easy inclusion. Move util.c to be nc_util.c to not
conflict with camcontrol. add nvecontrol_ext.h to define shared
interfaces.
trasz [Wed, 13 Jun 2018 18:50:51 +0000 (18:50 +0000)]
Mention that ports are used to build packages, this fact - obvious
to the developers, but much less so to users - seems to be rather
weakly documented.
trasz [Wed, 13 Jun 2018 18:34:49 +0000 (18:34 +0000)]
Get rid of references to /usr/share/doc/ from ports(7) and getosreldate(3).
The handbooks are not installed there anymore. While here, improve the
URLs markup a bit.
kib [Wed, 13 Jun 2018 17:55:09 +0000 (17:55 +0000)]
Enable eager FPU context switch by default on amd64.
With compilers making increasing use of vector instructions the
performance benefit of lazily switching FPU state is no longer a
desirable tradeoff. Linux switched to eager FPU context switch some
time ago, and the idea was floated on the FreeBSD-current mailing list
some years ago[1].
Enable eager FPU context switch by default on amd64, with a tunable/sysctl
available to turn it back off.
jtl [Wed, 13 Jun 2018 17:04:41 +0000 (17:04 +0000)]
Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable. There are a few exceptions. For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9). The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.
(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory. This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory. This change makes the behavior consistent.)
This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified. After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.
Allocations that do need executable memory have various choices. They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.
Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.
imp [Wed, 13 Jun 2018 16:48:07 +0000 (16:48 +0000)]
Implement a 'car limit' for bioq.
Allow one to implement a 'car limit' for
bioq_disksort. debug.bioq_batchsize sets the size of car limit. Every
time we queue that many requests, we start over so that we limit the
latency for requests when the software queue depths are large. A value
of '0', the default, means to revert to the old behavior.
andrew [Wed, 13 Jun 2018 15:56:24 +0000 (15:56 +0000)]
Switch to the SMCCC function for branch predictor hardening. The previous
method may not have worked as the firmware checks for the ARCH_WORKAROUND_1
function ID.
andrew [Wed, 13 Jun 2018 15:32:00 +0000 (15:32 +0000)]
Add support for the ARM SMC Calling Convention (SMCCC). This is a method
to call into the firmware in a similar way to the existing PSCI, and used
PSCI to detect when SMCCC is enabled.
There is a function ID space we can use. Currently we only support 3
functions in the ARM Architecture Calls region, however it is expected we
will expend these in the future.
asomers [Wed, 13 Jun 2018 14:55:31 +0000 (14:55 +0000)]
audit(4): fix the definition of ARG_TERMID_ADDR
Due to a copy/paste error in r168688, ARG_TERMID_ADDR has the same
definition as ARG_SADDRUNIX. Fix it.
The header change, while publicly visible, is guarded by #ifdef KERNEL, and
I can't find any kmod ports that use it. So I'm not bumping
__FreeBSD_version.
bde [Wed, 13 Jun 2018 12:22:00 +0000 (12:22 +0000)]
Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers. This
change breaks compatibility with the previous encoding (which was only
used in -current).
Fix truncation to (essentially) 16-bit dev_t in newnfs v3.
Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility. Extra bits give the much
larger complication that the translations need to compress into fewer
bits. Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.
The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings. E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2. But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102. ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.
The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them. I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current. E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.
I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation. So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set. Add some error checking for when bits are lost. Keep not
doing any error checking for translations for almost everything in
compat/linux.
compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.
compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.
fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself. Also fix nearby style bugs.
kern/vfs_syscalls.c:
As for freebsd32. Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.
sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros. Describe the encoding and
some of the reasons for it. 16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding. My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits. This minimizes discontiguities.
Reviewed by: kib (except for rewrite of the comment in linux_stats.c)
andrew [Wed, 13 Jun 2018 12:17:11 +0000 (12:17 +0000)]
Rename the ThunderX CPU identification macros to include the X. This is the
name people know the product by, and is consistent with the later SoC ID
macros.