cognet [Fri, 16 Feb 2018 17:50:06 +0000 (17:50 +0000)]
Define CK_MD_TSO for the relevant arches (i386, amd64 and sparc64).
Defaulting to CK_MD_RMO has the unfortunate side effect of generating
memory barriers that are useless on those arches, and the even more
unfortunate side effect of generating lfence/sfence/mfence on i386, even
though older CPUs don't support them.
This should fix the panic reported when using IPFW on a Pentium 3.
Note that mfence and sfence might still be used in a few cases, but that
shouldn't happen in FreeBSD right now, and should be fixed upstream first.
kevans [Fri, 16 Feb 2018 17:46:07 +0000 (17:46 +0000)]
stand/lua: Chop off the decimal for numbers passed to setcursor
Decimals screw up the escape sequence and the cursor will not get set. Right
now this only affects setting the cursor for drawing "Welcome to FreeBSD" --
the resulting number after our (x+(w/2)-9) calculation gets output as
"14.0."
This should be fixed at the interpreter level, rather than here, but this is
not a widespread problem at the moment so we'll fix it up in further work.
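The effect of the fix can be sketched in C (the loader code itself is Lua). This is only an illustration: the helper name format_setcursor is hypothetical, and it assumes the cursor is positioned with the ANSI sequence ESC [ row ; col H.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build an ANSI cursor-position sequence
 * (ESC [ row ; col H) from coordinates that may carry a fraction.
 */
static int
format_setcursor(char *buf, size_t len, double x, double y)
{
	assert(buf != NULL && x >= 0 && y >= 0);
	/* Casting to int drops the fraction, so 14.0 is emitted as "14". */
	return (snprintf(buf, len, "\033[%d;%dH", (int)y, (int)x));
}
```

Formatting the same value with a floating-point conversion would put "14.0" into the sequence, which is the failure described above.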
andrew [Fri, 16 Feb 2018 16:22:54 +0000 (16:22 +0000)]
Put the pine64 root filesystem on the correct partition.
The Pine64 root filesystem was incorrectly created directly on the MBR
partition. This can cause the loader to get confused when loading the
kernel from this filesystem.
The loader will see this as a small partition meaning later checks to
ensure it doesn't read past the end of the disk incorrectly report a
failure. This seems to work mostly by accident with the released images as
they are smaller than the reported size, however after growfs has run the
image may no longer boot.
markj [Fri, 16 Feb 2018 15:41:03 +0000 (15:41 +0000)]
Fix a memory leak introduced in r328426.
ffs_sbget() may return a superblock buffer even if it fails, so the
caller must be prepared to free it in this case. Moreover, when tasting
alternate superblock locations in a loop, ffs_sbget()'s readfunc
callback must free the previously allocated buffer.
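The ownership rule described above can be sketched as follows. This is not the FFS code: try_read_sb(), release_sb() and find_sb() are hypothetical stand-ins, and the sketch frees in the caller's loop rather than in a readfunc callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_buffers;	/* allocations not yet freed (demo only) */

/*
 * Hypothetical stand-in for a reader that may hand back a buffer
 * even when it reports failure.
 */
static int
try_read_sb(int location, void **bufp)
{
	assert(bufp != NULL);
	*bufp = malloc(64);
	if (*bufp == NULL)
		return (ENOMEM);
	live_buffers++;
	/* Pretend only location 2 holds a valid superblock. */
	return (location == 2 ? 0 : EINVAL);
}

static void
release_sb(void *buf)
{
	if (buf != NULL) {
		free(buf);
		live_buffers--;
	}
}

/* Taste each candidate location, freeing the buffer of every failed try. */
static void *
find_sb(const int *locations, int n)
{
	void *buf;
	int i;

	for (i = 0; i < n; i++) {
		buf = NULL;
		if (try_read_sb(locations[i], &buf) == 0)
			return (buf);
		release_sb(buf);	/* a failed read may still have allocated */
	}
	return (NULL);
}
```

Without the release_sb() call in the loop, every failed location would leak one buffer.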
emaste [Fri, 16 Feb 2018 15:38:02 +0000 (15:38 +0000)]
Correct module symbol export handling
EXPORT_SYMS can be set to YES, NO, a list of symbols to export from a
module, or to a filename containing such a list. For the case that it
is set to a symbol list, replace spaces in the list with newlines, so
the created file is in the format expected by kmod_syms.awk.
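The transformation itself is simple. The real change is in the build system, not in C; the sketch below only illustrates the space-to-newline rewrite, and syms_to_lines() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite a space-separated symbol list in place, one symbol per line. */
static void
syms_to_lines(char *list)
{
	assert(list != NULL);
	for (char *p = list; *p != '\0'; p++) {
		if (*p == ' ')
			*p = '\n';
	}
}
```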
emaste [Fri, 16 Feb 2018 15:00:14 +0000 (15:00 +0000)]
Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text. To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders. Additional files
waiting on permission from others are listed in review D14210.
Approved by: kan, marcel, sos, rdivacky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
kevans [Fri, 16 Feb 2018 14:39:41 +0000 (14:39 +0000)]
stand/lua: Create a "carousel" menu entry type
This is a precursor to boot environment support in lualoader. Create a new
menu item type, "carousel_entry", that generally provides a callback to get
the list of items, a carousel_id for storing the current value, and the
standard name/func functions that an entry has.
The difference between this and a normal menu item, functionally, is that
selecting a carousel item will automatically rotate through available items
and wrap back at the beginning when the list is exhausted.
The 'name' function takes the choice index, current choice, and the list of
choices as parameters so that the menu item can decorate the name freely as
desired.
The 'func' function takes the current choice as a parameter, so it can act
accordingly.
The kernel menu item has been rewritten to use the carousel_entry type as
both an example and initial test of its functionality before it is used for
boot environment options.
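The rotate-and-wrap behaviour can be sketched in C (the loader implementation is Lua). The struct and function names here are hypothetical and only model the selection step, not the name/func callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical carousel state: a list of choices and the current index. */
struct carousel {
	const char *const *items;
	size_t nitems;
	size_t current;
};

/* Advance to the next item, wrapping to the first after the last. */
static const char *
carousel_select(struct carousel *c)
{
	assert(c != NULL && c->nitems > 0);
	c->current = (c->current + 1) % c->nitems;
	return (c->items[c->current]);
}
```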
avg [Fri, 16 Feb 2018 06:59:35 +0000 (06:59 +0000)]
read-behind / read-ahead support for zfs_getpages()
ZFS caches blocks it reads in its ARC, so in general the optional
pages are not as useful as with filesystems that read the data
directly into the target pages. But still the optional pages
are useful to reduce the number of page faults and associated
VM / VFS / ZFS calls.
Another case that gets optimized (as a side effect) is paging in
from a hole. ZFS DMU does not currently provide a convenient
API to check for a hole. Instead it creates a temporary zero-filled
block and allows accessing it as if it were a normal data block.
Getting multiple pages one by one from a hole results in repeated
creation and destruction of the temporary block (and an associated
ARC header).
Tested with fsx using various supported blocks sizes from 512 bytes
to 128 KB and additionally 1 MB.
Please note that in illumos and ZoL they do not do the range-locking in
the page-in path. This is because ZFS has a double-caching problem
between ARC and page cache and that requires zfs_read() and zfs_write()
to consult pages in the page cache. So, in those functions they first
lock a range and then lock pages corresponding to the range. While in
the page-in (and maybe page-out) path they first lock the pages and then
would lock the range. So, they would have a deadlock.
I believe that FreeBSD does not have that problem, because the page-in
deals only with invalid pages while zfs_read() and zfs_write() need to
access only valid pages. They do not wait on a busy page unless it's
already valid.
silby [Fri, 16 Feb 2018 06:51:39 +0000 (06:51 +0000)]
Prevent savecore from reading bounds from the current directory.
Rev 244218 removed the requirement that you provide a dump
directory when checking if there is a coredump ready to be written.
That had the side-effect of causing the bounds file to be read
from the current working directory instead of the dump directory.
As the bounds file is irrelevant when just checking, the simplest
fix is to not read the bounds file when checking.
cy [Fri, 16 Feb 2018 05:48:45 +0000 (05:48 +0000)]
Document memset_s(3). memset_s(3) is defined in the
C11 standard (ISO/IEC 9899:2011), K.3.7.4.1 "The memset_s function"
(pp. 621-622).
Fix memset(3) portion of the man page by replacing the first argument
(destination) "b" with "dest", which is more descriptive than "b".
This also makes it consistent with the term used in the memset_s()
portion of the man page.
See also http://en.cppreference.com/w/c/string/byte/memset.
anish [Fri, 16 Feb 2018 05:17:00 +0000 (05:17 +0000)]
Fix duplicate detection of the same IOMMU/AMD-Vi device on Ryzen with EFR support.
IVRS can contain a legacy and a non-legacy entry for the same AMD-Vi device at the same time. The ivhd driver now ignores the legacy entry if a new IVHD type is present, as specified in the AMD-Vi specification. Previously both IVHD entries were used and two ivhd devices were created.
Add support for the new IVHD types 0x11 and 0x40 in ACPI. Create a new struct, acpi_ivrs_hardware_new, for these new IVHD types. Legacy type 0x10 will continue to use acpi_ivrs_hardware.
kevans [Fri, 16 Feb 2018 04:50:14 +0000 (04:50 +0000)]
stand/lua: Say "loader prompt" instead of "lua interpreter"
Noting that we're in lualoader is nice, but it's not a difference we really
need to expose to Fred. Re-word it to match the 4th wording and reduce
differences.
kevans [Fri, 16 Feb 2018 04:45:53 +0000 (04:45 +0000)]
stand/lua: Remove explicit alias from "Back to main menu"
This removes a redundant alias that has since been converted into a global
alias. It was converted to a global alias before to ensure that we always
have a way to go up one level in the menu.
kevans [Fri, 16 Feb 2018 04:03:15 +0000 (04:03 +0000)]
stand/lua: Set reasonable ACPI default based on presence
Set it based on hint.acpi.0.rsdp. Initially, hint.acpi.0.disabled will be
respected. "Using System Defaults" will override whether it's explicitly
disabled by hint and re-load it based on whether it's present on the system.
Unlike the 4th version, this is not restricted to x86. I have no strong
reasoning for this, so this is definitely open to change.
kevans [Fri, 16 Feb 2018 03:14:23 +0000 (03:14 +0000)]
stand/lua: Don't descend into an empty kernels submenu
This submenu is likely going to go away in favor of kernel selection as it
is done in forth at the moment, but for the time being don't descend into it
if we have no kernels available for listing.
imp [Fri, 16 Feb 2018 00:17:32 +0000 (00:17 +0000)]
Eliminate bsd.stand.mk and -fPIC 32-bit intel builds
OK. We don't really need a bsd.stand.mk, and it was causing a -fPIC
for the toolchain to be added (bogusly) when building on amd64. Pull
all relevant defs back into defs.mk and delete bsd.stand.mk.
This saves about 15-20k on i386 loader and zfsloader which when
combined with Lua give us a lot more stack space in those constrained
environments.
kevans [Thu, 15 Feb 2018 19:49:15 +0000 (19:49 +0000)]
libsa: Consolidate tftp sendrecv into net.c sendrecv
bootp/arp/rarp/rpc all use the sendrecv implementation in net.c. tftp has
its own implementation because it passes an extra parameter into the recv
callback for the received payload type to be held.
These sendrecv implementations are otherwise equivalent, so consolidate
them. The other users of sendrecv won't be using the extra argument to recv,
but this gives us only one place to worry about respecting timeouts and one
consistent timeout behavior.
brooks [Thu, 15 Feb 2018 17:26:30 +0000 (17:26 +0000)]
Fix getdirentries(2) under 32-bit compat.
The latest version of getdirentries (syscall 554) takes a pointer
to an off_t as the last argument. The old version which copies out
an int32_t was being used instead. Use the standard sys_getdirentries()
implementation instead.
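Why the width of the copy-out matters can be shown with a small userland sketch. The two helpers are hypothetical, int64_t stands in for off_t, and memcpy() stands in for the kernel's copy-out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old behaviour (sketch): only 32 bits are written to the caller's variable. */
static void
copyout_base32(void *userp, int64_t base)
{
	int32_t v = (int32_t)base;

	assert(userp != NULL);
	memcpy(userp, &v, sizeof(v));
}

/* New behaviour (sketch): the full 64-bit value is written. */
static void
copyout_base64(void *userp, int64_t base)
{
	assert(userp != NULL);
	memcpy(userp, &base, sizeof(base));
}
```

A caller that passes a pointer to a 64-bit variable gets only half of it updated by the 32-bit version, so the other half keeps whatever was there before.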
cognet [Thu, 15 Feb 2018 15:46:14 +0000 (15:46 +0000)]
Rename the ACPI variant of the gicv2m driver from "gicv2m" to "gicv2m_acpi".
The FDT variant is called "gicv2m" too, and as both would try to register
on gic, only one of them would succeed, while we want them both in a
GENERIC kernel.
kevans [Thu, 15 Feb 2018 15:01:07 +0000 (15:01 +0000)]
stand: Fix ubldr after r329190
metadata load files were consolidated in r329190, and these relocation fixup
bits were inadvertently dropped in the process. Re-add them to fix boot with
ubldr.
It panicked here:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zio.c#430
pio->io_lock is DEAD, thus a panic. Further analysis shows the "pio"
(parent zio of "cio") has already been destroyed.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
eadler [Thu, 15 Feb 2018 03:22:53 +0000 (03:22 +0000)]
devd: don't pass &fds in useless parameters to select(2)
The fd_set arguments of select(2) are declared restrict, so the same set
should not be passed for more than one of them. In addition, the only fd in
the fdset is open O_RDONLY, and it's not a socket that can provide OOB
notifications.
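A minimal sketch of the resulting call pattern, using the standard select(2) interface. The wrapper name wait_readable() is hypothetical and this is not the devd code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable.
 * Only the read set is passed; the write and except sets are NULL.
 */
static int
wait_readable(int fd, int timeout_sec)
{
	fd_set fds;
	struct timeval tv;

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&fds);
	FD_SET(fd, &fds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;
	return (select(fd + 1, &fds, NULL, NULL, &tv));
}
```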
rpokala [Thu, 15 Feb 2018 03:22:04 +0000 (03:22 +0000)]
mxge(4) should pass unhandled ioctls to ether_ioctl()
Panasas discovered that ioctl(SIOCGLAGGPORT) returns ENOTTY for mxge(4) when
the NIC is not a member of a lagg. This came as a surprise, because the
SIOCGLAGGPORT handler in if_lagg.c only returns ENOENT (if run against the
laggX interface, rather than a physical port) or EINVAL (if run against a
non-member physical port). This behavior was not seen with other drivers,
such as bge(4), igb(4), and cxl(4). When I compared their respective ioctl
handlers, I found that they all called ether_ioctl() for the default (i.e.
unhandled) case; by contrast, mxge(4) only calls ether_ioctl() for two
specific cases, and returns ENOTTY for the default case.
Remove the two cases which explicitly call ether_ioctl(), and let the
default case call it instead. This matches what the vast majority of the NIC
drivers do.
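The dispatch shape described above can be sketched as follows. Everything here is a stand-in: the command numbers are illustrative, generic_ether_ioctl() only models the role of ether_ioctl(), and driver_ioctl() is not the mxge(4) handler.

```c
#include <assert.h>

/* Illustrative command numbers; not real ioctl values. */
enum { CMD_SET_MTU = 1, CMD_GET_LAGGPORT = 2 };

static int generic_calls;	/* how often the generic layer was reached */

/* Stand-in for the generic Ethernet ioctl handler. */
static int
generic_ether_ioctl(int cmd)
{
	(void)cmd;
	generic_calls++;
	return (0);
}

/* Driver handler: handle what it knows, defer the rest. */
static int
driver_ioctl(int cmd)
{
	assert(cmd > 0);
	switch (cmd) {
	case CMD_SET_MTU:
		return (0);	/* handled by the driver itself */
	default:
		/* Defer to the generic layer instead of returning ENOTTY. */
		return (generic_ether_ioctl(cmd));
	}
}
```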
asomers [Wed, 14 Feb 2018 23:52:39 +0000 (23:52 +0000)]
zfsd: Allow zfsd to work on any type of GEOM provider
cddl/usr.sbin/zfsd/zfsd_event.cc
Remove the check for da and ada devices. This way zfsd can work on md,
geli, glabel, gstripe, etc devices. geli in particular is useful
combined with ZFS. gnop is also useful for simulating drive pulls in
the ZFSD test suite.
Also, eliminate the DevfsEvent class entirely. Move its
responsibilities into GeomEvent. We can get everything we need to know
just from listening to GEOM events.
lib/libdevdctl/event.cc
Fix GeomEvent::DevName for CREATE events. Oddly, the relevant field is
named "cdev" for CREATE events but "devname" for disk events.
eugen [Wed, 14 Feb 2018 21:17:44 +0000 (21:17 +0000)]
ng_pppoe(8): add support for user-supplied Host-Uniq tag.
A few ISPs filter PADI requests based on such a tag,
to force the use of their own routers.
The custom Host-Uniq tag is passed in the NGM_PPPOE_CONNECT
control message, so it can be used with FreeBSD ppp(8)
and mpd without any other change.
Add support to send and receive PADM messages,
HURL and MOTM, often used by service providers to provide
ACS information and other configuration settings
to the user CPE.
asomers [Wed, 14 Feb 2018 20:26:09 +0000 (20:26 +0000)]
gpart: append partition name to the underlying provider's physical path
If the underlying provider's physical path is null, then the gpart device's
physical path will be, too. Otherwise, it will append the partition name,
such as "/p1" or "/s1/a". This will make gpart work better with zfsd(8).
asomers [Wed, 14 Feb 2018 20:15:32 +0000 (20:15 +0000)]
geli: append "/eli" to the underlying provider's physical path
If the underlying provider's physical path is null, then the geli device's
physical path will be, too. Otherwise, it will append "/eli". This will make
geli work better with zfsd(8).
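The rule shared by the gpart and geli changes above can be sketched as one helper. child_physpath() is a hypothetical name, the parent path in the usage is made up, and this is not the GEOM code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Derive a child provider's physical path: empty if the parent has none,
 * otherwise the parent's path followed by suffix (e.g. "/eli" or "/p1").
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int
child_physpath(char *buf, size_t len, const char *parent, const char *suffix)
{
	int n;

	assert(buf != NULL && len > 0 && suffix != NULL);
	if (parent == NULL || strlen(parent) == 0) {
		buf[0] = '\0';
		return (0);
	}
	n = snprintf(buf, len, "%s%s", parent, suffix);
	return (n >= 0 && (size_t)n < len ? 0 : -1);
}
```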
bdrewery [Wed, 14 Feb 2018 18:43:50 +0000 (18:43 +0000)]
nanosleep(2): Fix bogus incrementing of rmtp by tc_tick_sbt on [EINTR].
sbt is the time in the future that the tsleep_sbt() is expected to be completed
at. sbtt is the current time. Depending on the precision with sysctl
kern.timecounter.alloweddeviation the start time may be incremented by
tc_tick_sbt. The same increment is needed for the current time of sbtt before
calculating the difference. The impact of missing this increment is that rmtp
may increase by one tc_tick_sbt on every early [EINTR] return. If the same
struct is passed in for rqtp as rmtp this can result in rqtp effectively
incrementing by tc_tick_sbt and sleeping longer than originally intended.
This problem was introduced in r247797.
Reviewed by: kib, markj, vangyzen (all on an older version of the test)
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D14362
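The arithmetic behind the fix can be sketched with plain integers. remaining_time() is a hypothetical helper, int64_t stands in for sbintime_t, and clamping at zero is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Remaining sleep time after an early wakeup. 'deadline' was computed from
 * a start time that was bumped by 'tick' in imprecise mode; the current
 * time must get the same bump, or each interrupted call over-reports the
 * remainder by one tick.
 */
static int64_t
remaining_time(int64_t deadline, int64_t now, int64_t tick, bool imprecise)
{
	assert(tick >= 0);
	if (imprecise)
		now += tick;
	return (deadline > now ? deadline - now : 0);
}
```

For example, with a start of 1000, a tick of 10 and a request of 500, the imprecise deadline is 1510. An interruption at time 1200 should leave 300, which is what the bumped comparison yields; without the bump the result is 310.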
imp [Wed, 14 Feb 2018 18:21:54 +0000 (18:21 +0000)]
Simple script to image a small test area from a built tree. Build with
'cd stand; make MK_FORTH=no MK_LOADER_LUA=yes' then run this script.
You can then test with lua-test.sh with the same parameter.
manu [Wed, 14 Feb 2018 18:05:37 +0000 (18:05 +0000)]
efi: Only scan the BLKIO MEDIA once
Scan only the BLOCK IO MEDIA once instead of each time for each type of
device (fd, cd and hdd).
Leave the mechanism to free and reprobe all devices if one day we want
to implement a "dev rescan" thing.
imp [Wed, 14 Feb 2018 17:51:51 +0000 (17:51 +0000)]
A quick test script that we can run to use userboot's test mode to
exercise the load loader. Assumes that we already have a suitable
root area that you pass in with the first arg.
asomers [Wed, 14 Feb 2018 15:49:31 +0000 (15:49 +0000)]
Implement .vop_pathconf and .vop_getacl for the .zfs ctldir
zfsctl_common_pathconf will report all the same variables that regular ZFS
volumes report. zfsctl_common_getacl will report an ACL equivalent to 555,
except that you can't read xattrs or edit attributes.
Fixes a bug where "ls .zfs" will occasionally print something like:
ls: .zfs/.: Operation not supported
kevans [Wed, 14 Feb 2018 15:40:13 +0000 (15:40 +0000)]
libsa: Fix IP recv timeout
readip() doesn't, at the moment, properly indicate to callers that it has
timed out. One can tell that it's timed out if errno == EAGAIN when it
returns, but this is not ideal. Restructure it a little bit to explicitly
set errno to ETIMEDOUT if we've exhausted tleft.
I found two places that care about whether it timed out or not: sendrecv in
net.c and sendrecv_tftp. Both are structured to pass smaller timeout values
to readip while tracking a larger timeout. Neither of them was able to do
this properly with readip not indicating ETIMEDOUT, so fix it.
While here, straighten out the time (t/t1) usage in sendrecv_tftp.
This would have manifested itself in periodic failures to NFS/TFTP boot for
no apparent reason because MINTMO/MAXTMO were not actually being respected
properly. Problems were not reported with NFS, only TFTP.
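The explicit timeout indication can be sketched as below. This is not readip(): recv_with_timeout(), the try_recv callback type and countdown_recv() are hypothetical, and the budget counts attempts rather than seconds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive callback: bytes received, or -1 if nothing yet. */
typedef int (*try_recv_fn)(void *arg);

/*
 * Try to receive for at most 'tleft' attempts. On expiry return -1 with
 * errno set to ETIMEDOUT so callers can tell a timeout from other failures.
 */
static int
recv_with_timeout(try_recv_fn try_recv, void *arg, int tleft)
{
	int n;

	assert(try_recv != NULL);
	while (tleft-- > 0) {
		n = try_recv(arg);
		if (n >= 0)
			return (n);
	}
	errno = ETIMEDOUT;
	return (-1);
}

/* Demo callback: fail until the countdown in *arg reaches zero. */
static int
countdown_recv(void *arg)
{
	int *left = arg;

	return ((*left)-- > 0 ? -1 : 42);
}
```

A caller tracking a larger overall timeout can then loop on the short call and stop only when its own budget is spent, checking errno == ETIMEDOUT to distinguish "keep waiting" from a real error.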
eadler [Wed, 14 Feb 2018 07:59:30 +0000 (07:59 +0000)]
msun: signed overflow in atan2
As a component of atan2(y, x), the case of x == 1.0 is farmed out to
atan(y). The current implementation of this comparison is vulnerable
to signed integer underflow (that is, undefined behavior), and it's
performed in a somewhat more complicated way than it need be. Change
it to not be quite so cute, rather directly comparing the high/low
bits of x to the specific IEEE-754 bit pattern that encodes 1.0.
Note that while there are three different e_atan* files in the
relevant directory, only this one needs fixing. e_atan2f.c already
compares against the full bit pattern encoding 1.0f, while
e_atan2l.c uses bitwise-ands/ors/nots and so doesn't require a change.
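A direct bit-pattern comparison of the kind described above can be sketched as follows. is_exactly_one() is a hypothetical standalone helper, not the msun source; it assumes double is IEEE-754 binary64, where 1.0 has high word 0x3ff00000 and low word 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True iff x is exactly 1.0, tested on the IEEE-754 binary64 bit pattern. */
static bool
is_exactly_one(double x)
{
	uint64_t bits;
	uint32_t hx, lx;

	assert(sizeof(x) == sizeof(bits));
	memcpy(&bits, &x, sizeof(bits));	/* well-defined type punning */
	hx = (uint32_t)(bits >> 32);		/* sign, exponent, high mantissa */
	lx = (uint32_t)bits;			/* low 32 mantissa bits */
	/* No subtraction on a signed word, so nothing can overflow. */
	return (hx == 0x3ff00000 && lx == 0);
}
```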
jhibbits [Wed, 14 Feb 2018 02:51:28 +0000 (02:51 +0000)]
PPC64: Get the timebase from the proper OF field
Summary:
After revision rS328534 ('PPC64: use hwref instead of cpuid'), FreeBSD on
a powerpc64 virtual machine panics since it is unable to read the
timebase, showing the following error:
get-property for timebase-frequency on zero phandle
panic: Unable to determine timebase frequency!
With the change above, cpuref->cr_hwref does not contain the phandle
anymore, thus, it never reads the proper CPU entry in OF.
kib [Wed, 14 Feb 2018 00:31:45 +0000 (00:31 +0000)]
Ensure memory consistency on COW.
From the submitter description:
The process is forked transitioning a map entry to COW
Thread A writes to a page on the map entry, faults, updates the pmap to
writable at a new phys addr, and starts TLB invalidations...
Thread B acquires a lock, writes to a location on the new phys addr, and
releases the lock
Thread C acquires the lock, reads from the location on the old phys addr...
Thread A ...continues the TLB invalidations which are completed
Thread C ...reads from the location on the new phys addr, and releases
the lock
In this example Thread B and C [lock, use and unlock] properly and
neither own the lock at the same time. Thread A was writing somewhere
else on the page and so never had/needed the lock. Thread C sees a
location that is only ever read|modified under a lock change beneath
it while it is the lock owner.
To fix this, perform the two-stage update of the copied PTE. First,
the PTE is updated with the address of the new physical page with
copied content, but in read-only mode. The pmap locking and the page
busy state during PTE update and TLB invalidation IPIs ensure that any
writer to the page cannot upgrade the PTE to the writable state until
all CPUs have updated their TLBs to not cache the old mapping. Then, after the
busy state of the page is lifted, the faults for write can proceed and
do not violate the consistency of the reads.
The change is done in vm_fault because most architectures do need IPIs
to invalidate remote TLBs. Moreover, I think that hardware guarantees of
atomicity of the remote TLB invalidation are not enough to prevent
inconsistent results from non-atomic reads, like multi-word accesses
protected by a lock. So instead of modifying each pmap invalidation
code, I did it there.
Discovered and analyzed by: Elliott.Rabe@dell.com
Reviewed by: markj
PR: 225584 (appeared to have the same cause)
Tested by: Elliott.Rabe@dell.com, emaste, Mike Tancsa <mike@sentex.net>, truckman
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14347
kib [Wed, 14 Feb 2018 00:25:18 +0000 (00:25 +0000)]
Do not call pmap_enter() with invalid protection mode.
If the map entry relookup was performed due to the mapping changes, we
need to ensure that there is still some access permission bit
requested which is compatible with the current vm_map_entry mode. If
not, restart the handler from scratch instead of trying to save the
current progress.
Also adjust fault_type to not include cleared permission bits.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14347
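The permission recheck described above can be sketched like this. It is not the vm_fault code: the FT_* bits are illustrative values modeled on, not copied from, the kernel's protection flags, and recheck_fault_type() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative permission bits for this sketch only. */
enum { FT_READ = 1, FT_WRITE = 2, FT_EXEC = 4 };

/*
 * After a map-entry relookup, drop requested bits the entry no longer
 * allows. Returns false if nothing usable remains, meaning the fault
 * handler should restart from scratch.
 */
static bool
recheck_fault_type(int *fault_type, int entry_prot)
{
	assert(fault_type != NULL);
	*fault_type &= entry_prot;
	return ((*fault_type & (FT_READ | FT_WRITE | FT_EXEC)) != 0);
}
```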