Mark Johnston [Thu, 23 Aug 2018 14:58:19 +0000 (14:58 +0000)]
Add an lld option to emit PC-relative relocations for ifunc calls.
The current kernel ifunc implementation creates a PLT entry for each
ifunc definition. ifunc calls therefore consist of a call to the
PLT entry followed by an indirect jump. The jump target is written
during boot when the kernel linker resolves R_[*]_IRELATIVE relocations.
This implementation is defined by requirements for userland code, where
text relocations are avoided. This requirement is not present for the
kernel, so the implementation has avoidable overhead (namely, an extra
indirect jump per call).
Address this for now by adding a special option to the static linker
to inhibit PLT creation for ifuncs. Instead, relocations to ifunc call
sites are passed through to the output file, so the kernel linker can
enumerate such call sites and apply PC-relative relocations directly
to the text section. Thus the overhead of an ifunc call becomes exactly
the same as that of an ordinary function call. This option is only for
use by the kernel and will not work for regular programs.
The final form of this optimization is up for debate; for now, this
change is simple and static enough to be acceptable as an interim
solution.
Reviewed by: emaste
Discussed with: arichardson, dim
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16748
Kyle Evans [Thu, 23 Aug 2018 13:38:38 +0000 (13:38 +0000)]
efiloader: Setup FDT in autoload to fix overlays clobbering kenv
manu found in the noted PR that overlays seemed to be clobbering the kenv
and killing the boot. Further inspection revealed that one can `fdt ls` at
the loader prompt for a successful boot, but autoboot breaks it.
In the autoboot case, first setup of FDT is happening in the middle of
bi_load, which triggers loading of the DTBO from /boot.
This is bad, bad, bad. Files in the loader are loaded somewhere in the
middle of the address space one after another. bi_load starts building the
needed kernel bootinfo immediately after the highest-addr loaded file. File
loads in the middle of bi_load suddenly clobber bootinfo and everything goes
off the rails.
The solution to this is to use take advantage of arch_autoload to setup FDT
in efiloader compiled with LOADER_FDT_SUPPORT. This matches how it works in
ubldr land, and is how it should have worked when overlay support was added
to efiloader since fdt_setup_fdtp now has the potential to load files
(courtesy of fdt_platform_load_dtb).
Emmanuel Vadot [Thu, 23 Aug 2018 13:21:01 +0000 (13:21 +0000)]
dts: Import DTS for arm64
- Most of the boards are using U-Boot, u-boot embed a DTB that isn't
compiled with -@ (overlay ready) so we cannot use overlays. We want
overlays, overlays are nice.
- The DTS life is going to linux, then sometimes it's imported in
U-Boot but it depend on the SoC family, U-Boot doesn't batch import
every DTS like we do. So sometimes to U-Boot DTS are very old. Or when
an interesting patch in commited upstream it is in Linux X+2 (roughly 4
months from now), we then have to wait for U-Boot to catch up, that
give us between 4 and 6 months to have an update.
- Some boards like the Marvell ones have 3 DTS, the one in the
vendor U-Boot made by Marvell themselves, the one in u-boot mainline
and the one in Linux. I found that the DTS in the Marvell U-Boot have
some problem with FreeBSD (especially the macchiatobin that declare
node with the same address but not the same size, that is not something
that the rman code can handle, it could be modified, I don't know the
code well enough). Also some compatible are used when they shouldn't,
for example they declare the gpio being orion-gpio while this binding
requires interrupts supports, which the node doesn't have.
- The above situation is mostly the same with RockChip SoCs (possibly
others, those are the only SoCs I work on that have this problem).
Note that importing the DTS doesn't mean that every board will use
them, I don't intend to copy the DTB to the GENERIC memstick image for
the Overdrive 1000/3000 for example, the ones provided by the firmware
works fine.
RPI3 will still stay an exception as we use the DTB provided by the
rpi-firmware package, so they come from the rpi foundation linux fork.
Renato Botelho [Thu, 23 Aug 2018 10:38:59 +0000 (10:38 +0000)]
usr.sbin/ndp: Cleanup in preparation to add libxo support
* Constify rtpref_str declaration
* Remove unused h_errno declaration
* Use time_t type for expire
* Use strlcpy to set static "?" value to ifname
* Rename local variable 's' to stop shadowing global definition
* Close socket used in pfx_flush()
* Use local variables for sock() in setdefif() and getdefif()
* Increase WARNS to 3
Michael Tuexen [Thu, 23 Aug 2018 06:03:59 +0000 (06:03 +0000)]
Don't use the explicit number 32 for the length of the secrets,
use sizeof() or explicit #definesi instead. No functional change.
This was suggested by jmg@.
Warner Losh [Thu, 23 Aug 2018 05:06:31 +0000 (05:06 +0000)]
Add a special note to UPDATING for the devmatch stuff. While tested,
there's an elevated risk of trouble, and you must update kernel,
userland and rc scripts for the best experience.
Warner Losh [Thu, 23 Aug 2018 05:06:22 +0000 (05:06 +0000)]
When trying to match the nomatch event passed to us, attempt to look
up the device described by the nomatch event in the device tree. If we
find it, then if the device is marked as have already attached to a
device once, then ignore the device.
This keeps us from reloading the device driver when it has just been
manually unloaded. All devies that have had a driver attach to them at
least once no longer participate in pnp-based autoloading.
Warner Losh [Thu, 23 Aug 2018 05:06:16 +0000 (05:06 +0000)]
Add a new device flag: DF_ATTACHED_ONCE
This flag is set once the device has been successfully attached. When
set, it inhibits devmatch from trying to match the device. This in
turn allows kldunload to work as expected. Prior to the change, the
driver would immediately reload because devmatch had no notion that
the driver had once been attached, and therefore shouldn't participate
in further matching.
Warner Losh [Thu, 23 Aug 2018 05:06:11 +0000 (05:06 +0000)]
Remove sorting of matches and print all the matches as we find them.
This backs out the hack we added in r329458. Now that we can freeze /
thaw probing, this is a much better solution to that problem. Revert
to simply printing the results as we find them, and relying on an
external sort | uniq to clean up the list.
Warner Losh [Thu, 23 Aug 2018 05:05:47 +0000 (05:05 +0000)]
Create devctl freeze/thaw.
This adds it to devctl, libdevctl, defines the two IOCTLs and
implements the kernel bits. causes any new drivers that are added via
kldload to be deferred until a 'thaw' comes in. These do not stack: it
is an error to freeze while frozen, or thaw while thawed.
Kyle Evans [Thu, 23 Aug 2018 02:26:40 +0000 (02:26 +0000)]
dtc(1): Update to 0892ec7; HACKING and implicit header fixes
Fixes courtesy of arichardson and jmg:
- HACKING was pointing to the wrong place
- Added headers were being relied on implicitly, but libstdc++ did not
comply with the unspoken wishes of dtc.
Kyle Evans [Thu, 23 Aug 2018 01:45:18 +0000 (01:45 +0000)]
bectl(8): jail: Tear down jail by default after command exits
Add a -U flag to get back the old behavior. The new behavior is a little
more friendly to the common use cases, jail the BE and execute a script.
Having the jail torn down automatically when the script is finished, or when
you exit the shell, is a little more friendly than having to remember to
`bectl ujail`.
Batch mode (-b) will continue to leave the jail up, as it's assumed the
caller has other intentions.
Conrad Meyer [Thu, 23 Aug 2018 01:42:45 +0000 (01:42 +0000)]
devstat(9): Constify function parameters that can be const
No functional change.
When attempting to document the changed argument types in devstat.9, I
discovered the 20 year old manual page severely mismatched reality even
prior to my simple change. So I took a first cut pass cleaning that up to
match reality. I'm sure I've missed some things; the goal was just to leave
it better than when I started.
Kyle Evans [Thu, 23 Aug 2018 01:22:13 +0000 (01:22 +0000)]
fdt_fixups: relocate the /chosen node after applying fixups
As indicated by the comment, any fixups applied (which might include
overlays) can invalidate the previously located node by adding nodes or
setting/adding properties. The later fdt_setprop of fixup-applied property
would then fail because of the bad/wrong node offset.
This would have generally been harmless, but potentially caused multiple
applications of fixups and caused a little bit of bloat.
Alan Somers [Wed, 22 Aug 2018 23:31:27 +0000 (23:31 +0000)]
tftpd: Fix data corruption bug with netascii
Transferring files in netascii format requires, among other things,
translating all CR characters to a CR,NUL pair. tftpd does this correctly
except when the CR occurs as the last octet of a packet. In that case, it
erroneously drops the NUL which should be part of the following packet. The
bug was caused by using 0 as a sentinel value in a variable that could
legitimately hold 0. Fix it by switching the sentinel value to -1.
PR: 178055
Reported by: Richard <rsitze@gmail.com>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16853
[ig4] Fix I/O timeout issue with Designware I2C controller on AMD platforms
Due to hardware limitation AMD I2C controller can't trigger pending
interrupt if interrupt status has been changed after clearing
interrupt status bits. So, I2C will lose the interrupt and IO will be
timed out. Implements a workaround to disable I2C controller interrupt
and re-enable I2C interrupt before existing interrupt handler.
Conrad Meyer [Wed, 22 Aug 2018 22:19:42 +0000 (22:19 +0000)]
KASSERT: Make runtime optionality optional
Add an option, KASSERT_PANIC_OPTIONAL, that allows runtime KASSERT()
behavior changes. When this option is not enabled, code that allows
KASSERTs to become optional is not enabled, and all violated assertions
cause termination.
The runtime KASSERT behavior was added in r243980.
One important distinction here is that panic has __dead2
("attribute((noreturn))"), while kassert_panic does not. Static analyzers
like Coverity understand __dead2. Without it, KASSERTs go misunderstood,
resulting in many false positives that result from violation of program
invariants.
Michael Tuexen [Wed, 22 Aug 2018 21:23:32 +0000 (21:23 +0000)]
Add support for send, receive and state-change DTrace providers for
SCTP. They are based on what is specified in the Solaris DTrace manual
for Solaris 11.4.
Mark Johnston [Wed, 22 Aug 2018 20:44:30 +0000 (20:44 +0000)]
Prepare the kernel linker to handle PC-relative ifunc relocations.
The boot-time ifunc resolver assumes that it only needs to apply
IRELATIVE relocations to PLT entries. With an upcoming optimization,
this assumption no longer holds, so add the support required to handle
PC-relative relocations targeting GNU_IFUNC symbols.
- Provide a custom symbol lookup routine that can be used in early boot.
The default lookup routine uses kobj, which is not functional at that
point.
- Apply all existing relocations during boot rather than filtering
IRELATIVE relocations.
- Ensure that we continue to apply ifunc relocations in a second pass
when loading a kernel module.
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16749
Marcelo Araujo [Wed, 22 Aug 2018 20:23:08 +0000 (20:23 +0000)]
Add -s "help" and -l "help" to print a list of supported PCI and LPC devices.
For tools that uses bhyve such like libvirt, it is important to be able to
probe what features are supported by the given bhyve binary.
To give more context, libvirt probes bhyve's capabilities in a not very
effective way:
- Running 'bhyve -h' and parsing output.
- To detect devices, it runs 'bhyve -s 0,dev' for every each device and
parses error output to identify if the device is supported or not.
Patrick Kelsey [Wed, 22 Aug 2018 19:38:48 +0000 (19:38 +0000)]
Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of
2^32 bps or greater to be used. Prior to this, bandwidth parameters
would simply wrap at the 2^32 boundary. The computations in the HFSC
scheduler and token bucket regulator have been modified to operate
correctly up to at least 100 Gbps. No other algorithms have been
examined or modified for correct operation above 2^32 bps (some may
have existing computation resolution or overflow issues at rates below
that threshold). pfctl(8) will now limit non-HFSC bandwidth
parameters to 2^32 - 1 before passing them to the kernel.
The extensions to the pf(4) ioctl interface have been made in a
backwards-compatible way by versioning affected data structures,
supporting all versions in the kernel, and implementing macros that
will cause existing code that consumes that interface to use version 0
without source modifications. If version 0 consumers of the interface
are used against a new kernel that has had bandwidth parameters of
2^32 or greater configured by updated tools, such bandwidth parameters
will be reported as 2^32 - 1 bps by those old consumers.
All in-tree consumers of the pf(4) interface have been updated. To
update out-of-tree consumers to the latest version of the interface,
define PFIOC_USE_LATEST ahead of any includes and use the code of
pfctl(8) as a guide for the ioctls of interest.
Eric Joyner [Wed, 22 Aug 2018 18:19:56 +0000 (18:19 +0000)]
if_media: Add new 2.5G/5G/25G/40G/50G/100G/200G/400G media types
Upcoming Ethernet hardware will support new media types that aren't in the kernel
yet, so they are added here. These mostly include new 25G/50G/100G media types;
and this commit introduces new 200G/400G speeds and media.
Alexander Motin [Wed, 22 Aug 2018 16:32:53 +0000 (16:32 +0000)]
Add dmu_tx_assign() error handling in zfs_unlinked_drain().
The error handling got lost during r334810, while according to the report
error there may happen in case of dataset being over quota. In such case
just leave the node in the unlinked list to be freed sometimes later.
Alexander Motin [Wed, 22 Aug 2018 16:27:24 +0000 (16:27 +0000)]
Create separate taskqueue to call zfs_unlinked_drain().
r334810 introduced zfs_unlinked_drain() dispatch to taskqueue on every
deletion of a file with extended attributes. Using system_taskq for that
with its multiple threads in case of multiple files deletion caused all
available CPU threads to uselessly spin on busy locks, completely blocking
the system.
Use of single dedicated taskqueue is the only easy solution I've found,
while in would be great if we could specify that some task should be
executed only once at a time, but never in parallel, while many tasks
could use different threads same time.
Kurt Lidl [Wed, 22 Aug 2018 15:29:54 +0000 (15:29 +0000)]
Turn off LOADER_GELI and LOADER_LUA for sparc64, until those options
are fully debugged. With these options off, the unified "loader"
binary for sparc64 works to boot a kernel from ZFS.
Kurt Lidl [Wed, 22 Aug 2018 14:33:57 +0000 (14:33 +0000)]
Increase the size of the heap size available on sparc64 during
operation of "loader". The dramatic increase in size of
SPA_MAXBLOCKSIZE in r304321 causes the heap space to be exhausted,
so malloc() fails, ultimately leading to a memcpy() with a
destination of 0x0.
Note that ntpd_sync_on_start is a preferred alternative to ntpdate_enable.
A similar note is already present in the description of the
ntpd_sync_on_start variable.
This patch adds a note to the description of the ntpdate_enable variable.
This way it would be easier to spot. Otherwise a user might skip the part
of the manual describing ntpd_sync_on_start if they stop reading after
learning about ntpdate_enable.
Reviewed by: bcr
Approved by: mat (mentor)
Differential Revision: https://reviews.freebsd.org/D16519
Rick Macklem [Wed, 22 Aug 2018 12:20:10 +0000 (12:20 +0000)]
Revert r320757 since it can cause "excl->shared" panics.
PR#230752 shows a panic where an nfsd thread tries to do soconnect() on
the AF_LOCAL socket used by the nfsuserd while already holding an
exclusive lock on it. I am not 100% sure how this happens, but since an
AF_LOCAL socket is in the file system namespace it is conceivable that it
could lock it and then attempt an upcall to the nfsuserd.
However, reverting r320757 stops the nfsuserd from using an AF_LOCAL
socket, so it should avoid any such panic().
r320757 did fix a problem with running the nfsuserd when jails were
enabled, but that can be dealt with less elegantly by allowing the
use of an alternate address instead of 127.0.0.1.
The gssd daemon also uses an AF_LOCAL socket, but it will do upcalls
before the nfsd thread processes the RPC, so I think it should not
be suseptible to this problem.
Alex Richardson [Wed, 22 Aug 2018 11:56:51 +0000 (11:56 +0000)]
Use unifdef -x1 instead of ignoring the shell exit code
This way the target fails if unifdef doesn't exist or doesn't modify the
file instead of just generating an empty .c file.
I found this while building without inherited $PATH (D16815)
Alex Richardson [Wed, 22 Aug 2018 11:56:42 +0000 (11:56 +0000)]
Stop using unifdef to generate bsdxml.h
The current invocation of unifdef causes the build to fail when using a shell
with -o pipefail on by default since unifdef will return a non-zero exit status
if it changes something. The only thing this call to unifdef does is remove 5
lines that will be ignored by the compiler anyway. Furthermore, it is the only
make rule in the source tree that requires unifdef. Removing this call also
makes it slightly easier to build without inhering $PATH (D16815) since we
don't need unifdef anymore.
I also noticed that the sed call to replace the include guard has been broken
for over 10 years since the import of expat 2.0.1 changed it from
`XmlParse_INCLUDED` to `Expat_INCLUDED`. I could also fix this but since it's
been broken for so long and no one noticed, it's probably not necessary.
Toomas Soome [Wed, 22 Aug 2018 10:04:42 +0000 (10:04 +0000)]
loader: bios loader should allow to chain load a file
The current chain command does accept only device, allow also a file to be used,
such as /boot/pmbr or /boot/mbr (or stored third party MBR/VBR block).
Chuck Tuffli [Wed, 22 Aug 2018 04:29:24 +0000 (04:29 +0000)]
Make NVMe compatible with the original API
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).
This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.
This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.
Bump __FreeBSD_version to 1200081 due to API change
Kyle Evans [Wed, 22 Aug 2018 01:52:55 +0000 (01:52 +0000)]
lualoader: Fix loader.conf(5) EOL validation for 'exec' lines
This includes some light rework to simplify the line parsing, as well. If
we hit a line match, we'll always either use the line and move on to the
next line, or we'll spew out malformed line errors.
We had multiple spots to output the error and set the status based on
whether we had a non-nil first capture group or failed EOL validation, but
it was always the same error. Light rework entails a small label jump to
skip error handling and elimination of 'found' local.
Matt Macy [Wed, 22 Aug 2018 01:50:12 +0000 (01:50 +0000)]
Remove legacy drm and drm2 from tree
As discussed on the MLs drm2 conflicts with the ports' version and there
is no upstream for most if not all of drm. Both have been merged in to
a single port.
Users on powerpc, 32-bit hardware, or with GPUs predating Radeon
and i915 will need to install the graphics/drm-legacy-kmod. All
other users should be able to use one of the LinuxKPI-based ports:
graphics/drm-stable-kmod, graphics/drm-next-kmod, graphics/drm-devel-kmod.
Kyle Evans [Tue, 21 Aug 2018 23:42:20 +0000 (23:42 +0000)]
lualoader: Refactor config line expressions
A couple of issues addressed:
1.) Modules with - in the name were not recognized as modules
2.) The module regex was repeated for each place a module name may appear
3.) The 'strip leading space' bits were repeated for each expression
4.) The trailing 'comment validation' stuff was repeated every expression
#4 still has some more work to be done. exec lines, for instance, don't
capture a 'value' -- there's only one capture pattern. This throws off the
'c' value that we match, so the trailing bits aren't *actually* being
validated. This isn't a new issue, though, so a future comit will address
this.
muge(4) is the USB ethernet adapter that is used in RPi 3B+. Shipping it
in GENERIC kernel allows using NFS root out of the box instead of either
building custom kernel or modifying loader.conf for early loading of if_muge.ko
Fedor Uporov [Tue, 21 Aug 2018 18:39:47 +0000 (18:39 +0000)]
FUSE extattrs: fix issue when neither uio nor size were not passed to VOP_*.
The requested size was returned incorrectly in case uio == NULL from listextattr because the
nameprefix/name conversion was not applied.
Also, make a_size/uio returning logic more unified with other filesystems.
John Baldwin [Tue, 21 Aug 2018 17:13:51 +0000 (17:13 +0000)]
Remove 'imen' global variable from atpic(4).
In pre-SMPng, the global 'imen' was used to track mask state of the
hardware interrupts and was aligned to the masks used by spl*().
When the atpic code was converted to using the x86 interrupt source
abstraction, the global 'imen' was preserved by having each PIC
instance point to an invididual byte in the global 'imen' to hold its
8-bit interrupt mask. The global 'imen' is no longer used for
anything however, so rather than storing pointers in 'struct atpic',
just store the individual 8-bit mask for each PIC as a char.
While here, convert the ATPIC macro to using C99 initializers.
Alex Richardson [Tue, 21 Aug 2018 16:52:14 +0000 (16:52 +0000)]
Relax the check added in 338096
Checking for any include below ${SRCTOP}/sys is too strict and breaks
e.g. mkimg which includes sys/sys/disk. ABI issues will only be caused
by including headers in sys/sys since they might not match the host.
Mark Johnston [Tue, 21 Aug 2018 16:37:37 +0000 (16:37 +0000)]
Set arc_kmem_cache_reap_retry_ms to 0 and make it configurable.
r329759 introduced this parameter, which controls the rate at which ZFS
UMA zones are drained when the ARC reclaim thread is shrinking the ARC.
The reclamation target is derived from the global free page count, and
arc_shrink() only frees buffers back to UMA, so the free page count is
not updated until the zones are drained. Thus, back-to-back calls to
arc_shrink() within the arc_kmem_cache_reap_retry_ms interval do not
provide immediate feedback to the arc_reclaim control loop, so we may
free more of the ARC than needed to address a transient page shortage.
As we do not implement the asynchronous zone draining added in r329759,
disable the retry interval, restoring pre-r329759 behaviour. That is,
we will drain the ZFS UMA zones before each attempt to shrink the ARC.
Reviewed by: mav
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Colin Percival [Tue, 21 Aug 2018 15:30:47 +0000 (15:30 +0000)]
Quieten the svn (or svnlite) commands used to extract information from an
SVN checkout for placement into an EC2 AMI. We only run these if there
is a .svn directory; but in the event that SVN was used to check out a
tree which is then exported over NFS, we were unnecessarily noisy.
Michael Tuexen [Tue, 21 Aug 2018 14:12:30 +0000 (14:12 +0000)]
Enabling the IPPROTO_IPV6 level socket option IPV6_USE_MIN_MTU on a TCP
socket resulted in sending fragmented IPV6 packets.
This is fixes by reducing the MSS to the appropriate value. In addtion,
if the socket option is set before the handshake happens, announce this
MSS to the peer. This is not stricly required, but done since TCP
is conservative.
Michael Tuexen [Tue, 21 Aug 2018 13:25:32 +0000 (13:25 +0000)]
Refactor the SHUTDOWN_PENDING state handling.
This is not a functional change but a preperation for the upcoming
DTrace support. It is necessary to change the state in one
logical operation, even if it involves clearing the sub state
SHUTDOWN_PENDING.
Marcelo Araujo [Tue, 21 Aug 2018 11:22:49 +0000 (11:22 +0000)]
- Add CSV output to gstat via -C flag.
Add a -C option, similar to -B, that allows gstat to produce basic CSV output
with absolute timestamps (ISO 8601, nearly.) Multiple devices are handled by
way of a single-pivot CSV table with duplicated timestamps for each object
output.