John Baldwin [Fri, 25 Oct 2019 22:17:24 +0000 (22:17 +0000)]
MFC 353023: Fix the EMBEDFS_FORMAT helper variable for riscv64.
It was defined with the wrong MACHINE_ARCH previously. This permits
using an MFS image without defining MD_ROOT_SIZE which has various
benefits (one being that the build is able to treat the MFS image as
a dependency and properly re-link the kernel with the new image when
building with NO_CLEAN).
John Baldwin [Fri, 25 Oct 2019 22:15:20 +0000 (22:15 +0000)]
MFC 353059: Restore description of packets dropped due to full reassembly queue.
r265408 renamed tcps_rcvmemdrop to tcps_rcvreassfull and gave it a more
specific description. r279122 (libxo-ification) reverted that change.
This commit brings it back, but with a small tweak to the description.
John Baldwin [Fri, 25 Oct 2019 22:04:05 +0000 (22:04 +0000)]
MFC 353585,353586: Support hot insertion and removal of PCI devices on EC2.
353585:
Export pci_attach() and pci_detach().
353586:
Support hot insertion and removal of PCI devices on EC2.
Install ACPI notify handlers on PCI devices with an _EJ0 method. This
handler is invoked when devices are added or removed.
- When an ACPI_NOTIFY_DEVICE_CHECK event posts, rescan the parent bus
device. Note that strictly speaking we only need to rescan the
specified device, but BUS_RESCAN is what is available, so we rescan
the entire bus.
- When an ACPI_NOTIFY_EJECT_REQUEST event posts, detach the device
associated with the ACPI handle, invoke the _EJ0 method, and then
delete the device.
Eventually this might be changed to vector notify events to devd in
userspace where devctl can be used instead to permit more complex
actions such as graceful unmounting of filesystems.
John Baldwin [Fri, 25 Oct 2019 21:23:44 +0000 (21:23 +0000)]
MFC 353371: Don't free the cursor boundary tag during vmem_destroy().
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone. Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.
John Baldwin [Fri, 25 Oct 2019 21:14:43 +0000 (21:14 +0000)]
MFC 353323: Set the FID field in lookaside crypto requests to the rx queue ID.
The PCI block in the adapter requires this field to be set to a valid
queue ID. It is not clear why it did not fail on all machines, but
the effect was that crypto operations reading input data via DMA
failed with an internal PCI read error on machines with 128G or more
of RAM.
Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu. There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.
To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source. Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.
Michael Tuexen [Fri, 25 Oct 2019 18:17:56 +0000 (18:17 +0000)]
MFC r354044:
Ensure that the flags indicating IPv4/IPv6 are not changed by failing
bind() calls. This would lead to inconsistent state resulting in a panic.
A fix for stable/11 was committed in
https://svnweb.freebsd.org/base?view=revision&revision=338986
Reported by: syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com
Obtained from: jtl@
Sponsored by: Netflix, Inc.
Alexander Motin [Fri, 25 Oct 2019 15:02:50 +0000 (15:02 +0000)]
MFC r353454: Allocate device softc from the device domain.
Since we are trying to bind device interrupt threads to the device domain,
it should have sense to make memory often accessed by them local. If domain
is not known, fall back to round-robin.
Alexander Motin [Fri, 25 Oct 2019 14:55:37 +0000 (14:55 +0000)]
MFC r352630: Make nvme(4) driver some more NUMA aware.
- For each queue pair precalculate CPU and domain it is bound to.
If queue pairs are not per-CPU, then use the domain of the device.
- Allocate most of queue pair memory from the domain it is bound to.
- Bind callouts to the same CPUs as queue pair to avoid migrations.
- Do not assign queue pairs to each SMT thread. It just wasted
resources and increased lock congestions.
- Remove fixed multiplier of CPUs per queue pair, spread them even.
This allows to use more queue pairs in some hardware configurations.
- If queue pair serves multiple CPUs, bind different NVMe devices to
different CPUs.
Alexander Motin [Fri, 25 Oct 2019 14:51:21 +0000 (14:51 +0000)]
MFC r342771 (by cem): Expose threads-per-core and physical core count information
With new sysctls (to the best of our ability do detect them). Restructured
smp.4 slightly for clarity (keep relevant stuff closer to the top) while
documenting.
Note: A little more than just the tun/tap merge has been MFC'd to ease
auditing correctness/differences, as some later commits were cherry-picked
back to tun+tap.
r347241:
tun/tap: merge and rename to `tuntap`
tun(4) and tap(4) share the same general management interface and have a lot
in common. Bugs exist in tap(4) that have been fixed in tun(4), and
vice-versa. Let's reduce the maintenance requirements by merging them
together and using flags to differentiate between the three interface types
(tun, tap, vmnet).
This fixes a couple of tap(4)/vmnet(4) issues right out of the gate:
- tap devices may no longer be destroyed while they're open [0]
- VIMAGE issues already addressed in tun by kp
[0] emaste had removed an easy-panic-button in r240938 due to devdrn
blocking. A naive glance over this leads me to believe that this isn't quite
complete -- destroy_devl will only block while executing d_* functions, but
doesn't block the device from being destroyed while a process has it open.
The latter is the intent of the condvar in tun, so this is "fixed" (for
certain definitions of the word -- it wasn't really broken in tap, it just
wasn't quite ideal).
ifconfig(8) also grew the ability to map an interface name to a kld, so
that `ifconfig {tun,tap}0` can continue to autoload the correct module, and
`ifconfig vmnet0 create` will now autoload the correct module. This is a
low overhead addition.
r347394:
tuntap: Properly detach tap ifp
r347404:
tuntap: Don't down tap interfaces if LINK0 is set
r347483:
tuntap: Improve style
No functional change.
tun_flags of the tuntap_driver was renamed to ident_flags to reflect the
fact that it's a subset of the tun_flags that identifies a tuntap device.
This maps more easily (visually) to the TUN_DRIVER_IDENT_MASK that masks off
the bits of tun_flags that are applicable to tuntap driver ident. This is a
purely cosmetic change.
r351220:
if_tuntap: minor improvements
Rewrite a loop to avoid duplicating the exit condition.
Simplify mask processing in tunpoll().
Fix minor typos.
r351229:
tuntap: belatedly add MODULE_VERSION for if_tun and if_tap
When tun/tap were merged, appropriate MODULE_VERSION should have been added
for things like modfind(2) to continue to do the right thing with the old
names.
r352148:
Remove empty tap/tun modules directories after r347241
r353056:
if_tuntap: add a busy/unbusy mechanism, replace destroy OPEN check
A future commit will create device aliases when a tuntap device is renamed
so that it's still easily found in /dev after the rename. Said mechanism
will want to keep the tun alive long enough to either realize that it's
about to go away or complete the alias creation, even if the alias is about
to get destroyed.
While we're introducing it, using it to prevent open devices from going away
makes plenty of sense and keeps the logic on waking up tun_destroy clean, so
we don't have multiple places trying to cv_broadcast unless it's still in
use elsewhere.
r353057:
if_tuntap: create /dev aliases when a tuntap device gets renamed
Currently, if you do:
$ ifconfig tun0 create
$ ifconfig tun0 name wg0
$ ls -l /dev | egrep 'wg|tun'
You will see tun0, but no wg0. In fact, it's slightly more annoying to make
the association between the new name and the old name in order to open the
device (if it hadn't been opened during the rename).
Register an eventhandler for ifnet_arrival_events and catch interface
renames. We can determine if the ifnet is a tun easily enough from the
if_dname, which matches the cevsw.d_name from the associated tuntap_driver.
Some locking dance is required because renames don't require the device to
be opened, so it could go away in the middle of handling the ioctl, but as
soon as we've verified this isn't the case we can attempt to busy the tun
and either bail out if the tun device is dying, or we can proceed with the
rename.
We only create these aliases on a best-effort basis. Renaming a tun device
to "usbctl", which doesn't exist as an ifnet but does as a /dev, is clearly
not that disastrous, but we can't and won't create a /dev for that.
r353781:
tuntap(4): Drop TUN_IASET
This flag appears to have been effectively unused since introduction to
if_tun(4) -- drop it now.
r353782:
tuntap(4): break out after setting TUN_DSTADDR
This is now the only flag we set in this loop, terminate early.
r353785:
tuntap(4): Use make_dev_s to avoid si_drv1 race
This allows us to avoid some dance in tunopen for dealing with the
possibility of dev->si_drv1 being NULL as it's set prior to the devfs node
being created in all cases.
There's still the possibility that the tun device hasn't been fully
initialized, since that's done after the devfs node was created. Alleviate
this by returning ENXIO if we're not to that point of tuncreate yet.
This work is what sparked r353128, full initialization of cloned devices
w/ specified make_dev_args.
r353786:
tuntap(4): use cdevpriv w/ dtor for last close instead of d_close
cdevpriv dtors will be called when the reference count on the associated
struct file drops to 0, while d_close can be unreliable for cleaning up
state at "last close" for a number of reasons. As far as tunclose/tundtor is
concerned the difference is minimal, so make the switch.
r353877:
tuntap(4): properly declare if_tun and if_tap modules
Simply adding MODULE_VERSION does not do the trick, because the modules
haven't been declared. This should actually fix modfind/kldstat, which
r351229 aimed and failed to do.
This should make vm-bhyve do the right thing again when using the ports
version, rather than the latest version not in ports.
Kyle Evans [Fri, 25 Oct 2019 00:47:37 +0000 (00:47 +0000)]
MFC r353872-r353873: lualoader color handling fixes
r353872:
lualoader: don't botch disabling of color
When colors are disabled, color.escape{fg,bg} would return the passed in
color rather than the proper ANSI sequence for the color.
color.escape{fg,bg} would be wrong.
Instead return '', as the associated reset* functions will also return ''.
This should get rid of the funky '2' and '4' in the kernel selector if
you're booting serial.
r353873:
lualoader: fix setting of loader_color=NO in loader.conf(5)
Previously color.disabled would be calculated at color module load time,
then never touched again. We can detect serial boots beyond just what we're
told by loader.conf(5) so this works out in many cases, but we must
re-evaluate the situation after the config is loaded to make sure we're not
supposed to be forcing it enabled/disabled.
John Baldwin [Fri, 25 Oct 2019 00:16:57 +0000 (00:16 +0000)]
MFC 350662:
Detect invalid PCI devices more correctly in PCI interrupt router drivers.
- Check for an invalid device (vendor is invalid) before reading the
header type register when examining function 0 of a possible device.
- When iterating over functions of a device, reject any device whose
16-bit vendor is invalid rather than requiring the full 32-bit
vendor+device to be all 1's. In practice the latter check is
probably fine, but checking the vendor is what the PCI spec
recommends.
Kyle Evans [Thu, 24 Oct 2019 21:43:01 +0000 (21:43 +0000)]
MFC r353788: picobsd: add deprecation notices
Notices appear both in picobsd(8) (near the top for easy notice) and are
also printed to stderr on every invocation of picobsd for visibility.
The tentative date for removal is October 31st, as no volunteers have
stepped forward at all from postings to -arch@ at least.
picobsd: add deprecation notices
Move pcpu KVA out of .bss into dynamically allocated VA at
pmap_bootstrap(). This avoids demoting superpage mapping .data/.bss.
Also it makes possible to use pmap_qenter() for installation of
domain-local pcpu page on NUMA configs.
Refactor pcpu and IST initialization by moving it to helper functions.
John Baldwin [Thu, 24 Oct 2019 21:02:24 +0000 (21:02 +0000)]
MFC 351434: Fix universe to include arm LINT kernel configs.
Strip comments from the NOTES.armv[57] files as is done for other
NOTES files when building the corresponding LINT configs. Without
this, the LINT configs contained the NO_UNIVERSE comment from the
NOTES.armv[57] files.
John Baldwin [Thu, 24 Oct 2019 20:48:30 +0000 (20:48 +0000)]
MFC 350549: Set ISOPEN in namei flags when opening executable interpreters.
These vnodes are explicitly opened via VOP_OPEN via
exec_check_permissions identical to the main exectuable image.
Setting ISOPEN allows filesystems to perform suitable checks in
VOP_LOOKUP (e.g. close-to-open consistency in the NFS client).
Doing some tests with very high interrupt rates I've noticed that one of
conditions I added in r232207 to make interrupt threads in most cases
run on local CPU never worked as expected (worked only if previous time
it was executed on some other CPU, that is quite opposite). It caused
additional CPU usage to run full CPU search and could schedule interrupt
threads to some other CPU.
This patch removes that code and instead reuses existing non-interrupt
code path with some tweaks for interrupt case:
- On SMT systems, if current thread is idle, don't look on other threads.
Even if they are busy, it may take more time to do fill search and bounce
the interrupt thread to other core then execute it locally, even sharing
CPU resources. It is other threads should migrate, not bound interrupts.
- Try hard to keep interrupt threads within LLC of their original CPU.
This improves scheduling cost and supposedly cache and memory locality.
On a test system with 72 threads doing 2.2M IOPS to NVMe this saves few
percents of CPU time while adding few percents to IOPS.
John Baldwin [Thu, 24 Oct 2019 19:07:52 +0000 (19:07 +0000)]
MFC 350178: Improve the precision of bhyve's vPIT.
Use 'struct bintime' instead of 'sbintime_t' to manage times in vPIT
to postpone rounding to final results rather than intermediate
results. In tests performed by Joyent, this reduced the error measured
by Linux guests by 59 ppm.
Marius Strobl [Thu, 24 Oct 2019 14:18:06 +0000 (14:18 +0000)]
MFC: r353778
- In em_intr(), just call em_handle_link() instead of duplicating it.
- In em_msix_link(), properly handle IGB-class devices after the iflib(4)
conversion again by only setting EM_MSIX_LINK for the EM-class 82574
and by re-arming link interrupts unconditionally, i. e. not only in
case of spurious interrupts. This fixes the interface link state change
detection for the IGB-class. [1]
- In em_if_update_admin_status(), only re-arm the link state change
interrupt for 82574 and also only if such a device uses MSI-X, i. e.
takes advantage of autoclearing. In case of INTx and MSI as well as
for LEM- and IGB-class devices, re-arming isn't appropriate here and
setting EM_MSIX_LINK isn't either.
While at it, consistently take advantage of the hw variable.
Kyle Evans [Thu, 24 Oct 2019 04:08:24 +0000 (04:08 +0000)]
MFC r349928: Allow efi loader to get network params from uboot
Summary:
efi loader does not work with static network parameters. It always uses
BOOTP/DHCP and also uses RARP as a fallback. Problems with DHCP servers can
cause the loader to fail to populate network parameters.
Kyle Evans [Thu, 24 Oct 2019 04:04:53 +0000 (04:04 +0000)]
MFC r349471, r351166: Tweak EFI_STAGING_SIZE
r349471:
Increase EFI_STAGING_SIZE to 100MB on x64
To avoid failures when the large 18MB nvidia.ko module is being loaded,
increase EFI_STAGING_SIZE from 64MB to 100MB on x64 systems.
Leave the other platforms at 64MB.
r351166:
Reduce size of EFI_STAGING_SIZE to 32 on arm
Reduce the size of the EFI_STAGING area we allocate on arm to 32. On arm SBC
such as the NanoPi-NEOLTS the staging area allocation will fail on the 256MB
model with a staging size of 64.
Add support for an HTTP "network filesystem" using the UEFI's HTTP
stack.
This also supports HTTPS, but TianoCore EDK2 implementations currently
crash while fetching loader files.
Only IPv4 is supported at the moment. IPv6 support is planned for a
follow-up changeset.
Note that we include some headers from the TianoCore EDK II project in
stand/efi/include/Protocol verbatim, including links to the license instead
of including the full text because that's their preferred way of
communicating it, despite not being normal FreeBSD project practice.
r349395:
Disconnect EFI HTTP support
The EFI HTTP code has been causing boot failures for people, so disable it
while a fix is being worked on.
r349404:
Re-enable loader efi http boot and fix dv_open bug if dv_init failed
The code in efihttp.c was assuming that dv_open wouldn't be called if
dv_init failed. But the dv_init return value is currently ignored.
Add a new variable, `efihttp_init_done` and only proceed in dv_open if
it's true. This fixes the loader on systems without efi http support.
r349564:
Clean efihttp pointer-sign warnings
The Http protocol structure is using unsigned char strings, Use type casts
where needed.
r349565:
efihttp: comparison of integers of different signs
message.HeaderCount is UINTN (unsigned int), so should be i.
r349566:
efihttp: mark unused arguments with __unused
we do have __unused, lets use it.
r349613:
efihttp: mac and err can be used uninitialized
While there, also check if mac != NULL, and use pointer compare for ipv4
and dns.
r350444:
Fix EFI loader build when LOADER_NET_SUPPORT=no.
Add map-vdisk and unmap-vdisk commands to create virtual disk interface on
top of file. This will allow to use disk image from file system to load and
start the kernel.
By mapping file, we create vdiskX device, the device will be listed by lsdev
[-v] and can be accessed directly as ls vdisk0p1:/path or can be used as
value for currdev variable.
vdisk strategy function does not use bcache as we have bcache used with
backing file. vdisk can be unmapped when all consumers have closed the open
files.
In first iteration we do not support the zfs images because zfs pools do
keep the device open (there is no "zpool export" mechanism). Adding zfs
support is relatively simple, we just need to run zfs disk probe after
mapping is done.
Kyle Evans [Thu, 24 Oct 2019 03:52:32 +0000 (03:52 +0000)]
MFC r345477, r346675, r346984, r348748
r345477:
Distinguish between "no partition" and "choose best partition" with a
constant.
The values of the d_slice and d_partition fields of a disk_devdesc have a
few values with special meanings in the disk_open() routine. Through various
evolutions of the loader code over time, a d_partition value of -1 has
meant both "use the first ufs partition found in the bsd label" and "don't
open a bsd partition at all, open the raw slice."
This defines a new special value of -2 to mean open the raw slice, and it
gives symbolic names to all the special values used in d_slice and
d_partition, and adjusts all existing uses of those fields to use the new
constants.
The phab review for this timed out without being accepted, but I'm still
citing it below because there is useful commentary there.
r346675:
Restore the ability to open a raw disk or partition in loader(8).
The disk_open() function searches for "the best partition" when slice and
partition information is not provided as part of the device name. As of
r345477 the slice and partition fields of a disk_devdesc are initialized to
D_SLICEWILD and D_PARTWILD; in the past they were initialized to -1, which
was sometimes interpreted as meaning 'wildcard' and sometimes as 'open the
raw partition' depending on the context. So as an unintended side effect of
r345477 it became basically impossible to ever open a disk or partition
without doing the 'best partition' search. One visible effect of that was
the inability to open the raw disk to read the partition table correctly in
zfs_probe_dev(), leading to failures to find the zfs pool unless it was on
the first partition.
Now instead of always initializing slice and partition to wildcards, the
disk_parsedev() function initializes them based on the presence of a
path/file name following the device. If there is any path or filename
following the ':' that ends the device name, then slice and partition are
initialized to D_SLICEWILD and D_PARTWILD. If there is nothing after the
':' then it is considered to be a request to open the raw device or
partition itself (not a file stored within it), and the fields are
initialized to D_SLICENONE and D_PARTNONE.
With this change in place, all the tests in src/tools/boot are succesful
again, including the recently-added cases of booting from a zfs pool on
a partition other than slice 1 of the device.
r346984:
Use D_PARTISGPT rather than bare 255
These three cases dovetail with other places in the code where we use
or set D_PARTISGPT when we mean that the partitioning scheme is
GPT. Use this #define to make the code easier to undertand.
r348748:
loader: disk_open() should honor D_PARTNONE
The D_PARTNONE is documented to make it possible to open raw MBR
partition, but the current disk_open() does not really implement this
statement.
The current code is checking partition against -1 (D_PARTNONE) but does
attempt to open partition table in case we do have FreeBSD MBR partition
type.
Instead, we should check -2 (D_PARTWILD).
In case we do have MBR + BSD label, this code is only working because
by default, the first BSD partiton is created starting with relative sector
0, and we can still access the BSD table from that MBR slice.
Kyle Evans [Thu, 24 Oct 2019 03:48:28 +0000 (03:48 +0000)]
MFC r344560, r344718
r344560:
stand: Remove unused i386 EFI MD bits
r328169 removed the copy of bootinfo that would've made this somewhat
functional. However, this is irrelevant- earlier work in r292338 was done to
exit boot services in the MI bi_load() rather than having N copies of the
GetMemoryMap/ExitBootServices dance.
i386 never quite caught up to that; ldr_enter was still being called but
the prereq for that, ldr_bootinfo, was no longer. As a consequence, this
ExitBootServices() was being called with a mapkey=0, clearly bogus, and
reportedly breaking the boot in some instances.
r344718:
EFI: don't call printf after ExitBootServices, since it uses Boot Services
ExitBootServices terminates all boot services including console access.
Attempting to call printf afterwards can result in a crash, depending on the
implementation.
Move any printf statements to before we call bi_load, and remove any that
depend on calling bi_load first.
Kyle Evans [Thu, 24 Oct 2019 03:40:20 +0000 (03:40 +0000)]
MFC (proactively; not required yet) r339673: Fix stand/ build after r339671.
ffs_subr.c requires calculate_crc32c() from libkern. Unfortunately we
cannot just add libkern/crc32.c to libstand because crc32.o is already
compiled from contrib/zlib/crc32.c. Use the include trick to rename
the source.
Note that libstand also provides crc32.c which seems to be unused.
Kyle Evans [Thu, 24 Oct 2019 03:37:17 +0000 (03:37 +0000)]
MFC r340834: Disable build-id in i386 binary boot components
A user may enable build-id for all builds by adding
LDFLAGS=-Wl,--build-id=sha1 to /etc/make.conf. In this case the build-id
note ends added up to mbr and pmbr's .text, which makes it too large (it
ends up being 532 bytes). To avoid this explicitly turn off build-id for
these components.
Kyle Evans [Thu, 24 Oct 2019 03:20:27 +0000 (03:20 +0000)]
MFC r349201: efinet: Defer exclusively opening the network handles
Don't commit to exclusive access to the network device handle by
efinet until the loader has decided to load something through the
network. This allows for the possibility of other users of the
network device.
Kyle Evans [Thu, 24 Oct 2019 03:19:45 +0000 (03:19 +0000)]
MFC r349217: Tell loader to ignore newer features enabled on the root pool.
There are many new features in ZoF. Most, if not all, do not effect read
only usage.
Encryption in particular is enabled at the pool level but used at the
dataset level.
The loader obviously will not be able to boot if the boot dataset is
encrypted, but
should not care if some other dataset in the root pool is encrypted.
This is like efi_devpath_match, but allows differing device media
paths. Those just specify the partition information.
r348659:
Use newly minted efi_devpath_same_disk() instead of
efi_devpath_match(). This fixes a regression in r347193.
r348674:
Don't shadow a global zfsmount variable.
r348675:
ufs_module.c can't currently be compiled with -Wcast-align, but the
code is safe enough. Turn off the warning for now until I can find the
right construct to silence it in the code.
r348678:
Eliminate unused uuid parameters from gptread and gptread_table. We
only need it for the gptfind() function, where it's used.
r348760:
Use simple malloc/free instead of dropping down to the UEFI
BootServices AllocatePool/FreePool calls. They are simpler to use and
result in the same thing happening.
r348766:
Remove left-over status variables
r348768:
Rework the reporting of the priority.
Simplify the code a bit and rework how we report the results
of the probing.
r348811:
Break out the disk selection protocol from the rest of boot1.
Segregate the disk probing and selection protocol from the rest of the
boot loader.
r348812:
Create gptboot.efi
This is a primary boot loader that is intended to implement the
gptboot partition selection algorithm just like we did for BIOS
booting. While the preferred method for UEFI is to use the UEFI Boot
Manager protocol, there are situations where that can't be done: some
BIOS makers interfere with the protocol in unhelpful ways, there's a
new standard for a zero variable write from the client OS, and finally
for USB drives that might be mobile between systems with multiple
partitions there needs to be a media stable way to select.
r348814:
Add stuff to disable warning for %S
Add the customary warnings to disable format checking on armv7. Code
move to new files, and the unconditional setting of WARNS to 6
provoked it on tinerbox...
r348194:
loader: Add pnp functions for autoloading modules based on linker.hints
This adds some new commands to loader :
- pnpmatch
This takes a pnpinfo string as argument and tries to find a kernel module
associated with it. -v and -d option are available and are the same as in
devmatch (v is verbose, d dumps the hints).
- pnpload
This takes a pnpinfo string as argument and tries to load a kernel module
associated with it.
- pnpautoload
This will attempt to load every kernel module for each buses. Each buses are
probed, the probe function will generate pnpinfo string and load kernel module
associated with it if it exists.
Only simplebus for FDT system is implemented for now.
Since we need the dtb and overlays to be applied before searching the tree
fdt_devmatch_next will load and apply the dtb + overlays.
All the pnp parsing code comes from devmatch and is the same at 99%.
r348196:
loader: Remove unused variable
r348204:
Remove yet another unused variable.
r348207:
Initialize a variable to fix build with GCC.
Some of these files using <FOO>_DEBUG defined a DEBUG() macro to serve as a
debug-printf. -DDEBUG is useful to enable some debugging output across
multiple ELF/common parts, so switch the DEBUG-as-printf macros over to
something more like DPRINTF that is more commonly used for this kind of
thing and less likely to conflict.
userboot/elf64_freebsd debugging also assumed %llx for uint64; use PRIx64
instead.
r347219:
loader: use safer DPRINTF body for non-debug case
r347220:
loader: bcache code does not need to check argument for free()
The current console code is printing out 8 spaces for tab, calculate
the amount of spaces based on tab stops.
r347389:
loader: ptable_print() needs two tabs sometimes
Since the partition/slice names do vary in length, check the length
of the fixed part of the line against 3 * 8, if the lenth is less than
3 tab stops, print out extra tab.
use snprintf() instead of sprintf.
r347391:
loader: no-TERM_EMU is broken now
If TERM_EMU is not defined, we do not have curx variable. Use conout mode
for efi and expose get_pos() for i386.
r347393:
loader: use DPRINTF in biosdisk.c and define safe DPRINTF
r345066 did miss biosdisk.c.
Also define DPRINTF as ((void)0) for case we do not want debug printouts.
r347553:
loader: fix memory handling errors in module.c
file_loadraw():
check for file_alloc() and strdup() results.
we leak 'name'.
mod_load() does leak 'filename'.
mod_loadkld() does not need to check fp, file_discard() does check.
r348040:
stand: TARGET_ARCH is spelled MACHINE_ARCH in Makefiles
Add a wrapper around efi_delenv akin to efi_freebsd_getenv and
efi_getenv.
r346703:
Move initialization of the block device handles earlier (we're just
snagging them from UEFI BIOS). Call the device type init routines
earlier as well, as they don't depend on how the console is
setup. This will allow us to read files earlier in boot, so any rare
error messages that this might move only to the EFI console will be an
acceptable price to pay. Also tweak the order of has_kbd so it resides
next to the rest of the console code. It needs to be after we initialize
the buffer cache.
r346704:
Add the proper range of years for Netflix's copyright on this
file. Note that I wrote it.
r346879:
Read in and parse /efi/freebsd/loader.env from the boot device's
partition as if it were on the command line.
Fetch FreeBSD-LoaderEnv UEFI enviornment variable. If set, read in
loader environment variables from it. Otherwise read in
/efi/freebsd/loader.env. Both are read relative to the device
loader.efi loaded from (they aren't full UEFI device paths)
Next fetch FreeBSD-NextLoaderEnv UEFI environment variable. If
present, read the file it points to in as above and delete the UEFI
environment variable so it only happens once.
This lets one set environment variables in the bootloader.
Unfortunately, we don't have all the mechanisms in place to parse the
file, nor do we have the magic pattern matching in place that
loader.conf has. Variables are of the form foo=bar. No quotes are
supported, so spaces aren't allowed, for example. Also, variables like
foo_load=yes are intercepted when we parse the loader.conf file and
things are done based on that. Since those aren't done here, variables
that cause an action to happen won't work.
r346880:
Implement uefi_rootdev
If uefi_rootdev is set in the environment, then treat it like a device
path. Convert the string to a device path and see if we can find a
device that matches. If so, use that device at our root dev no matter
what. If it's bad in any way, the boot will fail.
When set, we ignore all the hints that the UEFI boot manager has set
for us. We also always fail back to the OK prompt when we can't find
the right thing to boot rather than failing back to the UEFI boot
manager. This has the side effect of also expanding the cases where we
fail back to the OK prompt to include when we're booted under UEFI,
but UEFI::BootCurrent isn't set in the environment and we can't find a
proper place to boot from.
r347023:
stand: correct mis-merge from r346879
Small mis-merge from multiple WIP resulted in block io media handles getting
double-initialized. This resulted in some installations oddly landing at the
mountroot prompt.
r347059:
Remove stray '*'
We're storing an EFI_HANDLE, not an pointer to a handle. Since
EFI_HANDLE is a void * anyway, this has little practical effect since
the conversion to / from void * and void ** is silent.
r347060:
When we can't get memory, trying again right away is going to
fail. Rather than print N failure messages, bail on the first one.
r347061:
Substitute boot1 with ${BOOT1}
Allow for other names to be built, so parameterize this makefile to
avoid hard coding boot1.
r347062:
Use SRC+= rather than SRC=
To allow boot1/Makefile to be included, use SRC+= rathern than SRC=
so the including Makefile can add additional sources to the build.
r347193:
Reach over and pull in devpath.c from libefi
This allows us to remove three nearly identical functions because the
differences don't matter, and the size difference is trivial.
r347194:
We only ever need one devinfo per handle. So allocate it outside of
looping over the filesystem modules rather than doing a malloc + free
each time through the loop. In addition, nothing changes from loop to
loop, so setup the new devinfo outside the loop as well.
r347201:
Simplify boot1 allocation of handles.
There's no need to pre-malloc the number of handles. Instead call
LocateHandles twice, once to get the size, and once to get the
data.
Kyle Evans [Thu, 24 Oct 2019 02:46:36 +0000 (02:46 +0000)]
MFC r346701: loader: fdt: Add fdt_is_setup function
When efi_autoload is called it will call fdt_setup_fdtp which setup the
dtb and overlays. If a user already loaded at dtb or overlays or just
printed the efi provided dtb, this will re-setup everything and also
re-applying the overlays.
Test that everything is setup before doing it again.
Add an interface to remove / delete UEFI variables.
r346353:
Minor tweak to the debug
Make it clear we're loading from UFS.
r346407:
Add define for CONST.
Newer interfaces take CONST parameters, so define CONST to minimize
differences between our headers and the standards docs.
r346408:
Add UEFI definitions related to converting string to DEVICE_PATH
Add definitions from UEFI 2.7 Errata B standards doc for converting a
text string to a device path. Added clearly missing 'e' at the end of
Device to resolve mismatch in that document in
EFI_DEVICE_PATH_FROM_TEXT_PROTOCOL element names.
r346409:
Add wrapper functions to convert strings to EFI_DEVICE_PATH
In anticipation of new functionality, create routines to convert char *
and a CHAR16 * to a EFI_DEVICE_PATH
EFI_DEVICE_PATH *efi_name_to_devpath(const char *path);
EFI_DEVICE_PATH *efi_name_to_devpath16(CHAR16 *path);
void efi_devpath_free(EFI_DEVICE_PATH *dp);
The first two return an EFI_DEVICE_PATH for the passed in paths. The
third frees up the storage the first two return when the caller is
done with it.
r346430:
Start to reduce the number of #ifdef EFI_ZFS_BOOT
There's a number of EFI_ZFS_BOOT #ifdefs that aren't needed, or can be
eliminated with some trivial #defines. Remove the EFI_ZFS_BOOT ifdefs
that aren't needed. Replace libzfs.h include which is not safe to
include without EFI_ZFS_BOOT with efizfs.h which is and now
conditionally included libzfs.h. Define efizfs_set_preferred away
and define efi_zfs_probe to NULL when ZFS is compiled out.
r346573:
Move setting of console earlier in boot.
There's no reason we can't setup the console first thing after the
arch flags are setup. We set it undconditionally to efi. This is a
good default, and will get us error messages to at least the efi
console no matter what. This will also prime the pump so that as other
variables are set, they will take effect and the console will be
correct as soon as those env vars are set. Also remove the redundant
setting of the console to efi when we know the console is efi.
r346575:
Create boot_img as a global variable
Get the information from the image that we're booting and store it in
a global variable. Prefer using this to passing it around. Remove the
special case for zfs that set the preferred boot handle by having it
uses this global variable diretly.
Kyle Evans [Thu, 24 Oct 2019 02:36:42 +0000 (02:36 +0000)]
MFC r345998-r346002, r346007-r346008: various loader improvements
r345998:
loader: malloc+bzero is calloc
Replace malloc+bzero in module.c with calloc.
r345999:
loader: file_addmodule should check for memory allocation
strdup() can return NULL.
r346000:
loader: remove pointer checks before free() in module.c
free() does check for NULL argument, remove duplicate checks.
r346001:
loader: file_addmetadata() should check for memory allocation
malloc() can return NULL.
r346002:
loader: mod_loadkld() error: we previously assumed 'last_file' could be null
The last_file variable is used to reset the loadaddr variable back to
original
value; however, it is possible the last_file is NULL, so we can not blindly
trust it. But then again, we can just save the original loadaddr and use
the saved value for recovery.
r346007:
loader: add file_remove() function to undo file_insert_tail().
346002 did miss the fact that we do not only undo the loadaddr, but also
we need to remove the inserted module. Implement file_remove() to do the
job.
r346008:
loader: command_lsefi: ret can be used uninitialized
Kyle Evans [Thu, 24 Oct 2019 02:34:48 +0000 (02:34 +0000)]
MFC r345330: loader: fix loading of kernels with . in path
The loader indended to search the kernel file name (only) for . but
instead searched the entire path, so paths like
"boot/test.elfv2/kernel" would not work.
Kyle Evans [Thu, 24 Oct 2019 02:27:16 +0000 (02:27 +0000)]
MFC r342054-r342055, r342742: loader diagnostics
r342054:
Print an error message in efi_main.c if we can't allocate memory for the
heap
With the default Qemu parameters, only 128MB RAM gets given to a VM. This
causes
the loader to be unable to allocate the 64MB it needs for the heap. This
change
makes the cause of the error more obvious.
r342055:
Cast error message in efi_main.c to CHAR16* to avoid build error
r342742:
loader.efi: efi variable rework and lsefi command added
This update does add diag and debug capabilities to interpret the efi
variables, configuration and protocols (lsefi).
The side effect is that we add/update bunch of related headers.
Kyle Evans [Thu, 24 Oct 2019 02:25:30 +0000 (02:25 +0000)]
MFC r341433: Move inclusion of src.opts.mk later.
src.opts.mk includes bsd.own.mk. This in turn defines CTFCONVERT_CMD
depending on the MK_CTF value. We then set MK_CTF to no, which has no
real effect. The solution is to set all the MK_foo values before
including src.opts.mk.
This should stop the cdboot binary from exploding in size for releases built
WITH_CTF=yes in src.conf.
Dimitry Andric [Mon, 21 Oct 2019 17:45:00 +0000 (17:45 +0000)]
MFC r339524 (by imp):
Add missing options.
WITHOUT_LOADER_LUA is only needed since we turned it off by default on
powerpc and sparc64 in r338203. Same with
WITHOUT_LOADER_GEIL. WITH_NVME, WITHOUT_NVME, WITH_LOADER_FORCE_LE
have been needed since they were added.
MFC r353737:
Provide a src.conf(5) description for the new WITHOUT_CAROOT option, and
rename the WITH_LOADER_VERIEXEC_PASS_MANFIEST description to its correct
name. Also correct a bunch of spelling errors in that description.
Kyle Evans [Mon, 21 Oct 2019 01:24:21 +0000 (01:24 +0000)]
MFC r352711-r352712: Address posix_spawn(3) signal issues
r352711:
rfork(2): add RFSPAWN flag
When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets
signal handlers in the child during creation to avoid a point of corruption
of parent state from the child.
This flag will be used by posix_spawn(3) to handle potential signal issues.
r352712:
posix_spawn(3): handle potential signal issues with vfork
Described in [1], signal handlers running in a vfork child have
opportunities to corrupt the parent's state. Address this by adding a new
rfork(2) flag, RFSPAWN, that has vfork(2) semantics but also resets signal
handlers in the child during creation.
x86 uses rfork_thread(3) instead of a direct rfork(2) because rfork with
RFMEM/RFSPAWN cannot work when the return address is stored on the stack --
further information about this problem is described under RFMEM in the
rfork(2) man page.
Addressing this has been identified as a prerequisite to using posix_spawn
in subprocess on FreeBSD [2].
Kyle Evans [Mon, 21 Oct 2019 00:08:34 +0000 (00:08 +0000)]
MFC r351650, r351795-r351796: writemapping accounting for posixshm
r351650:
posixshm: switch to OBJT_SWAP in advance of other changes
Future changes to posixshm will start tracking writeable mappings in order
to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT
object is complicated as it may be swapped out and converted to an
OBJT_SWAP. One may generically add this tracking for vm_object, but this is
difficult to do without increasing memory footprint of vm_object and blowing
up memory usage by a significant amount.
On the other hand, the swap pager can be expanded to track writeable
mappings without increasing vm_object size. This change is currently in
D21456. Switch over to OBJT_SWAP in advance of the other changes to the
swap pager and posixshm.
r351795:
vm pager: writemapping accounting for OBJT_SWAP
Currently writemapping accounting is only done for vnode_pager which does
some accounting on the underlying vnode.
Extend this to allow accounting to be possible for any of the pager types.
New pageops are added to update/release writecount that need to be
implemented for any pager wishing to do said accounting, and we implement
these methods now for both vnode_pager (unchanged) and swap_pager.
The primary motivation for this is to allow other systems with OBJT_SWAP
objects to check if their objects have any write mappings and reject
operations with EBUSY if so. posixshm will be the first to do so in order to
reject adding write seals to the shmfd if any writable mappings exist.
r351650 switched posixshm to using OBJT_SWAP for shm_object
r351795 added support to the swap_pager for tracking writeable mappings
Take advantage of this and start tracking writeable mappings; fd sealing
will use this to reject a seal on writing with EBUSY if any such mapping
exist.
r353644:
libbe(3): add needed bits for be_destroy to auto-destroy some origins
New BEs can be created from either an existing snapshot or an existing BE.
If an existing BE is chosen (either implicitly via 'bectl create' or
explicitly via 'bectl create -e foo bar', for instance), then bectl will
create a snapshot of the current BE or "foo" with be_snapshot, with a name
formatted like: strftime("%F-%T") and a serial added to it.
This commit adds the needed bits for libbe or consumers to determine if a
snapshot names matches one of these auto-created snapshots (with some light
validation of the date/time/serial), and also a be_destroy flag to specify
that the origin should be automatically destroyed if possible.
A future commit to bectl will specify BE_DESTROY_AUTOORIGIN by default so we
clean up the origin in the most common case, non-user-managed snapshots.
r353646:
bectl(8): destroy: use BE_DESTROY_AUTOORIGIN if -o is not specified
-o will force the origin to be destroyed unconditionally.
BE_DESTROY_AUTOORIGIN, on the other hand, will only destroy the origin if it
matches the format used by be_snapshot. This lets us clean up the snapshots
that are clearly not user-managed (because we're creating them) while
leaving user-created snapshots in place and warning that they're still
around when the BE created goes away.
r353663:
libbe(3): Fix destroy of imported BE w/ AUTOORIGIN
Imported BE, much like the activated BE, will not have an origin that we can
fetch/examine for destruction. be_destroy should not return BE_ERR_NOORIGIN
for failure to get the origin property for BE_DESTROY_AUTOORIGIN, because
we don't really know going into it that there's even an origin to be
destroyed.
BE_DESTROY_NEEDORIGIN has been renamed to BE_DESTROY_WANTORIGIN because only
a subset of it *needs* the origin, so 'need' is too strong of verbiage.
This was caught by jenkins and the bectl tests, but kevans failed to run the
bectl tests prior to commit.
r353128:
kern_conf: fully initialize cloned devices with make_dev_args, too
Attempting to initialize si_drv{1,2} with mda_si_drv{1,2} does not work if
you are operating on cloned devices.
clone_create must be called prior to the make_dev* family to create/return
the device on the clonelist as needed. This device is later returned early
in newdev(), prior to si_drv{0,1,2} initialization.
This patch simply breaks out of the loop if we've found a device and
finishes init.
r353129:
Remove the remnants of SI_CHEAPCLONE
SI_CHEAPCLONE was introduced in r66067 for use with cloned bpfs. It was
later also used in tty, tun, tap at points. The rough timeline for being
removed in each of these is as follows:
- r181690: bpf switched to use cdevpriv API by ed@
- r181905: ed@ rewrote the TTY later to be mpsafe
- r204464: kib@ removes it from tun/tap, declaring it unused
I've not yet been able to dig up any other consumers in the intervening 9
years. It is no longer set on any devices in the tree and leaves an
interesting situation in make_dev_sv where we're ok with the device already
being set SI_NAMED.
Dimitry Andric [Sat, 19 Oct 2019 15:58:20 +0000 (15:58 +0000)]
MFC r353655:
Ensure lld respects the WITH/WITHOUT_SHARED_TOOLCHAIN option
Traditionally, toolchain components such as cc, as, and ld have been
built as static executables. The WITH_SHARED_TOOLCHAIN option from
src.conf(5) is meant to link these as regular executables, e.g. using
shared libraries.
The build of ld.lld did not yet check this option. Fix the Makefile so
it will do so now.
Reported by: Mike Cui <cuicui@gmail.com>
PR: 241257