delphij [Sat, 2 Aug 2014 03:56:06 +0000 (03:56 +0000)]
MFC r268855: MFV r268848:
Instead of asserting all zio's be properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.
This fixes a problem that zdb would trip assert on pools with
ashift >= 0xe (8k).
While there, also change the code so it only attempt to condense
space map unless the uncondensed size consumes greater than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
rmacklem [Fri, 1 Aug 2014 21:10:41 +0000 (21:10 +0000)]
MFC: r268115
Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.
jhb [Fri, 1 Aug 2014 21:00:18 +0000 (21:00 +0000)]
MFC 256657,257423,264837,267559:
Sync vmrun.sh with HEAD:
- Add -e option to vmrun.sh passed to bhyveload(8) to set loader
environment variables.
- Stop passing unused -I option to bhyve(8).
- Reformat the usage to fit in 80 colums and other cleanups.
- Add -C option to specify the console device.
- Add -H option to pass a host path to bhyveload(8).
- Support for multiple disk and tap devices.
truckman [Fri, 1 Aug 2014 15:04:46 +0000 (15:04 +0000)]
MFC r268780
Nuke the never-used RF_TIMESHARE feature, reducing the complexity of the
code. The consensus on arch@ is that this feature might have been useful
in the distant past, but is now just unnecessary bloat.
The int_rman_activate_resource() and int_rman_deactivate_resource()
functions become trivial, so manually inline them.
The special deferred handling of RF_ACTIVE is no longer needed in
reserve_resource_bound(), so eliminate the associated code at the
end of the function.
These changes reduce the object file size by more than 500 bytes on i386.
Update the rman.9 man page to reflect the removal of the RF_TIMESHARE
feature.
Add a 'raw' parameter to the 'modinfo' subcommand. This is handy when
trying to figure out why a QSFP+/SFP+ connector or cable wasn't
identified correctly by cxgbe(4). Its output looks like this:
r268971:
Simplify r267600, there's no need to distinguish between allocated and
inlined mbufs.
r269032:
cxgbe(4): Keep track of the clusters that have to be freed by the
custom free routine (rxb_free) in the driver. Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet. This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.
MFC r264434:
DTrace's pid provider works by inserting breakpoint instructions at probe
sites and installing a hook at the kernel's trap handler. The fasttrap code
will emulate the overwritten instruction in some common cases, but otherwise
copies it out into some scratch space in the traced process' address space
and ensures that it's executed after returning from the trap.
In Solaris and illumos, this (per-thread) scratch space comes from some
reserved space in TLS, accessible via the fs segment register. This
approach is somewhat unappealing on FreeBSD since it would require some
modifications to rtld and jemalloc (for static TLS) to ensure that TLS is
executable, and would thus introduce dependencies on their implementation
details. I think it would also be impossible to safely trace static binaries
compiled without these modifications.
This change implements the functionality in a different way, by having
fasttrap map pages into the target process' address space on demand. Each
page is divided into 64-byte chunks for use by individual threads, and
fasttrap's process descriptor struct has been extended to keep track of
any scratch space allocated for the corresponding process.
With this change it's possible to trace all libc functions in a program,
e.g. with
MFC r268808:
Increase maximal number of SCSI ports in CTL from 32 to 128.
After I gave each iSCSI target its own port, the old limit appeared to be
not so big. This change almost proportionally increases per-LUN memory
use, but it is still three times better then it was before r268807.
MFC r268807:
Reduce per-LUN memory usage from 18MB to 1.8MB.
CTL never had use for CA support code since SPI has gone, and there is no
even frontends supporting that. But it still was reserving 256 bytes of
memory per LUN per every possible initiator on every possible port.
Wrap unused code with ifdef's in case somebody ever need it.
MFC r268767:
Add support for VMWare dialect of EXTENDED COPY command, aka VAAI Clone.
This allows to clone VMs and move them between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.
LUNs participating in copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMWare will use traditional copy operations.
Beware: the above LUN IDs explicitly set to values non-unique from the VM
cluster point of view may cause data corruption if wrong LUN is addressed!
MFC: r268726
Move the "retry:" label so that the calls to m_pullup() are
not done after the call to m_defrag(). This fixes a problem
where m_pullup() would prepend an mbuf to the list created
by m_defrag() making the chain greater than 32 again.
MFC r263329:
Only invoke fasttrap hooks for traps from user mode, and ensure that they're
called with interrupts enabled. Calling fasttrap_pid_probe() with interrupts
disabled can lead to deadlock if fasttrap writes to the process' address
space.
MFC r262669:
When our linker merges .SUNW_dof sections from multiple files, it simply
concatenates the DOF tables into one section. Previously, the USDT init
code in drti.o would only look at the first table in the DOF section; with
this change, it iterates over all the tables, passing each DOF table to
the kernel.
marius [Tue, 29 Jul 2014 13:11:37 +0000 (13:11 +0000)]
MFC: r269051
Copying pages via temporary mappings in the !DMAP case of pmap_copy_pages()
involves updating the corresponding page tables followed by accesses to the
pages in question. This sequence is subject to the situation exactly described
in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming"
rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore,
issuing the INVLPG right after modifying the PTE bits is crucial (see also
r269050, MFCed to stable/10 in r269235).
For the amd64 PMAP code, the order of instructions was already correct. The
above fact still is worth documenting, though.
marius [Tue, 29 Jul 2014 13:08:37 +0000 (13:08 +0000)]
MFC: r269050
- Copying and zeroing pages via temporary mappings involves updating the
corresponding page tables followed by accesses to the pages in question.
This sequence is subject to the situation exactly described in the "AMD64
Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23,
"7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing
the INVLPG right after modifying the PTE bits is crucial.
For pmap_copy_page(), this has been broken in r124956 and later on carried
over to pmap_copy_pages() derived from the former, while all other places
in the i386 PMAP code use the correct order of instructions in this regard.
Fixing the latter breakage solves the problem of data corruption seen with
unmapped I/O enabled when running at least bare metal on AMD R-268D APUs.
However, this might also fix similar corruption reported for virtualized
environments.
- In pmap_copy_pages(), correctly set the cache bits on the source page being
copied. This change is thought to be a NOP for the real world, though. [2]
When doing an "extreme rewind" import ("zpool import -XF"), we attempt
to verify all data in the pool, essentially scrubbing the entire pool.
The problem is that spa_load_verify_cb() issues an unbounded number of
concurrent scrub i/os. This can lead to all of memory being used for
these zio's, wedging the system. Like normal scrub, we need to put a
cap on the number of outstanding i/os, and have the traverse thread
block when we reach this cap.
For this purpose the cap can be very large (10,000) to optimize the
elevator algorithm. Three kernel tunables have been added:
The latter two tunables controls whether metadata and/or user data
when doing extreme rewind.
Make 'zpool import -T' imply scrub.
Make zpool import -T <txg> accept hexadecimal values for the txg when
prefixed with 0x.
Skip txg's for which there is no uberblock when doing extreme rewind.
Skip reading all user data twice by skipping prefetches when doing
extreme rewinds as we do not access via the ARC.
Illumos issues:
4970 need controls on i/o issued by zpool import -XF
4971 zpool import -T should accept hex values
4972 zpool import -T implies extreme rewind, and thus a scrub
4973 spa_load_retry retries the same txg
4974 spa_load_verify() reads all data twice
MFC r268236,268264,268524,268646,268802,269021:
This brings VHD support to mkimg(1); both dynamic and fixed file formats.
Dynamic VHD and VMDK file images are now sparsely written, meaning that
"free" sectors do not occupy space.
MFC r268608:
The tmpfs_link() must not dereference the filesystem-specific data for
a vnode until it is verified that the vnode indeed belongs to tmpfs mount.
MFC r268606:
Generalize vn_get_ino() to allow filesystems to use custom vnode
producer. Convert inline copies of vn_get_ino() in msdosfs and cd9660
into the uses of vn_get_ino_gen().
ian [Fri, 25 Jul 2014 23:29:55 +0000 (23:29 +0000)]
MFC r266565, r266651:
Map device memory using PTE_DEVICE attributes, and also ensure that the
shared flag is set on normal-memory mappings made via pmap_kenter() for SMP.
The "shared flag" part of this change isn't obvious from the diff, here's
the deal... by using the array of preformatted page table entry templates
instead of constructing the PTE from scratch, we automatically get the
right attribute bits set for both caching and shared.
ian [Fri, 25 Jul 2014 23:21:36 +0000 (23:21 +0000)]
MFC r263373, r268402
Add a way to apply CFLAGS only when building the given architecture. This
is useful primarily on a system used for cross-building, when you have a
set of flags to apply to the TARGET_ARCH being cross-built but don't want
those settings applied to building the cross-tools or other components that
run on the build host machine.
Support CXXFLAGS.${MACHINE_ARCH} as well as CFLAGS. This allows different
C++ options for toolchain versus target when cross-building.
ian [Fri, 25 Jul 2014 23:12:22 +0000 (23:12 +0000)]
MFC r261530
Set the malloc alignment to 64 bytes on platforms that use the U-Boot API
device drivers. Recent versions of u-boot run with the MMU enabled, and
require DMA-based I/O to be aligned to cache line boundaries.
r268640:
Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl. The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.
dim [Thu, 24 Jul 2014 20:16:45 +0000 (20:16 +0000)]
MFC r268957:
Run mtree for BSD.tests.dist during make xdev-install, if the tests are
enabled (which they are in the default configuration). Otherwise, it
will fail because ${XDDESTDIR}/usr/include/atf-c does not exist.
MFC r268466:
Calculate the amount of resident pages by looking in the objects chain
backing the region. Add a knob to disable the residency calculation at
all.
MFC r268490:
Unconditionally initialize addr to handle the case of changed map
timestamp while the map is unlocked.
MFC r268711:
Change the calculation of the kinfo_vmentry field kve_private_resident
to reflect its name.
MFC r268712:
Followup to r268466.
- Move the code to calculate resident count into separate function.
It reduces the indent level and makes the operation of
vmmap_skip_res_cnt tunable more clear.
- Optimize the calculation of the resident page count for map entry.
Skip directly to the next lowest available index and page among the
whole shadow chain.
- Restore the use of pmap_incore(9), only to verify that current
mapping is indeed superpage.
- Note the issue with the invalid pages.
MFC r268420:
Remove IO_SYNC flag when writing extended file attributes on ZFS.
While it is possible to create and write file, modify its permissions, etc.
without ever doing sync, it looks odd that it is required for setting
extended file attributes on ZFS. UFS does not do sync there too.
Samba uses those extended attributes to store some its data, and doing it
synchronously by many times reduces file creation performance for systems
without SLOG device.
Now defaults to a 16x8 font size. The height and width can be specified
using -h and -w respectively.
r267012: Make the bold font optional
r267035: Use a hash to speed up glyph deduplication
Walking a linked list of all glyphs to look for a duplicate is very slow
for large fonts (e.g., for CJK character sets). In my test the runtime
for a sample 40000 character font went from just over 80 seconds on
average to just over 2 seconds.
r267119: -w sets the width, not height
r267123: Support "GNU Unifont" format font data
The GNU Unifont .hex format is a text file. Each line represents one
glyph and consists of a four-digit hex code point, a colon, and pairs of
hex digits representing the bitmap. By default an 8x16 font is assumed,
with 16x16 double-width glyphs, resulting in either 32 or 64 hex digits
for the bitmap.
Our version of the file format supports comments at the top of the file
to set the height and width:
Each row of bitmap data is rounded up to byte width - for example, a
10-pixel wide font uses 4 characters per row.
See http://czyborra.com/unifont/ for more background on the original
format.
r267126: Accept space after BITMAP in .bdf parser
The Unifont BDF generator incorrectly adds a space after BITMAP, and
and that error has been widely propagated.
r267173: use -h height and -w width args
r267298: Hide stats by default and improve error handling
The font stats are interesting, but rather verbose.
r267301: Speed up bold glyph map deduplication
Perform an O(n) deduplication pass over the bold maps at the end, rather
than walking the normal map list to look for a duplicate glyph each time
a bold mapping entry is added.
r267324: handle failure writing output font
r267337: move to usr.bin/vtfontcvt
vtfontcvt is useful for end users to convert arbitrary bitmap fonts
for use by vt(4). It can also be used as a build tool, allowing us
to keep the source font data in the src tree rather than uuencoded
binaries.
r268022: Rename the WITHOUT_VT_SUPPORT knob to WITHOUT_VT
The _SUPPORT knobs have a consistent meaning which differs from the
behaviour controlled by this knob. As the knob is opt-out and has not
appeared in a release the impact should be low.
r268172: correct width calculation (.hex files and commandline)
r268948: Use the standard way of printing the usage string
r268949: Remove redundant return statement after errx
Also update vtfontcvt(8), based on inclusion in FreeBSD 10.1
MFC 267883:
Expand r261243 even further and ignore any I/O port resources assigned to
PCI root bridges except for the one known-valid case on x86 where bridges
claim the I/O port registers used for PCI config space access.
MFC r263678: lldb: Invoke PT_KILL from ProcessPosix::DoDestroy
We previously sent SIGKILL to the debuggee in DoDestroy, but did not
actually detach or kill via ptrace. It seems that this somehow didn't
matter on Linux, but did on FreeBSD.
This would happen when quitting LLDB while stopped at a breakpoint, for
example. The debuggee remained stopped in ptrace (with the signal
either pending or lost). After a timeout of a second or two LLDB exits,
which caused the debuggee to resume and dump core from an unhandled
SIGTRAP.
BringProcessIntoLimbo is a poorly named wrapper for ptrace(PT_KILL)
which is the desired behaviour from DoDestroy.
Omit "too many sections" warnings if the ELF file is not dynamically
linked (and is therefore skipped anyway), and otherwise output it only
once. An errant core file would previously cause kldxref to output a
number of warnings.
Also introduce a MAXSEGS #define and replace literal 2 with it, to make
comparisons clear.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
MFC r265477: Merge -fstandalone-debug from Clang r198655:
Implement a new -fstandalone-debug option. rdar://problem/15685848
It controls everything that -flimit-debug-info used to, plus the
vtable type optimization. The old -fno-limit-debug-info option is now an
alias to -fstandalone-debug and vice versa.
Standalone is the default on Darwin until dtrace is updated to work with
non-standalone debug info (rdar://problem/15758808).
Note: I kept the LimitedDebugInfo name in CodeGenOptions::DebugInfoKind
because NoStandaloneDebugInfo sounded even more confusing.