Will Andrews [Wed, 17 Dec 2014 00:22:41 +0000 (00:22 +0000)]
Initialize an argument to NULL instead of expecting dlinfo() to do it.
dlinfo() is a weak reference that may not be initialized at the time of
execution. The default implementation (in lib/libc/gen/dlfcn.c) neither
modifies the address pointed to by the third argument nor returns an error.
The iret instruction may generate #np and #ss fault, besides #gp.
When returning to usermode, the handler for that exceptions is also
executed with wrong gs base. Handle all three possible faults in the
same way, checking for iret fault, and performing full iret.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Will Andrews [Tue, 16 Dec 2014 17:59:05 +0000 (17:59 +0000)]
Make NanoBSD source-able from other scripts.
Summary:
This change converts NanoBSD into a two-script bundle.
- defaults.sh contains all non-CLI code. Most NanoBSD code is moved into
this file.
- nanobsd.sh now consists just of a command line interface that calls into
functions in defaults.sh.
Test Plan: Run NanoBSD using a previously-working configuration.
Add ability to not specify a zone identifier twice, when both source and
destination addresses are specified.
For example:
# ping6 -S fe80::1%ix0 ff02::1
or
# ping6 -S fe80::1 fe80::2%ix0
Ed Schouten [Tue, 16 Dec 2014 09:21:56 +0000 (09:21 +0000)]
Rename cpack*() to CMPLX*().
The C11 standard introduced a set of macros (CMPLX, CMPLXF, CMPLXL) that
can be used to construct complex numbers from a pair of real and
imaginary numbers. Unfortunately, they require some compiler support,
which is why we only define them for Clang and GCC>=4.7.
The cpack() function in libm performs the same task as CMPLX(), but
cannot be used to generate compile-time constants. This means that all
invocations of cpack() can safely be replaced by C11's CMPLX(). To keep
the code building with GCC 4.2, provide copies of CMPLX() that can at
least be used to generate run-time complex numbers.
This makes it easier to build some of the functions outside of libm.
Neel Natu [Tue, 16 Dec 2014 06:33:57 +0000 (06:33 +0000)]
For level triggered interrupts clear the PIC IRR bit when the interrupt pin
is deasserted. Prior to this change each assertion on a level triggered irq
pin resulted in two interrupts being delivered to the CPU.
Pyun YongHyeon [Tue, 16 Dec 2014 06:13:30 +0000 (06:13 +0000)]
Fix a bug introdiced in r217548. According to NS DP83815 data
sheet, RX filter should be disabled before programming.
Previously it was clearing wrong bits so RX filter was not
disabled in RX filter configuration.
Xin LI [Mon, 15 Dec 2014 18:28:22 +0000 (18:28 +0000)]
MFV r275784:
Plug a memory leak in libzfs. In zfs_iter_bookmarks, an nvlist is allocated
before calling lzc_get_bookmarks, which allocates the nvlist again (and
overwrites the pointer to previously allocated list).
Illumos issue:
5427 memory leak in libzfs when doing rollback
Xin LI [Mon, 15 Dec 2014 18:22:45 +0000 (18:22 +0000)]
MFV r275783:
Convert ARC flags to use enum. Previously, public flags are defined in
arc.h and private flags are defined in arc.c which can lead to confusion
and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab' because arc_buf_t are often named 'buf' as well.
Illumos issue:
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Calculate the segment's memory size (p_memsz) using the virtual
addresses, not the file offsets. Otherwise padding preceeding SHT_NOBITS
sections may be excluded from the calculation, resulting in a segment
that is too small.
Ed Maste [Mon, 15 Dec 2014 14:25:42 +0000 (14:25 +0000)]
Remove empty generated file upon gperf failure
Prior to this change the build could fail as follows, if gperf is not
available (or fails):
- make(1) stops due to the gperf error, but an empty target file
(cfns.h) is still created
- the empty cfns.h is newer than the source cfns.gperf so it is not
regenerated on subsequent builds
- the gcc build fails (undefined reference to libc_name_p)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Add a facility for non-init process to declare itself the reaper of
the orphaned descendants. Base of the API is modelled after the same
feature from the DragonFlyBSD.
Requested by: bapt
Reviewed by: jilles (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Xin LI [Mon, 15 Dec 2014 08:02:22 +0000 (08:02 +0000)]
5427 memory leak in libzfs when doing rollback
Reviewed by: Michael Tsymbalyuk <mtzaurus@gmail.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Jan Kryl <jan.kryl@nexenta.com>
Xin LI [Mon, 15 Dec 2014 07:59:33 +0000 (07:59 +0000)]
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>
Xin LI [Mon, 15 Dec 2014 07:52:23 +0000 (07:52 +0000)]
MFV r275551:
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate
"phys" type, removing the need for clients of the dmu buf user
API to keep properly typed pointer aliases to db->db_data in order
to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
In zap_leaf() and zap_leaf_byteswap, now that the pointer alias
field l_phys has been removed, use the db_data field in an on
stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
Remove the db_user_data_ptr_ptr field from dbuf and all logic
to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
Modify the DMU buf user API to remove the ability to specify
a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Create and use the new "phys data" accessor functions
dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(),
zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Remove now unused "phys pointer" aliases to db->db_data
from clients of the DMU buf user API.
Xin LI [Mon, 15 Dec 2014 04:51:36 +0000 (04:51 +0000)]
MFV r275549:
Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata
ZFS should keep in ARC at minimum.
In arc_evict(), when doing recycle, take more factors into account by
applying the following policy:
1. If no evictable data, evict metadata;
2. If no evictable metadata, evict data;
3. If we hit arc_meta_limit, evict metadata;
4. If we haven't hit arc_meta_min, evict data;
5* (Illumos only, not present in new FreeBSD code, yet) evict the oldest
cached element from data and metadata.
(FreeBSD) evict the data type specified by caller, which is the
existing behavior.
Note that because of our splitted locks (implemented in r205231 to improve
scalability by reducing lock contention), implementing the fifth Illumos
behavior will not be cheap, so for now just implement the 1-4 and fall back
to current behavior for 5.
Illumos issue:
5368 ARC should cache more metadata
MFC after: 2 months (assuming we didn't found better solution)
Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.
Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process. Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.
Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume. It
is used for debugging; should be removed after the real use of the
interface is added.
In collaboration with: pho
Discussed with: avg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Only sleep interruptible while waiting for suspension end when
filesystem specified VFCF_SBDRY flag, i.e. for NFS.
There are two issues with the sleeps. First, applications may get
unexpected EINTR from the disk i/o syscalls. Second, interruptible
sleep allows the stop of the process, and since mount point is
referenced while thread sleeps, unmount cannot free mount point
structure' memory, blocking unmount indefinitely.
Even for NFS, it is probably only reasonable to enable PCATCH for intr
mounts, but this information is currently not available at VFS level.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The vinactive() call in vgonel() may start writes for the dirty pages,
creating delayed write buffers belonging to the reclaimed vnode. Put
the buffer cleanup code after inactivation.
Add asserts that ensure that buffer queues are empty and add BO_DEAD
flag for bufobj to check that no buffers are added after the cleanup.
BO_DEAD is only used by INVARIANTS-enabled kernels.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Xin LI [Sat, 13 Dec 2014 02:08:18 +0000 (02:08 +0000)]
MFV r275548:
Verify that the block pointer is structurally valid, before attempting to
read it in. It can only be invalid in the case of a ZFS bug, but this
change will help identify such bugs in a more transparent way, by
panic'ing with a relevant message, rather than indexing off the end of an
array or something.
Illumos issue:
5349 verify that block pointer is plausible before reading
Xin LI [Sat, 13 Dec 2014 01:39:24 +0000 (01:39 +0000)]
MFV r275546:
Reduce scrub activities when system there is enough dirty data, namely when
dirty data is more than zfs_vdev_async_write_active_min_dirty_percent (once
we start to increase the number of concurrent async writes).
While there also correct rounding error which would make scrub end up
pausing for (zfs_txg_timeout + 1) seconds instead of the desired
zfs_txg_timeout seconds.
Illumos issue:
5351 scrub goes for an extra second each txg
5352 scrub should pause when there is some dirty data
Xin LI [Sat, 13 Dec 2014 01:26:06 +0000 (01:26 +0000)]
MFV r275545:
If zio_checksum_error() returns other than ECKSUM (e.g. EINVAL), it does not
fill in the "zio_bad_cksum_t *info" parameter. Caller should not attempt to
use it in this case.
Illumos issue:
5348 zio_checksum_error() only fills in info if ECKSUM
Xin LI [Sat, 13 Dec 2014 01:10:17 +0000 (01:10 +0000)]
MFV r275542:
If a dnode has a spill block and there is an error while accessing
a data block then traverse_dnode() loses information about that error
and returns a status of visiting the spill block.
This issue is discovered by Spectra Logic.
Illumos issue:
5311 traverse_dnode may report success when it should not
John-Mark Gurney [Fri, 12 Dec 2014 19:56:36 +0000 (19:56 +0000)]
Add some new modes to OpenCrypto. These modes are AES-ICM (can be used
for counter mode), and AES-GCM. Both of these modes have been added to
the aesni module.
Included is a set of tests to validate that the software and aesni
module calculate the correct values. These use the NIST KAT test
vectors. To run the test, you will need to install a soon to be
committed port, nist-kat that will install the vectors. Using a port
is necessary as the test vectors are around 25MB.
All the man pages were updated. I have added a new man page, crypto.7,
which includes a description of how to use each mode. All the new modes
and some other AES modes are present. It would be good for someone
else to go through and document the other modes.
A new ioctl was added to support AEAD modes which AES-GCM is one of them.
Without this ioctl, it is not possible to test AEAD modes from userland.
Add a timing safe bcmp for use to compare MACs. Previously we were using
bcmp which could leak timing info and result in the ability to forge
messages.
Add a minor optimization to the aesni module so that single segment
mbufs don't get copied and instead are updated in place. The aesni
module needs to be updated to support blocked IO so segmented mbufs
don't have to be copied.
We require that the IV be specified for all calls for both GCM and ICM.
This is to ensure proper use of these functions.
Increase the buffer size to keep the list of programm names when
parsing programm specification. It is safe to not check out of bounds
access, because !isprint(p[i]) check will stop reading, when '\0'
character will be read from the input string.
Marcel Moolenaar [Fri, 12 Dec 2014 06:13:31 +0000 (06:13 +0000)]
The size of the first level reference count table is given in terms of the
number of clusters it occupies. It's not the number of entries in the table,
as it is for the L1 cluster table.
For small images, the two are the same. With the unit tests based on small
images, this change has therefore no effect on the unit test. For larger
images (like the FreeBSD 10.1-RELEASE image), this gives a discrepancy that
actually shows up when running "qemu-img check".
Justin Hibbits [Fri, 12 Dec 2014 03:58:51 +0000 (03:58 +0000)]
Add new PowerPC relocations to binutils
Summary:
LLVM/Clang generates relocations that our binutils doesn't understand, but newer
binutils does. I got permission from the author of a series of patches to
relicense them as GPLv2 for use in FreeBSD. The upstream git hashes are:
Use ipsec6_in_reject() to simplify ip6_ipsec_fwd() and ip6_ipsec_input().
ipsec6_in_reject() does the same things, also it counts policy violation
errors.
Do IPSEC check in the ip6_forward() after addresses checks.
Also use ip6_ipsec_fwd() to make code similar to IPv4 implementation.
Use ipsec4_in_reject() to simplify ip_ipsec_fwd() and ip_ipsec_input().
ipsec4_in_reject() does the same things, also it counts policy violation
errors.
Remove flags and tunalready arguments from ipsec4_process_packet()
and make its prototype similar to ipsec6_process_packet.
The flags argument isn't used here, tunalready is always zero.
Move ip_ipsec_fwd() from ip_input() into ip_forward().
Remove check for presence PACKET_TAG_IPSEC_IN_DONE mbuf tag from
ip_ipsec_fwd(). PACKET_TAG_IPSEC_IN_DONE tag means that packet is
already handled by IPSEC code. This means that before IPSEC processing
it was destined to our address and security policy was checked in
the ip_ipsec_input(). After IPSEC processing packet has new IP
addresses and destination address isn't our own. So, anyway we can't
check security policy from the mbuf tag, because it corresponds
to different addresses.
We should check security policy that corresponds to packet
attributes in both cases - when it has a mbuf tag and when it has not.
Remove PACKET_TAG_IPSEC_IN_DONE mbuf tag lookup and usage of its
security policy. The changed block of code in ip*_ipsec_input() is
called when packet has ESP/AH header. Presence of
PACKET_TAG_IPSEC_IN_DONE mbuf tag in the same time means that
packet was already handled by IPSEC and reinjected in the netisr,
and it has another ESP/AH headers (encrypted twice?).
Since it was already processed by IPSEC code, the AH/ESP headers
was already stripped (and probably outer IP header was stripped too)
and security policy from the tdb_ident was applied to those headers.
It is incorrect to apply this security policy to current headers.
Also make ip_ipsec_input() prototype similar to ip6_ipsec_input().
Remove check for presence of PACKET_TAG_IPSEC_PENDING_TDB and
PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED mbuf tags. They aren't used in FreeBSD.
Instead check presence of PACKET_TAG_IPSEC_OUT_DONE mbuf tag. If it
is found, bypass security policy lookup as described in the comment.
PACKET_TAG_IPSEC_OUT_DONE tag added to mbuf when IPSEC code finishes
ESP/AH processing. Since it was already finished, this means the security
policy placed in the tdb_ident was already checked. And there is no reason
to check it again here.
Enji Cooper [Wed, 10 Dec 2014 23:18:11 +0000 (23:18 +0000)]
Fix building termcap.db when make obj is run beforehand from a clean tree by
using make variables for the filenames, which helps resolve pathing
appropriately when running cap_mkdb