ae [Tue, 16 Dec 2014 14:59:20 +0000 (14:59 +0000)]
Add ability to not specify a zone identifier twice, when both source and
destination addresses are specified.
For example:
# ping6 -S fe80::1%ix0 ff02::1
or
# ping6 -S fe80::1 fe80::2%ix0
ed [Tue, 16 Dec 2014 09:21:56 +0000 (09:21 +0000)]
Rename cpack*() to CMPLX*().
The C11 standard introduced a set of macros (CMPLX, CMPLXF, CMPLXL) that
can be used to construct complex numbers from a pair of real and
imaginary numbers. Unfortunately, they require some compiler support,
which is why we only define them for Clang and GCC>=4.7.
The cpack() function in libm performs the same task as CMPLX(), but
cannot be used to generate compile-time constants. This means that all
invocations of cpack() can safely be replaced by C11's CMPLX(). To keep
the code building with GCC 4.2, provide copies of CMPLX() that can at
least be used to generate run-time complex numbers.
This makes it easier to build some of the functions outside of libm.
neel [Tue, 16 Dec 2014 06:33:57 +0000 (06:33 +0000)]
For level triggered interrupts clear the PIC IRR bit when the interrupt pin
is deasserted. Prior to this change each assertion on a level triggered irq
pin resulted in two interrupts being delivered to the CPU.
yongari [Tue, 16 Dec 2014 06:13:30 +0000 (06:13 +0000)]
Fix a bug introdiced in r217548. According to NS DP83815 data
sheet, RX filter should be disabled before programming.
Previously it was clearing wrong bits so RX filter was not
disabled in RX filter configuration.
delphij [Mon, 15 Dec 2014 18:28:22 +0000 (18:28 +0000)]
MFV r275784:
Plug a memory leak in libzfs. In zfs_iter_bookmarks, an nvlist is allocated
before calling lzc_get_bookmarks, which allocates the nvlist again (and
overwrites the pointer to previously allocated list).
Illumos issue:
5427 memory leak in libzfs when doing rollback
delphij [Mon, 15 Dec 2014 18:22:45 +0000 (18:22 +0000)]
MFV r275783:
Convert ARC flags to use enum. Previously, public flags are defined in
arc.h and private flags are defined in arc.c which can lead to confusion
and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab' because arc_buf_t are often named 'buf' as well.
Illumos issue:
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Calculate the segment's memory size (p_memsz) using the virtual
addresses, not the file offsets. Otherwise padding preceeding SHT_NOBITS
sections may be excluded from the calculation, resulting in a segment
that is too small.
emaste [Mon, 15 Dec 2014 14:25:42 +0000 (14:25 +0000)]
Remove empty generated file upon gperf failure
Prior to this change the build could fail as follows, if gperf is not
available (or fails):
- make(1) stops due to the gperf error, but an empty target file
(cfns.h) is still created
- the empty cfns.h is newer than the source cfns.gperf so it is not
regenerated on subsequent builds
- the gcc build fails (undefined reference to libc_name_p)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
kib [Mon, 15 Dec 2014 12:01:42 +0000 (12:01 +0000)]
Add a facility for non-init process to declare itself the reaper of
the orphaned descendants. Base of the API is modelled after the same
feature from the DragonFlyBSD.
Requested by: bapt
Reviewed by: jilles (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
delphij [Mon, 15 Dec 2014 07:52:23 +0000 (07:52 +0000)]
MFV r275551:
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate
"phys" type, removing the need for clients of the dmu buf user
API to keep properly typed pointer aliases to db->db_data in order
to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
In zap_leaf() and zap_leaf_byteswap, now that the pointer alias
field l_phys has been removed, use the db_data field in an on
stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
Remove the db_user_data_ptr_ptr field from dbuf and all logic
to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
Modify the DMU buf user API to remove the ability to specify
a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Create and use the new "phys data" accessor functions
dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(),
zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Remove now unused "phys pointer" aliases to db->db_data
from clients of the DMU buf user API.
delphij [Mon, 15 Dec 2014 04:51:36 +0000 (04:51 +0000)]
MFV r275549:
Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata
ZFS should keep in ARC at minimum.
In arc_evict(), when doing recycle, take more factors into account by
applying the following policy:
1. If no evictable data, evict metadata;
2. If no evictable metadata, evict data;
3. If we hit arc_meta_limit, evict metadata;
4. If we haven't hit arc_meta_min, evict data;
5* (Illumos only, not present in new FreeBSD code, yet) evict the oldest
cached element from data and metadata.
(FreeBSD) evict the data type specified by caller, which is the
existing behavior.
Note that because of our splitted locks (implemented in r205231 to improve
scalability by reducing lock contention), implementing the fifth Illumos
behavior will not be cheap, so for now just implement the 1-4 and fall back
to current behavior for 5.
Illumos issue:
5368 ARC should cache more metadata
MFC after: 2 months (assuming we didn't found better solution)
kib [Sat, 13 Dec 2014 16:18:29 +0000 (16:18 +0000)]
Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.
Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process. Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.
Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume. It
is used for debugging; should be removed after the real use of the
interface is added.
In collaboration with: pho
Discussed with: avg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
kib [Sat, 13 Dec 2014 16:07:01 +0000 (16:07 +0000)]
Only sleep interruptible while waiting for suspension end when
filesystem specified VFCF_SBDRY flag, i.e. for NFS.
There are two issues with the sleeps. First, applications may get
unexpected EINTR from the disk i/o syscalls. Second, interruptible
sleep allows the stop of the process, and since mount point is
referenced while thread sleeps, unmount cannot free mount point
structure' memory, blocking unmount indefinitely.
Even for NFS, it is probably only reasonable to enable PCATCH for intr
mounts, but this information is currently not available at VFS level.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Sat, 13 Dec 2014 16:02:37 +0000 (16:02 +0000)]
The vinactive() call in vgonel() may start writes for the dirty pages,
creating delayed write buffers belonging to the reclaimed vnode. Put
the buffer cleanup code after inactivation.
Add asserts that ensure that buffer queues are empty and add BO_DEAD
flag for bufobj to check that no buffers are added after the cleanup.
BO_DEAD is only used by INVARIANTS-enabled kernels.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
delphij [Sat, 13 Dec 2014 02:08:18 +0000 (02:08 +0000)]
MFV r275548:
Verify that the block pointer is structurally valid, before attempting to
read it in. It can only be invalid in the case of a ZFS bug, but this
change will help identify such bugs in a more transparent way, by
panic'ing with a relevant message, rather than indexing off the end of an
array or something.
Illumos issue:
5349 verify that block pointer is plausible before reading
delphij [Sat, 13 Dec 2014 01:39:24 +0000 (01:39 +0000)]
MFV r275546:
Reduce scrub activities when system there is enough dirty data, namely when
dirty data is more than zfs_vdev_async_write_active_min_dirty_percent (once
we start to increase the number of concurrent async writes).
While there also correct rounding error which would make scrub end up
pausing for (zfs_txg_timeout + 1) seconds instead of the desired
zfs_txg_timeout seconds.
Illumos issue:
5351 scrub goes for an extra second each txg
5352 scrub should pause when there is some dirty data
delphij [Sat, 13 Dec 2014 01:26:06 +0000 (01:26 +0000)]
MFV r275545:
If zio_checksum_error() returns other than ECKSUM (e.g. EINVAL), it does not
fill in the "zio_bad_cksum_t *info" parameter. Caller should not attempt to
use it in this case.
Illumos issue:
5348 zio_checksum_error() only fills in info if ECKSUM
delphij [Sat, 13 Dec 2014 01:10:17 +0000 (01:10 +0000)]
MFV r275542:
If a dnode has a spill block and there is an error while accessing
a data block then traverse_dnode() loses information about that error
and returns a status of visiting the spill block.
This issue is discovered by Spectra Logic.
Illumos issue:
5311 traverse_dnode may report success when it should not
jmg [Fri, 12 Dec 2014 19:56:36 +0000 (19:56 +0000)]
Add some new modes to OpenCrypto. These modes are AES-ICM (can be used
for counter mode), and AES-GCM. Both of these modes have been added to
the aesni module.
Included is a set of tests to validate that the software and aesni
module calculate the correct values. These use the NIST KAT test
vectors. To run the test, you will need to install a soon to be
committed port, nist-kat that will install the vectors. Using a port
is necessary as the test vectors are around 25MB.
All the man pages were updated. I have added a new man page, crypto.7,
which includes a description of how to use each mode. All the new modes
and some other AES modes are present. It would be good for someone
else to go through and document the other modes.
A new ioctl was added to support AEAD modes which AES-GCM is one of them.
Without this ioctl, it is not possible to test AEAD modes from userland.
Add a timing safe bcmp for use to compare MACs. Previously we were using
bcmp which could leak timing info and result in the ability to forge
messages.
Add a minor optimization to the aesni module so that single segment
mbufs don't get copied and instead are updated in place. The aesni
module needs to be updated to support blocked IO so segmented mbufs
don't have to be copied.
We require that the IV be specified for all calls for both GCM and ICM.
This is to ensure proper use of these functions.
ae [Fri, 12 Dec 2014 11:29:54 +0000 (11:29 +0000)]
Increase the buffer size to keep the list of programm names when
parsing programm specification. It is safe to not check out of bounds
access, because !isprint(p[i]) check will stop reading, when '\0'
character will be read from the input string.
marcel [Fri, 12 Dec 2014 06:13:31 +0000 (06:13 +0000)]
The size of the first level reference count table is given in terms of the
number of clusters it occupies. It's not the number of entries in the table,
as it is for the L1 cluster table.
For small images, the two are the same. With the unit tests based on small
images, this change has therefore no effect on the unit test. For larger
images (like the FreeBSD 10.1-RELEASE image), this gives a discrepancy that
actually shows up when running "qemu-img check".
jhibbits [Fri, 12 Dec 2014 03:58:51 +0000 (03:58 +0000)]
Add new PowerPC relocations to binutils
Summary:
LLVM/Clang generates relocations that our binutils doesn't understand, but newer
binutils does. I got permission from the author of a series of patches to
relicense them as GPLv2 for use in FreeBSD. The upstream git hashes are:
ae [Thu, 11 Dec 2014 19:09:57 +0000 (19:09 +0000)]
Use ipsec6_in_reject() to simplify ip6_ipsec_fwd() and ip6_ipsec_input().
ipsec6_in_reject() does the same things, also it counts policy violation
errors.
Do IPSEC check in the ip6_forward() after addresses checks.
Also use ip6_ipsec_fwd() to make code similar to IPv4 implementation.
ae [Thu, 11 Dec 2014 18:55:54 +0000 (18:55 +0000)]
Use ipsec4_in_reject() to simplify ip_ipsec_fwd() and ip_ipsec_input().
ipsec4_in_reject() does the same things, also it counts policy violation
errors.
ae [Thu, 11 Dec 2014 17:34:49 +0000 (17:34 +0000)]
Remove flags and tunalready arguments from ipsec4_process_packet()
and make its prototype similar to ipsec6_process_packet.
The flags argument isn't used here, tunalready is always zero.
ae [Thu, 11 Dec 2014 16:53:29 +0000 (16:53 +0000)]
Move ip_ipsec_fwd() from ip_input() into ip_forward().
Remove check for presence PACKET_TAG_IPSEC_IN_DONE mbuf tag from
ip_ipsec_fwd(). PACKET_TAG_IPSEC_IN_DONE tag means that packet is
already handled by IPSEC code. This means that before IPSEC processing
it was destined to our address and security policy was checked in
the ip_ipsec_input(). After IPSEC processing packet has new IP
addresses and destination address isn't our own. So, anyway we can't
check security policy from the mbuf tag, because it corresponds
to different addresses.
We should check security policy that corresponds to packet
attributes in both cases - when it has a mbuf tag and when it has not.
ae [Thu, 11 Dec 2014 14:58:55 +0000 (14:58 +0000)]
Remove PACKET_TAG_IPSEC_IN_DONE mbuf tag lookup and usage of its
security policy. The changed block of code in ip*_ipsec_input() is
called when packet has ESP/AH header. Presence of
PACKET_TAG_IPSEC_IN_DONE mbuf tag in the same time means that
packet was already handled by IPSEC and reinjected in the netisr,
and it has another ESP/AH headers (encrypted twice?).
Since it was already processed by IPSEC code, the AH/ESP headers
was already stripped (and probably outer IP header was stripped too)
and security policy from the tdb_ident was applied to those headers.
It is incorrect to apply this security policy to current headers.
Also make ip_ipsec_input() prototype similar to ip6_ipsec_input().
ae [Thu, 11 Dec 2014 14:43:44 +0000 (14:43 +0000)]
Remove check for presence of PACKET_TAG_IPSEC_PENDING_TDB and
PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED mbuf tags. They aren't used in FreeBSD.
Instead check presence of PACKET_TAG_IPSEC_OUT_DONE mbuf tag. If it
is found, bypass security policy lookup as described in the comment.
PACKET_TAG_IPSEC_OUT_DONE tag added to mbuf when IPSEC code finishes
ESP/AH processing. Since it was already finished, this means the security
policy placed in the tdb_ident was already checked. And there is no reason
to check it again here.
ngie [Wed, 10 Dec 2014 23:18:11 +0000 (23:18 +0000)]
Fix building termcap.db when make obj is run beforehand from a clean tree by
using make variables for the filenames, which helps resolve pathing
appropriately when running cap_mkdb
ngie [Wed, 10 Dec 2014 20:40:03 +0000 (20:40 +0000)]
Remove termcap entry reordering; install the file verbatim instead
termcap entry reordering requires ex (which is available via usr.bin/vi), which
breaks on build hosts where installworld is run with MK_VI == no (or when
make delete-old is run on ^/projects/building-blocks as vi, et al, are
removed on the branch when the knob is tweaked to => "no")
Reordering termcap was believed to improve performance, but the file is now
accessed via /etc/termcap.db, so /etc/termcap (and /usr/share/misc/termcap by
proxy) access is less preferred.
Reordering the file broke the historical comment <-> entry mapping as well,
which could muddle the purpose of entries in the file, so it could be
potentially harmful to readers in its reordered state.
Discussion took place on hackers@ here:
https://lists.freebsd.org/pipermail/freebsd-hackers/2014-December/046657.html
trasz [Wed, 10 Dec 2014 14:36:44 +0000 (14:36 +0000)]
Add "-media" autofs map, to access data on removable media, such as CD
drives or flash keys. It can be enabled by uncommenting a single entry
in default /etc/auto_master. It can also be easily modified to use
fuse-based filesystems instead of in-kernel ones.
There is still one deficiency - the mountpoints are permanent, they
don't disappear when user removes the media. Fixing it needs some
autofs changes.
Differential Revision: https://reviews.freebsd.org/D1210
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
trasz [Wed, 10 Dec 2014 14:14:16 +0000 (14:14 +0000)]
Add fstyp(8). This utility, named after its SVR4 counterpart, detects
filesystems. It differs from file(1) in that it gives machine-parseable
output, it outputs filesystem labels, doesn't get confused by other
formats metadata, and runs in Capsicum sandbox.
Differential Revision: https://reviews.freebsd.org/D1255
Relnotes: yes
Sponsored by: The FreeBSD Foundation
royger [Wed, 10 Dec 2014 13:25:21 +0000 (13:25 +0000)]
xen/intr: balance dynamic interrupts across available vCPUs
By default Xen binds all event channels to vCPU#0, and FreeBSD only shuffles
the interrupt sources once, at the end of the boot process. Since new event
channels might be created after this point (because new devices or backends
are added), try to automatically shuffle them at creation time.
This does not affect VIRQ or IPI event channels, that are already bound to a
specific vCPU as requested by the caller.
royger [Wed, 10 Dec 2014 11:42:02 +0000 (11:42 +0000)]
xen: mask event channels while binding them to a vCPU
Mask the event channel source before trying to bind it to a CPU, this
prevents stray interrupts from firing while assigning them and hitting the
KASSERT in xen_intr_handle_upcall.
royger [Wed, 10 Dec 2014 11:21:52 +0000 (11:21 +0000)]
xen: move grant table code
Mave the grant table code into the dev/xen folder in preparation for turning
it into a device using the newbus interface. This is just code motion, no
functional changes.
delphij [Wed, 10 Dec 2014 08:18:22 +0000 (08:18 +0000)]
In r268924 __fflush was modified so that when write(2) was not successful,
_p and _w are adjusted to account for the partial write (if any).
However, _p and _w should not be unconditionally adjusted and should only
be changed when we actually wrote some bytes, or the accumulated accounting
error will eventually result in a heap buffer overflow.
Reported by: adrian and alfred (Norse Corporation)
Security: FreeBSD-SA-14:27.stdio
Security: CVE-2014-8611
ian [Wed, 10 Dec 2014 04:54:43 +0000 (04:54 +0000)]
Fix the watchdog timeout calculation to prevent wrap. The RPi hardware
can't do a timeout bigger than 15 seconds. The code wasn't checking for
this and because bitmasking was involved the requested timeout was
basically adjusted modulo-16. That led to things like a 128 second
timeout actually being a 9 second timeout, which accidentally worked fine
until watchdogd was changed to only pet the dog once every 10 seconds.