Change /usr/libexec/ld-elf.so.1 to point at ../../libexec/ld-elf.so.1
instead of /libexec/ld-elf.so.1. Below in the Makefile we execute
'chflags noschg ${DESTDIR}/usr/libexec/ld-elf.so.1', which follows
symlink and removes 'schg' flag from system's /libexec/ld-elf.so.1
instead of the one in DESTDIR. It is also more friendly to use
replative paths in symlink in case of jail/chroot environments.
Jung-uk Kim [Tue, 4 Dec 2012 00:37:17 +0000 (00:37 +0000)]
Tidy up bsd.cpu.mk for X86 CPUs:
- Do not limit recent processors to "prescott" class for i386 target. There
is no reason for this hack because clang is default now. On top of that, it
will only grow indefinitely over time.
- Add more CPUTYPEs, i.e., "athlon-fx", "core-avx2", "atom", "penryn", and
"yonah". Note "penryn" and "yonah" are intentionally undocumented because
they are not supported by gcc and marked deprecated by clang.
- Add more CPUTYPE aliases, i.e., "barcelona" (-> amdfam10), "westmere" and
"nehalem" (-> corei7). Note these are intentionally undocumented because
they are not supported by (base) gcc and/or clang. However, LLVM (backend)
seems to "know" the differences. Most likely, they were deprecated with
other vendor code names and clang did not bother implementing them at all.
- Add i686 to MACHINE_CPU for "c3-2" (VIA Nehemiah). Both gcc & clang treat
it like an i686-class processor.
- Add IDT "winchip2" and "winchip-c6" for completeness (undocumented).
- Order processors per make.conf example, i.e., CPU vendors and models.
- Tidy up make.conf example, i.e., remove "by gcc" (because we have aliases)
and remove "prescott" from AMD64 architecture (because it is not correct).
Print the frame addresses for the backtraces on i386 and amd64. It
allows both to inspect the frame sizes and to manually peek into the
frames from ddb, if needed.
The vnode_free_list_mtx is required unconditionally when iterating
over the active list. The mount interlock is not enough to guarantee
the validity of the tailq link pointers. The __mnt_vnode_next_active()
and __mnt_vnode_first_active() active lists iterators helper functions
did not provided the neccessary stability for the list, allowing the
iterators to pick garbage.
This was uncovered after the r243599 made the active list iterators
non-nop.
Since a vnode interlock is before the vnode_free_list_mtx, obtain the
vnode ilock in the non-blocking manner when under vnode_free_list_mtx,
and restart iteration after the yield if the lock attempt failed.
Assert that a vnode found on the list is active, and assert that the
helpers return the vnode with interlock owned.
Xin LI [Mon, 3 Dec 2012 21:49:37 +0000 (21:49 +0000)]
Note that the manual page of less(1) says:
Note that a preprocessor cannot output an empty file, since that
is interpreted as meaning there is no replacement, and the origi-
nal file is used. To avoid this, if LESSOPEN starts with two ver-
tical bars, the exit status of the script becomes meaningful. If
the exit status is zero, the output is considered to be replace-
ment text, even if it empty. If the exit status is nonzero, any
output is ignored and the original file is used. For compatibil-
ity with previous versions of less, if LESSOPEN starts with only
one vertical bar, the exit status of the preprocessor is ignored.
Use two pipe symbols for zless, so that zless'ing a compressed empty
file will give output rather than being interpreted as its compressed
form, which is typically a binary.
Thanks Mark Nudelman for pointing out this difference and the
suggested solution.
Jack F Vogel [Mon, 3 Dec 2012 21:38:02 +0000 (21:38 +0000)]
Remove the sysctl process_limit interface, after some
thought I've decided its overkill,a simple tuneable for
each RX and TX limit, and then init sets the ring values
based on that, should be sufficient.
More importantly, fix a bug causing a panic, when changing
the define style to IXGBE_LEGACY_TX a taskqueue init was
inadvertently set #ifdef when it should be #ifndef.
Robert Watson [Sun, 2 Dec 2012 22:09:16 +0000 (22:09 +0000)]
Specifically point at the Handbook instructions for world updates in
UPDATING by URL.
As there has been some confusion over the need to run "mergemaster -p",
part of our standard upgrade procedure, following the recent addition of
an "auditdistd" user, add a note about it to UPDATING explicitly.
Adrian Chadd [Sun, 2 Dec 2012 06:24:08 +0000 (06:24 +0000)]
Delete the per-TXQ locks and replace them with a single TX lock.
I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)
So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.
I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so. But as it stands at the present, it's
not really buying us much:
* in order to properly serialise things and not get bitten by scheduling
and locking interactions with things higher up in the stack, we need to
wrap the whole TX path in a long held lock. Otherwise we can end up
being pre-empted during frame handling, resulting in some out of order
frame handling between sequence number allocation and encryption handling
(ie, the seqno and the CCMP IV get out of sequence);
* .. so whilst that's the case, holding the lock for that long means that
we're acquiring and releasing the TXQ lock _inside_ that context;
* And we also acquire it per-frame during frame completion, but we currently
can't hold the lock for the duration of the TX completion as we need
to call net80211 layer things with the locks _unheld_ to avoid LOR.
* .. the other places were grab that lock are reset/flush, which don't happen
often.
My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.
There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.
Rick Macklem [Sun, 2 Dec 2012 01:16:04 +0000 (01:16 +0000)]
Add an nfssvc() option to the kernel for the new NFS client
which dumps out the actual options being used by an NFS mount.
This will be used to implement a "-m" option for nfsstat(1).
- Add support for Etron EJ168 USB 3.0 Host Controllers.
This brand of controllers expects that the number of
contexts specified in the input slot context points
to an active endpoint context, else it refuses to
operate.
- Ring the correct doorbell when streams mode is used.
- Wrap one or two long lines.
Tested by: Markus Pfeiffer (DragonFlyBSD)
MFC after: 1 week
Protect against DoS attacks, such as being described in CVE-2010-2632.
The changes were derived from what has been committed to NetBSD, with
modifications. These are:
1. Preserve the existsing GLOB_LIMIT behaviour by including the number
of matches to the set of parameters to limit.
2. Change some of the limits to avoid impacting normal use cases:
GLOB_LIMIT_STRING - change from 65536 to ARG_MAX so that glob(3)
can still provide a full command line of expanded names.
GLOB_LIMIT_STAT - change from 128 to 1024 for no other reason than
that 128 feels too low (it's not a limit that impacts the
behaviour of the test program listed in CVE-2010-2632).
GLOB_LIMIT_PATH - change from 1024 to 65536 so that glob(3) can
still provide a fill command line of expanded names.
3. Protect against buffer overruns when we hit the GLOB_LIMIT_STAT or
GLOB_LIMIT_READDIR limits. We append SEP and EOS to pathend in
those cases. Return GLOB_ABORTED instead of GLOB_NOSPACE when we
would otherwise overrun the buffer.
This change also modifies the existing behaviour of glob(3) in case
GLOB_LIMIT is specifies by limiting the *new* matches and not all
matches. This is an important distinction when GLOB_APPEND is set or
when the caller uses a non-zero gl_offs. Previously pre-existing
matches or the value of gl_offs would be counted in the number of
matches even though the man page states that glob(3) would return
GLOB_NOSPACE when gl_matchc or more matches were found.
The limits that cannot be circumvented are GLOB_LIMIT_STRING and
GLOB_LIMIT_PATH all others can be crossed by simply calling glob(3)
again and with GLOB_APPEND set.
The entire description above applies only when GLOB_LIMIT has been
specified of course. No limits apply when this flag isn't set!
Andriy Gapon [Sat, 1 Dec 2012 18:16:14 +0000 (18:16 +0000)]
ioapic_program_intpin: program high bits before low bits
Programming the low bits has a side-effect if unmasking the pin if it is
not disabled. So if an interrupt was pending then it would be delivered
with the correct new vector but to the incorrect old LAPIC.
This fix could be made clearer by preserving the mask bit while
programming the low bits and then explicitly resetting the mask bit
after all the programming is done.
Probability to trip over the fixed bug could be increased by bootverbose
because printing of the interrupt information in ioapic_assign_cpu
lengthened the time window during which an interrupt could arrive while
a pin is masked.
Reported by: Andreas Longwitz <longwitz@incore.de>
Tested by: Andreas Longwitz <longwitz@incore.de>
MFC after: 12 days
Andriy Gapon [Sat, 1 Dec 2012 18:12:55 +0000 (18:12 +0000)]
gfs_file_inactive: replace bad code with ugly code
Also, make it explicit that V_XATTRDIR is not properly supported in gfs
code yet.
The bad code was plain incorrect: (a) it spoiled handling of v_usecount
reaching zero and (b) it leaked v_holdcnt.
The ugly code employs potentially unsafe locking tricks.
Ideally we should separate vnode lifecycle and gfs node lifecycle.
A gfs node should have its own reference count where its child nodes
should be accounted.
PR: kern/151111
Reviewed by: kib
MFC after: 13 days
In globextend(), take advantage of the fact that realloc(NULL, size) is
equivalent to malloc(size). This eliminates the conditional expression
used for calling either realloc() or malloc() when realloc() will do
all the time.
In globextend() when the pathv vector cannot be (re-)allocated, don't
free and clear the gl_pathv pointer in the glob_t structure. Such
breaks the invariant of the glob_t structure, as stated in the comment
right in front of the globextend() function. If gl_pathv was non-NULL,
then gl_pathc was > 0. Making gl_pathv a NULL pointer without also
setting gl_pathc to 0 is wrong.
Since we otherwise don't free the memory associated with a glob_t in
error cases, it's unlikely that this change will cause a memory leak
that wasn't already there to begin with. Callers of glob(3) must
call globfree(3) irrespective of whether glob(3) returned an error
or not.
Robert Watson [Sat, 1 Dec 2012 15:11:46 +0000 (15:11 +0000)]
Merge a number of changes required to hook up OpenBSM 1.2-alpha2's
auditdistd (distributed audit daemon) to the build:
- Manual cross references
- Makefile for auditdistd
- rc.d script, rc.conf entrie
- New group and user for auditdistd; associated aliases, etc.
The audit trail distribution daemon provides reliable,
cryptographically protected (and sandboxed) delivery of audit tails
from live clients to audit server hosts in order to both allow
centralised analysis, and improve resilience in the event of client
compromises: clients are not permitted to change trail contents
after submission.
Submitted by: pjd
Sponsored by: The FreeBSD Foundation (auditdistd)
Robert Watson [Sat, 1 Dec 2012 13:46:37 +0000 (13:46 +0000)]
Merge OpenBSM 1.2-alpha2 changes from contrib/openbsm to
src/sys/{bsm,security/audit}. There are a few tweaks to help with the
FreeBSD build environment that will be merged back to OpenBSM. No
significant functional changes appear on the kernel side.
Obtained from: TrustedBSD Project
Sponsored by: The FreeBSD Foundation (auditdistd)
Jack F Vogel [Sat, 1 Dec 2012 01:24:40 +0000 (01:24 +0000)]
Patch #12 OK, I said there was only 11 patches, but unfortunately
the revamped sysctl code did not work, and needed a change. This
makes the limit get set at the time that all sysctl stats are
created and is actually more elegant imho anyway.
Jack F Vogel [Sat, 1 Dec 2012 00:11:24 +0000 (00:11 +0000)]
Patch #11 - The final patch: this one greatly improves the
TX hot path by getting rid of index calculations and simply
managing pointers. Much of the creative code is due to my
coworker here at Intel, Alex Duyck, thanks Alex!
Also, this whole series of patches was given the critical
eye of Gleb Smirnoff and is all the better for it, thanks
Gleb!
Robert Watson [Sat, 1 Dec 2012 00:02:31 +0000 (00:02 +0000)]
Merge a number of post-1.2-alpha2 changes to OpenBSM into the OpenBSM
vendor area; these sort out various post-release issues, largely to do
with integrating OpenBSM with the base FreeBSD build. All of these
changes will appear in a future 1.2-alpha3:
Change 219846 on 2012/11/26 by rwatson@rwatson_cinnamon
Update several instances of Apple Computer to Apple; a change made
in the FreeBSD tree some years ago but not propagated to OpenBSM.
Change 219845 on 2012/11/26 by rwatson@rwatson_cinnamon
Remove Apple acknowledgement clause from file with Christian
Peron copyright (with permission from Christian).
Change 219836 on 2012/11/23 by rwatson@rwatson_cinnamon
Replace further instances of <> with "" for local includes in
auditdistd.
Change 219834 on 2012/11/23 by rwatson@rwatson_cinnamon
For current-directory headers, use #include "" rather than #include
<>.
Change 219832 on 2012/11/23 by rwatson@rwatson_cinnamon
Be more consistent with the remainder of OpenBSM and include
config/config.h rather than config.h.
Don't include config.h from synch.h, which is included only from
.c files that already include config.h.
Change 219831 on 2012/11/23 by pjd@pjd_anger
Add Xref to auditdistd(8).
Suggested by: rwatson
Obtained from: TrustedBSD Project
Sponsored by: The FreeBSD Foundation (auditdistd)
Jilles Tjoelker [Fri, 30 Nov 2012 23:51:33 +0000 (23:51 +0000)]
libc: Allow setting close-on-exec in fopen/freopen/fdopen.
This commit adds a new mode option 'e' that must follow any 'b', '+' and/or
'x' options. C11 is clear about the 'x' needing to follow 'b' and/or '+' and
that is what we implement; therefore, require a strict position for 'e' as
well.
For freopen() with a non-NULL path argument and fopen(), the close-on-exec
flag is set iff the 'e' mode option is specified. For freopen() with a NULL
path argument and fdopen(), the close-on-exec flag is turned on if the 'e'
mode option is specified and remains unchanged otherwise.
Although the same behaviour for fopen() can be obtained by open(O_CLOEXEC)
and fdopen(), this needlessly complicates the calling code.
Apart from the ordering requirement, the new option matches glibc.
Robert Watson [Fri, 30 Nov 2012 23:50:07 +0000 (23:50 +0000)]
Import OpenBSM 1.2-alpha2:
OpenBSM 1.2 alpha 2
- auditdistd, a distributed audit trail management daemon, has now been
merged. This allows trail files to be securely and reliably synced from
audited hosts to an audit server, and employs TLS encryption. Where
available, it uses Capsicum to sandbox the service. This work was
contributed by Pawel Jakub Dawidek under sponsorship from the FreeBSD
Foundation.
OpenBSM 1.2 alpha 1
- Add Capsicum-related error numbers for FreeBSD: ENOTCAPABLE, ECAPMODE.
- Add Capsicum, process descriptor audit events for FreeBSD.
- Allow 0% minspace.
- Fixes from the clang static analyser.
- Fix expiration of trail files when the host parameter is used.
- Various typo fixes.
- Support for Solaris privilege and privilege set tokens.
- Documentation for getachost(), improvements for getacfilesz().
- Fix a directory descriptor leak that happened when audit trail partitions
filled.
- Support for more Linux distributions with a partial contemporary endian.h.
- Improved escaping of XML-encapsulated BSM.
- A variety of minor documentation, style, and functional.
Obtained from: TrustedBSD Project
Sponsored by: The FreeBSD Foundation (auditdistd)
Jack F Vogel [Fri, 30 Nov 2012 23:45:55 +0000 (23:45 +0000)]
Patch #8 Performance changes - this one improves locality,
moving some counters and data to the ring struct from
the adapter struct, also compressing some data in the
move.
Jack F Vogel [Fri, 30 Nov 2012 23:28:01 +0000 (23:28 +0000)]
Patch #7 This is primarily about processing limit control.
- add a limit for both RX and TX, change the default to 256
- change the sysctl usage to be common, and now to be called
during init for each ring.
- the TX limit is not yet used, but the changes in the last
patch in this series uses the value.
- the motivation behind these changes is to improve data
locality in the final code.
- rxeof interface changes since it now gets limit from the
ring struct
Before the change directory descriptor was totally ignored,
so the relative path argument was appended to current working
directory path and not to the path provided by descriptor, thus
wrong paths were stored in audit logs.
Now that we use directory descriptor in vfs_lookup, move
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold file descriptors table lock, so we are sure paths will
be resolved according to the same directory in audit record and
in actual operation.
Jack F Vogel [Fri, 30 Nov 2012 23:13:56 +0000 (23:13 +0000)]
Patch #6 Whitespace cleanup, and removal of some very old
defines (at Gleb's request). Also, change the defines around
the old transmit code to IXGBE_LEGACY_TX, I do this to make
it possible to define this regardless of the OS level (it is
not defined by default). There are also a couple changed
comments for clarity.
Jack F Vogel [Fri, 30 Nov 2012 23:06:27 +0000 (23:06 +0000)]
Patch #5 Cleanup unused IEEE1588 code fragments, the day may
come when this feature gets implemented, but its not here yet
and I see no reason to leave this laying around.
Currently when we discover that trail file is greater than configured
limit we send AUDIT_TRIGGER_ROTATE_KERNEL trigger to the auditd daemon
once. If for some reason auditd didn't rotate trail file it will never
be rotated.
Change it by sending the trigger when trail file size grows by the
configured limit. For example if the limit is 1MB, we will send trigger
on 1MB, 2MB, 3MB, etc.
This is also needed for the auditd change that will be committed soon
where auditd may ignore the trigger - it might be ignored if kernel
requests the trail file to be rotated too quickly (often than once a second)
which would result in overwriting previous trail file.
Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks
Currently on each record write we call VFS_STATFS() to get available space
on the file system as well as VOP_GETATTR() to get trail file size.
We can assume that trail file is only updated by the audit worker, so instead
of asking for file size on every write, get file size on trail switch only
(it should be zero, but it's not expensive) and use global variable audit_size
protected by the audit worker lock to keep track of trail file's size.
This eliminates VOP_GETATTR() call for every write. VFS_STATFS() is satisfied
from in-memory data (mount->mnt_stat), so shouldn't be expensive.
Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks
Jack F Vogel [Fri, 30 Nov 2012 22:54:14 +0000 (22:54 +0000)]
Patch #4 - this does two things, it removes a number of statistics,
these are FCOE stats (fiber channel over ethernet), something that
FreeBSD does not yet have, they were mistaken for flow control by
the implementor I believe. Secondly, the real flow control stats
are oddly named with a 'link' tag on the front, it was requested
by my validation engineer to make these stats have the same name as
the igb driver for clarity and that seemed reasonable to me.
Jack F Vogel [Fri, 30 Nov 2012 22:33:21 +0000 (22:33 +0000)]
Patch #2 - remove OACTIVE and DEPLETED notions from the
multiqueue code, this functionality has proven to be more
trouble than it was worth. Thanks to Gleb for a second
critical look over my code and help in the patches!
Allow OpenSSL to use arc4random(3) on FreeBSD. arc4random(3) was modified
some time ago to use sysctl instead of /dev/random to get random data,
so is now much better choice, especially for sandboxed processes that have
no direct access to /dev/random.
* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.
- Implement "fdt mres" sub-command that prints reserved memory regions
- Add "fdt addr" subcommand that lets you specify preloaded blob address
- Do not pre-initialize blob for "fdt addr"
- Do not try to load dtb every time fdt subcommand is issued,
do it only once
- Change the way DTB is passed to kernel. With introduction of "fdt addr"
actual blob address can be not virtual but physical or reside in
area higher then 64Mb. ubldr should create copy of it in kernel area
and pass pointer to this newly allocated buffer which is guaranteed to work
in kernel after switching on MMU.
- Convert memreserv FDT info to "memreserv" property of root node
FDT uses /memreserve/ data to notify OS about reserved memory areas.
Technically it's not real property, it's just data blob, sequence
of <start, size> pairs where both start and size are 64-bit integers.
It doesn't fit nicely with OF API we use in kernel, so in order to unify
thing ubldr converts this data to "memreserve" property using the same
format for addresses and sizes as /memory node.
Add fdt_get_reserved_regions function. API is simmilar to fdt_get_mem_regions
It returns memory regions restricted from being used by kernel. These
regions are dfined in "memreserve" property of root node in the same
format as "reg" property of /memory node