MFC r224107:
Clear pending ifnet events, in an attempt at preventing
ng_ether_link_state() from being dispatched after we have
cleared our IFP2NG(ifp).
MFC: r223774
The algorithm used by nfscl_getopen() could have resulted in
multiple instances of the same lock_owner when a process both
inherited an open file descriptor plus opened the same file itself.
Since some NFSv4 servers cannot handle multiple instances of
the same lock_owner string, this patch changes the algorithm
used by nfscl_getopen() in the new NFSv4 client to keep that
from happening. The new algorithm is simpler, since there is
no longer any need to ascend the process's parentage tree because
all NFSv4 Closes for a file are done at VOP_INACTIVE()/VOP_RECLAIM(),
making the Opens indistinct w.r.t. use with Lock Ops.
This problem was discovered at the recent NFSv4 interoperability
Bakeathon.
Add an option to have a fail point term only execute when run by a
specified pid. This is helpful for automated testing involving a global
knob that would otherwise be executed by many other threads.
MFC: r223747
Modify the new NFSv4 client so that it appends a file handle
to the lock_owner4 string that goes on the wire. Also, add
code to do a ReleaseLockOwner Op on the lock_owner4 string
before a Close. Apparently not all NFSv4 servers handle multiple
instances of the same lock_owner4 string, at least not in a
compatible way. This patch avoids having multiple instances,
except for one unusual case, which will be fixed by a future commit.
Found at the recent NFSv4 interoperability Bakeathon.
MFC r223886:
Implement a helper functions to locally set thread-private flag, and
restore it to the previous state. Note that only setting a flag locally
is supported.
MFC r223887:
Use helper functions instead of manually managing TDP_INBDFLUSH.
MFC r223888:
Use 'curthread_pflags' instead of 'thread_pflags' to signify that only
curthread can be operated upon.
Remember to unlock the peripheral prior to notifying the user. Make some
allocations M_NOWAIT so that we don't try and sleep with a nested non-sleepable
lock.
This makes the userland scsi_target begin to function again.
MFC 223870,223937:
- Note that -a, -C, -H, -j, and -z are also toggles.
- Add a leading space to the status messages output after toggling the
'C' and 'H' flags at runtime. This matches messages output for other
toggles which leave the first column in the message blank to hold the
cursor.
MFC 223510:
Don't die if either of INET or INET6 aren't in the running kernel.
Instead, report "protocol not supported" errors at runtime if a user
attempts to use a protocol that the kernel doesn't support.
MFC 223477,223597,223611:
- The recent change to increase the zfsboot size to 64k made a few BIOSes
unhappy (probably they don't handle crossing the 64k boundary, etc.).
Fix this by changing zfsldr to use a loop reading from the disk one
sector at a time. To avoid trashing the saved copy of the MBR which is
used for disk I/O, read zfsboot2 at address 0x9000. This has the
advantage that BTX no longer needs to be relocated as it is read into
the correct location. However, the loop to relocate zfsboot2.bin can
now cross a 64k boundary, so change it to use relative segments instead.
(This will need further work if zfsboot2.bin ever exceeds 64k.)
While here, stop storing a relocated copy of zfsldr at 0x700. This was
only used by the xread hack which has recently been removed (and even
that use was dubious). Also, include the BIOS error code as hex when
reporting read errors to aid in debugging.
- Remove the fake BPB from zfsldr. zfsldr doesn't support booting from
floppies, so it will not be used as the start of an emulated floppy
image on a bootable CD which is what the fake BPB was used for.
- Only check that EDD packet mode is available once at the start of
zfsldr rather than for each disk sector now that we read data in one
sector at a time. As a result, collapse the remaining bits of read
up into nread and rename nread to read.
MFC 223091:
Add location and pnpinfo strings for puc device ports. The location is
announced during boot and contains the port number. The pnpinfo string
lists the port type (PUC_TYPE_* constants).
Make activemap_write_start/complete check the keepdirty list, when
stating if we need to update activemap on disk. This makes keepdirty
serve its purpose -- to reduce number of metadata updates.
Discussed with: pjd
r223655, 223974:
Check the returned value of activemap_write_complete() and update matadata on
disk if needed. This should fix a potential case when extents are cleared in
activemap but metadata is not updated on disk.
MFC r223661:
Improve error reporting. Use corresponding error message when file to be
preprocessed is missing. Also suggest to use absolute pathname if -p
option is specified.
MFC: r223940: If one's message is longer than the buffer size, then we reset
'cnt' at the wrong point and the actual column # get out of sync across the
buffer size.
MFC: r223657
Fix the new NFSv4 client so that it doesn't fill the cached
mode attribute in as 0 when doing writes. The change adds
the Mode attribute plus the others except Owner and Owner_group
to the list requested by the NFSv4 Write Operation. This fixed
a problem where an executable file built by "cc" would get mode
0111 instead of 0755 for some NFSv4 servers.
Found at the recent NFSv4 interoperability Bakeathon.
mm [Tue, 12 Jul 2011 13:16:46 +0000 (13:16 +0000)]
MFC r223623:
Add a new "REFCOMPRESSRATIO" property.
For snapshots, this is the same as COMPRESSRATIO, but for
filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie,
includes blocks in children, but not blocks shared with the origin).
This is needed to figure out how much space a filesystem would use if it
were not compressed (ignoring snapshots).
MFC r223862:
Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address. This already works fine for INET6 and ND6.
While here, remove two function pointers from struct lltable which are
only initialized but never used.
In userland, sign extend the offset for JA instructions.
We currently use that to implement "ip6 protochain", and "pc" might be
wider than "pc->k", in which case we need to arrange that "pc->k" be
sign-extended, by casting it to bpf_int32.
r221363:
Add "-a alignment" option to gpart(8). When it specified gpart(8)
tries to align partition start offset and size to be multiple of
alignment value.
r221967:
Some partitioning schemes want to have partitions that are aligned
with geometry. And they do recalculation of user specified parameters.
MBR, PC98, VTOC8, EBR schemes are doing that. For these schemes an
auto alignment feature (ie. gpart add -a alignment) would not work.
But it can work for GPT and BSD schemes. BSD scheme usualy is created
inside MBR, so we can use knowledge about offset of MBR partition to
calculate aligned values for BSD partitions.
Use "offset" attribute of the parent provider for better alignment.
r222263:
Fix calculation of alignment for odd values. Also do not change value
when it is already aligned.
r222264:
Simplify ALIGNDOWN macro.
r222630:
Use stripesize and stripeoffset in the automatic calculation of
partition offsets. If user requests specific alignment and
provider's stripesize is not zero, then use a least common multiple
from the stripesize and user specified value.
Also fix "gpart resize" implementation: do not try to align the partition
size, because the start offset may be not aligned. Instead align the
end offset and then calculate size. Also use stripesize and stripeoffset
for "gpart resize" command.
r222631:
Always use LCM when stripesize > 0.
r222819:
Do not use LCM from stripesize and user specified alignment value.
When user wants have specific alignment - do what user wants.
Use stripesize as alignment value in case, when some of gpart's
arguments are ommitted for automatic calculation.
Suggested by: mav
r223158:
Add "alignment" param to the request before calling gpart_autofill().
r223355:
The "size" param needs no adjusting to stripeoffset.
Log:
LibAliasInit() should allocate memory with M_WAITOK flag. Modify it
and its callers.
Log:
- Rewrite functions that copyin/out NAT configuration, so that they
calculate required memory size dynamically.
- Fix races on chain re-lock.
- Introduce new field to ip_fw_chain - generation count. Now utilized
only in the NAT configuration, but can be utilized wider in ipfw.
- Get rid of NAT_BUF_LEN in ip_fw.h
Add a special mount option "failok" to indicate that the administrator wants
the system to proceed to boot without bailing out into single user mode,
even when the file system can not be successfully mounted.
This option is implemented in mount(8) and not passed into kernel.
MFC: r221333
Remove usr/include/nfs/krpc.h and usr/include/nfs/nfsdiskless.h from
ObsoleteFiles.inc, since these files have been reincarnated in the new
NFS implementation.
Discussed with dim@.
MFC r222808:
Sync ng_nat with recent (r222806) ipfw_nat changes:
Make a behaviour of the libalias based in-kernel NAT a bit closer to
how natd(8) does work. natd(8) drops packets only when libalias returns
PKT_ALIAS_IGNORED and "deny_incoming" option is set, but ipfw_nat
always did drop packets that were not aliased, even if they should
not be aliased and just are going through.
Also add SCTP support: mark response packets to skip firewall processing.
MFC r222806:
Make a behaviour of the libalias based in-kernel NAT a bit closer to
how natd(8) does work. natd(8) drops packets only when libalias returns
PKT_ALIAS_IGNORED and "deny_incoming" option is set, but ipfw_nat
always did drop packets that were not aliased, even if they should
not be aliased and just are going through.
PR: kern/122109, kern/129093, kern/157379
Submitted by: Alexander V. Chernikov (previous version)
MFC: r223441
Plug an mbuf leak in the new NFS client that occurred when a
server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc.
This affected both NFSv3 and NFSv4. Found during testing
at the recent NFSv4 interoperability Bakeathon.
MFC: r223436
Fix the new NFSv4 client so that it uses the same uid as
was used for doing a mount when performing system operations
on AUTH_SYS mounts. This resolved an issue when mounting
a Linux server. Found during testing at the recent
NFSv4 interoperability Bakeathon.
MFC r222582:
O_FORWARD_IP is only action which depends from the result of lookup of
dynamic rules. We are doing forwarding in the following cases:
o For the simple ipfw fwd rule, e.g.
fwd 10.0.0.1 ip from any to any out xmit em0
fwd 127.0.0.1,3128 tcp from any to any 80 in recv em1
o For the dynamic fwd rule, e.g.
fwd 192.168.0.1 tcp from any to 10.0.0.3 3333 setup keep-state
When this rule triggers it creates a dynamic rule, but this
dynamic rule should forward packets only in forward direction.
o And the last case that does not work before - simple fwd rule which
triggers when some dynamic rule is already executed.
MFC r223660:
Initialize elements of state array when creating the GPT table.
This fixes the problem, when the secondary GPT header is not erased when
partition table destroyed. Move equal operations from g_part_gpt_create
and g_part_gpt_recover to the separate function g_gpt_set_defaults.
ALL BIND USERS ARE ENCOURAGED TO UPGRADE IMMEDIATELY
This update addresses the following vulnerability:
CVE-2011-2464
=============
Severity: High
Exploitable: Remotely
Description:
A defect in the affected BIND 9 versions allows an attacker to remotely
cause the "named" process to exit using a specially crafted packet. This
defect affects both recursive and authoritative servers. The code location
of the defect makes it impossible to protect BIND using ACLs configured
within named.conf or by disabling any features at compile-time or run-time.
MFC r223608:
Disable microcode loading for 82550 and 82550C controllers. Loading
the microcode caused SCB timeouts. Linux driver does not allow
microcode loading for these controllers and jfv also confirmed that
there is no need to do and it shouldn't.
MFC: r223382
Change the NFSv4 nfsuserd daemon so that it doesn't preload the
uid<->username mapping cache with an entry when another entry
for that uid is already loaded. This fixes a case where the
mapping of "toor" would replace "root" when the daemon was started,
resulting in no mapping for "root" until the cache entry for "toor"
timed out.
The algorithm is inefficient, but since it is only done once when
the daemon is started up, I don't think that's an issue.
MFC: r223373
Fix the new NFSv4 server so that it checks for VREAD_ACL when
a client does a Getattr for an ACL and not VREAD_ATTRIBUTES.
This was found during the recent NFSv4 interoperability Bakeathon.
MFC: r223349
Fix the new NFSv4 server so that it only allows Lookup of
directories and symbolic links when traversing non-exported
file systems. Found during the recent NFSv4 interoperability
Bakeathon.
MFC: r223348
Fix the new NFSv4 server so that it allows Access and Readlink
operations while traversing non-exported file systems. This is
required for some non-FreeBSD clients to do NFSv4 mounts. Found during
the recent NFSv4 interoperability Bakeathon.
MFC: r223312
Fix a number of places where the new NFS server did not
lock the mutex when manipulating rc_flag in the DRC cache.
This is believed to fix a hung server that was reported
to the freebsd-fs@ list on June 9 under the subject heading
"New NFS server stress test hang", where all the threads
were waiting for the RC_LOCKED flag to clear.
MFC: r223309
Fix the kgssapi so that it can be loaded as a module. Currently
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
When dropping privileges prefer capsicum over chroot+setgid+setuid.
We can use capsicum for secondary worker processes and hastctl.
When working as primary we drop privileges using chroot+setgid+setuid
still as we need to send ioctl(2)s to ggate device, for which capsicum
doesn't allow (yet).
r221898 (pjd):
When using capsicum to sanbox, still use other methods first, just in case
one of them have some problems.
r221899 (pjd):
Currently we are unable to use capsicum for the primary worker process,
because we need to do ioctl(2)s, which are not permitted in the capability
mode. What we do now is to chroot(2) to /var/empty, which restricts access
to file system name space and we drop privileges to hast user and hast
group.
This still allows to access to other name spaces, like list of processes,
network and sysvipc.
To address that, use jail(2) instead of chroot(2). Using jail(2) will restrict
access to process table, network (we use ip-less jails) and sysvipc (if
security.jail.sysvipc_allowed is turned off). This provides much better
separation.
r222224 (pjd):
To handle BIO_FLUSH and BIO_DELETE requests in secondary worker we need
to use ioctl(2). This is why we can't use capsicum for now to sandbox
secondary. Capsicum is still used to sandbox hastctl.
r223584 (pjd):
Log a warning if we cannot sandbox using capsicum, but only under debug level 1.
It would be too noisy to log it as a proper warning as CAPABILITIES are not
compiled into GENERIC by default.
r223585 (pjd):
Compile capsicum support only if HAVE_CAPSICUM is defined.
MFC r223227: rc.subr: Eliminate about 100 forks from the boot sequence.
With the current sh, placing eval in a command substitution always results
in a fork(), even if it is the only command and only executes a single
simple command. Therefore, avoid it where it can be avoided easily.
Side effect: values starting with a hyphen and all whitespace are preserved.
The values are defaults and names for rc.conf variables and messages to be
given about obsolete ones.
The change in the _echoonce function is not included in this MFC because
stable/8 does not have this function.
Since head/ and stable/8 have different handling for geom control request
parameters, r215941 should be modified to allow use -F option in the
"gpart restore" command - "force" parameter should be ascii string.
MFC r223522: sh(1): Improve documentation of shell patterns:
* Shell patterns are also for ${var#pat} and the like.
* An '!' by itself will not trigger pathname generation so do not call it a
meta-character, even though it has a special meaning directly after an
'['.
* Character ranges are locale-dependent.
* A '^' will complement a character class like '!' but is non-standard.
yongari [Wed, 29 Jun 2011 17:18:33 +0000 (17:18 +0000)]
MFC r223405:
Remove link state change callback handler. There is no need to
register both status change and link state change callbacks.
Implement checking valid link in state change callback and poll
active link state in vr_tick(). This allows immediate detection of
lost link as well as protecting driver from frequent link flips during
link renegotiation. taskq implementation was removed because driver
now needs to poll link state in vr_tick().
While I'm here do not report current link state if interface is not
running.
dim [Wed, 29 Jun 2011 16:43:44 +0000 (16:43 +0000)]
MFC r223579:
For some reason, contrib/traceroute/traceroute.c ensures MAXHOSTNAMELEN
is defined, but then proceeds to use a hardcoded maximum hostname length
of 64 anyway. Fix this by checking against MAXHOSTNAMELEN instead.
jhb [Wed, 29 Jun 2011 16:16:59 +0000 (16:16 +0000)]
MFC 223198:
- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Don't define igb_start() at all on 8.0 and where if_transmit is used.
Replace last remaining call to igb_start() with a loop to kick off
transmit on each queue instead.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().
jhb [Wed, 29 Jun 2011 15:58:26 +0000 (15:58 +0000)]
MFC 221393,222930:
Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests. Now the
driver actively manages the I/O windows.
This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices. The
suballocations are managed by creating an rman for each I/O window. The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus. Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus. If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.
When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free. From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed. The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.
Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows). If
that fails, the request is passed up to the parent PCI bus directly
however.
The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window. This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.
The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.
The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.
trociny [Tue, 28 Jun 2011 19:27:34 +0000 (19:27 +0000)]
MFC r222164, r222228, r222467, r223181:
r222164 (pjd):
Recognize HIO_FLUSH requests.
r222228 (pjd):
Keep statistics on number of BIO_READ, BIO_WRITE, BIO_DELETE and BIO_FLUSH
requests as well as number of activemap updates.
Number of BIO_WRITEs and activemap updates are especially interesting, because
if those two are too close to each other, it means that your workload needs
bigger number of dirty extents. Activemap should be updated as rarely as
possible.
r222467:
If READ from the local node failed we send the request to the remote
node. There is no use in doing this for synchronization requests.
r223181:
In HAST we use two sockets - one for only sending the data and one for
only receiving the data. In r220271 the unused directions were
disabled using shutdown(2).
Unfortunately, this broke automatic receive buffer sizing, which
currently works only for connections in ETASBLISHED state. It was a
root cause of the issue reported by users, when connection between
primary and secondary could get stuck.
Disable the code introduced in r220271 until the issue with automatic
buffer sizing is not resolved.
Reported by: Daniel Kalchev <daniel@digsys.bg>, danger, sobomax
Tested by: Daniel Kalchev <daniel@digsys.bg>, danger
edwin [Tue, 28 Jun 2011 10:46:02 +0000 (10:46 +0000)]
Bring iso3166 file in sync with HEAD
MFC of recent updates on the ISO3166 file:
r223633
- Remove AN again now that tzdata2011h has been imported.
r222094
- Put AN back after finding out that tzsetup(1) will complain that
it doesn't exist. It will be removed again once the tzdata distribution
files have been updated with the replacements for AN.
r222014
- Revert change to "MF" I made in r189767. I bet that at the time of r189767
I checked with http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm
and "MF" was officially spelled in English as "Saint Martin" there, but now
that "SX" exists (for "Sint Maarten (Dutch part)") (nice official "English"
spelling!) they seem to have added a "(French part)" suffix to "MF". Since
this is also in line with Newsletter VI-1 (2007-09-21), catch up.
r222011
- ISO3166: Update for newsletters VI-7 and VI-8 from 2010
- Name change for SH
- BQ, CW, and SX replace AN