MFC r197445:
Let fall down in the hard path (thus handling shared waiters wakeup
correctly) for the shared waiters also in the rwlock held in shared mode
as well, fixing possible deadlocks.
Please note that this is a special condition as we want this fix in
before RC2 as we assume it is critical and so it has been handled
as an instant-merge. For the STABLE_7 branch, 1 week before the MFC
is assumed.
MFC 197350:
Re-remove the IBM0057 ID used for PS/2 mouse controllers. The asl for the
61p includes the hotkey device as IBM0068 and the mouse as IBM0057 similar
to other systems.
MFC r197099: pci(4): don't perform maximum register number check
Different sub-kinds of PCI buses may have different rules and
thus it is up for the bus backends to do proper input checks.
For example, PCIe allows configuration register numbers < 0x1000,
while for PCI proper the limit is 0x100.
And, in fact, the buses already do the checks.
Add a few SCSI controllers to GENERIC that can be found in Powermacs.
This allows installation onto SCSI disks as shipped, for example,
as an option with the Powermac G3.
MFC revs 197129,197130,197132:
Fixes to mcast userland API.
--
Fix an API issue in leave processing for IPv4 multicast groups.
* Do not assume that the group lookup performed by imo_match_group()
is valid when ifp is NULL in this case.
* Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
membership we are being asked to leave.
Caveat user:
* The way IPv4 multicast memberships are implemented in the inpcb layer
at the moment, has the side-effect that struct ip_moptions will
still hold the membership, under the old ifp, until ip_freemoptions()
is called for the parent inpcb.
* The underlying issue is: the inpcb layer does not get notification
of ifp being detached going away in a thread-safe manner.
This is non-trivial to fix.
--
Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.
--
Tighten input checking in inp_join_group():
* Don't try to use the source address, when its family is unspecified.
* If we get a join without a source, on an existing inclusive
mode group, this is an error, as it would change the filter mode.
Fix a problem with the handling of in_mfilter for new memberships:
* Do not rely on imf being NULL; it is explicitly initialized to a
non-NULL pointer when constructing a membership.
* Explicitly initialize *imf to EX mode when the source address
is unspecified.
This fixes a problem with in_mfilter slot recycling in the join path.
--
Don't allow joins w/o source on an existing group.
This is almost always pilot error.
We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IP_MSFILTER
is also disallowed.
Deals with the case described in PR 137164 upfront, cumulative
with the fix in svn rev 197132 which only calls imo_match_source()
if the source address family was not unspecified.
--
Revision 197136 has a text conflict, however it is a comment only change.
PR: 137164, 138689, 138690, 138691
Submitted by: Stef Walter (with fixups)
Approved by: re (kib)
MFC r196962: Fix /usr/bin/unzip: A bug deep in libarchive's read-ahead logic
(incorrect handling of zero-length reads before the copy buffer is
allocated) is masked by the iso9660 taster. Tar and cpio both enable
that taster so were protected from the bug; unzip is susceptible.
This both fixes the bug and updates the test harness to exercise
this case.
Submitted by: Ed Schouten diagnosed the bug and drafted a patch
Approved by: re (kib)
- Prevent a panic on modern controllers by increasing CISS_MAX_PHYSTGT to 256
- Fix MSI and PERFORMANT interrupt programming. Fixes hang on boot.
- Fix locking bugs in ioctl handler
Most of this has been soaking at Yahoo for several months, if not longer. The
quick MFC is due to the impending 8.0-RC1 build.
MFC 197257:
Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.
Fixes two bugs:
1) A lock issue, if we ever had to try again
we would double lock the INP lock.
2) We were allowing (at wrap) associd 0... which really
we cannot allow since 0 normally means in most socket
API calls that we are wishing to effect something on
the INP not TCB.
Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.
The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.
- Routing messages are not generated when adding and removing
interface address aliases.
- Loopback route installed for an interface address alias is
not deleted from the routing table when that address alias
is removed from the associated interface.
- Function in_ifscrub() is called extraneously.
Previously local end of point-to-point interface is not reachable
within the system that owns the interface. Packets destined to
the local end point leak to the wire towards the default gateway
if one exists. This behavior is changed as part of the L2/L3
rewrite efforts. The local end point is now reachable within the
system. The inpcb code needs to consider this fact during the
address selection process.
MFC r197224:
Use explicit int values for the device states in order to allow, if
necessary, in the future, adds of new states without breaking ABI
between revisions.
Please note that this is a special condition as we want this fix in
before RC1 as we assume it is critical and so it has been handled
as an instant-merge.
MFC r197223:
Fix sched_switch_migrate() by assuming locks cannot be shared and a
deadlock between 3 different threads by acquiring both runqueue locks
when doing the migration.
Please note that this is a special condition as we want this fix in
before RC1 as we assume it is critical and so it has been handled
as an instant-merge. For the STABLE_7 branch, 1 week before the MFC
is assumed.
MFC r196888:
The clear_remove() and clear_inodedeps() call vn_start_write(NULL, &mp,
V_NOWAIT) on the non-busied mount point. Unmount might free ufs-specific
mp data, causing ffs_vgetf() to access freed memory.
Remove 'ad:' prefix from disk serial number. We don't want serial number
to change when we reconnect the disk in a way that it is accessible through
CAM for example.
Discussed with: trasz
Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.
Discussed with: trasz
Obtained from: Wheel Sp. z o.o. (http://www.wheel.pl)
Make serial numbers of daX disks visible by GEOM.
No objections from: scottl
Obtained from: Wheel Sp. z o.o. (http://www.wheel.pl)
r196662:
Add missing mountpoint vnode locking.
This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when
regular user tries to mount dataset owned by him.
r196702:
Remove empty directory.
r196703:
Backport the 'dirtying dbuf' panic fix from newer ZFS version.
Reported by: Thomas Backman <serenity@exscape.org>
r196919:
bzero() on-stack argument, so mutex_init() won't misinterpret that the
lock is already initialized if we have some garbage on the stack.
PR: kern/135480
Reported by: Emil Mikulic <emikulic@gmail.com>
r196927:
Changing provider size is not really supported by GEOM, but doing so when
provider is closed should be ok.
When administrator requests to change ZVOL size do it immediately if ZVOL
is closed or do it on last ZVOL close.
PR: kern/136942
Requested by: Bernard Buri <bsd@ask-us.at>
r196928:
Teach zdb(8) how to obtain GEOM provider size.
PR: kern/133134
Reported by: Philipp Wuensche <cryx-freebsd@h3q.com>
r196943:
- Avoid holding mutex around M_WAITOK allocations.
- Add locking for mnt_opt field.
r196944:
Don't recheck ownership on update mount. This will eliminate LOR between
vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.
Noticed by: kib
Reviewed by: kib
r196947:
Defer thread start until we set priority.
Reviewed by: kib
r196950:
Fix detection of file system being shared. Now zfs unshare/destroy/rename
command will properly remove exported file systems.
r196953:
When snapshot mount point is busy (for example we are still in it)
we will fail to unmount it, but it won't be removed from the tree,
so in that case there is no need to reinsert it.
Reported by: trasz
r196954:
If we have to use avl_find(), optimize a bit and use avl_insert() instead of
avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).
Fix similar case in the code that is currently commented out.
r196965:
Fix reference count leak for a case where snapshot's mount point is updated.
r196978:
Call ZFS_EXIT() after locking the vnode.
r196979:
On FreeBSD we don't have to look for snapshot's mount point,
because fhtovp method is already called with proper mount point.
r196980:
When we automatically mount snapshot we want to return vnode of the mount point
from the lookup and not covered vnode. This is one of the fixes for using .zfs/
over NFS.
r196982:
We don't export individual snapshots, so mnt_export field in snapshot's
mount point is NULL. That's why when we try to access snapshots over NFS
use mnt_export field from the parent file system.
r196985:
Only log successful commands! Without this fix we log even unsuccessful
commands executed by unprivileged users. Action is not really taken, but it is
logged to pool history, which might be confusing.
Reported by: Denis Ahrens <denis@h3q.com>
r196992:
Implement __assert() for Solaris-specific code. Until now Solaris code was
using Solaris prototype for __assert(), but FreeBSD's implementation.
Both take different arguments, so we were either core-dumping in assert()
or printing garbage.
Reported by: avg
r197131:
Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain
NULL, but also can point to dead vnode, take that into account.
PR: kern/132068
Reported by: Edward Fisk <7ogcg7g02@sneakemail.com>, kris
Fix based on patch from: Jaakko Heinonen <jh@saunalahti.fi>
r197133:
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if
z_dbuf field is NULL - this might happen in case of rollback or forced
unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be
done asynchronously via zfs_reclaim_complete().
r197150:
There is a bug where mze_insert() can trigger an assert() of inserting
the same entry twice. This bug is not fixed yet, but leads to situation
where when try to access corrupted directory the kernel will panic.
Until the bug is properly fixed, try to recover from it and log that it
happened.
r197152:
Extend scope of the z_teardown_lock lock for consistency and "just in case".
r197153:
When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.
r197167:
Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories
by just returning EOPNOTSUPP. This will allow NFS server to fall back to
regular READDIR.
Note that converting inode number to snapshot's vnode is expensive operation.
Snapshots are stored in AVL tree, but based on their names, not inode numbers,
so to convert inode to snapshot vnode we have to interate over all snalshots.
This is not a problem in OpenSolaris, because in their READDIRPLUS
implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on
d_fileno as we do.
r197177:
Support both case: when snapshot is already mounted and when it is not yet
mounted.
r197200:
Modify mount(8) to skip MNT_IGNORE file systems by default, just like df(1)
does. This is not POLA violation, because there is no single file system in the
base that use MNT_IGNORE currently, although ZFS snapshots will be mounted with
MNT_IGNORE after next commit.
Reviewed by: kib
r197201:
- Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular
df(1) and mount(8) output. This is a bit smilar to OpenSolaris and follows
ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
mount(8) and df(1) output by default.
MFC 197062:
Don't malloc a buffer while holding the prison0 mutex. Instead, use a loop
where we figure out the hostname length under the lock, malloc the buffer
with the lock dropped, then recheck the length under the lock and loop again
if the buffer is now too small.
MFC r197048:
Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs
vnodes, since these nodes are not linked into the mount queue and,
as such, the vn_lock() cannot cause a deadlock so LORs are harmless.
Suggested by: kib
Approved by: re (kensmith), kib (mentor)
MFC r196651: AM/PM date format for ja_JP.eucJP and ja_JP.SJIS were
localized by r193869. However, ja_JP.UTF-8 wasn't. So, reflect it
to ja_JP.UTF-8 as well.
MFC r196475:
- Add AS lookup functionality to traceroute6(8) as well.
- Support for IPv6 transport for AS lookup.
- Introduce $RA_SERVER to set whois server.
- Support for 4 byte ASN.
- ANSIfy function declaration in as.c.
MFC r197031:
Unlock the image vnode around the call of pmc PMC_FN_PROCESS_EXEC hook.
The hook calls vn_fullpath(9), that should not be executed with a vnode
lock held.
This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT
was not honored, if the socket buffer size was not 4 times that large.
MFC of 196509.
This fixes kern/138516, an mbuf leak in both the em
and igb driver, when a transmit fails the packet/mbuf
was not being requeued. Thanks to those that pointed
this problem out.
When joining a multicast group, the inp_lookup_mcast_ifp call
does a KASSERT that the group address is multicast, so the
check if this is indeed true and eventually return a EINVAL if not,
should be done before calling inp_lookup_mcast_ifp. This fixes a kernel
crash when calling setsockopt (sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,...)
with invalid group address.
MFC r196942:
> Bring the layout of package-split.py more in line with where we're going
> with packages on the release media. It looks like we'll be putting just
> the doc packages on the new "memory stick" image as well as disc1. There
> will be no other packages on the CDROM-sized media. The DVD sized media
> will include the doc packages plus whatever other packages we decide to
> make part of the release.
>
> This commit just brings the basic structure in line with being able to
> do this. We still need to discuss with various people exactly which
> packages will be included on the DVD.
>
> If the environement variable "PKG_DVD" is set a tree suitable for the
> DVD media is generated. Otherwise a tree suitable for the "memory stick"
> and disc1 is generated.
MFC r196920:
insmntque_stddtr() clears vp->v_data and resets vp->v_op to
dead_vnodeops before calling vgone(). Revert r189706 and corresponding
part of the r186560.
MFC r196887:
In fhopen, vfs_ref() the mount point while vnode is unlocked, to prevent
vn_start_write(NULL, &mp) from operating on potentially freed or reused
struct mount *.
Adaptive spinning for locking primitives, in read-mode, have some tuning
SYSCTLs which are inappropriate for a daily use of the machine (mostly
useful only by a developer which wants to run benchmarks on it).
Remove them before the release as long as we do not want to ship with
them in.
Now that the SYSCTLs are gone, instead than use static storage for some
constants, use real numeric constants in order to avoid eventual compiler
dumbiness and the risk to share a storage (and then a cache-line) among
CPUs when doing adaptive spinning together.
Pleasse note that the sys/linker_set.h inclusion in lockmgr and sx lock
support could have been gone, but re@ preferred them to be in order to
minimize the risk of problems on future merging.
Please note that this patch is not a MFC, but an 'edge case' as commit
directly to stable/8, which creates a diverging from HEAD.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Approved by: re (kib)
MFC r196772:
fix adaptive spinning in lockmgr by using correctly GIANT_RESTORE and
continue statement and improve adaptive spinning for sx lock by just
doing once GIANT_SAVE.
Make LRO turned off uncategorically for devices
attached to the bridge, rather than just in the case
when some device cannot do TSO. Customer tests have
shown that even when all devices can do TSO that LRO
will cause problems when bridging.
MFC 196745:
Don't attempt to bind the current thread to the CPU an IRQ is bound to
when removing an interrupt handler from an IRQ during shutdown. During
shutdown we are already bound to CPU 0 and this was triggering a panic.
MFC r196835:
Allow a jail's name to be the same as its jid (which is the default if
no name is specified), and let a numeric name specify the jid for a new
jail when the jid isn't otherwise set. Still disallow other numeric
names.
Reviewed by: zec
Approved by: re (kib), bz (mentor)
MFC r196730:
Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ.
Introduce separate kernel stack cache that keeps some limited amount of
preallocated kernel stacks to lower the latency of thread allocation.
Not a merge: instead of removing td_altkstack* members of struct thread,
replace them with placeholders to keep struct thread layout on the
stable branch.
Also, record r196640, r196644 and r196648 as merged.
MFC r196692:
Make the mnt_writeopcount and mnt_secondary_writes counters,
used by the suspension code, not greater then mnt_ref reference
counter value.
MFC r196733:
Fix mount reference leak when V_XSLEEP is specified to vn_start_write().
Do the first step in removing lukemftpd from the base system. Disconnect
it from the build.
If you are using the FTP daemon, please consider using the port ftp/tnftpd
which is the same FTP server, but newer and might have more/better
functionality.
This results in us providing only one ftp daemon by default.
MFC r196831:
Add to `camcontrol cmd` support for sending arbitrary ATA commands.
It could be used for broad range of tasks, such as configuring drive
power management, caching, security and any other features and tasks,
not supported by existing drivers.
This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.
This patch fixes an address scope violation. Considering the
scenario where an anycast address is assigned on one interface,
and a global address with the same scope is assigned on another
interface. In other words, the interface owns the anycast
address has only the link-local address as one other address.
Without this patch, "ping6" the anycast address from another
station will observe the source address of the returned ICMP6
echo reply has the link-local address, not the global address
that exists on the other interface in the same node.
MFC r196866:
In the NEXTADDR macro use SA_SIZE() rather than directly using
sizeof(), as introduced in r186119, for advancing the current
position into the buffer.
See comment in net/route.h for a description of the difference.
This makes ndp -s work again.
Fix regression introduced with NFSv4 ACL support - make acl_to_text(3)
and acl_calc_mask(3) return error instead of crashing when acl passed
to them is NULL.
Submitted by: markus
Reviewed by: rwatson
Approved by: re (kib)
MFC r196529:
Rather than having enabled/disabled, implement a max queue depth.
While usually not an issue, this firewalls bugs in the code that may
run us out of memory.
Fix a memory exhaustion in the case where devctl was disabled, but the
link was bouncing. The check to queue was in the wrong place.
Implement a new sysctl hw.bus.devctl_queue to control the depth. Make
compatibility hacks for hw.bus.devctl_disable to ease transition.
This patch seperates the control of header split from LRO (which it
was previously dependent on), LRO gets turned off when bridging but
its been found that header split is still a performance win in that case.
Secondly, there was some interface specific control in stats code that
has been missing, and a logic error that resulted in bogus reporting.
Thanks to Manish and John of LineRateSystems for the report and help in
this code.
MFC r196721:
Make sure rx descriptor ring align on 16 bytes. I guess the
alignment requirement could be multiple of 4 bytes but I think
using descriptor size would make intention clearer.
Previously the size of rx descriptor was not power of 2 so it
caused panic in bus_dmamem_alloc(9).
Reported by: Jeff Blank (jb000003 <> mr-happy dot com)
Approved by: re (kib)
Add clarifications to the kproc and kthread manpages and link
the kthread_create(9) man page to the kproc(9) page as it has migrated and
people looking for it may need a hand to find its new name.
MFC 196705 and 196707:
- Improve pmap_change_attr() on i386 so that it is able to demote a large
(2/4MB) page into 4KB pages as needed. This should be fairly rare in
practice.
- Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.
MFC r196738:
In case an upper layer protocol tries to send a packet but the
L2 code does not have the ethernet address for the destination
within the broadcast domain in the table, we remember the
original mbuf in `la_hold' in arpresolve() and send out a
different packet with an arp request.
In case there will be more upper layer packets to send we will
free an earlier one held in `la_hold' and queue the new one.
Once we get a packet in, with which we can perfect our arp table
entry we send out the original 'on hold' packet, should there
be any.
Rather than continuing to process the packet that we received,
we returned without freeing the packet that came in, which
basically means that we leaked an mbuf for every arp request
we sent.
Rather than freeing the received packet and returning, continue
to process the incoming arp packet as well.
This should (a) improve some setups, also proxy-arp, in case it was an
incoming arp request and (b) resembles the behaviour FreeBSD had
from day 1, which alignes with RFC826 "Packet reception" (merge case).
Rename 'm0' to 'hold' to make the code more understandable as
well as diffable to earlier versions more easily.
Handle the link-layer entry 'la' lock comepletely in the block
where needed and release it as early as possible, rather than
holding it longer, down to the end of the function.
Found by: pointyhat, ns1
Bug hunting session with: erwin, simon, rwatson
Tested by: simon on cluster machines
Reviewed by: ratson, kmacy, julian
MFC r196653:
Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.
Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.
The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.
Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days
Approved by: re (rwatson)
Fix the conformance of poll(2) for sockets after r195423 by
returning POLLHUP instead of POLLIN for several cases. Now, the
tools/regression/poll results for FreeBSD are closer to that of the
Solaris and Linux.
Also, improve the POSIX conformance by explicitely clearing POLLOUT
when POLLHUP is reported in pollscan(), making the fix global.
Submitted by: bde
Reviewed by: rwatson
MFC r196556
Fix poll() on half-closed sockets, while retaining POLLHUP for fifos.
This reverts part of r196460, so that sockets only return POLLHUP if both
directions are closed/error. Fifos get POLLHUP by closing the unused
direction immediately after creating the sockets.
The tools/regression/poll/*poll.c tests now pass except for two other
things:
- if POLLHUP is returned, POLLIN is always returned as well instead of
only when there is data left in the buffer to be read
- fifo old/new reader distinction does not work the way POSIX specs it
Reviewed by: kib, bde
MFC r196554
Add some tests for poll(2)/shutdown(2) interaction.