CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Remove setpgid() call before executing child process.

Using a separate process group here is bad, since (for example) job
control in the TTY layer prevents interaction with the TTY, causing the
child process to hang.

Mentioned on: current@
MFC after: 2 weeks

Workaround strange situation when EDMA_RESQIP register returns zero instead
of proper value. It caused bunch of "EMPTY CRPB" messages and potentially
may cause premature requests completion, which could cause data corruption.
For most cases it seems enough to just reread register to get proper value.
To protect against worse cases - erase processed queue entries with
impossible values and ignore them if problem still happen.

Some style cleanup:
- remove commented debugging code;
- wrap long lines.

catch up manual pages with rename of vm_page_sleep_busy to vm_page_sleep_if_busy

Suggested by: alc
MFC after: 4 days

VOP_GETPAGES.9: clarify and correct description of parameters and requirements

In cooperation with alc and kib, who provided valuable insights and
suggestions.

Reviewed by: alc, kib (earlier version)
MFC after: 4 days

PG_BUSY -> VPO_BUSY, PG_WANTED -> VPO_WANTED in manual pages and comments

Reviewed by: alc
MFC after: 4 days

o Put missed w/space back.

Submitted by: Garrett Cooper
MFC after: 3 days

Revert revision 214007, I realized that MySQL wants to resolve
a silly rwlock deadlock problem, the deadlock is caused by writer
waiters, if a thread has already locked a reader lock, and wants to
acquire another reader lock, it will be blocked by writer waiters,
but we had already fixed it years ago.

- Don't include sx.h, it is not needed.
- Check NULL pointer, move timeout calculation code outside of
process lock.

Correct handling of shared interrupt in sis_intr(). r212116 incorrectly
released a drver lock for shared interrupt case such that it caused
panic. While I'm here check whether driver is still running before
serving TX/RX handler.

Reported by: Jerahmy Pocott < QUAKENET1 <> optusnet dot com dot au >
Tested by: Jerahmy Pocott < QUAKENET1 <> optusnet dot com dot au >
MFC after: 3 days

Add workaround for BCM5906 controller silicon bug. If device
receive two back-to-back send BDs with less than or equal to 8
total bytes then the device may hang. The two back-to-back send
BDs must be in the same frame for this failure to occur.
Thanks to davidch for detailed errata information.

Reviewed by: davidch

Improve the Xen para-virtualized device infrastructure of FreeBSD:

o Add support for backend devices (e.g. blkback)
o Implement extensions to the Xen para-virtualized block API to allow
   for larger and more outstanding I/Os.
o Import a completely rewritten block back driver with support for fronting
   I/O to both raw devices and files.
o General cleanup and documentation of the XenBus and XenStore support code.
o Robustness and performance updates for the block front driver.
o Fixes to the netfront driver.

Sponsored by: Spectra Logic Corporation

sys/xen/xenbus/init.txt:
Deleted: This file explains the Linux method for XenBus device
enumeration and thus does not apply to FreeBSD's NewBus approach.

sys/xen/xenbus/xenbus_probe_backend.c:
Deleted: Linux version of backend XenBus service routines.  It
was never ported to FreeBSD.  See xenbusb.c, xenbusb_if.m,
xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus
support.

sys/xen/xenbus/xenbusvar.h:
sys/xen/xenbus/xenbus_xs.c:
sys/xen/xenbus/xenbus_comms.c:
sys/xen/xenbus/xenbus_comms.h:
sys/xen/xenstore/xenstorevar.h:
sys/xen/xenstore/xenstore.c:
Split XenStore into its own tree.  XenBus is a software layer built
on top of XenStore.  The old arrangement and the naming of some
structures and functions blurred these lines making it difficult to
discern what services are provided by which layer and at what times
these services are available (e.g. during system startup and shutdown).

sys/xen/xenbus/xenbus_client.c:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_probe.c:
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
Split up XenBus code into methods available for use by client
drivers (xenbus.c) and code used by the XenBus "bus code" to
enumerate, attach, detach, and service bus drivers.

sys/xen/reboot.c:
sys/dev/xen/control/control.c:
Add a XenBus front driver for handling shutdown, reboot, suspend, and
resume events published in the XenStore.  Move all PV suspend/reboot
support from reboot.c into this driver.

sys/xen/blkif.h:
New file from Xen vendor with macros and structures used by
a block back driver to service requests from a VM running a
different ABI (e.g. amd64 back with i386 front).

sys/conf/files:
Adjust kernel build spec for new XenBus/XenStore layout and added
Xen functionality.

sys/dev/xen/balloon/balloon.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/blkfront/blkfront.c:
sys/xen/xenbus/...
sys/xen/xenstore/...
o Rename XenStore APIs and structures from xenbus_* to xs_*.
o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation
  of objects returned by these APIs.
o Adjust for changes in the bus interface for Xen drivers.

sys/xen/xenbus/...
sys/xen/xenstore/...
Add Doxygen comments for these interfaces and the code that
implements them.

sys/dev/xen/blkback/blkback.c:
o Rewrite the Block Back driver to attach properly via newbus,
  operate correctly in both PV and HVM mode regardless of domain
  (e.g. can be in a DOM other than 0), and to deal with the latest
  metadata available in XenStore for block devices.

o Allow users to specify a file as a backend to blkback, in addition
  to character devices.  Use the namei lookup of the backend path
  to automatically configure, based on file type, the appropriate
  backend method.

The current implementation is limited to a single outstanding I/O
at a time to file backed storage.

sys/dev/xen/blkback/blkback.c:
sys/xen/interface/io/blkif.h:
sys/xen/blkif.h:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
Extend the Xen blkif API: Negotiable request size and number of
requests.

This change extends the information recorded in the XenStore
allowing block front/back devices to negotiate for optimal I/O
parameters.  This has been achieved without sacrificing backward
compatibility with drivers that are unaware of these protocol
enhancements.  The extensions center around the connection protocol
which now includes these additions:

o The back-end device publishes its maximum supported values for,
  request I/O size, the number of page segments that can be
  associated with a request, the maximum number of requests that
  can be concurrently active, and the maximum number of pages that
  can be in the shared request ring.  These values are published
  before the back-end enters the XenbusStateInitWait state.

o The front-end waits for the back-end to enter either the InitWait
  or Initialize state.  At this point, the front end limits it's
  own capabilities to the lesser of the values it finds published
  by the backend, it's own maximums, or, should any back-end data
  be missing in the store, the values supported by the original
  protocol.  It then initializes it's internal data structures
  including allocation of the shared ring, publishes its maximum
  capabilities to the XenStore and transitions to the Initialized
  state.

o The back-end waits for the front-end to enter the Initalized
  state.  At this point, the back end limits it's own capabilities
  to the lesser of the values it finds published by the frontend,
  it's own maximums, or, should any front-end data be missing in
  the store, the values supported by the original protocol.  It
  then initializes it's internal data structures, attaches to the
  shared ring and transitions to the Connected state.

o The front-end waits for the back-end to enter the Connnected
  state, transitions itself to the connected state, and can
  commence I/O.

Although an updated front-end driver must be aware of the back-end's
InitWait state, the back-end has been coded such that it can
tolerate a front-end that skips this step and transitions directly
to the Initialized state without waiting for the back-end.

sys/xen/interface/io/blkif.h:
o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255.  This is
  the maximum number possible without changing the blkif
  request header structure (nr_segs is a uint8_t).

o Add two new constants:
  BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and
  BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK.  These respectively
  indicate the number of segments that can fit in the first
  ring-buffer entry of a request, and for each subsequent
  (sg element only) ring-buffer entry associated with the
          "header" ring-buffer entry of the request.

o Add the blkif_request_segment_t typedef for segment
  elements.

o Add the BLKRING_GET_SG_REQUEST() macro which wraps the
  RING_GET_REQUEST() macro and returns a properly cast
  pointer to an array of blkif_request_segment_ts.

o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the
  number of ring entries that will be consumed by a blkif
  request with the given number of segments.

sys/xen/blkif.h:
o Update for changes in interface/io/blkif.h macros.

o Update the BLKIF_MAX_RING_REQUESTS() macro to take the
  ring size as an argument to allow this calculation on
  multi-page rings.

o Add a companion macro to BLKIF_MAX_RING_REQUESTS(),
  BLKIF_RING_PAGES().  This macro determines the number of
  ring pages required in order to support a ring with the
  supplied number of request blocks.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
o Negotiate with the other-end with the following limits:
      Reqeust Size:   MAXPHYS
      Max Segments:   (MAXPHYS/PAGE_SIZE) + 1
      Max Requests:   256
      Max Ring Pages: Sufficient to support Max Requests with
                      Max Segments.

o Dynamically allocate request pools and segemnts-per-request.

o Update ring allocation/attachment code to support a
  multi-page shared ring.

o Update routines that access the shared ring to handle
  multi-block requests.

sys/dev/xen/blkfront/blkfront.c:
o Track blkfront allocations in a blkfront driver specific
  malloc pool.

o Strip out XenStore transaction retry logic in the
  connection code.  Transactions only need to be used when
  the update to multiple XenStore nodes must be atomic.
  That is not the case here.

o Fully disable blkif_resume() until it can be fixed
  properly (it didn't work before this change).

o Destroy bus-dma objects during device instance tear-down.

o Properly handle backend devices with powef-of-2 sector
  sizes larger than 512b.

sys/dev/xen/blkback/blkback.c:
Advertise support for and implement the BLKIF_OP_WRITE_BARRIER
and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and
the BIO_ORDERED attribute of bios.

sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
Fix various bugs in blkfront.

       o gnttab_alloc_grant_references() returns 0 for success and
non-zero for failure.  The check for < 0 is a leftover
Linuxism.

       o When we negotiate with blkback and have to reduce some of our
capabilities, print out the original and reduced capability before
changing the local capability.  So the user now gets the correct
information.

o Fix blkif_restart_queue_callback() formatting.  Make sure we hold
  the mutex in that function before calling xb_startio().

o Fix a couple of KASSERT()s.

        o Fix a check in the xb_remove_* macro to be a little more specific.

sys/xen/gnttab.h:
sys/xen/gnttab.c:
Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID.

sys/dev/xen/netfront/netfront.c:
Use GRANT_REF_INVALID instead of driver private definitions of the
same constant.

sys/xen/gnttab.h:
sys/xen/gnttab.c:
Add the gnttab_end_foreign_access_references() API.

This API allows a client to batch the release of an array of grant
references, instead of coding a private for loop.  The implementation
takes advantage of this batching to reduce lock overhead to one
acquisition and release per-batch instead of per-freed grant reference.

While here, reduce the duration the gnttab_list_lock is held during
gnttab_free_grant_references() operations.  The search to find the
tail of the incoming free list does not rely on global state and so
can be performed without holding the lock.

sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/evtchn/evtchn.c:
sys/xen/xen_intr.h:
o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode.
  This allows an HVM domain to serve back end devices to other domains.
  This API is already implemented for PV mode.

o Synchronize the API between HVM and PV.

sys/dev/xen/xenpci/xenpci.c:
o Scan the full region of CPUID space in which the Xen VMM interface
  may be implemented.  On systems using SuSE as a Dom0 where the
  Viridian API is also exported, the VMM interface is above the region
  we used to search.

o Pass through bus_alloc_resource() calls so that XenBus drivers
  attaching on an HVM system can allocate unused physical address
  space from the nexus.  The block back driver makes use of this
  facility.

sys/i386/xen/xen_machdep.c:
Use the correct type for accessing the statically mapped xenstore
metadata.

sys/xen/interface/hvm/params.h:
sys/xen/xenstore/xenstore.c:
Move hvm_get_parameter() to the correct global header file instead
of as a private method to the XenStore.

sys/xen/interface/io/protocols.h:
Sync with vendor.

sys/xeninterface/io/ring.h:
Add macro for calculating the number of ring pages needed for an N
deep ring.

To avoid duplication within the macros, create and use the new
__RING_HEADER_SIZE() macro.  This macro calculates the size of the
ring book keeping struct (producer/consumer indexes, etc.) that
resides at the head of the ring.

Add the __RING_PAGES() macro which calculates the number of shared
ring pages required to support a ring with the given number of
requests.

These APIs are used to support the multi-page ring version of the
Xen block API.

sys/xeninterface/io/xenbus.h:
Add Comments.

sys/xen/xenbus/...
o Refactor the FreeBSD XenBus support code to allow for both front and
  backend device attachments.

o Make use of new config_intr_hook capabilities to allow front and back
  devices to be probed/attached in parallel.

o Fix bugs in probe/attach state machine that could cause the system to
  hang when confronted with a failure either in the local domain or in
  a remote domain to which one of our driver instances is attaching.

o Publish all required state to the XenStore on device detach and
  failure.  The majority of the missing functionality was for serving
  as a back end since the typical "hot-plug" scripts in Dom0 don't
  handle the case of cleaning up for a "service domain" that is not
  itself.

o Add dynamic sysctl nodes exposing the generic ivars of
  XenBus devices.

o Add doxygen style comments to the majority of the code.

o Cleanup types, formatting, etc.

sys/xen/xenbus/xenbusb.c:
Common code used by both front and back XenBus busses.

sys/xen/xenbus/xenbusb_if.m:
Method definitions for a XenBus bus.

sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusb_back.c:
XenBus bus specialization for front and back devices.

MFC after: 1 month

Remove undocumented and stale debug.acpi.do_powerstate tunable. It was
added with hw.pci.do_powerstate but the PCI version was splitted into two
separate tunables later and now this is completely stale. To make it worse,
PCI devices enumerated in ACPI tree ignore this tunable as it is handled by
a function in acpi_pci.c instead.

Stop disallowing device nodes to be passed to camcontrol(8) since libcam
already allows both device names and nodes to be specified.

Reviewed by: avg

Remove PCI_SET_POWERSTATE method from acpi.c and eradicate all PCI-specific
knowledges from the file.  All PCI devices enumerated in ACPI tree must use
correct one from acpi_pci.c any way.  Reduce duplicate codes as we did for
pci.c in r213905.  Do not return ESRCH from PCIB_POWER_FOR_SLEEP method.
When the method is not found, just return zero without modifying the given
default value as it is completely optional.  As a side effect, the return
state must not be NULL.  Note there is actually no functional change by
removing ESRCH because acpi_pcib_power_for_sleep() always returns zero.
Adjust debugging messages and add new ones under bootverbose to help
debugging device power state related issues.

Reviewed by: jhb, imp (earlier versions)

- Wrap exchanging td_intr_frame and calling the event timer callback in
  a critical section as apparently required by both. I don't think either
  belongs in the event timer front-ends but the callback should handle
  this as necessary instead just like for example intr_event_handle()
  does but this is how the other architectures currently handle it, either
  explicitly or implicitly.
- Further rename and reword references to hardclock as this front-end no
  longer has a notion of actually calling it.

There is no reason to call rt_ifmsg(), remove it.

Submitted by: Paul B Mahol <onemda at gmail.com>
MFC after: 1 week

Fix an undefined behaviour if the desired ratectl algo is not available.
This can happen if the algos are built as modules but are not loaded. If
the selected ratectl algo is not available, try to load it (The load
module functions does nothing currently). Add a dummy ratectl algo which
always selects the first available rate. Use that one if the desired algo
is not available.

MFC after: 1 week

Make any PCI devices enumerated in ACPI tree honor do_power_resume as well.

ZFS pool name is not a real device in devfs. Do not wait for
device appear when mounting root from ZFS.

Reviewed by: marcel
Approved by: mav (mentor)

Clarify that lagg(4) sends/receives on active port, not the master port.

Note that this still seems to be a little bit confusing as the concept of
"master" is different from what people would expect on a networking
equipment.

Remove PCI header type 0 restriction from power state changes. PCI config.
registers for bridges are saved and restored since r200341.

OK'ed by: imp, jhb

Do not apply do_power_resume for suspending case. When do_powerstate was
splitted into do_power_resume and do_power_nodriver, it became stale.

Use make_dev_p(9) with the MAKEDEV_CHECKNAME flag instead of make_dev(9)
and print a diagnostic if the call fails.

This avoids a panic when a device with an invalid name is attempted to
be registered. For example the label class gets device names from
untrusted input.

Reviewed by: freebsd-geom

uma_zfree(zone, NULL) should do nothing, to match free(9).

Noticed by: Ron Steinke <rsteinke at isilon dot com>
MFC after: 3 days

mdoc: fix markup typo

MFC after: 1 week (together with r213983)

Simplify and significantly speed up the timezone listing backend script.

Reviewed by: imp

Minor cleanup, including sysctl -n instead of sed to remove the sysctl
name.

Reviewed by: imp

Revert r206418

mdoc: drop even more redundant .Pp calls

No change in rendered output, less mandoc lint warnings.

Tool provided by: Nobuyuki Koganemaru n-kogane at syd.odn.ne.jp

Fix the type of the 3rd argument for nm_getinfo so that it works
for architectures like sparc64.

Suggested by: kib
MFC after: 2 weeks

When readdirplus() is handled on the exported filesystem that does
not support VFS_VGET, like msdosfs, do not call VOP_LOOKUP() for
dotdot on the root directory. Our filesystems expect that VFS handles
dotdot lookups on root on its own.

Reported and tested by: kevlo
MFC after: 2 weeks

Modify the NFS clients and the NLM so that the NLM can be used
by both clients. Since the NLM uses various fields of the
nfsmount structure, those fields were extracted and put in a
separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h.
This structure also has a function pointer for a function that
extracts the required information from the mount point and nfs vnode
for that particular client, for information stored differently by the
clients.

Reviewed by: jhb
MFC after: 2 weeks

MFV: nc(1) from OpenBSD 4.8.

While I'm there, bump WARNS level to 2 as the vendor
have the right printf format string now.

MFC after: 1 month
Obtained from: OpenBSD

Set default type to PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP, this
is the type we are using.

Replace spaces by a tab after the date.

Reported by: gavin@, brucec@
Approved by: sahil@ (mentor)

Do not synchronously start the nfsiod threads at all. The r212506
fixed the issues with file descriptor locks, but the same problems are
present for vnode lock/user map lock.

If the nfs_asyncio() cannot find the free nfsiod, schedule task to
create new nfsiod and return error. This causes fall back to the
synchronous i/o for nfs_strategy(), or does not start read at all in
the case of readahead. The caller that holds vnode and potentially
user map lock does not wait for kproc_create() to finish, preventing
the LORs.

The change effectively reverts r203072, because we never hand off the
request to newly created nfsiod thread anymore.

Reviewed by: jhb
Tested by: jhb, pluknet
MFC after: 3 weeks

We've already set p = td->td_proc, so use it.

Remove extra word, which looks like a left-over from a deleted sentence.

Fix grammar.

Fix typo: Offlaod -> Offload.

PR: docs/150756
Approved by: avg (mentor)
MFC after: 3 days

Update links for taskqueue(9) functions.

Add links for libradius(3) functions.

Set of legacy mode SATA enchancements:
- Implement proper combined mode decoding for Intel controllers to properly
identify SATA and PATA channels and associate ATA channels with SATA ports.
This fixes wrong reporting and in some cases hard resets to wrong SATA ports.
- Improve SATA registers support to handle hot-plug events and potentially
interface errors. For ICH5/6300ESB chipsets these registers accessible via
PCI config space. For later ones they may be accessible via PCI BAR(5).
- For controllers not generating interrupts on hot-plug events, implement
periodic status polling. Use it to detect hot-plug on Intel and VIA
controllers. Same probably could also be used for Serverworks and SIS.

Unbreak buildworld by including pthread_rwlockattr_setkind_np and
pthread_rwlockattr_getkind_np.

Revert r213867; while this driver really doesn't use any of the generic
subroutines, at least mii_capabilities is used within itself.

Log if fopen() fails.

Reviewed by: brian

"b64decode -r" did not handle arbitary breaks in base64 encoded
data. White space should be accepted anywhere in a base64 encoded
stream, not just after every chunk (4 characters).

Test-scenario:

VmVsb2NpdHkgUmV3YXJkcw==

and

VmVsb2NpdHkgUmV3YXJkcw
==

should both produce "Velocity Rewards"

PR: bin/124739
Submitted by: Mark Andrews <marka@isc.org>
MFC after: 2 weeks

sort function name.

s/||/&&

Add pthread_rwlockattr_setkind_np and pthread_rwlockattr_getkind_np, the
functions set or get pthread_rwlock type, current supported types are:
   PTHREAD_RWLOCK_PREFER_READER_NP,
   PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP,
   PTHREAD_RWLOCK_PREFER_WRITER_NP,
default is PTHREAD_RWLOCK_PREFER_WRITER_NONCECURSIVE_NP, this maintains
binary compatible with old code.

Re-implement the root mount logic using a recursive approach, whereby each
root file system (starting with devfs and a synthesized configuration) can
contain directives for mounting another file system as root. The old root
file system is re-mounted under the new root file system (with /.mount or
/mnt as the mount point) to allow access to the underlying file system.

The configuration allows for creating vnode-backed memory disks that can
subsequently be mounted as root. This allows for an efficient and low-
cost way to distribute and boot FreeBSD software images that reside on
some storage media.

When trying a mount, the kernel will wait for the device in question to
arrive. The timeout is configurable and is part of the configuration.
This allows arbitrarily complex GEOM configurations to be constructed
on the fly.

A side-effect of this change is that all root specifications, whether
compiled into the kernel or typed at the prompt can contain root mount
options.

In vfs_filteropt(), only print the errmsg when there's no errmsg
mount option. Otherwise errors tend to get printed multiple times.

Rename boot() to kern_reboot() and make it visible outside of
kern_shutdown.c. This makes it easier for emulators and other
parts of the kernel to initiate a reboot.

Allow the MDIOCATTACH ioctl operation to originate from within the kernel.
To protect against malicious software, we demand that the file name is at
a particular location (i.e. appended to the mdio structure) for it to be
treated as in-kernel.

Stylify of uudecode(1)
Part of PR bin/124739.

PR: bin/124739
Submitted by: Mark Andrews <marka@isc.org>

Fix a possible race where the directory dirent is moved to the location
that was used by ".." entry.
This change seems fixed panic during attempt to access msdosfs data
over nfs.

Reviewed by: kib
MFC after: 1 week

Re-add opt_mps.h and opt_cam.h, lost in the previous rev.

Add myself to calendar.freebsd.

Approved by: sahil@ (mentor)

Add an entry for myself to committers-ports.dot.

Approved by: sahil@ (mentor)

Fix an XXX comment by answering 'no'. OS X does not set the day-of-week
counter on SMU-based systems, which causes FreeBSD to reject the RTC time
when used in a dual-boot environment. Since we don't use the day-of-week
counter anyway, solve this by just not checking that it matches.

MFC after: 3 weeks

- In oneshot-mode it doesn't make sense to try to compensate the clock
  drift in order to achieve a more stable clock as the tick intervals may
  vary in the first place. In fact I haven't seen this code kick in when
  in oneshot-mode so just skip it in that case.
- There's no need to explicitly stop the (S)TICK counter in oneshot-mode
  with every tick as it just won't trigger again with the (S)TICK compare
  register set to a value in the past (with a wrap-around once every ~195
  years of uptime at 1.5 GHz this isn't something we have to worry about
  in practice).
- Given that we'll disable interrupts completely anyway there's no
  need to enter critical sections.

Document vunref(9), add some important notes for vrele(9) and vput(9).
Merge all three manpages to one, removing separate file for vput(9).

MFC after: 1 week

Log correct connection when canceling half-open connection.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days

- Insert thread0 into correct thread hash link list.
- In thr_exit() and kthread_exit(), only remove thread from
hash if it can directly exit, otherwise let exit1() do it.
- In thread_suspend_check(), fix cleanup code when thread needs
to exit.
This change seems fixed the "Bad link elm " panic found by
Peter Holm.

Stress testing: pho

Use one fprintf() instead of two.

MFC after: 3 days

Clear signal mask before executing a hook.

Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
MFC after: 3 days

zfs: add vop_getpages method implementation

This should make vnode_pager_getpages path a bit shorter and clearer.
Also this should eliminate problems with partially valid pages.
Having this method opens room for future optimizations.

To do: try to satisfy other pages besides the required one taking into
account tradeofs between number of page faults, read throughput and read
latency. Also, eventually vop_putpages should be added too.

Reviewed by: kib, mm, pjd
MFC after: 3 weeks

MfP4 CH182763 (original version):

Make it harder to exploit certain in_control() related races between the
intiial lookup at the beginning and the time we will remove the entry
from the lists by re-checking that entry is still in the list before
trying to remove it.

(*) It is believed that with the current code and locking strategy we
cannot completely fix all race.

Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by: Nima Misaghian (nima_misa hotmail.com) (original version)
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version)
MFC after: 1 week

Allow umass to use bigger transactions for USB 3.0 devices. It is less
important for USB 2.0 devices and some of them reported to have problems
with large transactions. But USB 3.0 benchmarks show that limited number
of transactions per second on USB makes impossible to reach high transfer
speeds without using bigger transactions.

On my tests this change allows to read up to 220MB/s from USB-attached SSD
(at block size of 256-512KB), comparing to only 113MB/s without it.

Reviewed by: hselasky

Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.

PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 1 week

lltable_drain() has never been used so far, thus #if 0 it for now.
While touching it add the missing locking to the now disabled code
for the time when we'll resurrect it.

MFC after: 3 days

Fix a grammatical error connected to the previous commit.

Spotted by: gjb@

Correct some typos in comments, no functional changes.

sh(1): Clarify subshells/processes for pipelines.

For multi-command pipelines,
1. all commands are direct children of the shell (unlike the original
Bourne shell)
2. all commands are executed in a subshell (unlike the real Korn shell)

MFC after: 1 week

sh: Use <stddef.h> rather than <sys/stddef.h>.

<sys/stddef.h> is only for the kernel and conflicts with <stddef.h>.

- Add support for libusbhid in 32-bit compatibility mode.
- Add missing check for ugd_actlen being too small.
- Add missing inclusion guard to usbvar.h header file.
- This also fixes buildworld breakage since r213852.

atrtc: remove (pre-)historic check of RTC NVRAM at address 0x0e

Old scrolls tell that once upon a time IBM AT BIOS was known to put some
useful system diagnostic information into RTC NVRAM. It is not really
known if and for how long PC BIOSes followed that convention, but I
believe that many, if not all, modern BIOSes do not do that any more
(not mentioning other types of x86 firmware).
Some diagnostic bits don't even make any sense any longer.
The check results in confusing messages upon boot on some systems.
So I am removing it.

Discussed with: bde, jhb, mav
MFC after: 3 weeks

Document vfs.ncsizefactor and vfs.ncnegfactor.

MFC after: 2 weeks

Provide vfs.ncsizefactor instead of hard-coding namecache ratio.
Move debug.ncnegfactor to vfs.ncnegfactor [1].
Provide some descriptions for the namecache related sysctls [1].

Based on the submission by: Rogier R. Mulhuijzen <drwilco drwilco net> [1]
MFC after: 2 weeks
X-MFC-note: remove debug.ncnegfactor in HEAD after MFC

Retire the system-wide, per-reassembly queue segment limit. The mechanism is far
too coarse grained to be useful and the default value significantly degrades TCP
performance on moderate to high bandwidth-delay product paths with non-zero loss
(e.g. 5+Mbps connections across the public Internet often suffer).

Replace the outgoing mechanism with an individual per-queue limit based on the
number of MSS segments that fit into the socket's receive buffer. This should
strike a good balance between performance and the potential for resource
exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer
autotuning (which is enabled by default), the reassembly queue tracks the
socket buffer and benefits too.

As the XXX comment suggests, my testing uncovered some unexpected behaviour
which requires further investigation. By using so->so_rcv.sb_hiwat
instead of sbspace(&so->so_rcv), we allow more segments to be held across both
the socket receive buffer and reassembly queue than we probably should. The
tradeoff is better performance in at least one common scenario, versus a devious
sender's ability to consume more resources on a FreeBSD receiver.

Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo
MFC after: 2 weeks

- Switch the "net.inet.tcp.reass.cursegments" and
  "net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA zone
  stats. The value returned by the cursegments sysctl is approximate owing to
  the way in which uma_zone_get_cur is implemented.

- Discontinue use of V_tcp_reass_qsize as a global reassembly segment count
  variable in the reassembly implementation. The variable was used without
  proper synchronisation and was duplicating accounting done by UMA already. The
  lack of synchronisation was particularly problematic on SMP systems
  terminating many TCP sessions, resulting in poor TCP performance for
  connections with non-zero packet loss.

Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo (as part of a larger patch)
MFC after: 2 weeks

Change uma_zone_set_max to return the effective value of "nitems" after
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.

Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb

- Simplify implementation of uma_zone_get_max.
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.

Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
MFC after: 2 weeks

Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
  addresses; we now let mii_attach() only probe the PHY at the desired
  address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
  off from, partly even based on grabbing and using the softc of the
  parent; we now pass these flags down from the NIC to the PHY drivers
  via mii_attach(). This got us rid of all such hacks except those of
  brgphy() in combination with bce(4) and bge(4), which is way beyond
  what can be expressed with simple flags.

While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).

This file was missed in r213893.

Remove unnecessary castings and fix couple of style(9) nits.

Remove two .endp's without matching .proc in lib/csu/ia64/crtn.S.
This allows it to assemble with newer binutils.

Reviewed by: marcel

Move setting power state for children into a separate function as they were
essentially the same. This also restores hw.pci.do_power_resume tunable,
which was broken since r211430.

Reviewed by: jhb

Add three new drivers for fan control and temperature reading on the
PowerMac7,2.

- The fcu driver lets us read and write the fan RPMs for all fans in the
PowerMac7,2. This driver is PowerMac specific.
- The ds1775 is a driver to read the temperature for the drive bay sensor.
- The max6690 is another driver to read temperatures. Here it is used to
read the inlet, the backside and the U3 heatsink temperature.

An additional driver, the ad7417, will follow later.

Thanks to nwhitehorn for guiding me through this driver development.

Approved by: nwhitehorn (mentor)

sh: Allow running 'prove' from tools/regression/bin/sh again
without needing to set special environment variables, testing the 'sh' from
PATH.

Now that all previous users of mii_phy_probe() have been converted
in r213893 and r213894 to use mii_attach() instead remove the former
and along with it the "EVIL HACK".

MFC after: never

Currently only opt_compat.h is included by the mps(4) driver. Also
enable /dev/mps0, which was missing from my previous patches enabling
f/w upload and download.

opt_compat.h issue noticed by scottl.

Update pmap_extract() to handle 1GB page mappings. Some device drivers
use pmap_extract() rather than pmap_kextract() on direct map addresses.
Thus, pmap_extract() needs to be able to deal with 1GB page mappings if
we are to use 1GB page mappings for the direct map. (See r197580.)

Remove a device_printf() accidentally left in r213894.

Submitted by: jhb

Converted the remainder of the NIC drivers to use the mii_attach()
introduced in r213878 instead of mii_phy_probe(). Unlike r213893 these
are only straight forward conversions though.

Reviewed by: yongari

Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
  addresses; we now let mii_attach() only probe the PHY at the desired
  address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
  off from, partly even based on grabbing and using the softc of the
  parent; we now pass these flags down from the NIC to the PHY drivers
  via mii_attach(). This got us rid of all such hacks except those of
  brgphy() in combination with bce(4) and bge(4), which is way beyond
  what can be expressed with simple flags.

While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).

Reviewed by: jhb, yongari

Prevent the ofwdump manpage from being deleted by make delete-old on
PowerPC.

Stop hard coding nm(1) and make it overridable.

Embellish this testcase a little bit to be more clear what the output is
and why. The first case is correct usage which has but one correct output.
The 2nd and 3rd cases are incorrect usage in which the exact output is
not standardized and various shells give various allowable output.

Fixes to mps_user_command():
- fix the leak of command struct on error
- simplify the cleanup logic
- EINPROGRESS is not a fatal error
- buggy comment and error message

Reviewed by: ken