Stanislav Sedov [Wed, 1 Jul 2009 13:07:02 +0000 (13:07 +0000)]
- Fix the bug in write(2) called with incorrect parameters resulting in writes
always started from the start of the packet.
- Fix usage string (multiple addresses can be specified).
- Make the source more style(9) compliant.
- Improve error reporting (do not silently fail if something goes
wrong).
- Make functions static.
- Use warns level 6.
Approved by: re (kib)
Discussed with: Marc Balmer <marc@msys.ch>, brian, mbr
Ed Maste [Tue, 30 Jun 2009 13:38:49 +0000 (13:38 +0000)]
Add FIONSPACE from NetBSD. FIONSPACE is provided so that programs may
easily determine how much space is left in the send queue; they do not
need to know the send queue size.
Stanislav Sedov [Tue, 30 Jun 2009 12:35:47 +0000 (12:35 +0000)]
- Add support to atomically set/clear individual bits of a MSR register
via cpuctl(4) driver. Two new CPUCTL_MSRSBIT and CPUCTL_MSRCBIT ioctl(2)
calls treat the data field of the argument struct passed as a mask
and set/clear bits of the MSR register according to the mask value.
- Allow user to perform atomic bitwise AND and OR operaions on MSR registers
via cpucontrol(8) utility. Two new operations ("&=" and "|=") have been
added. The first one applies bitwise AND operaion between the current
contents of the MSR register and the mask, and the second performs bitwise
OR. The argument can be optionally prefixed with "~" inversion operator.
This allows one to mimic the "clear bit" behavior by using the command
like this:
cpucontrol -m 0x10&=~0x02 # clear the second bit of TSC MSR
Inversion operator support in all modes (assignment, OR, AND).
For SU mounts, softdep_fsync() might drop vnode lock, allowing other
threads to put dirty buffers on the vnode bufobj list. For regular files
and synchronous fsync requests, check for the condition and restart the
fsync vop if a new dirty buffer arrived.
Softdep_fsync() may need to lock parent directory of the synced vnode.
Use inlined (due to FFSV_FORCEINSMQ) version of vn_vget_ino() to prevent
mountpoint from being unmounted and freed while no vnodes are locked.
Rui Paulo [Tue, 30 Jun 2009 09:51:41 +0000 (09:51 +0000)]
acpi_wmi_if:
- Document different semantics for ACPI_WMI_PROVIDES_GUID_STRING_METHOD
acpi_wmi.c:
- Modify acpi_wmi_provides_guid_string_method to return absolut number of
instances known for the given GUID.
acpi_hp.c:
- sysctl dev.acpi_hp.0.verbose to toggle debug output
- A modification so this can deal with different array lengths
when reading the CMI BIOS - now it works ok on HP Compaq nx7300
as well.
- Change behaviour to query only max_instance-1 CMI BIOS instances,
because all HPs seen so far are broken in that respect
(or there is a fundamental misunderstanding on my side, possible
as well). This way a disturbing ACPI Error Field exceeds Buffer
message is avoided.
- New bit to set on dev.acpi_hp.0.cmi_detail (0x8) to
also query the highest guid instance of CMI bios
acpi_hp.4:
- Document dev.acpi_hp.0.verbose sysctl in man page
- Document new bit for dev.acpi_hp.0.cmi_detail
- Add a section to manpage about hardware that has been reported
to work ok
Submitted by: Michael Gmelin, freebsdusb at bindone.de
Approved by: re (kib)
MFC after: 2 weeks
Bjoern A. Zeeb [Tue, 30 Jun 2009 05:21:00 +0000 (05:21 +0000)]
In case we cannot queue a packet reaching the queue limit, retain the
semantics netisr_queue() always had and free the mbuf along with
returning the error.
Stacey Son [Mon, 29 Jun 2009 20:19:19 +0000 (20:19 +0000)]
Dynamically allocate the gidset field in audit record.
This fixes a problem created by the recent change that allows a large
number of groups per user. The gidset field in struct kaudit_record
is now dynamically allocated to the size needed rather than statically
(using NGROUPS).
Sam Leffler [Mon, 29 Jun 2009 18:42:54 +0000 (18:42 +0000)]
Update to 3.6.2.2 firmware (latest w/o host-based power save support):
o new tx ack queue (not used right now)
o proxy-sta related changes (no proxy sta in driver)
o explicit dwds ena/dis (needed only with proxy sta)
o cleanup BA policy handling
o new ampdu aggressive mode support
o CFEnd use now controllable
Jack F Vogel [Mon, 29 Jun 2009 18:17:10 +0000 (18:17 +0000)]
Type problem when FreeBSD is in a virtualized environment, the
result was when the RX index wrapped it was converted into some
sort of gibberish and written into the RDT register, effectively
killing the RX side of the thing :)
Attilio Rao [Mon, 29 Jun 2009 16:03:18 +0000 (16:03 +0000)]
Don't assume a default (currently 15) value for preloaded klds when
loading hwpmc, but calculate at runtime and allocate the necessary space.
Also the current logic is wrong as it can lead to an endless loop.
Sponsored by: Sandvine Incorporated
Reported by: Ryan Stone <rstone at sandvine dot com>
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
Approved by: re (kib)
Pyun YongHyeon [Mon, 29 Jun 2009 05:12:21 +0000 (05:12 +0000)]
Disable Rx checksum offload until I find more clue why it breaks
under certain environments. However give users chance to override
it when he/she surely knows his/her hardware works with Rx checksum
offload.
Reported by: Ulrich Spoerlein ( uqs <> spoerlein dot net )
MFC after: 1 week
Approved by: re (kensmith)
Alexander Kabaev [Sun, 28 Jun 2009 23:51:39 +0000 (23:51 +0000)]
Eliminate .text relocations in shared libraries compiled with stack protector.
Use libssp_nonshared library to pull __stack_chk_fail_local symbol into
each library that needs it instead of pulling it from libc. GCC generates
local calls to this function which result in absolute relocations put into
position-independent code segment, making dynamic loader do extra work everys
time given shared library is being relocated and making affected text pages
non-shareable.
Marius Strobl [Sun, 28 Jun 2009 22:42:51 +0000 (22:42 +0000)]
- Work around the broken loader behavior of not demapping no longer
used kernel TLB slots when unloading the kernel or modules, which
results in havoc when loading a kernel and modules which take up
less TLB slots afterwards as the unused but locked ones aren't
accounted for in virtual_avail. Eventually this should be fixed
in the loader which isn't straight forward though and the kernel
should be robust against this anyway. [1]
- Ensure that the addresses allocated directly from phys_avail[] by
pmap_bootstrap_alloc() are always colored properly. This implicit
assumption was broken in r194784 as unlike the other consumers the
DPCPU area allocated for the BSP isn't a multiple of PAGE_SIZE *
DCACHE_COLORS. [2]
- Remove the no longer used global msgbuf_phys.
- Remove the redundant ekva parameter of pmap_bootstrap_alloc().
- Correct some outdated function names in ktr(9) invocations.
Stanislav Sedov [Sun, 28 Jun 2009 21:49:43 +0000 (21:49 +0000)]
- Turn the third (islocked) argument of the knote call into flags parameter.
Introduce the new flag KNF_NOKQLOCK to allow event callers to be called
without KQ_LOCK mtx held.
- Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required
for ZFS as its getattr implementation may sleep.
There are a number of ways an application can check if there are
inbound data waiting on a filedescriptor, such as a pipe or a socket,
for instance by using select(2), poll(2), kqueue(2), ioctl(FIONREAD)
etc.
But we have no way of finding out if written data have yet to be
disposed of, for instance, transmitted (and ack'ed!) to some remote
host, or read by the applicantion at the far end of the pipe.
The closest we get, is calling shutdown(2) on a TCP socket in
non-blocking mode, but this has the undesirable sideeffect of
preventing future communication.
Add a complement to FIONREAD, called FIONWRITE, which returns the
number of bytes not yet properly disposed of. Implement it for
all sockets.
Background:
A HTTP server will want to time out connections, if no new request
arrives within a certain period after the last transmitted response
has actually been sent (and ack'ed).
For a busy HTTP server, this timeout can be subsecond duration.
In order to signal to a load-balancer that the connection is truly
dead, TCP_RST will be the preferred method, as this avoids the need
for a RTT delay for FIN handshaking, with a client which, surprisingly
often, no longer at the remote IP number.
If a slow, distant client is being served a response which is big
enough to fill the window, but small enough to fit in the socket
buffer, the write(2) call will return immediately.
If the session timeout is armed at that time, all bytes in the
response may not have been transmitted by the time it fires.
FIONWRITE allows the timeout to check that no data is outstanding
on the connection, before it TCP_RST's it.
Marc Fonvieille [Sun, 28 Jun 2009 08:59:46 +0000 (08:59 +0000)]
- release/* update to use freebsd-doc-* packages instead of building
FreeBSD docset during 'make release' this will speed up release
builds;
- sysinstall(8) has also been updated to use these packages with a new
menu allowing people to choose what localized doc to install;
- mention in UPDATING that docs from the FreeBSD Documentation project
are now installed in /usr/local/share/doc/freebsd instead of
/usr/share/doc.
Alan Cox [Sat, 27 Jun 2009 21:37:36 +0000 (21:37 +0000)]
Correct a long-standing performance bug in cluster_rbuild(). Specifically,
in the case of a file system with a block size that is less than the page
size, cluster_rbuild() looks at too many of the page's valid bits.
Consequently, it may terminate prematurely, resulting in poor performance.
Reported by: bde
Reviewed by: tegge
Approved by: re (kib)
Sam Leffler [Sat, 27 Jun 2009 20:06:56 +0000 (20:06 +0000)]
Add HAL_RX_FILTER_BSSID support (to disable bssid match):
o add HAL_CAP_BSSIDMATCH to identify parts that have the support for
disabling bssid match
o honor capability for set/get rx filter
o use HAL_CAP_BSSIDMATCH in driver to decide whether to use the bssid
match disable or fall back to promisc mode
Robert Watson [Sat, 27 Jun 2009 13:58:44 +0000 (13:58 +0000)]
Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).
In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.
Robert Watson [Sat, 27 Jun 2009 11:05:53 +0000 (11:05 +0000)]
In in6_update_ifa(), jump to 'cleanup' rather than returning directly
in one additional case, avoiding an ifaddr reference leak.
Defer releasing the in6_ifaddr's in6_ifaddrhead reference until the
end of in6_unlink_ifa(), as callers are inconsistent regarding whether
or not they hold a reference across the call. This avoids using the
ifaddr after it may have been freed.
Antoine Brodin [Sat, 27 Jun 2009 10:11:15 +0000 (10:11 +0000)]
Update ObsoleteFiles.inc:
- correct a few paths
- some USB headers were removed
- devclass_add_driver(9) is no longer public
- bind 9.6.1rc1 was imported
Stanislav Sedov [Fri, 26 Jun 2009 22:13:15 +0000 (22:13 +0000)]
- Don't zero data field in case of MSR write operation. Before this change
the value written to MSR register was always 0 regardless of value passed
by user.
- Use proper data pointer when performing AMD microcode update. Previously,
the pointer to user-space data has been provided instead, which is totally
incorrect.
Robert Watson [Fri, 26 Jun 2009 20:39:36 +0000 (20:39 +0000)]
In light of DPCPU use by netisr, revise various for loops from using
MAXCPU to mp_maxid, and handling and reporting of requests to use more
threads than we have CPUs to run them on.
John Baldwin [Fri, 26 Jun 2009 17:50:52 +0000 (17:50 +0000)]
Note that as a result of the SYSV IPC changes, COMPAT_FREEBSD[456] now
require COMPAT_FREEBSD7. Also, explicitly note in NOTES that any version
of COMPAT_FREEBSD<n> effectively requires for newer binaries (i.e.
COMPAT_FREEBSD<n+1>, etc.). While this has been true in practice
previously, it used to compile ok before the commit earlier this week.
Robert Watson [Fri, 26 Jun 2009 11:45:06 +0000 (11:45 +0000)]
Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Rui Paulo [Fri, 26 Jun 2009 09:32:31 +0000 (09:32 +0000)]
On special systems where the MBR and the GPT are in sync (up to the 4th
slicei, Apple EFI hardware), the bootloader will fail to recognize the GPT
if it finds anything else but the EFI partition. Change the check to continue
detecting the GPT by looking at the EFI partition on the MBR but
stopping successfuly after finding it.
PR: kern/134590
Submitted by: Christoph Langguth <christoph at rosenkeller.org>
Reviewed by: jhb
MFC after: 2 weeks
Approved by: re (kib)
Warner Losh [Fri, 26 Jun 2009 07:11:14 +0000 (07:11 +0000)]
Correct some minor nits with the 2BSD and 3BSD series of releases
based on the information at The Unix Historical Society web page
(http://www.tuhs.org/Unix_History). Where multiple sources differ,
retain all data. Prefer 2.79BSD to 2.7.9BSD, since the former is from
/LABEL form the actual release. Use the /LABEL date as in the TUHS
tables (the curious can read http://minnie.tuhs.org/Unix_History/2bsd
for all the conflicting date confusion if they want).
Alan Cox [Fri, 26 Jun 2009 04:47:43 +0000 (04:47 +0000)]
This change is the next step in implementing the cache control functionality
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.
Doug Barton [Fri, 26 Jun 2009 01:04:50 +0000 (01:04 +0000)]
Reverse the effect of r193198 for pf and ipfw which will once again
allow them to start after netif. There were too many problems reported
with this change in the short period of time that it lived in HEAD, and
we are too late in the release cycle to properly shake it out.
IMO the issue of having the firewalls up before the network is still a
valid concern, particularly for pf whose default state is wide open.
However properly solving this issue is going to take some investment
on the part of the people who actually use those tools.
This is not a strict reversion of all the changes for r193198 since it
also included some simplification of the BEFORE/REQUIRE logic which is
still valid for ipfilter and ip6fw.
Robert Watson [Fri, 26 Jun 2009 00:36:47 +0000 (00:36 +0000)]
Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.
We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.
Robert Watson [Fri, 26 Jun 2009 00:19:25 +0000 (00:19 +0000)]
Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing
arrays to [MAXCPU], offering moderate memory savings. In some places,
this requires using CPU_ABSENT() to handle less common platforms with
sparse CPU IDs. In several places, assert that the selected CPUID for
work placement or statistics is not CPU_ABSENT() to be on the safe side.
Fix a bug reported by pho@ where one can induce a panic by decreasing
vfs.ufs.dirhash_maxmem below the current amount of memory used by dirhash. When
ufsdirhash_build() is called with the memory in use greater than dirhash_maxmem,
it attempts to free up memory by calling ufsdirhash_recycle(). If successful in
freeing enough memory, ufsdirhash_recycle() leaves the dirhash list locked. But
at this point in ufsdirhash_build(), the list is not explicitly unlocked after
the call(s) to ufsdirhash_recycle(). When we next attempt to lock the dirhash
list, we will get a "panic: _mtx_lock_sleep: recursed on non-recursive mutex
dirhash list".
John Baldwin [Thu, 25 Jun 2009 20:35:46 +0000 (20:35 +0000)]
Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.
In lf_iteratelocks_vnode, increment state->ls_threads around iterating
of the vnode advisory lock list. This prevents deallocation of state
while inside the loop.
Change the type of uio_resid member of struct uio from int to ssize_t.
Note that this does not actually enable full-range i/o requests for
64 architectures, and is done now to update KBI only.
Tested by: pho
Reviewed by: jhb, bde (as part of the review of the bigger patch)
John Baldwin [Thu, 25 Jun 2009 18:44:05 +0000 (18:44 +0000)]
Remove the d_spare2_t typedef. The d_spare2 field was replaced by
d_mmap_single(). I considered adding a new round of padding for 8.0.
However, since cdevsw already maintains a version field, new versions
can be handled without requiring the need for explicit padding fields.
John Baldwin [Thu, 25 Jun 2009 18:35:19 +0000 (18:35 +0000)]
Return errors from intr_event_bind() to the caller of intr_set_affinity().
Specifically, if a non-root user attempts to bind an interrupt the request
will now report failure with EPERM rather than silently failing with a
successful return code.
Robert Noland [Thu, 25 Jun 2009 18:27:08 +0000 (18:27 +0000)]
Some more cleanups for vblank code on Intel.
The Intel 2d driver calls modeset before reinstalling the handler on
a vt switch. This means that vblank status ends up getting cleared
after it has been setup. Restore saved values for the pipestat registers
rather than just wiping them out.
John Baldwin [Thu, 25 Jun 2009 18:13:46 +0000 (18:13 +0000)]
- Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
IDT vectors for the entire group need to be allocated together. This
also restores the assumptions made by the PCI bus code that it could
invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
assigned to interrupt sources on activation. Instead of assigning the
CPU via pic_assign_cpu() before calling enable_intr(), allow the
different interrupt source drivers to ask the MD interrupt code which
CPU to use when they allocate an IDT vector. I/O APIC interrupt pins
do this in their pic_enable_intr() routines giving the same behavior as
before. MSI sources do it when the IDT vectors are allocated during
msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.