Luigi Rizzo [Fri, 13 Apr 2012 16:32:33 +0000 (16:32 +0000)]
add the new memory allocator for netmap, which allocates memory
in small clusters instead of one big contiguous chunk.
This was already enabled in the previous commit.
Adrian Chadd [Fri, 13 Apr 2012 08:48:38 +0000 (08:48 +0000)]
Introduce the ability to grab local EEPROM data from the firmware(9)
interface.
* Introduce a device hint, 'eeprom_firmware', which is the name of firmware
to lookup.
* If the lookup succeeds, take a copy of it and use it as the eeprom data.
This isn't enabled by default - you have to define ATH_EEPROM_FIRMWARE.
I'll add it to the configuration variables in a later commit.
TODO:
* just keep a firmware reference in ath_softc, and remove the need to
waste the extra memory in having sc_eepromdata be a malloc()ed block.
Adrian Chadd [Fri, 13 Apr 2012 08:45:50 +0000 (08:45 +0000)]
(ab)Use the firmware API to store away EEPROM calibration data for
future use by the ath(4) driver.
These embedded devices put the calibration/PCI bootstrap data on the
on board SPI flash rather than on an EEPROM connected to the NIC.
For some boards, there's two NICs and two sets of EEPROM data in the
main SPI flash.
The particulars:
* Introduce ath_fixup_size, which is the size of the EEPROM area in
bytes.
* Create a firmware image with a name based on the PCI device identifier
(bus/slot/device/function).
* Hide some verbose debugging behind 'bootverbose'.
ath(4) can then use this to load in the EEPROM data.
Adrian Chadd [Fri, 13 Apr 2012 06:11:24 +0000 (06:11 +0000)]
Sync this code against what's in OpenWRT trunk.
* the openwrt code doesn't treat 0/0/0 any differently
from other bus/slot/func combinations.
* A "local write" function writes to the LCONF area, and
so I've added it.
* The PCI workaround at attach time uses this LCONF code,
which it already did ..
* .. but it is a 4 byte write, not a 2 byte write.
Even though it's PCIR_COMMAND which is a two byte PCI register.
Tested on: AR7161
TODO: The other two AR71xx derivatives
TODO: More thoroughly stare at the datasheets I do have
and if it indeed is incorrect, push fixes to both
FreeBSD and Linux/OpenWRT.
John Baldwin [Thu, 12 Apr 2012 21:34:58 +0000 (21:34 +0000)]
Update the ddb and gdb backends for the new 'trace_thread' hook.
It is implemented via db_trace_thread() for DDB and not implemented
for GDB. This should have been part of r234190.
Pointy hat to: jhb
Reported by: jkim
MFC after: 1 week
Peter Grehan [Thu, 12 Apr 2012 18:46:48 +0000 (18:46 +0000)]
Complete polled-mode operation by using a callout if the device will be
used in polled-mode. The callout invokes uart_intr, which rearms the timeout.
Implemented for bhyve, but generically useful for e.g. embedded bringup
when the interrupt controller hasn't been setup, or if it's not deemed
worthy to wire an interrupt line from a serial port.
John Baldwin [Thu, 12 Apr 2012 17:43:59 +0000 (17:43 +0000)]
- Extend the KDB interface to add a per-debugger callback to print a
backtrace for an arbitrary thread (rather than the calling thread).
A kdb_backtrace_thread() wrapper function uses the configured debugger
if possible, otherwise it falls back to using stack(9) if that is
available.
- Replace a direct call to db_trace_thread() in propagate_priority()
with a call to kdb_backtrace_thread() instead.
John Baldwin [Thu, 12 Apr 2012 14:49:25 +0000 (14:49 +0000)]
If a linker file contains at least one module, but all of the modules
fail to load (the MOD_LOAD event fails) during a kldload(2), unload the
linker file and fail the kldload(2) with ENOEXEC.
Luigi Rizzo [Thu, 12 Apr 2012 14:06:05 +0000 (14:06 +0000)]
Apparently the length field in advanced descriptors
does not include the CRC irrespective of the setting
of CRCSTRIP. The 82599 data sheets (sec. 7.1.6) say differently.
Very strange. Need to check what happens on legacy descriptors,
but for the time being this restores functionality.
John Baldwin [Thu, 12 Apr 2012 14:01:06 +0000 (14:01 +0000)]
Add OFED and the associated options and drivers to x86 LINT builds:
- Mark 'sdp' as requiring 'inet'.
- Always include "opt_inet.h" and "opt_inet6.h" and modify the IB
driver Makefiles to honor WITH/WITHOUT_INET/INET6/_SUPPORT options
to determine what should be enabled during a module build.
- Fix the mlxen(4) driver and the core IB code to compile without
if INET is disabled (including when both INET and INET6 are disabled).
John Baldwin [Thu, 12 Apr 2012 13:53:49 +0000 (13:53 +0000)]
Don't update if_obytes when transmitting packets. That is already done
in IFQ_HANDOFF() when the packet is passed to the start routine, so doing
it here resulted in double counting.
Remove block reallocation used to make room for the cylinder group
summary structure. From now on, when there is no room for it,
we simply allocate new one in a newly added cylinder group.
This patch removes a conditional in updcsloc(), reindents some code
there, and removes unused routines. I decided to do it this way instead
of disabling reallocation when the filesystem is live and leaving it
as it is otherwise, because this allows for removal of lots of complicated
and hard to test code. Also, conditionally disabling it would result
in a different layout in filesystems resized online and offline, which
would look somewhat weird.
Reviewed by: mckusick
No objections from: kib
Sponsored by: The FreeBSD Foundation
Read backup GPT header from the last LBA only when primary GPT header and
table aren't valid. If they are ok, use hdr_lba_alt value to read backup
header. This will make gptboot happy when GPT used atop of some GEOM
provider, e.g. GEOM_MIRROR.
Luigi Rizzo [Thu, 12 Apr 2012 11:27:09 +0000 (11:27 +0000)]
Some code restructuring to bring the memory allocator out of netmap.c
and make it easier to replace it with a different implementation.
On passing, also fix indentation.
NOTE: I know that #include "foo.c" is ugly, but the alternative
(add another entry to sys/conf/files, add a separate header with
structs and prototypes, and expose functions that are meant to
be private) looks even worse to me.
We need a more modular way to specify dependencies and build options.
Keep a copy of the original pointer returned by openpam_readline() so
we can free it later, instead of trying to free a pointer that points
to the end of the buffer.
Committed to head because this code no longer exists upstream.
Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.
Use the flag to fix sigsuspend(2) error return ktrace records.
Propagate the current state of rtld_bind_lock to dlopen_object() calls
through the filter loading call chain. This fixes attempts to
write-lock the already locked rtld_bind_lock when filter loading is
initiated by relocation of dlopening dso.
Andrew Thompson [Thu, 12 Apr 2012 01:07:17 +0000 (01:07 +0000)]
Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.
Export vinactive() from kern/vfs_subr.c (e.g., make it no longer
static and declare its prototype in sys/vnode.h) so that it can be
called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c)
instead of the body of vinactive() being cut and pasted into
process_deferred_inactive().
Only manipulate the PGA_EXECUTABLE flag on managed pages. This is a proxy
for whether the page is physical. On dense phys mem systems (32-bit),
VM_PHYS_TO_PAGE will not return NULL for device memory pages if device
memory is above physical memory even if there is no allocated vm_page.
Attempting to use the returned page could then cause either memory
corruption or a page fault.
John Baldwin [Wed, 11 Apr 2012 21:33:45 +0000 (21:33 +0000)]
Reapply r223198 which was reverted in the previous vendor import. Some
portions were already reapplied in r233708:
- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().
John Baldwin [Wed, 11 Apr 2012 20:57:41 +0000 (20:57 +0000)]
Allow device_busy() and device_unbusy() to be invoked while a device is
being attached. This is implemented by adding a new DS_ATTACHING state
while a device's DEVICE_ATTACH() method is being invoked. A driver is
required to not fail an attach of a busy device. The device's state will
be promoted to DS_BUSY rather than DS_ACTIVE() if the device was marked
busy during DEVICE_ATTACH().
Fix error in r233949. Synchronizing icaches on uncacheable pages turns out
not to be a good idea, and of course the PV entry list for a page is never
empty after the page has been mapped.
Luigi Rizzo [Wed, 11 Apr 2012 16:11:08 +0000 (16:11 +0000)]
A couple of changes related to ixgbe operation in netmap mode:
- add a sysctl, dev.netmap.ix_crcstrip, to control whether ixgbe should
strip the CRC on received frames. Defaults to 0, which keeps the CRC.
and improves performance when receiving min-sized (64-byte) frames.
This matters because min-sized frames is one of the standard
benchmarks for switches and routers, some chipsets seem to issue
read-modify-write cycles for PCIe transactions that are not a
full cache line, and a min-sized frame triggers the bug, resulting
in reduced throughput -- 9.7 instead of 14.88 Mpps -- and heavy
bus load.
- for the time being, always look for incoming packets on a select/poll
even if there has not been an interrupt in the meantime. This is
only a temporary workaround for a probable race condition in keeping
track of rx interrupts.
Add a couple of diagnostic vars to help studying the problem.
Ed Maste [Wed, 11 Apr 2012 15:42:02 +0000 (15:42 +0000)]
Support percent-encoded user and password
RFC 1738 specifies that any ":", "@", or "/" within a user name or
password in a URL is percent-encoded, to avoid ambiguity with the use
of those characters as URL component separators.
Luigi Rizzo [Wed, 11 Apr 2012 15:02:14 +0000 (15:02 +0000)]
Enable prefetching of descriptors on the TX ring, using the same
values as in the Intel driver 3.8.21 for linux. The fact that it
is standard in the above driver suggests that it has no bad side
effects.
But of course there must be a reason for enabling features, not
just "it does not harm", so here it is a good one:
Prefetching enables full line rate even using a single queue (14.88
Mpps, compared to ~12 Mpps without prefetch). This in turn is
terribly useful when one wants to schedule traffic.
For obvious reasons the difference is only visible with netmap
or other high speed solutions, but presumably the advantage
should be in the order of a fraction of a microsecond when
starting transmission on an empty queue.
- remove the length shortening on the path
- make the default prompt a bit more like scp
- make the user show as root even when using 'su' instead of 'su -'
- the key bindings didn't hurt anything but likely hide a bug
- merge history instead of overwriting it
It is a logical error that in carp_multicast_cleanup()
we look at count of addresses on a particular vhid, we
should account number of addresses on cif.
To achieve this we need to run carp_attach() and
carp_detach() under appropriate cif lock.
Back out r228476.
r228476 fixed superfluous link UP/DOWN messages but broke IPMI
access during boot. It's not clear why r228476 breaks IPMI and
should be revisited.
uart_cpu_amd64.c and uart_cpu_i386.c (under sys/dev/uart) are
identical now that the bus spaces are unified under sys/x86.
Replace them with a single uart_cpu_x86.c.
o delete uart_cpu_i386.c
o move uart_cpu_amd64.c to uart_cpu_x86.c
o update files.amd64 and files.i386 accordingly.
Do not restore the register holding the TLS pointer when doing various
usermode context switches (long jumps and ucontext operations). If these
are used across threads, multiple threads can end up with the same TLS base.
Madness will then result.
This makes behavior on PPC match that on x86 systems and on Linux.
Juli Mallett [Tue, 10 Apr 2012 17:37:24 +0000 (17:37 +0000)]
Back out r233646. Although it fixed most libgeom consumers under 32-bit
compatibility, it broke programs using devstat, under 32-bit compatibility and
not.
It's very difficult to fix the identifiers used by devstat, so this change is
simply being backed out. Since changes to 3rd-party code seem likely, and may be
necessary to properly fix 32-bit binaries on 64-bit kernel, it would seem better
to make more invasive changes to fix GEOM's problems with 32-bit compatibility in
general.
The right thing to do is to replace all of the use of pointers as opaque
identifiers with a fixed-size (64-bit or even 32-bit should be enough for tracking
unique GEOM elments) field. That probably maintains source compatibility with
most GEOM consumers, and allows xml2tree to make better assumptions about how to
decode the identifiers.
Jim Harris [Tue, 10 Apr 2012 16:33:19 +0000 (16:33 +0000)]
Queue CCBs internally instead of using CAM_REQUEUE_REQ status. This fixes
problem where userspace apps such as smartctl fail due to CAM_REQUEUE_REQ
status getting returned when tagged commands are outstanding when smartctl
sends its I/O using the pass(4) interface.
Sponsored by: Intel
Found and tested by: Ravi Pokala <rpokala at panasas dot com>
Reviewed by: scottl
Approved by: scottl
MFC after: 1 week
Jaakko Heinonen [Tue, 10 Apr 2012 15:59:37 +0000 (15:59 +0000)]
- Return EPERM from ufs_setattr() when an user without PRIV_VFS_SYSFLAGS
privilege attempts to toggle SF_SETTABLE flags.
- Use the '^' operator in the SF_SNAPSHOT anti-toggling check.
Flags are now stored to ip->i_flags in one place after all checks.
Adrian Chadd [Tue, 10 Apr 2012 06:25:11 +0000 (06:25 +0000)]
* Since the API changed along the -CURRENT path (december 2011),
add a FreeBSD_version check. It should work fine for compiling
on -HEAD, 9.x and 8.x.
* Conditionally compile the 11n options only when 11n is enabled.
The above changes allow the ath(4) driver to compile and run on
8.1-RELEASE (Hi old PC-BSD!) but with the 11n stuff disabled.
I've done a test against the net80211 and tools in 8.1-RELEASE.
The NIC used in testing is the AR2427 in an EEEPC.
Just to be clear - this change is to allow the -HEAD ath/hal/rate
code to run on 9.x _and_ 8.x with no source changes. However,
when running on earlier kernels, it should only be used for legacy
mode. (Don't define ATH_ENABLE_11N.)
CARP should be capable to run on if_bridge(4). Unfortunately,
this commit is not enough to enable CARP operation on
if_bridge(4), because the latter doesn't handle or even
initialize its ifp->if_link_state.
BSP is not added to the mask of valid target CPUs for interrupts
in set_apic_interrupt_ids(). Besides, set_apic_interrupts_ids() is not
called in the !SMP case too.
Fix this by:
- Adding the BSP as an interrupt target directly in cpu_startup().
- Remove an obsolete optimization where the BSP are skipped in
set_apic_interrupt_ids().
Reported by: jh
Reviewed by: jhb
MFC after: 3 days
X-MFC: r233961
Pointy hat to: me
Remove unused and wrong SA_PROC internal signal property.
The SA_PROC signal property indicated whether each signal number is directed
at a specific thread or at the process in general. However, that depends on
how the signal was generated and not on the signal number. SA_PROC was not
used.
- Introduce a cache-miss optimization for consistency with other
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.
sem_open: Make sure to fail an O_CREAT|O_EXCL open, even if that semaphore
is already open in this process.
If the named semaphore is already open, sem_open() only increments a
reference count and did not take the flags into account (which otherwise
happens by passing them to open()). Add an extra check for O_CREAT|O_EXCL.
PR: kern/166706
Reviewed by: davidxu
MFC after: 10 days
intpm: reflect the fact that SB800 and later AMD chipsets are not supported
They do not have compatible configuration registers in PCI configuration
space. Instead their configuration resides in AMD "PM I/O" space
(accessed via a pair of I/O space registers).
Alan Cox [Sun, 8 Apr 2012 17:00:46 +0000 (17:00 +0000)]
If a page belonging a reservation is cached, then mark the reservation so
that it will be freed to the cache pool rather than the default pool.
Otherwise, the cached pages within the reservation may be recycled sooner
than necessary.