kmacy [Sat, 22 Nov 2008 08:05:05 +0000 (08:05 +0000)]
- enable multiple transmit queues
- invert sense of hw.cxgb.singleq tunable to hw.cxgb.multiq
- don't wake up transmitting thread by default
- add per tx queue ifaltq to handle ALTQ
- remove several unused functions in cxgb_multiq.c
- add several sysctls: multiq_tx_enable, coalesce_tx_enable,
and wakeup_tx_thread
- this obsoletes the hw.cxgb.snd_queue_len as ifq is replaced
by a buf_ring
kmacy [Sat, 22 Nov 2008 05:55:56 +0000 (05:55 +0000)]
- bump __FreeBSD_version to reflect added buf_ring, memory barriers,
and ifnet functions
- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own
- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers
- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues
This work was supported by Bitgravity Inc. and Chelsio Inc.
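
For readers unfamiliar with the idea behind a lockless producer / consumer ring, the following is a minimal sketch in portable C11. It is not the kernel's buf_ring: it assumes exactly one producer and one consumer, and the names spsc_ring, ring_init, ring_enqueue, ring_dequeue and RING_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u   /* hypothetical capacity; must be a power of two */

/* Hypothetical single-producer/single-consumer ring; not the kernel's buf_ring. */
struct spsc_ring {
    _Atomic unsigned head;   /* next slot to write; advanced only by the producer */
    _Atomic unsigned tail;   /* next slot to read; advanced only by the consumer */
    void *slot[RING_SIZE];
};

void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side: returns false when the ring is full. */
bool ring_enqueue(struct spsc_ring *r, void *item)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slot[head & (RING_SIZE - 1)] = item;
    /* Release: the slot write must be visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL when the ring is empty. */
void *ring_dequeue(struct spsc_ring *r)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    void *item;

    if (head == tail)
        return NULL;
    item = r->slot[tail & (RING_SIZE - 1)];
    /* Release: the slot read must complete before the slot is handed back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return item;
}
```

The two indices only ever grow, and each is written by one side only, which is what lets this sketch avoid a lock. A ring shared by several producers would need a compare-and-swap on the head instead.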
gnn [Fri, 21 Nov 2008 19:22:25 +0000 (19:22 +0000)]
Several small additions to the Chelsio 10G driver.
1) Fix a bug in dealing with the Alerus 1006 PHY which prevented the
device from ever coming back up once it had been set to down.
2) Add a kernel tunable (hw.cxgb.snd_queue_len) which makes it possible
to give the device more than IFQ_MAXLEN entries in its send queue. The
default remains 50.
3) Add code to place the card's identification and serial number into
its description (%desc) so that users can tell which card they have
installed.
imp [Fri, 21 Nov 2008 03:03:57 +0000 (03:03 +0000)]
Create a /dev/cardbus%d.cis, to be compatible with older versions of
the software. This is a trivial amount of code to keep wireless
monitoring software working... I plan on removing it in 9.0.
marius [Thu, 20 Nov 2008 18:44:09 +0000 (18:44 +0000)]
- According to OpenSolaris, CDMA flushing/syncing for Tomatillos
and XMITS has to be basically done in the same manner as for
the Sabres, i.e. only for devices behind PCI-PCI-bridges and
after a PIO read on the far side of the farthest PCI-PCI-bridge.
Given that the Tomatillo documentation mentions no difference
to the Schizo bridges in this regard and this is also still
part of the procedure described in the Schizo documentation, this
seems about right so adjust accordingly (the unconditional
CDMA flushing/syncing previously done was based on how Linux
behaves).
- Implement CDMA flushing/syncing for Schizo version >= 5,
which requires the workaround described in Schizo Errata I-23.
According to Schizo Errata I-13 it's just unusable with
version < 5 though. [1]
- Don't register the Schizo streaming buffer for now until its
usage is sorted out according to the errata.
- Register our interrupt filters with the revived INTR_FAST so
that these interrupts can even interrupt filters of device
drivers as necessary.
- Remove the comment regarding lack of newbus'ified bus_dma(9)
as being able to associate a DMA tag with a device would
allow to implement CDMA flushing/syncing in bus_dmamap_sync(9)
but that would totally kill performance. Given that for devices
not behind a PCI-PCI bridge the host-to-PCI bridges also only
do CDMA flushing/syncing based on interrupts there's no
additional disadvantage for polling(4) callbacks in the case
schizo(4) has to do the CDMA flushing/syncing but rather a
general problem.
luigi [Thu, 20 Nov 2008 14:57:09 +0000 (14:57 +0000)]
As reported in kern/118222, pxeboot in RELENG_7 (and presumably
above) exhibits some misbehaviours on machines with AMD64 CPUs,
which at least in some cases I have tracked down to a heap overflow.
It is unclear whether it depends on the CPU or on the pxe bios
itself which may use more memory on AMD machines.
Noticeably a pxeboot compiled from 6.x sources works fine on all
machines I have tried so far, while a pxeboot compiled from 7.x
sources does not.
This patch is a first step in reducing the amount of memory used
while processing the configuration files read by the loader at boot
(some of them are quite large, 1700+ lines), and it does so by:
+ moving a buffer to static memory instead of allocating in the heap;
+ skipping empty lines;
+ reducing the amount of memory used for line descriptors;
Unfortunately there are several changes between 6.x and above,
affecting the compiler, the loader code itself, and libstand,
and it is not so straightforward to
These changes fix the behaviour on one motherboard with a
single-core AMD CPU, but are still not enough e.g. on an Asus
M2N-VM (with a dual-core CPU).
I need to investigate the problem a bit more before figuring
out what should be committed to RELENG_7
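
The memory-saving steps listed in this message (a static buffer, skipping empty lines) can be sketched as follows. This is an illustration only, not the loader's code: line_buf, LINE_MAX_LEN, line_is_blank and count_stored_lines are hypothetical names, and the config text is assumed to be already in memory.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* hypothetical limit for one config line */

/* Static storage: the line buffer no longer comes out of the heap. */
static char line_buf[LINE_MAX_LEN];

/* Returns nonzero when the line holds only whitespace (or nothing). */
int line_is_blank(const char *s)
{
    while (*s != '\0') {
        if (!isspace((unsigned char)*s))
            return 0;
        s++;
    }
    return 1;
}

/*
 * Walk a config text and count the lines that would need a line
 * descriptor. Blank lines are skipped, so they cost no memory.
 */
size_t count_stored_lines(const char *text)
{
    size_t stored = 0;

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t n = len < LINE_MAX_LEN - 1 ? len : LINE_MAX_LEN - 1;

        memcpy(line_buf, text, n);   /* overlong lines are truncated */
        line_buf[n] = '\0';
        if (!line_is_blank(line_buf))
            stored++;
        text += len;
        if (*text == '\n')
            text++;
    }
    return stored;
}
```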
imp [Thu, 20 Nov 2008 08:32:19 +0000 (08:32 +0000)]
damn. Always do make depend. Forgot to recompile main because of it,
so the changes for the struct cis -> struct tuple_list didn't get
made. They have been now.
imp [Thu, 20 Nov 2008 08:30:15 +0000 (08:30 +0000)]
Fix check for link target so we don't print cardbus CIS information twice.
Also, eliminate some magic constants and replace them with values from cis.h.
imp [Thu, 20 Nov 2008 08:20:53 +0000 (08:20 +0000)]
Restore now-useless ioctl as a roadmap. The original dumpcis code
assumed it had to toggle between attribute and common memory in the
cards. The kernel is supposed to cope with that automatically and
give us a tuple list. However, there's a number of details of how
that happens that's currently, ummm, magical and/or not implemented
for 16-bit PC Cards that have CIS_LONGLINK_C tuples in them (eg, mix
both attribute memory and common memory). Also, CIS_LONGLINK_A
entries might not be handled completely correctly either, since there
can be gaps in the attribute vs common stuff.
All this will need to be corrected in the kernel. Once it is
corrected, dumpcis can be made even simpler in some ways, a little
more complicated in others once an API for presentation of CIS to
userland in these weird cases is settled upon.
imp [Thu, 20 Nov 2008 08:12:26 +0000 (08:12 +0000)]
The original programs that this code was lifted from (pccardd and
pccardc) parsed data to make decisions about stuff related to card
configuration.
The purely CIS dumping aspect of this program obviates the need for
such parsing. Save some space and don't parse the data anymore for
configuration purposes. Just parse it to print an interpretation of
it.
marius [Wed, 19 Nov 2008 22:12:32 +0000 (22:12 +0000)]
Use the interrupt level right below PIL_FAST for executing interrupt
filters instead of PIL_FAST and allow special filters and handlers
for interrupts which need to be able to interrupt even filters, f.e.
bus error interrupts, to be registered with the revived INTR_FAST
at PIL_FAST.
marius [Wed, 19 Nov 2008 22:09:03 +0000 (22:09 +0000)]
Given that the buffer dcons_crom(4) exposes is used for both input
and output, set BUS_DMA_COHERENT when creating the DMA map used for
loading the buffer. As a side-effect this solves locking issues on
sparc64 when dcons(4) calls bus_dmamap_sync(9) while in an interrupt
filter, which are executed in a critical section, and iommu(4) has
to use a sleep lock when taking advantage of the streaming buffer.
Reported and tested by: kensmith
Approved by: simokawa
ed [Wed, 19 Nov 2008 21:07:33 +0000 (21:07 +0000)]
Make nmdm(4) use MPSAFE callouts.
For some reason the nmdm(4) driver doesn't use CALLOUT_MPSAFE, even
though we live in the MPSAFE TTY era. Add the CALLOUT_MPSAFE flags.
System survives.
raj [Wed, 19 Nov 2008 17:34:28 +0000 (17:34 +0000)]
Initial storage functionality for U-Boot support library.
- Only non-sliced bsdlabel style partitioning is currently supported (but provisions
are made towards GPT support, which should follow soon)
- Enable storage support in loader on ARM
dfr [Wed, 19 Nov 2008 16:39:01 +0000 (16:39 +0000)]
Add a GPT-aware variant of zfsboot which should be used in a similar manner
to gptboot, i.e. installed in a freebsd-boot partition using /sbin/gpart or
/sbin/gpt.
Tweak the /boot/loader ZFS support so that it can find ZFS pools that are
contained in GPT partitions.
dfr [Wed, 19 Nov 2008 16:04:07 +0000 (16:04 +0000)]
If we free the GPT partition list in bd_open_gpt() because of an error, don't
try to free it again in bd_closedisk(). While I'm here, fix a DEBUG print.
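
One common way to avoid this kind of double free is to clear the pointer at the moment the list is released, so a later close path sees NULL. The sketch below shows that pattern under stated assumptions: disk_state, disk_open, disk_close and disk_free_parts are hypothetical stand-ins, and the actual change to bd_open_gpt() and bd_closedisk() may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-disk state; not the loader's real struct. */
struct disk_state {
    int *part_list;   /* partition list, or NULL when none is held */
};

/* Release the partition list and forget the pointer so it is freed only once. */
void disk_free_parts(struct disk_state *d)
{
    free(d->part_list);
    d->part_list = NULL;
}

/* Hypothetical open path: on error the list is released right away. */
int disk_open(struct disk_state *d, int simulate_error)
{
    d->part_list = calloc(4, sizeof(*d->part_list));
    if (d->part_list == NULL)
        return -1;
    if (simulate_error) {
        disk_free_parts(d);
        return -1;
    }
    return 0;
}

/* Close path: safe after a failed open because free(NULL) is a no-op. */
void disk_close(struct disk_state *d)
{
    disk_free_parts(d);
}
```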
zec [Wed, 19 Nov 2008 09:39:34 +0000 (09:39 +0000)]
Change the initialization methodology for global variables scheduled
for virtualization.
Instead of initializing the affected global variables at instantiation,
assign initial values to them in initializer functions. As a rule,
initialization at instantiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.
Essentially, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methodology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to initialize multiple instances of such container structures.
Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
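
The initialization pattern described in this message can be sketched in plain C as follows. The variable and function names (example_ttl, example_forwarding, example_init) are hypothetical, and VIMAGE_GLOBALS is defined locally only so that the sketch compiles on its own.

```c
#include <assert.h>

/* Defined here only so the sketch compiles standalone. */
#define VIMAGE_GLOBALS 1

/*
 * Old style, shown for contrast only:
 *     static int example_ttl = 64;
 * New style: instantiate without an initializer, inside the guard.
 */
#ifdef VIMAGE_GLOBALS
static int example_ttl;          /* hypothetical variable */
static int example_forwarding;   /* hypothetical variable */
#endif

/* Initializer function: the only place the initial values are assigned. */
void example_init(void)
{
    example_ttl = 64;
    example_forwarding = 0;
}

/* Accessors used by the rest of the (hypothetical) subsystem. */
int example_get_ttl(void)        { return example_ttl; }
int example_get_forwarding(void) { return example_forwarding; }
```

Because the values are assigned in a function rather than at instantiation, the same function could later be pointed at a per-instance container instead of the globals.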
Bugfix for Linux USB compat layer. Do not free non-generic FIFOs when
doing an alternate setting.
Cleanup USB IOCTL and USB reference handling.
Fix a corner case where USB-FS was left initialised after
setting a new configuration or alternate setting.
src/sys/dev/usb2/core/usb2_hub.c
Improvement: Check all USB HUB ports by default at least one time.
src/sys/dev/usb2/core/usb2_request.c
Bugfix: Make sure destination ASCII string is properly zero terminated
in all cases.
Improvement: Skip invalid characters instead of replacing with a dot.
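
A minimal sketch of the two string-handling points above: the destination is always zero terminated, and invalid characters are skipped rather than replaced. It assumes "invalid" means anything outside printable ASCII, which the message does not state; copy_printable_ascii is a hypothetical name, not a function from usb2_request.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, keeping only printable ASCII and skipping everything
 * else. dst is always NUL-terminated when dst_len is nonzero.
 * Returns the number of characters stored, excluding the terminator.
 */
size_t copy_printable_ascii(char *dst, size_t dst_len,
    const unsigned char *src, size_t src_len)
{
    size_t n = 0;
    size_t i;

    if (dst_len == 0)
        return 0;
    for (i = 0; i < src_len && n + 1 < dst_len; i++) {
        /* Assumption: 0x20..0x7e is the valid range. */
        if (src[i] >= 0x20 && src[i] <= 0x7e)
            dst[n++] = (char)src[i];
    }
    dst[n] = '\0';   /* terminate in all cases, including truncation */
    return n;
}
```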
jhb [Tue, 18 Nov 2008 23:19:43 +0000 (23:19 +0000)]
- Fix a typo in a comment.
- Whitespace fix.
- Remove #if 0'd BSD 4.x code for flushing busy buffers from a mountpoint
during an unmount. FreeBSD uses vflush() for this.
jhb [Tue, 18 Nov 2008 23:18:37 +0000 (23:18 +0000)]
When looking up the vnode for the device to mount the filesystem on,
ask NDINIT to return a locked vnode instead of letting it drop the
lock and return a referenced vnode and then relock the vnode a few
lines down. This matches the behavior of other filesystem mount routines.
jhb [Tue, 18 Nov 2008 21:01:54 +0000 (21:01 +0000)]
Allow device hints to wire the unit numbers of devices.
- An "at" hint now reserves a device name.
- A new BUS_HINT_DEVICE_UNIT method is added to the bus interface. When
determining the unit number of a device, this method is invoked to
let the bus driver specify the unit of a device given a specific
devclass. This is the only way a device can be given a name reserved
via an "at" hint.
- Implement BUS_HINT_DEVICE_UNIT() for the acpi(4) and isa(4) bus drivers.
Both of these busses implement this by comparing the resources for a
given hint device with the resources enumerated by ACPI/PnPBIOS and
wire a unit if the hint resources are a subset of the "real" resources.
- Use bus_hinted_children() for adding hinted devices on isa(4) busses
now instead of doing it by hand.
- Remove the unit kludging from sio(4) as it is no longer necessary.
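
The subset test described above (wire a unit when the hinted resources are a subset of the enumerated ones) can be sketched as follows. This is not the acpi(4) or isa(4) code: struct res, RES_IOPORT, RES_IRQ, res_present and hint_matches are hypothetical, and rejecting an empty hint set is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource record: a type tag plus a starting value. */
struct res {
    int type;
    unsigned long start;
};

enum { RES_IOPORT = 1, RES_IRQ = 2 };

/* True when `want` appears somewhere in the set. */
bool res_present(const struct res *set, size_t n, const struct res *want)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (set[i].type == want->type && set[i].start == want->start)
            return true;
    return false;
}

/*
 * True when every hinted resource also appears among the resources the
 * firmware enumerated. Assumption: a hint with no resources never matches.
 */
bool hint_matches(const struct res *hint, size_t nhint,
    const struct res *real, size_t nreal)
{
    size_t i;

    if (nhint == 0)
        return false;
    for (i = 0; i < nhint; i++)
        if (!res_present(real, nreal, &hint[i]))
            return false;
    return true;
}
```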
mav [Tue, 18 Nov 2008 13:24:38 +0000 (13:24 +0000)]
Set of powerd enhancements:
1. Make it more SMP polite. Previous version uses average CPU load that
often leads to load underestimation. It makes powerd with default
configuration unusable on systems with more than 2 CPUs. I propose to use
summary load instead of average one. IMO this is the best we can do without
specially tuned scheduler. Also, since measuring total load on SMP
systems is more useful than total idle, I have switched to it.
2. Make powerd's operation independent from number and size of frequency
levels. I have added an internal frequency counter which is translated into real
frequencies only on a last stage and only as good as gone. Some systems may
have only several power levels, while others - many of them, so adaptation
time with previous approach was completely different.
3. As part of the previous item I have changed adaptive mode to raise frequency on
demand up to 2 times and fall by 1/8 per time interval.
4. For desktop (AC-powered) systems I have added one more mode - "hiadaptive".
It raises frequency twice as fast, drops it 4 times slower, prefers twice
lower CPU load and has additional delay before leaving the highest frequency
after the period of maximum load. This mode was specially made to improve
interactivity of the systems where operation capabilities are more
significant than power consumption, but keeping maximum frequency all the
time is not needed.
5. I have reduced default polling interval from 1/2 to 1/4 of a second.
It is not so important for algorithm math now, but gives better system
interactivity.
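
To illustrate why a summed load is less prone to underestimation than an average, and what "raise up to 2 times, fall by 1/8" could look like, here is a rough sketch. It is not powerd's code: the thresholds LOAD_HIGH and LOAD_LOW, the unconditional doubling, and all function names are assumptions made for the example.

```c
#include <assert.h>

/* Sum of per-CPU loads, each given in percent. */
int load_sum(const int *cpu_pct, int ncpu)
{
    int i, total = 0;

    for (i = 0; i < ncpu; i++)
        total += cpu_pct[i];
    return total;
}

/* Average load; hides a single saturated CPU on a multi-CPU system. */
int load_avg(const int *cpu_pct, int ncpu)
{
    assert(ncpu > 0);
    return load_sum(cpu_pct, ncpu) / ncpu;
}

/* Hypothetical thresholds, in percent of one CPU. */
#define LOAD_HIGH 75
#define LOAD_LOW  50

/*
 * One adaptive step on an abstract frequency value: double it under high
 * load, drop it by 1/8 under low load, and clamp to the allowed range.
 */
int next_freq(int freq, int load, int min_freq, int max_freq)
{
    if (load > LOAD_HIGH)
        freq *= 2;
    else if (load < LOAD_LOW)
        freq -= freq / 8;
    if (freq > max_freq)
        freq = max_freq;
    if (freq < min_freq)
        freq = min_freq;
    return freq;
}
```

With four CPUs where one is fully busy and three are idle, the average reports 25% while the sum reports 100%, which is the underestimation the message refers to.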
marcel [Tue, 18 Nov 2008 05:55:58 +0000 (05:55 +0000)]
Partition type FS_UNUSED does not mean the partition entry
is unused. Unused partition entries have a partition size
of zero. Therefore, partitions can have type FS_UNUSED.
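
The rule stated above amounts to testing the size rather than the type. A small sketch, using a hypothetical part_entry struct rather than the real disklabel partition structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FS_UNUSED 0   /* type value; redefined here so the sketch stands alone */

/* Hypothetical partition entry, reduced to the fields the rule needs. */
struct part_entry {
    uint32_t size;     /* size in sectors; zero marks an unused entry */
    uint32_t offset;   /* starting sector */
    uint8_t  fstype;   /* filesystem type tag, may be FS_UNUSED */
};

/* An entry is in use when it has a nonzero size, whatever its type. */
bool part_entry_in_use(const struct part_entry *p)
{
    return p->size != 0;
}
```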
jhb [Tue, 18 Nov 2008 05:41:34 +0000 (05:41 +0000)]
When checking to see if another CPU is running its idle thread, examine
the thread running on the other CPU instead of the thread being placed on
the run queue.
marcel [Tue, 18 Nov 2008 04:04:01 +0000 (04:04 +0000)]
Use humanize_number(), rather than a home-grown algorithm for
formatting a number in a human-friendly way.
Note that with this commit a megabyte changed from 1000000 to
1048576 and an 80G disk is now printed as being 75G in size.
This is deliberate. It's consistent with the core of geom(8).
However, the original choice for a megabyte being 1000000 was
on purpose and matches what disk vendors put on the box. The
consistency is considered more important.
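
The 80G versus 75G figure follows from the change of unit. The arithmetic can be checked with a small helper; to_units is a hypothetical function, not humanize_number(), and rounding to the nearest whole unit is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Express a byte count in whole units, rounded to the nearest unit. */
uint64_t to_units(uint64_t bytes, uint64_t unit)
{
    assert(unit != 0);
    return (bytes + unit / 2) / unit;
}
```

A disk sold as 80G holds 80000000000 bytes. Divided by 1000000000 that is 80, while divided by 1073741824 (2^30) it is about 74.5, which rounds to 75.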
pjd [Mon, 17 Nov 2008 20:49:29 +0000 (20:49 +0000)]
Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This brings a huge amount of changes, I'll enumerate only user-visible changes:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows additional disks to be used for cache.
Huge performance improvements mostly for random read of mostly
static content.
- slog
Allows additional disks to be used for the ZFS Intent Log to speed up
operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by them. Be very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support to boot off of ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Previously, if a write request failed, the system panicked. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
raj [Mon, 17 Nov 2008 16:37:04 +0000 (16:37 +0000)]
gdb: Remove arm_pc_is_thumb_dummy() and related code.
This is basically an import of the following gdb change:
http://sourceware.org/ml/gdb-cvs/2005-03/msg00143.html (which in effect fixes
problems with gracefully closing down the non-Thumb program being debugged).
imp [Mon, 17 Nov 2008 01:32:29 +0000 (01:32 +0000)]
Overhaul of CIS parsing, next step: keep a cached copy of the CIS,
read before we configure the card, so we can implement
/dev/cardbus*.cis. Also, do this on a per-child basis, so we now have
a different name than before. I think I'll have to fix that for some
legacy tools to keep working.
I can now do a dumpcis on my running atheros card and have it still work!
kib [Sun, 16 Nov 2008 21:56:29 +0000 (21:56 +0000)]
Revert r184118. There is actually code in the kernel, for instance in
kern_unlinkat(), that expects that vn_start_write() actually fills the mp
even when the call failed.
As Tor noted, that pattern relies on the type stability of the mount
points, as well as that suspended mount points are never freed and
V_XSLEEP is always passed to vn_start_write() when called on a freed
mount point.
marius [Sun, 16 Nov 2008 19:53:49 +0000 (19:53 +0000)]
- Allow the front-end to specify that iommu(4) should disable
rerun of the streaming cache for silicon bug workarounds.
- Announce the presence of a streaming cache on attach for
informational purposes.
- For performance reasons don't do unnecessary flushes of the
streaming cache when coherent mappings are synced.
- Fix some minor style issues.
marius [Sun, 16 Nov 2008 19:30:17 +0000 (19:30 +0000)]
Use the spitfire VIS block copy/zero functions also with cheetah-
class CPUs. In theory one could also use versions additionally
taking advantage of the prefetch cache with cheetah-class CPUs,
in my worldstone runs these either didn't provide extra speedup
(USIII+) in comparison to the existing spitfire versions or were
even slightly slower (USIIIi) though, so they aren't committed
for now.
The basic problem leading to the VIS-based copy/zero functions
being initially disabled for cheetah-class CPUs was solved by
letting cheetah_init() clear DCR_IFPOE.
marius [Sun, 16 Nov 2008 19:28:55 +0000 (19:28 +0000)]
Micro-optimize spitfire_block_{copy,zero}():
- Predict the loop as taken as it's more likely that there's still
data to copy and memory to zero respectively.
- Don't waste the delay slot.
marius [Sun, 16 Nov 2008 18:30:16 +0000 (18:30 +0000)]
- For maximum flexibility, sparc64 supports BUS_DMA_COHERENT also
with bus_dmamap_create() and not only bus_dmamem_alloc() so move
the description of this flag up accordingly in order to document
this fact. While at it, refine this description with an application
example.
- Reword the description of BUS_DMA_NOCACHE as this flag is also
implemented on sparc64.
ed [Sun, 16 Nov 2008 14:43:33 +0000 (14:43 +0000)]
Add a comment to utmp.h about the sizes of UT_HOSTSIZE and UT_LINESIZE.
UT_HOSTSIZE and UT_LINESIZE are too small right now. If we ever bump
UT_HOSTSIZE, we must not forget to increase UT_LINESIZE as well. If we
add a comment, we're pretty sure we increase both values at the same
time.
peter [Sat, 15 Nov 2008 22:23:07 +0000 (22:23 +0000)]
On i386, the primary function that SYSCALL() generates is with the
__sys_ prefix. Make END() match. This didn't cause a compile error, but
the function size is attached to the .weak symbol, not the real one.