Jeff Roberson [Thu, 7 Apr 2011 03:19:10 +0000 (03:19 +0000)]
- Don't invalidate jnewblks immediately upon discovering that the block
will be removed. Permit the journal to proceed so that we don't leave
a rollback in a cg for a very long time as this can cause terrible perf
problems in low memory situations.
Jung-uk Kim [Wed, 6 Apr 2011 23:59:59 +0000 (23:59 +0000)]
Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These
functions are implemented with CMPXCHG8B instruction where it is available,
i.e., all Pentium-class and later processors. Note this instruction is
also used for atomic_store_rel_64() because a simple XCHG-like instruction
for 64-bit memory access does not exist, unfortunately. If the processor
lacks the instruction, i.e., 80486-class CPUs, two 32-bit loads/stores are
performed with interrupts temporarily disabled, assuming it does not support
SMP. Although this assumption may be a little naive, it is true in reality.
This implementation is inspired by Linux.
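As a rough illustration of the compare-and-swap trick described above, here is a minimal portable sketch using C11 <stdatomic.h> rather than the i386 inline assembly from the commit. The function names load64_via_cas() and store64_via_cas() are hypothetical, and the default sequentially consistent ordering used here is stronger than the acquire/release semantics the real functions promise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Load a 64-bit value using only compare-and-swap.  The CAS compares *p
 * against the guess in 'expected'.  If they differ, the current value is
 * copied into 'expected'; if they match, the same value is written back.
 * Either way 'expected' ends up holding an atomically read value.
 */
static uint64_t
load64_via_cas(_Atomic uint64_t *p)
{
	uint64_t expected = 0;

	assert(p != NULL);
	(void)atomic_compare_exchange_strong(p, &expected, expected);
	return (expected);
}

/*
 * Store a 64-bit value using only compare-and-swap.  Each failed attempt
 * refreshes 'old' with the current contents, so the loop ends as soon as
 * no other writer slips in between the compare and the exchange.
 */
static void
store64_via_cas(_Atomic uint64_t *p, uint64_t value)
{
	uint64_t old = 0;

	assert(p != NULL);
	while (!atomic_compare_exchange_weak(p, &old, value))
		continue;
}
```

The store needs a loop because, as the message notes, there is no plain 64-bit exchange to fall back on; the load needs only one attempt.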
Complete WITHOUT_CXX support. It implies WITHOUT_GROFF and
WITHOUT_CLANG.
Don't build clang bootstrap/build-tools depending on this flag. We also
keep gperf, devd and libstdc++ around to prevent foot-shooting and to
make this a two-way street.
Andrew Gallatin [Wed, 6 Apr 2011 15:45:32 +0000 (15:45 +0000)]
Implement mxge_init()
This fixes a long standing bug in mxge(4) where "ifconfig mxge0 $IP"
did not bring the interface into a RUNNING state, like it does on
most (all?) other FreeBSD NIC drivers.
Thanks to gnn for mentioning the bug, and yongari for pointing out that
ether_ioctl() invokes ifp->if_init() in SIOCSIFADDR.
Correct 'list scan' description in the examples. The previous description
was incorrect - 'list scan' does not actually do a scan, but instead lists
the results of the background 'scan' cache.
Submitted by: Fabian Keil (freebsd-listen of fabiankeil de) (via email)
Discussed with: bschmidt
MFC after: 3 days
- Removed multiple console error messages and replaced with statistic
counters to reduce spew.
- Fixed a TSO problem when an mbuf contains both header and payload in
the same cluster.
Of course, strerror_r() may still fail with ERANGE.
Although the POSIX specification said this could fail with EINVAL and
doing this likely indicates invalid use of errno, most other
implementations permitted it, various POSIX testsuites require it to
work (matching the older sys_errlist array) and apparently some
applications depend on it.
Jack F Vogel [Tue, 5 Apr 2011 21:55:43 +0000 (21:55 +0000)]
Important update for the igb driver:
- Add the change made in em so that the actual unrefreshed number
of descriptors is used as a basis in rxeof on the way out
to determine if more refresh is needed. NOTE: there is a
difference in the ring setup in igb, this is not accidental,
it is necessitated by hardware behavior, when you reset the
newer adapters it will not let you write RDH, it ALWAYS sets
it to 0. Thus the way em does it is not possible.
- Change the sysctl handling of flow control, it will now make
the change dynamically when the variable setting changes rather
than requiring a reset.
- Change the eee sysctl naming, validation found the old unintuitive :)
- Last but not least, some important performance tweaks in the TX
path, I found that UDP behavior could be drastically hindered or
improved with just small changes in the start loop. What I have
here is what testing has shown to be the best overall. It's interesting
to note that changing the clean threshold to start at a full half of
the ring, made a BIG difference in performance. I hope that this
will prove to be advantageous for most workloads.
Be far more persistent in reclaiming blocks and inodes before giving
up and declaring a filesystem out of space. Especially necessary when
running on a small filesystem. With this improvement, it should be
possible to use soft updates on a small root filesystem.
Kudos to: Peter Holm
Testing by: Peter Holm
MFC: 2 weeks
* Add the readline(3) API to libedit. The libedit versions of
{readline,history}.h are in /usr/include/edit so as to not conflict with
the GNU libreadline versions. To use the libedit readline(3) one should
add "-I/usr/include/edit" to their Makefile
(spelled "-I${DESTDIR}/${INCLUDEDIR}/edit" within the FreeBSD source tree).
* Enable its use in the BSD licensed utilities that support readline(3).
* To make it easier to sync libedit development with NetBSD, histedit.h
is moved into libedit's directory as history has shown we keep merging
it into that location.
Jung-uk Kim [Tue, 5 Apr 2011 18:40:19 +0000 (18:40 +0000)]
Lower the bar for ACPI-fast on real machines slightly. Empirical evidence
shows that there are perfectly working PM timers with occasional "hiccups",
probably because of an SMI. Now we ignore the maximum if it happens once in
the test loop and the width is small enough. Also, relax normal width a bit
to count in a boundary case.
Add initial jumbo frame support for BCM5714/BCM5715 and BCM5780.
Unlike other controllers which have more advanced jumbo support,
these controllers have one send ring, one standard receive producer
ring and one receive return ring. In order to receive jumbo frames
on the controllers, driver now will increase Rx buffer size to 9k.
Two Rx modes are supported on these controllers and I chose
standard Rx BDs over extended Rx BDs. The extended Rx BD mode
allows up to 4 segmentations for each Rx BDs such that kernel does
not have to allocate large buffer of contiguous memory for
receiving. The extended Rx BD mode is already used on controllers
that have separate jumbo receive ring. However, using extended Rx
BDs on BCM5714/BCM5715/BCM5780 reduces the number of Rx BDs to 256
entries which in turn may reduce the performance. Also UMA backed
page allocator for jumbo frame returns contiguous memory so using
extended Rx BD has no advantage on FreeBSD unless highly customized
local allocator implemented in driver is used.
To use jumbo buffers in standard receive ring, Rx buffer allocation
handler was changed to allocate MJUM9BYTES sized mbuf.
John Baldwin [Tue, 5 Apr 2011 14:19:05 +0000 (14:19 +0000)]
Add the ability to manage the state of write caching when the battery
back-up is missing or dead. The current state of this field is reported
in 'mfiutil cache <volume>' and can be adjusted via
'mfiutil cache <volume> bad-bbu-write-cache <enable|disable>'. This
setting should generally be disabled to avoid data loss.
Extend the DDB command "watchdog" with the ability to specify a timeout
value.
The timeout is expressed in the form T(N) = (2^N * nanoseconds) and can
be easily extracted from the watchdog interface as a WD_TO_* macro.
That new functionality is supposed to fix re-entering the kernel from DDB
re-enabling the watchdog again (previously disabled) and also offer the
possibility to break for deadlocked DDB commands.
Please note that backward compatibility is retained.
Sponsored by: Sandvine Incorporated
Approved by: des
MFC after: 10 days
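To make the T(N) = 2^N nanoseconds encoding above concrete, here is a small sketch that converts an exponent into a duration. The helper name wd_timeout_ns() is hypothetical and is not part of the watchdog interface; only the formula comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Timeout in nanoseconds for exponent n, following T(n) = 2^n ns.
 * The exponent must fit in a 64-bit shift.
 */
static uint64_t
wd_timeout_ns(unsigned int n)
{
	assert(n < 64);
	return (UINT64_C(1) << n);
}
```

For instance, N = 30 yields 1073741824 ns, a little over one second, and N = 34 yields roughly 17 seconds.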
Warner Losh [Tue, 5 Apr 2011 08:49:47 +0000 (08:49 +0000)]
Make clang default on x86 and powerpc, but not on other architectures.
Make fdt default on arm and powerpc.
This now includes cross compiled targets, where before we tried to
make it host-based.
Also, move the lists of default yes and no options to a variable.
In general, only build tools should get this treatment in bsd.own.mk.
Also, the use of TARGET* in the bsd.*mk files is discouraged, but
necessary here due to the ordering of things in buildworld. We make
the native case work by testing MACHINE_ARCH after TARGET_ARCH.
Adrian Chadd [Tue, 5 Apr 2011 06:46:07 +0000 (06:46 +0000)]
if_arge has had a strange bug that only appears during high traffic
levels. TX would hang, RX wouldn't. A bit of digging showed the interface
send queue was full, but IFF_DRV_OACTIVE was clear and the hardware TX
queue was empty.
It turns out that there wasn't a check to drain the interface send
queue once hardware TX had completed, so if the interface send queue
had filled up in the meantime, subsequent packets would be dropped
by the higher layers and if_start (and thus arge_start()) would never
be called.
The fix is simple - call arge_start_locked() in the software interrupt
handler after the hardware TX queue has been handled or a TX underrun
occurred. This way the interface send queue gets drained.
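The hang and the fix can be modelled with a toy queue. This is only a sketch under stated assumptions: struct fake_nic, the fake_* functions and both queue sizes are hypothetical stand-ins, not the if_arge code, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_HW_RING	4	/* hypothetical hardware TX ring size */
#define FAKE_SNDQ_MAX	8	/* hypothetical interface send queue limit */

struct fake_nic {
	int hw_inflight;	/* packets handed to the hardware TX ring */
	int sndq_len;		/* packets waiting in the interface send queue */
	int dropped;		/* packets refused because the send queue was full */
};

/* Move packets from the send queue to the hardware ring while room remains. */
static void
fake_start(struct fake_nic *sc)
{
	while (sc->sndq_len > 0 && sc->hw_inflight < FAKE_HW_RING) {
		sc->sndq_len--;
		sc->hw_inflight++;
	}
}

/*
 * Upper-layer enqueue: a full send queue drops the packet and, crucially,
 * never reaches fake_start().
 */
static bool
fake_enqueue(struct fake_nic *sc)
{
	assert(sc != NULL);
	if (sc->sndq_len >= FAKE_SNDQ_MAX) {
		sc->dropped++;
		return (false);
	}
	sc->sndq_len++;
	fake_start(sc);
	return (true);
}

/*
 * TX completion handler.  With drain_sndq false this models the old
 * behaviour; with true it models the fix of restarting transmission.
 */
static void
fake_tx_intr(struct fake_nic *sc, bool drain_sndq)
{
	sc->hw_inflight = 0;
	if (drain_sndq)
		fake_start(sc);
}
```

Without the drain, a full send queue and an empty hardware ring is a stable state, because every new packet is dropped before the start routine can run. That matches the symptom described above.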
Jung-uk Kim [Mon, 4 Apr 2011 22:56:33 +0000 (22:56 +0000)]
Use cpu_ticks() for get_cyclecount(9) rather than checking existence of TSC
at run-time on i386. cpu_ticks() is set to use RDTSC early enough on i386
where it is available. Otherwise, cpu_ticks() is driven by the current
timecounter hardware as binuptime(9) does. This also avoids unnecessary
namespace pollution from <machine/cputypes.h>.
Roman Divacky [Mon, 4 Apr 2011 18:23:55 +0000 (18:23 +0000)]
Build boot2 with -mregparm=3, i.e. pass up to 3 arguments via registers.
This modifies CFLAGS and tweaks sio.S to use the new calling convention.
The sio_init() and sio_putc() prototypes are modified so that other
users of this code know the correct calling convention.
This makes the code smaller when compiled with clang.
Reviewed by: jhb
Tested by: me and Freddie Cash <fjwcash gmail com>
Jung-uk Kim [Mon, 4 Apr 2011 17:00:50 +0000 (17:00 +0000)]
Lower the bar for ACPI-fast on virtual machines. The current logic depends
on the fact that real hardware has almost fixed cost to read the ACPI timer.
It is virtually always false for hardware emulation and it makes no sense to
read it multiple times, which is already quite expensive for full emulation.
Fix a long standing bug where file_load() passes down the global loadaddr
to the l_load() method in the file_formats structure, while being passed
an address as an argument (dest). With file_load() calling arch_loadaddr()
now, this bug is a little bit more significant.
Adrian Chadd [Mon, 4 Apr 2011 14:52:31 +0000 (14:52 +0000)]
Add a HAL capability bit for supporting self-linked RX descriptors and disable it for the 11n chipsets.
From the ath9k source:
==
11N: we can no longer afford to self link the last descriptor.
MAC acknowledges BA status as long as it copies frames to host
buffer (or rx fifo). This can incorrectly acknowledge packets
to a sender if last desc is self-linked.
==
Since this is useful for pre-AR5416 chips that communicate PHY errors
via error frames rather than by on-chip counters, leave the support
in there, but disable it for AR5416 and later.
When removing ifnets, we should first remove the reference to ifnet
from the interface index, then decrease refcount, not vice versa.
Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.
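The ordering rule can be sketched with a toy index table. Everything here is hypothetical (struct fake_ifnet, fake_index and the fake_* helpers), and the locking that the real code does under IFNET_WLOCK() is left out, so this shows only the order of operations, not the race itself.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_IDX_MAX	8	/* hypothetical size of the interface index */

struct fake_ifnet {
	int refcount;	/* outstanding references, including the index's own */
	int freed;	/* number of times the object was "freed" */
};

static struct fake_ifnet *fake_index[FAKE_IDX_MAX];

/* Look up an interface by index and take a reference if it is published. */
static struct fake_ifnet *
fake_byindex_ref(int idx)
{
	struct fake_ifnet *ifp;

	assert(idx >= 0 && idx < FAKE_IDX_MAX);
	ifp = fake_index[idx];
	if (ifp != NULL)
		ifp->refcount++;
	return (ifp);
}

/* Drop one reference; the last reference "frees" the object. */
static void
fake_rele(struct fake_ifnet *ifp)
{
	assert(ifp->refcount > 0);
	if (--ifp->refcount == 0)
		ifp->freed++;
}

/* Removal in the corrected order: unpublish first, then drop the reference. */
static void
fake_free(int idx)
{
	struct fake_ifnet *ifp;

	assert(idx >= 0 && idx < FAKE_IDX_MAX);
	ifp = fake_index[idx];
	assert(ifp != NULL);
	fake_index[idx] = NULL;		/* 1. no new lookups can find it */
	fake_rele(ifp);			/* 2. drop the index's reference */
}
```

Because the entry leaves the index before its reference is dropped, a lookup can never return an object whose count has already reached zero, so the object is freed exactly once.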
Use the new arch_loadaddr I/F to align ELF objects to PBVM page
boundaries. For good measure, align all other objects to cache
line boundaries.
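Aligning a load address as described above usually comes down to rounding up to a power-of-two boundary. A minimal sketch follows; the helper name align_up() is hypothetical, and the actual PBVM page and cache line sizes are not given in this log, so any values passed in are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.  The bit trick only works
 * when align is a non-zero power of two, which the assertion enforces.
 */
static uint64_t
align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr + align - 1) & ~(align - 1));
}
```

An address that is already aligned is returned unchanged, so the helper can be applied unconditionally to every object being loaded.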
Use the new arch_loadseg I/F to keep track of kernel text and
data so that we can wire as much of it as is possible. It is
the responsibility of the kernel to link critical (read IVT
related) code and data at the front of the respective segment
so that it's covered by TRs before the kernel has a chance to
add more translations.
Use a better way of determining whether we're loading a legacy
kernel or not. We can't check for the presence of the PBVM page
table, because we may have unloaded that kernel and loaded an
older (legacy) kernel after that. Simply use the latest load
address for it.
Add 2 new archsw interfaces:
1. arch_loadaddr - used by platform code to adjust the address at which
the object gets loaded. Implement PC98 using this new interface instead
of using conditional compilation. For ELF objects the ELF header is
passed as the data pointer. For raw files it's the filename. Note that
ELF objects are first considered as raw files.
2. arch_loadseg - used by platform code to keep track of actual segments,
so that (instruction) caches can be flushed or translations can be
created. Both the ELF header as well as the program header are passed
to allow platform code to treat the kernel proper differently from any
additional modules and to have all the relevant details of the loaded
segment (e.g. protection).
- Improvements to USB PF solution
- Add more fields for USB device and host mode
- Add more information to USB PF header so that decoding
can easily be done by software analyzer tools like
Wireshark.
- Optimise usbdump to display USB streams in text format
more efficiently.
- Software using USB PF must be recompiled after
this commit, due to structure changes.
In g_eli_read_done() and g_eli_write_done(), for a bio with
bio_children > 1, g_destroy_bio() is never called and the bio
leaks. Fix this by calling g_destroy_bio() earlier, before the check.
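The shape of the fix can be shown with a mock completion routine. This is a sketch only: struct fake_bio and the fake_* functions are hypothetical stand-ins for the GEOM bio API, and live_bios exists just to make a leak visible.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_bio {
	struct fake_bio *parent;	/* NULL for the top-level request */
	int children;			/* number of child requests issued */
	int inbed;			/* number of child requests completed */
};

static int live_bios;	/* allocated but not yet destroyed */

static struct fake_bio *
fake_bio_new(struct fake_bio *parent)
{
	struct fake_bio *bp;

	bp = calloc(1, sizeof(*bp));
	assert(bp != NULL);
	bp->parent = parent;
	if (parent != NULL)
		parent->children++;
	live_bios++;
	return (bp);
}

static void
fake_bio_destroy(struct fake_bio *bp)
{
	free(bp);
	live_bios--;
}

/*
 * Child completion.  The child is destroyed before the "are all children
 * done?" check, so the early return can no longer leak it.  Returns 1 once
 * the last child has completed, 0 otherwise.
 */
static int
fake_child_done(struct fake_bio *bp)
{
	struct fake_bio *pbp = bp->parent;

	fake_bio_destroy(bp);
	pbp->inbed++;
	if (pbp->inbed < pbp->children)
		return (0);
	return (1);
}
```

If the destroy call sat after the early return instead, every child except the last would stay allocated, which is the leak described above.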
Adrian Chadd [Sun, 3 Apr 2011 14:39:55 +0000 (14:39 +0000)]
Import the initial CPU support for the MIPS RALink RT305x SoC.
This is a MIPS4KC CPU with various embedded peripherals, including
wireless and ethernet support.
This commit includes the platform, UART, ethernet MAC and GPIO support.
The interrupt-driven GPIO code is disabled for now pending GPIO changes
from the submitter.
Adrian Chadd [Sun, 3 Apr 2011 11:55:48 +0000 (11:55 +0000)]
Import nvram2env, a device driver which imports various NVRAM-style
environments into the kernel environment.
The eventual aim is to replace these with specific drivers for
the various bootloaders (redboot, uboot, etc.) This however will
work for the time being until it can be properly addressed.
Change for Africa/Casablanca:
- The 3rd April 2011 at 00:00:00, [it] will be 3rd April 1:00:00
- The 31st July 2011 at 00:59:59, [it] will be 31st July 00:00:00
Update for SouthAmerica/Chili:
- Chile's clocks will go back an hour this year on the 7th of May instead
of this Saturday. They will go forward again the 3rd Saturday in
August, not in October as they have since 1968. This is a pilot plan
which will be reevaluated in 2012.
Jeff Roberson [Sat, 2 Apr 2011 21:52:58 +0000 (21:52 +0000)]
Fix problems that manifested from filesystem full conditions:
- In softdep_revert_mkdir() find the dotaddref before we attempt to cancel
the jaddref so we can make assumptions about where the dotaddref is on
the list. cancel_jaddref() does not always remove items from the list
anymore.
- Always set GOINGAWAY on an inode in softdep_freefile() if DEPCOMPLETE
was never set. This ensures that dependencies will continue to be
processed on the inowait/bufwait list and is more an artifact of
the structure of the code than a pure ordering problem.
- Always set DEPCOMPLETE on canceled jaddrefs so that they can be freed
appropriately. This normally occurs when the refs are added to the
journal but if they are canceled before this point the state would
never be set and the dependency could never be freed.
There is a generic problem with the shims for ioctls that receive
pointers to the usermode data areas in the data argument. We either have
to modify the handler to accept UIO_USERSPACE/UIO_SYSSPACE indicator, or
allocate and fill a usermode memory for data buffer in the host format.
The change goes the second route, in particular because we do not need
to modify the handler.
Increase default timeout from 5 seconds to 20 seconds. 5 seconds is definitely
too short under heavy load and I was experiencing those timeouts in my recent
tests.
When we are operating on a blocking socket and get EAGAIN on send(2) or recv(2)
this means that the request timed out. Translate the meaningless EAGAIN to
ETIMEDOUT to give the administrator a hint that he might need to increase the
timeout in the configuration file.
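A minimal sketch of that translation, assuming a blocking socket that has a receive timeout configured. The names blocking_sock_errno() and recv_blocking() are hypothetical and are not taken from the HAST sources.

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * On a blocking socket EAGAIN (or EWOULDBLOCK) can only mean that a
 * configured timeout expired, so report it as ETIMEDOUT instead.
 */
static int
blocking_sock_errno(int error)
{
	assert(error != 0);
	if (error == EAGAIN || error == EWOULDBLOCK)
		return (ETIMEDOUT);
	return (error);
}

/*
 * recv(2) wrapper for blocking sockets: retries on EINTR and rewrites
 * errno through blocking_sock_errno() on failure.
 */
static ssize_t
recv_blocking(int sock, void *buf, size_t size)
{
	ssize_t done;

	do {
		done = recv(sock, buf, size, 0);
	} while (done == -1 && errno == EINTR);
	if (done == -1)
		errno = blocking_sock_errno(errno);
	return (done);
}
```

Other errors pass through untouched, so callers that already handle them see no difference.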
Declare directions for sockets between primary and secondary.
In HAST we use two sockets - one for only sending the data and one for only
receiving the data.
Allow disabling sends or receives on a socket using shutdown(2) by
interpreting a NULL 'data' argument passed to proto_common_send() or
proto_common_recv() as a request to do so.
Because ggatel(8) operates on local GEOM providers, use unlimited queue size in
GEOM GATE to fix the issue described in r220264. This also means that we no
longer need the -q option, remove it. Don't bother leaving it as a no-op, as
ggatel(8) is just an example utility.
GEOM has an internal mechanism to deal with ENOMEM errors returned via
g_io_deliver(). In such a case it increases the 'pace' counter on each ENOMEM and
reschedules the request. The 'pace' counter is decreased for each request going
down, but as long as 'pace' is greater than zero, GEOM will handle at most 10
requests per second. For GEOM GATE users that proxy to local GEOM providers
(like ggatel(8) and HAST) we can end up with an almost permanent slowdown of the
GEOM down queue. This is because once we reach the GEOM GATE queue limit, we return
ENOMEM to GEOM. This means that we have, e.g., 1024 I/O requests in the GEOM
GATE queue. To make room in the queue and stop returning ENOMEM we need to
process the requests of course, but those requests are handled by userland
daemons that handle them by also reading/writing from/to local GEOM providers.
For example with HAST, a new request comes to /dev/hast/data, which is a GEOM
GATE provider. GEOM GATE passes the request to hastd(8) and hastd(8)
reads/writes from/to /dev/da0. Once we reach GEOM GATE queue limit, to free up
a slot in GEOM GATE queue, hastd(8) has to read/write from/to /dev/da0, but
this request will also be very slow, because GEOM now slows down all the
requests. We end up with full queue that we can unload at the speed of 10
requests per second. This simply looks like a deadlock.
Fix it by allowing userland daemons that work with both GEOM GATE and local
GEOM providers to specify unlimited queue size, so GEOM GATE will never return
ENOMEM to the GEOM.
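The effect of an unlimited queue on the ENOMEM path can be sketched as follows. This is an illustration only: struct fake_gate, fake_gate_enqueue() and the FAKE_QUEUE_UNLIMITED sentinel are hypothetical, and how GEOM GATE actually represents "unlimited" is not stated in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAKE_QUEUE_UNLIMITED	0u	/* hypothetical "no limit" sentinel */

struct fake_gate {
	unsigned int queue_count;	/* requests currently queued */
	unsigned int queue_size;	/* limit, or FAKE_QUEUE_UNLIMITED */
};

/*
 * Queue one request.  A bounded queue answers ENOMEM once it is full,
 * which is what triggers the throttling described above; an unlimited
 * queue never does.
 */
static int
fake_gate_enqueue(struct fake_gate *g)
{
	assert(g != NULL);
	if (g->queue_size != FAKE_QUEUE_UNLIMITED &&
	    g->queue_count >= g->queue_size)
		return (ENOMEM);
	g->queue_count++;
	return (0);
}
```

With a limit of 1024 and the 10 requests per second throttle, draining a full queue would take on the order of 100 seconds, which is why the stall looks like a deadlock.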
Gordon Tetlow [Sat, 2 Apr 2011 05:01:09 +0000 (05:01 +0000)]
Overhaul locale handling.
Use locale(1) to determine the locale instead of trying to hand roll it.
Correctly construct groff call based on charset and locale independently,
not the mix between the two.