yongari [Fri, 8 Oct 2010 18:19:05 +0000 (18:19 +0000)]
MFC r213306:
Rename rl_setmulti() to rl_rxfilter() as rl_rxfilter() will handle
IFF_ALLMULTI/IFF_PROMISC as well as multicast filter configuration.
Rewrite RX filter logic to reduce number of register accesses and
make it handle promiscuous/allmulti toggling without controller
reinitialization.
Previously rl(4) counted on controller reinitialization to reprogram
promiscuous configuration but r211767 resulted in avoiding
controller reinitialization whenever promiscuous mode is toggled.
To address this, keep track of driver's view of interface state and
handle IFF_ALLMULTI/IFF_PROMISC changes without reinitializing
controller. This should fix a regression introduced in r211767.
While I'm here remove unnecessary variable reassignment in ioctl
handler.
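
A minimal sketch of the approach described above, not the actual rl(4)
code: compute the whole RX filter state from the interface flags in one
pass so each register is written once, and compare the driver's saved
flags with the new ones to decide whether reprogramming is needed. All
X_* names and bit values are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical flag and register bit values, for illustration only. */
#define X_IFF_PROMISC   0x0100u
#define X_IFF_ALLMULTI  0x0200u

#define X_RXCFG_ALLPHYS 0x01u	/* accept all unicast frames */
#define X_RXCFG_INDIV   0x02u	/* accept frames for our station address */
#define X_RXCFG_MULTI   0x04u	/* accept multicast frames */
#define X_RXCFG_BROAD   0x08u	/* accept broadcast frames */

struct x_rxfilter {
	uint32_t rxcfg;		/* value for the RX configuration register */
	uint32_t hashes[2];	/* 64-bit multicast hash table */
};

/* Build the complete filter state so the caller writes each register once. */
static struct x_rxfilter
x_rxfilter_compute(unsigned int if_flags, const uint32_t mc_hashes[2])
{
	struct x_rxfilter f = { X_RXCFG_INDIV | X_RXCFG_BROAD, { 0, 0 } };

	if (if_flags & (X_IFF_PROMISC | X_IFF_ALLMULTI)) {
		if (if_flags & X_IFF_PROMISC)
			f.rxcfg |= X_RXCFG_ALLPHYS;
		/* Accept every multicast frame: fill the hash table. */
		f.rxcfg |= X_RXCFG_MULTI;
		f.hashes[0] = f.hashes[1] = 0xFFFFFFFFu;
		return (f);
	}
	if (mc_hashes[0] != 0 || mc_hashes[1] != 0) {
		f.rxcfg |= X_RXCFG_MULTI;
		f.hashes[0] = mc_hashes[0];
		f.hashes[1] = mc_hashes[1];
	}
	return (f);
}

/* Nonzero when only a filter reprogram, not a reinit, is called for. */
static int
x_rxfilter_changed(unsigned int old_flags, unsigned int new_flags)
{
	return (((old_flags ^ new_flags) &
	    (X_IFF_PROMISC | X_IFF_ALLMULTI)) != 0);
}
```

An ioctl handler following this pattern would save the interface flags
after each call and invoke the filter routine only when
x_rxfilter_changed() reports a difference.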
emaste [Fri, 8 Oct 2010 14:59:14 +0000 (14:59 +0000)]
MFC r213013:
Move test for zero bufp or size before rseq and wseq calculation. This
avoids spinning in an infinite loop for some (possibly corrupt?) core
files at work.
emaste [Fri, 8 Oct 2010 14:56:39 +0000 (14:56 +0000)]
MFC r212570:
Allow a kernel config to specify a set but empty value via
'makeoptions OPTION=' for consistency with the make commandline.
Previously 'makeoptions WERROR=' would result in a syntax error; now
it produces the same effect as 'makeoptions WERROR'. Both forms now
result in 'WERROR=' in the generated Makefile.
delphij [Thu, 7 Oct 2010 00:36:58 +0000 (00:36 +0000)]
MFC r211059:
Address an edge condition that we found at work, where the carp(4)
interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at
cold boot. This behavior is not observed when carp(4) interface
is created slightly later, when the underlying interface is fully
up.
Before this change, what happens at boot is roughly:
- ifconfig creates em0 interface;
- ifconfig clones a carp device using em0;
(em0's link state is DOWN at this point)
- carp state: INIT -> BACKUP [*]
- carp state: BACKUP -> MASTER
- [Some negotiation between em0 and switch]
- em0 kicks up link state change event
(em0's link state is now UP at this point)
- do_link_state_change() -> carp_carpdev_state()
- carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+]
- carp state: INIT -> BACKUP
- carp state: BACKUP -> MASTER
At the [*] stage, em0 has not received any broadcast message from other
nodes, so carp(4) assumes our node is the master and sets the link
state to "UP" after becoming a master. At [+], the master status
is forcibly set to "INIT", then an election is held, after which our
node would actually become a master.
We believe that at the [*] stage, the master status should remain as
"INIT" since the underlying parent interface's link state is not up.
Obtained from: iXsystems, Inc.
Reported by: jpaetzel
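
The intended rule can be sketched as a small transition function. This
is an illustration only, not the carp(4) implementation; the X_* enums
and x_carp_on_link_change() are hypothetical names.

```c
enum x_link { X_LINK_DOWN, X_LINK_UP };
enum x_state { X_INIT, X_BACKUP, X_MASTER };

/*
 * Decide the state after creation or after a parent link state change.
 * While the parent link is down, stay in INIT and do not start an
 * election, so the node cannot promote itself to master at [*].
 */
static enum x_state
x_carp_on_link_change(enum x_state cur, enum x_link parent)
{
	if (parent != X_LINK_UP)
		return (X_INIT);
	/* Link is up: a node in INIT joins the election as a backup. */
	if (cur == X_INIT)
		return (X_BACKUP);
	return (cur);
}
```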
delphij [Thu, 7 Oct 2010 00:29:07 +0000 (00:29 +0000)]
MFC r213044:
In the past, gunzip(1) called write() after each inflate return. This is
not optimal from a performance standpoint, since the write buffer is
not necessarily filled up when the inflate routine reaches the
end of the input buffer and it is not the end of file.
This problem gets uncovered by trying to pipe gunzip -c output to
a GEOM device directly, which requires writes to be a multiple of the
sector size.
Sponsored by: iXsystems, Inc.
Reported by: jpaetzel
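
A minimal sketch of the buffering idea, assuming a fixed 4096-byte
output buffer: data is accumulated and written only when the buffer is
full, with one explicit flush at end of file. This is not the gunzip(1)
code; x_outbuf, x_put() and x_flush() are hypothetical names.

```c
#include <sys/types.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define X_BUFLEN 4096	/* assumed to be a multiple of the sector size */

struct x_outbuf {
	int fd;
	size_t used;
	unsigned char data[X_BUFLEN];
};

/* Write out whatever is buffered, retrying after short writes. */
static int
x_flush(struct x_outbuf *ob)
{
	size_t off = 0;

	while (off < ob->used) {
		ssize_t w = write(ob->fd, ob->data + off, ob->used - off);

		if (w < 0)
			return (-1);
		off += (size_t)w;
	}
	ob->used = 0;
	return (0);
}

/* Append inflated data; write only when the buffer is completely full. */
static int
x_put(struct x_outbuf *ob, const void *p, size_t len)
{
	const unsigned char *src = p;

	while (len > 0) {
		size_t room = X_BUFLEN - ob->used;
		size_t n = len < room ? len : room;

		memcpy(ob->data + ob->used, src, n);
		ob->used += n;
		src += n;
		len -= n;
		if (ob->used == X_BUFLEN && x_flush(ob) != 0)
			return (-1);
	}
	return (0);
}
```

The caller would invoke x_put() after each inflate return and x_flush()
once at end of file, so only the final write can be shorter than the
buffer.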
kib [Wed, 6 Oct 2010 10:00:37 +0000 (10:00 +0000)]
MFC r212998:
For sparc64 relocations that directly put bits of the symbol value into
the location, apply elf_relocaddr to the symbol value to have right
values for the symbols from dpcpu segment.
marius [Mon, 4 Oct 2010 20:13:19 +0000 (20:13 +0000)]
MFC: r213105, r213147
Improve r56796; the reply handler actually may remove the request from
the chain in which case it shouldn't be removed twice.
Reported by: Staale Kristoffersen
marius [Mon, 4 Oct 2010 20:02:48 +0000 (20:02 +0000)]
MFC: r213102
Remove the duplicate logging of failed read requests, whose error message
also was inappropriate as it triggered for every EACCESS and ENOTFOUND, not
just the case the -n option is intended to deal with and thus really spammed
us with ~20 messages in the default configuration when booting a diskless
FreeBSD client, introduced with r207608 (committed to stable/8 in 213038)
again.
MFC r213174:
Some schemes can allocate memory for internal purposes, but when
GEOM does withering this memory isn't freed. Add G_PART_DESTROY
call to g_part_wither. Also add missing g_free() call to G_PART_READ
method for MBR and PC98 schemes.
MFC r203261 (by marcel):
Export the UUID of the partition in the XML. The partition UUID is used
by EFI's device path to identify a partition. In order for FreeBSD to
add EFI boot options, proper device paths need to be constructed.
kib [Sat, 2 Oct 2010 17:41:47 +0000 (17:41 +0000)]
MFC r212824:
Adopt the deferring of object deallocation for the deleted map entries
on map unlock to the lock downgrade and later read unlock operation.
MFC r212868 (by alc) [1]:
Make refinements to r212824. Redo the implementation of
vm_map_unlock_and_wait().
MFC r212732:
Fix panic, when due to some kind of congestion on FIS-based switching
port multiplier some command triggers false positive timeout, but then
completes normally.
In the case of non-sequential mappings, ofw_mapmem() could ask Open
Firmware to map a memory region with negative length, causing crashes
and Undefined Behavior. Add the appropriate check to make the behavior
defined.
Fix a problem where device detection would work unreliably on Serverworks
K2 SATA controllers. The chip's status register must be read first, and
as a long, for other registers to be correctly updated after a command, and
this includes the command sequence in device detection as well as the
previously handled case after interrupts. While here, clean up some
previous hacks related to this controller.
MFC r211913:
Do not allocate multicast array memory in multicast filter
configuration function. For failed memory allocations, em(4)/lem(4)
called panic(9), which is not acceptable on a production box.
igb(4)/ixgb(4)/ix(4) allocated the required memory on the stack, which
consumed 768 bytes of stack memory; that looks too big.
To address these issues, allocate multicast array memory at device
attach time and make multicast configuration succeed under any
conditions. This change also removes the excessive use of stack
memory.
MFC r211909:
If em(4) failed to allocate RX buffers, do not call panic(9).
Just showing some buffer allocation error is a more appropriate
action for drivers. This should fix the occasional panic reported on
em(4) when the driver encountered a resource shortage.
MFC r212755:
Fix incorrect RX BD producer updates. The producer index was
already updated after allocating an mbuf, so the driver had to use the
last index instead of the next producer index. This should fix a driver
hang which may happen under high network load.
Reported by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
Tested by: Igor Sysoev <is <> rambler-co dot ru>, Vlad Galu <dudu <> dudu dot ro>
MFC 212902:
Tweak the stats exported by the e1000 drivers:
- Add a single sysctl procedure to all three drivers to read an arbitrary
register (the register is passed as arg2). Use it to replace existing
routines in igb(4) that used a separate routine for each register, and
to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
'host to card' sysctl stats from em(4) as they are not implemented in
any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).
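
As an illustration of reading all 64 bits of a counter that is split
across two 32-bit registers, here is a minimal sketch. It is not the
e1000 driver code: the accessor type, x_read_stat64() and the fake
register file are hypothetical, and real hardware may impose a required
read order or clear-on-read behavior that this sketch does not model.

```c
#include <stdint.h>

/* Hypothetical register accessor: returns the 32-bit value at 'reg'. */
typedef uint32_t (*x_read_reg_fn)(void *ctx, uint32_t reg);

/* Combine the low and high halves of a split counter into 64 bits. */
static uint64_t
x_read_stat64(x_read_reg_fn rd, void *ctx, uint32_t reg_lo, uint32_t reg_hi)
{
	uint64_t lo = rd(ctx, reg_lo);
	uint64_t hi = rd(ctx, reg_hi);

	return ((hi << 32) | lo);
}

/* Stand-in register file for experiments: an array indexed by reg / 4. */
static uint32_t
x_fake_read(void *ctx, uint32_t reg)
{
	return (((const uint32_t *)ctx)[reg / 4]);
}
```

The same accessor shape fits the single sysctl procedure described in
the first item, where the register offset arrives as an argument rather
than being hard-coded in a per-register routine.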
MFC: r212834
Fix nfsrv_freeallnfslocks() in the experimental NFSv4 server so that
it frees local locks correctly upon close. In order for
nfsrv_localunlock() to work correctly, the lock can no longer be in
the lockowner's stateid list. As such, nfsrv_freenfslock() has to
be called before nfsrv_localunlock(), to get rid of the lock structure
on the lockowner's stateid list. This only affected operation when
local locks (vfs.newnfs.enable_locallocks=1) are enabled, which is
not the default at this time.
MFC: r212833
Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK()
unlock operations correctly. It was passing in F_SETLK instead of
F_UNLCK as the operation for the unlock case. This only affected
operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled.
MFC: r212443
This patch applies one of the two fixes suggested by
zack.kirsch at isilon.com for a race between nfsrv_freeopen()
and nfsrv_getlockfile() in the experimental NFS server that
he found during testing. Although nfsrv_freeopen() holds a
sleep lock on the lock file structure when called with
cansleep != 0, nfsrv_getlockfile() could still search the
list, once it acquired the NFSLOCKSTATE() mutex. I believe
that acquiring the mutex in nfsrv_freeopen() fixes the race.
MFC: r212439
Fix the NFSVNO_CMPFH() macro in the experimental NFS server so
that it works correctly for ZFS file handles. It is possible to
have two ZFS file handles that differ only in the bytes in the
fid_reserved field of the generic "struct fid" and comparing the
bytes in fid_data didn't catch this case. This patch changes the
macro to compare all bytes of "struct fid".
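
The difference between the two comparisons can be sketched as follows.
struct x_fid is a hypothetical stand-in that only mirrors the general
shape described above (a length, a reserved field and a data array); it
is not the kernel's definition, and the helper names are made up.

```c
#include <string.h>

#define X_MAXFIDSZ 16

/* Hypothetical stand-in for the generic "struct fid". */
struct x_fid {
	unsigned short fid_len;
	unsigned short fid_reserved;
	char fid_data[X_MAXFIDSZ];
};

/* Old behavior: only the bytes in fid_data are compared. */
static int
x_fid_equal_data_only(const struct x_fid *a, const struct x_fid *b)
{
	return (memcmp(a->fid_data, b->fid_data, sizeof(a->fid_data)) == 0);
}

/* New behavior: every byte of the structure is compared. */
static int
x_fid_equal_whole(const struct x_fid *a, const struct x_fid *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Two handles that differ only in fid_reserved compare equal with the
first helper and different with the second. Comparing whole structures
with memcmp() assumes both were fully initialized, padding included.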
MFC: r212362
Fix the experimental NFS client so that it doesn't panic when
NFSv2,3 byte range locking is attempted. A fix that allows the
nlm_advlock() to work with both clients is in progress, but
may take a while. As such, I am doing this commit so that
the kernel doesn't panic in the meantime.
Don't attempt to write label with GEOM_BSD based method if the class is
not available. This improves error reporting when bsdlabel(8) is unable
to open a device for writing. If GEOM_BSD was unavailable, only a rather
obscure error message "Class not found" was printed.
In setusercontext(), do not apply user settings unless running as the
user in question (usually but not necessarily because we were called
with LOGIN_SETUSER). This plugs a hole where users could raise their
resource limits and expand their CPU mask.
MFC r211765:
Remove unnecessary controller reinitialization.
CAM filter handling was rewritten a long time ago so it should not
require controller reinitialization.
MFC r211764:
Add PNP id for Compex RL2000.
I'm not sure whether adding this logical id is correct or not
because Compex RL2000 is already in the list of supported hardware.
I guess the Compex RL2000 could be a PCI variant while the controller
in question is an ISA controller. It seems the PNP compat id didn't match
or it had multiple compat ids so isa_pnp_probe() seemed to return
ENOENT.
MFC r211717:
Implement basic WOL support. Note, not all xl(4) controllers
support WOL. Some controllers require additional 3-wire auxiliary
remote wakeup connector to draw power. More recent xl(4)
controllers may not need the wakeup connector though.
MFC r211716:
Move xl_reset() to xl_init_locked(). This will make driver
initialize controller from a known good state. Previously driver
used to issue controller reset while TX/RX DMA are in progress.
I guess resetting controller in active TX/RX DMA cycle is to ensure
stopping I/Os in xl_shutdown(). I remember some buggy controllers
didn't respond with stop command if controller is under high
network load at the time of shutdown so resetting controller was
the only safe way to stop the I/Os. However, from my experiments,
controller always responded with stop command under high network
load so I think it's okay to remove the xl_reset() in
device_shutdown handler.
Resetting the controller will also clear the configured RX filter, which
in turn will make WOL support hard because the driver has to reprogram
the RX filter in the WOL handler as well as set the station address.
MFC r211596:
It seems all Broadcom controllers have a bug that can generate UDP
datagrams with checksum value 0 when TX UDP checksum offloading is
enabled. Generating UDP checksum value 0 is an RFC 768 violation.
Even though the probability of generating such UDP datagrams is
low, I don't want to see FreeBSD boxes inject such datagrams
into the network, so disable UDP checksum offloading by default. Users
can still override this behavior by setting a sysctl variable or loader
tunable, dev.bge.%d.forced_udpcsum.
I have no idea why this issue was not reported so far given that
bge(4) is one of the most commonly used controllers on high-end
server class systems. Thanks to andre@ who passed the PR to me.
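
For reference, RFC 768 reserves an all-zero checksum field to mean "no
checksum computed" and says a computed checksum of zero is transmitted
as all ones. The rule can be sketched in software as below. This is an
illustration of the rule only, not bge(4) code; the function names are
hypothetical and the pseudo-header handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over big-endian 16-bit words. */
static uint16_t
x_inet_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xFFFFu) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Value to place in the UDP checksum field: never transmit 0. */
static uint16_t
x_udp_cksum_field(uint16_t computed)
{
	return (computed == 0 ? 0xFFFFu : computed);
}
```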
Attempt to autodetect the type of chipset, rather than storing this
within the device table. This code uses the same algorithm as used in
the Linux, NetBSD and DragonFly BSD drivers.
While investigating this, it became apparent that the Linux driver
always initialises the device, and not just in the PL2303HX case.
Change uplcom(4) to do the same.
This change allows us to synchronize our device ID list with Linux and
NetBSD, without requiring knowledge of the chipset in use.
mdoc: move remaining sections into consistent order
This pertains mostly to FILES, HISTORY, EXIT STATUS and AUTHORS sections.
Found by: mdocml lint run
Reviewed by: ru
r210368:
Actually, only the fullsync mode is implemented, not memsync mode.
Correct manual page.
r210702:
Spelling fixes.
r210869:
Add an argument to the proto_register() function which allows protocol to
declare it is the default and be placed at the end of the queue so it is
checked last.
r210870:
Now that TCP will be checked last we don't need any knowledge about other
protocols.
r210872:
Mark two more places that we won't reach.
r210873:
Keep $FreeBSD$ in __FBSDID() only for C files.
r210875:
Problem with assertion is that it logs on stderr. Add two macros:
PJDLOG_ASSERT() and PJDLOG_VERIFY() that will check the given condition
and log the problem where appropriate. The difference between those
two is that PJDLOG_VERIFY() always works and PJDLOG_ASSERT() can be
turned off by defining NDEBUG.
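
A minimal sketch of how such a macro pair could be laid out. The
X_VERIFY()/X_ASSERT() macros and x_verify_failed() are hypothetical
stand-ins, not the pjdlog implementation, and logging through syslog(3)
is an assumption made for the example.

```c
#include <stdlib.h>
#include <syslog.h>

/* Hypothetical failure handler: log through syslog(3), then abort. */
static void
x_verify_failed(const char *expr, const char *file, int line)
{
	syslog(LOG_CRIT, "Assertion failed: (%s), file %s, line %d.",
	    expr, file, line);
	abort();
}

/* Always compiled in; the condition is always evaluated. */
#define X_VERIFY(expr) do {						\
	if (!(expr))							\
		x_verify_failed(#expr, __FILE__, __LINE__);		\
} while (0)

/* Compiled out, condition included, when NDEBUG is defined. */
#ifdef NDEBUG
#define X_ASSERT(expr)	((void)0)
#else
#define X_ASSERT(expr)	X_VERIFY(expr)
#endif
```

With this layout, a condition that has side effects belongs in
X_VERIFY(), since X_ASSERT() may disappear entirely from the build.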
r210876:
Assert that various buffers are large enough.
r210879:
- Use pjdlog_exitx() to log errors and exit instead of errx().
- Use 'unable to' (instead of 'cannot') consistently.
r210880:
Reset signal handlers after fork().
r210881:
Allow using the 'none' keyword as the remote address in case the second
cluster node is not set up yet.
r210882:
Make control_set_role() more public. We will need it soon.
r210883:
Prepare configuration parsing code to be called multiple times:
- Don't exit on errors if not requested.
- Don't keep configuration in global variable, but allocate memory for
configuration.
- Call yyrestart() before yyparse() so that on error in configuration file
we will start from the beginning next time and not from the place we left off.
r210886:
Implement configuration reload on SIGHUP. This includes:
- Load added resources.
- Stop and forget removed resources.
- Update modified resources in least intrusive way, ie. don't touch
/dev/hast/<name> unless path to local component or provider name were
modified.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r210892:
Document 'none' value for remote.
Reviewed by: dougb
r211397:
Fix typos, spelling, formatting and mdoc mistakes found by Nobuyuki while
translating these manual pages. Minor corrections by me.
The 'size' variable is there to limit how many bytes we want to copy from
'addr'. It is very likely that size of 'addr' is larger than 'size', so checking
strlcpy() return value is bogus.
r211452:
For some setups sending data in 128kB chunks makes communication very slow. No
idea why. 32kB on the other hand seems to work properly everywhere.
Reported by: Thomas Steen Rasmussen <thomas@gibfest.dk>
r211875:
Make comment more readable.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211876:
Add mtx_owned() implementation.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211877:
Add QUEUE_INSERT() and QUEUE_TAKE() macros that simplify the code a bit.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211878:
We have sync_start() function to start synchronization, introduce sync_stop()
function to stop it.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211879:
Log that synchronization was interrupted in a proper place.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211880:
Don't increase the number of synchronized bytes in case of an error.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211881:
- Remove redundant and incorrect 'old' word from debug message.
- Log disconnects as warnings.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211882:
Implement keepalive mechanism inside HAST protocol so we can detect secondary
node failures quickly for HAST resources that are rarely modified.
Remove XXX from a comment now that the guard thread never sleeps infinitely.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211883:
Reduce indent where possible.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211884:
When logging to stdout/stderr don't close those descriptors after fork().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211885:
- Run hooks in background - don't block waiting for them to finish.
- Keep all hooks we're running in a global list, so we can report when
they finish and also report when they are running for too long.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211886:
Allow executing a specified program on various HAST events.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211887:
Document new 'exec' parameter.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211895:
Add hooks execution.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211896:
Check if no signals were delivered just before going to sleep.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211897:
Correct when we log interrupted synchronization.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211898:
When logging to stdout/stderr, flush after each log.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211899:
When SIGTERM or SIGINT is received, terminate worker processes.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211975:
Implement mtx_destroy() and rw_destroy().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211976:
- Add hook_fini() which should be called after fork() from the main hastd
process, once it starts to use hooks.
- Add hook_check_one() in case the caller expects different child processes
and once it can recognize it, it will pass pid and status to hook_check_one().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211977:
Allow running hooks from the main hastd process.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211978:
- Call hook on role change.
- Document new event.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211979:
Disconnect after logging errors.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211981:
- Move functionality responsible for checking one connection to separate
function to make code more readable.
- Be sure not to reconnect too often in case of signal delivery, etc.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211982:
Use sigtimedwait(2) for signals handling in primary process.
This fixes various races and eliminates use of pthread* API in signal handler.
Pointed out by: kib
With help from: jilles
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211983:
Execute hook when split-brain is detected.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r211984:
Execute hook when connection between the nodes is established or lost.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212033:
Constify arguments we can constify.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212034:
Use pjdlog_exit() before fork().
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212036:
When someone gives NULL as data, assume this is because they want to
declare the connection side only.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212037:
We only want to know if descriptors are ready for reading.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212038:
Because it is very hard to make fork(2) from threaded process safe (we are
limited to async-signal safe functions in the child process), move all hooks
execution to the main (non-threaded) process.
Do it by maintaining connection (socketpair) between child and parent
and sending events from the child to parent, so it can execute the hook.
This is a step in the right direction for other reasons too. For example,
there is one less problem with dropping privileges in worker processes.
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
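
A minimal sketch of the arrangement described above: the worker reports
an event over a socketpair(2) and the main process receives it, which is
where a hook would then be executed. This is not the hastd event code;
the function name and the use of a bare uint32_t as the event are
assumptions for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Fork a worker that sends one event to the main process.  Returns 0
 * and stores the received event in *event_out on success, -1 on error.
 */
static int
x_worker_send_event(uint32_t event_to_send, uint32_t *event_out)
{
	int sv[2];
	pid_t pid;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Worker: only report the event; never run the hook here. */
		close(sv[0]);
		n = write(sv[1], &event_to_send, sizeof(event_to_send));
		_exit(n == (ssize_t)sizeof(event_to_send) ? 0 : 1);
	}
	/* Main process: receive the event; the hook would be run here. */
	close(sv[1]);
	n = read(sv[0], event_out, sizeof(*event_out));
	close(sv[0]);
	(void)waitpid(pid, NULL, 0);
	return (n == (ssize_t)sizeof(*event_out) ? 0 : -1);
}
```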
r212046:
Mask only those signals that we want to handle.
Suggested by: jilles
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
r212049:
Forgot to add event.c and event.h in r212038.
Pointed out by: pluknet <pluknet@gmail.com>
Obtained from: Wheel Systems Sp. z o.o. http://www.wheelsystems.com
Add __dead2 to functions that we know are going to exit.
r213003:
Sort includes.
r213004:
If we are unable to receive a control message, it is most likely because
the main process died. Instead of entering an infinite loop, terminate.
r213006:
Fix descriptor leaks: when child exits, we have to close control and event
socket pairs. We did that only in one case out of three.
r213007:
Fix possible deadlock where worker process sends an event to the main process
while the main process sends control message to the worker process, but worker
process hasn't started control thread yet, because it waits for reply from the
main process.
The fix is to start the control thread before sending any events.
Reported and fix suggested by: Mikolaj Golub <to.my.trociny@gmail.com>
r213008:
Assert that descriptor numbers are sane.
r213009:
Switch to sigprocmask(2) API also in the main process and secondary process.
This way the primary process inherits signal mask from the main process,
which fixes a race where signal is delivered to the primary process before
configuring signal mask.