nyan [Sun, 31 Oct 2010 08:14:52 +0000 (08:14 +0000)]
MFC: revision 208638
- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle(), to wake up idle threads that are
suspended in CPU-specific states. This function can fail and cause the
scheduler to fall back to another mechanism (IPI).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.
bz [Sat, 30 Oct 2010 12:05:20 +0000 (12:05 +0000)]
MFC r213932:
MfP4 CH182763 (original version):
Make it harder to exploit certain in_control() related races between the
initial lookup at the beginning and the time we remove the entry
from the lists, by re-checking that the entry is still in the list before
trying to remove it.
Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by: Nima Misaghian (nima_misa hotmail.com) (original version)
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version)
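The re-check described in this commit can be sketched in userland C with the
TAILQ macros from <sys/queue.h>. The struct and function names below are
hypothetical, and the locking a kernel caller would hold around the walk is
omitted; this illustrates the idea only and is not the in_control() code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element standing in for an interface address. */
struct toy_addr {
    int id;
    TAILQ_ENTRY(toy_addr) link;
};
TAILQ_HEAD(toy_addr_list, toy_addr);

/*
 * Remove target only if it is still linked on the list.
 * Returns 1 if it was found and removed, 0 if it was already gone
 * (for example because another thread removed it after our lookup).
 */
int toy_remove_if_present(struct toy_addr_list *head, struct toy_addr *target)
{
    struct toy_addr *it;

    assert(head != NULL && target != NULL);
    TAILQ_FOREACH(it, head, link) {
        if (it == target) {
            TAILQ_REMOVE(head, target, link);
            return 1;   /* return at once: the iterator is now unlinked */
        }
    }
    return 0;
}
```

A second removal attempt on the same element returns 0 instead of corrupting
the list, which is the property the commit is after.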
bz [Sat, 30 Oct 2010 11:54:55 +0000 (11:54 +0000)]
MFC r213930:
Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.
alc [Sat, 30 Oct 2010 04:53:50 +0000 (04:53 +0000)]
MFC r213408
If vm_map_find() is asked to allocate a superpage-aligned region of
virtual addresses that is greater than a superpage in size but not a
multiple of the superpage size, then vm_map_find() is not always
expanding the kernel pmap to support the last few small pages being
allocated. Previously, we grew the kernel page table in
vm_map_findspace() when we found the first available virtual address.
Now, instead, we defer the call to pmap_growkernel() until we are
committed to a range of virtual addresses in vm_map_insert().
kib [Sat, 30 Oct 2010 01:19:15 +0000 (01:19 +0000)]
MFC r213916:
Provide vfs.ncsizefactor instead of hard-coding namecache ratio.
Move debug.ncnegfactor to vfs.ncnegfactor.
Provide some descriptions for the namecache related sysctls.
tuexen [Thu, 28 Oct 2010 19:10:31 +0000 (19:10 +0000)]
MFC r212799:
* Implement initial version of send buffer splitting.
* Make send/recv buffer splitting switchable via sysctl.
* While there: Fix some comments.
rrs [Thu, 28 Oct 2010 17:17:45 +0000 (17:17 +0000)]
MFC of 212225
Fix some clang warnings. One clang warning is left
because it is bogus: nam->sa_family will
not change from AF_INET6 to AF_INET (but clang
thinks it does ;-D)
tuexen [Thu, 28 Oct 2010 17:04:32 +0000 (17:04 +0000)]
MFC 212099:
Fix the SCTP_WITH_NO_CSUM option when used in combination with
interface supporting CRC offload. While at it, make use of the
feature that the loopback interface provides CRC offloading.
tuexen [Thu, 28 Oct 2010 17:02:36 +0000 (17:02 +0000)]
MFC 211969:
Fix the SCTP_WITH_NO_CSUM option when used in combination with
interface supporting CRC offload. While at it, make use of the
feature that the loopback interface provides CRC offloading.
tuexen [Thu, 28 Oct 2010 16:58:12 +0000 (16:58 +0000)]
MFC 211944:
Fix the switching on/off of CMT using sysctl and socket option.
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.
tuexen [Thu, 28 Oct 2010 16:53:54 +0000 (16:53 +0000)]
MFC r211030:
Fix a bug where MSG_TRUNC was not returned in all necessary cases for
SOCK_DGRAM socket. MSG_TRUNC was only returned when some mbufs could
not be copied to the application. If some data was left in the last
mbuf, it was correctly discarded, but MSG_TRUNC was not set.
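From the application side, the flag in question can be observed with
recvmsg(2). The following is a minimal sketch, assuming a local AF_UNIX
datagram socket pair as a stand-in for whatever SOCK_DGRAM socket the
application uses; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send payload_len bytes over a local datagram socket pair and receive
 * them into a buffer of buf_len bytes.
 * Returns 1 if the kernel set MSG_TRUNC, 0 if it did not, -1 on error.
 */
int datagram_was_truncated(size_t payload_len, size_t buf_len)
{
    char payload[256], buf[256];
    int sv[2];
    int result = -1;

    assert(payload_len <= sizeof(payload));
    assert(buf_len > 0 && buf_len <= sizeof(buf));

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    memset(payload, 'x', payload_len);
    if (send(sv[0], payload, payload_len, 0) == (ssize_t)payload_len) {
        struct iovec iov = { .iov_base = buf, .iov_len = buf_len };
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        /* msg_flags is filled in by the kernel on return. */
        if (recvmsg(sv[1], &msg, 0) >= 0)
            result = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A receiver that ignores msg_flags cannot tell a short datagram from a
truncated one, which is why a missing MSG_TRUNC matters.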
edwin [Thu, 28 Oct 2010 00:54:18 +0000 (00:54 +0000)]
MFC of 214124
Fix printing of files located on ZFS filesystem with an st_dev or
st_ino larger than 2**31.
From the PR:
Printing from a ZFS filesystem using 'lp' fails and returns an
email reporting "Your printer job was not printed because it was
not linked to the original file".
In order to protect against files being switched when files
are printed using 'lp' or 'lpr -s', the st_dev and st_ino
values for the original file are saved by lpr and verified
by lpd before the file is printed. Unfortunately, lpr prints
both values using '%d' (although both fields are unsigned)
and lpd(8) assumes a string of decimal digits.
ZFS (at least) generates st_dev values greater than 2^31-1,
resulting in negative values being printed - which lpd cannot
parse, leading it to report that the file has been switched.
A similar problem would occur with large inode numbers.
How-To-Repeat:
Find a file with either st_dev or st_ino greater than 2^31-1
(stat(1) will report both numbers) and print it with 'lpr -s'.
This should generate an email reporting that the file could
not be printed because it was not linked to the original file
PR: bin/151567
Submitted by: Peter Jeremy <Peter.Jeremy@alcatel-lucent.com>
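The failure mode in the PR can be reproduced in a few lines of C. This is a
sketch only: uint32_t stands in for the unsigned st_dev/st_ino fields, the
function names are hypothetical, and the "%ju" variant is one common way to
print such values, not necessarily how the committed fix formats them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if s is a non-empty string of decimal digits only. */
int all_digits(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/*
 * Old behaviour: the unsigned id ends up printed with "%d".
 * The conversion to int is implementation-defined for large values;
 * on common two's-complement platforms it yields a negative number.
 */
void format_signed(uint32_t id, char *out, size_t len)
{
    snprintf(out, len, "%d", (int)id);
}

/* Sketch of a safe variant: widen to uintmax_t and print with "%ju". */
void format_unsigned(uint32_t id, char *out, size_t len)
{
    snprintf(out, len, "%ju", (uintmax_t)id);
}
```

With an id above 2^31-1, format_signed() typically produces a leading '-',
which a parser expecting only decimal digits rejects; format_unsigned() does
not.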
kib [Wed, 27 Oct 2010 16:01:57 +0000 (16:01 +0000)]
MFC r213983:
Document vunref(9), add some important notes for vrele(9) and vput(9).
Merge all three manpages to one, removing separate file for vput(9).
kib [Wed, 27 Oct 2010 15:57:17 +0000 (15:57 +0000)]
MFC r213716:
Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.
Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.
kib [Wed, 27 Oct 2010 15:44:49 +0000 (15:44 +0000)]
MFC r213664:
The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.
rmacklem [Wed, 27 Oct 2010 13:10:08 +0000 (13:10 +0000)]
MFC: r213756
Fix the krpc so that it can handle NFSv3,UDP mounts with a read/write
data size greater than 8192. Since soreserve(so, 256*1024, 256*1024)
would always fail for the default value of sb_max, modify clnt_dg.c
so that it uses the calculated values and checks for an error return
from soreserve(). Also, add a check for error return from soreserve()
to clnt_vc.c and change __rpc_get_t_size() to use sb_max_adj instead of
the bogus maxsize == 256*1024.
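Why a 256*1024 reservation fails can be seen from the arithmetic. The sketch
below assumes the default values of that era (sb_max of 256 KB, MSIZE of 256,
MCLBYTES of 2048) and assumes the adjusted limit is derived as
sb_max * MCLBYTES / (MSIZE + MCLBYTES); both should be checked against the
kernel sources. All toy_ names are hypothetical.

```c
#include <assert.h>

/* Assumed defaults; the real values live in the kernel headers. */
#define TOY_SB_MAX   (256UL * 1024UL)  /* default sb_max */
#define TOY_MSIZE    256UL             /* size of an mbuf */
#define TOY_MCLBYTES 2048UL            /* size of an mbuf cluster */

/* Usable byte limit once per-cluster mbuf overhead is accounted for. */
unsigned long toy_sb_max_adj(unsigned long sb_max, unsigned long msize,
    unsigned long mclbytes)
{
    assert(msize + mclbytes > 0);
    return (unsigned long)((unsigned long long)sb_max * mclbytes /
        (msize + mclbytes));
}

/* Size check only: 1 if a reservation of cc bytes would be accepted. */
int toy_reserve_fits(unsigned long cc)
{
    return cc <= toy_sb_max_adj(TOY_SB_MAX, TOY_MSIZE, TOY_MCLBYTES);
}
```

Under these assumptions the adjusted limit is 233016 bytes, so a request for
the full 262144 bytes can never succeed.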
yongari [Wed, 27 Oct 2010 02:04:24 +0000 (02:04 +0000)]
MFC r213796:
Rewrite interrupt handler to give fairness for both RX and TX.
Previously rl(4) continuously checked for RX events or TX
completions in an endless loop. This caused TX starvation under
high RX load and consumed too many CPU cycles in the
interrupt handler. If the interrupt was shared with other devices, which
may always be true these days because of USB devices, rl(4) also
tried to process the interrupt. This meant polling(4) was the only
way to mitigate these issues.
To address these issues, rl(4) now disables interrupts when it
knows the interrupt is ours and limits the number of iterations of
the loop to 16. Interrupts are enabled again before exiting the
interrupt handler if the driver is still running. Because the RX buffer
is 64KB in size, the number of iterations in the loop has nothing
to do with the number of RX packets being processed. This change
ensures TX frames are sent under high RX load.
The RX handler drops the driver lock to pass received frames to the upper
stack, so there is a window in which the user can bring the interface down.
So rl(4) now checks whether the driver is still running before serving
RX or TX completions in the loop.
While I'm here, exit interrupt handler when driver initialized
controller.
With this change, rl(4) can now send frames under high RX load even
though the TX performance is still not good (rl(4) controllers can't
queue more than 4 frames at a time, so low TX performance is a
design issue of rl(4) controllers). It's much better than the previous
TX starvation and you should not notice an RX performance drop with
this change. The controller still shows poor performance under high
network load but in many cases it's now usable without resorting
to polling(4).
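The bounded loop described in the rl(4) change can be sketched as follows.
This is a simulation with hypothetical names, not driver code: pending events
are plain counters, and masking/unmasking of interrupts is only indicated in
comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LOOP_BUDGET 16  /* iteration limit named in the commit message */

struct toy_nic {
    bool running;     /* cleared when the interface is brought down */
    int rx_pending;   /* simulated RX events waiting */
    int tx_pending;   /* simulated TX completions waiting */
    int rx_done;
    int tx_done;
};

/* Services RX and TX fairly; returns the number of iterations used. */
int toy_intr(struct toy_nic *sc)
{
    int used = 0;

    assert(sc != NULL);
    /* A real handler would mask the device interrupt here. */
    for (int budget = TOY_LOOP_BUDGET; budget > 0; budget--) {
        if (!sc->running)
            break;  /* interface went down while we were working */
        if (sc->rx_pending == 0 && sc->tx_pending == 0)
            break;  /* nothing left to do */
        if (sc->rx_pending > 0) {
            sc->rx_pending--;
            sc->rx_done++;
        }
        if (sc->tx_pending > 0) {
            sc->tx_pending--;
            sc->tx_done++;
        }
        used++;
    }
    /* A real handler would unmask the interrupt here if still running. */
    return used;
}
```

Because each iteration looks at both directions, a flood of RX events cannot
keep TX completions from being serviced, and the budget caps the time spent
in one invocation.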
Correct offset conversion to little endian. It was implemented in version 2,
but because of a bug it was a no-op, so we were still using offsets in native
byte order for the host. Do it properly this time, bump version to 4 and set
the G_ELI_FLAG_NATIVE_BYTE_ORDER flag when version is under 4.
Reported by: ivoras
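Storing an offset in little-endian order regardless of the host can be done
by writing the bytes out explicitly. The helpers below are a generic sketch
with hypothetical names; they are not the routines the GELI code itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v to out[0..7], least significant byte first. */
void toy_put_le64(unsigned char *out, uint64_t v)
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xff);
}

/* Read a 64-bit little-endian value from in[0..7]. */
uint64_t toy_get_le64(const unsigned char *in)
{
    uint64_t v = 0;

    assert(in != NULL);
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

Since the shifts operate on values rather than on memory layout, the encoded
bytes are the same on big-endian and little-endian hosts.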
r212845 (by brian):
Support attaching version 4 metadata
Reviewed by: pjd
r212846:
Fix indent.
r212934 (by brian):
Add a geli resize subcommand to resize encrypted filesystems prior
to growing the filesystem.
Refuse to attach providers where the metadata provider size is
wrong. This makes post-boot attaches behave consistently with
pre-boot attaches. Also refuse to restore metadata to a provider
of the wrong size without the new -f switch. The new -f switch
forces the metadata restoration despite the provider size, and
updates the provider size in the restored metadata to the correct
value.
Helped by: pjd
Reviewed by: pjd
r213055:
When trashing metadata, flush after each write.
r213056:
Simplify code a bit by using g_*() API from libgeom.
r213057:
- Make use of g_*() API.
- Flush cache after writing metadata.
r213058:
Because we first write metadata into new place and then trash old place we
don't want situation where old size is equal to new size, as we will trash
newly written metadata.
r213059:
- Use g_*() API when doing backups.
- fsync() created files.
r213060:
- When trashing metadata, repeat overwrite kern.geom.eli.overwrites times.
- Flush write cache after each write.
r213062:
Define default overwrite count, so that userland can use it.
r213063:
Make the code similar to the code in g_eli_integrity.c.
r213067:
Implement switching of data encryption key every 2^20 blocks.
This ensures the same encryption key won't be used for more than
2^20 blocks (sectors). This will be the default now.
r213070:
Add support for AES-XTS. This will be the default now.
r213071:
Document AES-XTS.
r213072:
Update copyright years.
r213073:
Update copyright years.
r213164:
Ignore errors from BIO_FLUSH. It might confuse users that provider wasn't
really killed. What we really care about are write errors only.
r213165:
Change g_eli_debug to int, so one can turn off any GELI output by setting
kern.geom.eli.debug sysctl to -1.
r213172:
- Add support for loading passphrase from a file (-J and -j options).
This is especially useful for things like installers, where regular
geli prompt can't be used.
- Add support for specifying multiple -K or -k options, so there is no
need to cat all keyfiles and read them from standard input.
Requested by: Kris Moore <kris@pcbsd.org>, thompsa
r214116:
- Add missing comments.
- Make a comment consistent with others.
r214118:
Bring in geli suspend/resume functionality (finally).
Before this change if you wanted to suspend your laptop and be sure that your
encryption keys are safe, you had to stop all processes that use file system
stored on encrypted device, unmount the file system and detach geli provider.
This isn't very handy. If you are a lucky user of a laptop where suspend/resume
actually works with FreeBSD (I'm not!) you most likely want to suspend your
laptop, because you don't want to start everything over again when you turn
your laptop back on.
And this is where geli suspend/resume steps in. When you execute:
# geli suspend -a
geli will wait for all in-flight I/O requests, suspend new I/O requests, remove
all geli sensitive data from the kernel memory (like encryption keys) and will
wait for either 'geli resume' or 'geli detach'.
Now with no keys in memory you can suspend your laptop without stopping any
processes or unmounting any file systems.
When you resume your laptop you have to resume geli devices using 'geli resume'
command. You need to provide your passphrase, etc. again so the keys can be
restored and suspended I/O requests released.
Of course you need to remember that 'geli suspend' won't clear file system
cache and other places where data from your geli-encrypted file system might be
present. But to get rid of those, stopping processes and unmounting the file
system won't help either - you have to turn your laptop off. Be warned.
Also note that suspending a geli device which contains the file system with the
geli utility (or anything used by 'geli resume') is not a very good idea, as you
won't be able to resume it - when you execute geli(8), the kernel will try to
read it and this read I/O request will be suspended.
r214133:
Fix a bug introduced in r213067 where we use authentication key before
initializing it.
r214163:
Free opencrypto sessions on suspend, as they also might keep encryption keys.
r214225:
Move sc_akeyctx and sc_ivctx initialization to the g_eli_mkey_propagate()
function which eliminates code duplication and will ensure proper order
of operation.
r214226:
Encryption keys array might be NULL if device is suspended. Check for this, so
we don't panic when we detach suspended device.
r214227:
Add State tag, so 'geli status' will report active/suspended status, eg:
# geli status
Name Status Components
da0.eli SUSPENDED da0
da1.eli ACTIVE da1
r214228:
Close a race between checking if device is already suspended and suspending it.
r214229:
- Improve error messages, so instead of 'Not fully done', the user will get
information that device is already suspended or that device is using
one-time key and suspend is not supported.
- 'geli suspend -a' silently skips devices that use a one-time key. This is fine,
but because we log which devices were suspended on the console, also log which
devices were skipped.
r214404:
Use fprintf(stderr) instead of gctl_error() to print a warning about too
big a sector size. When a gctl error is set, gctl_has_param() always returns
'false', which prevents geli(8) from finding some arguments and also masks
the error that is generated in such a case.
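The key-switching rule from r213067 above (a new data encryption key every
2^20 blocks) amounts to a simple index computation. The sketch below assumes
the block number is the byte offset divided by the sector size; the names are
hypothetical and the exact expression in the GELI sources may differ.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_KEY_SHIFT 20  /* 2^20 blocks per key, from the commit message */

/* Index of the data key that covers the sector at byte_offset. */
uint64_t toy_key_index(uint64_t byte_offset, uint32_t sector_size)
{
    assert(sector_size > 0);
    return (byte_offset / sector_size) >> TOY_KEY_SHIFT;
}
```

With 512-byte sectors this means one key per 512 MB of provider space; with
4096-byte sectors, one key per 4 GB.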
bschmidt [Tue, 26 Oct 2010 20:23:29 +0000 (20:23 +0000)]
MFC r214069:
Fix an undefined behaviour if the desired ratectl algo is not available.
This can happen if the algos are built as modules but are not loaded. If
the selected ratectl algo is not available, try to load it (the module
load function currently does nothing). Add a dummy ratectl algo which
always selects the first available rate. Use that one if the desired algo
is not available.
rrs [Tue, 26 Oct 2010 19:08:26 +0000 (19:08 +0000)]
MFC:210599
PR-SCTP bugs. Basically, a full-sized frame of
PR-SCTP FWD-TSNs would not be sent and thus
cause a stalled connection. The rwnd
calculation was also off on the receiver side for
PR-SCTP.
rrs [Tue, 26 Oct 2010 19:06:31 +0000 (19:06 +0000)]
MFC:210494
Make sure that we report chunks that were not
sent if a socket still exists. In either
case carefully remove the data if it does not
get taken by the reporting routines.
rrs [Tue, 26 Oct 2010 19:04:05 +0000 (19:04 +0000)]
MFC:210493
When counting the number of chunks in the
retransmission queue to validate the retran count, we
need to include the chunks in the control send queue
too. Otherwise the count will not match and you will get
the invariant warning if invariants are on.
rrs [Tue, 26 Oct 2010 18:59:36 +0000 (18:59 +0000)]
MFC of 209663
This fixes a crash in SCTP. It was possible to have a
large number of packets queued to a crashing process.
In a specific case you may get 2 ABORTs back (from,
say, two packets in flight). If the aborts happened to
be processed at the same time, it was possible to have
one free the association while the other was trying
to report all the outbound packets. When this occurred
it could lead to a crash.
rrs [Tue, 26 Oct 2010 18:56:55 +0000 (18:56 +0000)]
MFC of 209644
Log is:
Fix a bug that will cause a panic. Basically
a read-lock is being called to check the vtag-timewait cache.
Then in two cases (where a vtag is bad, i.e. in the time-wait
state) the write-unlock is called, NOT the read-unlock. Under
conditions where lots of associations are coming and going
this will cause the system to panic with invariants on.
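The class of bug described here, releasing a lock in a different mode than
it was acquired, can be illustrated with a toy single-threaded lock that
tracks its own state. Everything below is hypothetical and only models the
bookkeeping an invariants-enabled kernel performs; it is not the SCTP code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock state: number of readers and whether a writer holds it. */
struct toy_rwlock {
    int readers;
    int writer;
};

/* Each operation returns 0 on success, -1 if the lock state forbids it. */
int toy_rlock(struct toy_rwlock *l)
{
    if (l->writer)
        return -1;
    l->readers++;
    return 0;
}

int toy_runlock(struct toy_rwlock *l)
{
    if (l->readers == 0)
        return -1;
    l->readers--;
    return 0;
}

int toy_wlock(struct toy_rwlock *l)
{
    if (l->writer || l->readers > 0)
        return -1;
    l->writer = 1;
    return 0;
}

int toy_wunlock(struct toy_rwlock *l)
{
    if (!l->writer)
        return -1;  /* the mismatch an invariants kernel panics on */
    l->writer = 0;
    return 0;
}

/* Lookup that releases with the same mode on every exit path. */
int toy_vtag_in_timewait(struct toy_rwlock *l, const unsigned *cache,
    size_t n, unsigned tag)
{
    int found = 0;

    assert(l != NULL && cache != NULL);
    if (toy_rlock(l) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (cache[i] == tag) {
            found = 1;
            break;
        }
    }
    if (toy_runlock(l) != 0)
        return -1;
    return found;
}
```

Funnelling both the "found" and "not found" cases through a single unlock
call is one way to keep the acquire and release modes paired.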
bschmidt [Tue, 26 Oct 2010 17:30:34 +0000 (17:30 +0000)]
MFC r213729:
Fix monitor mode which is implemented by doing a firmware scan. This
is a port from stable/6, seems like the code got lost during the
background scan changes in r170530.
nwhitehorn [Tue, 26 Oct 2010 14:59:35 +0000 (14:59 +0000)]
MFC r212360:
On architectures with non-tree-based page tables like PowerPC, every page
in a range must be checked when calling pmap_remove(). Calling
pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range
of the map could result in attempting to demap an extraordinary number
of pages (> 10^15), so iterate through each map entry and unmap each of
them individually.
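The difference between the two strategies is easy to quantify. The sketch
below uses a hypothetical array of sorted, page-aligned map entries and
counts how many pages each approach would have to visit; it does not model
the pmap itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry covering the byte range [start, end). */
struct toy_entry {
    uint64_t start;
    uint64_t end;
};

/* Pages visited when removing the whole span from first to last entry. */
uint64_t toy_pages_whole_range(const struct toy_entry *e, size_t n,
    uint64_t page_size)
{
    assert(e != NULL && n > 0 && page_size > 0);
    return (e[n - 1].end - e[0].start) / page_size;
}

/* Pages visited when each entry is removed individually. */
uint64_t toy_pages_per_entry(const struct toy_entry *e, size_t n,
    uint64_t page_size)
{
    uint64_t total = 0;

    assert(e != NULL && page_size > 0);
    for (size_t i = 0; i < n; i++)
        total += (e[i].end - e[i].start) / page_size;
    return total;
}
```

For a sparse 64-bit address space with a few small mappings far apart, the
first number is dominated by the unmapped gap and the second is just the
number of mapped pages.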
nwhitehorn [Tue, 26 Oct 2010 14:56:46 +0000 (14:56 +0000)]
MFC r213456:
Handle vector assist traps without a kernel panic, by setting denormalized
values to zero. A correct solution would involve emulating vector
operations on denormalized values, but this has little effect on accuracy
and is much less complicated for now.
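Flushing denormalized values to zero, as opposed to emulating arithmetic on
them, looks roughly like this for a single scalar. This is a plain C sketch
with a hypothetical function name; the kernel change operates on vector
registers in the trap handler, and whether the sign is preserved there is an
assumption made here.

```c
#include <float.h>

/*
 * Return x with denormalized (subnormal) values replaced by a zero of
 * the same sign. Normal values, zeros, infinities and NaNs pass through.
 */
float toy_flush_denormal(float x)
{
    float mag = (x < 0.0f) ? -x : x;

    /* Subnormals are the nonzero values smaller than FLT_MIN. */
    if (mag != 0.0f && mag < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}
```

The error introduced is bounded by FLT_MIN, which is the sense in which the
commit says the effect on accuracy is small.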
davidxu [Tue, 26 Oct 2010 09:25:29 +0000 (09:25 +0000)]
MFC r213241, r213257:
In the current code, statically initialized and destroyed objects have
the same null value, so the code cannot distinguish between them. To
fix the problem, a destroyed object is now assigned a non-null
value, and it will be rejected by some pthread functions.
PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is changed to the number 1, so that
an adaptive mutex can be statically initialized correctly.
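The sentinel scheme can be sketched with a toy mutex handle. The value 1 for
the adaptive initializer comes from the commit message; the value 2 for the
destroyed state, and every toy_ name, are assumptions made for illustration
and are not taken from libthr.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_mutex_impl {
    int locked;
    int adaptive;
};
typedef struct toy_mutex_impl *toy_mutex_t;

#define TOY_MUTEX_INITIALIZER          ((toy_mutex_t)NULL)
#define TOY_ADAPTIVE_MUTEX_INITIALIZER ((toy_mutex_t)(uintptr_t)1)
#define TOY_MUTEX_DESTROYED            ((toy_mutex_t)(uintptr_t)2) /* assumed */

/* Allocate the real object the first time a static initializer is used. */
static int toy_mutex_materialize(toy_mutex_t *m)
{
    if (*m == TOY_MUTEX_DESTROYED)
        return EINVAL;  /* destroyed objects are rejected */
    if (*m == TOY_MUTEX_INITIALIZER || *m == TOY_ADAPTIVE_MUTEX_INITIALIZER) {
        struct toy_mutex_impl *impl = calloc(1, sizeof(*impl));

        if (impl == NULL)
            return ENOMEM;
        impl->adaptive = (*m == TOY_ADAPTIVE_MUTEX_INITIALIZER);
        *m = impl;
    }
    return 0;
}

int toy_mutex_lock(toy_mutex_t *m)
{
    int error = toy_mutex_materialize(m);

    if (error != 0)
        return error;
    if ((*m)->locked)
        return EBUSY;  /* toy version: never blocks */
    (*m)->locked = 1;
    return 0;
}

int toy_mutex_unlock(toy_mutex_t *m)
{
    if (*m == TOY_MUTEX_INITIALIZER ||
        *m == TOY_ADAPTIVE_MUTEX_INITIALIZER || *m == TOY_MUTEX_DESTROYED)
        return EINVAL;
    if (!(*m)->locked)
        return EPERM;
    (*m)->locked = 0;
    return 0;
}

int toy_mutex_destroy(toy_mutex_t *m)
{
    if (*m == TOY_MUTEX_DESTROYED)
        return EINVAL;
    if (*m != TOY_MUTEX_INITIALIZER && *m != TOY_ADAPTIVE_MUTEX_INITIALIZER) {
        if ((*m)->locked)
            return EBUSY;
        free(*m);
    }
    *m = TOY_MUTEX_DESTROYED;  /* distinct from the static initializers */
    return 0;
}
```

Had the destroyed state been stored as NULL, a lock call after destroy would
have silently re-created the object, which is the ambiguity the commit
removes.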
attilio [Tue, 26 Oct 2010 01:20:30 +0000 (01:20 +0000)]
MFC r213272 by emaste:
Previously, the aac driver did not handle enclosure management AIFs,
which were raised during hot-swap events. Now such events trigger cam
rescans.
rmacklem [Mon, 25 Oct 2010 17:05:14 +0000 (17:05 +0000)]
MFC: r213712
Try and make the nfsrv_localunlock() function in the experimental
NFSv4 server more readable. Mostly changes to comments, but a
case of >= is changed to >, since == can never happen. Also, I've
added a couple of KASSERT()s and a slight optimization, since
once the "else if" case happens, subsequent locks in the list can't
have any effect. None of these changes fixes a known bug.
avg [Mon, 25 Oct 2010 07:58:37 +0000 (07:58 +0000)]
stable/8: add options KDB and KDB_TRACE to GENERIC kernels
Now that we have code for printing a stack trace on panic using stack(9)
facility without any debugger backend configured, use this ability
in GENERIC kernels to slightly increase amount of debugging information
available in default installations.
This change should not break anything for those who include GENERIC into
a custom kernel config file and have the above options already enabled.
They should only get a warning about duplicate options.
This commit should not change behavior of GENERIC kernels for panics and
traps with respect to core dumping and automatic reset.
As no debugger backend is configured, enter-to-debugger key combination
should still be ignored.
With this commit the sizes of GENERIC kernels increase by one to two KB.
yongari [Sun, 24 Oct 2010 21:28:58 +0000 (21:28 +0000)]
MFC r213696:
Do not setup interrupt endpoint for axe(4).
It seems axe(4) controllers support interrupt endpoint such that
enabling interrupt endpoint generates about 1000 interrupts/sec.
Controllers transfer 8 bytes data through interrupt endpoint and
the data include link UP/DOWN state as well as some PHY related
information. Previously axe(4) didn't use the transferred data and
didn't even try to read the data. Because axe(4) counts on mii(4)
to detect link state changes there is no need to use interrupt
endpoint here.
This change fixes generation of unnecessary interrupts which was
seen when interface is brought to UP.
yongari [Sun, 24 Oct 2010 21:26:41 +0000 (21:26 +0000)]
MFC r213438:
RX buffer allocation failure is not an input error. Controller
successfully received a frame but we failed to pass it to upper
stack due to lack of resources. So update if_iqdrops counter
instead of updating if_ierrors counter.
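The accounting distinction can be sketched with plain counters. The struct
below only mirrors the names of the interface counters mentioned in the
commit; the enum and function are hypothetical and this is not driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy counterparts of the if_ipackets, if_ierrors and if_iqdrops counters. */
struct toy_ifstats {
    unsigned long ipackets;
    unsigned long ierrors;
    unsigned long iqdrops;
};

enum toy_rx_outcome {
    TOY_RX_OK,         /* frame received and passed up the stack */
    TOY_RX_BAD_FRAME,  /* controller reported a receive error */
    TOY_RX_NO_BUFFER   /* frame was fine, but no replacement buffer */
};

/* Account for one received frame according to what happened to it. */
void toy_rx_account(struct toy_ifstats *st, enum toy_rx_outcome outcome)
{
    assert(st != NULL);
    switch (outcome) {
    case TOY_RX_OK:
        st->ipackets++;
        break;
    case TOY_RX_BAD_FRAME:
        st->ierrors++;   /* a genuine input error */
        break;
    case TOY_RX_NO_BUFFER:
        st->iqdrops++;   /* a local resource shortage, not a bad frame */
        break;
    }
}
```

Keeping the two counters separate lets an administrator tell a noisy link
apart from a host that is short on buffers.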