avg [Sun, 3 Feb 2013 18:42:20 +0000 (18:42 +0000)]
zfs: fix, improve and re-organize page_lookup and page_unlock
Now they are split into two pairs: page_hold/page_unhold for mappedread
and page_busy/page_unbusy for update_pages.
For mappedread we simply hold a page that is to be used as a source if it
is resident and valid (and not busy). This is sufficient since we are
only doing page -> user buffer copying. There is no page <-> backing
storage I/O involved.
update_pages is now better split to properly handle the putpages case
(page -> arc) and the regular write case (arc -> page).
For the latter we use complete protocol of marking an object with
paging-in-progress and marking a page with io_start (busy count).
Also, in this case we remove the write bit from all page mappings and
clear dirty bits of the pages, the former is needed to ensure that the
latter does the right thing.
Additionally we update a page if it is cached instead of just freeing it
as was done before. This needs to be verified.
A minor detail: ZFS-backed pages should always be either fully valid
or fully invalid. Assert this and use simpler API that does not deal
with sub-page blocks.
mckusick [Sun, 3 Feb 2013 17:16:32 +0000 (17:16 +0000)]
For UFS2 i_blocks is unsigned. The current "sanity" check that it
has gone below zero after the blocks in its inode are freed is a
no-op which the compiler fails to warn about because of the use of
the DIP macro. Change the sanity check to compare the number of
blocks being freed against the value i_blocks. If the number of
blocks being freed exceeds i_blocks, just set i_blocks to zero.
Reported by: Pedro Giffuni (pfg@)
MFC after: 2 weeks
jilles [Sun, 3 Feb 2013 15:54:57 +0000 (15:54 +0000)]
sh: Expand here documents in the current process.
Expand here documents at the same point other redirections are expanded but
use a non-fork subshell environment (like simple command substitutions) for
compatibility. Substitition errors result in an empty here document like
before.
As a result, a fork is avoided for short (<4K) expanded here documents.
Unexpanded here documents (with quoted end marker after <<) are not affected
by this change. They already only forked when >4K.
Side effects:
* Order of expansion is slightly different.
* Slow expansions are not executed in parallel with the redirected command.
* A non-fork subshell environment is subtly different from a forked process.
hrs [Sun, 3 Feb 2013 10:26:24 +0000 (10:26 +0000)]
- Add CHECKSUM.* support in Makefile[1].
- Use ln -fs to create a symlink.
- Remove pkgadd for docports.
- Use WITHOUT_JADETEX=yes instead of WITH_JADETEX=no.
- Add {WORLD,KERNEL}_FLAGS to [BTWK]MAKE.
- Use makefs(8) and gpart(8) for sparc64 ISO image[2].
- Add publisher option to makefs(8)[2].
Based on work by: gjb[1]
Discussed with: marius, nwhitehorn[2]
avg [Sun, 3 Feb 2013 09:57:39 +0000 (09:57 +0000)]
allow for large KTR_ENTRIES values by allocating ktr_buf using malloc(9)
Only during very early boot, before malloc(9) is functional (SI_SUB_KMEM),
the static ktr_buf_init is used. Size of the static buffer is determined
by a new kernel option KTR_BOOT_ENTRIES. Its default value is 1024.
kientzle [Sun, 3 Feb 2013 01:08:01 +0000 (01:08 +0000)]
Another overhaul of the CPSW driver for BeagleBone
Major changes:
* Finally tracked down the flow control setting that
seems to have been causing TX stalls and watchdog timeouts
* RX and TX paths now share a lot more code
* TX interrupt is no longer used; we instead GC finished
tx queue entries at the bottom of the start routine.
* TX start now queues fragmented packets directly; it only
invokes defrag() for occasional very fragmented packets.
* "sysctl dev.cpsw" dumps controller statistics and queue counts
* Host Error Interrupt will give extensive debugging information
if the controller chokes on the queued data.
jhibbits [Sun, 3 Feb 2013 00:19:34 +0000 (00:19 +0000)]
Fix the PowerPC DTrace copy functions. The kernel doesn't hold the same view to
the user map, so use the md copy in/out functions provided by the kernel.
dim [Sat, 2 Feb 2013 22:28:29 +0000 (22:28 +0000)]
Pull in r170135 from upstream clang trunk:
Dont use/link ARCMT, StaticAnalyzer and Rewriter to clang when the user
specifies not to. Dont build ASTMatchers with Rewriter disabled and
StaticAnalyzer when it's disabled.
Without all those three, the clang binary shrinks (x86_64) from ~36MB
to ~32MB (unstripped).
To disable these clang components, and get a smaller clang binary built
and installed, set WITHOUT_CLANG_FULL in src.conf(5). During the
initial stages of buildworld, those extra components are already
disabled automatically, to save some build time.
pfg [Sat, 2 Feb 2013 22:23:45 +0000 (22:23 +0000)]
ext2fs: general cleanup.
- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.
marius [Sat, 2 Feb 2013 21:57:06 +0000 (21:57 +0000)]
Improve r238673 to additionally allow for odd-aligned buffers as
passed in by smartd of smartmontools.
While at it, hint the compiler that 32-bit PIO is the most likely
case (idea from Linux) and use bus_{read,write}_stream_2(9) instead
of bus_{read,write}_multi_stream_2(9) for single count reads/writes.
avg [Sat, 2 Feb 2013 11:58:35 +0000 (11:58 +0000)]
print compiler version in the kernel banner
And provide kernel compiler version as a sysctl as well.
This is useful while we have gcc and clang cohabitation.
This could be even more useful when we have support
for external toolchains.
avg [Sat, 2 Feb 2013 11:38:26 +0000 (11:38 +0000)]
uart: add resume method and enable it for attachments on the most common
x86 buses
Otherwise the uart hardware could be in such a state after the resume
where IER is cleared and thus no interrupts are generated.
This behavior is observed and tested with QEMU, so I am comitting this
change to help with my debugging.
There has been no feedback from users of serial ports on real hardware.
kib [Fri, 1 Feb 2013 18:30:41 +0000 (18:30 +0000)]
The MSDOSFSMNT_WAITONFAT flag is bogus and broken. It does less than
track the MNT_SYNCHRONOUS flag. It is set to the latter at mount time
but not updated by MNT_UPDATE.
Use MNT_SYNCHRONOUS to decide to write the FAT updates syncrhonously.
kib [Fri, 1 Feb 2013 18:25:53 +0000 (18:25 +0000)]
Backup FATs were sometimes marked dirty by copying their first block
from the primary FAT, and then they were not marked clean on unmount.
Force marking them clean when appropriate.
kib [Fri, 1 Feb 2013 18:06:06 +0000 (18:06 +0000)]
The directory entry for dotdot was corrupted in the FAT32 case when moving
a directory to a subdir of the root directory from somewhere else.
For all directory moves that change the parent directory, the dotdot
entry must be fixed up. For msdosfs, the root directory is magic for
non-FAT32. It is less magic for FAT32, but needs the same magic for
the dotdot fixup. It didn't have it.
Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.
The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().
For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use explicitly set pcl variable instead on relying on the
start cluster of the root directory typically has a value < 65536.
kib [Fri, 1 Feb 2013 18:01:03 +0000 (18:01 +0000)]
The mountmsdosfs() function had an insane sanity test, remove it.
Trying FAT32 on a small partition failed to mount because
pmp->pm_Sectors was nonzero. Normally, FAT32 file systems are so
large that the 16-bit pm_Sectors can't hold the size. This is
indicated by setting it to 0 and using only pm_HugeSectors. But at
least old versions of newfs_msdos use the 16-bit field if possible,
and msdosfs supports this except for breaking its own support in the
sanity check. This is quite different from the handling of pm_FATsecs
-- now the 16-bit value is always ignored for FAT32 except for
checking that it is 0, and newfs_msdos doesn't use the 16-bit value
for FAT32.
neel [Fri, 1 Feb 2013 16:58:59 +0000 (16:58 +0000)]
Add support for MSI-X interrupts in the virtio block device and make that
the default.
The current behavior of advertising a single MSI vector can be requested by
setting the environment variable "BHYVE_USE_MSI" to "yes". The use of MSI
is not compliant with the virtio specification and will be eventually phased
out.
kib [Fri, 1 Feb 2013 16:57:02 +0000 (16:57 +0000)]
Assert that the mbuf in the chain has sane length. Proper place for
this check is somewhere in the network code, but this assertion
already proven to be useful in catching what seems to be driver bugs
causing NFS scrambling random memory.
kib [Fri, 1 Feb 2013 16:48:55 +0000 (16:48 +0000)]
The change to reduce default smp_tsc_shift caused tsc shift to become
zero on slower machines, which make the fenced get_timecount methods
not used despite needed. Remove the (shift > 0) condition when
selecting the get_timecount() implementation.
Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too.
Allow shift to reach value of 31 instead of 30, as it was previously
(should be nop).
Reorganize the tc quality calculation to remove the conditionally
compiled block. Rename test_smp_tsc() to test_tsc() and provide
separate versions for SMP and UP builds. The check for virtialized
hardware is more natural to perform in the smp version of the
test_tsc(), since it is only done for smp case.
Noted and reviewed by: bde (previous version)
MFC after: 12 days
gahr [Fri, 1 Feb 2013 13:04:06 +0000 (13:04 +0000)]
- Fix more style(9)-related issues (copyright header, spaces after function
names, unnecessary casts)
- Change type of boolean variable from char to bool
andre [Fri, 1 Feb 2013 10:26:31 +0000 (10:26 +0000)]
Add VM_KMEM_SIZE_SCALE parameter set to 2 (50%) for all ARM platforms.
VM_KMEM_SIZE_SCALE specifies which fraction of the available physical
memory, after deduction of the kernel itself and other early statically
allocated memory, can be used for the kmem_map. The kmem_map provides
for all UMA/malloc allocations in KVM space.
Previously ARM was using a fixed kmem_map size of (12*1024*1024) = 12MB
without regard to effectively available memory. This is too small for
recent ARM SoC with more than 128MB of RAM.
For reference a description of others related kmem_map parameters:
VM_KMEM_SIZE default start size of kmem_map if SCALE is
not defined
VM_KMEM_SIZE_MIN hard floor on the kmem_map size
VM_KMEM_SIZE_MAX hard ceiling on the kmem_map size
VM_KMEM_SIZE_SCALE fraction of the available real memory to
be used for the kmem_map, limited by the
MIN and MAX parameters.
neel [Fri, 1 Feb 2013 06:40:53 +0000 (06:40 +0000)]
Delete the "blackhole" driver - it is not needed anymore.
The "blackhole" driver was used in conjunction with bhyve to sequester
pci devices intended for passthru until vmm.ko was loaded. This was
useful at one point because vmm.ko could not be loaded at boot time.
The same functionality can now be achieved by loading vmm.ko via the
loader along with the kernel.
neel [Fri, 1 Feb 2013 03:49:09 +0000 (03:49 +0000)]
Fix a broken assumption in the passthru implementation that the MSI-X table
can only be located at the beginning or the end of the BAR.
If the MSI-table is located in the middle of a BAR then we will split the
BAR into two and create two mappings - one before the table and one after
the table - leaving a hole in place of the table so accesses to it can be
trapped and emulated.
neel [Fri, 1 Feb 2013 02:41:47 +0000 (02:41 +0000)]
Fix a bug in the passthru implementation where it would assume that all
devices are MSI-X capable. This in turn would lead it to treat bar 0 as
the MSI-X table bar even if the underlying device did not support MSI-X.
Fix this by providing an API to query the MSI-X table index of the emulated
device. If the underlying device does not support MSI-X then this API will
return -1.
neel [Fri, 1 Feb 2013 01:16:26 +0000 (01:16 +0000)]
Increase the number of passthru devices supported by bhyve.
The maximum length of an environment variable puts a limitation on the
number of passthru devices that can be specified via a single variable.
The workaround is to allow user to specify passthru devices via multiple
environment variables instead of a single one.
gahr [Thu, 31 Jan 2013 16:39:50 +0000 (16:39 +0000)]
- Remove underscores from the internal structure name, as it doesn't collide
with the user's namespace.
- Correct size and position variables type from long to size_t.
- Do not set errno to ENOMEM on malloc failure, as malloc already does so.
- Implement the concept of "buffer data length", which mandates what SEEK_END
refers to and the allowed extent for a read.
- Use NULL as read-callback if the buffer is opened in write-only mode.
Conversely, use NULL as write-callback when opened in read-only mode.
- Implement the handling of the ``b'' character in the mode argument. A binary
buffer differs from a text buffer (default mode if ``b'' is omitted) in that
NULL bytes are never appended to writes and that the "buffer data length"
equals to the size of the buffer.
- Remove shall from the man page. Use indicative instead. Also, specify that
the ``b'' flag does not conform with POSIX but is supported by glibc.
- Update the regression test so that the ``b'' functionality and the "buffer
data length" concepts are tested.
hselasky [Thu, 31 Jan 2013 11:00:57 +0000 (11:00 +0000)]
Initial version of libusbboot, a fully stand-alone, single threaded and
functional compilation of the FreeBSD USB stack for use with boot loaders
and such.
glebius [Thu, 31 Jan 2013 08:55:21 +0000 (08:55 +0000)]
Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.
New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.
The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.
adrian [Thu, 31 Jan 2013 00:14:25 +0000 (00:14 +0000)]
Work around some rather unfortunate race conditions inside net80211.
Right now, ic_curchan seems to be updated rather quickly (ie, during
the ioctl) and before the driver gets notified of what's going on.
So what I was seeing was:
* NIC was in channel X;
* It generates PHY errors for channel X;
* an ioctl comes along from userland and changes things to channel Y;
* .. this updates ic_curchan, but hasn't yet reset the hardware;
* in parallel, RX is occuring and it looks at ic_curchan;
* .. which is channel Y, so events get stamped with that now.
ian [Wed, 30 Jan 2013 23:37:35 +0000 (23:37 +0000)]
Improve devd startup time, by tweaking some string handling routines that are
heavily used when parsing config files. Mostly these changes avoid making
temporary copies of the strings, and avoid doing byte at a time append
operations, on the most-used code path.
On a 1.2 GHz ARM processor this reduces the time to parse the config files
from 13 to 6 seconds.
dim [Wed, 30 Jan 2013 19:51:16 +0000 (19:51 +0000)]
Fix a problem introduced in r231057: in bsd.own.mk, move the test for
whether clang is enabled to just after the last place where it could
have been forced to "no".
jhb [Wed, 30 Jan 2013 18:24:29 +0000 (18:24 +0000)]
Allow the address and ports to be separated by a colon or period rather
than a space to permit directly pasting the output of commands such as
netstat and sockstat on the command line.
hselasky [Wed, 30 Jan 2013 15:26:04 +0000 (15:26 +0000)]
Modify the FreeBSD USB kernel code so that it can be compiled directly
into the FreeBSD boot loader, typically for non-USB aware BIOSes, EFI systems
or embedded platforms. This is also useful for out of the system compilation
of the FreeBSD USB stack for various purposes. The USB kernel files can
now optionally include a global header file which should include all needed
definitions required to compile the FreeBSD USB stack. When the global USB
header file is included, no other USB header files will be included by
default.
Add new file containing the USB stack configuration for the
FreeBSD loader build.
Replace some __FBSDID()'s by /* $FreeBSD$ */ comments. Now all
USB files follow the same style.
Use cases:
- console in loader via USB
- loading kernel via USB
ian [Wed, 30 Jan 2013 15:21:18 +0000 (15:21 +0000)]
Fix a descriptor leak in devd. Clients reading /var/run/devd.pipe can close
their socket connection any time, and devd only notices that when it gets an
error trying to write an event to the client. On a system with no device
change activity, clients could connect and disappear repeatedly without devd
noticing, leading to an ever-growing list of open socket descriptors in devd.
Now devd uses poll(2) looking for POLLHUP on all existing clients every time
a new client connection is established, and also periodically (once a minute)
to proactively find zombie clients and reap the socket descriptors. It also
now has a connection limit, configurable with a new -l <num> command line arg.
When the maximum number of connections is reached it stops accepting new
connections until some current clients drop off.
gahr [Wed, 30 Jan 2013 14:59:26 +0000 (14:59 +0000)]
Add fmemopen(3), an interface to get a FILE * from a buffer in memory, along
with the respective regression test.
See http://pubs.opengroup.org/onlinepubs/9699919799/functions/fmemopen.html
kib [Wed, 30 Jan 2013 13:14:34 +0000 (13:14 +0000)]
Rework the handling of the children for the pthread_vfork_test. The
trivial handler for SIGCHLD is installed, and SIGCHLD is blocked, to
not abandon our zombies to init(8). This way, the zombies are around
slightly longer, allowing to actually exercise the logic for p_pwait
use by the test.
kib [Wed, 30 Jan 2013 13:14:15 +0000 (13:14 +0000)]
The case of pid == WAIT_MYPGRP for the kern_wait() is already handled
in kern_wait6(), which is called by kern_wait(). Remove the redundand
check, introduced in r243136, and add a comment noting this, to make
the code less confusing.
The blank lines are added to properly delineate the scope of the
preceeding comments.
Noted by: "Jukka A. Ukkonen" <jau@iki.fi>
MFC after: 1 week
kib [Wed, 30 Jan 2013 12:48:16 +0000 (12:48 +0000)]
Rework the __vdso_* symbols attributes to only make the symbols weak,
but use normal references instead of weak. This makes the statically
linked binaries to use fast gettimeofday(2) by forcing the linker to
resolve references and providing the neccessary functions.
kib [Wed, 30 Jan 2013 12:43:10 +0000 (12:43 +0000)]
Reduce default shift used to calculate the max frequency for the TSC
timecounter to 1, and correspondingly increase the precision of the
gettimeofday(2) and related functions in the default configuration.
The motivation for the TSC-low timecounter, as described in the
r222866, seems to provide a workaround for the non-serializing
behaviour of the RDTSC on some Intel hardware. Tests demonstrate that
even with the pre-shift of 8, the cross-core non-monotonicity of the
RDTSC is still observed reliably, e.g. on the Nehalems. The r238755
and r238973 implemented the proper fix for the issue.
The pre-shift of 1 is applied to keep TSC not overflowing for the
frequency of hardclock down to 2 sec/intr. The pre-shift is made a
tunable to allow the easy debugging of the issues users could see with
the shift being too low.
neel [Wed, 30 Jan 2013 04:30:36 +0000 (04:30 +0000)]
Add support for MSI-X interrupts in the virtio network device and make that
the default.
The current behavior of advertising a single MSI vector can be requested by
setting the environment variable "BHYVE_USE_MSI" to "true". The use of MSI
is not compliant with the virtio specification and will be eventually phased
out.