dchagin [Sat, 9 Jan 2016 16:44:17 +0000 (16:44 +0000)]
MFC r283441:
Implement the epoll family of system calls. This is a tiny wrapper
around kqueue() that implements a subset of epoll functionality.
The kqueue user data is 32 bits wide on i386, which is not enough for
epoll user data, so we keep user data in the proc emuldata.
Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.
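
The user-data workaround in this entry can be pictured as a per-process side table keyed by file descriptor. This is a minimal userland sketch, not the Linuxulator code: the names epoll_udata_tab, udata_set() and udata_get() are hypothetical, and the real emuldata layout may differ.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-process table: one 64-bit epoll cookie per descriptor. */
struct epoll_udata_tab {
	int       nfds;   /* number of slots allocated */
	uint64_t *udata;  /* udata[fd] holds the 64-bit epoll user data */
};

/* Store the cookie for fd, growing the table when needed. */
static int
udata_set(struct epoll_udata_tab *t, int fd, uint64_t cookie)
{
	if (fd < 0)
		return (-1);
	if (fd >= t->nfds) {
		int n = fd + 1;
		uint64_t *p = realloc(t->udata, (size_t)n * sizeof(*p));

		if (p == NULL)
			return (-1);
		for (int i = t->nfds; i < n; i++)
			p[i] = 0;	/* new slots start empty */
		t->udata = p;
		t->nfds = n;
	}
	t->udata[fd] = cookie;
	return (0);
}

/* Fetch the cookie again when an event for fd is reported. */
static uint64_t
udata_get(const struct epoll_udata_tab *t, int fd)
{
	if (fd < 0 || fd >= t->nfds)
		return (0);
	return (t->udata[fd]);
}
```

With a table like this, the 32-bit kqueue user data field never has to carry the epoll value; the descriptor alone is enough to find it.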
dchagin [Sat, 9 Jan 2016 16:39:15 +0000 (16:39 +0000)]
MFC r283440:
For future use in the Linuxulator:
1. Add a kern_kqueue() counterpart for kqueue() with flags parameter.
2. Be a bit more secure. To avoid a double fp lookup, add a kern_kevent_fp()
counterpart for kern_kevent() that takes a file pointer parameter instead
of a file descriptor, and pass the buck to it.
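
The descriptor/file-pointer split in item 2 follows a common pattern: the descriptor entry point resolves the file once and hands it to a worker that takes the pointer. This is a userland sketch of that pattern only. The names file_sketch, fd_lookup(), kevent_sketch() and kevent_fp_sketch() are hypothetical and do not mirror the kernel signatures.

```c
#include <stddef.h>

#define NFDS 16

struct file_sketch {
	int events_pending;
};

static struct file_sketch *fd_table[NFDS];
static int lookups;	/* counts descriptor table lookups */

/* Resolve a descriptor to its file; this is the lookup we want to do once. */
static struct file_sketch *
fd_lookup(int fd)
{
	lookups++;
	if (fd < 0 || fd >= NFDS)
		return (NULL);
	return (fd_table[fd]);
}

/* Worker: operates on an already resolved file pointer. */
static int
kevent_fp_sketch(struct file_sketch *fp)
{
	return (fp->events_pending);
}

/* Descriptor entry point: resolve once, then pass the buck. */
static int
kevent_sketch(int fd)
{
	struct file_sketch *fp = fd_lookup(fd);

	if (fp == NULL)
		return (-1);
	return (kevent_fp_sketch(fp));
}
```

A caller that already holds the file pointer calls the worker directly and skips the second lookup.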
dchagin [Sat, 9 Jan 2016 16:28:40 +0000 (16:28 +0000)]
MFC r283435:
Convert Linux wait options to their FreeBSD equivalents.
Check wait options as Linux does.
Linux always sets the WEXITED option, not WUNTRACED|WNOHANG,
which is a strange bug.
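
A conversion of this kind is usually a bit-by-bit translation that rejects unknown bits first. This is a minimal sketch under stated assumptions: the LX_* and BSD_* values are written out for illustration and should be checked against the respective <sys/wait.h> headers, and lx_to_bsd_waitopts() is a hypothetical name, not the function added by this change.

```c
/* Illustrative flag values (assumptions, verify against the real headers). */
#define LX_WNOHANG      0x00000001
#define LX_WUNTRACED    0x00000002
#define LX_WEXITED      0x00000004
#define LX_WCONTINUED   0x00000008
#define LX_WNOWAIT      0x01000000

#define BSD_WNOHANG     0x01
#define BSD_WUNTRACED   0x02
#define BSD_WCONTINUED  0x04
#define BSD_WNOWAIT     0x08
#define BSD_WEXITED     0x10

/* Translate Linux wait options; return -1 if an unknown bit is set. */
static int
lx_to_bsd_waitopts(int lopts, int *bopts)
{
	const int known = LX_WNOHANG | LX_WUNTRACED | LX_WEXITED |
	    LX_WCONTINUED | LX_WNOWAIT;
	int out = 0;

	if (lopts & ~known)
		return (-1);	/* the caller would map this to EINVAL */
	if (lopts & LX_WNOHANG)
		out |= BSD_WNOHANG;
	if (lopts & LX_WUNTRACED)
		out |= BSD_WUNTRACED;
	if (lopts & LX_WEXITED)
		out |= BSD_WEXITED;
	if (lopts & LX_WCONTINUED)
		out |= BSD_WCONTINUED;
	if (lopts & LX_WNOWAIT)
		out |= BSD_WNOWAIT;
	*bopts = out;
	return (0);
}
```

The point of the explicit table is that same-named options need not share the same bit on both systems, so passing the Linux value through unchanged would be wrong.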
dchagin [Sat, 9 Jan 2016 16:26:39 +0000 (16:26 +0000)]
MFC r283433:
Rewrite linux_recvfrom(). To avoid double conversion of the sockaddr, use
kern_recvit() directly.
Also check the fromlen parameter before the sockaddr copyin and conversion.
dchagin [Sat, 9 Jan 2016 16:25:30 +0000 (16:25 +0000)]
MFC r283432:
The note.Linux section exported through the vdso is used by glibc
to determine the kernel version (this saves one uname call).
Temporarily disable the export of the note.Linux section until I figure
out how to change the kernel version in note.Linux on the fly.
dchagin [Sat, 9 Jan 2016 16:21:39 +0000 (16:21 +0000)]
MFC r283428:
Change the Linux faccessat syscall definition to match the actual Linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat(). If either of these
flags is specified, then the wrapper function employs fstatat() to
determine access permissions.
dchagin [Sat, 9 Jan 2016 16:11:09 +0000 (16:11 +0000)]
MFC r283422:
Restore the proc emuldata struct for future use. For now, move flags from
thread emuldata to proc emuldata, as was originally intended.
As we can have both the 64-bit and 32-bit Linuxulator running, any eventhandler
can be called twice for us. To prevent this, move the eventhandler code
from linux_emul.c to the linux_common.ko module.
dchagin [Sat, 9 Jan 2016 16:08:22 +0000 (16:08 +0000)]
MFC r283421:
Introduce a new module linux_common.ko which is intended for the
following primary purposes:
1. Remove the dependency of the linsysfs and linprocfs modules on linux.ko,
which will be architecture specific on amd64.
2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction sets, 32 and 64 bit).
3. Move the malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.
Currently linux_common.ko incorporates code from linux_mib.c and linux_util.c,
and the linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.
Temporarily remove dtrace garbage from linux_mib.c and linux_util.c.
dchagin [Sat, 9 Jan 2016 15:44:38 +0000 (15:44 +0000)]
MFC r283407:
Implement vdso - virtual dynamic shared object. Through the vdso, Linux
exposes functions from the kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using the vdso is mandatory for thread cancellation and cleanup
on a modern glibc.
dchagin [Sat, 9 Jan 2016 15:23:54 +0000 (15:23 +0000)]
MFC r283391:
To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().
dchagin [Sat, 9 Jan 2016 15:17:34 +0000 (15:17 +0000)]
MFC r283384:
The pthread_join() caller does futex_wait on child_clear_tid. As the results
of multiple simultaneous calls to pthread_join() specifying the same
target thread are undefined, wake up only one thread.
dchagin [Sat, 9 Jan 2016 15:16:13 +0000 (15:16 +0000)]
MFC r283383:
Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
process reparenting when the process group leader exits, and the
closely related problems in wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
management routines in the Linuxulator.
Implementation details:
1. The thread is created via kern_thr_new() in the clone() call with
the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has threads is done via the P_HADTHREADS
bit in p_flag of struct proc.
3. The per-thread emulator state data structure is now located in
struct thread and freed in the thread_dtor() hook.
Holding p_mtx is mandatory when referencing emuldata
from other threads.
4. PID mangling has changed. Now Linux pid is the native tid
and Linux tgid is the native pid, with the exception of the first
thread in the process where tid and pid are one and the same.
Ugliness:
When the Linux thread is the initial thread in the thread
group, the thread id is equal to the process id. Glibc depends on this
magic (assert in pthread_getattr_np.c). So for system calls that
take a thread id as a parameter we should use a special method
to reference struct thread.
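
The id mapping from item 4 and the lookup problem noted under "Ugliness" can be sketched in a few lines. This is a userland model only: struct lx_thread, lx_tgid(), lx_tid() and lx_find() are hypothetical names, and the real code references struct thread and struct proc rather than a flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one thread as seen by the emulation layer. */
struct lx_thread {
	int  native_pid;   /* pid of the owning process */
	int  native_tid;   /* native kernel thread id */
	bool is_initial;   /* first thread in the thread group */
};

/* Linux tgid is the native pid. */
static int
lx_tgid(const struct lx_thread *t)
{
	return (t->native_pid);
}

/* Linux pid (thread id) is the native tid, except for the initial thread. */
static int
lx_tid(const struct lx_thread *t)
{
	return (t->is_initial ? t->native_pid : t->native_tid);
}

/* Find a thread by the id a Linux program would pass to a system call. */
static const struct lx_thread *
lx_find(const struct lx_thread *tab, size_t n, int linux_tid)
{
	for (size_t i = 0; i < n; i++)
		if (lx_tid(&tab[i]) == linux_tid)
			return (&tab[i]);
	return (NULL);
}
```

Because the initial thread is addressed by the process id, a plain lookup by native tid would miss it, which is why a dedicated lookup routine is needed.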
dchagin [Sat, 9 Jan 2016 14:44:41 +0000 (14:44 +0000)]
MFC r283377:
In preparation for switching the linuxulator to use the native 1:1
threads, split sys_sched_getparam(), sys_sched_setparam(),
sys_sched_getscheduler(), sys_sched_setscheduler() into their kern_*
counterparts and add a targettd parameter to allow the target
thread to be specified directly.
dchagin [Sat, 9 Jan 2016 14:40:38 +0000 (14:40 +0000)]
MFC r283374:
In preparation for switching the linuxulator to use the native 1:1
threads, refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow the target thread to be specified directly (new Linuxulator).
Linuxulator temporarily uses first thread in proc.
Move linux_sched_rr_get_interval() to the MI part.
dchagin [Sat, 9 Jan 2016 14:36:44 +0000 (14:36 +0000)]
MFC r283372:
In preparation for switching the linuxulator to use the native 1:1
threads, split sys_thr_exit() up into sys_thr_exit() and kern_thr_exit(),
where the second will be used in the linux_exit() system call later.
dchagin [Sat, 9 Jan 2016 14:33:10 +0000 (14:33 +0000)]
MFC r283370:
In preparation for switching the linuxulator to use the native 1:1
threads, introduce a linux_exit() stub instead of the sys_exit() call
(which terminates the process).
In the new linuxulator the exit() system call terminates the calling
thread (not the whole process).
dchagin [Sat, 9 Jan 2016 14:08:10 +0000 (14:08 +0000)]
To facilitate an upcoming Linuxulator merge, partially
MFC r275121 (by kib). Only the syntax changes from r275121 are merged;
PROC_*LOCK() macros still lock the same proc spinlock.
The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason for this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.
- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().
- Profiling code (profil(2)), for a similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().
- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().
gjb [Sat, 9 Jan 2016 00:31:24 +0000 (00:31 +0000)]
MFC r293173, r293231:
r293173:
Fix path to include .OBJDIR to avoid polluting the source
tree during 'make release'.
r293231:
Add a new target to touch the ${.OBJDIR}/release file, which
indicates the 'release' target has run (in order to prevent
subsequent invocations that may clobber original build output).
jpaetzel [Fri, 8 Jan 2016 23:58:32 +0000 (23:58 +0000)]
MFC 293043
Unset the gss kernel state when gssd exits
When gssd exits it leaves the kernel state set by
gssd_syscall(). nfsd sees this and waits endlessly
in an unkillable state for gssd to come back. If you
had accidentally started gssd then stopped it, then
started nfsd you'd be in a bad way until you either
restarted gssd or rebooted the system. This change
fixes that by setting the kernel state to "" when
gssd exits.
bdrewery [Thu, 7 Jan 2016 22:06:05 +0000 (22:06 +0000)]
MFC r291611:
Add NO_INSTALLKERNEL to undo the assumption that the first KERNCONF will be
installed as "kernel". This is relevant for packaging of the kernel when not
wanting a default "kernel.txz".
wollman [Thu, 7 Jan 2016 20:43:45 +0000 (20:43 +0000)]
MFH r292836:
in6_if2idlen: treat bridge(4) interfaces like other Ethernet interfaces
bridge(4) interfaces have an if_type of IFT_BRIDGE, rather than
IFT_ETHER, even though they only support Ethernet-style links. This
caused in6_if2idlen to emit an "unknown link type (209)" warning to
the console every time it was called. Add IFT_BRIDGE to the case
statement in the appropriate place, indicating that it uses the same
IPv6 address format as other Ethernet-like interfaces.
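
The shape of the fix can be shown with a reduced switch statement. This is a sketch, not the actual in6_if2idlen(): the function name if2idlen_sketch() and the warned counter are hypothetical, and the 64-bit return value for the default case is an assumption about the real function's fallback.

```c
/* Interface type codes; 0xd1 is the 209 reported in the warning. */
#define IFT_ETHER   0x06
#define IFT_BRIDGE  0xd1

/*
 * Return the interface identifier length in bits.
 * *warned is incremented where the real code would print
 * "unknown link type" to the console.
 */
static int
if2idlen_sketch(int if_type, int *warned)
{
	switch (if_type) {
	case IFT_ETHER:
	case IFT_BRIDGE:	/* now handled like other Ethernet types */
		return (64);
	default:
		(*warned)++;
		return (64);	/* assumed fallback length */
	}
}
```

Before the change, IFT_BRIDGE fell through to the default branch, so the result was usable but every call produced the console warning.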
gnn [Thu, 7 Jan 2016 19:52:17 +0000 (19:52 +0000)]
MFC: 292394
Switch the IPsec related statistics to using the built-in sysctl
variable set rather than reading from kernel memory.
This also makes the -z (zero) flag work correctly.
emaste [Thu, 7 Jan 2016 17:03:26 +0000 (17:03 +0000)]
MFC r291377: vidfont: with vt(4) omit size from vidcontrol -f
When using syscons, vidfont extracts the font size from the filename
and passes it to vidcontrol -f. In vt(4) mode the size argument is not
required, and some of the fonts in /usr/share/vt/fonts do not have the
size in the filename, which caused vidfont to fail. Thus, just omit the
size argument in vt(4) mode.
emaste [Thu, 7 Jan 2016 17:00:35 +0000 (17:00 +0000)]
MFC r291691: newvers: Honour SOURCE_DATE_EPOCH for build reproducibility
One reason the kernel does not build reproducibly is that it includes
a timestamp in the version string. SOURCE_DATE_EPOCH provides a standard
method to address this: it should be set to the last modification time
of the source, and build processes use the specified timestamp instead
of the "current" date and time.
This change uses SOURCE_DATE_EPOCH if it is set; how it gets set needs
to be addressed elsewhere.
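
The selection logic described here amounts to "use the variable if it parses, otherwise use the clock". The actual change is to the newvers script; what follows is only a C sketch of the same decision, with build_timestamp() as a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/*
 * Pick the timestamp to embed in a build.
 * sde is the value of SOURCE_DATE_EPOCH (may be NULL), now is the
 * fallback, normally the current time.
 */
static time_t
build_timestamp(const char *sde, time_t now)
{
	char *end;
	long long v;

	if (sde == NULL || *sde == '\0')
		return (now);		/* variable not set */
	errno = 0;
	v = strtoll(sde, &end, 10);
	if (errno != 0 || *end != '\0' || v < 0)
		return (now);		/* not a valid epoch value */
	return ((time_t)v);
}
```

A caller would typically invoke it as build_timestamp(getenv("SOURCE_DATE_EPOCH"), time(NULL)), so two builds of the same source with the same variable embed the same date.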