Use libco instead of State Threads (ST); still has some bugs

2025-03-09 15:49:59 +00:00 · 2020-02-16 21:07:54 +08:00 · 2020-02-16 21:07:54 +08:00 · 7c8a35aea9
commit 7c8a35aea9
parent 51d6c367f5
88 changed files with 4836 additions and 19273 deletions
--- a/trunk/3rdparty/st-srs/docs/fig.gif
+++ b/trunk/3rdparty/st-srs/docs/fig.gif
--- a/trunk/3rdparty/st-srs/docs/notes.html
+++ b/trunk/3rdparty/st-srs/docs/notes.html
@@ -1,434 +0,0 @@
-<HTML>
-<HEAD>
-<TITLE>State Threads Library Programming Notes</TITLE>
-</HEAD>
-<BODY BGCOLOR=#FFFFFF>
-<H2>Programming Notes</H2>
-<P>
-<B>
-<UL>
-<LI><A HREF=#porting>Porting</A></LI>
-<LI><A HREF=#signals>Signals</A></LI>
-<LI><A HREF=#intra>Intra-Process Synchronization</A></LI>
-<LI><A HREF=#inter>Inter-Process Synchronization</A></LI>
-<LI><A HREF=#nonnet>Non-Network I/O</A></LI>
-<LI><A HREF=#timeouts>Timeouts</A></LI>
-</UL>
-</B>
-<P>
-<HR>
-<P>
-<A NAME="porting">
-<H3>Porting</H3>
-The State Threads library uses OS concepts that are available in some
-form on most UNIX platforms, making the library very portable across
-many flavors of UNIX.  However, there are several parts of the library
-that rely on platform-specific features.  Here is the list of such parts:
-<P>
-<UL>
-<LI><I>Thread context initialization</I>: Two ingredients of the
-<TT>jmp_buf</TT>
-data structure (the program counter and the stack pointer) have to be
-manually set in the thread creation routine. The <TT>jmp_buf</TT> data
-structure is defined in the <TT>setjmp.h</TT> header file and differs from
-platform to platform.  Usually the program counter is a structure member
-with <TT>PC</TT> in the name and the stack pointer is a structure member
-with <TT>SP</TT> in the name.  One can also look in the
-<A HREF="http://www.mozilla.org/source.html">Netscape's NSPR library source</A>
-which already has this code for many UNIX-like platforms
-(<TT>mozilla/nsprpub/pr/include/md/*.h</TT> files).
-<P>
-Note that on some BSD-derived platforms <TT>_setjmp(3)/_longjmp(3)</TT>
-calls should be used instead of <TT>setjmp(3)/longjmp(3)</TT> (that is
-the calls that manipulate only the stack and registers and do <I>not</I>
-save and restore the process's signal mask).</LI>
-<P>
-Starting with glibc 2.4 on Linux the opacity of the <TT>jmp_buf</TT> data
-structure is enforced by <TT>setjmp(3)/longjmp(3)</TT> so the
-<TT>jmp_buf</TT> ingredients cannot be accessed directly anymore (unless
-the special environment variable <TT>LD_POINTER_GUARD</TT> is set before the
-application runs). To avoid depending on a custom environment, the State Threads
-library provides <TT>setjmp/longjmp</TT> replacement functions for
-all Intel CPU architectures. Other CPU architectures can also be easily
-supported (the <TT>setjmp/longjmp</TT> source code is widely available for
-many CPU architectures).
-<P>
-<LI><I>High resolution time function</I>: Some platforms (IRIX, Solaris)
-provide a high resolution time function based on the free running hardware
-counter.  This function returns the time counted since some arbitrary
-moment in the past (usually machine power up time).  It is not correlated in
-any way to the time of day, and thus is not subject to resetting,
-drifting, etc.  This type of time is ideal for tasks where cheap, accurate
-interval timing is required.  If such a function is not available on a
-particular platform, the <TT>gettimeofday(3)</TT> function can be used
-(though on some platforms it involves a system call).
-<P>
-<LI><I>The stack growth direction</I>: The library needs to know whether the
-stack grows toward lower (down) or higher (up) memory addresses.
-One can write a simple test program that detects the stack growth direction
-on a particular platform.</LI>
-<P>
-<LI><I>Non-blocking attribute inheritance</I>: On some platforms (e.g. IRIX)
-the socket created as a result of the <TT>accept(2)</TT> call inherits the
-non-blocking attribute of the listening socket. One needs to consult the manual
-pages or write a simple test program to see if this applies to a specific
-platform.</LI>
-<P>
-<LI><I>Anonymous memory mapping</I>: The library allocates memory segments
-for thread stacks by doing anonymous memory mapping (<TT>mmap(2)</TT>). This
-mapping is somewhat different on SVR4 and BSD4.3 derived platforms.
-<P>
-The memory mapping can be avoided altogether by using <TT>malloc(3)</TT> for
-stack allocation.  In this case the <TT>MALLOC_STACK</TT> macro should be
-defined.</LI>
-</UL>
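Note on the "stack growth direction" item in the list above: the removed notes mention a simple test program without showing one. A minimal sketch of such a probe follows. The function names are hypothetical and not part of the State Threads sources, the `noinline` attribute assumes GCC or Clang, and comparing addresses from two different stack frames relies on implementation-defined behavior, so treat the result as a hint rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reports whether a local variable in the callee's
 * frame sits at a lower address than a local in the caller's frame. */
#if defined(__GNUC__)
__attribute__((noinline)) /* keep a separate frame for the callee */
#endif
static int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;

    assert(caller_local != 0);
    /* Compare as integers: relational comparison of unrelated pointers
     * is not defined by the C standard. */
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns -1 if the stack appears to grow toward lower addresses,
 * +1 if it appears to grow toward higher addresses. */
int stack_growth_direction(void)
{
    volatile char caller_local = 0;

    return callee_is_lower(&caller_local) ? -1 : 1;
}
```

Calling `stack_growth_direction()` once at start-up, or from a configure-time check, would be enough to pick the matching feature test macro.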
-<P>
-All machine-dependent feature test macros should be defined in the
-<TT>md.h</TT> header file. The assembly code for <TT>setjmp/longjmp</TT>
-replacement functions for all CPU architectures should be placed in
-the <TT>md.S</TT> file.
-<P>
-The current version of the library is ported to:
-<UL>
-  <LI>IRIX 6.x (both 32 and 64 bit)</LI>
-  <LI>Linux (kernel 2.x and glibc 2.x) on x86, Alpha, MIPS and MIPSEL,
-  SPARC, ARM, PowerPC, 68k, HPPA, S390, IA-64, and Opteron (AMD-64)</LI>
-  <LI>Solaris 2.x (SunOS 5.x) on x86, AMD64, SPARC, and SPARC-64</LI>
-  <LI>AIX 4.x</LI>
-  <LI>HP-UX 11 (both 32 and 64 bit)</LI>
-  <LI>Tru64/OSF1</LI>
-  <LI>FreeBSD on x86, AMD64, and Alpha</LI>
-  <LI>OpenBSD on x86, AMD64, Alpha, and SPARC</LI>
-  <LI>NetBSD on x86, Alpha, SPARC, and VAX</LI>
-  <LI>MacOS X (Darwin) on PowerPC (32 bit) and Intel (both 32 and 64 bit) [universal]</LI>
-  <LI>Cygwin</LI>
-</UL>
-<P>
-
-<A NAME="signals">
-<H3>Signals</H3>
-Signal handling in an application using State Threads should be treated the
-same way as in a classical UNIX process application. There is no such
-thing as a per-thread signal mask; all threads share the same signal handlers,
-and only async-signal-safe functions can be used in signal handlers.
-However, there is a way to process signals synchronously by converting a
-signal event to an I/O event: a signal catching function does a write to
-a pipe which will be processed synchronously by a dedicated signal handling
-thread.  The following code demonstrates this technique (error handling is
-omitted for clarity):
-<PRE>
-
-/* Per-process pipe which is used as a signal queue. */
-/* Up to PIPE_BUF/sizeof(int) signals can be queued up. */
-int sig_pipe[2];
-
-/* Signal catching function. */
-/* Converts signal event to I/O event. */
-void sig_catcher(int signo)
-{
-  int err;
-
-  /* Save errno to restore it after the write() */
-  err = errno;
-  /* write() is reentrant/async-safe */
-  write(sig_pipe[1], &signo, sizeof(int));
-  errno = err;
-}
-
-/* Signal processing function. */
-/* This is the "main" function of the signal processing thread. */
-void *sig_process(void *arg)
-{
-  st_netfd_t nfd;
-  int signo;
-
-  nfd = st_netfd_open(sig_pipe[0]);
-
-  for ( ; ; ) {
-    /* Read the next signal from the pipe */
-    st_read(nfd, &signo, sizeof(int), ST_UTIME_NO_TIMEOUT);
-
-    /* Process signal synchronously */
-    switch (signo) {
-    case SIGHUP:
-      /* do something here - reread config files, etc. */
-      break;
-    case SIGTERM:
-      /* do something here - cleanup, etc. */
-      break;
-      /*      .
-              .
-         Other signals
-              .
-              .
-      */
-    }
-  }
-
-  return NULL;
-}
-
-int main(int argc, char *argv[])
-{
-  struct sigaction sa;
-        .
-        .
-        .
-
-  /* Create signal pipe */
-  pipe(sig_pipe);
-
-  /* Create signal processing thread */
-  st_thread_create(sig_process, NULL, 0, 0);
-
-  /* Install sig_catcher() as a signal handler */
-  sa.sa_handler = sig_catcher;
-  sigemptyset(&sa.sa_mask);
-  sa.sa_flags = 0;
-  sigaction(SIGHUP, &sa, NULL);
-
-  sa.sa_handler = sig_catcher;
-  sigemptyset(&sa.sa_mask);
-  sa.sa_flags = 0;
-  sigaction(SIGTERM, &sa, NULL);
-
-        .
-        .
-        .
-      
-}
-
-</PRE>
-<P>
-Note that if multiple processes are used (see below), the signal pipe should
-be initialized after the <TT>fork(2)</TT> call so that each process has its
-own private pipe.
-<P>
-
-<A NAME="intra">
-<H3>Intra-Process Synchronization</H3>
-Due to the event-driven nature of the library scheduler, the thread context
-switch (process state change) can only happen in a well-known set of
-library functions.  This set includes functions in which a thread may
-"block":<TT>  </TT>I/O functions (<TT>st_read(), st_write(), </TT>etc.),
-sleep functions (<TT>st_sleep(), </TT>etc.), and thread synchronization
-functions (<TT>st_thread_join(), st_cond_wait(), </TT>etc.).  As a result,
-process-specific global data need not be protected by locks, since a thread
-cannot be rescheduled while in a critical section (and only one thread at a
-time can access the same memory location).  By the same token,
-non-thread-safe functions (in the traditional sense) can be safely used with
-State Threads.  The library's mutex facilities are practically useless
-for a correctly written application (no blocking functions in critical
-section) and are provided mostly for completeness.  This absence of locking
-greatly simplifies an application design and provides a foundation for
-scalability.
-<P>
-
-<A NAME="inter">
-<H3>Inter-Process Synchronization</H3>
-The State Threads library makes it possible to multiplex a large number
-of simultaneous connections onto a much smaller number of separate 
-processes, where each process uses a many-to-one user-level threading
-implementation (<B>N</B> of <B>M:1</B> mappings rather than one <B>M:N</B>
-mapping used in native threading libraries on some platforms). This design
-is key to the application's scalability.  One can think about it as if a
-set of all threads is partitioned into separate groups (processes) where
-each group has a separate pool of resources (virtual address space, file
-descriptors, etc.).  An application designer has full control of how many
-groups (processes) an application creates and what resources, if any,
-are shared among different groups via standard UNIX inter-process
-communication (IPC) facilities.<P>
-There are several reasons for creating multiple processes:
-<P>
-<UL>
-<LI>To take advantage of multiple hardware entities (CPUs, disks, etc.)
-available in the system (hardware parallelism).</LI>
-<P>
-<LI>To reduce the risk of losing a large number of user connections when one of
-the processes crashes. For example, if <B>C</B> user connections (threads)
-are multiplexed onto <B>P</B> processes and one of the processes crashes,
-only a fraction (<B>C/P</B>) of all connections will be lost.</LI>
-<P>
-<LI>To overcome per-process resource limitations imposed by the OS.  For
-example, if <TT>select(2)</TT> is used for event polling, the number of
-simultaneous connections (threads) per process is
-limited by the <TT>FD_SETSIZE</TT> parameter (see <TT>select(2)</TT>).
-If <TT>FD_SETSIZE</TT> is equal to 1024 and each connection needs one file
-descriptor, then an application should create 10 processes to support 10,000
-simultaneous connections.</LI>
-</UL>
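Note on the `FD_SETSIZE` item above: the figure of 10 processes for 10,000 connections is a ceiling division (10,000 / 1024 is about 9.77, rounded up). The sketch below spells that out. The function name is hypothetical, and it assumes, as the text does, exactly one descriptor per connection and no descriptors reserved for listening sockets, pipes, or log files.

```c
#include <assert.h>

/* Smallest number of processes whose combined per-process descriptor
 * limit covers the requested number of simultaneous connections. */
unsigned long processes_needed(unsigned long connections,
                               unsigned long per_process_limit)
{
    assert(per_process_limit > 0);
    /* Ceiling division without floating point. */
    return (connections + per_process_limit - 1) / per_process_limit;
}
```

With `connections = 10000` and `per_process_limit = 1024` this gives 10, matching the example in the notes. A real deployment would likely subtract the descriptors each process needs for its own use before dividing.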
-<P>
-Ideally all user sessions are completely independent, so there is no need for
-inter-process communication.  It is always better to have several separate
-smaller process-specific resources (e.g., data caches) than to have one large
-resource shared (and modified) by all processes.  Sometimes, however, there
-is a need to share a common resource among different processes.  In that case,
-standard UNIX IPC facilities can be used.  In addition to that, there is a way
-to synchronize different processes so that only the thread accessing the
-shared resource will be suspended (but not the entire process) if that resource
-is unavailable.  In the following code fragment a pipe is used as a counting
-semaphore for inter-process synchronization:
-<PRE>
-#ifndef PIPE_BUF
-#define PIPE_BUF 512  /* POSIX */
-#endif
-
-/* Semaphore data structure */
-typedef struct ipc_sem {
-  st_netfd_t rdfd;  /* read descriptor */
-  st_netfd_t wrfd;  /* write descriptor */
-} ipc_sem_t;
-
-/* Create and initialize the semaphore. Should be called before fork(2). */
-/* 'value' must be less than PIPE_BUF. */
-/* If 'value' is 1, the semaphore works as mutex. */
-ipc_sem_t *ipc_sem_create(int value)
-{
-  ipc_sem_t *sem;
-  int p[2];
-  char b[PIPE_BUF];
-
-  /* Error checking is omitted for clarity */
-  sem = malloc(sizeof(ipc_sem_t));
-
-  /* Create the pipe */
-  pipe(p);
-  sem->rdfd = st_netfd_open(p[0]);
-  sem->wrfd = st_netfd_open(p[1]);
-
-  /* Initialize the semaphore: put 'value' bytes into the pipe */
-  write(p[1], b, value);
-
-  return sem;
-}
-
-/* Try to decrement the "value" of the semaphore. */
-/* If "value" is 0, the calling thread blocks on the semaphore. */
-int ipc_sem_wait(ipc_sem_t *sem)
-{
-  char c;
-
-  /* Read one byte from the pipe */
-  if (st_read(sem->rdfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
-    return -1;
-
-  return 0;
-}
-
-/* Increment the "value" of the semaphore. */
-int ipc_sem_post(ipc_sem_t *sem)
-{
-  char c;
-
-  if (st_write(sem->wrfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
-    return -1;
-
-  return 0;
-}
-
-</PRE>
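Note on the semaphore fragment above: it depends on the State Threads API (`st_netfd_open()`, `st_read()`, `st_write()`). To try the pipe-as-semaphore idea on its own, the same pattern can be sketched with plain blocking descriptors. The `pipe_sem_*` names are hypothetical, `EINTR` handling is left out, and unlike the State Threads version a wait on an empty pipe here blocks the whole process, not just one thread.

```c
#include <assert.h>
#include <unistd.h>

/* A counting semaphore whose value is the number of bytes in a pipe. */
typedef struct pipe_sem {
    int rdfd; /* read end: wait takes one byte */
    int wrfd; /* write end: post adds one byte */
} pipe_sem_t;

/* Create the pipe and preload it with 'value' tokens.
 * 'value' should stay well below the pipe capacity (PIPE_BUF is a safe
 * bound), otherwise the preloading writes could block. */
int pipe_sem_init(pipe_sem_t *sem, int value)
{
    int p[2];
    char token = 0;

    assert(sem != 0 && value >= 0);
    if (pipe(p) < 0)
        return -1;
    for (int i = 0; i < value; i++) {
        if (write(p[1], &token, 1) != 1) {
            close(p[0]);
            close(p[1]);
            return -1;
        }
    }
    sem->rdfd = p[0];
    sem->wrfd = p[1];
    return 0;
}

/* Decrement: blocks while the pipe is empty. */
int pipe_sem_wait(pipe_sem_t *sem)
{
    char token;

    return read(sem->rdfd, &token, 1) == 1 ? 0 : -1;
}

/* Increment: puts one token back. */
int pipe_sem_post(pipe_sem_t *sem)
{
    char token = 0;

    return write(sem->wrfd, &token, 1) == 1 ? 0 : -1;
}
```

As in the original fragment, the semaphore would be created before `fork(2)` so that every child inherits both ends of the pipe; a critical section is then bracketed by `pipe_sem_wait()` and `pipe_sem_post()`.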
-<P>
-
-Generally, the following steps should be followed when writing an application
-using the State Threads library:
-<P>
-<OL>
-<LI>Initialize the library (<TT>st_init()</TT>).</LI>
-<P>
-<LI>Create resources that will be shared among different processes:
-    create and bind listening sockets, create shared memory segments, IPC
-    channels, synchronization primitives, etc.</LI>
-<P>
-<LI>Create several processes (<TT>fork(2)</TT>). The parent process should
-    either exit or become a "watchdog" (e.g., it starts a new process when
-    an existing one crashes, does a cleanup upon application termination,
-    etc.).</LI>
-<P>
-<LI>In each child process create a pool of threads
-    (<TT>st_thread_create()</TT>) to handle user connections.</LI>
-</OL>
-<P>
-
-<A NAME="nonnet">
-<H3>Non-Network I/O</H3>
-
-The State Threads architecture uses non-blocking I/O on
-<TT>st_netfd_t</TT> objects for concurrent processing of multiple user
-connections.  This architecture has a drawback:  the entire process and
-all its threads may block for the duration of a <I>disk</I> or other
-non-network I/O operation, whether through State Threads I/O functions,
-direct system calls, or standard I/O functions.  (This is applicable
-mostly to disk <I>reads</I>; disk <I>writes</I> are usually performed
-asynchronously -- data goes to the buffer cache to be written to disk
-later.)  Fortunately, disk I/O (unlike network I/O) usually takes a
-finite and predictable amount of time, but this may not be true for
-special devices or user input devices (including stdin).  Nevertheless,
-such I/O reduces throughput of the system and increases response times.
-There are several ways to design an application to overcome this
-drawback:
-
-<P>
-<UL>
-<LI>Create several identical main processes as described above (symmetric
-    architecture).  This will improve CPU utilization and thus improve the
-    overall throughput of the system.</LI>
-<P>
-<LI>Create multiple "helper" processes in addition to the main process that
-    will handle blocking I/O operations (asymmetric architecture).
-    This approach was suggested for Web servers in a
-    <A HREF="http://www.cs.rice.edu/~vivek/flash99/">paper</A> by Peter
-    Druschel et al. In this architecture the main process communicates with
-    a helper process via an IPC channel (<TT>pipe(2), socketpair(2)</TT>).
-    The main process instructs a helper to perform the potentially blocking
-    operation.  Once the operation completes, the helper returns a
-    notification via IPC.
-</UL>
-<P>
-
-<A NAME="timeouts">
-<H3>Timeouts</H3>
-
-The <TT>timeout</TT> parameter to <TT>st_cond_timedwait()</TT> and the
-I/O functions, and the arguments to <TT>st_sleep()</TT> and
-<TT>st_usleep()</TT> specify a maximum time to wait <I>since the last
-context switch</I>, not since the beginning of the function call.
-
-<P>The State Threads' time resolution is actually the time interval
-between context switches.  That time interval may be large in some
-situations, for example, when a single thread does a lot of work
-continuously.  Note that a steady, uninterrupted stream of network I/O
-qualifies for this description; a context switch occurs only when a
-thread blocks.
-
-<P>If a specified I/O timeout is less than the time interval between
-context switches, the function may return with a timeout error before
-that amount of time has elapsed since the beginning of the function
-call.  For example, if eight milliseconds have passed since the last
-context switch and an I/O function with a timeout of 10 milliseconds
-blocks, causing a switch, the call may return with a timeout error as
-little as two milliseconds after it was called.  (On Linux,
-<TT>select()</TT>'s timeout is an <I>upper</I> bound on the amount of
-time elapsed before select returns.)  Similarly, if 12 ms have passed
-already, the function may return immediately.
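Note on the timeout example above: the arithmetic can be written down as a small model. This is only a restatement of the behavior the paragraph describes, not code from the library. The function name is hypothetical, and it assumes the timeout is counted from the last context switch and that all values are in milliseconds.

```c
#include <assert.h>

/* Shortest time, measured from the start of the call, after which a
 * timeout error may be reported when the timeout is counted from the
 * last context switch rather than from the call itself. */
long effective_wait_ms(long timeout_ms, long since_last_switch_ms)
{
    long remaining;

    assert(timeout_ms >= 0 && since_last_switch_ms >= 0);
    remaining = timeout_ms - since_last_switch_ms;
    return remaining > 0 ? remaining : 0;
}
```

For the two cases in the text, `effective_wait_ms(10, 8)` gives 2 and `effective_wait_ms(10, 12)` gives 0, which is the "may return immediately" case.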
-
-<P>In almost all cases I/O timeouts should be used only for detecting a
-broken network connection or for preventing a peer from holding an idle
-connection for too long.  Therefore, for most applications, realistic I/O
-timeouts should be on the order of seconds.  Furthermore, there's
-probably no point in retrying operations that time out.  Rather than
-retrying, simply use a larger timeout in the first place.
-
-<P>The largest valid timeout value is platform-dependent and may be
-significantly less than <TT>INT_MAX</TT> seconds for <TT>select()</TT>
-or <TT>INT_MAX</TT> milliseconds for <TT>poll()</TT>.  Generally, you
-should not use timeouts exceeding several hours.  Use
-<tt>ST_UTIME_NO_TIMEOUT</tt> (<tt>-1</tt>) as a special value to
-indicate infinite timeout or indefinite sleep.  Use
-<tt>ST_UTIME_NO_WAIT</tt> (<tt>0</tt>) to indicate no waiting at all.
-
-<P>
-<HR>
-<P>
-</BODY>
-</HTML>
-
--- a/trunk/3rdparty/st-srs/docs/reference.html
+++ b/trunk/3rdparty/st-srs/docs/reference.html
--- a/trunk/3rdparty/st-srs/docs/st.html
+++ b/trunk/3rdparty/st-srs/docs/st.html
@@ -1,504 +0,0 @@
-<HTML>
-<HEAD>
-<TITLE>State Threads for Internet Applications</TITLE>
-</HEAD>
-<BODY BGCOLOR=#FFFFFF>
-<H2>State Threads for Internet Applications</H2>
-<H3>Introduction</H3>
-<P>
-State Threads is an application library which provides a
-foundation for writing fast and highly scalable Internet Applications
-on UNIX-like platforms.  It combines the simplicity of the multithreaded 
-programming paradigm, in which one thread supports each simultaneous 
-connection, with the performance and scalability of an event-driven 
-state machine architecture.</P>
-
-<H3>1. Definitions</H3>
-<P>
-<A NAME="IA">
-<H4>1.1 Internet Applications</H4>
-</A>
-<P>
-An <I>Internet Application</I> (IA) is either a server or client network
-application that accepts connections from clients and may or may not 
-connect to servers.  In an IA the arrival or departure of network data
-often controls processing (that is, IA is a <I>data-driven</I> application).
-For each connection, an IA does some finite amount of work 
-involving data exchange with its peer, where its peer may be either 
-a client or a server.
-The typical transaction steps of an IA are to accept a connection,
-read a request, do some finite and predictable amount of work to 
-process the request, then write a response to the peer that sent the 
-request.  One example of an IA is a Web server; 
-the most general example of an IA is a proxy server, because it both 
-accepts connections from clients and connects to other servers.</P>
-<P>
-We assume that the performance of an IA is constrained by available CPU
-cycles rather than network bandwidth or disk I/O (that is, CPU
-is a bottleneck resource).
-<P>
-
-<A NAME="PS">
-<H4>1.2 Performance and Scalability</H4>
-</A>
-<P>
-The <I>performance</I> of an IA is usually evaluated as its
-throughput measured in transactions per second or bytes per second (one
-can be converted to the other, given the average transaction size).  There are
-several benchmarks that can be used to measure throughput of Web serving
-applications for specific workloads (such as 
-<A HREF="http://www.spec.org/osg/web96/">SPECweb96</A>,
-<A HREF="http://www.mindcraft.com/webstone/">WebStone</A>,
-<A HREF="http://www.zdnet.com/zdbop/webbench/">WebBench</A>).
-Although there is no common definition for <I>scalability</I>, in general it
-expresses the ability of an application to sustain its performance when some
-external condition changes.  For IAs this external condition is either the
-number of clients (also known as "users," "simultaneous connections," or "load
-generators") or the underlying hardware system size (number of CPUs, memory
-size, and so on).  Thus there are two types of scalability: <I>load
-scalability</I> and <I>system scalability</I>, respectively.
-<P>
-The figure below shows how the throughput of an idealized IA changes with
-the increasing number of clients (solid blue line).  Initially the throughput
-grows linearly (the slope represents the maximal throughput that one client
-can provide). Within this initial range, the IA is underutilized and CPUs are
-partially idle.  Further increase in the number of clients leads to a system
-saturation, and the throughput gradually stops growing as all CPUs become fully
-utilized.  After that point, the throughput stays flat because there are no
-more CPU cycles available.
-In the real world, however, each simultaneous connection
-consumes some computational and memory resources, even when idle, and this
-overhead grows with the number of clients.  Therefore, the throughput of the
-real world IA starts dropping after some point (dashed blue line in the figure
-below).  The rate at which the throughput drops depends, among other things, on
-application design.
-<P>
-We say that an application has a good <I>load scalability</I> if it can
-sustain its throughput over a wide range of loads.
-Interestingly, the <A HREF="http://www.spec.org/osg/web99/">SPECweb99</A>
-benchmark somewhat reflects the Web server's load scalability because it
-measures the number of clients (load generators) given a mandatory minimal
-throughput per client (that is, it measures the server's <I>capacity</I>).
-This is unlike <A HREF="http://www.spec.org/osg/web96/">SPECweb96</A> and
-other benchmarks that use the throughput as their main metric (see the figure
-below).
-<P>
-<CENTER><IMG SRC="fig.gif" ALT="Figure: Throughput vs. Number of clients">
-</CENTER>
-<P>
-<I>System scalability</I> is the ability of an application to sustain its
-performance per hardware unit (such as a CPU) with the increasing number of
-these units.  In other words, good system scalability means that doubling the
-number of processors will roughly double the application's throughput (dashed
-green line).  We assume here that the underlying operating system also scales
-well.  Good system scalability allows you to initially run an application on 
-the smallest system possible, while retaining the ability to move that
-application to a larger system if necessary, without excessive effort or
-expense.  That is, an application need not be rewritten or even undergo a
-major porting effort when changing system size.
-<P>
-Although scalability and performance are more important in the case of server
-IAs, they should also be considered for some client applications (such as 
-benchmark load generators).
-<P>
-
-<A NAME="CONC">
-<H4>1.3 Concurrency</H4>
-</A>
-<P>
-Concurrency reflects the parallelism in a system.  The two unrelated types 
-are <I>virtual</I> concurrency and <I>real</I> concurrency.
-<UL>
-<LI>Virtual (or apparent) concurrency is the number of simultaneous
-connections that a system supports.
-<BR><BR>
-<LI>Real concurrency is the number of hardware devices, including
-CPUs, network cards, and disks, that actually allow a system to perform 
-tasks in parallel.
-</UL>
-<P>
-An IA must provide virtual concurrency in order to serve many users
-simultaneously.
-To achieve maximum performance and scalability in doing so, the number of
-programming entities that an IA creates to be scheduled by the OS kernel
-should be
-kept close to (within an order of magnitude of) the real concurrency found on
-the system. These programming entities scheduled by the kernel are known as
-<I>kernel execution vehicles</I>. Examples of kernel execution vehicles
-include Solaris lightweight processes and IRIX kernel threads.
-In other words, the number of kernel execution vehicles should be dictated by
-the system size and not by the number of simultaneous connections.
-<P>
-
-<H3>2. Existing Architectures</H3>
-<P>
-There are a few different architectures that are commonly used by IAs. 
-These include the <I>Multi-Process</I>, 
-<I>Multi-Threaded</I>, and <I>Event-Driven State Machine</I> 
-architectures.
-<P>
-<A NAME="MP">
-<H4>2.1 Multi-Process Architecture</H4>
-</A>
-<P>
-In the Multi-Process (MP) architecture, an individual process is 
-dedicated to each simultaneous connection.
-A process performs all of a transaction's initialization steps 
-and services a connection completely before moving on to service 
-a new connection.
-<P>
-User sessions in IAs are relatively independent; therefore, no 
-synchronization between processes handling different connections is
-necessary.  Because each process has its own private address space,
-this architecture is very robust. If a process serving one of the connections
-crashes, the other sessions will not be affected.  However, to serve many
-concurrent connections, an equal number of processes must be employed.
-Because processes are kernel entities (and are in fact the heaviest ones), 
-the number of kernel entities will be at least as large as the number of 
-concurrent sessions. On most systems, good performance will not be achieved 
-when more than a few hundred processes are created because of the high 
-context-switching overhead. In other words, MP applications have poor load 
-scalability.
-<P>
-On the other hand, MP applications have very good system scalability, because
-no resources are shared among different processes and there is no
-synchronization overhead.
-<P>
-The Apache Web Server 1.x (<A HREF=#refs1>[Reference 1]</A>) uses the MP 
-architecture on UNIX systems.
-<P>
-<A NAME="MT">
-<H4>2.2 Multi-Threaded Architecture</H4>
-</A>
-<P>
-In the Multi-Threaded (MT) architecture, multiple independent threads 
-of control are employed within a single shared address space.  Like a 
-process in the MP architecture, each thread performs all of a
-transaction's initialization steps and services a connection completely
-before moving on to service a new connection.
-<P>
-Many modern UNIX operating systems implement a <I>many-to-few</I> model when 
-mapping user-level threads to kernel entities.  In this model, an 
-arbitrarily large number of user-level threads is multiplexed onto a 
-lesser number of kernel execution vehicles.  Kernel execution 
-vehicles are also known as <I>virtual processors</I>.  Whenever a user-level
-thread makes a blocking system call, the kernel execution vehicle it is using
-will become blocked in the kernel.  If there are no other non-blocked kernel
-execution vehicles and there are other runnable user-level threads, a new
-kernel execution vehicle will be created automatically.  This prevents the
-application from blocking when it can continue to make useful forward
-progress.
-<P>
-Because IAs are by nature network I/O driven, all concurrent sessions block on
-network I/O at various points.  As a result, the number of virtual processors
-created in the kernel grows close to the number of user-level threads
-(or simultaneous connections).  When this occurs, the many-to-few model
-effectively degenerates to a <I>one-to-one</I> model.  Again, as in
-the MP architecture, the number of kernel execution vehicles is dictated by
-the number of simultaneous connections rather than by the number of CPUs.  This
-reduces an application's load scalability.  However, because kernel threads
-(lightweight processes) use fewer resources and are more light-weight than
-traditional UNIX processes, an MT application should scale better with load
-than an MP application.
-<P>
-Unexpectedly, the small number of virtual processors sharing the same address
-space in the MT architecture destroys an application's system scalability
-because of contention among the threads on various locks.  Even if an
-application itself is carefully
-optimized to avoid lock contention around its own global data (a non-trivial
-task), there are still standard library functions and system calls
-that use common resources hidden from the application.  For example,
-on many platforms thread safety of memory allocation routines
-(<TT>malloc(3)</TT>, <TT>free(3)</TT>, and so on) is achieved by using a single
-global lock.  Another example is a per-process file descriptor table.
-This common resource table is shared by all kernel execution vehicles within
-the same process and must be protected when one modifies it via
-certain system calls (such as <TT>open(2)</TT>, <TT>close(2)</TT>, and so on).
-In addition, keeping caches coherent
-among CPUs on multiprocessor systems hurts performance when different threads
-running on different CPUs modify data items on the same cache line.
-<P>
-In order to improve load scalability, some applications employ a different
-type of MT architecture:  they create one or more thread(s) <I>per task</I>
-rather than one thread <I>per connection</I>.  For example, one small group
-of threads may be responsible for accepting client connections, another 
-for request processing, and yet another for serving responses.  The main
-advantage of this architecture is that it eliminates the tight coupling
-between the number of threads and number of simultaneous connections. However,
-in this architecture, different task-specific thread groups must share common
-work queues that must be protected by mutual exclusion locks (a typical
-producer-consumer problem).  This adds synchronization overhead that causes an
-application to perform badly on multiprocessor systems.  In other words, in
-this architecture, the application's system scalability is sacrificed for the
-sake of load scalability.
-<P>
-Of course, the usual nightmares of threaded programming, including data
-corruption, deadlocks, and race conditions, also make the MT architecture (in
-any form) far from simple to use.
-<P>
-
-<A NAME="EDSM">
-<H4>2.3 Event-Driven State Machine Architecture</H4>
-</A>
-<P>
-In the Event-Driven State Machine (EDSM) architecture, a single process
-is employed to concurrently process multiple connections. The basics of this
-architecture are described in Comer and Stevens
-<A HREF=#refs2>[Reference 2]</A>.
-The EDSM architecture performs one basic data-driven step associated with
-a particular connection at a time, thus multiplexing many concurrent
-connections.  The process operates as a state machine that receives an event
-and then reacts to it.
-<P>
-In the idle state the EDSM calls <TT>select(2)</TT> or <TT>poll(2)</TT> to
-wait for network I/O events.  When a particular file descriptor is ready for
-I/O, the EDSM completes the corresponding basic step (usually by invoking a
-handler function) and starts the next one.  This architecture uses
-non-blocking system calls to perform asynchronous network I/O operations.
-For more details on non-blocking I/O see Stevens
-<A HREF=#refs3>[Reference 3]</A>.
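Note on the idle-state description above: the wait-then-dispatch cycle can be sketched with `poll(2)` as follows. This is an illustrative skeleton, not code from any of the servers cited. The `conn_t` type, `on_readable()` and `edsm_run()` are hypothetical names, the per-connection "basic step" is reduced to counting bytes, and a real server would put its descriptors in non-blocking mode and handle `EINTR`.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_CONNS 16

/* Per-connection state that the state machine keeps by hand. */
typedef struct conn {
    int fd;            /* descriptor to watch */
    size_t bytes_seen; /* progress made so far */
    int done;          /* nonzero once the peer closed or an error occurred */
} conn_t;

/* One basic step for one connection: consume what is ready right now. */
static void on_readable(conn_t *c)
{
    char buf[256];
    ssize_t n = read(c->fd, buf, sizeof buf);

    if (n > 0)
        c->bytes_seen += (size_t)n;
    else
        c->done = 1; /* end of file or error finishes this connection */
}

/* Wait for events and dispatch handlers until every connection is done.
 * Returns 0 on success, -1 if poll() fails. */
int edsm_run(conn_t *conns, size_t count)
{
    struct pollfd pfds[MAX_CONNS];

    assert(count <= MAX_CONNS);
    for (;;) {
        size_t active = 0;

        for (size_t i = 0; i < count; i++) {
            /* poll() ignores entries with a negative descriptor. */
            pfds[i].fd = conns[i].done ? -1 : conns[i].fd;
            pfds[i].events = POLLIN;
            pfds[i].revents = 0;
            if (!conns[i].done)
                active++;
        }
        if (active == 0)
            return 0;

        /* Idle state: sleep until at least one descriptor is ready. */
        if (poll(pfds, (nfds_t)count, -1) < 0)
            return -1;

        for (size_t i = 0; i < count; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL))
                on_readable(&conns[i]);
        }
    }
}
```

The explicit `conn_t` record is what the document later calls simulating threads and their stacks the hard way: everything a handler needs between events has to live in that structure instead of in local variables.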
-<P>
-To take advantage of hardware parallelism (real concurrency), multiple
-identical processes may be created.  This is called Symmetric Multi-Process
-EDSM and is used, for example, in the Zeus Web Server
-(<A HREF=#refs4>[Reference 4]</A>).  To more efficiently multiplex disk I/O,
-special "helper" processes may be created.  This is called Asymmetric
-Multi-Process EDSM and was proposed for Web servers by Druschel
-and others <A HREF=#refs5>[Reference 5]</A>.
-<P>
-EDSM is probably the most scalable architecture for IAs.
-Because the number of simultaneous connections (virtual concurrency) is
-completely decoupled from the number of kernel execution vehicles (processes),
-this architecture has very good load scalability.  It requires only minimal
-user-level resources to create and maintain each additional connection.
-<P>
-Like MP applications, Multi-Process EDSM has very good system scalability
-because no resources are shared among different processes and there is no
-synchronization overhead.
-<P>
-Unfortunately, the EDSM architecture is monolithic rather than based on the
-concept of threads, so new applications generally need to be implemented from
-the ground up.  In effect, the EDSM architecture simulates threads and their
-stacks the hard way.
-<P>
-
-<A NAME="ST">
-<H3>3. State Threads Library</H3>
-</A>
-<P>
-The State Threads library combines the advantages of all of the above
-architectures.  The interface preserves the programming simplicity of thread
-abstraction, allowing each simultaneous connection to be treated as a separate
-thread of execution within a single process. The underlying implementation is
-close to the EDSM architecture in that the state of each concurrent
-session is saved in a separate memory segment.
-<P>
-
-<H4>3.1 State Changes and Scheduling</H4>
-<P>
-The state of each concurrent session includes its stack environment 
-(stack pointer, program counter, CPU registers) and its stack.  Conceptually, 
-a thread context switch can be viewed as a process changing its state.  There 
-are no kernel entities involved other than processes.  
-Unlike general-purpose threading libraries, the State Threads library
-is fully deterministic.  The thread context switch (process state change) can
-only happen in a well-known set of functions (at I/O points or at explicit
-synchronization points).  As a result, process-specific global data does not
-have to be protected by mutual exclusion locks in most cases.  The entire
-application is free to use all the static variables and non-reentrant library
-functions it wants, greatly simplifying programming and debugging while
-increasing performance.  This is somewhat similar to a <I>co-routine</I> model
-(co-operatively multitasked threads), except that no explicit yield is needed:
-sooner or later, a thread performs a blocking I/O operation and thus surrenders
-control.  All threads of execution (simultaneous connections) have the
-same priority, so scheduling is non-preemptive, as in the EDSM architecture.
-Because IAs are data-driven (processing is limited by the size of network 
-buffers and data arrival rates), scheduling is non-time-slicing.
-<P>
-Only two types of external events are handled by the library's
-scheduler, because only these events can be detected by
-<TT>select(2)</TT> or <TT>poll(2)</TT>: I/O events (a file descriptor is ready
-for I/O) and time events
-(some timeout has expired).  However, other types of events (such as
-a signal sent to a process) can also be handled by converting them to I/O
-events.  For example, a signal handling function can perform a write to a pipe
-(<TT>write(2)</TT> is async-signal-safe), thus converting a signal
-event to an I/O event.
-<P>
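The signal-to-I/O conversion described above can be sketched in plain POSIX C. This is a minimal illustration and not code from the State Threads library: the names `sig_pipe`, `on_signal`, `signal_pipe_init`, and `wait_for_signal` are hypothetical, and `poll(2)` stands in for whatever I/O wait the application's scheduler would actually perform on the read end of the pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical pipe: [0] is watched like any other descriptor, [1] is written by the handler. */
static int sig_pipe[2] = { -1, -1 };

/* Signal handler: only async-signal-safe work, a single write(2) of the signal number. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char b = (unsigned char)signo;
    ssize_t n = write(sig_pipe[1], &b, 1);
    (void)n;                      /* nothing safe to do on failure inside a handler */
    errno = saved_errno;
}

/* Create the pipe, make both ends non-blocking, and install the handler for signo. */
static int signal_pipe_init(int signo)
{
    struct sigaction sa;

    if (pipe(sig_pipe) < 0)
        return -1;
    if (fcntl(sig_pipe[0], F_SETFL, O_NONBLOCK) < 0 ||
        fcntl(sig_pipe[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Wait for the signal as an ordinary I/O event.
 * Returns the signal number, 0 on timeout, or -1 on error. */
static int wait_for_signal(int timeout_ms)
{
    struct pollfd pfd;
    unsigned char b;
    int n;

    pfd.fd = sig_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n == 0)
        return 0;
    if (n < 0 || read(sig_pipe[0], &b, 1) != 1)
        return -1;
    return b;
}
```

In a State Threads application the read end of such a pipe would be read through the library's own I/O functions, so that a dedicated thread blocks on it without blocking the process; the `poll(2)` call here only makes the sketch self-contained.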
-To take advantage of hardware parallelism, as in the EDSM architecture,
-multiple processes can be created in either a symmetric or asymmetric manner.
-Process management is not in the library's scope but instead is left up to the
-application.
-<P>
-There are several general-purpose threading libraries that implement a
-<I>many-to-one</I> model (many user-level threads to one kernel execution
-vehicle), using the same basic techniques as the State Threads library 
-(non-blocking I/O, event-driven scheduler, and so on).  For an example, see GNU
-Portable Threads (<A HREF=#refs6>[Reference 6]</A>).  Because they are
-general-purpose, these libraries have different objectives than the State 
-Threads library.  The State Threads library is <I>not</I> a general-purpose
-threading library,
-but rather an application library that targets only certain types of
-applications (IAs) in order to achieve the highest possible performance and
-scalability for those applications.
-<P>
-
-<H4>3.2 Scalability</H4>
-<P>
-State threads are very lightweight user-level entities, and therefore creating
-and maintaining user connections requires minimal resources.  An application
-using the State Threads library scales very well with the increasing number
-of connections.
-<P>
-On multiprocessor systems an application should create multiple processes
-to take advantage of hardware parallelism.  Using multiple separate processes
-is the <I>only</I> way to achieve the highest possible system scalability.
-This is because duplicating per-process resources is the only way to avoid
-significant synchronization overhead on multiprocessor systems.  Creating
-separate UNIX processes naturally offers resource duplication.  Again,
-as in the EDSM architecture, there is no connection between the number of
-simultaneous connections (which may be very large and changes within a wide
-range) and the number of kernel entities (which is usually small and constant).
-In other words, the State Threads library makes it possible to multiplex a
-large number of simultaneous connections onto a much smaller number of
-separate processes, thus allowing an application to scale well with both
-the load and system size.
-<P>
-
-<H4>3.3 Performance</H4>
-<P>
-Performance is one of the library's main objectives.  The State Threads
-library is implemented to minimize the number of system calls and 
-to make thread creation and context switching as fast as possible.
-For example, there is no per-thread signal mask (unlike
-POSIX threads), so there is no need to save and restore a process's
-signal mask on every thread context switch. This eliminates two system
-calls per context switch.  Signal events can be handled much more
-efficiently by converting them to I/O events (see above).
-<P>
-
-<H4>3.4 Portability</H4>
-<P>
-The library uses the same general, underlying concepts as the EDSM 
-architecture, including non-blocking I/O, file descriptors, and 
-I/O multiplexing.  These concepts are available in some form on most 
-UNIX platforms, making the library very portable across many 
-flavors of UNIX.  There are only a few platform-dependent sections in the
-source.
-<P>
-
-<H4>3.5 State Threads and NSPR</H4>
-<P>
-The State Threads library is a derivative of the Netscape Portable 
-Runtime library (NSPR) <A HREF=#refs7>[Reference 7]</A>. The primary goal of 
-NSPR is to provide a platform-independent layer for system facilities, 
-where system facilities include threads, thread synchronization, and I/O.
-Performance and scalability are not the main concern of NSPR.  The 
-State Threads library addresses performance and scalability while 
-remaining much smaller than NSPR.  It is contained in 8 source files 
-as opposed to more than 400, but provides all the functionality that 
-is needed to write efficient IAs on UNIX-like platforms.
-<P>
-
-<TABLE CELLPADDING=3>
-<TR>
-<TD></TD>
-<TH>NSPR</TH>
-<TH>State Threads</TH>
-</TR>
-<TR>
-<TD><B>Lines of code</B></TD>
-<TD ALIGN=RIGHT>~150,000</TD>
-<TD ALIGN=RIGHT>~3000</TD>
-</TR>
-<TR>
-<TD><B>Dynamic library size&nbsp;&nbsp;<BR>(debug version)</B></TD>
-<TD></TD>
-<TD></TD>
-</TR>
-<TR>
-<TD>IRIX</TD>
-<TD ALIGN=RIGHT>~700 KB</TD>
-<TD ALIGN=RIGHT>~60 KB</TD>
-</TR>
-<TR>
-<TD>Linux</TD>
-<TD ALIGN=RIGHT>~900 KB</TD>
-<TD ALIGN=RIGHT>~70 KB</TD>
-</TR>
-</TABLE>
-<P>
-
-<H3>Conclusion</H3>
-<P>
-State Threads is an application library which provides a foundation for
-writing <A HREF=#IA>Internet Applications</A>.  To summarize, it has the
-following <I>advantages</I>:
-<P>
-<UL>
-<LI>It allows the design of fast and highly scalable applications.  An
-application will scale well with both load and number of CPUs.
-<P>
-<LI>It greatly simplifies application programming and debugging because, as a
-rule, no mutual exclusion locking is necessary and the entire application is
-free to use static variables and non-reentrant library functions.
-</UL>
-<P>
-The library's main <I>limitation</I>:
-<P>
-<UL>
-<LI>All I/O operations on sockets must use the State Threads library's I/O
-functions because only those functions perform thread scheduling and prevent
-the application's processes from blocking.
-</UL>
-<P>
-
-<H3>References</H3>
-<OL>
-<A NAME="refs1">
-<LI> Apache Software Foundation,
-<A HREF="http://www.apache.org">http://www.apache.org</A>.
-<A NAME="refs2">
-<LI> Douglas E. Comer, David L. Stevens, <I>Internetworking With TCP/IP,
-Vol. III: Client-Server Programming And Applications</I>, Second Edition,
-Ch. 8, 12.
-<A NAME="refs3">
-<LI> W. Richard Stevens, <I>UNIX Network Programming</I>, Second Edition,
-Vol. 1, Ch. 15.
-<A NAME="refs4">
-<LI> Zeus Technology Limited,
-<A HREF="http://www.zeus.co.uk/">http://www.zeus.co.uk</A>.
-<A NAME="refs5">
-<LI> Peter Druschel, Vivek S. Pai, Willy Zwaenepoel,
-<A HREF="http://www.cs.rice.edu/~druschel/usenix99flash.ps.gz">
-Flash: An Efficient and Portable Web Server</A>. In <I>Proceedings of the
-USENIX 1999 Annual Technical Conference</I>, Monterey, CA, June 1999.
-<A NAME="refs6">
-<LI> GNU Portable Threads,
-<A HREF="http://www.gnu.org/software/pth/">http://www.gnu.org/software/pth/</A>.
-<A NAME="refs7">
-<LI> Netscape Portable Runtime,
-<A HREF="http://www.mozilla.org/docs/refList/refNSPR/">http://www.mozilla.org/docs/refList/refNSPR/</A>.
-</OL>
-
-<H3>Other resources covering various architectural issues in IAs</H3>
-<OL START=8>
-<LI> Dan Kegel, <I>The C10K problem</I>,
-<A HREF="http://www.kegel.com/c10k.html">http://www.kegel.com/c10k.html</A>.
-</LI>
-<LI> James C. Hu, Douglas C. Schmidt, Irfan Pyarali, <I>JAWS: Understanding
-High Performance Web Systems</I>,
-<A HREF="http://www.cs.wustl.edu/~jxh/research/research.html">http://www.cs.wustl.edu/~jxh/research/research.html</A>.</LI>
-</OL>
-<P>
-<HR>
-<P>
-
-<CENTER><FONT SIZE=-1>Portions created by SGI are Copyright &copy; 2000
-Silicon Graphics, Inc.  All rights reserved.</FONT></CENTER>
-<P>
-
-</BODY>
-</HTML>
-
--- a/trunk/3rdparty/st-srs/docs/timeout_heap.txt
+++ b/trunk/3rdparty/st-srs/docs/timeout_heap.txt
@ -1,60 +0,0 @@
-How the timeout heap works
-
-As of version 1.5, the State Threads Library represents the queue of
-sleeping threads using a heap data structure rather than a sorted
-linked list.  This improves performance when there is a large number
-of sleeping threads, since insertion into a heap takes O(log N) time
-while insertion into a sorted list takes O(N) time.  For example, in
-one test 1000 threads were created, each thread called st_usleep()
-with a random time interval, and then all the threads were
-immediately interrupted and joined before the sleeps had a chance to
-finish.  The whole process was repeated 1000 times, for a total of a
-million sleep queue insertions and removals.  With the old list-based
-sleep queue, this test took 100 seconds; now it takes only 12 seconds.
-
-Heap data structures are typically based on dynamically resized
-arrays.  However, since the existing ST code base was very nicely
-structured around linking the thread objects into pointer-based lists
-without the need for any auxiliary data structures, implementing the
-heap using a similar nodes-and-pointers based approach seemed more
-appropriate for ST than introducing a separate array.
-
-Thus, the new ST timeout heap works by organizing the existing
-_st_thread_t objects in a balanced binary tree, just as they were
-previously organized into a doubly-linked, sorted list.  The global
-_ST_SLEEPQ variable, formerly a linked list head, is now simply a
-pointer to the root of this tree, and the root node of the tree is the
-thread with the earliest timeout.  Each thread object has two child
-pointers, "left" and "right", pointing to threads with later timeouts.
-
-Each node in the tree is numbered with an integer index, corresponding
-to the array index in an array-based heap, and the tree is kept fully
-balanced and left-adjusted at all times.  In other words, the tree
-consists of any number of fully populated top levels, followed by a
-single bottom level which may be partially populated, such that any
-existing nodes form a contiguous block to the left and the spaces for
-missing nodes form a contiguous block to the right.  For example, if
-there are nine threads waiting for a timeout, they are numbered and
-arranged in a tree exactly as follows:
-
-              1
-           /     \
-          2       3
-         / \     / \
-        4   5   6   7
-       / \
-      8   9
-
-Each node has either no children, only a left child, or both a left
-and a right child.  Children always time out later than their parents
-(this is called the "heap invariant"), but when a node has two
-children, their mutual order is unspecified - the left child may time
-out before or after the right child.  If a node is numbered N, its
-left child is numbered 2N, and its right child is numbered 2N+1.
-
-There is no pointer from a child to its parent; all pointers point
-downward.  Additions and deletions both work by starting at the root
-and traversing the tree towards the leaves, going left or right
-according to the binary digits forming the index of the destination
-node.  As nodes are added or deleted, existing nodes are rearranged to
-maintain the heap invariant.
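The root-to-node traversal described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `node_t` and `heap_slot` are hypothetical names, and the real `_st_thread_t` carries many more fields. The binary digits of the 1-based index, read after the leading 1, select left (0) or right (1) at each level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread object linked into the timeout heap. */
typedef struct node {
    struct node *left;   /* child numbered 2N   */
    struct node *right;  /* child numbered 2N+1 */
    long long due;       /* timeout; children are never earlier than their parent */
} node_t;

/* Return the address of the pointer slot that holds (or would hold) the node
 * with the given 1-based index, walking downward from the root only. */
static node_t **heap_slot(node_t **root, unsigned index)
{
    node_t **p = root;
    unsigned depth = 0;
    unsigned i;

    assert(index >= 1);

    /* Count the binary digits below the leading 1: that is the node's depth. */
    for (i = index; i > 1; i >>= 1)
        depth++;

    /* Consume those digits from most to least significant. */
    while (depth > 0) {
        assert(*p != NULL);      /* every ancestor must already exist */
        depth--;
        p = (index & (1u << depth)) ? &(*p)->right : &(*p)->left;
    }
    return p;
}
```

For index 9 (binary 1001) the digits after the leading 1 are 0, 0, 1, so the walk goes left, left, right: from node 1 to 2 to 4 to 9, matching the tree drawn above. Returning the slot rather than the node lets the same walk serve insertion, where the slot for index N+1 is still empty.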