mirror of
				https://github.com/ossrs/srs.git
				synced 2025-03-09 15:49:59 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			434 lines
		
	
	
	
		
			16 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
<HTML>
<HEAD>
<TITLE>State Threads Library Programming Notes</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
<H2>Programming Notes</H2>
<P>
<B>
<UL>
<LI><A HREF=#porting>Porting</A></LI>
<LI><A HREF=#signals>Signals</A></LI>
<LI><A HREF=#intra>Intra-Process Synchronization</A></LI>
<LI><A HREF=#inter>Inter-Process Synchronization</A></LI>
<LI><A HREF=#nonnet>Non-Network I/O</A></LI>
<LI><A HREF=#timeouts>Timeouts</A></LI>
</UL>
</B>
<P>
<HR>
<P>
<A NAME="porting">
<H3>Porting</H3>
The State Threads library uses OS concepts that are available in some
form on most UNIX platforms, making the library very portable across
many flavors of UNIX.  However, there are several parts of the library
that rely on platform-specific features.  Here is the list of such parts:
<P>
<UL>
<LI><I>Thread context initialization</I>: Two ingredients of the
<TT>jmp_buf</TT>
data structure (the program counter and the stack pointer) have to be
manually set in the thread creation routine. The <TT>jmp_buf</TT> data
structure is defined in the <TT>setjmp.h</TT> header file and differs from
platform to platform.  Usually the program counter is a structure member
with <TT>PC</TT> in the name and the stack pointer is a structure member
with <TT>SP</TT> in the name.  One can also consult
<A HREF="http://www.mozilla.org/source.html">Netscape's NSPR library source</A>,
which already has this code for many UNIX-like platforms
(<TT>mozilla/nsprpub/pr/include/md/*.h</TT> files).
<P>
Note that on some BSD-derived platforms <TT>_setjmp(3)/_longjmp(3)</TT>
calls should be used instead of <TT>setjmp(3)/longjmp(3)</TT> (that is,
the calls that manipulate only the stack and registers and do <I>not</I>
save and restore the process's signal mask).</LI>
<P>
Starting with glibc 2.4 on Linux, the opacity of the <TT>jmp_buf</TT> data
structure is enforced by <TT>setjmp(3)/longjmp(3)</TT>, so the
<TT>jmp_buf</TT> ingredients cannot be accessed directly anymore (unless
the special environment variable <TT>LD_POINTER_GUARD</TT> is set before
application execution). To avoid a dependency on a custom environment, the
State Threads library provides <TT>setjmp/longjmp</TT> replacement functions
for all Intel CPU architectures. Other CPU architectures can also be easily
supported (the <TT>setjmp/longjmp</TT> source code is widely available for
many CPU architectures).
<P>
<LI><I>High resolution time function</I>: Some platforms (IRIX, Solaris)
provide a high resolution time function based on a free-running hardware
counter.  This function returns the time counted since some arbitrary
moment in the past (usually machine power up time).  It is not correlated in
any way to the time of day, and thus is not subject to resetting,
drifting, etc.  This type of time is ideal for tasks where cheap, accurate
interval timing is required.  If such a function is not available on a
particular platform, the <TT>gettimeofday(3)</TT> function can be used
(though on some platforms it involves a system call).</LI>
<P>
<LI><I>The stack growth direction</I>: The library needs to know whether the
stack grows toward lower (down) or higher (up) memory addresses.
One can write a simple test program that detects the stack growth direction
on a particular platform.</LI>
<P>
<LI><I>Non-blocking attribute inheritance</I>: On some platforms (e.g. IRIX)
the socket created as a result of the <TT>accept(2)</TT> call inherits the
non-blocking attribute of the listening socket. One needs to consult the manual
pages or write a simple test program to see if this applies to a specific
platform.</LI>
<P>
<LI><I>Anonymous memory mapping</I>: The library allocates memory segments
for thread stacks by doing anonymous memory mapping (<TT>mmap(2)</TT>). This
mapping is somewhat different on SVR4 and BSD4.3 derived platforms.
<P>
The memory mapping can be avoided altogether by using <TT>malloc(3)</TT> for
stack allocation.  In this case the <TT>MALLOC_STACK</TT> macro should be
defined.</LI>
</UL>
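<P>
The stack growth test mentioned in the list above can be as small as the
following sketch.  It is an illustration, not the library's own test: the
function names are hypothetical, and comparing the addresses of locals from
two different stack frames is not defined by the C standard, although it
behaves as expected on common platforms.  The call goes through a
<TT>volatile</TT> function pointer to discourage the compiler from inlining it.
<PRE>
#include &lt;stdint.h&gt;

/* Runs in a deeper stack frame than its caller and reports whether its own
   local variable lies at a lower address than the caller's local. */
static int callee_is_lower(volatile char *caller_local)
{
  volatile char callee_local = 0;

  return (uintptr_t)&callee_local &lt; (uintptr_t)caller_local;
}

/* Hypothetical helper: returns 1 if the stack grows toward lower addresses,
   0 if it grows toward higher addresses. */
int stack_grows_down(void)
{
  volatile char caller_local = 0;
  /* Volatile function pointer: discourages inlining of the probe */
  int (*volatile probe)(volatile char *) = callee_is_lower;

  return probe(&caller_local);
}
</PRE>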
<P>
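For the interval timing discussed in the list above, one possible approach
on current POSIX systems is sketched below.  It assumes that
<TT>clock_gettime(2)</TT> with <TT>CLOCK_MONOTONIC</TT> is available (a POSIX
facility that these notes do not mention) and falls back to
<TT>gettimeofday(3)</TT> otherwise.  The function name
<TT>interval_clock_usec()</TT> is hypothetical and is not part of the library.
<PRE>
#include &lt;stdint.h&gt;
#include &lt;stddef.h&gt;
#include &lt;sys/time.h&gt;
#include &lt;time.h&gt;

/* Hypothetical helper: current time in microseconds, meant only for
   measuring intervals (the starting point is arbitrary). */
uint64_t interval_clock_usec(void)
{
  struct timeval tv;
#if defined(CLOCK_MONOTONIC)
  struct timespec ts;

  /* Monotonic clock: not affected by resetting the time of day */
  if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
#endif

  /* Fallback: time of day, which may jump if the clock is reset */
  gettimeofday(&tv, NULL);
  return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}
</PRE>
<P>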
All machine-dependent feature test macros should be defined in the
<TT>md.h</TT> header file. The assembly code for <TT>setjmp/longjmp</TT>
replacement functions for all CPU architectures should be placed in
the <TT>md.S</TT> file.
<P>
The current version of the library is ported to:
<UL>
  <LI>IRIX 6.x (both 32 and 64 bit)</LI>
  <LI>Linux (kernel 2.x and glibc 2.x) on x86, Alpha, MIPS and MIPSEL,
  SPARC, ARM, PowerPC, 68k, HPPA, S390, IA-64, and Opteron (AMD-64)</LI>
  <LI>Solaris 2.x (SunOS 5.x) on x86, AMD64, SPARC, and SPARC-64</LI>
  <LI>AIX 4.x</LI>
  <LI>HP-UX 11 (both 32 and 64 bit)</LI>
  <LI>Tru64/OSF1</LI>
  <LI>FreeBSD on x86, AMD64, and Alpha</LI>
  <LI>OpenBSD on x86, AMD64, Alpha, and SPARC</LI>
  <LI>NetBSD on x86, Alpha, SPARC, and VAX</LI>
  <LI>MacOS X (Darwin) on PowerPC (32 bit) and Intel (both 32 and 64 bit) [universal]</LI>
  <LI>Cygwin</LI>
</UL>
<P>

<A NAME="signals">
<H3>Signals</H3>
Signal handling in an application using State Threads should be treated the
same way as in a classical UNIX process application. There is no such
thing as a per-thread signal mask; all threads share the same signal handlers,
and only async-signal-safe functions can be used in signal handlers.
However, there is a way to process signals synchronously by converting a
signal event to an I/O event: a signal catching function does a write to
a pipe which will be processed synchronously by a dedicated signal handling
thread.  The following code demonstrates this technique (error handling is
omitted for clarity):
<PRE>

/* Per-process pipe which is used as a signal queue. */
/* Up to PIPE_BUF/sizeof(int) signals can be queued up. */
int sig_pipe[2];

/* Signal catching function. */
/* Converts signal event to I/O event. */
void sig_catcher(int signo)
{
  int err;

  /* Save errno to restore it after the write() */
  err = errno;
  /* write() is async-signal-safe */
  write(sig_pipe[1], &signo, sizeof(int));
  errno = err;
}

/* Signal processing function. */
/* This is the "main" function of the signal processing thread. */
void *sig_process(void *arg)
{
  st_netfd_t nfd;
  int signo;

  nfd = st_netfd_open(sig_pipe[0]);

  for ( ; ; ) {
    /* Read the next signal from the pipe */
    st_read(nfd, &signo, sizeof(int), ST_UTIME_NO_TIMEOUT);

    /* Process signal synchronously */
    switch (signo) {
    case SIGHUP:
      /* do something here - reread config files, etc. */
      break;
    case SIGTERM:
      /* do something here - cleanup, etc. */
      break;
      /*      .
              .
         Other signals
              .
              .
      */
    }
  }

  return NULL;
}

int main(int argc, char *argv[])
{
  struct sigaction sa;
        .
        .
        .

  /* The library must already be initialized (st_init()) at this point */

  /* Create signal pipe */
  pipe(sig_pipe);

  /* Create signal processing thread */
  st_thread_create(sig_process, NULL, 0, 0);

  /* Install sig_catcher() as a signal handler for both signals */
  sa.sa_handler = sig_catcher;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction(SIGHUP, &sa, NULL);
  sigaction(SIGTERM, &sa, NULL);

        .
        .
        .

}

</PRE>
<P>
Note that if multiple processes are used (see below), the signal pipe should
be initialized after the <TT>fork(2)</TT> call so that each process has its
own private pipe.
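<P>
The pipe-as-signal-queue idea can also be tried outside the library.  The
sketch below uses only POSIX calls and adds one possible hardening that is
not part of the example above: the write end of the pipe is made
non-blocking, so that a full pipe causes the signal to be dropped instead
of blocking inside the handler.  The helper names
<TT>sig_queue_init()</TT> and <TT>sig_queue_next()</TT> are hypothetical.
<PRE>
#include &lt;errno.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;signal.h&gt;
#include &lt;unistd.h&gt;

static int sig_pipe[2];

/* Handler: forward the signal number into the pipe. */
static void sig_catcher(int signo)
{
  int err = errno;

  if (write(sig_pipe[1], &signo, sizeof(signo)) &lt; 0) {
    /* Pipe is full (EAGAIN): this signal is dropped */
  }
  errno = err;
}

/* Hypothetical helper: create the pipe and install the handler for signo.
   Returns 0 on success, -1 on error. */
int sig_queue_init(int signo)
{
  struct sigaction sa;
  int flags;

  if (pipe(sig_pipe) &lt; 0)
    return -1;

  /* Non-blocking write end: the handler never blocks on a full pipe */
  flags = fcntl(sig_pipe[1], F_GETFL);
  if (flags &lt; 0 || fcntl(sig_pipe[1], F_SETFL, flags | O_NONBLOCK) &lt; 0)
    return -1;

  sa.sa_handler = sig_catcher;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: wait for the next queued signal number.
   Returns the signal number, or -1 on error. */
int sig_queue_next(void)
{
  int signo;
  ssize_t n;

  do {
    n = read(sig_pipe[0], &signo, sizeof(signo));
  } while (n &lt; 0 && errno == EINTR);

  return n == (ssize_t)sizeof(signo) ? signo : -1;
}
</PRE>
<P>
In a State Threads application the read side would instead go through
<TT>st_netfd_open()</TT> and <TT>st_read()</TT>, as in the example above, so
that only the signal processing thread waits.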
<P>

<A NAME="intra">
<H3>Intra-Process Synchronization</H3>
Due to the event-driven nature of the library scheduler, the thread context
switch (process state change) can only happen in a well-known set of
library functions.  This set includes functions in which a thread may
"block": I/O functions (<TT>st_read()</TT>, <TT>st_write()</TT>, etc.),
sleep functions (<TT>st_sleep()</TT>, etc.), and thread synchronization
functions (<TT>st_thread_join()</TT>, <TT>st_cond_wait()</TT>, etc.).  As a
result, process-specific global data need not be protected by locks since a
thread cannot be rescheduled while in a critical section (and only one thread
at a time can access the same memory location).  By the same token,
non-thread-safe functions (in the traditional sense) can be safely used with
State Threads.  The library's mutex facilities are practically useless
for a correctly written application (no blocking functions in a critical
section) and are provided mostly for completeness.  This absence of locking
greatly simplifies application design and provides a foundation for
scalability.
<P>

<A NAME="inter">
<H3>Inter-Process Synchronization</H3>
The State Threads library makes it possible to multiplex a large number
of simultaneous connections onto a much smaller number of separate
processes, where each process uses a many-to-one user-level threading
implementation (<B>N</B> of <B>M:1</B> mappings rather than one <B>M:N</B>
mapping used in native threading libraries on some platforms). This design
is key to the application's scalability.  One can think about it as if a
set of all threads is partitioned into separate groups (processes) where
each group has a separate pool of resources (virtual address space, file
descriptors, etc.).  An application designer has full control of how many
groups (processes) an application creates and what resources, if any,
are shared among different groups via standard UNIX inter-process
communication (IPC) facilities.<P>
There are several reasons for creating multiple processes:
<P>
<UL>
<LI>To take advantage of multiple hardware entities (CPUs, disks, etc.)
available in the system (hardware parallelism).</LI>
<P>
<LI>To reduce the risk of losing a large number of user connections when one
of the processes crashes. For example, if <B>C</B> user connections (threads)
are multiplexed onto <B>P</B> processes and one of the processes crashes,
only a fraction (<B>C/P</B>) of all connections will be lost.</LI>
<P>
<LI>To overcome per-process resource limitations imposed by the OS.  For
example, if <TT>select(2)</TT> is used for event polling, the number of
simultaneous connections (threads) per process is
limited by the <TT>FD_SETSIZE</TT> parameter (see <TT>select(2)</TT>).
If <TT>FD_SETSIZE</TT> is equal to 1024 and each connection needs one file
descriptor, then an application should create at least 10 processes to
support 10,000 simultaneous connections.</LI>
</UL>
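<P>
The arithmetic behind the <TT>FD_SETSIZE</TT> example can be written down
as a small helper.  This is only a sketch: the function name and the
<TT>reserved</TT> parameter (descriptors a process needs for things other
than connections, such as listening sockets or log files) are assumptions
added for illustration.
<PRE>
/* Hypothetical helper: number of processes needed to serve 'connections'
   when each process may use 'fd_limit' descriptors, of which 'reserved'
   are not available for connections.  Returns -1 if no descriptor is
   left for connections. */
long processes_needed(long connections, long fd_limit, long reserved)
{
  long usable = fd_limit - reserved;

  if (connections &lt;= 0)
    return 0;
  if (usable &lt;= 0)
    return -1;

  /* Ceiling division: a partially filled process still counts */
  return (connections + usable - 1) / usable;
}
</PRE>
<P>
With 10,000 connections and a limit of 1024, the helper yields 10 when
<TT>reserved</TT> is 0, and 11 once 25 descriptors per process are set aside.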
<P>
Ideally all user sessions are completely independent, so there is no need for
inter-process communication.  It is always better to have several separate
smaller process-specific resources (e.g., data caches) than to have one large
resource shared (and modified) by all processes.  Sometimes, however, there
is a need to share a common resource among different processes.  In that case,
standard UNIX IPC facilities can be used.  In addition to that, there is a way
to synchronize different processes so that only the thread accessing the
shared resource will be suspended (but not the entire process) if that resource
is unavailable.  In the following code fragment a pipe is used as a counting
semaphore for inter-process synchronization:
<PRE>
#ifndef PIPE_BUF
#define PIPE_BUF 512  /* POSIX */
#endif

/* Semaphore data structure */
typedef struct ipc_sem {
  st_netfd_t rdfd;  /* read descriptor */
  st_netfd_t wrfd;  /* write descriptor */
} ipc_sem_t;

/* Create and initialize the semaphore. Should be called before fork(2). */
/* 'value' must be less than PIPE_BUF. */
/* If 'value' is 1, the semaphore works as mutex. */
ipc_sem_t *ipc_sem_create(int value)
{
  ipc_sem_t *sem;
  int p[2];
  char b[PIPE_BUF] = { 0 };  /* token bytes; their content is irrelevant */

  /* Error checking is omitted for clarity */
  sem = malloc(sizeof(ipc_sem_t));

  /* Create the pipe */
  pipe(p);
  sem->rdfd = st_netfd_open(p[0]);
  sem->wrfd = st_netfd_open(p[1]);

  /* Initialize the semaphore: put 'value' bytes into the pipe */
  write(p[1], b, value);

  return sem;
}

/* Try to decrement the "value" of the semaphore. */
/* If "value" is 0, the calling thread blocks on the semaphore. */
int ipc_sem_wait(ipc_sem_t *sem)
{
  char c;

  /* Read one byte from the pipe */
  if (st_read(sem->rdfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
    return -1;

  return 0;
}

/* Increment the "value" of the semaphore. */
int ipc_sem_post(ipc_sem_t *sem)
{
  char c = 0;

  /* Write one byte to the pipe */
  if (st_write(sem->wrfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
    return -1;

  return 0;
}

</PRE>
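<P>
Stripped of the State Threads calls, the same counting logic can be
exercised with plain POSIX descriptors.  The sketch below is an
illustration under stated assumptions, not the library's code: the
<TT>pipe_sem_*</TT> names are hypothetical, and a non-blocking
"try wait" replaces the blocking wait so that an empty semaphore is
reported with <TT>EAGAIN</TT> instead of suspending the caller.
<PRE>
#include &lt;errno.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;limits.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

typedef struct pipe_sem {
  int rdfd;  /* read descriptor (non-blocking) */
  int wrfd;  /* write descriptor */
} pipe_sem_t;

/* Create the pipe and put 'value' tokens into it. */
int pipe_sem_init(pipe_sem_t *sem, int value)
{
  char tokens[_POSIX_PIPE_BUF];
  int p[2], flags;

  if (value &lt; 0 || value &gt; (int)sizeof(tokens) || pipe(p) &lt; 0)
    return -1;

  /* Non-blocking read end, so that an empty pipe reports EAGAIN */
  flags = fcntl(p[0], F_GETFL);
  if (flags &lt; 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) &lt; 0)
    goto fail;

  memset(tokens, 0, sizeof(tokens));
  if (value &gt; 0 && write(p[1], tokens, (size_t)value) != (ssize_t)value)
    goto fail;

  sem->rdfd = p[0];
  sem->wrfd = p[1];
  return 0;

fail:
  close(p[0]);
  close(p[1]);
  return -1;
}

/* Take one token without blocking: 0 on success, -1 if none is available
   (errno is EAGAIN in that case). */
int pipe_sem_trywait(pipe_sem_t *sem)
{
  char c;
  ssize_t n;

  do {
    n = read(sem->rdfd, &c, 1);
  } while (n &lt; 0 && errno == EINTR);

  return n == 1 ? 0 : -1;
}

/* Return one token to the semaphore. */
int pipe_sem_post(pipe_sem_t *sem)
{
  char c = 0;
  ssize_t n;

  do {
    n = write(sem->wrfd, &c, 1);
  } while (n &lt; 0 && errno == EINTR);

  return n == 1 ? 0 : -1;
}
</PRE>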
<P>

Generally, the following steps should be followed when writing an application
using the State Threads library:
<P>
<OL>
<LI>Initialize the library (<TT>st_init()</TT>).</LI>
<P>
<LI>Create resources that will be shared among different processes:
    create and bind listening sockets, create shared memory segments, IPC
    channels, synchronization primitives, etc.</LI>
<P>
<LI>Create several processes (<TT>fork(2)</TT>). The parent process should
    either exit or become a "watchdog" (e.g., it starts a new process when
    an existing one crashes, does a cleanup upon application termination,
    etc.).</LI>
<P>
<LI>In each child process create a pool of threads
    (<TT>st_thread_create()</TT>) to handle user connections.</LI>
</OL>
<P>

<A NAME="nonnet">
<H3>Non-Network I/O</H3>

The State Threads architecture uses non-blocking I/O on
<TT>st_netfd_t</TT> objects for concurrent processing of multiple user
connections.  This architecture has a drawback:  the entire process and
all its threads may block for the duration of a <I>disk</I> or other
non-network I/O operation, whether through State Threads I/O functions,
direct system calls, or standard I/O functions.  (This is applicable
mostly to disk <I>reads</I>; disk <I>writes</I> are usually performed
asynchronously -- data goes to the buffer cache to be written to disk
later.)  Fortunately, disk I/O (unlike network I/O) usually takes a
finite and predictable amount of time, but this may not be true for
special devices or user input devices (including stdin).  Nevertheless,
such I/O reduces throughput of the system and increases response times.
There are several ways to design an application to overcome this
drawback:

<P>
<UL>
<LI>Create several identical main processes as described above (symmetric
    architecture).  This will improve CPU utilization and thus improve the
    overall throughput of the system.</LI>
<P>
<LI>Create multiple "helper" processes in addition to the main process that
    will handle blocking I/O operations (asymmetric architecture).
    This approach was suggested for Web servers in a
    <A HREF="http://www.cs.rice.edu/~vivek/flash99/">paper</A> by Peter
    Druschel et al. In this architecture the main process communicates with
    a helper process via an IPC channel (<TT>pipe(2), socketpair(2)</TT>).
    The main process instructs a helper to perform the potentially blocking
    operation.  Once the operation completes, the helper returns a
    notification via IPC.</LI>
</UL>
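<P>
The asymmetric (helper process) architecture can be illustrated with a
small POSIX-only sketch.  Everything in it is an assumption made for
illustration: the request format (a fixed-size path buffer), the operation
(<TT>stat(2)</TT> standing in for a potentially blocking disk access) and
the <TT>helper_*</TT> names.  <TT>EINTR</TT> handling is omitted.
<PRE>
#include &lt;string.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;

#define REQ_PATH_MAX 256

/* Read exactly 'len' bytes; -1 on error or when the peer has closed. */
static int read_full(int fd, void *buf, size_t len)
{
  char *p = buf;

  while (len &gt; 0) {
    ssize_t n = read(fd, p, len);
    if (n &lt;= 0)
      return -1;
    p += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Helper side: serve requests until the main process closes its end. */
static void helper_loop(int fd)
{
  char path[REQ_PATH_MAX];
  struct stat sb;
  long long size;

  while (read_full(fd, path, sizeof(path)) == 0) {
    path[sizeof(path) - 1] = '\0';
    /* The potentially blocking operation happens in the helper */
    size = (stat(path, &sb) == 0) ? (long long)sb.st_size : -1;
    if (write(fd, &size, sizeof(size)) != (ssize_t)sizeof(size))
      break;
  }
  _exit(0);
}

/* Fork one helper.  Returns the main process's end of the IPC channel,
   or -1 on error; the helper's process ID is stored in *pid. */
int helper_spawn(pid_t *pid)
{
  int sv[2];

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) &lt; 0)
    return -1;
  *pid = fork();
  if (*pid &lt; 0) {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }
  if (*pid == 0) {
    close(sv[0]);
    helper_loop(sv[1]);  /* never returns */
  }
  close(sv[1]);
  return sv[0];
}

/* Ask the helper for the size of 'path'; -1 on failure. */
long long helper_file_size(int fd, const char *path)
{
  char req[REQ_PATH_MAX];
  long long size;

  memset(req, 0, sizeof(req));
  strncpy(req, path, sizeof(req) - 1);
  if (write(fd, req, sizeof(req)) != (ssize_t)sizeof(req))
    return -1;
  if (read_full(fd, &size, sizeof(size)) &lt; 0)
    return -1;
  return size;
}
</PRE>
<P>
In a State Threads application the main process would wrap its end of the
channel with <TT>st_netfd_open()</TT> and use <TT>st_write()</TT> and
<TT>st_read()</TT> instead of the plain calls, so that only the requesting
thread waits for the helper's reply.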
<P>

<A NAME="timeouts">
<H3>Timeouts</H3>

The <TT>timeout</TT> parameter to <TT>st_cond_timedwait()</TT> and the
I/O functions, and the arguments to <TT>st_sleep()</TT> and
<TT>st_usleep()</TT> specify a maximum time to wait <I>since the last
context switch</I>, not since the beginning of the function call.

<P>The State Threads' time resolution is actually the time interval
between context switches.  That time interval may be large in some
situations, for example, when a single thread does a lot of work
continuously.  Note that a steady, uninterrupted stream of network I/O
qualifies for this description; a context switch occurs only when a
thread blocks.

<P>If a specified I/O timeout is less than the time interval between
context switches, the function may return with a timeout error before
that amount of time has elapsed since the beginning of the function
call.  For example, if eight milliseconds have passed since the last
context switch and an I/O function with a timeout of 10 milliseconds
blocks, causing a switch, the call may return with a timeout error as
little as two milliseconds after it was called.  (On Linux,
<TT>select()</TT>'s timeout is an <I>upper</I> bound on the amount of
time elapsed before select returns.)  Similarly, if 12 ms have passed
already, the function may return immediately.

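<P>The arithmetic in this example can be expressed as a toy model.  The
function below is not library code; its name is hypothetical and it only
restates the rule that the timeout is counted from the last context
switch.
<PRE>
/* Hypothetical model: shortest time (in milliseconds) a call may wait
   before timing out, given its timeout and the time already elapsed
   since the last context switch. */
long shortest_wait_ms(long timeout_ms, long since_switch_ms)
{
  long remaining = timeout_ms - since_switch_ms;

  return remaining &gt; 0 ? remaining : 0;
}
</PRE>
<P>For the two cases above, <TT>shortest_wait_ms(10, 8)</TT> gives 2 and
<TT>shortest_wait_ms(10, 12)</TT> gives 0.
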
<P>In almost all cases, I/O timeouts should be used only for detecting a
broken network connection or for preventing a peer from holding an idle
connection for too long.  Therefore, for most applications realistic I/O
timeouts should be on the order of seconds.  Furthermore, there's
probably no point in retrying operations that time out.  Rather than
retrying, simply use a larger timeout in the first place.

<P>The largest valid timeout value is platform-dependent and may be
significantly less than <TT>INT_MAX</TT> seconds for <TT>select()</TT>
or <TT>INT_MAX</TT> milliseconds for <TT>poll()</TT>.  Generally, you
should not use timeouts exceeding several hours.  Use
<tt>ST_UTIME_NO_TIMEOUT</tt> (<tt>-1</tt>) as a special value to
indicate infinite timeout or indefinite sleep.  Use
<tt>ST_UTIME_NO_WAIT</tt> (<tt>0</tt>) to indicate no waiting at all.

<P>
<HR>
<P>
</BODY>
</HTML>