mirror of
				https://github.com/ossrs/srs.git
				synced 2025-03-09 15:49:59 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			434 lines
		
	
	
	
		
			16 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			434 lines
		
	
	
	
		
			16 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <HTML>
 | |
| <HEAD>
 | |
| <TITLE>State Threads Library Programming Notes</TITLE>
 | |
| </HEAD>
 | |
| <BODY BGCOLOR=#FFFFFF>
 | |
| <H2>Programming Notes</H2>
 | |
| <P>
 | |
| <B>
 | |
| <UL>
 | |
| <LI><A HREF=#porting>Porting</A></LI>
 | |
| <LI><A HREF=#signals>Signals</A></LI>
 | |
| <LI><A HREF=#intra>Intra-Process Synchronization</A></LI>
 | |
| <LI><A HREF=#inter>Inter-Process Synchronization</A></LI>
 | |
| <LI><A HREF=#nonnet>Non-Network I/O</A></LI>
 | |
| <LI><A HREF=#timeouts>Timeouts</A></LI>
 | |
| </UL>
 | |
| </B>
 | |
| <P>
 | |
| <HR>
 | |
| <P>
 | |
| <A NAME="porting">
 | |
| <H3>Porting</H3>
 | |
| The State Threads library uses OS concepts that are available in some
 | |
| form on most UNIX platforms, making the library very portable across
 | |
| many flavors of UNIX.  However, there are several parts of the library
 | |
| that rely on platform-specific features.  Here is the list of such parts:
 | |
| <P>
 | |
| <UL>
 | |
| <LI><I>Thread context initialization</I>: Two ingredients of the
 | |
| <TT>jmp_buf</TT>
 | |
| data structure (the program counter and the stack pointer) have to be
 | |
| manually set in the thread creation routine. The <TT>jmp_buf</TT> data
 | |
| structure is defined in the <TT>setjmp.h</TT> header file and differs from
 | |
| platform to platform.  Usually the program counter is a structure member
 | |
| with <TT>PC</TT> in the name and the stack pointer is a structure member
 | |
| with <TT>SP</TT> in the name.  One can also look in the
 | |
| <A HREF="http://www.mozilla.org/source.html">Netscape's NSPR library source</A>
 | |
| which already has this code for many UNIX-like platforms
 | |
| (<TT>mozilla/nsprpub/pr/include/md/*.h</TT> files).
 | |
| <P>
 | |
| Note that on some BSD-derived platforms <TT>_setjmp(3)/_longjmp(3)</TT>
 | |
| calls should be used instead of <TT>setjmp(3)/longjmp(3)</TT> (that is
 | |
| the calls that manipulate only the stack and registers and do <I>not</I>
 | |
| save and restore the process's signal mask).</LI>
 | |
| <P>
 | |
| Starting with glibc 2.4 on Linux the opacity of the <TT>jmp_buf</TT> data
 | |
| structure is enforced by <TT>setjmp(3)/longjmp(3)</TT> so the
 | |
| <TT>jmp_buf</TT> ingredients cannot be accessed directly anymore (unless
 | |
| special environmental variable LD_POINTER_GUARD is set before application
 | |
| execution). To avoid dependency on custom environment, the State Threads
 | |
| library provides <TT>setjmp/longjmp</TT> replacement functions for
 | |
| all Intel CPU architectures. Other CPU architectures can also be easily
 | |
| supported (the <TT>setjmp/longjmp</TT> source code is widely available for
 | |
| many CPU architectures).
 | |
| <P>
 | |
| <LI><I>High resolution time function</I>: Some platforms (IRIX, Solaris)
 | |
| provide a high resolution time function based on the free running hardware
 | |
| counter.  This function returns the time counted since some arbitrary
 | |
| moment in the past (usually machine power up time).  It is not correlated in
 | |
| any way to the time of day, and thus is not subject to resetting,
 | |
| drifting, etc.  This type of time is ideal for tasks where cheap, accurate
 | |
| interval timing is required.  If such a function is not available on a
 | |
| particular platform, the <TT>gettimeofday(3)</TT> function can be used
 | |
| (though on some platforms it involves a system call).
 | |
| <P>
 | |
| <LI><I>The stack growth direction</I>: The library needs to know whether the
 | |
| stack grows toward lower (down) or higher (up) memory addresses.
 | |
| One can write a simple test program that detects the stack growth direction
 | |
| on a particular platform.</LI>
 | |
| <P>
 | |
| <LI><I>Non-blocking attribute inheritance</I>: On some platforms (e.g. IRIX)
 | |
| the socket created as a result of the <TT>accept(2)</TT> call inherits the
 | |
| non-blocking attribute of the listening socket. One needs to consult the manual
 | |
| pages or write a simple test program to see if this applies to a specific
 | |
| platform.</LI>
 | |
| <P>
 | |
| <LI><I>Anonymous memory mapping</I>: The library allocates memory segments
 | |
| for thread stacks by doing anonymous memory mapping (<TT>mmap(2)</TT>). This
 | |
| mapping is somewhat different on SVR4 and BSD4.3 derived platforms.
 | |
| <P>
 | |
| The memory mapping can be avoided altogether by using <TT>malloc(3)</TT> for
 | |
| stack allocation.  In this case the <TT>MALLOC_STACK</TT> macro should be
 | |
| defined.</LI>
 | |
| </UL>
 | |
| <P>
 | |
| All machine-dependent feature test macros should be defined in the
 | |
| <TT>md.h</TT> header file. The assembly code for <TT>setjmp/longjmp</TT>
 | |
| replacement functions for all CPU architectures should be placed in
 | |
| the <TT>md.S</TT> file.
 | |
| <P>
 | |
| The current version of the library is ported to:
 | |
| <UL>
 | |
|   <LI>IRIX 6.x (both 32 and 64 bit)</LI>
 | |
|   <LI>Linux (kernel 2.x and glibc 2.x) on x86, Alpha, MIPS and MIPSEL,
 | |
|   SPARC, ARM, PowerPC, 68k, HPPA, S390, IA-64, and Opteron (AMD-64)</LI>
 | |
|   <LI>Solaris 2.x (SunOS 5.x) on x86, AMD64, SPARC, and SPARC-64</LI>
 | |
|   <LI>AIX 4.x</LI>
 | |
|   <LI>HP-UX 11 (both 32 and 64 bit)</LI>
 | |
|   <LI>Tru64/OSF1</LI>
 | |
|   <LI>FreeBSD on x86, AMD64, and Alpha</LI>
 | |
|   <LI>OpenBSD on x86, AMD64, Alpha, and SPARC</LI>
 | |
|   <LI>NetBSD on x86, Alpha, SPARC, and VAX</LI>
 | |
|   <LI>MacOS X (Darwin) on PowerPC (32 bit) and Intel (both 32 and 64 bit) [universal]</LI>
 | |
|   <LI>Cygwin</LI>
 | |
| </UL>
 | |
| <P>
 | |
| 
 | |
| <A NAME="signals">
 | |
| <H3>Signals</H3>
 | |
| Signal handling in an application using State Threads should be treated the
 | |
| same way as in a classical UNIX process application. There is no such
 | |
| thing as per-thread signal mask, all threads share the same signal handlers,
 | |
| and only asynchronous-safe functions can be used in signal handlers.
 | |
| However, there is a way to process signals synchronously by converting a
 | |
| signal event to an I/O event: a signal catching function does a write to
 | |
| a pipe which will be processed synchronously by a dedicated signal handling
 | |
| thread.  The following code demonstrates this technique (error handling is
 | |
| omitted for clarity):
 | |
| <PRE>
 | |
| 
 | |
| /* Per-process pipe which is used as a signal queue. */
 | |
| /* Up to PIPE_BUF/sizeof(int) signals can be queued up. */
 | |
| int sig_pipe[2];
 | |
| 
 | |
| /* Signal catching function. */
 | |
| /* Converts signal event to I/O event. */
 | |
| void sig_catcher(int signo)
 | |
| {
 | |
|   int err;
 | |
| 
 | |
|   /* Save errno to restore it after the write() */
 | |
|   err = errno;
 | |
|   /* write() is reentrant/async-safe */
 | |
|   write(sig_pipe[1], &signo, sizeof(int));
 | |
|   errno = err;
 | |
| }
 | |
| 
 | |
| /* Signal processing function. */
 | |
| /* This is the "main" function of the signal processing thread. */
 | |
| void *sig_process(void *arg)
 | |
| {
 | |
|   st_netfd_t nfd;
 | |
|   int signo;
 | |
| 
 | |
|   nfd = st_netfd_open(sig_pipe[0]);
 | |
| 
 | |
|   for ( ; ; ) {
 | |
|     /* Read the next signal from the pipe */
 | |
|     st_read(nfd, &signo, sizeof(int), ST_UTIME_NO_TIMEOUT);
 | |
| 
 | |
|     /* Process signal synchronously */
 | |
|     switch (signo) {
 | |
|     case SIGHUP:
 | |
|       /* do something here - reread config files, etc. */
 | |
|       break;
 | |
|     case SIGTERM:
 | |
|       /* do something here - cleanup, etc. */
 | |
|       break;
 | |
|       /*      .
 | |
|               .
 | |
|          Other signals
 | |
|               .
 | |
|               .
 | |
|       */
 | |
|     }
 | |
|   }
 | |
| 
 | |
|   return NULL;
 | |
| }
 | |
| 
 | |
| int main(int argc, char *argv[])
 | |
| {
 | |
|   struct sigaction sa;
 | |
|         .
 | |
|         .
 | |
|         .
 | |
| 
 | |
|   /* Create signal pipe */
 | |
|   pipe(sig_pipe);
 | |
| 
 | |
|   /* Create signal processing thread */
 | |
|   st_thread_create(sig_process, NULL, 0, 0);
 | |
| 
 | |
|   /* Install sig_catcher() as a signal handler */
 | |
|   sa.sa_handler = sig_catcher;
 | |
|   sigemptyset(&sa.sa_mask);
 | |
|   sa.sa_flags = 0;
 | |
|   sigaction(SIGHUP, &sa, NULL);
 | |
| 
 | |
|   sa.sa_handler = sig_catcher;
 | |
|   sigemptyset(&sa.sa_mask);
 | |
|   sa.sa_flags = 0;
 | |
|   sigaction(SIGTERM, &sa, NULL);
 | |
| 
 | |
|         .
 | |
|         .
 | |
|         .
 | |
|       
 | |
| }
 | |
| 
 | |
| </PRE>
 | |
| <P>
 | |
| Note that if multiple processes are used (see below), the signal pipe should
 | |
| be initialized after the <TT>fork(2)</TT> call so that each process has its
 | |
| own private pipe.
 | |
| <P>
 | |
| 
 | |
| <A NAME="intra">
 | |
| <H3>Intra-Process Synchronization</H3>
 | |
| Due to the event-driven nature of the library scheduler, the thread context
 | |
| switch (process state change) can only happen in a well-known set of
 | |
| library functions.  This set includes functions in which a thread may
 | |
| "block":<TT>  </TT>I/O functions (<TT>st_read(), st_write(), </TT>etc.),
 | |
| sleep functions (<TT>st_sleep(), </TT>etc.), and thread synchronization
 | |
| functions (<TT>st_thread_join(), st_cond_wait(), </TT>etc.).  As a result,
 | |
| process-specific global data need not to be protected by locks since a thread
 | |
| cannot be rescheduled while in a critical section (and only one thread at a
 | |
| time can access the same memory location).  By the same token,
 | |
| non thread-safe functions (in a traditional sense) can be safely used with
 | |
| the State Threads.  The library's mutex facilities are practically useless
 | |
| for a correctly written application (no blocking functions in critical
 | |
| section) and are provided mostly for completeness.  This absence of locking
 | |
| greatly simplifies an application design and provides a foundation for
 | |
| scalability.
 | |
| <P>
 | |
| 
 | |
| <A NAME="inter">
 | |
| <H3>Inter-Process Synchronization</H3>
 | |
| The State Threads library makes it possible to multiplex a large number
 | |
| of simultaneous connections onto a much smaller number of separate 
 | |
| processes, where each process uses a many-to-one user-level threading
 | |
| implementation (<B>N</B> of <B>M:1</B> mappings rather than one <B>M:N</B>
 | |
| mapping used in native threading libraries on some platforms). This design
 | |
| is key to the application's scalability.  One can think about it as if a
 | |
| set of all threads is partitioned into separate groups (processes) where
 | |
| each group has a separate pool of resources (virtual address space, file
 | |
| descriptors, etc.).  An application designer has full control of how many
 | |
| groups (processes) an application creates and what resources, if any,
 | |
| are shared among different groups via standard UNIX inter-process
 | |
| communication (IPC) facilities.<P>
 | |
| There are several reasons for creating multiple processes:
 | |
| <P>
 | |
| <UL>
 | |
| <LI>To take advantage of multiple hardware entities (CPUs, disks, etc.)
 | |
| available in the system (hardware parallelism).</LI>
 | |
| <P>
 | |
| <LI>To reduce risk of losing a large number of user connections when one of
 | |
| the processes crashes. For example, if <B>C</B> user connections (threads)
 | |
| are multiplexed onto <B>P</B> processes and one of the processes crashes,
 | |
| only a fraction (<B>C/P</B>) of all connections will be lost.</LI>
 | |
| <P>
 | |
| <LI>To overcome per-process resource limitations imposed by the OS.  For
 | |
| example, if <TT>select(2)</TT> is used for event polling, the number of
 | |
| simultaneous connections (threads) per process is
 | |
| limited by the <TT>FD_SETSIZE</TT> parameter (see <TT>select(2)</TT>).
 | |
| If <TT>FD_SETSIZE</TT> is equal to 1024 and each connection needs one file
 | |
| descriptor, then an application should create 10 processes to support 10,000
 | |
| simultaneous connections.</LI>
 | |
| </UL>
 | |
| <P>
 | |
| Ideally all user sessions are completely independent, so there is no need for
 | |
| inter-process communication.  It is always better to have several separate
 | |
| smaller process-specific resources (e.g., data caches) than to have one large
 | |
| resource shared (and modified) by all processes.  Sometimes, however, there
 | |
| is a need to share a common resource among different processes.  In that case,
 | |
| standard UNIX IPC facilities can be used.  In addition to that, there is a way
 | |
| to synchronize different processes so that only the thread accessing the
 | |
| shared resource will be suspended (but not the entire process) if that resource
 | |
| is unavailable.  In the following code fragment a pipe is used as a counting
 | |
| semaphore for inter-process synchronization:
 | |
| <PRE>
 | |
| #ifndef PIPE_BUF
 | |
| #define PIPE_BUF 512  /* POSIX */
 | |
| #endif
 | |
| 
 | |
| /* Semaphore data structure */
 | |
| typedef struct ipc_sem {
 | |
|   st_netfd_t rdfd;  /* read descriptor */
 | |
|   st_netfd_t wrfd;  /* write descriptor */
 | |
| } ipc_sem_t;
 | |
| 
 | |
| /* Create and initialize the semaphore. Should be called before fork(2). */
 | |
| /* 'value' must be less than PIPE_BUF. */
 | |
| /* If 'value' is 1, the semaphore works as mutex. */
 | |
| ipc_sem_t *ipc_sem_create(int value)
 | |
| {
 | |
|   ipc_sem_t *sem;
 | |
|   int p[2];
 | |
|   char b[PIPE_BUF];
 | |
| 
 | |
|   /* Error checking is omitted for clarity */
 | |
|   sem = malloc(sizeof(ipc_sem_t));
 | |
| 
 | |
|   /* Create the pipe */
 | |
|   pipe(p);
 | |
|   sem->rdfd = st_netfd_open(p[0]);
 | |
|   sem->wrfd = st_netfd_open(p[1]);
 | |
| 
 | |
|   /* Initialize the semaphore: put 'value' bytes into the pipe */
 | |
|   write(p[1], b, value);
 | |
| 
 | |
|   return sem;
 | |
| }
 | |
| 
 | |
| /* Try to decrement the "value" of the semaphore. */
 | |
| /* If "value" is 0, the calling thread blocks on the semaphore. */
 | |
| int ipc_sem_wait(ipc_sem_t *sem)
 | |
| {
 | |
|   char c;
 | |
| 
 | |
|   /* Read one byte from the pipe */
 | |
|   if (st_read(sem->rdfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
 | |
|     return -1;
 | |
| 
 | |
|   return 0;
 | |
| }
 | |
| 
 | |
| /* Increment the "value" of the semaphore. */
 | |
| int ipc_sem_post(ipc_sem_t *sem)
 | |
| {
 | |
|   char c;
 | |
| 
 | |
|   if (st_write(sem->wrfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
 | |
|     return -1;
 | |
| 
 | |
|   return 0;
 | |
| }
 | |
| 
 | |
| </PRE>
 | |
| <P>
 | |
| 
 | |
| Generally, the following steps should be followed when writing an application
 | |
| using the State Threads library:
 | |
| <P>
 | |
| <OL>
 | |
| <LI>Initialize the library (<TT>st_init()</TT>).</LI>
 | |
| <P>
 | |
| <LI>Create resources that will be shared among different processes:
 | |
|     create and bind listening sockets, create shared memory segments, IPC
 | |
|     channels, synchronization primitives, etc.</LI>
 | |
| <P>
 | |
| <LI>Create several processes (<TT>fork(2)</TT>). The parent process should
 | |
|     either exit or become a "watchdog" (e.g., it starts a new process when
 | |
|     an existing one crashes, does a cleanup upon application termination,
 | |
|     etc.).</LI>
 | |
| <P>
 | |
| <LI>In each child process create a pool of threads
 | |
|     (<TT>st_thread_create()</TT>) to handle user connections.</LI>
 | |
| </OL>
 | |
| <P>
 | |
| 
 | |
| <A NAME="nonnet">
 | |
| <H3>Non-Network I/O</H3>
 | |
| 
 | |
| The State Threads architecture uses non-blocking I/O on
 | |
| <TT>st_netfd_t</TT> objects for concurrent processing of multiple user
 | |
| connections.  This architecture has a drawback:  the entire process and
 | |
| all its threads may block for the duration of a <I>disk</I> or other
 | |
| non-network I/O operation, whether through State Threads I/O functions,
 | |
| direct system calls, or standard I/O functions.  (This is applicable
 | |
| mostly to disk <I>reads</I>; disk <I>writes</I> are usually performed
 | |
| asynchronously -- data goes to the buffer cache to be written to disk
 | |
| later.)  Fortunately, disk I/O (unlike network I/O) usually takes a
 | |
| finite and predictable amount of time, but this may not be true for
 | |
| special devices or user input devices (including stdin).  Nevertheless,
 | |
| such I/O reduces throughput of the system and increases response times.
 | |
| There are several ways to design an application to overcome this
 | |
| drawback:
 | |
| 
 | |
| <P>
 | |
| <UL>
 | |
| <LI>Create several identical main processes as described above (symmetric
 | |
|     architecture).  This will improve CPU utilization and thus improve the
 | |
|     overall throughput of the system.</LI>
 | |
| <P>
 | |
| <LI>Create multiple "helper" processes in addition to the main process that
 | |
|     will handle blocking I/O operations (asymmetric architecture).
 | |
|     This approach was suggested for Web servers in a
 | |
|     <A HREF="http://www.cs.rice.edu/~vivek/flash99/">paper</A> by Peter
 | |
|     Druschel et al. In this architecture the main process communicates with
 | |
|     a helper process via an IPC channel (<TT>pipe(2), socketpair(2)</TT>).
 | |
|     The main process instructs a helper to perform the potentially blocking
 | |
|     operation.  Once the operation completes, the helper returns a
 | |
|     notification via IPC.
 | |
| </UL>
 | |
| <P>
 | |
| 
 | |
| <A NAME="timeouts">
 | |
| <H3>Timeouts</H3>
 | |
| 
 | |
| The <TT>timeout</TT> parameter to <TT>st_cond_timedwait()</TT> and the
 | |
| I/O functions, and the arguments to <TT>st_sleep()</TT> and
 | |
| <TT>st_usleep()</TT> specify a maximum time to wait <I>since the last
 | |
| context switch</I> not since the beginning of the function call.
 | |
| 
 | |
| <P>The State Threads' time resolution is actually the time interval
 | |
| between context switches.  That time interval may be large in some
 | |
| situations, for example, when a single thread does a lot of work
 | |
| continuously.  Note that a steady, uninterrupted stream of network I/O
 | |
| qualifies for this description; a context switch occurs only when a
 | |
| thread blocks.
 | |
| 
 | |
| <P>If a specified I/O timeout is less than the time interval between
 | |
| context switches the function may return with a timeout error before
 | |
| that amount of time has elapsed since the beginning of the function
 | |
| call.  For example, if eight milliseconds have passed since the last
 | |
| context switch and an I/O function with a timeout of 10 milliseconds
 | |
| blocks, causing a switch, the call may return with a timeout error as
 | |
| little as two milliseconds after it was called.  (On Linux,
 | |
| <TT>select()</TT>'s timeout is an <I>upper</I> bound on the amount of
 | |
| time elapsed before select returns.)  Similarly, if 12 ms have passed
 | |
| already, the function may return immediately.
 | |
| 
 | |
| <P>In almost all cases I/O timeouts should be used only for detecting a
 | |
| broken network connection or for preventing a peer from holding an idle
 | |
| connection for too long.  Therefore for most applications realistic I/O
 | |
| timeouts should be on the order of seconds.  Furthermore, there's
 | |
| probably no point in retrying operations that time out.  Rather than
 | |
| retrying simply use a larger timeout in the first place.
 | |
| 
 | |
| <P>The largest valid timeout value is platform-dependent and may be
 | |
| significantly less than <TT>INT_MAX</TT> seconds for <TT>select()</TT>
 | |
| or <TT>INT_MAX</TT> milliseconds for <TT>poll()</TT>.  Generally, you
 | |
| should not use timeouts exceeding several hours.  Use
 | |
| <tt>ST_UTIME_NO_TIMEOUT</tt> (<tt>-1</tt>) as a special value to
 | |
| indicate infinite timeout or indefinite sleep.  Use
 | |
| <tt>ST_UTIME_NO_WAIT</tt> (<tt>0</tt>) to indicate no waiting at all.
 | |
| 
 | |
| <P>
 | |
| <HR>
 | |
| <P>
 | |
| </BODY>
 | |
| </HTML>
 | |
| 
 |