mirror of
				https://github.com/ossrs/srs.git
				synced 2025-03-09 15:49:59 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			504 lines
		
	
	
	
		
			23 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			504 lines
		
	
	
	
		
			23 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <HTML>
 | |
| <HEAD>
 | |
| <TITLE>State Threads for Internet Applications</TITLE>
 | |
| </HEAD>
 | |
| <BODY BGCOLOR=#FFFFFF>
 | |
| <H2>State Threads for Internet Applications</H2>
 | |
| <H3>Introduction</H3>
 | |
| <P>
 | |
| State Threads is an application library which provides a
 | |
| foundation for writing fast and highly scalable Internet Applications
 | |
| on UNIX-like platforms.  It combines the simplicity of the multithreaded 
 | |
| programming paradigm, in which one thread supports each simultaneous 
 | |
| connection, with the performance and scalability of an event-driven 
 | |
| state machine architecture.</P>
 | |
| 
 | |
| <H3>1. Definitions</H3>
 | |
| <P>
 | |
| <A NAME="IA">
 | |
| <H4>1.1 Internet Applications</H4>
 | |
| </A>
 | |
| <P>
 | |
| An <I>Internet Application</I> (IA) is either a server or client network
 | |
| application that accepts connections from clients and may or may not 
 | |
| connect to servers.  In an IA the arrival or departure of network data
 | |
| often controls processing (that is, IA is a <I>data-driven</I> application).
 | |
| For each connection, an IA does some finite amount of work 
 | |
| involving data exchange with its peer, where its peer may be either 
 | |
| a client or a server.
 | |
| The typical transaction steps of an IA are to accept a connection,
 | |
| read a request, do some finite and predictable amount of work to 
 | |
| process the request, then write a response to the peer that sent the 
 | |
| request.  One example of an IA is a Web server; 
 | |
| the most general example of an IA is a proxy server, because it both 
 | |
| accepts connections from clients and connects to other servers.</P>
 | |
| <P>
 | |
| We assume that the performance of an IA is constrained by available CPU
 | |
| cycles rather than network bandwidth or disk I/O (that is, CPU
 | |
| is a bottleneck resource).
 | |
| <P>
 | |
| 
 | |
| <A NAME="PS">
 | |
| <H4>1.2 Performance and Scalability</H4>
 | |
| </A>
 | |
| <P>
 | |
| The <I>performance</I> of an IA is usually evaluated as its
 | |
| throughput measured in transactions per second or bytes per second (one
 | |
| can be converted to the other, given the average transaction size).  There are
 | |
| several benchmarks that can be used to measure throughput of Web serving
 | |
| applications for specific workloads (such as 
 | |
| <A HREF="http://www.spec.org/osg/web96/">SPECweb96</A>,
 | |
| <A HREF="http://www.mindcraft.com/webstone/">WebStone</A>,
 | |
| <A HREF="http://www.zdnet.com/zdbop/webbench/">WebBench</A>).
 | |
| Although there is no common definition for <I>scalability</I>, in general it
 | |
| expresses the ability of an application to sustain its performance when some
 | |
| external condition changes.  For IAs this external condition is either the
 | |
| number of clients (also known as "users," "simultaneous connections," or "load
 | |
| generators") or the underlying hardware system size (number of CPUs, memory
 | |
| size, and so on).  Thus there are two types of scalability: <I>load
 | |
| scalability</I> and <I>system scalability</I>, respectively.
 | |
| <P>
 | |
| The figure below shows how the throughput of an idealized IA changes with
 | |
| the increasing number of clients (solid blue line).  Initially the throughput
 | |
| grows linearly (the slope represents the maximal throughput that one client
 | |
| can provide). Within this initial range, the IA is underutilized and CPUs are
 | |
| partially idle.  Further increase in the number of clients leads to a system
 | |
| saturation, and the throughput gradually stops growing as all CPUs become fully
 | |
| utilized.  After that point, the throughput stays flat because there are no
 | |
| more CPU cycles available.
 | |
| In the real world, however, each simultaneous connection
 | |
| consumes some computational and memory resources, even when idle, and this
 | |
| overhead grows with the number of clients.  Therefore, the throughput of the
 | |
| real world IA starts dropping after some point (dashed blue line in the figure
 | |
| below).  The rate at which the throughput drops depends, among other things, on
 | |
| application design.
 | |
| <P>
 | |
| We say that an application has a good <I>load scalability</I> if it can
 | |
| sustain its throughput over a wide range of loads.
 | |
| Interestingly, the <A HREF="http://www.spec.org/osg/web99/">SPECweb99</A>
 | |
| benchmark somewhat reflects the Web server's load scalability because it
 | |
| measures the number of clients (load generators) given a mandatory minimal
 | |
| throughput per client (that is, it measures the server's <I>capacity</I>).
 | |
| This is unlike <A HREF="http://www.spec.org/osg/web96/">SPECweb96</A> and
 | |
| other benchmarks that use the throughput as their main metric (see the figure
 | |
| below).
 | |
| <P>
 | |
| <CENTER><IMG SRC="fig.gif" ALT="Figure: Throughput vs. Number of clients">
 | |
| </CENTER>
 | |
| <P>
 | |
| <I>System scalability</I> is the ability of an application to sustain its
 | |
| performance per hardware unit (such as a CPU) with the increasing number of
 | |
| these units.  In other words, good system scalability means that doubling the
 | |
| number of processors will roughly double the application's throughput (dashed
 | |
| green line).  We assume here that the underlying operating system also scales
 | |
| well.  Good system scalability allows you to initially run an application on 
 | |
| the smallest system possible, while retaining the ability to move that
 | |
| application to a larger system if necessary, without excessive effort or
 | |
| expense.  That is, an application need not be rewritten or even undergo a
 | |
| major porting effort when changing system size.
 | |
| <P>
 | |
| Although scalability and performance are more important in the case of server
 | |
| IAs, they should also be considered for some client applications (such as 
 | |
| benchmark load generators).
 | |
| <P>
 | |
| 
 | |
| <A NAME="CONC">
 | |
| <H4>1.3 Concurrency</H4>
 | |
| </A>
 | |
| <P>
 | |
| Concurrency reflects the parallelism in a system.  The two unrelated types 
 | |
| are <I>virtual</I> concurrency and <I>real</I> concurrency.
 | |
| <UL>
 | |
| <LI>Virtual (or apparent) concurrency is the number of simultaneous
 | |
| connections that a system supports.
 | |
| <BR><BR>
 | |
| <LI>Real concurrency is the number of hardware devices, including
 | |
| CPUs, network cards, and disks, that actually allow a system to perform 
 | |
| tasks in parallel.
 | |
| </UL>
 | |
| <P>
 | |
| An IA must provide virtual concurrency in order to serve many users
 | |
| simultaneously.
 | |
| To achieve maximum performance and scalability in doing so, the number of
 | |
| programming entities than an IA creates to be scheduled by the OS kernel
 | |
| should be
 | |
| kept close to (within an order of magnitude of) the real concurrency found on
 | |
| the system. These programming entities scheduled by the kernel are known as
 | |
| <I>kernel execution vehicles</I>. Examples of kernel execution vehicles
 | |
| include Solaris lightweight processes and IRIX kernel threads.
 | |
| In other words, the number of kernel execution vehicles should be dictated by
 | |
| the system size and not by the number of simultaneous connections.
 | |
| <P>
 | |
| 
 | |
| <H3>2. Existing Architectures</H3>
 | |
| <P>
 | |
| There are a few different architectures that are commonly used by IAs. 
 | |
| These include the <I>Multi-Process</I>, 
 | |
| <I>Multi-Threaded</I>, and <I>Event-Driven State Machine</I> 
 | |
| architectures.
 | |
| <P>
 | |
| <A NAME="MP">
 | |
| <H4>2.1 Multi-Process Architecture</H4>
 | |
| </A>
 | |
| <P>
 | |
| In the Multi-Process (MP) architecture, an individual process is 
 | |
| dedicated to each simultaneous connection.
 | |
| A process performs all of a transaction's initialization steps 
 | |
| and services a connection completely before moving on to service 
 | |
| a new connection.
 | |
| <P>
 | |
| User sessions in IAs are relatively independent; therefore, no 
 | |
| synchronization between processes handling different connections is
 | |
| necessary.  Because each process has its own private address space,
 | |
| this architecture is very robust. If a process serving one of the connections
 | |
| crashes, the other sessions will not be affected.  However, to serve many
 | |
| concurrent connections, an equal number of processes must be employed.
 | |
| Because processes are kernel entities (and are in fact the heaviest ones), 
 | |
| the number of kernel entities will be at least as large as the number of 
 | |
| concurrent sessions. On most systems, good performance will not be achieved 
 | |
| when more than a few hundred processes are created because of the high 
 | |
| context-switching overhead. In other words, MP applications have poor load 
 | |
| scalability.
 | |
| <P>
 | |
| On the other hand, MP applications have very good system scalability, because
 | |
| no resources are shared among different processes and there is no
 | |
| synchronization overhead.
 | |
| <P>
 | |
| The Apache Web Server 1.x (<A HREF=#refs1>[Reference 1]</A>) uses the MP 
 | |
| architecture on UNIX systems.
 | |
| <P>
 | |
| <A NAME="MT">
 | |
| <H4>2.2 Multi-Threaded Architecture</H4>
 | |
| </A>
 | |
| <P>
 | |
| In the Multi-Threaded (MT) architecture, multiple independent threads 
 | |
| of control are employed within a single shared address space.  Like a 
 | |
| process in the MP architecture, each thread performs all of a
 | |
| transaction's initialization steps and services a connection completely
 | |
| before moving on to service a new connection.
 | |
| <P>
 | |
| Many modern UNIX operating systems implement a <I>many-to-few</I> model when 
 | |
| mapping user-level threads to kernel entities.  In this model, an 
 | |
| arbitrarily large number of user-level threads is multiplexed onto a 
 | |
| lesser number of kernel execution vehicles.  Kernel execution 
 | |
| vehicles are also known as <I>virtual processors</I>.  Whenever a user-level
 | |
| thread makes a blocking system call, the kernel execution vehicle it is using
 | |
| will become blocked in the kernel.  If there are no other non-blocked kernel
 | |
| execution vehicles and there are other runnable user-level threads, a new
 | |
| kernel execution vehicle will be created automatically.  This prevents the
 | |
| application from blocking when it can continue to make useful forward
 | |
| progress.
 | |
| <P>
 | |
| Because IAs are by nature network I/O driven, all concurrent sessions block on
 | |
| network I/O at various points.  As a result, the number of virtual processors
 | |
| created in the kernel grows close to the number of user-level threads
 | |
| (or simultaneous connections).  When this occurs, the many-to-few model
 | |
| effectively degenerates to a <I>one-to-one</I> model.  Again, like in
 | |
| the MP architecture, the number of kernel execution vehicles is dictated by
 | |
| the number of simultaneous connections rather than by number of CPUs.  This
 | |
| reduces an application's load scalability.  However, because kernel threads
 | |
| (lightweight processes) use fewer resources and are more light-weight than
 | |
| traditional UNIX processes, an MT application should scale better with load
 | |
| than an MP application.
 | |
| <P>
 | |
| Unexpectedly, the small number of virtual processors sharing the same address
 | |
| space in the MT architecture destroys an application's system scalability
 | |
| because of contention among the threads on various locks.  Even if an
 | |
| application itself is carefully
 | |
| optimized to avoid lock contention around its own global data (a non-trivial
 | |
| task), there are still standard library functions and system calls
 | |
| that use common resources hidden from the application.  For example,
 | |
| on many platforms thread safety of memory allocation routines
 | |
| (<TT>malloc(3)</TT>, <TT>free(3)</TT>, and so on) is achieved by using a single
 | |
| global lock.  Another example is a per-process file descriptor table.
 | |
| This common resource table is shared by all kernel execution vehicles within
 | |
| the same process and must be protected when one modifies it via
 | |
| certain system calls (such as <TT>open(2)</TT>, <TT>close(2)</TT>, and so on).
 | |
| In addition to that, maintaining the caches coherent
 | |
| among CPUs on multiprocessor systems hurts performance when different threads
 | |
| running on different CPUs modify data items on the same cache line.
 | |
| <P>
 | |
| In order to improve load scalability, some applications employ a different
 | |
| type of MT architecture:  they create one or more thread(s) <I>per task</I>
 | |
| rather than one thread <I>per connection</I>.  For example, one small group
 | |
| of threads may be responsible for accepting client connections, another 
 | |
| for request processing, and yet another for serving responses.  The main
 | |
| advantage of this architecture is that it eliminates the tight coupling
 | |
| between the number of threads and number of simultaneous connections. However,
 | |
| in this architecture, different task-specific thread groups must share common
 | |
| work queues that must be protected by mutual exclusion locks (a typical
 | |
| producer-consumer problem).  This adds synchronization overhead that causes an
 | |
| application to perform badly on multiprocessor systems.  In other words, in
 | |
| this architecture, the application's system scalability is sacrificed for the
 | |
| sake of load scalability.
 | |
| <P>
 | |
| Of course, the usual nightmares of threaded programming, including data
 | |
| corruption, deadlocks, and race conditions, also make MT architecture (in any
 | |
| form) non-simplistic to use.
 | |
| <P>
 | |
| 
 | |
| <A NAME="EDSM">
 | |
| <H4>2.3 Event-Driven State Machine Architecture</H4>
 | |
| </A>
 | |
| <P>
 | |
| In the Event-Driven State Machine (EDSM) architecture, a single process
 | |
| is employed to concurrently process multiple connections. The basics of this
 | |
| architecture are described in Comer and Stevens
 | |
| <A HREF=#refs2>[Reference 2]</A>.
 | |
| The EDSM architecture performs one basic data-driven step associated with
 | |
| a particular connection at a time, thus multiplexing many concurrent
 | |
| connections.  The process operates as a state machine that receives an event
 | |
| and then reacts to it.
 | |
| <P>
 | |
| In the idle state the EDSM calls <TT>select(2)</TT> or <TT>poll(2)</TT> to
 | |
| wait for network I/O events.  When a particular file descriptor is ready for
 | |
| I/O, the EDSM completes the corresponding basic step (usually by invoking a
 | |
| handler function) and starts the next one.  This architecture uses
 | |
| non-blocking system calls to perform asynchronous network I/O operations.
 | |
| For more details on non-blocking I/O see Stevens
 | |
| <A HREF=#refs3>[Reference 3]</A>.
 | |
| <P>
 | |
| To take advantage of hardware parallelism (real concurrency), multiple
 | |
| identical processes may be created.  This is called Symmetric Multi-Process
 | |
| EDSM and is used, for example, in the Zeus Web Server
 | |
| (<A HREF=#refs4>[Reference 4]</A>).  To more efficiently multiplex disk I/O,
 | |
| special "helper" processes may be created.  This is called Asymmetric
 | |
| Multi-Process EDSM and was proposed for Web servers by Druschel
 | |
| and others <A HREF=#refs5>[Reference 5]</A>.
 | |
| <P>
 | |
| EDSM is probably the most scalable architecture for IAs.
 | |
| Because the number of simultaneous connections (virtual concurrency) is
 | |
| completely decoupled from the number of kernel execution vehicles (processes),
 | |
| this architecture has very good load scalability.  It requires only minimal 
 | |
| user-level resources to create and maintain additional connection.
 | |
| <P>
 | |
| Like MP applications, Multi-Process EDSM has very good system scalability
 | |
| because no resources are shared among different processes and there is no
 | |
| synchronization overhead.
 | |
| <P>
 | |
| Unfortunately, the EDSM architecture is monolithic rather than based on the
 | |
| concept of threads, so new applications generally need to be implemented from
 | |
| the ground up.  In effect, the EDSM architecture simulates threads and their
 | |
| stacks the hard way.
 | |
| <P>
 | |
| 
 | |
| <A NAME="ST">
 | |
| <H3>3. State Threads Library</H3>
 | |
| </A>
 | |
| <P>
 | |
| The State Threads library combines the advantages of all of the above
 | |
| architectures.  The interface preserves the programming simplicity of thread
 | |
| abstraction, allowing each simultaneous connection to be treated as a separate
 | |
| thread of execution within a single process. The underlying implementation is
 | |
| close to the EDSM architecture as the state of each particular concurrent
 | |
| session is saved in a separate memory segment.
 | |
| <P>
 | |
| 
 | |
| <H4>3.1 State Changes and Scheduling</H4>
 | |
| <P>
 | |
| The state of each concurrent session includes its stack environment 
 | |
| (stack pointer, program counter, CPU registers) and its stack.  Conceptually, 
 | |
| a thread context switch can be viewed as a process changing its state.  There 
 | |
| are no kernel entities involved other than processes.  
 | |
| Unlike other general-purpose threading libraries, the State Threads library
 | |
| is fully deterministic.  The thread context switch (process state change) can
 | |
| only happen in a well-known set of functions (at I/O points or at explicit
 | |
| synchronization points).  As a result, process-specific global data does not
 | |
| have to be protected by mutual exclusion locks in most cases.  The entire
 | |
| application is free to use all the static variables and non-reentrant library
 | |
| functions it wants, greatly simplifying programming and debugging while
 | |
| increasing performance.  This is somewhat similar to a <I>co-routine</I> model
 | |
| (co-operatively multitasked threads), except that no explicit yield is needed
 | |
| --
 | |
| sooner or later, a thread performs a blocking I/O operation and thus surrenders
 | |
| control.  All threads of execution (simultaneous connections) have the
 | |
| same priority, so scheduling is non-preemptive, like in the EDSM architecture.
 | |
| Because IAs are data-driven (processing is limited by the size of network 
 | |
| buffers and data arrival rates), scheduling is non-time-slicing.
 | |
| <P>
 | |
| Only two types of external events are handled by the library's
 | |
| scheduler, because only these events can be detected by
 | |
| <TT>select(2)</TT> or <TT>poll(2)</TT>: I/O events (a file descriptor is ready
 | |
| for I/O) and time events
 | |
| (some timeout has expired).  However, other types of events (such as
 | |
| a signal sent to a process) can also be handled by converting them to I/O
 | |
| events.  For example, a signal handling function can perform a write to a pipe
 | |
| (<TT>write(2)</TT> is reentrant/asynchronous-safe), thus converting a signal
 | |
| event to an I/O event.
 | |
| <P>
 | |
| To take advantage of hardware parallelism, as in the EDSM architecture,
 | |
| multiple processes can be created in either a symmetric or asymmetric manner.
 | |
| Process management is not in the library's scope but instead is left up to the
 | |
| application.
 | |
| <P>
 | |
| There are several general-purpose threading libraries that implement a
 | |
| <I>many-to-one</I> model (many user-level threads to one kernel execution
 | |
| vehicle), using the same basic techniques as the State Threads library 
 | |
| (non-blocking I/O, event-driven scheduler, and so on).  For an example, see GNU
 | |
| Portable Threads (<A HREF=#refs6>[Reference 6]</A>).  Because they are
 | |
| general-purpose, these libraries have different objectives than the State 
 | |
| Threads library.  The State Threads library is <I>not</I> a general-purpose
 | |
| threading library,
 | |
| but rather an application library that targets only certain types of
 | |
| applications (IAs) in order to achieve the highest possible performance and
 | |
| scalability for those applications.
 | |
| <P>
 | |
| 
 | |
| <H4>3.2 Scalability</H4>
 | |
| <P>
 | |
| State threads are very lightweight user-level entities, and therefore creating
 | |
| and maintaining user connections requires minimal resources.  An application
 | |
| using the State Threads library scales very well with the increasing number
 | |
| of connections.
 | |
| <P>
 | |
| On multiprocessor systems an application should create multiple processes
 | |
| to take advantage of hardware parallelism.  Using multiple separate processes
 | |
| is the <I>only</I> way to achieve the highest possible system scalability.
 | |
| This is because duplicating per-process resources is the only way to avoid
 | |
| significant synchronization overhead on multiprocessor systems.  Creating
 | |
| separate UNIX processes naturally offers resource duplication.  Again,
 | |
| as in the EDSM architecture, there is no connection between the number of
 | |
| simultaneous connections (which may be very large and changes within a wide
 | |
| range) and the number of kernel entities (which is usually small and constant).
 | |
| In other words, the State Threads library makes it possible to multiplex a
 | |
| large number of simultaneous connections onto a much smaller number of
 | |
| separate processes, thus allowing an application to scale well with both
 | |
| the load and system size.
 | |
| <P>
 | |
| 
 | |
| <H4>3.3 Performance</H4>
 | |
| <P>
 | |
| Performance is one of the library's main objectives.  The State Threads
 | |
| library is implemented to minimize the number of system calls and 
 | |
| to make thread creation and context switching as fast as possible.
 | |
| For example, per-thread signal mask does not exist (unlike
 | |
| POSIX threads), so there is no need to save and restore a process's
 | |
| signal mask on every thread context switch. This eliminates two system
 | |
| calls per context switch.  Signal events can be handled much more
 | |
| efficiently by converting them to I/O events (see above).
 | |
| <P>
 | |
| 
 | |
| <H4>3.4 Portability</H4>
 | |
| <P>
 | |
| The library uses the same general, underlying concepts as the EDSM 
 | |
| architecture, including non-blocking I/O, file descriptors, and 
 | |
| I/O multiplexing.  These concepts are available in some form on most 
 | |
| UNIX platforms, making the library very portable across many 
 | |
| flavors of UNIX.  There are only a few platform-dependent sections in the
 | |
| source.
 | |
| <P>
 | |
| 
 | |
| <H4>3.5 State Threads and NSPR</H4>
 | |
| <P>
 | |
| The State Threads library is a derivative of the Netscape Portable 
 | |
| Runtime library (NSPR) <A HREF=#refs7>[Reference 7]</A>. The primary goal of 
 | |
| NSPR is to provide a platform-independent layer for system facilities, 
 | |
| where system facilities include threads, thread synchronization, and I/O.
 | |
| Performance and scalability are not the main concern of NSPR.  The 
 | |
| State Threads library addresses performance and scalability while 
 | |
| remaining much smaller than NSPR.  It is contained in 8 source files 
 | |
| as opposed to more than 400, but provides all the functionality that 
 | |
| is needed to write efficient IAs on UNIX-like platforms.
 | |
| <P>
 | |
| 
 | |
| <TABLE CELLPADDING=3>
 | |
| <TR>
 | |
| <TD></TD>
 | |
| <TH>NSPR</TH>
 | |
| <TH>State Threads</TH>
 | |
| </TR>
 | |
| <TR>
 | |
| <TD><B>Lines of code</B></TD>
 | |
| <TD ALIGN=RIGHT>~150,000</TD>
 | |
| <TD ALIGN=RIGHT>~3000</TD>
 | |
| </TR>
 | |
| <TR>
 | |
| <TD><B>Dynamic library size  <BR>(debug version)</B></TD>
 | |
| <TD></TD>
 | |
| <TD></TD>
 | |
| </TR>
 | |
| <TR>
 | |
| <TD>IRIX</TD>
 | |
| <TD ALIGN=RIGHT>~700 KB</TD>
 | |
| <TD ALIGN=RIGHT>~60 KB</TD>
 | |
| </TR>
 | |
| <TR>
 | |
| <TD>Linux</TD>
 | |
| <TD ALIGN=RIGHT>~900 KB</TD>
 | |
| <TD ALIGN=RIGHT>~70 KB</TD>
 | |
| </TR>
 | |
| </TABLE>
 | |
| <P>
 | |
| 
 | |
| <H3>Conclusion</H3>
 | |
| <P>
 | |
| State Threads is an application library which provides a foundation for
 | |
| writing <A HREF=#IA>Internet Applications</A>.  To summarize, it has the
 | |
| following <I>advantages</I>:
 | |
| <P>
 | |
| <UL>
 | |
| <LI>It allows the design of fast and highly scalable applications.  An
 | |
| application will scale well with both load and number of CPUs.
 | |
| <P>
 | |
| <LI>It greatly simplifies application programming and debugging because, as a
 | |
| rule, no mutual exclusion locking is necessary and the entire application is
 | |
| free to use static variables and non-reentrant library functions.
 | |
| </UL>
 | |
| <P>
 | |
| The library's main <I>limitation</I>:
 | |
| <P>
 | |
| <UL>
 | |
| <LI>All I/O operations on sockets must use the State Thread library's I/O
 | |
| functions because only those functions perform thread scheduling and prevent
 | |
| the application's processes from blocking.
 | |
| </UL>
 | |
| <P>
 | |
| 
 | |
| <H3>References</H3>
 | |
| <OL>
 | |
| <A NAME="refs1">
 | |
| <LI> Apache Software Foundation,
 | |
| <A HREF="http://www.apache.org">http://www.apache.org</A>.
 | |
| <A NAME="refs2">
 | |
| <LI> Douglas E. Comer, David L. Stevens, <I>Internetworking With TCP/IP,
 | |
| Vol. III: Client-Server Programming And Applications</I>, Second Edition,
 | |
| Ch. 8, 12.
 | |
| <A NAME="refs3">
 | |
| <LI> W. Richard Stevens, <I>UNIX Network Programming</I>, Second Edition,
 | |
| Vol. 1, Ch. 15.
 | |
| <A NAME="refs4">
 | |
| <LI> Zeus Technology Limited,
 | |
| <A HREF="http://www.zeus.co.uk/">http://www.zeus.co.uk</A>.
 | |
| <A NAME="refs5">
 | |
| <LI> Peter Druschel, Vivek S. Pai, Willy Zwaenepoel,
 | |
| <A HREF="http://www.cs.rice.edu/~druschel/usenix99flash.ps.gz">
 | |
| Flash: An Efficient and Portable Web Server</A>. In <I>Proceedings of the
 | |
| USENIX 1999 Annual Technical Conference</I>, Monterey, CA, June 1999.
 | |
| <A NAME="refs6">
 | |
| <LI> GNU Portable Threads,
 | |
| <A HREF="http://www.gnu.org/software/pth/">http://www.gnu.org/software/pth/</A>.
 | |
| <A NAME="refs7">
 | |
| <LI> Netscape Portable Runtime,
 | |
| <A HREF="http://www.mozilla.org/docs/refList/refNSPR/">http://www.mozilla.org/docs/refList/refNSPR/</A>.
 | |
| </OL>
 | |
| 
 | |
| <H3>Other resources covering various architectural issues in IAs</H3>
 | |
| <OL START=8>
 | |
| <LI> Dan Kegel, <I>The C10K problem</I>,
 | |
| <A HREF="http://www.kegel.com/c10k.html">http://www.kegel.com/c10k.html</A>.
 | |
| </LI>
 | |
| <LI> James C. Hu, Douglas C. Schmidt, Irfan Pyarali, <I>JAWS: Understanding
 | |
| High Performance Web Systems</I>,
 | |
| <A HREF="http://www.cs.wustl.edu/~jxh/research/research.html">http://www.cs.wustl.edu/~jxh/research/research.html</A>.</LI>
 | |
| </OL>
 | |
| <P>
 | |
| <HR>
 | |
| <P>
 | |
| 
 | |
| <CENTER><FONT SIZE=-1>Portions created by SGI are Copyright © 2000
 | |
| Silicon Graphics, Inc.  All rights reserved.</FONT></CENTER>
 | |
| <P>
 | |
| 
 | |
| </BODY>
 | |
| </HTML>
 | |
| 
 |