mirror of
				https://github.com/ossrs/srs.git
				synced 2025-03-09 15:49:59 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			279 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			279 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
| gperftools
 | ||
| ----------
 | ||
| (originally Google Performance Tools)
 | ||
| 
 | ||
| The fastest malloc we’ve seen; works particularly well with threads
 | ||
| and STL. Also: thread-friendly heap-checker, heap-profiler, and
 | ||
| cpu-profiler.
 | ||
| 
 | ||
| 
 | ||
| OVERVIEW
 | ||
| ---------
 | ||
| 
 | ||
| gperftools is a collection of a high-performance multi-threaded
 | ||
| malloc() implementation, plus some pretty nifty performance analysis
 | ||
| tools.
 | ||
| 
 | ||
| gperftools is distributed under the terms of the BSD License. Join our
 | ||
| mailing list at gperftools@googlegroups.com for updates:
 | ||
| https://groups.google.com/forum/#!forum/gperftools
 | ||
| 
 | ||
| gperftools was original home for pprof program. But do note that
 | ||
| original pprof (which is still included with gperftools) is now
 | ||
| deprecated in favor of golang version at https://github.com/google/pprof
 | ||
| 
 | ||
| 
 | ||
| TCMALLOC
 | ||
| --------
 | ||
| Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
 | ||
| tcmalloc -- a replacement for malloc and new.  See below for some
 | ||
| environment variables you can use with tcmalloc, as well.
 | ||
| 
 | ||
| tcmalloc functionality is available on all systems we've tested; see
 | ||
| INSTALL for more details.  See README_windows.txt for instructions on
 | ||
| using tcmalloc on Windows.
 | ||
| 
 | ||
| when compiling.  gcc makes some optimizations assuming it is using its
 | ||
| own, built-in malloc; that assumption obviously isn't true with
 | ||
| tcmalloc.  In practice, we haven't seen any problems with this, but
 | ||
| the expected risk is highest for users who register their own malloc
 | ||
| hooks with tcmalloc (using gperftools/malloc_hook.h).  The risk is
 | ||
| lowest for folks who use tcmalloc_minimal (or, of course, who pass in
 | ||
| the above flags :-) ).
 | ||
| 
 | ||
| 
 | ||
| HEAP PROFILER
 | ||
| -------------
 | ||
| See docs/heapprofile.html for information about how to use tcmalloc's
 | ||
| heap profiler and analyze its output.
 | ||
| 
 | ||
| As a quick-start, do the following after installing this package:
 | ||
| 
 | ||
| 1) Link your executable with -ltcmalloc
 | ||
| 2) Run your executable with the HEAPPROFILE environment var set:
 | ||
|      $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
 | ||
| 3) Run pprof to analyze the heap usage
 | ||
|      $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
 | ||
|      $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap
 | ||
| 
 | ||
| You can also use LD_PRELOAD to heap-profile an executable that you
 | ||
| didn't compile.
 | ||
| 
 | ||
| There are other environment variables, besides HEAPPROFILE, you can
 | ||
| set to adjust the heap-profiler behavior; c.f. "ENVIRONMENT VARIABLES"
 | ||
| below.
 | ||
| 
 | ||
| The heap profiler is available on all unix-based systems we've tested;
 | ||
| see INSTALL for more details.  It is not currently available on Windows.
 | ||
| 
 | ||
| 
 | ||
| HEAP CHECKER
 | ||
| ------------
 | ||
| See docs/heap_checker.html for information about how to use tcmalloc's
 | ||
| heap checker.
 | ||
| 
 | ||
| In order to catch all heap leaks, tcmalloc must be linked *last* into
 | ||
| your executable.  The heap checker may mischaracterize some memory
 | ||
| accesses in libraries listed after it on the link line.  For instance,
 | ||
| it may report these libraries as leaking memory when they're not.
 | ||
| (See the source code for more details.)
 | ||
| 
 | ||
| Here's a quick-start for how to use:
 | ||
| 
 | ||
| As a quick-start, do the following after installing this package:
 | ||
| 
 | ||
| 1) Link your executable with -ltcmalloc
 | ||
| 2) Run your executable with the HEAPCHECK environment var set:
 | ||
|      $ HEAPCHECK=1 <path/to/binary> [binary args]
 | ||
| 
 | ||
| Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian
 | ||
| 
 | ||
| You can also use LD_PRELOAD to heap-check an executable that you
 | ||
| didn't compile.
 | ||
| 
 | ||
| The heap checker is only available on Linux at this time; see INSTALL
 | ||
| for more details.
 | ||
| 
 | ||
| 
 | ||
| CPU PROFILER
 | ||
| ------------
 | ||
| See docs/cpuprofile.html for information about how to use the CPU
 | ||
| profiler and analyze its output.
 | ||
| 
 | ||
| As a quick-start, do the following after installing this package:
 | ||
| 
 | ||
| 1) Link your executable with -lprofiler
 | ||
| 2) Run your executable with the CPUPROFILE environment var set:
 | ||
|      $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
 | ||
| 3) Run pprof to analyze the CPU usage
 | ||
|      $ pprof <path/to/binary> /tmp/prof.out      # -pg-like text output
 | ||
|      $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
 | ||
| 
 | ||
| There are other environment variables, besides CPUPROFILE, you can set
 | ||
| to adjust the cpu-profiler behavior; cf "ENVIRONMENT VARIABLES" below.
 | ||
| 
 | ||
| The CPU profiler is available on all unix-based systems we've tested;
 | ||
| see INSTALL for more details.  It is not currently available on Windows.
 | ||
| 
 | ||
| NOTE: CPU profiling doesn't work after fork (unless you immediately
 | ||
|       do an exec()-like call afterwards).  Furthermore, if you do
 | ||
|       fork, and the child calls exit(), it may corrupt the profile
 | ||
|       data.  You can use _exit() to work around this.  We hope to have
 | ||
|       a fix for both problems in the next release of perftools
 | ||
|       (hopefully perftools 1.2).
 | ||
| 
 | ||
| 
 | ||
| EVERYTHING IN ONE
 | ||
| -----------------
 | ||
| If you want the CPU profiler, heap profiler, and heap leak-checker to
 | ||
| all be available for your application, you can do:
 | ||
|    gcc -o myapp ... -lprofiler -ltcmalloc
 | ||
| 
 | ||
| However, if you have a reason to use the static versions of the
 | ||
| library, this two-library linking won't work:
 | ||
|    gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!
 | ||
| 
 | ||
| Instead, use the special libtcmalloc_and_profiler library, which we
 | ||
| make for just this purpose:
 | ||
|    gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a
 | ||
| 
 | ||
| 
 | ||
| CONFIGURATION OPTIONS
 | ||
| ---------------------
 | ||
| For advanced users, there are several flags you can pass to
 | ||
| './configure' that tweak tcmalloc performance.  (These are in addition
 | ||
| to the environment variables you can set at runtime to affect
 | ||
| tcmalloc, described below.)  See the INSTALL file for details.
 | ||
| 
 | ||
| 
 | ||
| ENVIRONMENT VARIABLES
 | ||
| ---------------------
 | ||
| The cpu profiler, heap checker, and heap profiler will lie dormant,
 | ||
| using no memory or CPU, until you turn them on.  (Thus, there's no
 | ||
| harm in linking -lprofiler into every application, and also -ltcmalloc
 | ||
| assuming you're ok using the non-libc malloc library.)
 | ||
| 
 | ||
| The easiest way to turn them on is by setting the appropriate
 | ||
| environment variables.  We have several variables that let you
 | ||
| enable/disable features as well as tweak parameters.
 | ||
| 
 | ||
| Here are some of the most important variables:
 | ||
| 
 | ||
| HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
 | ||
| HEAPCHECK=<type>  -- turns on heap checking with strictness 'type'
 | ||
| CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
 | ||
| PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
 | ||
|                      surrounded with ProfilerEnable()/ProfilerDisable().
 | ||
| CPUPROFILE_FREQUENCY=x-- how many interrupts/second the cpu-profiler samples.
 | ||
| 
 | ||
| PERFTOOLS_VERBOSE=<level> -- the higher level, the more messages malloc emits
 | ||
| MALLOCSTATS=<level>    -- prints memory-use stats at program-exit
 | ||
| 
 | ||
| For a full list of variables, see the documentation pages:
 | ||
|    docs/cpuprofile.html
 | ||
|    docs/heapprofile.html
 | ||
|    docs/heap_checker.html
 | ||
| 
 | ||
| 
 | ||
| COMPILING ON NON-LINUX SYSTEMS
 | ||
| ------------------------------
 | ||
| 
 | ||
| Perftools was developed and tested on x86 Linux systems, and it works
 | ||
| in its full generality only on those systems.  However, we've
 | ||
| successfully ported much of the tcmalloc library to FreeBSD, Solaris
 | ||
| x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
 | ||
| functionality in tcmalloc_minimal to Windows.  See INSTALL for details.
 | ||
| See README_windows.txt for details on the Windows port.
 | ||
| 
 | ||
| 
 | ||
| PERFORMANCE
 | ||
| -----------
 | ||
| 
 | ||
| If you're interested in some third-party comparisons of tcmalloc to
 | ||
| other malloc libraries, here are a few web pages that have been
 | ||
| brought to our attention.  The first discusses the effect of using
 | ||
| various malloc libraries on OpenLDAP.  The second compares tcmalloc to
 | ||
| win32's malloc.
 | ||
|   http://www.highlandsun.com/hyc/malloc/
 | ||
|   http://gaiacrtn.free.fr/articles/win32perftools.html
 | ||
| 
 | ||
| It's possible to build tcmalloc in a way that trades off faster
 | ||
| performance (particularly for deletes) at the cost of more memory
 | ||
| fragmentation (that is, more unusable memory on your system).  See the
 | ||
| INSTALL file for details.
 | ||
| 
 | ||
| 
 | ||
| OLD SYSTEM ISSUES
 | ||
| -----------------
 | ||
| 
 | ||
| When compiling perftools on some old systems, like RedHat 8, you may
 | ||
| get an error like this:
 | ||
|     ___tls_get_addr: symbol not found
 | ||
| 
 | ||
| This means that you have a system where some parts are updated enough
 | ||
| to support Thread Local Storage, but others are not.  The perftools
 | ||
| configure script can't always detect this kind of case, leading to
 | ||
| that error.  To fix it, just comment out (or delete) the line
 | ||
|    #define HAVE_TLS 1
 | ||
| in your config.h file before building.
 | ||
| 
 | ||
| 
 | ||
| 64-BIT ISSUES
 | ||
| -------------
 | ||
| 
 | ||
| There are two issues that can cause program hangs or crashes on x86_64
 | ||
| 64-bit systems, which use the libunwind library to get stack-traces.
 | ||
| Neither issue should affect the core tcmalloc library; they both
 | ||
| affect the perftools tools such as cpu-profiler, heap-checker, and
 | ||
| heap-profiler.
 | ||
| 
 | ||
| 1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
 | ||
| libc function dl_iterate_phdr() acquires its locks in the wrong
 | ||
| order.  This bug should not affect tcmalloc, but may cause occasional
 | ||
| deadlock with the cpu-profiler, heap-profiler, and heap-checker.
 | ||
| Its likeliness increases the more dlopen() commands an executable has.
 | ||
| Most executables don't have any, though several library routines like
 | ||
| getgrgid() call dlopen() behind the scenes.
 | ||
| 
 | ||
| 2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
 | ||
| cpu-profiler tool is unreliable: it will sometimes work, but sometimes
 | ||
| cause a segfault.  I'll explain the problem first, and then some
 | ||
| workarounds.
 | ||
| 
 | ||
| Note that this only affects the cpu-profiler, which is a
 | ||
| gperftools feature you must turn on manually by setting the
 | ||
| CPUPROFILE environment variable.  If you do not turn on cpu-profiling,
 | ||
| you shouldn't see any crashes due to perftools.
 | ||
| 
 | ||
| The gory details: The underlying problem is in the backtrace()
 | ||
| function, which is a built-in function in libc.
 | ||
| Backtracing is fairly straightforward in the normal case, but can run
 | ||
| into problems when having to backtrace across a signal frame.
 | ||
| Unfortunately, the cpu-profiler uses signals in order to register a
 | ||
| profiling event, so every backtrace that the profiler does crosses a
 | ||
| signal frame.
 | ||
| 
 | ||
| In our experience, the only time there is trouble is when the signal
 | ||
| fires in the middle of pthread_mutex_lock.  pthread_mutex_lock is
 | ||
| called quite a bit from system libraries, particularly at program
 | ||
| startup and when creating a new thread.
 | ||
| 
 | ||
| The solution: The dwarf debugging format has support for 'cfi
 | ||
| annotations', which make it easy to recognize a signal frame.  Some OS
 | ||
| distributions, such as Fedora and gentoo 2007.0, already have added
 | ||
| cfi annotations to their libc.  A future version of libunwind should
 | ||
| recognize these annotations; these systems should not see any
 | ||
| crashes.
 | ||
| 
 | ||
| Workarounds: If you see problems with crashes when running the
 | ||
| cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
 | ||
| your code, rather than setting CPUPROFILE.  This will profile only
 | ||
| those sections of the codebase.  Though we haven't done much testing,
 | ||
| in theory this should reduce the chance of crashes by limiting the
 | ||
| signal generation to only a small part of the codebase.  Ideally, you
 | ||
| would not use ProfilerStart()/ProfilerStop() around code that spawns
 | ||
| new threads, or is otherwise likely to cause a call to
 | ||
| pthread_mutex_lock!
 | ||
| 
 | ||
| ---
 | ||
| 17 May 2011
 |