Demonstrations of execsnoop, the Linux ftrace version. Here's execsnoop showing what's really executed by "man ls": # ./execsnoop Tracing exec()s. Ctrl-C to end. PID PPID ARGS 22898 22004 man ls 22905 22898 preconv -e UTF-8 22908 22898 pager -s 22907 22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8 22906 22898 tbl 22911 22910 locale charmap 22912 22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n 22913 22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8 22914 22912 grotty Many commands. This is particularly useful for understanding application startup. Another use for execsnoop is identifying short-lived processes. Eg, with the -t option to see timestamps: # ./execsnoop -t Tracing exec()s. Ctrl-C to end. TIMEs PID PPID ARGS 7419756.154031 8185 8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...] 7419756.154131 8186 8184 cat -v trace_pipe 7419756.245264 8188 1698 ./run 7419756.245691 8189 1696 ./run 7419756.246212 8187 1689 ./run 7419756.278993 8190 1693 ./run 7419756.278996 8191 1692 ./run 7419756.288430 8192 1695 ./run 7419756.290115 8193 1691 ./run 7419756.292406 8194 1699 ./run 7419756.293986 8195 1690 ./run 7419756.294149 8196 1686 ./run 7419756.296527 8197 1687 ./run 7419756.296973 8198 1697 ./run 7419756.298356 8200 1685 ./run 7419756.298683 8199 1688 ./run 7419757.269883 8201 1696 ./run [...] So we're running many "run" commands every second. The PPID is included, so I can debug this further (they are "supervise" processes). Short-lived processes can consume CPU and not be visible from top(1), and can be the source of hidden performance issues. Here's another example: I noticed CPU usage was high in top(1), but couldn't see the responsible process: $ top top - 00:04:32 up 78 days, 15:41, 3 users, load average: 0.85, 0.29, 0.14 Tasks: 123 total, 1 running, 121 sleeping, 0 stopped, 1 zombie Cpu(s): 15.7%us, 34.9%sy, 0.0%ni, 49.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st Mem: 7629464k total, 7537216k used, 92248k free, 1376492k buffers Swap: 0k total, 0k used, 0k free, 5432356k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7225 bgregg-t 20 0 29480 6196 2128 S 3 0.1 0:02.64 ec2rotatelogs 1 root 20 0 24320 2256 1340 S 0 0.0 0:01.23 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0 0.0 1:19.61 ksoftirqd/0 4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0 5 root 20 0 0 0 0 S 0 0.0 0:00.01 kworker/u:0 6 root RT 0 0 0 0 S 0 0.0 0:16.00 migration/0 7 root RT 0 0 0 0 S 0 0.0 0:17.29 watchdog/0 8 root RT 0 0 0 0 S 0 0.0 0:15.85 migration/1 9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0 [...] See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this is a two CPU server, so that's equivalent to one full CPU), but this CPU usage isn't visible from the process listing. vmstat agreed, showing the same average CPU usage statistics: # vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 2 0 0 92816 1376476 5432188 0 0 0 3 2 1 0 1 99 0 1 0 0 92676 1376484 5432264 0 0 0 24 6573 6130 12 38 49 0 1 0 0 91964 1376484 5432272 0 0 0 0 6529 6097 16 35 49 0 1 0 0 92692 1376484 5432272 0 0 0 0 6192 5775 17 35 49 0 1 0 0 92692 1376484 5432272 0 0 0 0 6554 6121 14 36 50 0 1 0 0 91940 1376484 5432272 0 0 0 12 6546 6101 13 38 49 0 1 0 0 92560 1376484 5432272 0 0 0 0 6201 5769 15 35 49 0 1 0 0 92676 1376484 5432272 0 0 0 0 6524 6123 17 34 49 0 1 0 0 91932 1376484 5432272 0 0 0 0 6546 6107 10 40 49 0 1 0 0 92832 1376484 5432272 0 0 0 0 6057 5710 13 38 49 0 1 0 0 92248 1376484 5432272 0 0 84 28 6592 6183 16 36 48 1 1 0 0 91504 1376492 5432348 0 0 0 12 6540 6098 18 33 49 1 [...] So this could be caused by short-lived processes, who vanish before they are seen by top(1). Do I have my execsnoop handy? Yes: # ~/perf-tools/bin/execsnoop Tracing exec()s. Ctrl-C to end. PID PPID ARGS 10239 10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...] 10240 10238 cat -v trace_pipe 10242 7225 sh [?] 10243 10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt 10245 7225 sh [?] 10246 10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt 10248 7225 sh [?] 10249 10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt 10251 7225 sh [?] 10252 10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt 10254 7225 sh [?] 10255 10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt 10257 7225 sh [?] 10258 10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt 10260 7225 sh [?] 10261 10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt 10263 7225 sh [?] 10264 10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt 10266 7225 sh [?] 10267 10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt [...] The output scrolled quickly, showing that many shell and lsof processes were being launched. If you check the PID and PPID columns carefully, you can see that these are ultimately all from PID 7225. We saw that earlier in the top output: ec2rotatelogs, at 3% CPU. I now know the culprit. I should have used "-t" to show the timestamps with this example. Run -h to print the USAGE message: # ./execsnoop -h USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name] -d seconds # trace duration, and use buffers -a argc # max args to show (default 8) -r # include re-execs -t # include time (seconds) -h # this usage message name # process name to match (REs allowed) eg, execsnoop # watch exec()s live (unbuffered) execsnoop -d 1 # trace 1 sec (buffered) execsnoop grep # trace process names containing grep execsnoop 'log$' # filenames ending in "log" See the man page and example file for more info.