arthur.barton.de Git - netdata.git/commitdiff
Merge pull request #632 from paulfantom/master
author Costa Tsaousis <costa@tsaousis.gr>
Sat, 2 Jul 2016 16:35:36 +0000 (19:35 +0300)
committer GitHub <noreply@github.com>
Sat, 2 Jul 2016 16:35:36 +0000 (19:35 +0300)
Wrappers around charts creation + LogService

14 files changed:
conf.d/apps_groups.conf
netdata-installer.sh
plugins.d/charts.d.plugin
plugins.d/tc-qos-helper.sh
python.d/python-modules-installer.sh.in
src/apps_plugin.c
src/sys_fs_cgroup.c
web/dashboard.html
web/demo.html
web/demo2.html
web/demosites.html
web/index.html
web/registry.html
web/tv.html

diff --git a/conf.d/apps_groups.conf b/conf.d/apps_groups.conf
index 576e15433195d71d331dccf94e8a3dd3406e0ca2..c4dd3d04af3b019c5b21dc1823cfbf61d4b442a0 100644 (file)
@@ -28,7 +28,7 @@
 #  *name*   substring mode: will search for 'name' in the whole command line (/proc/PID/cmdline)
 #
 # If you enter even just one *name* (substring), apps.plugin will process
-# /proc/PID/cmdline for all processes, on every iteration.
+# /proc/PID/cmdline for all processes, just once (when they are first seen).
 #
 # To add process names with single quotes, enclose them in double quotes
 # example: "process with this ' single quote"
 # You can add any number of groups you like. Only the ones found running will
 # affect the charts generated. However, producing charts with hundreds of
 # dimensions may slow down your web browser.
+#
+# The order of the entries in this list is important: the first that matches
+# a process is used, so put important ones at the top. Processes not matched
+# by any row, will inherit it from their parents or children.
+#
+# The order also controls the order of the dimensions on the generated charts
+# (although applications started after apps.plugin is started, will be appended
+# to the existing list of dimensions the netdata daemon maintains).
+
+# -----------------------------------------------------------------------------
+# NETDATA processes accounting
+
+# netdata main process
+netdata: netdata
 
-compile: cc1 cc1plus as gcc* ld make automake autoconf git
-rsync: rsync
-media: mplayer vlc xine mediatomb omxplayer* kodi* xbmc* mediacenter eventlircd mpd minidlnad
-squid: squid* c-icap
-apache: apache* httpd
-mysql: mysql*
-asterisk: asterisk
-opensips: opensips* stund
-radius: radius*
+# netdata known plugins
+# plugins not defined here will be accumulated in netdata, above
+apps.plugin: apps.plugin
+charts.d.plugin: *charts.d.plugin*
+node.d.plugin: *node.d.plugin*
+python.d.plugin: *python.d.plugin*
+tc-qos-helper: *tc-qos-helper.sh*
+
+# -----------------------------------------------------------------------------
+# authentication/authorization related servers
+
+auth: radius* openldap* ldap*
 fail2ban: fail2ban*
-mail: dovecot imapd pop3d
-postfix: master
-nginx: nginx
-splunk: splunkd
-mongo: mongod
-redis: redis*
-lighttpd: lighttpd
+
+# -----------------------------------------------------------------------------
+# web/ftp servers
+
+httpd: apache* httpd nginx* lighttpd
+proxy: squid* c-icap squidGuard varnish*
+php: php*
 ftpd: proftpd in.tftpd
+
+# -----------------------------------------------------------------------------
+# database servers
+
+sql: mysqld* mariad* postgres*
+nosql: mongod redis*
+
+# -----------------------------------------------------------------------------
+# email servers
+
+email: dovecot imapd pop3d amavis* master zmstat* zmmailboxdmgr
+
+# -----------------------------------------------------------------------------
+# networking and VPN servers
+
+ppp: ppp*
+vpn: openvpn pptp* cjdroute
+wifi: hostapd wpa_supplicant
+
+# -----------------------------------------------------------------------------
+# high availability and balancers
+
+balancer: ipvs_* haproxy
+ha: corosync hs_logd ha_logd stonithd
+
+# -----------------------------------------------------------------------------
+# telephony
+
+pbx: asterisk safe_asterisk *vicidial*
+sip: opensips* stund
+
+# -----------------------------------------------------------------------------
+# monitoring
+
+logs: ulogd* syslog* rsyslog* logrotate
+nms: snmpd vnstatd smokeping zabbix* monit munin* mon openhpid watchdog
+splunk: splunkd
+
+# -----------------------------------------------------------------------------
+# file systems and file servers
+
 samba: smbd nmbd winbindd
 nfs: rpcbind rpc.* nfs*
+zfs: spl_* z_* txg_* zil_* arc_* l2arc*
+btrfs: btrfs*
+
+# -----------------------------------------------------------------------------
+# containers & virtual machines
+
+containers: lxc* docker*
+VMs: vbox* VBox* qemu*
+
+# -----------------------------------------------------------------------------
+# ssh servers and clients
+
 ssh: ssh* scp
-X: X lightdm xdm pulseaudio gkrellm
-xfce: xfwm4 xfdesktop xfce* Thunar xfsettingsd
-gnome: gnome-* gdm gconfd-2
-named: named rncd
-clam: clam* *clam
-cups: cups*
-ntp: ntp*
-torrent: deluge* transmission*
-vbox: vbox* VBox*
-log: ulogd syslog* rsyslog* logrotate
-nms: snmpd vnstatd smokeping zabbix* monit munin* mon openhpid
-ppp: ppp* pptp*
-inetd: inetd xinetd
-openvpn: openvpn
-cjdns: cjdroute
-cron: cron atd
-ha: corosync hs_logd ha_logd stonithd
-ipvs: ipvs_*
+
+# -----------------------------------------------------------------------------
+# print servers and clients
+
+print: cups* lpd lpq
+
+# -----------------------------------------------------------------------------
+# time servers and clients
+
+time: ntp*
+
+# -----------------------------------------------------------------------------
+# dhcp servers and clients
+
+dhcp: *dhcp*
+
+# -----------------------------------------------------------------------------
+# name servers and clients
+
+named: named rncd dig
+
+# -----------------------------------------------------------------------------
+# installation / compilation / debugging
+
+build: cc1 cc1plus as gcc* ld make automake autoconf autoreconf git valgrind*
+
+# -----------------------------------------------------------------------------
+# antivirus
+
+antivirus: clam* *clam
+
+# -----------------------------------------------------------------------------
+# torrent clients
+
+torrents: *deluge* transmission* *SickBeard*
+
+# -----------------------------------------------------------------------------
+# backup servers and clients
+
+backup: rsync bacula*
+
+# -----------------------------------------------------------------------------
+# cron
+
+cron: cron atd anacron
+
+# -----------------------------------------------------------------------------
+# UPS
+
+ups: upsmon upsd */nut/*
+
+# -----------------------------------------------------------------------------
+# Kernel / System
+
+system: systemd* udisks* udevd* *udevd connmand ipv6_addrconf dbus-* inetd xinetd mdadm
 kernel: kthreadd kauditd lockd khelper kdevtmpfs khungtaskd rpciod fsnotify_mark kthrotld iscsi_eh deferwq
-netdata: netdata
-crsproxy: crsproxy
-wifi: hostapd wpa_supplicant
-system: systemd* udisks* udevd connmand ipv6_addrconf dbus-*
 ksmd: ksmd
-lxc: lxc*
-zfs-spl: spl_* 
-zfs-posix: z_*
-zfs-txg: txg_* zil_*
-zfs-arc: arc_* l2arc* 
-php: php*
-zimbra: zmstat* zmmailboxdmgr
+
+# -----------------------------------------------------------------------------
+# media players, servers, clients
+
+media: mplayer vlc xine mediatomb omxplayer* kodi* xbmc* mediacenter eventlircd mpd minidlnad mt-daapd avahi*
+
+# -----------------------------------------------------------------------------
+# X
+
+X: X lightdm xdm pulseaudio gkrellm xfwm4 xfdesktop xfce* Thunar xfsettingsd xfconfd gnome-* gdm gconfd-2 *gvfsd gvfsd* kdm slim
+
+# -----------------------------------------------------------------------------
+# other application servers
+
+crsproxy: crsproxy
 java: java
-bacula: bacula*
-amavis: amavis*
-varnish: varnish*
-haproxy: haproxy
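Note on the apps_groups.conf change above: the new comments state that the first row matching a process wins, so row order matters. A minimal bash sketch of that rule follows. It is an illustration only, not the C implementation in apps.plugin. The `group_rows` array, the `match_group` helper and the final `other: *` row are hypothetical, and plain glob matching on the process name stands in for the plugin's matching modes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "first matching row wins"; not apps.plugin's code.

# Ordered rows in the "group: pattern pattern ..." form used by apps_groups.conf.
# The last row ("other: *") is an invented catch-all, added to show ordering.
group_rows=(
    "netdata: netdata"
    "httpd: apache* httpd nginx* lighttpd"
    "sql: mysqld* mariad* postgres*"
    "other: *"
)

# match_group NAME
# Prints the group of the first row with a pattern that matches NAME.
match_group() {
    local name="$1" row group pattern patterns
    for row in "${group_rows[@]}"; do
        group="${row%%:*}"
        # read splits on whitespace without expanding the '*' as a filename glob
        read -r -a patterns <<< "${row#*:}"
        for pattern in "${patterns[@]}"; do
            # an unquoted right-hand side of == is treated as a glob pattern
            if [[ "$name" == $pattern ]]; then
                printf '%s\n' "$group"
                return 0
            fi
        done
    done
    return 1
}
```

Calling `match_group nginx` reports `httpd` even though the catch-all row would also match, because the earlier row is tried first.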
diff --git a/netdata-installer.sh b/netdata-installer.sh
index da0ec82159338c46f0a027a5163b9eede22addfe..10f546746fda55b3b3f1d801316cb3ac1d3539ee 100755 (executable)
@@ -806,7 +806,7 @@ cat >netdata-uninstaller.sh <<-UNINSTALL
        fi
 
        echo >&2 "Stopping a possibly running netdata..."
-       for p in \$(pidof netdata); do kill \$x; done
+       for p in \$(pidof netdata); do kill \$p; done
        sleep 2
 
        deletedir() {
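Note on the installer change above: the loop sits inside an unquoted heredoc that writes the uninstaller, so `\$p` is left for the generated script to expand at run time. The old `\$x` named a variable that the loop never sets. The sketch below shows the escaping rule in isolation, under assumed names: `generate_script`, `prefix` and the PIDs 101 and 102 are hypothetical and nothing is killed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of heredoc escaping; not the installer's actual code.

prefix="/opt/example"   # invented value, expanded while the script is generated

# generate_script
# Prints a small script. ${prefix} is expanded now; \$(...) and \$p are
# written out as $(...) and $p, so they are expanded when the output runs.
generate_script() {
    cat <<GENERATED
echo "installed under ${prefix}"
for p in \$(echo 101 102); do echo "would kill \$p"; done
GENERATED
}
```

Piping the output to a shell, as in `generate_script | bash`, runs the loop with `p` bound to each value in turn.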
diff --git a/plugins.d/charts.d.plugin b/plugins.d/charts.d.plugin
index 6b361b4ac8707120c4d8a4eb7528594b6fd92c0a..109127d62325f7da27ba64bac6a9bedd4fc4c9b9 100755 (executable)
@@ -567,7 +567,7 @@ global_update() {
                                exec_start_ms=$now_ms
                                $chart$charts_update $dt
                                ret=$?
-                               
+
                                # return the current time in ms in $now_ms
                                current_time_ms; exec_end_ms=$now_ms
 
@@ -582,7 +582,7 @@ global_update() {
                                else
                                        charts_serial_failures[$chart]=$(( charts_serial_failures[$chart] + 1 ))
 
-                                       if [ charts_serial_failures[$chart] -gt 10 ]
+                                       if [ ${charts_serial_failures[$chart]} -gt 10 ]
                                                then
                                                echo >&2 "$PROGRAM_NAME: chart '$chart' update() function reported failure ${charts_serial_failures[$chart]} times. Disabling it."
                                        else
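Note on the charts.d.plugin change above: inside `[ ... ]` an array element is only expanded when written as `${array[key]}`. The bare form passes the literal text `charts_serial_failures[...]` to `-gt`, which is not an integer, so the test fails with an error and the "Disabling it" branch is never taken. The sketch below reproduces both forms. It assumes bash 4 or later for `local -A`, and the `failures` table and `example` key are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the comparison bug; requires bash 4+ for local -A.

# broken_check COUNT
# Bare array reference: '[' receives the literal word failures[example],
# reports "integer expression expected" and returns a non-zero status.
broken_check() {
    local -A failures=( [example]="$1" )
    local chart="example"
    [ failures[$chart] -gt 10 ] 2>/dev/null
}

# fixed_check COUNT
# ${...} expands the element before '[' sees it, so the comparison is numeric.
fixed_check() {
    local -A failures=( [example]="$1" )
    local chart="example"
    [ "${failures[$chart]}" -gt 10 ]
}
```

With a count of 11, `fixed_check 11` succeeds while `broken_check 11` still fails, which is the behaviour the one-line fix corrects.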
diff --git a/plugins.d/tc-qos-helper.sh b/plugins.d/tc-qos-helper.sh
index 7b4739815481d2097bb1573500b8f92e8757a338..94eec44a569dac63bb521c661e34a2480032f2ca 100755 (executable)
@@ -2,6 +2,31 @@
 
 export PATH="${PATH}:/sbin:/usr/sbin:/usr/local/sbin"
 
+PROGRAM_FILE="$0"
+PROGRAM_NAME="$(basename $0)"
+PROGRAM_NAME="${PROGRAM_NAME/.plugin}"
+
+plugins_dir="${NETDATA_PLUGINS_DIR}"
+[ -z "$plugins_dir" ] && plugins_dir="$( dirname $PROGRAM_FILE )"
+
+config_dir=${NETDATA_CONFIG_DIR-/etc/netdata}
+tc="$(which tc 2>/dev/null)"
+fireqos_run_dir="/var/run/fireqos"
+qos_get_class_names_every=120
+qos_exit_every=3600
+
+# check if we have a valid number for interval
+t=${1}
+update_every=$((t))
+[ $((update_every)) -lt 1 ] && update_every=${NETDATA_UPDATE_EVERY}
+[ $((update_every)) -lt 1 ] && update_every=1
+
+# allow the user to override our defaults
+if [ -f "${config_dir}/tc-qos-helper.conf" ]
+       then
+       source "${config_dir}/tc-qos-helper.conf"
+fi
+
 # default time function
 now_ms=
 current_time_ms() {
@@ -17,18 +42,11 @@ loopsleepms() {
 
 # if found and included, this file overwrites loopsleepms()
 # with a high resolution timer function for precise looping.
-. "$NETDATA_PLUGINS_DIR/loopsleepms.sh.inc"
-
-# check if we have a valid number for interval
-t=$1
-sleep_time=$((t))
-[ $((sleep_time)) -lt 1 ] && $NETDATA_UPDATE_EVERY
-[ $((sleep_time)) -lt 1 ] && sleep_time=1
+. "${plugins_dir}/loopsleepms.sh.inc"
 
-tc_cmd="$(which tc)"
-if [ -z "$tc_cmd" ]
+if [ -z "${tc}" -o ! -x "${tc}" ]
        then
-       echo >&2 "tc: Cannot find a 'tc' command in this system."
+       echo >&2 "${PROGRAM_NAME}: Cannot find command 'tc' in this system."
        exit 1
 fi
 
@@ -40,44 +58,45 @@ setclassname() {
 }
 
 show_tc() {
-       local x="$1"
+       local x="${1}" interface_dev interface_classes interface_classes_monitor
 
-       echo "BEGIN $x"
-       $tc_cmd -s class show dev $x
+       echo "BEGIN ${x}"
+       ${tc} -s class show dev ${x}
 
        # check FireQOS names for classes
-       if [ ! -z "$fix_names" -a -f /var/run/fireqos/ifaces/$x ]
+       if [ ! -z "${fix_names}" -a -f "${fireqos_run_dir}/ifaces/${x}" ]
        then
-               name="$(cat /var/run/fireqos/ifaces/$x)"
-               echo "SETDEVICENAME $name"
+               name="$(<"${fireqos_run_dir}/ifaces/${x}")"
+               echo "SETDEVICENAME ${name}"
 
+               interface_dev=
                interface_classes=
                interface_classes_monitor=
-               . /var/run/fireqos/$name.conf
-               for n in $interface_classes_monitor
+               source "${fireqos_run_dir}/${name}.conf"
+               for n in ${interface_classes_monitor}
                do
-                       setclassname $(echo $n | tr '|' ' ')
+                       setclassname ${n//|/ }
                done
-               echo "SETDEVICEGROUP $interface_dev"
+               [ ! -z "${interface_dev}" ] && echo "SETDEVICEGROUP ${interface_dev}"
        fi
-       echo "END $x"
+       echo "END ${x}"
 }
 
 all_devices() {
        cat /proc/net/dev | grep ":" | cut -d ':' -f 1 | while read dev
        do
-               l=$($tc_cmd class show dev $dev | wc -l)
-               [ $l -ne 0 ] && echo $dev
+               l=$(${tc} class show dev ${dev} | wc -l)
+               [ $l -ne 0 ] && echo ${dev}
        done
 }
 
 # update devices and class names
 # once every 2 minutes
-names_every=$((120 / sleep_time))
+names_every=$((qos_get_class_names_every / update_every))
 
 # exit this script every hour
 # it will be restarted automatically
-exit_after=$((3600 / sleep_time))
+exit_after=$((qos_exit_every / update_every))
 
 c=0
 gc=0
@@ -87,21 +106,21 @@ do
        c=$((c + 1))
        gc=$((gc + 1))
 
-       if [ $c -le 1 -o $c -ge $names_every ]
+       if [ ${c} -le 1 -o ${c} -ge ${names_every} ]
        then
                c=1
                fix_names="YES"
                devices="$( all_devices )"
        fi
 
-       for d in $devices
+       for d in ${devices}
        do
-               show_tc $d
+               show_tc ${d}
        done
 
-       echo "WORKTIME $LOOPSLEEPMS_LASTWORK"
+       echo "WORKTIME ${LOOPSLEEPMS_LASTWORK}"
 
-       loopsleepms $sleep_time
+       loopsleepms ${update_every}
 
-       [ $gc -gt $exit_after ] && exit 0
+       [ ${gc} -gt ${exit_after} ] && exit 0
 done
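Note on the tc-qos-helper.sh change above: the old fallback line `[ $((sleep_time)) -lt 1 ] && $NETDATA_UPDATE_EVERY` tried to run the value as a command instead of assigning it. The new code assigns it, and it also replaces `echo | tr` with the `${n//|/ }` expansion. The sketch below isolates both idioms. The helper names `resolve_update_every` and `split_class` and the sample class string are hypothetical, and non-integer input such as `1.5` is not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that isolate two idioms from the diff above.

# resolve_update_every REQUESTED FALLBACK
# Uses REQUESTED if it evaluates to >= 1, else FALLBACK, else 1.
# Empty or unset-looking values evaluate to 0 inside $(( )).
resolve_update_every() {
    local t="${1}" fallback="${2}" update_every
    update_every=$((t))
    [ $((update_every)) -lt 1 ] && update_every=${fallback}
    [ $((update_every)) -lt 1 ] && update_every=1
    echo "${update_every}"
}

# split_class SPEC
# Replaces every '|' with a space using parameter expansion (no subprocess).
split_class() {
    local n="${1}"
    echo "${n//|/ }"
}
```

For instance, `resolve_update_every "" 3` falls back to the second argument, and `split_class "1:10|voip"` turns one `|`-separated word into two space-separated ones.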
diff --git a/python.d/python-modules-installer.sh.in b/python.d/python-modules-installer.sh.in
index 955762dc073f892caa3cafb627621009ebbb4320..0fb8ba8bd2b7cbdf27fd5bb234c651469519cc77 100755 (executable)
@@ -63,7 +63,7 @@ fi
 [ -z "${pip}" ] && pip="$(which pip 2>/dev/null)"
 if [ -z "${pip}" ]
 then
-    echo >& "pip command is required to install python v${pv} modules"
+    echo >&2 "pip command is required to install python v${pv} modules"
     exit 1
 fi
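Note on the python-modules-installer change above: `>&2` duplicates stdout onto file descriptor 2, so the message goes to stderr. Without the `2`, bash reads the next word as a file name and sends both streams there, so the message never reaches the terminal. A minimal sketch of the corrected idiom, wrapped in a hypothetical `warn` helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper; the script in the diff calls echo directly.

# warn MESSAGE...
# Writes MESSAGE to stderr and leaves stdout untouched.
warn() {
    echo >&2 "$*"
}
```

Because nothing is written to stdout, `warn` can be used inside functions whose stdout is captured by a caller.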
 
diff --git a/src/apps_plugin.c b/src/apps_plugin.c
index 39d0efc8af2ad2917cf055758178a128f3a0d8d7..3d94d40ad89770488dbddff2d00eb277944e3458 100644 (file)
@@ -1,7 +1,3 @@
-// TODO
-//
-// 1. disable RESET_OR_OVERFLOW check in charts
-
 #ifdef HAVE_CONFIG_H
 #include <config.h>
 #endif
 #define MAX_NAME 100
 #define MAX_CMDLINE 1024
 
-long processors = 1;
-long pid_max = 32768;
+int processors = 1;
+pid_t pid_max = 32768;
 int debug = 0;
 
 int update_every = 1;
 unsigned long long file_counter = 0;
 int proc_pid_cmdline_is_needed = 0;
-
+int include_exited_childs = 1;
 char *host_prefix = "";
 char *config_dir = CONFIG_DIR;
 
-#ifdef NETDATA_INTERNAL_CHECKS
-// ----------------------------------------------------------------------------
-// memory debugger
-// do not use in production systems - it mis-aligns allocated memory
-
-struct allocations {
-       size_t allocations;
-       size_t allocated;
-       size_t allocated_max;
-} allocations = { 0, 0, 0 };
-
-#define MALLOC_MARK (uint32_t)(0x0BADCAFE)
-#define MALLOC_PREFIX (sizeof(uint32_t) * 2)
-#define MALLOC_SUFFIX (sizeof(uint32_t))
-#define MALLOC_OVERHEAD (MALLOC_PREFIX + MALLOC_SUFFIX)
-
-void *mark_allocation(void *allocated_ptr, size_t size_without_overheads) {
-       uint32_t *real_ptr = (uint32_t *)allocated_ptr;
-       real_ptr[0] = MALLOC_MARK;
-       real_ptr[1] = (uint32_t) size_without_overheads;
-
-       uint32_t *end_ptr = (uint32_t *)(allocated_ptr + MALLOC_PREFIX + size_without_overheads);
-       end_ptr[0] = MALLOC_MARK;
-
-       // fprintf(stderr, "MEMORY_POINTER: Allocated at %p, returning %p.\n", allocated_ptr, (void *)(allocated_ptr + MALLOC_PREFIX));
-
-       return allocated_ptr + MALLOC_PREFIX;
-}
-
-void *check_allocation(const char *file, int line, const char *function, void *marked_ptr, size_t *size_without_overheads_ptr) {
-       uint32_t *real_ptr = (uint32_t *)(marked_ptr - MALLOC_PREFIX);
-
-       // fprintf(stderr, "MEMORY_POINTER: Checking pointer at %p, real %p for %s/%u@%s.\n", marked_ptr, (void *)(marked_ptr - MALLOC_PREFIX), function, line, file);
-
-       if(real_ptr[0] != MALLOC_MARK) fatal("MEMORY: prefix MARK is not valid for %s/%d@%s.", function, line, file);
-
-       size_t size = real_ptr[1];
-
-       uint32_t *end_ptr = (uint32_t *)(marked_ptr + size);
-       if(end_ptr[0] != MALLOC_MARK) fatal("MEMORY: suffix MARK of allocation with size %zu is not valid for %s/%d@%s.", size, function, line, file);
-
-       if(size_without_overheads_ptr) *size_without_overheads_ptr = size;
-
-       return real_ptr;
-}
-
-void *malloc_debug(const char *file, int line, const char *function, size_t size) {
-       void *ptr = malloc(size + MALLOC_OVERHEAD);
-       if(!ptr) fatal("MEMORY: Cannot allocate %zu bytes for %s/%d@%s.", size, function, line, file);
-
-       allocations.allocated += size;
-       allocations.allocations++;
-
-       debug(D_MEMORY, "MEMORY: Allocated %zu bytes for %s/%d@%s."
-               " Status: allocated %zu in %zu allocs."
-               , size
-               , function, line, file
-               , allocations.allocated
-               , allocations.allocations
-       );
-
-       if(allocations.allocated > allocations.allocated_max) {
-               debug(D_MEMORY, "MEMORY: total allocation peak increased from %zu to %zu", allocations.allocated_max, allocations.allocated);
-               allocations.allocated_max = allocations.allocated;
-       }
-
-       size_t csize;
-       check_allocation(file, line, function, mark_allocation(ptr, size), &csize);
-       if(size != csize) {
-               fatal("Invalid size.");
-       }
-
-       return mark_allocation(ptr, size);
-}
-
-void *calloc_debug(const char *file, int line, const char *function, size_t nmemb, size_t size) {
-       void *ptr = malloc_debug(file, line, function, (nmemb * size));
-       bzero(ptr, nmemb * size);
-       return ptr;
-}
-
-void free_debug(const char *file, int line, const char *function, void *ptr) {
-       size_t size;
-       void *real_ptr = check_allocation(file, line, function, ptr, &size);
-
-       bzero(real_ptr, size + MALLOC_OVERHEAD);
-
-       free(real_ptr);
-       allocations.allocated -= size;
-       allocations.allocations--;
-
-       debug(D_MEMORY, "MEMORY: freed %zu bytes for %s/%d@%s."
-               " Status: allocated %zu in %zu allocs."
-               , size
-               , function, line, file
-               , allocations.allocated
-               , allocations.allocations
-       );
-}
-
-void *realloc_debug(const char *file, int line, const char *function, void *ptr, size_t size) {
-       if(!ptr) return malloc_debug(file, line, function, size);
-       if(!size) { free_debug(file, line, function, ptr); return NULL; }
-
-       size_t old_size;
-       void *real_ptr = check_allocation(file, line, function, ptr, &old_size);
-
-       void *new_ptr = realloc(real_ptr, size + MALLOC_OVERHEAD);
-       if(!new_ptr) fatal("MEMORY: Cannot allocate %zu bytes for %s/%d@%s.", size, function, line, file);
-
-       allocations.allocated += size;
-       allocations.allocated -= old_size;
-
-       debug(D_MEMORY, "MEMORY: Re-allocated from %zu to %zu bytes for %s/%d@%s."
-               " Status: allocated %zu in %zu allocs."
-               , old_size, size
-               , function, line, file
-               , allocations.allocated
-               , allocations.allocations
-       );
-
-       if(allocations.allocated > allocations.allocated_max) {
-               debug(D_MEMORY, "MEMORY: total allocation peak increased from %zu to %zu", allocations.allocated_max, allocations.allocated);
-               allocations.allocated_max = allocations.allocated;
-       }
-
-       return mark_allocation(new_ptr, size);
-}
-
-char *strdup_debug(const char *file, int line, const char *function, const char *ptr) {
-       size_t size = 0;
-       const char *s = ptr;
-
-       while(*s++) size++;
-       size++;
-
-       char *p = malloc_debug(file, line, function, size);
-       if(!p) fatal("Cannot allocate %zu bytes.", size);
-
-       memcpy(p, ptr, size);
-       return p;
-}
-
-#define malloc(size) malloc_debug(__FILE__, __LINE__, __FUNCTION__, (size))
-#define calloc(nmemb, size) calloc_debug(__FILE__, __LINE__, __FUNCTION__, (nmemb), (size))
-#define realloc(ptr, size) realloc_debug(__FILE__, __LINE__, __FUNCTION__, (ptr), (size))
-#define free(ptr) free_debug(__FILE__, __LINE__, __FUNCTION__, (ptr))
-
-#ifdef strdup
-#undef strdup
-#endif
-#define strdup(ptr) strdup_debug(__FILE__, __LINE__, __FUNCTION__, (ptr))
-
-#endif /* NETDATA_INTERNAL_CHECKS */
 
 // ----------------------------------------------------------------------------
 
@@ -254,9 +96,9 @@ long get_system_cpus(void) {
        return processors;
 }
 
-long get_system_pid_max(void) {
+pid_t get_system_pid_max(void) {
        procfile *ff = NULL;
-       long mpid = 32768;
+       pid_t mpid = 32768;
 
        char filename[FILENAME_MAX + 1];
        snprintfz(filename, FILENAME_MAX, "%s/proc/sys/kernel/pid_max", host_prefix);
@@ -269,7 +111,7 @@ long get_system_pid_max(void) {
                return mpid;
        }
 
-       mpid = atol(procfile_lineword(ff, 0, 0));
+       mpid = (pid_t)atoi(procfile_lineword(ff, 0, 0));
        if(!mpid) mpid = 32768;
 
        procfile_close(ff);
@@ -304,14 +146,14 @@ struct target {
        unsigned long long num_threads;
        unsigned long long rss;
 
-       unsigned long long fix_minflt;
-       unsigned long long fix_cminflt;
-       unsigned long long fix_majflt;
-       unsigned long long fix_cmajflt;
-       unsigned long long fix_utime;
-       unsigned long long fix_stime;
-       unsigned long long fix_cutime;
-       unsigned long long fix_cstime;
+       long long fix_minflt;
+       long long fix_cminflt;
+       long long fix_majflt;
+       long long fix_cmajflt;
+       long long fix_utime;
+       long long fix_stime;
+       long long fix_cutime;
+       long long fix_cstime;
 
        unsigned long long statm_size;
        unsigned long long statm_resident;
@@ -463,10 +305,12 @@ struct target *get_apps_groups_target(const char *id, struct target *target)
        }
        uint32_t hash = simple_hash(id);
 
-       struct target *w;
+       struct target *w, *last = apps_groups_root_target;
        for(w = apps_groups_root_target ; w ; w = w->next) {
                if(w->idhash == hash && strncmp(nid, w->id, MAX_NAME) == 0)
                        return w;
+
+               last = w;
        }
 
        w = calloc(sizeof(struct target), 1);
@@ -498,8 +342,9 @@ struct target *get_apps_groups_target(const char *id, struct target *target)
        w->debug = tdebug;
        w->target = target;
 
-       w->next = apps_groups_root_target;
-       apps_groups_root_target = w;
+       // append it, to maintain the order in apps_groups.conf
+       if(last) last->next = w;
+       else apps_groups_root_target = w;
 
        if(unlikely(debug))
                fprintf(stderr, "apps.plugin: ADDING TARGET ID '%s', process name '%s' (%s), aggregated on target '%s', options: %s %s\n"
@@ -675,14 +520,20 @@ struct pid_stat {
        // we will subtract these values from the old
        // target
        unsigned long long last_minflt;
-       unsigned long long last_cminflt;
        unsigned long long last_majflt;
-       unsigned long long last_cmajflt;
        unsigned long long last_utime;
        unsigned long long last_stime;
+
+       unsigned long long last_cminflt;
+       unsigned long long last_cmajflt;
        unsigned long long last_cutime;
        unsigned long long last_cstime;
 
+       unsigned long long last_fix_cminflt;
+       unsigned long long last_fix_cmajflt;
+       unsigned long long last_fix_cutime;
+       unsigned long long last_fix_cstime;
+
        unsigned long long last_io_logical_bytes_read;
        unsigned long long last_io_logical_bytes_written;
        unsigned long long last_io_read_calls;
@@ -691,27 +542,10 @@ struct pid_stat {
        unsigned long long last_io_storage_bytes_written;
        unsigned long long last_io_cancelled_write_bytes;
 
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-       unsigned long long old_utime;
-       unsigned long long old_stime;
-       unsigned long long old_minflt;
-       unsigned long long old_majflt;
-
-       unsigned long long old_cutime;
-       unsigned long long old_cstime;
-       unsigned long long old_cminflt;
-       unsigned long long old_cmajflt;
-
-       unsigned long long fix_cutime;
-       unsigned long long fix_cstime;
        unsigned long long fix_cminflt;
        unsigned long long fix_cmajflt;
-
-       unsigned long long diff_cutime;
-       unsigned long long diff_cstime;
-       unsigned long long diff_cminflt;
-       unsigned long long diff_cmajflt;
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
+       unsigned long long fix_cutime;
+       unsigned long long fix_cstime;
 
        int *fds;                                               // array of fds it uses
        int fds_size;                                   // the size of the fds array
@@ -765,7 +599,8 @@ void del_pid_entry(pid_t pid)
 {
        if(!all_pids[pid]) return;
 
-       if(debug) fprintf(stderr, "apps.plugin: process %d %s exited, deleting it.\n", pid, all_pids[pid]->comm);
+       if(unlikely(debug))
+               fprintf(stderr, "apps.plugin: process %d %s exited, deleting it.\n", pid, all_pids[pid]->comm);
 
        if(root_of_pids == all_pids[pid]) root_of_pids = all_pids[pid]->next;
        if(all_pids[pid]->next) all_pids[pid]->next->prev = all_pids[pid]->prev;
@@ -898,7 +733,7 @@ int read_proc_pid_stat(struct pid_stat *p) {
        // p->guest_time        = strtoull(procfile_lineword(ff, 0, 42+i), NULL, 10);
        // p->cguest_time       = strtoull(procfile_lineword(ff, 0, 43), NULL, 10);
 
-       if(debug || (p->target && p->target->debug))
+       if(unlikely(debug || (p->target && p->target->debug)))
                fprintf(stderr, "apps.plugin: READ PROC/PID/STAT: %s/proc/%d/stat, process: '%s' VALUES: utime=%llu, stime=%llu, cutime=%llu, cstime=%llu, minflt=%llu, majflt=%llu, cminflt=%llu, cmajflt=%llu, threads=%d\n", host_prefix, p->pid, p->comm, p->utime, p->stime, p->cutime, p->cstime, p->minflt, p->majflt, p->cminflt, p->cmajflt, p->num_threads);
 
        // procfile_close(ff);
@@ -1048,13 +883,16 @@ void file_descriptor_not_used(int id)
                }
 #endif /* NETDATA_INTERNAL_CHECKS */
 
-               if(debug) fprintf(stderr, "apps.plugin: decreasing slot %d (count = %d).\n", id, all_files[id].count);
+               if(unlikely(debug))
+                       fprintf(stderr, "apps.plugin: decreasing slot %d (count = %d).\n", id, all_files[id].count);
 
                if(all_files[id].count > 0) {
                        all_files[id].count--;
 
                        if(!all_files[id].count) {
-                               if(debug) fprintf(stderr, "apps.plugin:   >> slot %d is empty.\n", id);
+                               if(unlikely(debug))
+                                       fprintf(stderr, "apps.plugin:   >> slot %d is empty.\n", id);
+
                                file_descriptor_remove(&all_files[id]);
 #ifdef NETDATA_INTERNAL_CHECKS
                                all_files[id].magic = 0x00000000;
@@ -1073,12 +911,15 @@ int file_descriptor_find_or_add(const char *name)
        static int last_pos = 0;
        uint32_t hash = simple_hash(name);
 
-       if(debug) fprintf(stderr, "apps.plugin: adding or finding name '%s' with hash %u\n", name, hash);
+       if(unlikely(debug))
+               fprintf(stderr, "apps.plugin: adding or finding name '%s' with hash %u\n", name, hash);
 
        struct file_descriptor *fd = file_descriptor_find(name, hash);
        if(fd) {
                // found
-               if(debug) fprintf(stderr, "apps.plugin:   >> found on slot %d\n", fd->pos);
+               if(unlikely(debug))
+                       fprintf(stderr, "apps.plugin:   >> found on slot %d\n", fd->pos);
+
                fd->count++;
                return fd->pos;
        }
@@ -1090,19 +931,25 @@ int file_descriptor_find_or_add(const char *name)
                int i;
 
                // there is no empty slot
-               if(debug) fprintf(stderr, "apps.plugin: extending fd array to %d entries\n", all_files_size + FILE_DESCRIPTORS_INCREASE_STEP);
+               if(unlikely(debug))
+                       fprintf(stderr, "apps.plugin: extending fd array to %d entries\n", all_files_size + FILE_DESCRIPTORS_INCREASE_STEP);
+
                all_files = realloc(all_files, (all_files_size + FILE_DESCRIPTORS_INCREASE_STEP) * sizeof(struct file_descriptor));
 
                // if the address changed, we have to rebuild the index
                // since all pointers are now invalid
                if(old && old != (void *)all_files) {
-                       if(debug) fprintf(stderr, "apps.plugin:   >> re-indexing.\n");
+                       if(unlikely(debug))
+                               fprintf(stderr, "apps.plugin:   >> re-indexing.\n");
+
                        all_files_index.root = NULL;
                        for(i = 0; i < all_files_size; i++) {
                                if(!all_files[i].count) continue;
                                file_descriptor_add(&all_files[i]);
                        }
-                       if(debug) fprintf(stderr, "apps.plugin:   >> re-indexing done.\n");
+
+                       if(unlikely(debug))
+                               fprintf(stderr, "apps.plugin:   >> re-indexing done.\n");
                }
 
                for(i = all_files_size; i < (all_files_size + FILE_DESCRIPTORS_INCREASE_STEP); i++) {
@@ -1118,7 +965,8 @@ int file_descriptor_find_or_add(const char *name)
                all_files_size += FILE_DESCRIPTORS_INCREASE_STEP;
        }
 
-       if(debug) fprintf(stderr, "apps.plugin:   >> searching for empty slot.\n");
+       if(unlikely(debug))
+               fprintf(stderr, "apps.plugin:   >> searching for empty slot.\n");
 
        // search for an empty slot
        int i, c;
@@ -1127,14 +975,17 @@ int file_descriptor_find_or_add(const char *name)
                if(c == 0) continue;
 
                if(!all_files[c].count) {
-                       if(debug) fprintf(stderr, "apps.plugin:   >> Examining slot %d.\n", c);
+                       if(unlikely(debug))
+                               fprintf(stderr, "apps.plugin:   >> Examining slot %d.\n", c);
 
 #ifdef NETDATA_INTERNAL_CHECKS
                        if(all_files[c].magic == 0x0BADCAFE && all_files[c].name && file_descriptor_find(all_files[c].name, all_files[c].hash))
                                error("fd on position %d is not cleared properly. It still has %s in it.\n", c, all_files[c].name);
 #endif /* NETDATA_INTERNAL_CHECKS */
 
-                       if(debug) fprintf(stderr, "apps.plugin:   >> %s fd position %d for %s (last name: %s)\n", all_files[c].name?"re-using":"using", c, name, all_files[c].name);
+                       if(unlikely(debug))
+                               fprintf(stderr, "apps.plugin:   >> %s fd position %d for %s (last name: %s)\n", all_files[c].name?"re-using":"using", c, name, all_files[c].name);
+
                        if(all_files[c].name) free((void *)all_files[c].name);
                        all_files[c].name = NULL;
                        last_pos = c;
@@ -1145,7 +996,9 @@ int file_descriptor_find_or_add(const char *name)
                fatal("We should find an empty slot, but there isn't any");
                exit(1);
        }
-       if(debug) fprintf(stderr, "apps.plugin:   >> updating slot %d.\n", c);
+
+       if(unlikely(debug))
+               fprintf(stderr, "apps.plugin:   >> updating slot %d.\n", c);
 
        all_files_len++;
 
@@ -1161,11 +1014,15 @@ int file_descriptor_find_or_add(const char *name)
        else if(strcmp(name, "anon_inode:[timerfd]") == 0) type = FILETYPE_TIMERFD;
        else if(strcmp(name, "anon_inode:[signalfd]") == 0) type = FILETYPE_SIGNALFD;
        else if(strncmp(name, "anon_inode:", 11) == 0) {
-               if(debug) fprintf(stderr, "apps.plugin: FIXME: unknown anonymous inode: %s\n", name);
+               if(unlikely(debug))
+                       fprintf(stderr, "apps.plugin: FIXME: unknown anonymous inode: %s\n", name);
+
                type = FILETYPE_OTHER;
        }
        else {
-               if(debug) fprintf(stderr, "apps.plugin: FIXME: cannot understand linkname: %s\n", name);
+               if(unlikely(debug))
+                       fprintf(stderr, "apps.plugin: FIXME: cannot understand linkname: %s\n", name);
+
                type = FILETYPE_OTHER;
        }
 
@@ -1179,7 +1036,8 @@ int file_descriptor_find_or_add(const char *name)
 #endif /* NETDATA_INTERNAL_CHECKS */
        file_descriptor_add(&all_files[c]);
 
-       if(debug) fprintf(stderr, "apps.plugin: using fd position %d (name: %s)\n", c, all_files[c].name);
+       if(unlikely(debug))
+               fprintf(stderr, "apps.plugin: using fd position %d (name: %s)\n", c, all_files[c].name);
 
        return c;
 }
@@ -1208,7 +1066,9 @@ int read_pid_file_descriptors(struct pid_stat *p) {
                        if(fdid < 0) continue;
                        if(fdid >= p->fds_size) {
                                // it is small, extend it
-                               if(debug) fprintf(stderr, "apps.plugin: extending fd memory slots for %s from %d to %d\n", p->comm, p->fds_size, fdid + 100);
+                               if(unlikely(debug))
+                                       fprintf(stderr, "apps.plugin: extending fd memory slots for %s from %d to %d\n", p->comm, p->fds_size, fdid + 100);
+
                                p->fds = realloc(p->fds, (fdid + 100) * sizeof(int));
                                if(!p->fds) {
                                        fatal("Cannot re-allocate fds for %s", p->comm);
@@ -1291,26 +1151,34 @@ int collect_data_for_all_processes_from_proc(void)
        all_pids_count = 0;
        for(p = root_of_pids; p ; p = p->next) {
                all_pids_count++;
-               p->parent = NULL;
-               p->updated = 0;
-               p->children_count = 0;
-               p->merged = 0;
-               p->new_entry = 0;
-
-        p->last_minflt  = p->minflt;
-        p->last_cminflt  = p->cminflt;
-        p->last_majflt  = p->majflt;
-        p->last_cmajflt  = p->cmajflt;
-        p->last_utime  = p->utime;
-        p->last_stime  = p->stime;
-        p->last_cutime  = p->cutime;
-        p->last_cstime  = p->cstime;
-
-        p->last_io_logical_bytes_read  = p->io_logical_bytes_read;
+
+               p->parent           = NULL;
+
+               p->updated          = 0;
+               p->children_count   = 0;
+               p->merged           = 0;
+               p->new_entry        = 0;
+
+        p->last_minflt      = p->minflt;
+        p->last_majflt      = p->majflt;
+        p->last_utime       = p->utime;
+        p->last_stime       = p->stime;
+
+        p->last_cminflt     = p->cminflt;
+        p->last_cmajflt     = p->cmajflt;
+        p->last_cutime      = p->cutime;
+        p->last_cstime      = p->cstime;
+
+        p->last_fix_cminflt = p->fix_cminflt;
+        p->last_fix_cmajflt = p->fix_cmajflt;
+        p->last_fix_cutime  = p->fix_cutime;
+        p->last_fix_cstime  = p->fix_cstime;
+
+        p->last_io_logical_bytes_read     = p->io_logical_bytes_read;
         p->last_io_logical_bytes_written  = p->io_logical_bytes_written;
-        p->last_io_read_calls  = p->io_read_calls;
-        p->last_io_write_calls  = p->io_write_calls;
-        p->last_io_storage_bytes_read  = p->io_storage_bytes_read;
+        p->last_io_read_calls             = p->io_read_calls;
+        p->last_io_write_calls            = p->io_write_calls;
+        p->last_io_storage_bytes_read     = p->io_storage_bytes_read;
         p->last_io_storage_bytes_written  = p->io_storage_bytes_written;
         p->last_io_cancelled_write_bytes  = p->io_cancelled_write_bytes;
        }
@@ -1320,9 +1188,14 @@ int collect_data_for_all_processes_from_proc(void)
                pid_t pid = (pid_t) strtoul(file->d_name, &endptr, 10);
 
                // make sure we read a valid number
-               if(unlikely(pid <= 0 || pid > pid_max || endptr == file->d_name || *endptr != '\0'))
+               if(unlikely(endptr == file->d_name || *endptr != '\0'))
                        continue;
 
+               if(unlikely(pid <= 0 || pid > pid_max)) {
+                       error("Invalid pid %d read (expected 1 to %d). Ignoring process.", pid, pid_max);
+                       continue;
+               }
+
                p = get_pid_entry(pid);
                if(unlikely(!p)) continue;
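The hunk above splits the pid check in two: names that are not numbers are skipped silently, while numbers outside the valid range are reported. A minimal sketch of that split, assuming a hypothetical helper `parse_proc_pid()` that is not part of apps.plugin (the `ERANGE` check is an addition of this sketch):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of apps.plugin: parse a /proc directory
 * entry name into a pid.
 * Returns  1 and stores the pid on success,
 *          0 if the name is not a plain number (e.g. "self"),
 *         -1 if it is a number outside 1..pid_max. */
int parse_proc_pid(const char *name, long pid_max, long *pid_out) {
    assert(name && pid_out);

    char *endptr = NULL;
    errno = 0;
    unsigned long v = strtoul(name, &endptr, 10);

    /* not a number at all: skip silently, this is not a process */
    if (endptr == name || *endptr != '\0')
        return 0;

    /* a number, but not a usable pid: the caller may want to log it */
    if (errno == ERANGE || v == 0 || v > (unsigned long)pid_max)
        return -1;

    *pid_out = (long)v;
    return 1;
}
```

A caller would `continue` on both 0 and -1, logging only in the -1 case, which mirrors the two `if` statements in the diff.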
 
@@ -1331,34 +1204,22 @@ int collect_data_for_all_processes_from_proc(void)
                // /proc/<pid>/stat
 
                if(unlikely(read_proc_pid_stat(p))) {
-                               error("Cannot process %s/proc/%d/stat", host_prefix, pid);
-
+                       error("Cannot process %s/proc/%d/stat", host_prefix, pid);
                        // there is no reason to proceed if we cannot get its status
                        continue;
                }
 
                // check its parent pid
                if(unlikely(p->ppid < 0 || p->ppid > pid_max)) {
-                               error("Pid %d states invalid parent pid %d. Using 0.", pid, p->ppid);
-
+                       error("Pid %d states invalid parent pid %d. Using 0.", pid, p->ppid);
                        p->ppid = 0;
                }
 
-               // --------------------------------------------------------------------
-               // /proc/<pid>/cmdline
-
-               if(proc_pid_cmdline_is_needed) {
-                       if(unlikely(read_proc_pid_cmdline(p))) {
-                                       error("Cannot process %s/proc/%d/cmdline", host_prefix, pid);
-                       }
-               }
-
                // --------------------------------------------------------------------
                // /proc/<pid>/statm
 
                if(unlikely(read_proc_pid_statm(p))) {
-                               error("Cannot process %s/proc/%d/statm", host_prefix, pid);
-
+                       error("Cannot process %s/proc/%d/statm", host_prefix, pid);
                        // there is no reason to proceed if we cannot get its memory status
                        continue;
                }
@@ -1388,9 +1249,18 @@ int collect_data_for_all_processes_from_proc(void)
                // check if it is target
                // we do this only once, the first time this pid is loaded
                if(unlikely(p->new_entry)) {
-                       if(debug) fprintf(stderr, "apps.plugin: \tJust added %s\n", p->comm);
+                       // /proc/<pid>/cmdline
+                       if(proc_pid_cmdline_is_needed) {
+                               if(unlikely(read_proc_pid_cmdline(p))) {
+                                       error("Cannot process %s/proc/%d/cmdline", host_prefix, pid);
+                               }
+                       }
+
+                       if(unlikely(debug))
+                               fprintf(stderr, "apps.plugin: \tJust added %d (%s)\n", pid, p->comm);
+
                        uint32_t hash = simple_hash(p->comm);
-                       size_t pclen = strlen(p->comm);
+                       size_t pclen  = strlen(p->comm);
 
                        struct target *w;
                        for(w = apps_groups_root_target; w ; w = w->next) {
@@ -1411,6 +1281,8 @@ int collect_data_for_all_processes_from_proc(void)
 
                                        if(debug || (p->target && p->target->debug))
                                                fprintf(stderr, "apps.plugin: \t\t%s linked to target %s\n", p->comm, p->target->name);
+
+                                       break;
                                }
                        }
                }
@@ -1434,49 +1306,6 @@ int collect_data_for_all_processes_from_proc(void)
        return 1;
 }
 
-
-// ----------------------------------------------------------------------------
-
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-// print a tree view of all processes
-int debug_childrens_aggregations(pid_t pid, int level) {
-       struct pid_stat *p = NULL;
-       char b[level+3];
-       int i, ret = 0;
-
-       for(i = 0; i < level; i++) b[i] = '\t';
-       b[level] = '|';
-       b[level+1] = '-';
-       b[level+2] = '\0';
-
-       for(p = root_of_pids; p ; p = p->next) {
-               if(p->ppid == pid) {
-                       ret += debug_childrens_aggregations(p->pid, level+1);
-               }
-       }
-
-       p = all_pids[pid];
-       if(p) {
-               if(!p->updated) ret += 1;
-               if(ret) fprintf(stderr, "%s %s %d [%s, %s] c=%d u=%llu+%llu, s=%llu+%llu, cu=%llu+%llu, cs=%llu+%llu, n=%llu+%llu, j=%llu+%llu, cn=%llu+%llu, cj=%llu+%llu\n"
-                       , b, p->comm, p->pid, p->updated?"OK":"KILLED", p->target->name, p->children_count
-                       , p->utime, p->utime - p->old_utime
-                       , p->stime, p->stime - p->old_stime
-                       , p->cutime, p->cutime - p->old_cutime
-                       , p->cstime, p->cstime - p->old_cstime
-                       , p->minflt, p->minflt - p->old_minflt
-                       , p->majflt, p->majflt - p->old_majflt
-                       , p->cminflt, p->cminflt - p->old_cminflt
-                       , p->cmajflt, p->cmajflt - p->old_cmajflt
-                       );
-       }
-
-       return ret;
-}
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
-
-
-
 // ----------------------------------------------------------------------------
 // update statistics on the targets
 
@@ -1494,6 +1323,7 @@ int debug_childrens_aggregations(pid_t pid, int level) {
 // check: update_apps_groups_statistics()
 
 void link_all_processes_to_their_parents(void) {
+       struct pid_stat *init = all_pids[1];
        struct pid_stat *p = NULL;
 
        // link all children to their parents
@@ -1501,81 +1331,110 @@ void link_all_processes_to_their_parents(void) {
        for(p = root_of_pids; p ; p = p->next) {
                // for each process found running
 
-               if(p->ppid > 0
-                               && p->ppid <= pid_max
-                               && all_pids[p->ppid]
-                       ) {
-                       // for valid processes
+               if(likely(p->new_entry && p->updated)) {
+                       // the first time we see an entry
+                       // we remove the exited children figures
+                       // to avoid spikes
+                       p->fix_cminflt = p->cminflt;
+                       p->fix_cmajflt = p->cmajflt;
+                       p->fix_cutime  = p->cutime;
+                       p->fix_cstime  = p->cstime;
+               }
+
+               if(likely(p->ppid > 0 && all_pids[p->ppid])) {
+                       // valid parent processes
 
-                       if(debug || (p->target && p->target->debug))
-                               fprintf(stderr, "apps.plugin: \tparent of %d (%s) is %d (%s)\n", p->pid, p->comm, p->ppid, all_pids[p->ppid]->comm);
+                       struct pid_stat *pp;
 
-                       p->parent = all_pids[p->ppid];
+                       p->parent = pp = all_pids[p->ppid];
                        p->parent->children_count++;
-               }
-               else if(p->ppid != 0)
-                       error("pid %d %s states parent %d, but the later does not exist.", p->pid, p->comm, p->ppid);
-       }
-}
 
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-void aggregate_children_to_parents(void) {
-       struct pid_stat *p = NULL;
+                       if(unlikely(debug || (p->target && p->target->debug)))
+                               fprintf(stderr, "apps.plugin: \tchild %d (%s, %s) has parent %d (%s, %s). Parent: utime=%llu, stime=%llu, minflt=%llu, majflt=%llu, cutime=%llu, cstime=%llu, cminflt=%llu, cmajflt=%llu, fix_cutime=%llu, fix_cstime=%llu, fix_cminflt=%llu, fix_cmajflt=%llu\n", p->pid, p->comm, p->updated?"running":"exited", pp->pid, pp->comm, pp->updated?"running":"exited", pp->utime, pp->stime, pp->minflt, pp->majflt, pp->cutime, pp->cstime, pp->cminflt, pp->cmajflt, pp->fix_cutime, pp->fix_cstime, pp->fix_cminflt, pp->fix_cmajflt);
 
-       // for each killed process, remove its values from the parents
-       // sums (we had already added them in a previous loop)
-       for(p = root_of_pids; p ; p = p->next) {
-               if(p->updated) continue;
-
-               if(debug) fprintf(stderr, "apps.plugin: UNMERGING %d %s\n", p->pid, p->comm);
-
-               unsigned long long diff_utime = p->utime + p->cutime + p->fix_cutime;
-               unsigned long long diff_stime = p->stime + p->cstime + p->fix_cstime;
-               unsigned long long diff_minflt = p->minflt + p->cminflt + p->fix_cminflt;
-               unsigned long long diff_majflt = p->majflt + p->cmajflt + p->fix_cmajflt;
-
-               struct pid_stat *t = p;
-               while((t = t->parent)) {
-                       if(!t->updated) continue;
-
-                       unsigned long long x;
-                       if(diff_utime && t->diff_cutime) {
-                               x = (t->diff_cutime < diff_utime)?t->diff_cutime:diff_utime;
-                               diff_utime -= x;
-                               t->diff_cutime -= x;
-                               t->fix_cutime += x;
-                               if(debug) fprintf(stderr, "apps.plugin: \t cutime %llu from %d %s %s\n", x, t->pid, t->comm, t->target->name);
-                       }
-                       if(diff_stime && t->diff_cstime) {
-                               x = (t->diff_cstime < diff_stime)?t->diff_cstime:diff_stime;
-                               diff_stime -= x;
-                               t->diff_cstime -= x;
-                               t->fix_cstime += x;
-                               if(debug) fprintf(stderr, "apps.plugin: \t cstime %llu from %d %s %s\n", x, t->pid, t->comm, t->target->name);
-                       }
-                       if(diff_minflt && t->diff_cminflt) {
-                               x = (t->diff_cminflt < diff_minflt)?t->diff_cminflt:diff_minflt;
-                               diff_minflt -= x;
-                               t->diff_cminflt -= x;
-                               t->fix_cminflt += x;
-                               if(debug) fprintf(stderr, "apps.plugin: \t cminflt %llu from %d %s %s\n", x, t->pid, t->comm, t->target->name);
-                       }
-                       if(diff_majflt && t->diff_cmajflt) {
-                               x = (t->diff_cmajflt < diff_majflt)?t->diff_cmajflt:diff_majflt;
-                               diff_majflt -= x;
-                               t->diff_cmajflt -= x;
-                               t->fix_cmajflt += x;
-                               if(debug) fprintf(stderr, "apps.plugin: \t cmajflt %llu from %d %s %s\n", x, t->pid, t->comm, t->target->name);
+                       if(unlikely(!p->updated)) {
+                               // this process has exited
+
+                               // find the first parent that has been updated
+                               while(pp && !pp->updated) {
+                                       // we may have to forward link it to its parent
+                                       if(unlikely(!pp->parent && pp->ppid > 0 && all_pids[pp->ppid]))
+                                               pp->parent = all_pids[pp->ppid];
+
+                                       // check again for parent
+                                       pp = pp->parent;
+                               }
+
+                               if(likely(pp)) {
+                                       // this is an exited child with a parent
+                                       // remove its known resource usage from the parent's data
+                                       pp->fix_cminflt += p->last_minflt + p->last_cminflt + p->last_fix_cminflt;
+                                       pp->fix_cmajflt += p->last_majflt + p->last_cmajflt + p->last_fix_cmajflt;
+                                       pp->fix_cutime  += p->last_utime  + p->last_cutime  + p->last_fix_cutime;
+                                       pp->fix_cstime  += p->last_stime  + p->last_cstime  + p->last_fix_cstime;
+
+                                       // The known exited children (the ones we track) may have
+                                       // contributed more than the value accumulated into the process
+                                       // by the kernel.
+                                       // This can happen if the parent process has not waited for
+                                       // its children (check: man 2 times).
+                                       // In this case, the kernel adds these resources to init (pid 1).
+                                       //
+                                       // The following code attempts to fix this.
+                                       // Without it, the charts will show random spikes, for
+                                       // example when an SSH session ends: sshd forks a child
+                                       // to serve the session, but when the session ends, sshd
+                                       // does not wait for its child, so all the resources of the
+                                       // ssh session get added to init, resulting in a huge spike on
+                                       // the charts.
+
+                                       if(unlikely(pp->cminflt < pp->fix_cminflt)) {
+                                               if(likely(init && pp != init)) {
+                                                       unsigned long long have = pp->fix_cminflt - pp->cminflt;
+                                                       unsigned long long max = init->cminflt - init->fix_cminflt;
+                                                       if(have > max) have = max;
+                                                       init->fix_cminflt += have;
+                                               }
+                                               pp->fix_cminflt = pp->cminflt;
+                                       }
+                                       if(unlikely(pp->cmajflt < pp->fix_cmajflt)) {
+                                               if(likely(init && pp != init)) {
+                                                       unsigned long long have = pp->fix_cmajflt - pp->cmajflt;
+                                                       unsigned long long max = init->cmajflt - init->fix_cmajflt;
+                                                       if(have > max) have = max;
+                                                       init->fix_cmajflt += have;
+                                               }
+                                               pp->fix_cmajflt = pp->cmajflt;
+                                       }
+                                       if(unlikely(pp->cutime < pp->fix_cutime)) {
+                                               if(likely(init && pp != init)) {
+                                                       unsigned long long have = pp->fix_cutime - pp->cutime;
+                                                       unsigned long long max = init->cutime - init->fix_cutime;
+                                                       if(have > max) have = max;
+                                                       init->fix_cutime += have;
+                                               }
+                                               pp->fix_cutime  = pp->cutime;
+                                       }
+                                       if(unlikely(pp->cstime < pp->fix_cstime)) {
+                                               if(likely(init && pp != init)) {
+                                                       unsigned long long have = pp->fix_cstime - pp->cstime;
+                                                       unsigned long long max = init->cstime - init->fix_cstime;
+                                                       if(have > max) have = max;
+                                                       init->fix_cstime += have;
+                                               }
+                                               pp->fix_cstime = pp->cstime;
+                                       }
+
+                                       if(unlikely(debug))
+                                               fprintf(stderr, "apps.plugin: \tupdating child metrics of %d (%s, %s) to its parent %d (%s, %s). Parent has now: utime=%llu, stime=%llu, minflt=%llu, majflt=%llu, cutime=%llu, cstime=%llu, cminflt=%llu, cmajflt=%llu, fix_cutime=%llu, fix_cstime=%llu, fix_cminflt=%llu, fix_cmajflt=%llu\n", p->pid, p->comm, p->updated?"running":"exited", pp->pid, pp->comm, pp->updated?"running":"exited", pp->utime, pp->stime, pp->minflt, pp->majflt, pp->cutime, pp->cstime, pp->cminflt, pp->cmajflt, pp->fix_cutime, pp->fix_cstime, pp->fix_cminflt, pp->fix_cmajflt);
+                               }
                        }
                }
-
-               if(diff_utime) error("Cannot fix up utime %llu", diff_utime);
-               if(diff_stime) error("Cannot fix up stime %llu", diff_stime);
-               if(diff_minflt) error("Cannot fix up minflt %llu", diff_minflt);
-               if(diff_majflt) error("Cannot fix up majflt %llu", diff_majflt);
+               else if(unlikely(p->ppid != 0))
+                       error("pid %d %s states parent %d, but the latter does not exist.", p->pid, p->comm, p->ppid);
        }
 }
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
+
 
 void cleanup_non_existing_pids(void) {
        int c;
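The fix-up in link_all_processes_to_their_parents() above repeats the same clamp for four counters. A minimal sketch of that step for a single counter, assuming a hypothetical `struct child_counter` and function `charge_exited_child()` that do not exist in apps.plugin. The guard against unsigned underflow when computing init's headroom is an addition of this sketch, not something the diff does:

```c
#include <assert.h>

/* Hypothetical, simplified model of one child counter (e.g. cutime). */
struct child_counter {
    unsigned long long value; /* kernel total for waited-for children */
    unsigned long long fix;   /* part already attributed to tracked children */
};

/* Charge `amount` (the last known usage of an exited child) to `parent`.
 * If the parent never waited for the child, the kernel gave the usage to
 * init instead, so the excess is moved to `init`, capped at what init can
 * still absorb. Returns the amount moved to init. */
unsigned long long charge_exited_child(struct child_counter *parent,
                                       struct child_counter *init,
                                       unsigned long long amount) {
    assert(parent);
    unsigned long long moved = 0;

    parent->fix += amount;

    if (parent->value < parent->fix) {
        if (init && parent != init) {
            unsigned long long have = parent->fix - parent->value;
            /* assumption of this sketch: never let the headroom wrap around */
            unsigned long long max =
                (init->value > init->fix) ? init->value - init->fix : 0;
            if (have > max) have = max;
            init->fix += have;
            moved = have;
        }
        /* the parent can never be fixed by more than the kernel reported */
        parent->fix = parent->value;
    }

    return moved;
}
```

With a parent at `value=100, fix=90` and a child that used 50, the parent is clamped to `fix=100` and the remaining 40 is charged to init, provided init has at least that much unfixed usage.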
@@ -1603,8 +1462,9 @@ void apply_apps_groups_targets_inheritance(void) {
 
        // children that do not have a target
        // inherit their target from their parent
-       int found = 1;
+       int found = 1, loops = 0;
        while(found) {
+               if(unlikely(debug)) loops++;
                found = 0;
                for(p = root_of_pids; p ; p = p->next) {
                        // if this process does not have a target
@@ -1626,6 +1486,7 @@ void apply_apps_groups_targets_inheritance(void) {
        // repeat, until nothing more can be done.
        found = 1;
        while(found) {
+               if(unlikely(debug)) loops++;
                found = 0;
                for(p = root_of_pids; p ; p = p->next) {
                        // if this process does not have any children
@@ -1658,7 +1519,7 @@ void apply_apps_groups_targets_inheritance(void) {
                        }
                }
 
-               if(debug)
+               if(unlikely(debug))
                        fprintf(stderr, "apps.plugin: merged %d processes\n", found);
        }
 
@@ -1667,25 +1528,18 @@ void apply_apps_groups_targets_inheritance(void) {
                all_pids[1]->target = apps_groups_default_target;
 
        // give a default target on all top level processes
+       if(unlikely(debug)) loops++;
        for(p = root_of_pids; p ; p = p->next) {
                // if the process is not merged itself
                // then is is a top level process
                if(!p->merged && !p->target)
                        p->target = apps_groups_default_target;
-
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-               // by the way, update the diffs
-               // will be used later for subtracting killed process times
-               p->diff_cutime = p->utime - p->cutime;
-               p->diff_cstime = p->stime - p->cstime;
-               p->diff_cminflt = p->minflt - p->cminflt;
-               p->diff_cmajflt = p->majflt - p->cmajflt;
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
        }
 
        // give a target to all merged child processes
        found = 1;
        while(found) {
+               if(unlikely(debug)) loops++;
                found = 0;
                for(p = root_of_pids; p ; p = p->next) {
                        if(unlikely(!p->target && p->merged && p->parent && p->parent->target)) {
@@ -1697,6 +1551,9 @@ void apply_apps_groups_targets_inheritance(void) {
                        }
                }
        }
+
+       if(unlikely(debug))
+               fprintf(stderr, "apps.plugin: apply_apps_groups_targets_inheritance() made %d loops on the process tree\n", loops);
 }
 
 long zero_all_targets(struct target *root) {
@@ -1749,10 +1606,13 @@ void aggregate_pid_on_target(struct target *w, struct pid_stat *p, struct target
        }
 
        if(likely(p->updated)) {
-               w->cutime += p->cutime; // - p->fix_cutime;
-               w->cstime += p->cstime; // - p->fix_cstime;
-               w->cminflt += p->cminflt; // - p->fix_cminflt;
-               w->cmajflt += p->cmajflt; // - p->fix_cmajflt;
+               if(unlikely(debug && (p->fix_cutime || p->fix_cstime || p->fix_cminflt || p->fix_cmajflt)))
+                       fprintf(stderr, "apps.plugin: \tadding child counters of %d (%s) to target %s. Currents: cutime=%llu, cstime=%llu, cminflt=%llu, cmajflt=%llu, Fixes: cutime=%llu, cstime=%llu, cminflt=%llu, cmajflt=%llu\n", p->pid, p->comm, w->name, p->cutime, p->cstime, p->cminflt, p->cmajflt, p->fix_cutime, p->fix_cstime, p->fix_cminflt, p->fix_cmajflt);
+
+               w->cutime  += p->cutime  - p->fix_cutime;
+               w->cstime  += p->cstime  - p->fix_cstime;
+               w->cminflt += p->cminflt - p->fix_cminflt;
+               w->cmajflt += p->cmajflt - p->fix_cmajflt;
 
                w->utime += p->utime; //+ (p->pid != 1)?(p->cutime - p->fix_cutime):0;
                w->stime += p->stime; //+ (p->pid != 1)?(p->cstime - p->fix_cstime):0;
@@ -1800,7 +1660,7 @@ void aggregate_pid_on_target(struct target *w, struct pid_stat *p, struct target
                }
 
                if(unlikely(debug || w->debug))
-                       fprintf(stderr, "apps.plugin: \tAggregating %s pid %d on %s utime=%llu, stime=%llu, cutime=%llu, cstime=%llu, minflt=%llu, majflt=%llu, cminflt=%llu, cmajflt=%llu\n", p->comm, p->pid, w->name, p->utime, p->stime, p->cutime, p->cstime, p->minflt, p->majflt, p->cminflt, p->cmajflt);
+                       fprintf(stderr, "apps.plugin: \tAggregating %s pid %d on %s utime=%llu, stime=%llu, cutime=%llu, cstime=%llu, minflt=%llu, majflt=%llu, cminflt=%llu, cmajflt=%llu, fix_cutime=%llu, fix_cstime=%llu, fix_cminflt=%llu, fix_cmajflt=%llu\n", p->comm, p->pid, w->name, p->utime, p->stime, p->cutime, p->cstime, p->minflt, p->majflt, p->cminflt, p->cmajflt, p->fix_cutime, p->fix_cstime, p->fix_cminflt, p->fix_cmajflt);
 
 /*             if(p->utime - p->old_utime > 100) fprintf(stderr, "BIG CHANGE: %d %s utime increased by %llu from %llu to %llu\n", p->pid, p->comm, p->utime - p->old_utime, p->old_utime, p->utime);
                if(p->cutime - p->old_cutime > 100) fprintf(stderr, "BIG CHANGE: %d %s cutime increased by %llu from %llu to %llu\n", p->pid, p->comm, p->cutime - p->old_cutime, p->old_cutime, p->cutime);
@@ -1811,16 +1671,6 @@ void aggregate_pid_on_target(struct target *w, struct pid_stat *p, struct target
                if(p->cminflt - p->old_cminflt > 15000) fprintf(stderr, "BIG CHANGE: %d %s cminflt increased by %llu from %llu to %llu\n", p->pid, p->comm, p->cminflt - p->old_cminflt, p->old_cminflt, p->cminflt);
                if(p->cmajflt - p->old_cmajflt > 15000) fprintf(stderr, "BIG CHANGE: %d %s cmajflt increased by %llu from %llu to %llu\n", p->pid, p->comm, p->cmajflt - p->old_cmajflt, p->old_cmajflt, p->cmajflt);
 */
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-               p->old_utime = p->utime;
-               p->old_cutime = p->cutime;
-               p->old_stime = p->stime;
-               p->old_cstime = p->cstime;
-               p->old_minflt = p->minflt;
-               p->old_majflt = p->majflt;
-               p->old_cminflt = p->cminflt;
-               p->old_cmajflt = p->cmajflt;
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
 
                if(o) {
                        // since the process switched target
@@ -1831,42 +1681,46 @@ void aggregate_pid_on_target(struct target *w, struct pid_stat *p, struct target
                        // IMPORTANT
                        // We add/subtract the last/OLD values we added to the target
 
-                       w->fix_cutime -= p->last_cutime;
-                       w->fix_cstime -= p->last_cstime;
-                       w->fix_cminflt -= p->last_cminflt;
-                       w->fix_cmajflt -= p->last_cmajflt;
+                       unsigned long long cutime  = p->last_cutime - p->last_fix_cutime;
+                       unsigned long long cstime  = p->last_cstime - p->last_fix_cstime;
+                       unsigned long long cminflt = p->last_cminflt - p->last_fix_cminflt;
+                       unsigned long long cmajflt = p->last_cmajflt - p->last_fix_cmajflt;
+
+                       w->fix_cutime  -= cutime;
+                       w->fix_cstime  -= cstime;
+                       w->fix_cminflt -= cminflt;
+                       w->fix_cmajflt -= cmajflt;
 
-                       w->fix_utime -= p->last_utime;
-                       w->fix_stime -= p->last_stime;
+                       w->fix_utime  -= p->last_utime;
+                       w->fix_stime  -= p->last_stime;
                        w->fix_minflt -= p->last_minflt;
                        w->fix_majflt -= p->last_majflt;
 
-
-                       w->fix_io_logical_bytes_read -= p->last_io_logical_bytes_read;
+                       w->fix_io_logical_bytes_read    -= p->last_io_logical_bytes_read;
                        w->fix_io_logical_bytes_written -= p->last_io_logical_bytes_written;
-                       w->fix_io_read_calls -= p->last_io_read_calls;
-                       w->fix_io_write_calls -= p->last_io_write_calls;
-                       w->fix_io_storage_bytes_read -= p->last_io_storage_bytes_read;
+                       w->fix_io_read_calls            -= p->last_io_read_calls;
+                       w->fix_io_write_calls           -= p->last_io_write_calls;
+                       w->fix_io_storage_bytes_read    -= p->last_io_storage_bytes_read;
                        w->fix_io_storage_bytes_written -= p->last_io_storage_bytes_written;
                        w->fix_io_cancelled_write_bytes -= p->last_io_cancelled_write_bytes;
 
                        // ---
 
-                       o->fix_cutime += p->last_cutime;
-                       o->fix_cstime += p->last_cstime;
-                       o->fix_cminflt += p->last_cminflt;
-                       o->fix_cmajflt += p->last_cmajflt;
+                       o->fix_cutime  += cutime;
+                       o->fix_cstime  += cstime;
+                       o->fix_cminflt += cminflt;
+                       o->fix_cmajflt += cmajflt;
 
-                       o->fix_utime += p->last_utime;
-                       o->fix_stime += p->last_stime;
+                       o->fix_utime  += p->last_utime;
+                       o->fix_stime  += p->last_stime;
                        o->fix_minflt += p->last_minflt;
                        o->fix_majflt += p->last_majflt;
 
-                       o->fix_io_logical_bytes_read += p->last_io_logical_bytes_read;
+                       o->fix_io_logical_bytes_read    += p->last_io_logical_bytes_read;
                        o->fix_io_logical_bytes_written += p->last_io_logical_bytes_written;
-                       o->fix_io_read_calls += p->last_io_read_calls;
-                       o->fix_io_write_calls += p->last_io_write_calls;
-                       o->fix_io_storage_bytes_read += p->last_io_storage_bytes_read;
+                       o->fix_io_read_calls            += p->last_io_read_calls;
+                       o->fix_io_write_calls           += p->last_io_write_calls;
+                       o->fix_io_storage_bytes_read    += p->last_io_storage_bytes_read;
                        o->fix_io_storage_bytes_written += p->last_io_storage_bytes_written;
                        o->fix_io_cancelled_write_bytes += p->last_io_cancelled_write_bytes;
                }
@@ -1876,28 +1730,33 @@ void aggregate_pid_on_target(struct target *w, struct pid_stat *p, struct target
 
                // since the process has exited, the user
                // will see a drop in our charts, because the incremental
-               // values of this process will not be there
+               // values of this process will not be there from now on
 
                // add them to the fix_* values and they will be added to
                // the reported values, so that the report goes steady
-               w->fix_minflt += p->minflt;
-               w->fix_majflt += p->majflt;
-               w->fix_utime += p->utime;
-               w->fix_stime += p->stime;
-               w->fix_cminflt += p->cminflt;
-               w->fix_cmajflt += p->cmajflt;
-               w->fix_cutime += p->cutime;
-               w->fix_cstime += p->cstime;
-
-               w->fix_io_logical_bytes_read += p->io_logical_bytes_read;
-               w->fix_io_logical_bytes_written += p->io_logical_bytes_written;
-               w->fix_io_read_calls += p->io_read_calls;
-               w->fix_io_write_calls += p->io_write_calls;
-               w->fix_io_storage_bytes_read += p->io_storage_bytes_read;
-               w->fix_io_storage_bytes_written += p->io_storage_bytes_written;
-               w->fix_io_cancelled_write_bytes += p->io_cancelled_write_bytes;
+
+               w->fix_minflt  += p->last_minflt;
+               w->fix_majflt  += p->last_majflt;
+               w->fix_utime   += p->last_utime;
+               w->fix_stime   += p->last_stime;
+
+               w->fix_cminflt += (p->last_cminflt - p->last_fix_cminflt);
+               w->fix_cmajflt += (p->last_cmajflt - p->last_fix_cmajflt);
+               w->fix_cutime  += (p->last_cutime  - p->last_fix_cutime);
+               w->fix_cstime  += (p->last_cstime  - p->last_fix_cstime);
+
+               w->fix_io_logical_bytes_read    += p->last_io_logical_bytes_read;
+               w->fix_io_logical_bytes_written += p->last_io_logical_bytes_written;
+               w->fix_io_read_calls            += p->last_io_read_calls;
+               w->fix_io_write_calls           += p->last_io_write_calls;
+               w->fix_io_storage_bytes_read    += p->last_io_storage_bytes_read;
+               w->fix_io_storage_bytes_written += p->last_io_storage_bytes_written;
+               w->fix_io_cancelled_write_bytes += p->last_io_cancelled_write_bytes;
        }
 
+       //if((long long)w->cutime + w->fix_cutime < 0)
+       //      error("Negative total cutime (%llu - %lld) on target %s after adding process %d (%s, %s) with utime=%llu, stime=%llu, minflt=%llu, majflt=%llu, cutime=%llu, cstime=%llu, cminflt=%llu, cmajflt=%llu, fix_cutime=%llu, fix_cstime=%llu, fix_cminflt=%llu, fix_cmajflt=%llu\n",
+       //                w->cutime, w->fix_cutime, w->name, p->pid, p->comm, p->updated?"running":"exited", p->utime, p->stime, p->minflt, p->majflt, p->cutime, p->cstime, p->cminflt, p->cmajflt, p->fix_cutime, p->fix_cstime, p->fix_cminflt, p->fix_cmajflt);
 }
 
 void count_targets_fds(struct target *root) {
@@ -1967,19 +1826,10 @@ void calculate_netdata_statistics(void)
        link_all_processes_to_their_parents();
        apply_apps_groups_targets_inheritance();
 
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-       aggregate_children_to_parents();
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
-
        zero_all_targets(users_root_target);
        zero_all_targets(groups_root_target);
        apps_groups_targets = zero_all_targets(apps_groups_root_target);
 
-#ifdef AGGREGATE_CHILDREN_TO_PARENTS
-       if(debug)
-               debug_childrens_aggregations(0, 1);
-#endif /* AGGREGATE_CHILDREN_TO_PARENTS */
-
        // this has to be done, before the cleanup
        struct pid_stat *p = NULL;
        struct target *w = NULL, *o = NULL;
@@ -2099,7 +1949,7 @@ void send_collected_data_to_netdata(struct target *root, const char *type, unsig
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "SET %s = %llu\n", w->name, w->utime + w->stime + w->fix_utime + w->fix_stime);
+               fprintf(stdout, "SET %s = %llu\n", w->name, w->utime + w->stime + w->fix_utime + w->fix_stime + (include_exited_childs?(w->cutime + w->cstime + w->fix_cutime + w->fix_cstime):0));
        }
        fprintf(stdout, "END\n");
 
@@ -2107,7 +1957,7 @@ void send_collected_data_to_netdata(struct target *root, const char *type, unsig
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "SET %s = %llu\n", w->name, w->utime + w->fix_utime);
+               fprintf(stdout, "SET %s = %llu\n", w->name, w->utime + w->fix_utime + (include_exited_childs?(w->cutime + w->fix_cutime):0));
        }
        fprintf(stdout, "END\n");
 
@@ -2115,7 +1965,7 @@ void send_collected_data_to_netdata(struct target *root, const char *type, unsig
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "SET %s = %llu\n", w->name, w->stime + w->fix_stime);
+               fprintf(stdout, "SET %s = %llu\n", w->name, w->stime + w->fix_stime + (include_exited_childs?(w->cstime + w->fix_cstime):0));
        }
        fprintf(stdout, "END\n");
 
@@ -2147,7 +1997,7 @@ void send_collected_data_to_netdata(struct target *root, const char *type, unsig
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "SET %s = %llu\n", w->name, w->minflt + w->fix_minflt);
+               fprintf(stdout, "SET %s = %llu\n", w->name, w->minflt + w->fix_minflt + (include_exited_childs?(w->cminflt + w->fix_cminflt):0));
        }
        fprintf(stdout, "END\n");
 
@@ -2155,7 +2005,7 @@ void send_collected_data_to_netdata(struct target *root, const char *type, unsig
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "SET %s = %llu\n", w->name, w->majflt + w->fix_majflt);
+               fprintf(stdout, "SET %s = %llu\n", w->name, w->majflt + w->fix_majflt + (include_exited_childs?(w->cmajflt + w->fix_cmajflt):0));
        }
        fprintf(stdout, "END\n");
 
@@ -2239,7 +2089,7 @@ void send_charts_updates_to_netdata(struct target *root, const char *type, const
 
        // we have something new to show
        // update the charts
-       fprintf(stdout, "CHART %s.cpu '' '%s CPU Time (%ld%% = %ld core%s)' 'cpu time %%' cpu %s.cpu stacked 20001 %d\n", type, title, (processors * 100), processors, (processors>1)?"s":"", type, update_every);
+       fprintf(stdout, "CHART %s.cpu '' '%s CPU Time (%d%% = %d core%s)' 'cpu time %%' cpu %s.cpu stacked 20001 %d\n", type, title, (processors * 100), processors, (processors>1)?"s":"", type, update_every);
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
@@ -2267,18 +2117,18 @@ void send_charts_updates_to_netdata(struct target *root, const char *type, const
                fprintf(stdout, "DIMENSION %s '' absolute 1 1 noreset\n", w->name);
        }
 
-       fprintf(stdout, "CHART %s.cpu_user '' '%s CPU User Time (%ld%% = %ld core%s)' 'cpu time %%' cpu %s.cpu_user stacked 20020 %d\n", type, title, (processors * 100), processors, (processors>1)?"s":"", type, update_every);
+       fprintf(stdout, "CHART %s.cpu_user '' '%s CPU User Time (%d%% = %d core%s)' 'cpu time %%' cpu %s.cpu_user stacked 20020 %d\n", type, title, (processors * 100), processors, (processors>1)?"s":"", type, update_every);
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "DIMENSION %s '' incremental 100 %ld noreset\n", w->name, hz * processors);
+               fprintf(stdout, "DIMENSION %s '' incremental 100 %u noreset\n", w->name, hz);
        }
 
-       fprintf(stdout, "CHART %s.cpu_system '' '%s CPU System Time (%ld%% = %ld core%s)' 'cpu time %%' cpu %s.cpu_system stacked 20021 %d\n", type, title, (processors * 100), processors, (processors>1)?"s":"", type, update_every);
+       fprintf(stdout, "CHART %s.cpu_system '' '%s CPU System Time (%d%% = %d core%s)' 'cpu time %%' cpu %s.cpu_system stacked 20021 %d\n", type, title, (processors * 100), processors, (processors>1)?"s":"", type, update_every);
        for (w = root; w ; w = w->next) {
                if(w->target || (!w->processes && !w->exposed)) continue;
 
-               fprintf(stdout, "DIMENSION %s '' incremental 100 %ld noreset\n", w->name, hz * processors);
+               fprintf(stdout, "DIMENSION %s '' incremental 100 %u noreset\n", w->name, hz);
        }
 
        fprintf(stdout, "CHART %s.major_faults '' '%s Major Page Faults (swap read)' 'page faults/s' swap %s.major_faults stacked 20010 %d\n", type, title, type, update_every);
@@ -2369,6 +2219,16 @@ void parse_args(int argc, char **argv)
                        continue;
                }
 
+               if(strcmp("no-childs", argv[i]) == 0) {
+                       include_exited_childs = 0;
+                       continue;
+               }
+
+               if(strcmp("with-childs", argv[i]) == 0) {
+                       include_exited_childs = 1;
+                       continue;
+               }
+
                if(!name) {
                        name = argv[i];
                        continue;
@@ -2424,8 +2284,6 @@ int main(int argc, char **argv)
        }
 #endif /* NETDATA_INTERNAL_CHECKS */
 
-       info("starting...");
-
        procfile_adaptive_initial_allocation = 1;
 
        time_t started_t = time(NULL);
@@ -2444,13 +2302,13 @@ int main(int argc, char **argv)
        }
 
        fprintf(stdout, "CHART netdata.apps_cpu '' 'Apps Plugin CPU' 'milliseconds/s' apps.plugin netdata.apps_cpu stacked 140000 %1$d\n"
-                       "DIMENSION user '' incremental 1 1000\n"
-                       "DIMENSION system '' incremental 1 1000\n"
-                       "CHART netdata.apps_files '' 'Apps Plugin Files' 'files/s' apps.plugin netdata.apps_files line 140001 %1$d\n"
-                       "DIMENSION files '' incremental 1 1\n"  
-                  "DIMENSION pids '' absolute 1 1\n"  
-                       "DIMENSION fds '' absolute 1 1\n"  
-                       "DIMENSION targets '' absolute 1 1\n", update_every);
+                       "DIMENSION user '' incremental 1 1000\n"
+                       "DIMENSION system '' incremental 1 1000\n"
+                       "CHART netdata.apps_files '' 'Apps Plugin Files' 'files/s' apps.plugin netdata.apps_files line 140001 %1$d\n"
+                       "DIMENSION files '' incremental 1 1\n"
+                       "DIMENSION pids '' absolute 1 1\n"
+                       "DIMENSION fds '' absolute 1 1\n"
+                       "DIMENSION targets '' absolute 1 1\n", update_every);
 
 #ifndef PROFILING_MODE
        unsigned long long sunext = (time(NULL) - (time(NULL) % update_every) + update_every) * 1000000ULL;
@@ -2488,7 +2346,8 @@ int main(int argc, char **argv)
                send_collected_data_to_netdata(users_root_target, "users", dt);
                send_collected_data_to_netdata(groups_root_target, "groups", dt);
 
-               if(debug) fprintf(stderr, "apps.plugin: done Loop No %llu\n", counter);
+               if(unlikely(debug))
+                       fprintf(stderr, "apps.plugin: done Loop No %llu\n", counter);
 
                current_t = time(NULL);
 
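The exit handling in `aggregate_pid_on_target()` above keeps incremental charts from dropping when a process disappears: its last seen counters are moved into the target's `fix_*` fields, which are then added to every reported value. What follows is a minimal sketch of that idea for a single counter, not the plugin's actual code. The struct and function names (`pid_sketch`, `target_sketch`, `aggregate_sketch`, `reported_utime`) are hypothetical, and the sketch assumes that `fix_utime` survives across iterations while `utime` is rebuilt from the running processes each time.

```c
/* Hypothetical, trimmed-down stand-ins for the plugin's structures. */
struct pid_sketch {
    unsigned long long utime;      /* counter read on this iteration */
    unsigned long long last_utime; /* counter read on the previous iteration */
    int updated;                   /* 1 = still running, 0 = exited */
};

struct target_sketch {
    unsigned long long utime; /* sum over running processes, zeroed every iteration */
    long long fix_utime;      /* compensation carried over from exited processes */
};

/* Add one process to its target. A running process contributes its
 * current counter; an exited one contributes its last known counter
 * to the persistent fix value instead. */
static void aggregate_sketch(struct target_sketch *w, const struct pid_sketch *p) {
    if (p->updated)
        w->utime += p->utime;
    else
        w->fix_utime += (long long)p->last_utime;
}

/* The value sent to the chart: live sum plus compensation. */
static unsigned long long reported_utime(const struct target_sketch *w) {
    return w->utime + (unsigned long long)w->fix_utime;
}
```

With two processes at 100 and 50 ticks, the target reports 150. If the second one exits while the first advances to 120, the target reports 170 rather than falling to 120, so the incremental dimension sees only the 20 new ticks.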
index 9ce6e33d08a619861e6f2f2a111eaa2900028bfd..670da064ac38a1fee24e6dc917f7e3b92b008e15 100644
@@ -668,6 +668,7 @@ struct cgroup *cgroup_add(const char *id) {
                                !strcmp(chart_id, "systemd") ||
                                !strcmp(chart_id, "system.slice") ||
                                !strcmp(chart_id, "machine.slice") ||
+                               !strcmp(chart_id, "init.scope") ||
                                !strcmp(chart_id, "user") ||
                                !strcmp(chart_id, "system") ||
                                !strcmp(chart_id, "machine") ||
index 49bdc73748e41ef87398bb56e24112057a88b406..a64eb9018ce5d120d70cf67a7d7fe2af42a68db4 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>NetData Dashboard</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">
index 051c9421c4cc1ae14abfe45995ae9390f7c7039d..fc1f9254209acd69301a069bfa876e273348e768 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>NetData Dashboard</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">
index ae1c1b3ea4c41afb39f032285b2eda9286cce619..f184321c0a3c657146e5cbcde8cda5a8370154fc 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>NetData Dashboard</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">
index 2be0ec3722f39ae0218eb3dffa3ba1df9941591c..1ef92d6a42c26528416aac6a3d90063b5aa01600 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>NetData - Real-time performance monitoring, done right!</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">
index 558127e4211037a2ecd01664c2401f5ecd3f4a61..69770e145e90c50cd76ca7704387146f0be0fbc3 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>netdata dashboard</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">
@@ -1325,19 +1326,19 @@ var menuData = {
 
        'apps': {
                title: 'Applications',
-               info: 'Per application statistics are collected using netdata\'s <code>apps.plugin</code>. This plugin walks through the entire <code>/proc</code> filesystem and aggregates statistics for applications of interest, defined in <code>/etc/netdata/apps_groups.conf</code> (the default is <a href="https://github.com/firehol/netdata/blob/master/conf.d/apps_groups.conf" target="_blank">here</a>). The plugin internally builds a process tree (much like <code>ps fax</code> does), and groups processes together (evaluating both child and parent processes) so that the result is always a chart with a predefined set of dimensions (of course, only application groups found running are reported).<br/><b>IMPORTANT</b>: The values shown here are not 100% accurate. They only include values for the processes running. If an application is spawning children continuously, which are terminated in just a few milliseconds (like shell scripts do), the values reported will be inaccurate. Linux does report the values for the exited children of a process. However, these values are reported to the parent process <b>only when the child exits</b>. If these values, of the exited child processes, were also aggregated in the charts below, the charts would have been full of spikes, presenting unrealistic utilization for each process group. So, we decided to ignore these values and present only the utilization of <b>the currently running processes</b>.',
+               info: 'Per application statistics are collected using netdata\'s <code>apps.plugin</code>. This plugin walks through the entire <code>/proc</code> filesystem and aggregates statistics for applications of interest, defined in <code>/etc/netdata/apps_groups.conf</code> (the default is <a href="https://github.com/firehol/netdata/blob/master/conf.d/apps_groups.conf" target="_blank">here</a>). The plugin internally builds a process tree (much like <code>ps fax</code> does), and groups processes together (evaluating both child and parent processes) so that the result is always a chart with a predefined set of dimensions (of course, only application groups found running are reported). The reported values are compatible with <code>top</code>, although the netdata plugin counts also the resources of exited children (unlike <code>top</code> which shows only the resources of the currently running processes). So for processes like shell scripts, the reported values include the resources used by the commands these scripts run within each timeframe.',
                height: 1.5
        },
 
        'users': {
                title: 'Users',
-               info: 'Per user statistics are collected using netdata\'s <code>apps.plugin</code>. This plugin walks through the entire <code>/proc</code> filesystem and aggregates statistics per user.<br/><b>IMPORTANT</b>: The values shown here are not 100% accurate. They only include values for the processes running. If an application is spawning children continuously, which are terminated in just a few milliseconds (like shell scripts do), the values reported will be inaccurate. Linux does report the values for the exited children of a process. However, these values are reported to the parent process <b>only when the child exits</b>. If these values, of the exited child processes, were also aggregated in the charts below, the charts would have been full of spikes, presenting unrealistic utilization for each process group. So, we decided to ignore these values and present only the utilization of <b>the currently running processes</b>.',
+               info: 'Per user statistics are collected using netdata\'s <code>apps.plugin</code>. This plugin walks through the entire <code>/proc</code> filesystem and aggregates statistics per user. The reported values are compatible with <code>top</code>, although the netdata plugin counts also the resources of exited children (unlike <code>top</code> which shows only the resources of the currently running processes). So for processes like shell scripts, the reported values include the resources used by the commands these scripts run within each timeframe.',
                height: 1.5
        },
 
        'groups': {
                title: 'User Groups',
-               info: 'Per user group statistics are collected using netdata\'s <code>apps.plugin</code>. This plugin walks through the entire <code>/proc</code> filesystem and aggregates statistics per user group.<br/><b>IMPORTANT</b>: The values shown here are not 100% accurate. They only include values for the processes running. If an application is spawning children continuously, which are terminated in just a few milliseconds (like shell scripts do), the values reported will be inaccurate. Linux does report the values for the exited children of a process. However, these values are reported to the parent process <b>only when the child exits</b>. If these values, of the exited child processes, were also aggregated in the charts below, the charts would have been full of spikes, presenting unrealistic utilization for each process group. So, we decided to ignore these values and present only the utilization of <b>the currently running processes</b>.',
+               info: 'Per user group statistics are collected using netdata\'s <code>apps.plugin</code>. This plugin walks through the entire <code>/proc</code> filesystem and aggregates statistics per user group. The reported values are compatible with <code>top</code>, although the netdata plugin counts also the resources of exited children (unlike <code>top</code> which shows only the resources of the currently running processes). So for processes like shell scripts, the reported values include the resources used by the commands these scripts run within each timeframe.',
                height: 1.5
        },
 
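The dashboard text above says the reported values now include the resources of exited children. On the plugin side this corresponds to the `include_exited_childs` term added to each `SET` line, together with the `incremental 100 hz` dimension settings. The sketch below illustrates that arithmetic under stated assumptions: `cpu_counters_sketch`, `cpu_total_ticks` and `ticks_to_percent` are hypothetical names, and reading the dimension's `100` and `hz` as multiplier and divisor (ticks per second scaled to a percentage of one core) is an interpretation of the `DIMENSION` lines rather than something the diff states.

```c
/* Hypothetical subset of the per-target counters used for the CPU chart. */
struct cpu_counters_sketch {
    unsigned long long utime, stime;   /* running processes */
    unsigned long long cutime, cstime; /* their waited-for (exited) children */
    long long fix_utime, fix_stime;    /* compensation for exited processes */
    long long fix_cutime, fix_cstime;  /* compensation for children counters */
};

/* Total CPU ticks for one target, mirroring the expression in the
 * first SET line of send_collected_data_to_netdata(). */
static unsigned long long cpu_total_ticks(const struct cpu_counters_sketch *w,
                                          int include_exited_childs) {
    unsigned long long total = w->utime + w->stime
                             + (unsigned long long)(w->fix_utime + w->fix_stime);

    if (include_exited_childs)
        total += w->cutime + w->cstime
               + (unsigned long long)(w->fix_cutime + w->fix_cstime);

    return total;
}

/* Convert a tick delta over an interval into a percentage of one core,
 * assuming hz clock ticks per second. */
static double ticks_to_percent(unsigned long long delta_ticks,
                               unsigned int hz, double seconds) {
    return (double)delta_ticks * 100.0 / (double)hz / seconds;
}
```

For example, a delta of 50 ticks over one second at 100 ticks per second is 50% of one core. Passing `0` for `include_exited_childs` corresponds to the `no-childs` option, which leaves only the counters of the running processes.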
index c88a392752057f1d2c6c7d3fdba9e1172e3921f6..a1416dbbe49decdadcaed4594a19d5a5a3e402e2 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>NetData Registry Dashboard</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">
index ffef517d3774893e41e12e0cfa6a6f2e9e8c58d4..8d9d71e1b35146aa4d2ec25be1e738716bc3386e 100644
@@ -2,6 +2,7 @@
 <html lang="en">
 <head>
        <title>NetData TV Dashboard</title>
+       <meta name="application-name" content="netdata">
 
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta charset="utf-8">