mtail

Troubleshooting mtail installations

This page gives an overview of some avenues to debug your mtail installation.

Also, see the FAQ.

Reporting a problem

Please when reporting a problem, include the mtail version:

go get or build problems

package github.com/jaqx0r/mtail: no Go files

You’re using go 1.11 or higher, which now starts to use go modules, and doesn’t like source code layouts like mtail which doesn’t have any Go files in the top directory.

Either set GO111MODULE=on environment variable first, or go get the binary directly:

go get github.com/jaqx0r/mtail/cmd/mtail

vs

GO111MODULE=on go get -u github.com/jaqx0r/mtail
cd $GOPATH/src/github.com/jaqx0r/mtail
make install

Compilation problems

Compilation problems will be emitted to the standard INFO log

(The behaviour of glog is documented in https://github.com/golang/glog)

Errors for the most recent version of the program will also be displayed on the standard status page (served over HTTP at port 3903 by default) in the Program Loader section.

If a program fails to compile, it will not be loaded. If an existing program has been loaded, and a new version is written to disk (by you, or a configuration management system) and that new version does not compile, mtail will log the errors and not interrupt or restart the existing, older program.

The --compile_only flag will only attempt to compile the programs and not execute them. This can be used for pre-commit testing, for example.

Syntax trees, type information, and virtual machine bytecode

More detailed compiler debugging can be retrieved by using the --dump_ast, --dump_ast_types, and --dump_bytecode, all of which dump their state to the INFO log.

For example, type errors logged such as prog.mtail: Runtime error: conversion of "-0.000000912" to int failed: strconv.ParseInt: parsing "-0.000000912": invalid syntax suggest an invalid type inference of int instead of float for some program symbol or expression. Use the --dump_ast_types flag to see the type annotated syntax tree of the program for more details.

When reporting a problem, please include the AST type dump.

Memory or performance issues

mtail is a virtual machine emulator, and so strange performance issues can occur beyond the imagination of the author.

The standard Go profiling tool can help. Start with a cpu profile:

`go tool pprof /path/to/mtail http://localhost:3903/debug/pprof/profile’

or a memory profile:

`go tool pprof /path/to/mtail http://localhost:3903/debug/pprof/heap’

There are many good guides on using the profiling tool:

The goroutine stack dump can also help explain what is happening at the moment.

http://localhost:3903/debug/pprof/goroutine?debug=2 shows the full goroutine stack dump.

There is one (*VM).Run stack per program. These are opaque to the goroutine stack dump as they execute the bytecode. However, the second argument to Run on the stack is the first four letters of the program name, encoded as ASCII. You can transcode these back to their names by doing a conversion from the int32 value in hex provided in the stack, e.g.: 0x61706163 -> ‘apac’ (probably an apache log program); 0x7273796e -> ‘rsyn’ (probably an rsyncd log program)

Obvious problems seen in the goroutine stack dump are long-waiting goroutines, usually on mutexes. (they show their block time in minutes, e.g. goroutine 38 [semacquire, 1580 minutes]:) which usually also manifest as a logjam (no pun intended) in the loader, tailer, and watcher goroutines (in state ‘chan send’).

Simulating production for memory profiling

To analyse memory allocations under realistic load:

  1. Build mtail:
    bazel build //cmd/mtail
    
  2. Start mtail with example programs and a temporary log file:
    touch /tmp/mtail_pprof_log
    bazel-bin/cmd/mtail/mtail_/mtail \
      --progs examples/ \
      --logs /tmp/mtail_pprof_log \
      --poll_interval 50ms \
      --logtostderr \
      --stderrthreshold 0 &
    
  3. Feed test log data in a loop to simulate continuous traffic:
    while true; do
      cat internal/mtail/testdata/*.log >> /tmp/mtail_pprof_log
      sleep 6
    done &
    
  4. Capture an allocation profile with go tool pprof:
    go tool pprof -alloc_space -top \
      http://localhost:3903/debug/pprof/allocs
    

    Use -seconds 30 for a delta profile (allocations within a 30 s window), or omit it for a cumulative profile since process start.

  5. Drill into specific functions:
    go tool pprof -alloc_space -list 'ProcessLogLine' \
      http://localhost:3903/debug/pprof/allocs
    go tool pprof -alloc_space -peek 'execute' \
      http://localhost:3903/debug/pprof/allocs
    

Known allocation hot spots

As of v3.2.52, the cumulative alloc_space profile shows that roughly 60 % of heap allocations occur on the per-log-line hot path in ProcessLogLine and execute (internal/runtime/vm/vm.go).

Function Flat alloc Key lines
VM.ProcessLogLine ~28 % v.t = new(thread) (4.5 MB) — new thread per line; t.matches = make(...) (12.5 MB) — match map per line
VM.execute ~17 % t.matches[index] = v.re[index].FindStringSubmatch(...) (7 MB) — regex submatch allocations inside code.Match
Runtime.CompileAndRun ~9 % One-shot startup cost from program loading, parsing, type checking, and code generation

The single biggest allocation is t.matches = make(map[int][]string, len(v.re)) in ProcessLogLine: a pre-sized map to hold regex capture group results, recreated from scratch on every log line. An object pool (sync.Pool) for thread structs and their match maps would eliminate the bulk of hot-path allocations.

Distributed Tracing

mtail can export traces to the Jaeger trace collector. Specify the Jaeger endpoint with the --jaeger_endpoint flag

mtail --jaeger_endpoint http://localhost:14268/api/traces

The --trace_sample_period flag can be used to set how often a trace is sampled and sent to the collector. Set it to 100 to collect one in 100 traces.

Deployment problems

The INFO log at /tmp/mtail.INFO by default contains lots of information about any errors encountered. Adding the -v=2 flag raises the verbosity. See the glog manual for more logging flag options.

The one_shot and logtostderr flags may come in helpful for quickly launching mtail in non-daemon mode in order to flush out deployment issues like permissions problems.