Difference between revisions of "Network/Visualize pcap file data"

From braindump
Revision as of 23:26, 16 July 2012

One nice day I was given orders to produce network usage statistics to find possible bursts in the stream. However, I was faced with two problems. The network monitoring software graphed the network flow only once a minute, which was too coarse, and the interface was connected to a switch I had no control over, so a mirror port was out of the question. The assignment was to collect data for a week and then look at the numbers.

I was unsure how to go about it, so I ran tcpdump on the hosts in question every day for the required monitoring period. At that point I had no idea how to process the pcap dump data, nor the faintest clue how to present it at the end of the week. After a lot of searching I finally came across a nifty feature in tshark that allowed me to aggregate bandwidth on a per-second basis. Below is a short recipe for creating the graphs.


  • a capture file the Wireshark suite understands, e.g. pcap or Solaris snoop, among others
  • tshark
  • ruby
  • R


Aggregate traffic with tshark

To graph the data properly, tshark needs to generate statistics on a per-second basis. The command below will achieve this.

tshark -q -z 'io,stat,1' -r <PcapFile> > <StatisticsFile>

The output looks something like the excerpt below.

<<<<<<< <StatisticsFile>
Time            |frames|  bytes  
000.000-001.000      62      5578 
001.000-002.000      62      5386 
002.000-003.000      62      5692 
003.000-004.000      62      5968 
004.000-005.000      62      5428 
005.000-006.000      62      5838 
006.000-007.000      62      5912 

The only problem with the output above is that the time is relative to the start of the pcap file. Before passing the data to R it has to be properly massaged.
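The conversion itself is simple: take the first number of each interval and add it to the moment the capture started. The following is only a minimal sketch of that idea, not the script used for the real data; the helper name absolute_row and the capture_start value are assumptions made for illustration.

```ruby
# Hypothetical helper: turn one 'io,stat' data row into [epoch, frames, bytes].
# capture_start is assumed to be the epoch second at which the capture began.
def absolute_row(row, capture_start)
  interval, frames, bytes = row.split
  # "002.000-003.000" -> 2 (seconds since the start of the capture)
  offset = interval.split("-").first.to_f.floor
  [capture_start + offset, frames.to_i, bytes.to_i]
end

p absolute_row("002.000-003.000      62      5692 ", 1318546800)
# prints [1318546802, 62, 5692]
```

The row is the third data line from the excerpt above; only the start time is made up.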

Convert the time with ruby

Note: I'm pretty sure this part could be done in R but with the deadline looming I decided to write it in a language I'm familiar with.

My data captures are usually automated with a script that writes the start date and time into the file name to make it unique. The Ruby script below assumes that the file names passed to it have the form <String>-YYYY-MM-DD_hh-mm.stats
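To illustrate the naming convention in isolation, here is a small sketch that pulls the start time out of such a file name. The helper name capture_start, the simplified pattern and the example file name eth0-2011-10-13_23-00.stats are all assumptions for this illustration.

```ruby
# Simplified pattern for <String>-YYYY-MM-DD_hh-mm.stats (assumed for this sketch).
START_PATTERN = /(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})\.stats\z/

# Hypothetical helper: returns the capture start as a local Time,
# or nil when the file name does not follow the convention.
def capture_start(file_name)
  m = START_PATTERN.match(file_name)
  return nil unless m
  year, month, day, hour, minute = m.captures.map(&:to_i)
  Time.local(year, month, day, hour, minute, 0)
end

start = capture_start("eth0-2011-10-13_23-00.stats")
puts start.strftime("%Y-%m-%d %H:%M")
# prints 2011-10-13 23:00
```

Returning nil for a non-matching name lets the caller skip files that do not carry a start time instead of failing halfway through a batch.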


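$files = ARGV

# the capture start time is encoded in the file name: <String>-YYYY-MM-DD_hh-mm.stats
dateRegex = /(\d{4})-(0[1-9]|1[0-2])-(0[0-9]|[12][0-9]|3[01])_([01][0-9]|2[0-4])-([0-5][0-9])/

$files.each do |file|
    # skip files whose name carries no start time
    next unless file.match( dateRegex )
    $time = Time.local( $1, $2, $3, $4, $5, 0 )

    $fh = File.open( file + ".ts", "w" )
    File.open( file ).each do |line|
        # filter lines: keep only the header and the data rows
        next unless line.match( /^(Time|\d)/ )
        if line.sub!( /^(\d+).*?\s(.*)/, '\2' )
            # data row: replace the relative interval with an epoch timestamp
            #line = ( $time + $1.to_i ).strftime( "%H:%M:%S" ) + line
            line = ( $time + $1.to_i ).strftime( "%s" ) + line.rstrip
            line.gsub!( /\s+/, "\t")
        else
            # header row: drop the padding and turn the pipes into tabs
            line.gsub!( /\s+/, "")
            line.gsub!( /\|/, "\t")
        end
        $fh.puts line
    end
    $fh.close
end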

Invoke the script as shown below.

ruby make-timestamp.rb *stats

This will produce a file called <String>-YYYY-MM-DD_hh-mm.stats.ts. Below is an example of the file. Note: the time is in epoch seconds for easier processing in R.

Time    frames  bytes
1318546800      62      5314
1318546801      62      5780
1318546802      62      6062
1318546803      62      5894
1318546804      62      5424
1318546805      62      5198
1318546806      62      5140
1318546807      59      5360
1318546808      62      5642
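Before graphing, it can be useful to sanity-check the numbers. The graphing step converts each per-second byte count to Kbit as bytes * 8 / 1024 and reports the maximum and the average. The sketch below repeats that arithmetic on the first three rows of the example; the helper name kbits_per_second is an assumption for illustration and is not part of the original tool chain.

```ruby
# First three rows of the example file, tab separated.
SAMPLE = <<~ROWS
  Time\tframes\tbytes
  1318546800\t62\t5314
  1318546801\t62\t5780
  1318546802\t62\t6062
ROWS

# Hypothetical helper: skip the header and convert the bytes column to Kbit.
def kbits_per_second(text)
  text.lines.drop(1).map do |line|
    _time, _frames, bytes = line.chomp.split("\t")
    bytes.to_i * 8 / 1024.0
  end
end

kbits = kbits_per_second(SAMPLE)
avg = kbits.sum / kbits.size
puts "Max: #{kbits.max.round(2)} Kbit/s; Avg: #{avg.round(2)} Kbit/s"
# prints Max: 47.36 Kbit/s; Avg: 44.68 Kbit/s
```

For example, 5314 bytes in one second are 5314 * 8 / 1024 = 41.515625 Kbit/s.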

Produce a graph with R

Finally, the R script below produces the graph in both PDF and PNG format.


args <- commandArgs( trailingOnly = TRUE )

number.graphs <- length( args )

for ( d in seq_along( args ) ) {
    file <- args[[d]]
    traffic <-  read.table( file=file, header=TRUE, sep="\t" )
    # bytes per second to Kbit per second
    traffic$kbits   <- ( traffic$bytes * 8 ) / 1024
    traffic$frames  <- NULL
    traffic$bytes   <- NULL
    # epoch seconds to date-time
    traffic$Time    <- as.POSIXct( traffic$Time, origin = "1970-01-01" )

    traffic.max <- round( max( traffic$kbits ), digits = 2 )
    traffic.avg <- round( mean( traffic$kbits ), digits = 2 )
    sub.title <- paste( "Max:", traffic.max, "Kbit/s; Avg:", traffic.avg, "Kbit/s" )

    # each device has to be closed with dev.off() for the file to be written
    pdf( paste( file, ".pdf", sep = "" ) )
    plot( traffic,  type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" )
    dev.off()
    png( paste( file, ".png", sep = "" ) )
    plot( traffic,  type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" )
    dev.off()
}

To run the script, issue the following command, assuming the above script is called tshark-graph.R.