Difference between revisions of "Network/Visualize pcap file data"

Latest revision as of 23:45, 8 June 2014

One nice day I was given orders to produce network usage statistics to find possible bursts in the network stream. Right from the start I faced two problems. Firstly, the network monitoring software graphed the network flow only once a minute, which was too coarse. Secondly, the interface was connected to a switch I had no control over at all, so a mirror port had to be ruled out. The assignment was to collect data for a week and then look at the numbers.

I was unsure how to go about it in the first place. To not fall behind I ran tcpdump on the hosts in question until the week was over. That left me with a bit of time to figure out how to process the pcap dump data. After a lot of searching I finally came across a nifty feature in tshark that allowed me to aggregate bandwidth on a per-second basis. Below is a short recipe for creating graphs from captured network traffic.

Goal

Create a graph from a pcap capture file with a precision of one second.

Prerequisites

A capture file the Wireshark suite understands, e.g. pcap or Solaris snoop.
tshark
R
ImageMagick's montage [optional]

Howto

Aggregate traffic with `tshark`

To graph the data properly, tshark needs to generate statistics on a per-second basis. The command below will achieve this.

tshark -q -z 'io,stat,1' -r <PcapFile> > <StatisticsFile>

The output looks something like the excerpt below.

===================================================================
IO Statistics
Interval: 1.000 secs
Column #0:
                |   Column #0
Time            |frames|  bytes
000.000-001.000      62      5578 
001.000-002.000      62      5386 
002.000-003.000      62      5692 
003.000-004.000      62      5968 
004.000-005.000      62      5428 
005.000-006.000      62      5838 
006.000-007.000      62      5912

The only problem with the output above is that the time is relative to the start of the pcap file.
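To make the relative timing concrete, here is a minimal sketch of how the interval labels can be turned into absolute timestamps once the capture start is known. The start time and the UTC time zone are assumptions chosen for illustration and are not taken from a real capture.

```r
# Hypothetical capture start time; in practice it comes from the file name
capture.start <- as.POSIXct( "2011-10-14 01:00:00", tz = "UTC" )

# Interval labels as printed by tshark
intervals <- c( "000.000-001.000", "001.000-002.000", "002.000-003.000" )

# Keep only the start of each interval and convert it to seconds
offsets <- as.numeric( sub( "-.*", "", intervals ) )

# Adding seconds to a POSIXct value yields absolute timestamps
timestamps <- capture.start + offsets
print( format( timestamps, "%Y-%m-%d %H:%M:%S", tz = "UTC" ) )
```

The same idea is applied to every row of the statistics file in the script further down.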

Produce a graph with `R`

R is used to produce the graph in PDF and PNG format. A couple of things need to be adjusted before running the script.

The number of lines to skip when reading the file. In the example above that would be 7, but your mileage may vary. The variable to assign the value to is skip.header.
tshark prints a separator line at the end of the file. Its first character has to be set in the comment.char variable in the script so that R ignores the line.
The file name has to contain the date and time when the capture was started, e.g. <String>-YYYY-MM-DD_hh-mm.stats, or the time cannot be converted properly.
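The file name requirement can be checked in isolation before running the full script. The sketch below uses a made-up file name and assumes UTC; the script itself relies on the local time zone. It stops with an error if the pattern is missing instead of silently producing a wrong date.

```r
# Hypothetical file name following the <String>-YYYY-MM-DD_hh-mm.stats pattern
file <- "eth0-2011-10-14_01-00.stats"

pattern <- ".*([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})-([0-9]{2}).*"

# Fail early if the file name does not carry a capture start time
if ( !grepl( pattern, file ) ) {
    stop( "file name does not contain YYYY-MM-DD_hh-mm: ", file )
}

# Rewrite the match into a string that as.POSIXct() can parse
stamp     <- sub( pattern, "\\1 \\2:\\3:00", file )
date.time <- as.POSIXct( stamp, tz = "UTC" )   # UTC is an assumption
print( date.time )
```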

#!/usr/bin/Rscript

## ----------------------------------------------------------------------------
## Globals for reading the data file
## ----------------------------------------------------------------------------
skip.header  <- 7           # how many lines to skip including the header row
comment.char <- "="         # skip lines with <char> in it 


## ----------------------------------------------------------------------------
## Don't touch below unless you know what you are doing
## ----------------------------------------------------------------------------
col.names    <- c( "time", "frames", "bytes" )
args <- commandArgs( trailingOnly = TRUE )
number.graphs <- length( args )

for ( d in seq_along( args ) ) {
    file      <- args[[d]]
    # get date and time from file name and convert to a time object
    date.time <- as.POSIXlt(
                   gsub(
                     ".*([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})-([0-9]{2}).*",
                     "\\1 \\2:\\3:00",
                     file,
                     perl=T
                   )
                 )
    # read the data
    traffic   <- read.table( file=file,
                             header=F,
                             col.names=col.names,
                             skip=skip.header,
                             comment.char=comment.char
                           )
    # massage the data a bit
    traffic$kbits   <- ( traffic$bytes * 8 ) / 1024
    traffic$frames  <- NULL
    traffic$bytes   <- NULL
    traffic$time    <- as.numeric(
                         gsub("-.*", "", traffic$time, perl = T )
                       ) + date.time
    # calculate max and avg
    traffic.max <- round( max( traffic$kbits ), digits = 2 )
    traffic.avg <- round( mean( traffic$kbits ), digits = 2 )
    # prepare the graph
    sub.title <- paste( "Max:", traffic.max, "Kbit/s; Avg:", traffic.avg, "Kbit/s" )
    names( traffic )
    # output as pdf and png; dev.off() closes each device so the file is written
    pdf( paste( file, ".pdf", sep = "" ) )
    plot( traffic,  type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" )
    dev.off()
    png( paste( file, ".png", sep = "" ) )
    plot( traffic,  type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" )
    dev.off()
}

To run the script, issue the following command, assuming the above script is called tshark-graph.R.

Rscript tshark-graph.R *stats
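As a worked example of the unit conversion inside the script, the sketch below applies the same arithmetic to the byte counts from the tshark excerpt earlier on this page. Like the script, it treats 1 Kbit as 1024 bit.

```r
# Byte counts per one-second interval, taken from the tshark excerpt
bytes <- c( 5578, 5386, 5692, 5968, 5428, 5838, 5912 )

# bytes per second -> Kbit per second (1 Kbit = 1024 bit, as in the script)
kbits <- ( bytes * 8 ) / 1024

# Same summary values the script prints below the graph
traffic.max <- round( max( kbits ), digits = 2 )
traffic.avg <- round( mean( kbits ), digits = 2 )
print( paste( "Max:", traffic.max, "Kbit/s; Avg:", traffic.avg, "Kbit/s" ) )
```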

Resulting graph

There are sexier graphs out there but from a functional standpoint it does the job.

Combining graphs

R is fully capable of creating a collection of graphs from a bunch of files, but personally I think it's a lot more involved than simply using ImageMagick's montage command.

montage -geometry <Width>x<Height> <GraphFiles> <OutputGraph>

This yields a graph similar to the one below (intentionally downsampled to fit the page).
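For comparison, here is a minimal sketch of the R-only route using par( mfrow ) to place several plots on one page. The two series and their names are made-up stand-ins for the data read from two statistics files, and the output goes to a temporary PDF. This is one possible approach, not the method used for the graph on this page.

```r
# Hypothetical data: two short traffic series standing in for two stats files
series <- list(
    "host-a" = c( 43.6, 42.1, 44.5, 46.6 ),
    "host-b" = c( 12.3, 15.8, 11.0, 14.2 )
)

out.file <- file.path( tempdir(), "combined.pdf" )
pdf( out.file, width = 10, height = 5 )

# One row with one column per series
par( mfrow = c( 1, length( series ) ) )
for ( name in names( series ) ) {
    plot( series[[name]], type = "h", main = name,
          xlab = "Time (s)", ylab = "Kbit/s" )
}

# Close the device so the file is written
invisible( dev.off() )
```

With many files the grid dimensions, page size and margins all need tuning, which is where montage tends to be the quicker option.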

