Difference between revisions of "Network/Visualize pcap file data"
(14 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
{{DISPLAYTITLE: Visualize pcap file data}} |
{{DISPLAYTITLE: Visualize pcap file data with R}} |
||
One nice day I was given orders to produce network usage statistics to find eventual |
One nice day I was given orders to produce network usage statistics to find eventual bursts in the network stream. Right from the start I faced two problems. Firstly, the network monitoring software was graphing the network flow only every minute, which was too coarse, and secondly the interface was connected to a switch I had no control over at all, so a mirror port had to be ruled out. The assignment was to collect data for a week and then look at the numbers. |
||
I was unsure how to go about it in the first place. To not fall behind I ran <tt>tcpdump</tt> on the hosts in question until the week was over. |
I was unsure how to go about it in the first place. To not fall behind I ran <tt>tcpdump</tt> on the hosts in question until the week was over. That left me with a bit of time to figure out how to process the <tt>pcap</tt> dump data. After a lot of searching I finally came across a nifty feature in <tt>tshark</tt> allowing me to aggregate bandwidth on a per second basis. Below is a short recipe how to create graphs from captured network traffic. |
||
== Goal == |
|||
Create a graph from a <tt>pcap</tt> capture file with a precision of one second. |
|||
== Prerequisites == |
== Prerequisites == |
||
* capture file the wireshark suite understands. E.g. <tt>pcap</tt> or Solaris <tt>snoop</tt> among others. |
* A capture file the wireshark suite understands. E.g. <tt>pcap</tt> or Solaris <tt>snoop</tt> among others. |
||
* [http://wireshark.org tshark] |
* [http://wireshark.org tshark] |
||
* [http://ruby-lang.org ruby] |
|||
* [http://r-project.org R] |
* [http://r-project.org R] |
||
* [http://imagemagick.org ImageMagick's montage] [optional] |
* [http://imagemagick.org ImageMagick's montage] [optional] |
||
Line 16: | Line 18: | ||
tshark -q -z 'io,stat,1' -r <span class="input"><PcapFile></span> > <span class="input"><StatisticsFile></span> |
tshark -q -z 'io,stat,1' -r <span class="input"><PcapFile></span> > <span class="input"><StatisticsFile></span> |
||
The output is looking something like the excerpt below. |
The output is looking something like the excerpt below. |
||
<<<<<<< <span class="input"><StatisticsFile></span> |
|||
=================================================================== |
|||
======= |
|||
IO Statistics |
|||
Time |frames| bytes |
|||
Interval: 1.000 secs |
|||
Column #0: |
|||
| Column #0 |
|||
Time |frames| bytes |
|||
<span class="highlight">000.000-001.000</span> 62 5578 |
<span class="highlight">000.000-001.000</span> 62 5578 |
||
001.000-002.000 62 5386 |
001.000-002.000 62 5386 |
||
Line 26: | Line 32: | ||
005.000-006.000 62 5838 |
005.000-006.000 62 5838 |
||
006.000-007.000 62 5912 |
006.000-007.000 62 5912 |
||
The only problem with the output above is that the time is relative to the start of the <tt>pcap</tt> file. |
The only problem with the output above is that the time is relative to the start of the <tt>pcap</tt> file. |
||
=== Produce a graph with <tt>R</tt> === |
|||
To produce the graph in PDF or PNG format <tt>R</tt> is used. There are a couple of things that need to be adjusted before running the script. |
|||
* The number of lines to skip when reading the file. In the above example that would be '''7''', but your mileage may vary. The variable to assign the value to is <tt>skip.header</tt>. |
|||
* <tt>tshark</tt> prints a comment line at the end of the file that has to be set with the <tt>comment.char</tt> variable in the script. |
|||
* The file name has to contain the date and time when the capture was started e.g. <tt><String>-<span class="highlight">YYYY-MM-DD_hh-mm</span>.stats</tt> or the time can not be properly converted. |
|||
#!/usr/bin/Rscript |
|||
=== Convert the time with <tt>ruby</tt> === |
|||
'''Note:''' I'm pretty sure this part could be done in <tt>R</tt> but with the deadline looming I decided to write it in a language I'm familiar with. |
|||
My data captures are usually automated using a script that writes the start date and time into the file name to make it unique. The below <tt>ruby</tt> script assumes the file names being passed to it are in the form of <tt><String>-<span class="highlight">YYYY-MM-DD_hh-mm</span>.stats</tt> |
|||
#!/usr/bin/ruby |
|||
## ---------------------------------------------------------------------------- |
|||
$files = ARGV |
|||
## Globals for reading the data file |
|||
## ---------------------------------------------------------------------------- |
|||
skip.header <- <span class="highlight">7</span> # how many lines to skip including the header row |
|||
comment.char <- <span class="highlight">"="</span> # skip lines with <char> in it |
|||
dateRegex = /(\d{4})-(0[1-9]|1[0-2])-(0[0-9]|[12][0-9]|3[01])_([01][0-9]|2[0-4])-([0-5][0-9])/ |
|||
$files.each do |file| |
|||
file.match( dateRegex ) |
|||
$time = Time.local( $1, $2, $3, $4, $5, 0 ) |
|||
$fh = File.open( file + ".ts", "w" ) |
|||
File.open( file ).each do |line| |
|||
# filter lines |
|||
next unless line.match( /^(Time|\d)/ ) |
|||
line.strip! |
|||
if line.sub!( /^(\d+).*?\s(.*)/, '\2' ) |
|||
#line = ( $time + $1.to_i ).strftime( "%H:%M:%S" ) + line |
|||
line = ( $time + $1.to_i ).strftime( "%s" ) + line |
|||
line.gsub!( /\s+/, "\t") |
|||
else |
|||
line.gsub!( /\s+/, "") |
|||
line.gsub!( /\|/, "\t") |
|||
end |
|||
$fh.puts line |
|||
end |
|||
$fh.close |
|||
end |
|||
Invoke the script as shown below assuming the above script is saved as <tt>make-timestamp.rb</tt>. |
|||
ruby make-timestamp.rb *stats |
|||
This will produce a file called <tt><String>-<span class="highlight">YYYY-MM-DD_hh-mm</span>.stats.<span class="highlight">ts</span></tt>. Below is an example of the file. '''Note:''' the time is in Epoch for easier processing in <tt>R</tt>. |
|||
Time frames bytes |
|||
<span class="highlight">1318546800</span> 62 5314 |
|||
1318546801 62 5780 |
|||
1318546802 62 6062 |
|||
1318546803 62 5894 |
|||
1318546804 62 5424 |
|||
1318546805 62 5198 |
|||
1318546806 62 5140 |
|||
1318546807 59 5360 |
|||
1318546808 62 5642 |
|||
=== Produce a graph with <tt>R</tt> === |
|||
Finally to produce the graph in PDF or PNG format the below R script is being used. |
|||
#!/usr/bin/Rscript |
|||
## ---------------------------------------------------------------------------- |
|||
## Don't touch below unless you know what you are doing |
|||
## ---------------------------------------------------------------------------- |
|||
col.names <- c( "time", "frames", "bytes" ) |
|||
args <- commandArgs( trailingOnly = TRUE ) |
args <- commandArgs( trailingOnly = TRUE ) |
||
number.graphs <- length( args ) |
number.graphs <- length( args ) |
||
for ( d in 1:length( args ) ) { |
for ( d in 1:length( args ) ) { |
||
file <- args |
file <- args[[d]] |
||
# get date and time from file name and convert to a time object |
|||
traffic <- read.table( file=file, header=T, sep="\t" ); |
|||
date.time <- as.POSIXlt( |
|||
gsub( |
|||
".*([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})-([0-9]{2}).*", |
|||
"\\1 \\2:\\3:00", |
|||
file, |
|||
perl=T |
|||
) |
|||
) |
|||
# read the data |
|||
traffic <- read.table( file=file, |
|||
header=F, |
|||
col.names=col.names, |
|||
skip=skip.header, |
|||
comment.char="=" |
|||
) |
|||
# massage the data a bit |
|||
traffic$kbits <- ( traffic$bytes * 8 ) / 1024 |
traffic$kbits <- ( traffic$bytes * 8 ) / 1024 |
||
traffic$frames <- NULL |
traffic$frames <- NULL |
||
traffic$bytes <- NULL |
traffic$bytes <- NULL |
||
traffic$ |
traffic$time <- as.numeric( |
||
gsub("-.*", "", traffic$time, perl = T ) |
|||
) + date.time |
|||
# calculate max and avg |
|||
traffic.max <- round( max( traffic$kbits ), digits = 2 ) |
traffic.max <- round( max( traffic$kbits ), digits = 2 ) |
||
traffic.avg <- round( mean( traffic$kbits ), digits = 2 ) |
traffic.avg <- round( mean( traffic$kbits ), digits = 2 ) |
||
# prepare the graph |
|||
sub.title <- paste( "Max:", traffic.max, "Kbit/s; Avg:", traffic.avg, "Kbit/s" ) |
sub.title <- paste( "Max:", traffic.max, "Kbit/s; Avg:", traffic.avg, "Kbit/s" ) |
||
names( traffic ) |
names( traffic ) |
||
# output as pdf and png |
|||
pdf( paste( file, ".pdf", sep = "" ) ) |
pdf( paste( file, ".pdf", sep = "" ) ) |
||
plot( traffic, type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" ) |
plot( traffic, type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" ) |
||
Line 99: | Line 92: | ||
plot( traffic, type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" ) |
plot( traffic, type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" ) |
||
} |
} |
||
To run the script issue the following command assuming the above script is called <tt>tshark-graph.R</tt>. |
|||
Rscript tshark-graph.R *stats |
|||
To run the script issue the following command assuming the above script is called <tt>tshark-graph.R</tt>. |
|||
Rscript tshark-graph.R *ts |
|||
=== Resulting graph === |
=== Resulting graph === |
||
Line 113: | Line 105: | ||
[[File:Pcap-graph-montage.png]] |
[[File:Pcap-graph-montage.png]] |
||
[[Category: Network]] |
[[Category: Network]] |
||
[[Category: Wireshark]] |
|||
[[Category: R]] |
Latest revision as of 23:45, 8 June 2014
One nice day I was given orders to produce network usage statistics to find possible bursts in the network stream. Right from the start I faced two problems. Firstly, the network monitoring software was graphing the network flow only once a minute, which was too coarse. Secondly, the interface was connected to a switch I had no control over at all, so a mirror port had to be ruled out. The assignment was to collect data for a week and then look at the numbers.
I was unsure how to go about it in the first place. So as not to fall behind, I ran tcpdump on the hosts in question until the week was over. That left me with a bit of time to figure out how to process the pcap dump data. After a lot of searching I finally came across a nifty feature in tshark that let me aggregate bandwidth on a per-second basis. Below is a short recipe for creating graphs from captured network traffic.
Goal
Create a graph from a pcap capture file with a precision of one second.
Prerequisites
- A capture file the Wireshark suite understands, e.g. pcap or Solaris snoop.
- tshark
- R
- ImageMagick's montage [optional]
Howto
Aggregate traffic with tshark
To graph the data properly, tshark needs to generate statistics on a per-second basis. The command below achieves this.
tshark -q -z 'io,stat,1' -r <PcapFile> > <StatisticsFile>
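With a whole week of captures it may be convenient to drive the command above from R instead of typing it per file. The following is only a sketch: the helper names tshark.args, stats.name and pcap.to.stats are made up for illustration, and it assumes the capture files end in .pcap.

```r
# Build the argument vector for the tshark call shown above.
# shQuote() protects file names that contain spaces.
tshark.args <- function( pcap ) {
  c( "-q", "-z", "io,stat,1", "-r", shQuote( pcap ) )
}

# Derive the statistics file name: drop a trailing ".pcap" (assumed suffix)
# and append ".stats", so the result never equals the input name.
stats.name <- function( pcap ) {
  paste0( sub( "\\.pcap$", "", pcap ), ".stats" )
}

# Run tshark and redirect its standard output into the statistics file.
pcap.to.stats <- function( pcap ) {
  stats <- stats.name( pcap )
  system2( "tshark", tshark.args( pcap ), stdout = stats )
  stats
}

# Usage (requires tshark on the PATH):
# for ( f in list.files( pattern = "\\.pcap$" ) ) pcap.to.stats( f )
```

Keeping the date and time of the capture in the pcap file name means the derived .stats name carries it along as well.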
The output looks something like the excerpt below.
===================================================================
IO Statistics
Interval: 1.000 secs
Column #0:
| Column #0
Time |frames| bytes
000.000-001.000 62 5578
001.000-002.000 62 5386
002.000-003.000 62 5692
003.000-004.000 62 5968
004.000-005.000 62 5428
005.000-006.000 62 5838
006.000-007.000 62 5912
The only problem with the output above is that the time is relative to the start of the pcap file.
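The relative times can be turned into wall-clock times once the start of the capture is known. Here is a minimal sketch, assuming the start time is encoded in the file name; the file name is hypothetical and the time zone is fixed to UTC only to make the example reproducible.

```r
# Hypothetical file name carrying the capture start as YYYY-MM-DD_hh-mm
file <- "web01-2011-10-14_01-00.stats"

# Pull date and time out of the name and rewrite them as "YYYY-MM-DD hh:mm:00"
stamp <- sub( ".*([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})-([0-9]{2}).*",
              "\\1 \\2:\\3:00", file )
start <- as.POSIXct( stamp, tz = "UTC" )  # UTC is an assumption of this sketch

# Interval labels as printed by tshark; the part before "-" is the offset
intervals <- c( "000.000-001.000", "001.000-002.000", "002.000-003.000" )
offsets   <- as.numeric( sub( "-.*", "", intervals ) )

# Adding seconds to a POSIXct value yields absolute timestamps
absolute <- start + offsets
print( format( absolute, "%H:%M:%S" ) )
```

Each interval is labelled with its lower bound, so the first bucket gets the capture start time itself.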
Produce a graph with R
To produce the graph in PDF and PNG format, R is used. A couple of things need to be adjusted before running the script.
- The number of lines to skip when reading the file. In the example above that would be 7, but your mileage may vary. The value is assigned to the variable skip.header.
- tshark ends the output with a separator line that has to be skipped as a comment. Set the comment.char variable in the script to the character that line starts with ("=" in the example above).
- The file name has to contain the date and time when the capture was started, e.g. <String>-YYYY-MM-DD_hh-mm.stats, or the time cannot be converted properly.
#!/usr/bin/Rscript
## ----------------------------------------------------------------------------
## Globals for reading the data file
## ----------------------------------------------------------------------------
skip.header  <- 7   # how many lines to skip including the header row
comment.char <- "=" # skip lines with <char> in it
## ----------------------------------------------------------------------------
## Don't touch below unless you know what you are doing
## ----------------------------------------------------------------------------
col.names <- c( "time", "frames", "bytes" )
args <- commandArgs( trailingOnly = TRUE )
number.graphs <- length( args )
# seq_along() also copes with an empty argument list
for ( d in seq_along( args ) ) {
  file <- args[[d]]
  # get date and time from file name and convert to a time object
  date.time <- as.POSIXlt(
    gsub(
      ".*([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})-([0-9]{2}).*",
      "\\1 \\2:\\3:00",
      file,
      perl=T
    )
  )
  # read the data
  traffic <- read.table( file=file,
    header=F,
    col.names=col.names,
    skip=skip.header,
    comment.char=comment.char
  )
  # massage the data a bit
  traffic$kbits <- ( traffic$bytes * 8 ) / 1024
  traffic$frames <- NULL
  traffic$bytes <- NULL
  # keep the start of each interval and add it to the capture start time
  traffic$time <- as.numeric(
    gsub("-.*", "", traffic$time, perl = T )
  ) + date.time
  # calculate max and avg
  traffic.max <- round( max( traffic$kbits ), digits = 2 )
  traffic.avg <- round( mean( traffic$kbits ), digits = 2 )
  # prepare the graph
  sub.title <- paste( "Max:", traffic.max, "Kbit/s; Avg:", traffic.avg, "Kbit/s" )
  names( traffic )
  # output as pdf and png
  pdf( paste( file, ".pdf", sep = "" ) )
  plot( traffic, type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" )
  dev.off() # close the device so the PDF is written out
  png( paste( file, ".png", sep = "" ) )
  plot( traffic, type="h", main=file, sub=sub.title, xlab="Time", ylab="Kbit/s" )
  dev.off() # close the device so the PNG is written out
}
To run the script, issue the following command, assuming the above script is called tshark-graph.R.
Rscript tshark-graph.R *stats
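If the right values for skipping the header and the comment character are unclear, they can be tried out on a few lines first. The sketch below uses a made-up sample shaped like the excerpt further up, including a closing separator line; in this sample six lines precede the data, so the count for a real file may well differ.

```r
# Made-up sample mimicking the tshark excerpt, with a closing separator line
sample.output <- c(
  "===================================================================",
  "IO Statistics",
  "Interval: 1.000 secs",
  "Column #0: ",
  "                |   Column #0",
  "Time            |frames|  bytes",
  "000.000-001.000      62      5578",
  "001.000-002.000      62      5386",
  "==================================================================="
)

# Everything before the first line that starts with a digit counts as header
skip.header <- which( grepl( "^[0-9]", sample.output ) )[1] - 1

# comment.char = "=" makes read.table ignore the closing separator line
traffic <- read.table( text = sample.output,
                       header = FALSE,
                       col.names = c( "time", "frames", "bytes" ),
                       skip = skip.header,
                       comment.char = "=" )
print( traffic )
```

Deriving the skip count from the first data line avoids hard-coding a number that depends on how many header lines tshark prints.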
Resulting graph
There are sexier graphs out there but from a functional standpoint it does the job.
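The Max and Avg figures in the subtitle of the graph come from converting bytes per second to Kbit/s. Here is a small worked sketch using the first three byte counts from the excerpt above; the variable names are only illustrative.

```r
# Bytes per one-second interval, taken from the tshark excerpt
bytes <- c( 5578, 5386, 5692 )

# 8 bits per byte, then divide by 1024 as the script does
kbits <- ( bytes * 8 ) / 1024

# The same data with a divisor of 1000, i.e. SI kilobits
kbits.si <- ( bytes * 8 ) / 1000

print( data.frame( bytes, kbits, kbits.si ) )
print( c( max = max( kbits ), avg = mean( kbits ) ) )
```

The script divides by 1024; whether that or 1000 is the better divisor depends on the convention the numbers are to be compared against.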
Combining graphs
R is fully capable of creating a collection of graphs from a bunch of files, but personally I think it's a lot more involved than simply using ImageMagick's montage command.
montage -geometry <Width>x<Height> <GraphFiles> <OutputGraph>
This yields a graph similar to the one below (intentionally downsampled to fit the page).
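For comparison, the R-only route mentioned at the start of this section might look roughly like the sketch below. It is not part of the original recipe: the two data series are made up, and the output goes to a temporary PDF file.

```r
# Two made-up traffic series standing in for two capture files
series <- list(
  "host-a" = data.frame( time = 0:4, kbits = c( 43.6, 42.1, 44.5, 46.6, 42.4 ) ),
  "host-b" = data.frame( time = 0:4, kbits = c( 12.0, 80.3, 15.2, 14.8, 13.9 ) )
)

out <- tempfile( fileext = ".pdf" )
pdf( out )

# One row per series in a single column, so the graphs are stacked
par( mfrow = c( length( series ), 1 ) )
for ( name in names( series ) ) {
  plot( series[[name]], type = "h", main = name, xlab = "Time", ylab = "Kbit/s" )
}

dev.off() # close the device so the file is written out
print( out )
```

The layout has to be decided before plotting, whereas montage can combine graph files that already exist.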