Difference between revisions of "Sar/Visualize CPU data"
| Line 27: | Line 27: | ||
 cpu.data$timestamp <- as.POSIXct( cpu.data$timestamp )  | 
   cpu.data$timestamp <- as.POSIXct( cpu.data$timestamp )  | 
||
 cpu.data$CPU[ cpu.data$CPU == "-1" ] <- "all"  | 
   cpu.data$CPU[ cpu.data$CPU == "-1" ] <- "all"  | 
||
With very little effort data is read into <tt>data.table</tt> format. Then we have to change the format of two fields. Namely <tt>timestamp</tt> needs conversion from a string to a proper time format like <tt>POSIXct</tt>. Plus the value for all CPUs is "-1" and to make it clear to the viewer we want it to be "all".   | 
|||
After the changes the structure of the cpu.data should look like the below data.   | 
|||
 str( cpu.data )   | 
|||
 'data.frame':	429 obs. of  10 variables:  | 
|||
  $ hostname : Factor w/ 1 level "hostname.local": 1 1 1 1 1 1 1 1 1 1 ...  | 
|||
  $ interval : int  588 588 588 590 590 590 589 589 589 588 ...  | 
|||
  $ timestamp: <span class="highlight">POSIXct</span>, format: "2014-05-04 00:10:01" "2014-05-04 00:10:01" ...  | 
|||
  $ CPU      : <span class="highlight">chr</span>  "all" "0" "1" "all" ...  | 
|||
  $ user     : <span class="highlight">num</span>  8.05 9.39 6.73 5.28 5.02 ...  | 
|||
  $ nice     : num  0 0 0 0 0 0 0 0 0 0 ...  | 
|||
  $ system   : num  3.83 4.32 3.34 2.75 3.21 2.3 4.59 4.99 4.19 5.1 ...  | 
|||
  $ iowait   : num  0.71 1.1 0.33 0.7 1.1 0.3 0.7 1.12 0.29 0.81 ...  | 
|||
  $ steal    : num  0 0 0 0 0 0 0 0 0 0 ...  | 
|||
  $ idle     : num  87.4 85.2 89.6 91.3 90.7 ...  | 
|||
=== Setting up the graph ===   | 
|||
The next step is to store the ggplot data into a variable for further processing. It's important to group the data by the <tt>CPU</tt> field and assign a different color to each CPU with the <tt>colour=</tt> assignment.   | 
|||
 cpu.graph <- ggplot( data=cpu.data, aes( x=timestamp, y=user, group=CPU, colour=CPU ) )  | 
   cpu.graph <- ggplot( data=cpu.data, aes( x=timestamp, y=user, group=CPU, colour=CPU ) )  | 
||
| ⚫ | |||
=== Display the result as a line graph ===  | 
|||
To display the graph a simply calling the stored graph data and assigning a layer with <tt>geom_line()</tt>.   | 
|||
| ⚫ | |||
Will result in a graph like this:   | 
  Will result in a graph like this:   | 
||
[[Image:Cpu-data-consolidated.png]]  | 
  [[Image:Cpu-data-consolidated.png]]  | 
||
Revision as of 21:24, 8 June 2014
This is a five minute guide how to visualize Linux's sar data provided by the sysstat utility without a lot of mangeling the data.
Goal
Create CPU graphs in R from the sar utility without massaging the output data too much.
Prerequisites
- The Linux sysstat package installed and configured to report performance data.
 - R
 - ggplot2 R library
 
Howto
Dumping the sar data with sadf
The data sar collects is in binary format and needs to be converted first to a format that can be imported into R. This is done with the sadf command which converts the collected data into tabular data delimited by semicolon. 
Note: On CentOS 6 and higher the sadf command also prints a header file to make most use of it we need to slightly changes it like remove the leading #, plus remove the % from the cpu data but only in the first. Other lines starting with # or containing a LINUX-RESTART should also be removed. Your milage may vary!
sadf -t -d -P ALL <SAR-FILE> | \ sed -e '1,1s/\(^#\|%\)//g' \ -e '/\(^#\|LINUX-RESTART\)/d' \ > <SADF-OUTPUT>
Importing the data into R
The next step is to read the tabular data into R and print the graphs there are just a handful of commands to do this. In R type the following commands.
library( ggplot2 )
cpu.data <- read.csv( file="<SADF-OUTPUT>", sep=";" )
cpu.data$timestamp <- as.POSIXct( cpu.data$timestamp )
cpu.data$CPU[ cpu.data$CPU == "-1" ] <- "all"
With very little effort data is read into data.table format. Then we have to change the format of two fields. Namely timestamp needs conversion from a string to a proper time format like POSIXct. Plus the value for all CPUs is "-1" and to make it clear to the viewer we want it to be "all".
After the changes the structure of the cpu.data should look like the below data.
str( cpu.data ) 'data.frame': 429 obs. of 10 variables: $ hostname : Factor w/ 1 level "hostname.local": 1 1 1 1 1 1 1 1 1 1 ... $ interval : int 588 588 588 590 590 590 589 589 589 588 ... $ timestamp: POSIXct, format: "2014-05-04 00:10:01" "2014-05-04 00:10:01" ... $ CPU : chr "all" "0" "1" "all" ... $ user : num 8.05 9.39 6.73 5.28 5.02 ... $ nice : num 0 0 0 0 0 0 0 0 0 0 ... $ system : num 3.83 4.32 3.34 2.75 3.21 2.3 4.59 4.99 4.19 5.1 ... $ iowait : num 0.71 1.1 0.33 0.7 1.1 0.3 0.7 1.12 0.29 0.81 ... $ steal : num 0 0 0 0 0 0 0 0 0 0 ... $ idle : num 87.4 85.2 89.6 91.3 90.7 ...
Setting up the graph
The next step is to store the ggplot data into a variable for further processing. It's important to group the data by the CPU field and assign a different color to each CPU with the colour= assignment.
cpu.graph <- ggplot( data=cpu.data, aes( x=timestamp, y=user, group=CPU, colour=CPU ) )
Display the result as a line graph
To display the graph a simply calling the stored graph data and assigning a layer with geom_line().
cpu.graph + geom_line()
Will result in a graph like this: 
Plotting each CPU separately
To show which CPU or core is used most it is probably better to separately print the CPUs. The ggplot2 library comes with a nifty command called facet_grid(). To print each separately simply add it to the end of the previous command.
cpu.graph + geom_line() + facet_grid( CPU~. )
Which will result in a graph like this.
Saving the graph to a file
Especially when automation comes into play saving to a file is a must this example shows how to save to PNG.
setwd( "/home/r-user/cpu-data" ) png( "cpu-graph-grid.png", width=600, height=360, res=72 ) cpu.graph + geom_line() + facet_grid( CPU~. ) dev.off()
