OSS Plots

From Open Source Software Research

Notes on R

  • You can download R from R-project.org
  • Know the format of your data before you import it; read.table will probably give errors if its parameters (such as header and sep) do not match the file
  • To inspect the imported data, type the name of the object on the left of <- read.table(...)
  • V1, V2, etc. are the default column labels that read.table assigns when the file has no header row
  • See http://cran.r-project.org/manuals.html for more on how to use R
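
The notes above on read.table can be tried on a small throwaway file. This is only a minimal sketch: the temporary file and its three rows are made up for illustration, and the "|" separator is assumed to match the tables used later on this page.

```r
# Hypothetical two-column, pipe-delimited file with no header row.
tmp <- tempfile()
writeLines(c("1|250", "2|60", "3|25"), tmp)

# With header = FALSE, read.table assigns the labels V1, V2, ...
d <- read.table(tmp, header = FALSE, sep = "|")

# Typing the object name prints the imported table.
d
names(d)   # column labels assigned by read.table
str(d)     # column types, useful for spotting a wrong separator
```

If sep does not match the file, each row typically arrives as a single column, which str(d) makes easy to see.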


  • I used the statistical package R to produce these plots from the data I collected earlier
[Figure: Sf0103 dev.png]
[Figure: Sf0103 proj.png]


  • There are multiple commands you can use to import and plot the data. The following are the methods I used.
  • If the input file is taken from the query site and uses "#" as the field separator (or similar), replace the read.table command with scan and plot as follows:

sf0606 <- scan("sf0606", what = list(x = 0, y = 0), sep = "#")
# log10() matches the axis labels; log() alone is the natural log
plot(log10(sf0606$y) ~ log10(sf0606$x), xlab = "P Projects (log10)", ylab = "Number of Developers (log10)",
main = "Number of Developers for P Projects", sub = "sf0606")
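
To see what scan returns with these arguments, here is a small sketch on made-up data. The file contents are hypothetical (only the "#" separator and the what = list(x = 0, y = 0) argument come from the call above), so treat the numbers as placeholders.

```r
# Hypothetical query output: two numeric fields per record, separated by "#".
tmp <- tempfile()
writeLines(c("1#5000", "2#900", "3#300", "4#120"), tmp)

# what = list(x = 0, y = 0) tells scan to read two numeric fields per record
# and return them as a list with components x and y.
s <- scan(tmp, what = list(x = 0, y = 0), sep = "#", quiet = TRUE)

s$x          # first field of every record
s$y          # second field of every record
log10(s$y)   # base-10 logarithm of the second field
```

Unlike read.table, scan returns a plain list rather than a data frame, which is why the columns are reached with $x and $y instead of [,1] and [,2].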

Multiple Samples

  • The following commands are useful if you have more than one table to plot, especially when observing trends across months.
library(MASS)  # needed for rlm
datafiles <- list.files(path = ".", pattern = "sf[0-9]+_proj")
for (i in seq_along(datafiles)) {
  # the braces keep every step inside the loop, so each file is read, plotted and fitted
  k <- read.table(datafiles[i], header = TRUE, sep = "|")
  # log10() matches the axis labels; log() alone is the natural log
  plot(log10(k[,2]) ~ log10(k[,1]), xlab = "N Developers (log10)", ylab = "Number of Projects (log10)",
  main = "Number of Projects for N Developers", sub = substr(datafiles[i], 1, 6))
  coeff <- rlm(log10(k[,2]) ~ log10(k[,1]))
  abline(coeff, col = "blue", lty = 2)
}
  • A list of all the input files can be created using the form object_name <- list.files(path="path_directory", pattern="format_of_files"), where pattern is a regular expression matched against the file names
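
The file-matching step can be checked on its own before any plotting. The sketch below uses a scratch directory with made-up file names (sf0103_proj, sf0205_proj and notes.txt are assumptions for illustration) to show which names the pattern picks up and how substr derives the snapshot label.

```r
# Hypothetical snapshot files in a scratch directory.
dir_path <- tempfile()
dir.create(dir_path)
file.create(file.path(dir_path, c("sf0103_proj", "sf0205_proj", "notes.txt")))

# pattern is a regular expression matched against the file names.
datafiles <- list.files(path = dir_path, pattern = "sf[0-9]+_proj")
datafiles

# The first six characters give the snapshot label used as the plot subtitle.
substr(datafiles, 1, 6)
```

Note that list.files returns bare file names by default, so when path is not the working directory the names need to be joined with file.path (or full.names = TRUE passed) before they are handed to read.table.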

Single Sample

> library(MASS)
> sf0205 <- read.table("sf0205_projsize", header = TRUE, sep = "|")
> png("sf0205_size.png") 
> plot(log10(sf0205[,2]) ~ log10(sf0205[,1]), xlab = "N Developers (log10)", ylab = "Number of Projects (log10)", 
main = "Number of Projects for N Developers", sub = "sf0205")
> coeff <- rlm(log10(sf0205[,2]) ~ log10(sf0205[,1]))
> abline(coeff, col = "blue", lty = 2) 
> legend(0, 2.5, coef(coeff))
> dev.off() 


> sf0103_dev <- read.table("sf0103_table", header = TRUE, sep = "|") 
> png("sf0103_dev.png") 
> plot(log(sf0103_dev[,2]) ~ log(sf0103_dev[,1]), xlab = "P Projects (log10)", ylab = "Number of Developers (log10)", 
main = "Number of Developers for P Projects", sub = "sf0103")
> coeff <- rlm(log(sf0103_dev[,2]) ~ log(sf0103_dev[,1]))
> abline(coeff, col = "blue", lty = 2) 
> legend(0, 2.5, coef(coeff))
> dev.off() 
  • read.table takes the filename and various options, including the header indicator and the delimiter to look for
  • The MASS library is required to use rlm, which fits a robust regression line that is less sensitive to outliers than an ordinary least-squares fit
  • plot(y ~ x) is the basic form
  • abline draws the regression line on the current plot
  • png("file") opens a PNG file, and the plot that follows is written to it until dev.off() closes the file
  • sf0205[,2] indicates the data should be taken from the second column of the table you just imported
  • lty is the line type; lty = 2 draws a dashed line
  • coef extracts the regression coefficients (intercept and slope)
  • legend puts the coefficient values into a box on the graph
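
To illustrate what rlm and coef contribute, the following sketch fits the same log-log data with rlm and with ordinary lm. The data are synthetic, not taken from the archive: a power law with exponent -2, a little random noise, and one deliberately inflated point. The variable names are chosen for this example only.

```r
library(MASS)  # provides rlm

# Synthetic data roughly following y = 1000 * x^-2, plus one gross outlier.
set.seed(1)
x <- 1:20
y <- 1000 * x^-2 * exp(rnorm(20, sd = 0.05))
y[3] <- y[3] * 30

# Fit a straight line on log-log axes; the slope estimates the exponent.
fit_rlm <- rlm(log10(y) ~ log10(x))   # robust fit, downweights the outlier
fit_lm  <- lm(log10(y) ~ log10(x))    # ordinary least squares, for comparison

# coef returns a named vector: intercept first, then slope.
coef(fit_rlm)
coef(fit_lm)

slope_rlm <- unname(coef(fit_rlm)[2])
slope_lm  <- unname(coef(fit_lm)[2])
```

Comparing the two slopes against the exponent used to generate the data shows how much a single bad point can pull the least-squares line, which is the reason for preferring rlm here.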
Related Links
Making Queries | Creating Snapshots | Sample Snapshots