Making Queries

From Open Source Software Research

Jump to: navigation, search
Top Links
• Query the Archive
• Schema Browser
• Research Data
• Making Queries
• Resources
• Papers
• Contact
• FAQ
• Schemas
• All tables
• Finding data
• ER diagrams

Contents

The Data Query Form

The query form features three main fields: SELECT, FROM, and WHERE.

The Result Field Separator features three options which allow you to choose between the colon, semicolon, and pound sign as a delimiter between the columns of your results.

One difficultly is working out whether the fields you are extracting actually contain the character that you are using as the separator. Many of the full text fields (eg artifact.summary or artifact.details) often contain these characters, which makes parsing the output impossible. A workaround is to always use REPLACE(summary, ":", "~|~"), then after the data has been extracted from the database substitute the ~|~ (which is just an unlikely sequence) for the colon again. (eg string.gsub("~|~",":")). You need to do that for each field that might contain the separating character. A full query, then, is:

SELECT a.artifact_id, a.submitted_by, a.open_date, REPLACE(a.summary, ':', '~|~'), REPLACE(a.details, ':', '~|~')
FROM sf0606.artifact as a
WHERE a.group_artifact_id=497423 ORDER BY a.open_date

There is also a link to a schema browser if you would like to explore what data is available for each schema.

You may access the query form here:
SourceForge Research Archive Query

Tips on Making Queries

The query is cap-insensitive. The WHERE field may be left empty without causing errors.
Some useful clauses include: ORDER BY, COUNT, GROUP BY.

More complex queries can be made by nesting commands.

You can see a sample query and its results here.
Some useful links for newcomers to SQL:

Common Queries

Hints on Debugging

Some of the common errors tend to look like the following:

This error generally refers to a syntax error at the end of the FROM field

Couldn't execute statement: ERROR:  syntax error at end of input at character 32 at /var/www/cgi-bin/req.pl line 71.

Check the end of the SELECT statement for syntax errors

Couldn't execute statement: ERROR:  syntax error at or near "FROM" at character 9 at /var/www/cgi-bin/req.pl line 71.

Check the schema browser for the information you are looking for, as the item in quotes is not in that table.

Couldn't execute statement: ERROR:  column "hello" does not exist at /var/www/cgi-bin/req.pl line 71.


Check to make sure you don't have unnecessary commas or other punctuation.

Analysis

Preliminary analysis on the relationship between number of projects and number of developers has been plotted using R.

  • Results can be found here.
  • Information on how the plots were created can be found here.
Related Links
Creating Snapshots | Sample Snapshots | Tips on Creating Plots | Dataset Resources