Now there are no remaining speed issues: the slowest query takes less than a second on my laptop, and the whole page returns in 1.5 to 2.5 seconds. This is still pretty poor performance compared to some search tools. I'm not using the SPARQL views or elaborate caching, and I haven't tried YARS yet, although Sesame's in-memory store seems to be doing quite well.
When I have more time, I'll squeeze more speed out of this thing. But first:
There are some issues about the correctness of subcategories, and other minor issues, but the hardest part is now complete.
Using json_encode in the Quercus PHP environment caused some crashes recently. I've filed a bug for that and created a workaround.
I've also filed this bug on array_multisort: http://bugs.caucho.com/view.php?id=2014. I need it fixed :(
JFYI, the Quercus library is licensed under the GPL and can run in Java servers other than Caucho's Resin. It is a fairly complete implementation of PHP 5.
SPARQL views are a caching mechanism for prepared queries: a simple way of storing subgraphs to make the queries faster. They can use more resources (disk space and connections) in exchange for the speed.
Here are the results of running the scripts against the database of 90,000 statements: 34 queries in about 0.3 seconds.
It retrieves all distinct properties, then loops through and retrieves all distinct values for those properties.
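The two-step loop described above can be sketched in Java as follows. This is only an illustration under stated assumptions: `FacetScan`, `Statement`, and the in-memory list are hypothetical stand-ins, whereas the real scripts send queries to Sesame. One lookup for the property list plus one per property would be consistent with 34 queries for 33 properties, though that reading of the numbers is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the triple store; the real scripts
// query Sesame instead of walking a list.
class FacetScan {

    // One statement: subject, property (predicate), value (object).
    static final class Statement {
        final String subject, property, value;
        Statement(String subject, String property, String value) {
            this.subject = subject;
            this.property = property;
            this.value = value;
        }
    }

    // Step 1: collect every distinct property, in first-seen order.
    static Set<String> distinctProperties(List<Statement> statements) {
        Set<String> properties = new LinkedHashSet<String>();
        for (Statement s : statements) {
            properties.add(s.property);
        }
        return properties;
    }

    // Step 2: for one property, collect its distinct values.
    static Set<String> distinctValues(List<Statement> statements, String property) {
        Set<String> values = new LinkedHashSet<String>();
        for (Statement s : statements) {
            if (s.property.equals(property)) {
                values.add(s.value);
            }
        }
        return values;
    }

    // Both steps together: one lookup for the properties, then one per property.
    static Map<String, Set<String>> scan(List<Statement> statements) {
        Map<String, Set<String>> facets = new LinkedHashMap<String, Set<String>>();
        for (String property : distinctProperties(statements)) {
            facets.put(property, distinctValues(statements, property));
        }
        return facets;
    }

    public static void main(String[] args) {
        List<Statement> statements = new ArrayList<Statement>();
        statements.add(new Statement("item1", "type", "photo"));
        statements.add(new Statement("item2", "type", "letter"));
        statements.add(new Statement("item1", "year", "1990"));
        statements.add(new Statement("item2", "year", "1990"));
        System.out.println(scan(statements));
    }
}
```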
The punchline: after a number of tests, prepared queries are at this point only slightly faster. I may cache the tuple query in Quercus APC, but for now the regular queries are sufficiently fast.
COUNT of 33 DISTINCT TOTAL PROPERTIES
// Collect the "count" column, then sort the rows by it, highest first.
$count = array();
foreach ($prepared_aggregates as $key => $row) {
    $count[$key] = $row["count"];
}
array_multisort($count, SORT_NUMERIC, SORT_DESC, $prepared_aggregates);
// Requires: java.util.ArrayList, java.util.Collections, java.util.Hashtable
private Hashtable[] doOrderBy(
        Hashtable[] results,
        String[] order_by,
        String[] sorting   // intended for ASC/DESC per key; not used yet
){
    int resultlength = results.length;
    // a container for the return values
    Hashtable[] final_result = new Hashtable[resultlength];
    // maps each sort string back to the indexes of the rows that produced it;
    // a list is needed because several rows can share the same sort string
    Hashtable<String,ArrayList<Integer>> mapTable = new Hashtable<String,ArrayList<Integer>>();
    // the distinct sort strings, to be sorted
    ArrayList<String> sortingTable = new ArrayList<String>();
    int or_num = order_by.length;

    // iterate over the results
    for (int j = 0; j < resultlength; j++) {
        Hashtable row = results[j];
        // build the sort string by concatenating the row's value for each order key
        StringBuilder sortstring = new StringBuilder();
        for (int i = 0; i < or_num; i++) {
            sortstring.append(row.get(order_by[i]));
        }
        String key = sortstring.toString();
        ArrayList<Integer> rows = mapTable.get(key);
        if (rows == null) {
            rows = new ArrayList<Integer>();
            mapTable.put(key, rows);
            sortingTable.add(key);
        }
        rows.add(j);
    }

    // sort the sort strings (ascending only)
    Collections.sort(sortingTable); //add MIXED ASC DESC

    // walk the sorted strings, look up the original row indexes,
    // and place the rows in the final results in sorted order
    int rownumber = 0;
    for (String key : sortingTable) {
        for (int index : mapTable.get(key)) {
            final_result[rownumber] = results[index];
            rownumber++;
        }
    }
    return final_result;
}
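The method above sorts ascending only and leaves its `sorting` argument unused (the "add MIXED ASC DESC" note). One possible way to fill that in is a key-by-key comparator, sketched below. This is not the code that runs on the site: the class name `MixedOrderBy`, the use of `List<Map<String,String>>` instead of `Hashtable[]`, and the convention that `sorting[i]` holds the string "DESC" for a descending key are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sorts rows on several keys with a direction per key.
class MixedOrderBy {

    // Compares two rows key by key. A sorting[i] of "DESC" (assumed convention)
    // reverses the direction for orderBy[i]; anything else means ascending.
    static Comparator<Map<String, String>> comparator(final String[] orderBy,
                                                      final String[] sorting) {
        return new Comparator<Map<String, String>>() {
            public int compare(Map<String, String> a, Map<String, String> b) {
                for (int i = 0; i < orderBy.length; i++) {
                    String left = a.get(orderBy[i]);
                    String right = b.get(orderBy[i]);
                    // Treat a missing value as an empty string.
                    if (left == null) left = "";
                    if (right == null) right = "";
                    int cmp = left.compareTo(right);
                    if (cmp != 0) {
                        boolean descending = i < sorting.length
                                && "DESC".equalsIgnoreCase(sorting[i]);
                        return descending ? -cmp : cmp;
                    }
                }
                return 0; // equal on every key; the stable sort keeps input order
            }
        };
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<Map<String, String>> orderBy(List<Map<String, String>> rows,
                                             String[] orderBy,
                                             String[] sorting) {
        List<Map<String, String>> sorted = new ArrayList<Map<String, String>>(rows);
        Collections.sort(sorted, comparator(orderBy, sorting));
        return sorted;
    }
}
```

Comparing key by key avoids concatenating the values into one sort string, so rows with identical keys cannot collide. Values are still compared as strings, so a numeric column such as a count would need to be parsed first to sort numerically.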
Faceted search also means that people often want to know not only the general category, but also the "Count", which could mean three things:
1. How many objects are there like this
2. How many objects are there like this in the search I just completed
3. How many objects are there like this if I add a different facet
The computational challenge is big, and the programmatic challenge is even greater, especially when, like most SPARQL developers, I'm still waiting for aggregate functions.
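To make the three meanings of "Count" concrete, here is a minimal sketch that computes each one over an in-memory list of items. Everything in it is an assumption for illustration: the class `FacetCounts`, the representation of an item as a property-to-value map, and the sample data. Without aggregate functions in the query language, the counting has to be done client-side in some form like this, after the rows have been fetched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side counting; each item is a property -> value map.
class FacetCounts {

    // True when the item carries every property/value pair in the filter.
    static boolean matches(Map<String, String> item, Map<String, String> filter) {
        for (Map.Entry<String, String> e : filter.entrySet()) {
            if (!e.getValue().equals(item.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Counts items that satisfy the filter and also have property = value.
    static int count(List<Map<String, String>> items, Map<String, String> filter,
                     String property, String value) {
        int n = 0;
        for (Map<String, String> item : items) {
            if (matches(item, filter) && value.equals(item.get(property))) {
                n++;
            }
        }
        return n;
    }

    // Returns a copy of the filter with one more facet added.
    static Map<String, String> with(Map<String, String> filter, String property, String value) {
        Map<String, String> extended = new HashMap<String, String>(filter);
        extended.put(property, value);
        return extended;
    }

    static Map<String, String> item(String type, String year, String place) {
        Map<String, String> m = new HashMap<String, String>();
        m.put("type", type);
        m.put("year", year);
        m.put("place", place);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> items = new ArrayList<Map<String, String>>();
        items.add(item("photo", "1990", "austin"));
        items.add(item("photo", "1991", "austin"));
        items.add(item("letter", "1990", "austin"));
        items.add(item("photo", "1990", "dallas"));

        Map<String, String> none = new HashMap<String, String>();
        Map<String, String> search = with(none, "place", "austin");

        // 1. How many photos are there at all
        System.out.println(count(items, none, "type", "photo"));
        // 2. How many photos are there in the current search
        System.out.println(count(items, search, "type", "photo"));
        // 3. How many photos would remain if the year facet were added
        System.out.println(count(items, with(search, "year", "1990"), "type", "photo"));
    }
}
```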
What I am hoping will rescue me from the evil of slow queries is Sesame's prepared queries.
You are looking at the speed of AHIRC front page loads now, after another 18-hour day. This is 3-12x faster than the last post!!

Looks like I'll be using a hybrid approach: the HTTP client, plus a custom class running on Java to do the aggregation emulation.
Here's a recap of the consequences.
The first points of failure are (since the HTTP client for Sesame works great):
* SPARQL/Sesame not having Aggregate functions
* Sesame not having ORDER BY
This produces large numbers of results and/or queries, which then need to be parsed as JSON, leading to the second point of failure:
* Zend JSON and native PHP 5.2 JSON are not fast enough (perhaps they should not be expected to be for 6000 results)
I'm going to take an application-specific approach to solving the speed problems I'm facing. By application-specific, I mean that I will depend on another piece of software in the dependency chain, namely Caucho's Resin, a fast Java servlet container. Why? To avoid sending lots of data and doing lots of queries over HTTP.
