National Virtual Observatory
Open SkyQuery


Open SkyQuery Limitations

Open SkyQuery is a facility that lets users query individual astronomical catalogs and compare them, finding positional cross-matches subject to any additional conditions or constraints the user defines on the catalog data. The catalogs (databases) available for use are listed on the left side of the Query Screen under the title Nodes. We call them SkyNodes.

Users should be aware that queries to and between SkyNodes (including MyData) are always limited to a maximum of 5000 rows. We apply this restriction so that Web access remains possible and large queries do not swamp the systems.

What does this 5000-row limit mean?
- Single-node queries are limited to 5000 rows.
- Cross-matches between query sets that contain about 5000 objects or more are likely to be incomplete.
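A small toy model makes the second point concrete. One catalog has 8000 objects in the region, an arbitrary 5000 of them are kept, and only then is the cross-match done. The catalogs, the `capped` helper and the use of `random.sample` to imitate an unordered TOP selection are illustrative assumptions, not Open SkyQuery code.

```python
import random

ROW_LIMIT = 5000  # row cap quoted on this page


def capped(rows, rng, limit=ROW_LIMIT):
    """Keep at most `limit` rows, chosen arbitrarily (like TOP N without ORDER BY)."""
    rows = list(rows)
    return rows if len(rows) <= limit else rng.sample(rows, limit)


# Hypothetical toy catalogs: integer IDs stand in for sources, and an ID present
# in both catalogs means the same object was observed by both surveys.
catalog_a = range(0, 8000)      # 8000 objects inside the region
catalog_b = range(2000, 9000)   # 7000 objects, 6000 of them also in catalog_a

true_matches = set(catalog_a) & set(catalog_b)
seen_matches = set(capped(catalog_a, random.Random(1))) & set(catalog_b)

print(len(true_matches))                      # 6000
print(len(seen_matches) < len(true_matches))  # True: at most 5000 survive the cap
```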

Why?
Open SkyQuery works as follows:

  • First, each SkyNode is queried for the number of rows that satisfy the query's REGION constraint.
  • Next, a query plan is built so that the SkyNode with the fewest rows is queried first. It sends its results to the next-smallest SkyNode, which performs the first cross-match, and so on.
  • If a SkyNode has more than 5000 objects in the specified region, the first cut is applied here. This is likely to happen when the REGION constraint covers a large area containing many objects. The REGION constraint is applied first to determine how many objects lie within it, and if there are more than 5000, only the first 5000 are selected. When the remaining WHERE-clause constraints are applied in the second step, the number of rows can end up well below the 5000 limit, even though objects dropped by the cut might also have satisfied those constraints.
  • Because the first 5000 objects are selected with the SQL "TOP 5000" construct and no ordering is imposed, there is no guarantee that the same 5000 objects will be selected if the query is repeated. Consecutive runs of exactly the same query can therefore return different results. This is an annoying "feature" that we are working to solve in our ongoing development.
  • Additional cuts may happen when a cross-match is performed and its results are sent to the next node. During the cross-match, each object from the previous node is compared against the current catalog to look for matches. Even if the previous node provided only about 5000 rows and a one-to-one match is expected, the xmatch table may end up with more than 5000 rows. This depends on how restrictive the confidence level in "XMATCH() < confidence_level" is and on the catalogs' astrometric precision (sigma). Rows beyond the limit are cut before being sent on.
    For more details, see http://openskyquery.net/Sky/SkySite/help/algo.aspx
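The query plan described in the list above can be sketched as a toy Python model. Everything here is an illustrative assumption rather than Open SkyQuery's implementation. `SkyNode`, `top_n` and `run_plan` are hypothetical names, integer IDs stand in for sources, and a shared ID counts as a cross-match. `random.sample` imitates an unordered `TOP 5000`, and the region cut is applied before the rest of the WHERE clause.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

ROW_LIMIT = 5000  # per-query row cap described on this page


@dataclass
class SkyNode:
    """Hypothetical stand-in for a SkyNode: the rows inside the REGION,
    plus a predicate playing the role of the rest of the WHERE clause."""
    name: str
    region_rows: List[int]
    where: Callable[[int], bool] = lambda row: True


def top_n(rows, rng):
    """Unordered TOP 5000: an arbitrary subset that can differ from run to run."""
    return rows if len(rows) <= ROW_LIMIT else rng.sample(rows, ROW_LIMIT)


def run_plan(nodes, rng):
    # Step 1: ask every node how many rows satisfy the REGION constraint.
    counts = {n.name: len(n.region_rows) for n in nodes}
    # Step 2: order the plan from the smallest node to the largest.
    plan = sorted(nodes, key=lambda n: counts[n.name])
    # First cut: REGION rows capped at 5000, then the remaining WHERE terms.
    first = plan[0]
    result = [r for r in top_n(first.region_rows, rng) if first.where(r)]
    for node in plan[1:]:
        # Toy cross-match: an ID present in both nodes counts as a match.
        candidates = {r for r in node.region_rows if node.where(r)}
        matched = [r for r in result if r in candidates]
        result = top_n(matched, rng)  # the result sent on is capped again
    return [n.name for n in plan], result


nodes = [
    SkyNode("big", list(range(20000))),
    SkyNode("small", list(range(0, 20000, 2)), where=lambda r: r % 3 == 0),
]
true_count = sum(1 for r in range(0, 20000, 2) if r % 3 == 0)  # multiples of 6
order, rows_1 = run_plan(nodes, random.Random(1))
_, rows_2 = run_plan(nodes, random.Random(2))
print(order)                       # ['small', 'big']
print(len(rows_1) < true_count)    # True: the REGION cut dropped valid rows
print(set(rows_1) != set(rows_2))  # True: same query, different rows
```

The final result is well under 5000 rows yet still incomplete. Two runs of the identical query return different objects, which mirrors the third and fourth points in the list.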
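A deliberately simplified one-dimensional matcher shows why a cross-match fed about 5000 rows can produce more than 5000. This is not the XMATCH algorithm Open SkyQuery uses. The hypothetical `xmatch` helper accepts every pair whose separation, divided by the combined error `sqrt(sigma_left**2 + sigma_right**2)`, is below `threshold`, which here plays the role of the confidence level. The positions are made up for illustration.

```python
import bisect
import math

ROW_LIMIT = 5000  # row cap quoted on this page


def xmatch(left, right, sigma_left, sigma_right, threshold):
    """Toy 1-D cross-match (hypothetical; not Open SkyQuery's XMATCH).

    A pair (x, r) matches when |x - r| / sqrt(sigma_left**2 + sigma_right**2)
    is below `threshold`. Every matching pair becomes one output row, so one
    input object can produce several rows.
    """
    radius = threshold * math.hypot(sigma_left, sigma_right)
    right_sorted = sorted(right)
    pairs = []
    for x in left:
        # Only candidates within +/- radius can match; find them by binary search.
        lo = bisect.bisect_left(right_sorted, x - radius)
        hi = bisect.bisect_right(right_sorted, x + radius)
        pairs.extend((x, r) for r in right_sorted[lo:hi] if abs(x - r) < radius)
    return pairs


# 5000 incoming objects, and a catalog holding each of them plus a close
# neighbour 0.5 units away (hypothetical toy positions).
incoming = [float(i) for i in range(5000)]
catalog = incoming + [i + 0.5 for i in range(5000)]

strict = xmatch(incoming, catalog, 0.1, 0.1, threshold=1.0)
loose = xmatch(incoming, catalog, 0.1, 0.1, threshold=5.0)
print(len(strict))             # 5000: one-to-one
print(len(loose))              # 14999: neighbours now match too
print(len(loose) > ROW_LIMIT)  # True: this xmatch table would be cut
```

Loosening the threshold, or equivalently having larger sigmas, widens the match radius. Each incoming object then pairs with several catalog entries.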

So what should you do to avoid this limitation?
At the moment, the only way around the region cut is to use small regions, so that fewer than 5000 objects fall within each one. We are already working on a parallel framework capable of full catalog-to-catalog cross-matches. Thank you for your patience!
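One way to apply the small-regions advice is to split a large area recursively until each piece holds fewer than 5000 objects, then query the pieces one by one. The page does not provide such a tool. `split_region` is a hypothetical helper, and `count` is a stand-in for a count query against a SkyNode. The sketch uses plain RA/Dec rectangles with half-open edges and ignores RA wrap-around and spherical geometry.

```python
from typing import Callable, List, Tuple

ROW_LIMIT = 5000  # row cap quoted on this page

# (ra_min, ra_max, dec_min, dec_max) in degrees, treated as half-open intervals
Box = Tuple[float, float, float, float]


def split_region(box: Box, count: Callable[[Box], int],
                 limit: int = ROW_LIMIT, min_width: float = 1e-3) -> List[Box]:
    """Recursively quarter `box` until every tile holds fewer than `limit` objects.

    `count` is a hypothetical callable standing in for a count query.
    """
    ra0, ra1, de0, de1 = box
    if count(box) < limit or (ra1 - ra0) < min_width:
        return [box]  # small enough, or too narrow to split usefully
    ra_mid, de_mid = (ra0 + ra1) / 2, (de0 + de1) / 2
    quarters = [(ra0, ra_mid, de0, de_mid), (ra_mid, ra1, de0, de_mid),
                (ra0, ra_mid, de_mid, de1), (ra_mid, ra1, de_mid, de1)]
    tiles: List[Box] = []
    for quarter in quarters:
        tiles.extend(split_region(quarter, count, limit, min_width))
    return tiles


# Toy sky: 10,000 objects on a 1-degree grid.
points = [(float(ra), float(dec)) for ra in range(100) for dec in range(-50, 50)]


def count(box: Box) -> int:
    ra0, ra1, de0, de1 = box
    return sum(1 for ra, dec in points if ra0 <= ra < ra1 and de0 <= dec < de1)


tiles = split_region((0.0, 100.0, -50.0, 50.0), count)
print(len(tiles))                                # 4
print(max(count(t) for t in tiles) < ROW_LIMIT)  # True
```

Sources close to a tile edge may have their counterpart in a neighbouring tile. A real workflow would therefore probably need to pad each tile by a few times the positional error before cross-matching.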
 

Developed with support from the National Science Foundation (under Cooperative Agreement
AST0122449 with the Johns Hopkins University), and NASA AISRP (awards NAG5-17042
and NAG5-12092). The NVO is a member of the International Virtual Observatory Alliance.

This NVO Application is hosted by the JHU Department of Physics & Astronomy.


Last Modified: Tuesday, February 10, 2009 at 5:32:11 PM (Revision 1.1.1.1)