MAST Table Access Protocol Hubble Source Catalog Demo

HSC TAP Service Introduction

Table Access Protocol (TAP) services allow more direct and flexible access to astronomical data than the simpler types of IVOA standard data services. Queries are built with the SQL-like Astronomical Data Query Language (ADQL), and can include geographic / spatial queries as well as filtering on other characteristics of the data. This also allows the user fine-grained control over the returned columns, unlike the fixed set of coumns retunred from cone, image, and spectral services.

For this example, we'll be using the astroquery TAP/TAP+ client, which was developed by the ESAC Space Data Centre for working with the GAIA catalog, but is interoperable with other valid TAP services, including those at MAST. As an astroquery project, TAP+ documentation is available at ReadTheDocs: http://astroquery.readthedocs.io/en/latest/utils/tap.html

We'll be using TAP+ to call the most recent version (3) of the Hubble Source Catalog TAP service at MAST. The schema is described within the service, and we'll show how to inspect it. The schema is also the same as the one available via the CasJobs interface, with an additional view added for the most common positional queries. CasJobs has its own copy of the schema documentation, which can be accessed through its own site: http://mastweb.stsci.edu/hcasjobs/


Imports

In [1]:
# Use the astroquery TapPlus library as our client to the data service.
from astroquery.utils.tap.core import TapPlus

## For handling ordinary astropy Tables in responses
from astropy.table import Table

# For displaying and manipulating some types of results
%matplotlib inline
import requests
import astropy
import numpy as np
import time
from matplotlib import pyplot as plt

# To allow display tweaks for wider response tables
from IPython.core.display import display
from IPython.core.display import HTML

# For the second example: kernel density estimates
from scipy.stats import gaussian_kde

Connecting, Finding, and Displaying Table Information

Connecting to a TAP Service

The TapPlus library is able to connect to any TAP service, given the "base" URL as noted in metadata registry resources describing the service. This is the URL for the newest version of the Hubble Source Catalog TAP service.

In [2]:
HSC_service = TapPlus(url="http://vao.stsci.edu/HSCTAP/tapservice.aspx")
Created TAP+ (v1.2.1) - Connection:
	Host: vao.stsci.edu
	Use HTTPS: False
	Port: 80
	SSL Port: 443

Querying for Table Schema Information

TAP services are self-describing, which means the service itself can be asked for its schema and documentation about it. Since the hubble Source Catalog does not follow a data model described by a standard, this is the best way to see what tables and columns we have available to then query based on geometry or other filters.

The main view for HSC, SumMagAper2CatView, is extremely wide, containing columns for all potential filters, each of which may have null data. So in showing our query results, we will cut off the display with "..." marks. You can change the 'if' line to show the rest of these columns.

In [3]:
HSC_tables = HSC_service.load_tables()
print('\n')
for table in HSC_tables:
    if( table.name == 'dbo.SumMagAper2CatView'):
        print(table)
        print('\n')
        for i, column in enumerate(table.columns):
            #only show the first 30 and last 10 columns
            if i < 30 or i > len(table.columns)-10:
                print(column.name)
            #skip display for the middle column names
            elif i == 30:
                print("...")
INFO: Retrieving tables... [astroquery.utils.tap.core]
INFO: Parsing tables... [astroquery.utils.tap.core]

KeyErrorTraceback (most recent call last)
<ipython-input-3-cec91d928ba8> in <module>
----> 1 HSC_tables = HSC_service.load_tables()
      2 print('\n')
      3 for table in HSC_tables:
      4     if( table.name == 'dbo.SumMagAper2CatView'):
      5         print(table)

/opt/conda/envs/notebooks_env/lib/python3.6/site-packages/astroquery/utils/tap/core.py in load_tables(self, only_names, include_shared_tables, verbose)
    807         return self._Tap__load_tables(only_names=only_names,
    808                                       include_shared_tables=include_shared_tables,  # noqa
--> 809                                       verbose=verbose)
    810 
    811     def load_data(self, params_dict=None, output_file=None, verbose=False):

/opt/conda/envs/notebooks_env/lib/python3.6/site-packages/astroquery/utils/tap/core.py in __load_tables(self, only_names, include_shared_tables, verbose)
    235         log.info("Parsing tables...")
    236         tsp = TableSaxParser()
--> 237         tsp.parseData(response)
    238         log.info("Done.")
    239         return tsp.get_tables()

/opt/conda/envs/notebooks_env/lib/python3.6/site-packages/astroquery/utils/tap/xmlparser/tableSaxParser.py in parseData(self, data)
     65         del self.__tables[:]
     66         self.__status = READING_SCHEMA
---> 67         xml.sax.parse(data, self)
     68         return self.__tables
     69 

/opt/conda/envs/notebooks_env/lib/python3.6/xml/sax/__init__.py in parse(source, handler, errorHandler)
     31     parser.setContentHandler(handler)
     32     parser.setErrorHandler(errorHandler)
---> 33     parser.parse(source)
     34 
     35 def parseString(string, handler, errorHandler=ErrorHandler()):

/opt/conda/envs/notebooks_env/lib/python3.6/xml/sax/expatreader.py in parse(self, source)
    109             self.reset()
    110             self._cont_handler.setDocumentLocator(ExpatLocator(self))
--> 111             xmlreader.IncrementalParser.parse(self, source)
    112         except:
    113             # bpo-30264: Close the source on error to not leak resources:

/opt/conda/envs/notebooks_env/lib/python3.6/xml/sax/xmlreader.py in parse(self, source)
    123         buffer = file.read(self._bufsize)
    124         while buffer:
--> 125             self.feed(buffer)
    126             buffer = file.read(self._bufsize)
    127         self.close()

/opt/conda/envs/notebooks_env/lib/python3.6/xml/sax/expatreader.py in feed(self, data, isFinal)
    215             # document. When feeding chunks, they are not normally final -
    216             # except when invoked from close.
--> 217             self._parser.Parse(data, isFinal)
    218         except expat.error as e:
    219             exc = SAXParseException(expat.ErrorString(e.code), e, self)

/tmp/build/80754af9/python_1564510748219/work/Modules/pyexpat.c in StartElement()

/opt/conda/envs/notebooks_env/lib/python3.6/xml/sax/expatreader.py in start_element(self, name, attrs)
    331     # event handlers
    332     def start_element(self, name, attrs):
--> 333         self._cont_handler.startElement(name, AttributesImpl(attrs))
    334 
    335     def end_element(self, name):

/opt/conda/envs/notebooks_env/lib/python3.6/site-packages/astroquery/utils/tap/xmlparser/tableSaxParser.py in startElement(self, name, attrs)
     72             self.__reading_schema(name, attrs)
     73         elif self.__status == READING_TABLE:
---> 74             self.__reading_table(name, attrs)
     75         elif self.__status == READING_TABLE_COLUMN:
     76             self.__reading_table_column(name, attrs)

/opt/conda/envs/notebooks_env/lib/python3.6/site-packages/astroquery/utils/tap/xmlparser/tableSaxParser.py in __reading_table(self, name, attrs)
    108         elif self.__check_item_id("column", name):
    109             self.__status = READING_TABLE_COLUMN
--> 110             self.__currentColumn = TapColumn(attrs.getValue('esatapplus:flags'))
    111 
    112     def __end_table(self, name):

/opt/conda/envs/notebooks_env/lib/python3.6/xml/sax/xmlreader.py in getValue(self, name)
    291 
    292     def getValue(self, name):
--> 293         return self._attrs[name]
    294 
    295     def getValueByQName(self, name):

KeyError: 'esatapplus:flags'

Querying for Data

As noted above, this view contains every filter known in the HSC, and can return each even if it is NULL for the given match (this can cause warnings in astroquery). In order to narrow results, one could query on individual filters where their value is not null, or only return certain of them.

Here we are searching for every row with data within a .1 degree circle of RA=129.23 and Dec=7.95, and returning a few columns to get an idea of what we have available.

In [ ]:
job = HSC_service.launch_job("""
SELECT TOP 10 MatchRA, MatchDec, TargetName, StartTime, StopTime
FROM dbo.SumMagAper2CatView
WHERE CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',129.23,7.95,0.1))=1
  """)
HSC_results = job.get_results()
HSC_results

We can also filter by start/stop time or any other column in the view:

In [ ]:
job = HSC_service.launch_job("""
SELECT TOP 10 MatchID, MatchRA, MatchDec, TargetName, StartTime, StopTime, TargetName 
FROM dbo.SumMagAper2CatView
WHERE 
CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',129.23,7.95,0.1))=1
AND StartTime > '2015-01-01' AND StopTime < '2015-04-01'
""")
HSC_results = job.get_results()
HSC_results

Use Case: Plotting a light curve for the most variable object in a field

A use case example: search for objects with 10 or more ACS F475W magnitudes in a crowded field near IC 1613 (see HSC Use Case 3). Then get the individual A_F475W measurements for the most variable object in the list and plot the light curve. Note we are using asynchronous query mode for this example rather than synchronous, because it has a longer allowed timeout, which can be useful for large or complex queries.

In [ ]:
job = HSC_service.launch_job_async("""
SELECT MatchID, MatchRA, MatchDec, TargetName, NumImages, NumVisits, A_F475W, A_F475W_MAD, A_F475W_N
FROM dbo.SumMagAper2CatView
WHERE 
   A_F475W_N >= 10
   AND
   CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',16.117562,2.162183,0.1))=1
   """)
HSC_results = job.get_results()
HSC_results

plt.rcParams.update({'font.size': 16})
plt.figure(1,(10,6))
plt.scatter(HSC_results['A_F475W'], HSC_results['A_F475W_MAD'])
plt.xlabel('A_F475W')
plt.ylabel('A_F475W_MAD')
In [ ]:
madvalues = HSC_results['A_F475W_MAD']
i = np.argmax(madvalues)
print()
print(HSC_results[i])

matchid = HSC_results['MatchID'][i]
job = HSC_service.launch_job_async("""
SELECT SourceID, ImageID, SourceRA, SourceDec, D, Filter, Detector, MagAper2, StartMJD
FROM dbo.DetailedCatalog
WHERE 
   MatchID={}
   AND Detector='ACS/WFC' AND Filter='F475W' AND Det='Y'
ORDER BY StartMJD
""".format(matchid))
HSC_details = job.get_results()
HSC_details

plt.rcParams.update({'font.size': 16})
plt.figure(1,(10,6))
plt.scatter(HSC_details['StartMJD'], HSC_details['MagAper2'])
plt.xlabel('MJD')
plt.ylabel('A_F475W')

Use Case: Create a color magnitude diagram for the Small Magellanic Cloud

For another example of using data from a TAP service, we start by doing a search around the SMC with a .25 degree radius for objects with ACS F555W and F814W measurements. HSC TAP will limit us to 100k responses by default. Note this is a large query that can take over a minute to run. See HSC Use Case 2 for more details.

In [ ]:
t0 = time.time()

job = HSC_service.launch_job_async("""
SELECT MatchID, MatchRA, MatchDec, CI, A_F555W, A_F814W
FROM dbo.SumMagAper2CatView
WHERE A_F555W_N > 0 and A_F814W_N > 0
    AND CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',13.1866,-72.8286,0.25))=1
   """)
HSC_results = job.get_results()
print("Query completed in {:.1f} sec".format(time.time()-t0))
HSC_results

Next, plot the color-magnitude diagram for the ~100k points retrieved from the database. This uses kernel density estimate for the crowded plot. As a preview for the demo, we are only working with 1/25th of the data so that the notebook executes quickly. You can switch the commented-out lines to call gaussian_kde for the full plot, which can take a few minutes to complete.

In [ ]:
f555w = HSC_results['A_F555W']
f814w = HSC_results['A_F814W']
VminusI = f555w-f814w
CI = HSC_results['CI']
w = np.where((CI>0.9) & (CI<1.6) & (VminusI > -1.5) & (VminusI < 1.5))
print(len(w[0]),"points remaining after CI and V-I filtering")

# Calculate the point density
x = np.array(VminusI[w])
y = np.array(f555w[w])
xy = np.vstack([x,y])

t0 = time.time()

z = gaussian_kde(xy[:, ::25])(xy) #to do the KDE on only the full dataset, comment out this and uncomment below:
#z = gaussian_kde(xy)(xy) #uncomment this line to do the KDE on the full dataset

print("kde took {:.1f} sec".format(time.time()-t0))
In [ ]:
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]

plt.rcParams.update({'font.size': 16})
plt.figure(1,(12,10))
plt.scatter(x, y, c=z, s=2, edgecolor='', cmap='plasma')
plt.autoscale(tight=True)
plt.xlabel('V-I')
plt.ylabel('V')
plt.gca().invert_yaxis()
plt.colorbar()
plt.text(.17,.93,'{:d} stars in SMC'.format(len(x)),
       horizontalalignment='center',
       transform=plt.gca().transAxes)
#plt.savefig("smc_colormag.png")

Additional Resources

Table Access Protocol

Hubble Source Catalog v3

  • Catalog created at MAST by combining the tens of thousands of visit-based source lists in the Hubble Legacy Archive (HLA) into a single master catalog.
  • https://archive.stsci.edu/hst/hsc/

Astronomical Query Data Language (2.0)

TapPlus


About this Notebook

Authors: Rick White & Theresa Dower, STScI Archive Scientist & Software Engineer Updated On: 11/23/2018


STScI logo