MAST Table Access Protocol Hubble Source Catalog Demo

HSC TAP Service Introduction

Table Access Protocol (TAP) services allow more direct and flexible access to astronomical data than the simpler types of IVOA standard data services. Queries are built with the SQL-like Astronomical Data Query Language (ADQL), and can include geographic/spatial queries as well as filtering on other characteristics of the data. This also allows the user fine-grained control over the returned columns, unlike the fixed set of coumns returned from cone, image, and spectral services.

For this example, we'll be using the astropy affiliated PyVO client, which is interoperable with other valid TAP services, including those at MAST. PyVO documentation is available at ReadTheDocs: https://pyvo.readthedocs.io

We'll be using PyVO to call the most recent version (3) of the Hubble Source Catalog TAP service at MAST. The schema is described within the service, and we'll show how to inspect it. The schema is also the same as the one available via the CasJobs interface, with an additional view added for the most common positional queries. CasJobs has its own copy of the schema documentation, which can be accessed through its own site: http://mastweb.stsci.edu/hcasjobs/


Imports

In [1]:
import pyvo as vo

## For handling ordinary astropy Tables in responses
from astropy.table import Table

# For displaying and manipulating some types of results
%matplotlib inline
import requests
import astropy
import numpy as np
import time
from matplotlib import pyplot as plt

# To allow display tweaks for wider response tables
from IPython.core.display import display
from IPython.core.display import HTML

# For the second example: kernel density estimates
from scipy.stats import gaussian_kde


# suppress unimportant unit warnings from many TAP services
import warnings
warnings.filterwarnings("ignore", module="astropy.io.votable.*")

ModuleNotFoundErrorTraceback (most recent call last)
<ipython-input-1-27ca0255115b> in <module>
----> 1 import pyvo as vo
      2 
      3 ## For handling ordinary astropy Tables in responses
      4 from astropy.table import Table
      5 

ModuleNotFoundError: No module named 'pyvo'

Connecting, Finding, and Displaying Table Information

Connecting to a TAP Service

The PyVO library is able to connect to any TAP service, given the "base" URL as noted in metadata registry resources describing the service. This is the URL for the newest version of the Hubble Source Catalog TAP service.

In [ ]:
HSC_service = vo.dal.TAPService("http://vao.stsci.edu/hsctap/tapservice.aspx")

Querying for Table Schema Information

TAP services are self-describing, which means the service itself can be asked for its schema and documentation about it. Since the Hubble Source Catalog does not follow a data model described by a standard, this is the best way to see what tables and columns we have available to then query based on geometry or other filters.

Note that several views for HSC, including the main science table SumMagAper2CatView, are extremely wide, containing columns for all potential filters, each of which may have null data. The ability to filter queries based on only columns we want, or that aren't null, makes TAP services more flexible and potentially faster.

In [ ]:
HSC_service.describe()
In [ ]:
HSC_tables = HSC_service.tables
for tablename in HSC_tables.keys():
    if not "tap_schema" in tablename:  
        HSC_tables[tablename].describe()
        print("Columns={}".format(sorted([k.name for k in HSC_tables[tablename].columns ])))
        print("----")

Querying for Data

As noted above, this view contains every filter known in the HSC, and can return each even if it is NULL for the given match (this can cause warnings in astroquery). In order to narrow results, one could query on individual filters where their value is not null, or only return certain of them.

Here we are searching for every row with data within a .1 degree circle of RA=129.23 and Dec=7.95, and returning a few columns to get an idea of what we have available.

In [ ]:
results = HSC_service.run_async("""
SELECT TOP 10 MatchRA, MatchDec, TargetName, StartTime, StopTime
FROM dbo.SumMagAper2CatView
WHERE CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',129.23,7.95,0.1))=1
  """)
results.to_table()

We can also filter by start/stop time or any other column in the view:

In [ ]:
results = HSC_service.run_async("""
SELECT TOP 10 MatchID, MatchRA, MatchDec, TargetName, StartTime, StopTime, TargetName 
FROM dbo.SumMagAper2CatView
WHERE 
CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',129.23,7.95,0.1))=1
AND StartTime > '2015-01-01' AND StopTime < '2015-04-01'
""")
results.to_table()

Use Case: Plotting a light curve for the most variable object in a field

A use case example: search for objects with 10 or more ACS F475W magnitudes in a crowded field near IC 1613 (see HSC Use Case 3). Then get the individual A_F475W measurements for the most variable object in the list and plot the light curve. Note we must use asynchronous query mode for this example rather than synchronous, because it has a longer allowed timeout, which can be useful for large or complex queries.

In [ ]:
results = HSC_service.run_async("""
SELECT MatchID, MatchRA, MatchDec, TargetName, NumImages, NumVisits, A_F475W, A_F475W_MAD, A_F475W_N
FROM dbo.SumMagAper2CatView
WHERE 
   A_F475W_N >= 10
   AND
   CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',16.117562,2.162183,0.1))=1
   """)
HSC_results = results.to_table()
HSC_results
In [ ]:
plt.rcParams.update({'font.size': 16})
plt.figure(1,(10,6))
plt.scatter(HSC_results['A_F475W'], HSC_results['A_F475W_MAD'])
plt.xlabel('A_F475W')
plt.ylabel('A_F475W_MAD')
In [ ]:
madvalues = HSC_results['A_F475W_MAD']
i = np.argmax(madvalues)
print()
print(HSC_results[i])

matchid = HSC_results['MatchID'][i]
details = HSC_service.run_async("""
SELECT SourceID, ImageID, SourceRA, SourceDec, D, Filter, Detector, MagAper2, StartMJD
FROM dbo.DetailedCatalog
WHERE 
   MatchID={}
   AND Detector='ACS/WFC' AND Filter='F475W' AND Det='Y'
ORDER BY StartMJD
""".format(matchid))
HSC_details = details.to_table()
HSC_details

plt.rcParams.update({'font.size': 16})
plt.figure(1,(10,6))
plt.scatter(HSC_details['StartMJD'], HSC_details['MagAper2'])
plt.xlabel('MJD')
plt.ylabel('A_F475W')

Use Case: Create a color magnitude diagram for the Small Magellanic Cloud

For another example of using data from a TAP service, we start by doing a search around the SMC with a .25 degree radius for objects with ACS F555W and F814W measurements. HSC TAP will limit us to 100k responses by default. Note this is a large query that can take over a minute to run. See HSC Use Case 2 for more details.

In [ ]:
t0 = time.time()

results = HSC_service.run_async("""
SELECT MatchID, MatchRA, MatchDec, CI, A_F555W, A_F814W
FROM dbo.SumMagAper2CatView
WHERE A_F555W_N > 0 and A_F814W_N > 0
    AND CONTAINS(POINT('ICRS', MatchRA, MatchDec),CIRCLE('ICRS',13.1866,-72.8286,0.25))=1
   """)
HSC_results = results.to_table()
print("Query completed in {:.1f} sec".format(time.time()-t0))
HSC_results

Next, plot the color-magnitude diagram for the ~100k points retrieved from the database. This uses kernel density estimate for the crowded plot. As a preview for the demo, we are only working with 1/25th of the data so that the notebook executes quickly. You can switch the commented-out lines to call gaussian_kde for the full plot, which can take a few minutes to complete.

In [ ]:
f555w = HSC_results['A_F555W']
f814w = HSC_results['A_F814W']
VminusI = f555w-f814w
CI = HSC_results['CI']
w = np.where((CI>0.9) & (CI<1.6) & (VminusI > -1.5) & (VminusI < 1.5))
print(len(w[0]),"points remaining after CI and V-I filtering")

# Calculate the point density
x = np.array(VminusI[w])
y = np.array(f555w[w])
xy = np.vstack([x,y])

t0 = time.time()

z = gaussian_kde(xy[:, ::25])(xy) #to do the KDE on only the full dataset, comment out this and uncomment below:
#z = gaussian_kde(xy)(xy) #uncomment this line to do the KDE on the full dataset

print("kde took {:.1f} sec".format(time.time()-t0))
In [ ]:
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]

plt.rcParams.update({'font.size': 16})
plt.figure(1,(12,10))
plt.scatter(x, y, c=z, s=2, edgecolor='', cmap='plasma')
plt.autoscale(tight=True)
plt.xlabel('V-I')
plt.ylabel('V')
plt.gca().invert_yaxis()
plt.colorbar()
plt.text(.17,.93,'{:d} stars in SMC'.format(len(x)),
       horizontalalignment='center',
       transform=plt.gca().transAxes)
#plt.savefig("smc_colormag.png")

Additional Resources

Table Access Protocol

Hubble Source Catalog v3

  • Catalog created at MAST by combining the tens of thousands of visit-based source lists in the Hubble Legacy Archive (HLA) into a single master catalog.
  • https://archive.stsci.edu/hst/hsc/

Astronomical Query Data Language (2.0)

PyVO


About this Notebook

Authors: Rick White & Theresa Dower, STScI Archive Scientist & Software Engineer Updated On: 01/08/2020


STScI logo