Hubble Source Catalog API Notebook: SMC Color-Magnitude Diagram#

A new MAST interface supports queries to the current and previous versions of the Hubble Source Catalog. It allows searches of the summary table (with multi-filter mean photometry) and the detailed table (with all the multi-epoch measurements). It also has an associated API, which is used in this notebook.

This is based on part of HSC Use Case #2.

  • It searches the HSC for point-like objects in the Small Magellanic Cloud (SMC) with ACS/WFC V and I band measurements,

  • selects a subset of those objects in a V-I color range,

  • plots the positions of the objects on the sky, and

  • plots the color-magnitude diagram for the selected objects.

The whole process takes only about 2 minutes to complete.

Another notebook demonstrates other search capabilities of the API to find variable objects and plot their light curves. A more complex notebook that shows how to access the proper motion tables using the HSC API is also available.

Instructions:#

  • Complete the initialization steps described below.

  • Run the notebook.

Running the notebook from top to bottom takes about 2 minutes.

Table of Contents#

  • Initialization

  • Find variable objects the SMC

    • Use MAST name resolver

    • Search HSC summary table

    • Show object positions on the sky

    • Plot the color-magnitude diagram

Initialization #

Install Python modules#

  1. This notebook requires the use of Python 3.

  2. Modules can be installed with conda, if using the Anaconda distribution of python, or with pip.

    • If you are using conda, do not install / update / remove a module with pip, that exists in a conda channel.

    • If a module is not available with conda, then it’s okay to install it with pip

import astropy
from astropy.coordinates import SkyCoord
import time
import sys
import os
import requests
import json
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

from astropy.table import Table
import pandas as pd

from io import StringIO

from fastkde import fastKDE
from scipy.interpolate import RegularGridInterpolator
from astropy.modeling import models, fitting

# set width for pprint
astropy.conf.max_width = 150
/tmp/ipykernel_1961/2104996804.py:13: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
# set universal matplotlib parameters
plt.rcParams.update({'font.size': 16})

Useful functions#

  • Execute HSC searches and resolve names using MAST query.

  • Here we define several interrelated functions for retrieving information from the MAST API.

    • The hcvcone(ra, dec, radius [, keywords]) function searches the HCV catalog near a position.

    • The hcvsearch() function performs general non-positional queries.

    • The hcvmetadata() function gives information about the columns available in a table.

hscapiurl = "https://catalogs.mast.stsci.edu/api/v0.1/hsc"


def hsccone(ra, dec, radius, table="summary", release="v3", format="csv", magtype="magaper2",
            columns=None, baseurl=hscapiurl, verbose=False, **kw):
    """Do a cone search of the HSC catalog

    Parameters
    ----------
    ra (float): (degrees) J2000 Right Ascension
    dec (float): (degrees) J2000 Declination
    radius (float): (degrees) Search radius (<= 0.5 degrees)
    table (string): summary, detailed, propermotions, or sourcepositions
    release (string): v3 or v2
    magtype (string): magaper2 or magauto (only applies to summary table)
    format: csv, votable, json, table
    columns: list of column names to include (None means use defaults)
    baseurl: base URL for the request
    verbose: print info about request
    **kw: other parameters (e.g., 'numimages.gte':2)
    """

    data = kw.copy()
    data['ra'] = ra
    data['dec'] = dec
    data['radius'] = radius
    return hscsearch(table=table, release=release, format=format, magtype=magtype,
                     columns=columns, baseurl=baseurl, verbose=verbose, **data)


def hscsearch(table="summary", release="v3", magtype="magaper2", format="csv",
              columns=None, baseurl=hscapiurl, verbose=False, **kw):
    """Do a general search of the HSC catalog (possibly without ra/dec/radius)

    Parameters
    ----------
    table (string): summary, detailed, propermotions, or sourcepositions
    release (string): v3 or v2
    magtype (string): magaper2 or magauto (only applies to summary table)
    format: csv, votable, json, table
    columns: list of column names to include (None means use defaults)
    baseurl: base URL for the request
    verbose: print info about request
    **kw: other parameters (e.g., 'numimages.gte':2).  Note this is required!
    """

    data = kw.copy()
    if not data:
        raise ValueError("You must specify some parameters for search")
    if format not in ("csv", "votable", "json", 'table'):
        raise ValueError("Bad value for format")
    if format == "table":
        rformat = "csv"
    else:
        rformat = format
    url = f"{cat2url(table, release, magtype, baseurl=baseurl)}.{rformat}"
    if columns:
        # check that column values are legal
        # create a dictionary to speed this up
        dcols = {}
        for col in hscmetadata(table, release, magtype)['name']:
            dcols[col.lower()] = 1
        badcols = []
        for col in columns:
            if col.lower().strip() not in dcols:
                badcols.append(col)
        if badcols:
            raise ValueError(f"Some columns not found in table: {', '.join(badcols)}")
        # two different ways to specify a list of column values in the API
        # data['columns'] = columns
        data['columns'] = f"[{','.join(columns)}]"

    # either get or post works
    # r = requests.post(url, data=data)
    r = requests.get(url, params=data)

    if verbose:
        print(r.url)
    r.raise_for_status()
    if format == "json":
        return r.json()
    elif format == "table":
        # use pandas to work around bug in Windows for ascii.read
        return Table.from_pandas(pd.read_csv(StringIO(r.text)))
    else:
        return r.text


def hscmetadata(table="summary", release="v3", magtype="magaper2", baseurl=hscapiurl):
    """Return metadata for the specified catalog and table
    
    Parameters
    ----------
    table (string): summary, detailed, propermotions, or sourcepositions
    release (string): v3 or v2
    magtype (string): magaper2 or magauto (only applies to summary table)
    baseurl: base URL for the request
    
    Returns an astropy table with columns name, type, description
    """
    url = f"{cat2url(table, release, magtype, baseurl=baseurl)}/metadata"
    r = requests.get(url)
    r.raise_for_status()
    v = r.json()
    # convert to astropy table
    tab = Table(rows=[(x['name'], x['type'], x['description']) for x in v],
                names=('name', 'type', 'description'))
    return tab


def cat2url(table="summary", release="v3", magtype="magaper2", baseurl=hscapiurl):
    """Return URL for the specified catalog and table
    
    Parameters
    ----------
    table (string): summary, detailed, propermotions, or sourcepositions
    release (string): v3 or v2
    magtype (string): magaper2 or magauto (only applies to summary table)
    baseurl: base URL for the request
    
    Returns a string with the base URL for this request
    """
    checklegal(table, release, magtype)
    if table == "summary":
        url = f"{baseurl}/{release}/{table}/{magtype}"
    else:
        url = f"{baseurl}/{release}/{table}"
    return url


def checklegal(table, release, magtype):
    """Checks if this combination of table, release and magtype is acceptable
    
    Raises a ValueError exception if there is problem
    """
    
    releaselist = ("v2", "v3")
    if release not in releaselist:
        raise ValueError(f"Bad value for release (must be one of {', '.join(releaselist)})")
    if release == "v2":
        tablelist = ("summary", "detailed")
    else:
        tablelist = ("summary", "detailed", "propermotions", "sourcepositions")
    if table not in tablelist:
        raise ValueError(f"Bad value for table (for {release} must be one of {', '.join(tablelist)})")
    if table == "summary":
        magtypelist = ("magaper2", "magauto")
        if magtype not in magtypelist:
            raise ValueError(f"Bad value for magtype (must be one of {', '.join(magtypelist)})")

Helper function for crowded scatterplots#

select_subset() selects a sample of points in crowded regions to speed up plotting. This converts the fastkde PDF values (which are probabilities per unit area using the x and y data units) to the expected number of points within a region covered by the marker on the plot. It then picks only a few points to plot in crowded regions while selecting all the points in uncrowded regions.

Inputs:

  • zs = array of PDF density values computed using fastkde

  • axes = axis range tuple from fastkde

  • markersize = matplotlib marker size (which is an area in points)

  • over = oversampling factor (empirically 5 fills in the holes)

  • verbose = print additional info

Returns an array of subscripts to plot from zs.

def select_subset(zs, kde_axes, fig, markersize=2, over=5, verbose=False):
    """
    Select a subset of points in crowded regions to speed up plotting
    Returns array of subscripts to plot from zs    
    """
    # get figure size in points
    figsize = fig.get_size_inches()*72.0
    # scale factor to convert PDF values to expected number of points in the marker area
    # divide ranges by 2 because fastkde pads the data ranges to allow FFTs
    range0 = 0.5*(kde_axes[0][-1]-kde_axes[0][0])
    range1 = 0.5*(kde_axes[1][-1]-kde_axes[1][0])
    ss = len(zs) * range0 * range1 * markersize / (over*figsize[0]*figsize[1])
    cc = (1.0/(ss*zs).clip(min=1)).cumsum()
    wsel = np.searchsorted(cc, np.arange(np.ceil(cc[-1]).astype(int)+1),
                           side='right').clip(max=len(zs)-1)
    if verbose:
        print(f"Plotting {len(wsel)} of {len(zs)} points")
    return wsel

Find objects in the SMC #

This is based on HSC Use Case #2, which includes an example of creating a color-magnitude diagram for the SMC using MAST CasJobs. This is simple to do using the HSC API.

Use astropy name resolver to get position of the SMC #

target = 'SMC'
coord_smc = SkyCoord.from_name(target)

ra_smc = coord_smc.ra.degree
dec_smc = coord_smc.dec.degree
print(f'ra: {ra_smc}\ndec: {dec_smc}')
ra: 13.15833333
dec: -72.80027778

Select objects with the desired magnitudes and colors near the SMC #

This searches the summary table for objects in a 3x3 degree box centered on the galaxy that has measurements in both ACS F555W and F814W. It computes the V-I color, and selects only objects in the range \(-1.5 < V-I < 1.5\).

This large query returns more than 700,000 objects, and takes about a minute to complete.

# save typing a quoted list of columns
columns = """MatchID,MatchRA,MatchDec,CI,A_F555W,A_F814W""".split(",")
columns = [x.strip() for x in columns]
columns = [x for x in columns if x and not x.startswith('#')]

# select objects with at least one ACS F555W and ACS F814W measurement
# and with concentration index 0.9 < CI < 1.6, consistent with point sources
# search a large 3x3 degree box in RA and Dec centered on the SMC
ddec = 1.5
dra = ddec/np.cos(np.radians(dec_smc))
constraints = {'A_F555W_N.gte': 1, 'A_F814W_N.gte': 1, 'CI.gt': 0.9, 'CI.lt': 1.6,
               'MatchDec.gt': dec_smc-ddec, 'MatchDec.lt': dec_smc+ddec,
               'MatchRA.gt': ra_smc-dra, 'MatchRA.lt': ra_smc+dra}

# do a search with a large number of rows allowed
t0 = time.time()
tab = hscsearch(table="summary", release='v3', columns=columns, verbose=True, pagesize=1000000, format='table', **constraints)

print("{:.1f} s: retrieved data and converted to {}-row astropy table".format(time.time()-t0, len(tab)))

# compute color column and select for objects in more limited color range
tab['V-I'] = tab['A_F555W'] - tab['A_F814W']
tab = tab[(tab['V-I'] < 1.5) & (tab['V-I'] > -1.5)]
print(f"{(time.time()-t0):.1f} s: selected {len(tab)} objects with -1.5 < V-I < 1.5")

# clean up the output format
tab['A_F555W'].format = "{:.3f}"
tab['A_F814W'].format = "{:.3f}"
tab['V-I'].format = "{:.3f}"
tab['CI'].format = "{:.3f}"
tab['MatchRA'].format = "{:.6f}"
tab['MatchDec'].format = "{:.6f}"
tab
https://catalogs.mast.stsci.edu/api/v0.1/hsc/v3/summary/magaper2.csv?pagesize=1000000&A_F555W_N.gte=1&A_F814W_N.gte=1&CI.gt=0.9&CI.lt=1.6&MatchDec.gt=-74.30027778&MatchDec.lt=-71.30027778&MatchRA.gt=8.085683150035852&MatchRA.lt=18.230983509964148&columns=%5BMatchID%2CMatchRA%2CMatchDec%2CCI%2CA_F555W%2CA_F814W%5D
43.8 s: retrieved data and converted to 764094-row astropy table
43.8 s: selected 761835 objects with -1.5 < V-I < 1.5
Table length=761835
MatchIDMatchRAMatchDecCIA_F555WA_F814WV-I
int64float64float64float64float64float64float64
5516350914.991703-73.1958591.03026.17825.6250.553
5516351112.911687-73.1297021.15021.95421.9060.047
2913677815.938170-72.8220851.03324.79924.5030.295
9915863313.334587-72.1869471.01622.36822.2810.088
10695431315.174259-72.4457231.47422.97223.670-0.697
10695434115.596673-72.3677230.93920.39019.8810.509
6292804312.428396-73.2926361.09422.11222.388-0.275
1060022617.046776-72.9796530.93325.65824.9220.736
4643539311.770928-73.4747211.05323.93123.6450.286
8942369212.888105-72.2640361.04923.40523.2850.120
.....................
323529214.684925-72.2010681.08124.86024.0430.817
323181311.757286-73.5172350.99823.16823.1010.067
322915513.354680-72.1956961.11220.58221.234-0.653
322527815.470807-72.8911031.05224.42224.0490.373
322412413.438907-72.1816070.96024.30224.0500.252
322180214.773816-72.1610391.46225.56325.0330.530
322181213.076247-72.9515150.90225.23124.7260.504
322101914.761594-72.1441430.97426.22725.3580.869
321837416.027455-72.8334021.01025.20124.8300.371
321601516.875679-72.3420951.13722.16721.8010.366

Plot object positions on the sky #

We mark the galaxy center as well. These fields are sprinkled all over the galaxy (as determined by the HST proposals).

Note there are 761835 observations in tab. Each observation is plotted as a small point (.), so the fields in the plot represent thousands of observations.

# convert to pandas for easier manipulation
obs = tab[['MatchRA', 'MatchDec']].to_pandas()

# round
obs.MatchRA = obs.MatchRA.round(0)
obs.MatchDec = obs.MatchDec.round(2)

# get the value counts, to "roughly" demonstrate the observations per plotted square cluster
obs.value_counts()
MatchRA  MatchDec
13.0     -73.10      14552
         -73.09      13337
15.0     -72.16      11778
         -72.15      11527
13.0     -73.11      11149
                     ...  
         -73.03          9
12.0     -72.56          9
15.0     -73.22          7
14.0     -72.53          6
12.0     -72.54          2
Name: count, Length: 214, dtype: int64
fig, ax = plt.subplots(figsize=(10, 10))
# adjust markersize (0-1) and alpha (0-1) as desired
ax.plot('MatchRA', 'MatchDec', 'bo', markersize=0.005, alpha=0.4, data=tab, label=f'{len(tab)} HSC measurements')
ax.plot(ra_smc, dec_smc, 'rx', label=target, markersize=10)
ax.invert_xaxis()
ax.set_aspect(1.0/np.cos(np.radians(dec_smc)))
ax.set(xlabel='RA [deg]', ylabel='Dec [deg]')
ax.legend()
<matplotlib.legend.Legend at 0x7ffa6ed93c10>
../../../_images/71e285d67b473498c3d255efb15719390692a083df4f1449dd0dce941799545e.png

Plot the color-magnitude diagram #

This uses the fastkde module to get a kernel density estimate in order to plot a dense scatterplot.

# Calculate the point density
t0 = time.time()
x = tab['V-I']
y = tab['A_F555W']
myPDF, axes = fastKDE.pdf(x, y, numPoints=2**9+1)
print(f"kde took {(time.time()-t0):.1f} sec for {len(x)} points")

# interpolate to get z values at points
finterp = RegularGridInterpolator((axes[1], axes[0]), myPDF, bounds_error=False, fill_value=0.0)
z = finterp((y, x))

# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
xs, ys, zs = x[idx], y[idx], z[idx]
kde took 2.5 sec for 761835 points
fig, ax = plt.subplots(figsize=(12, 10), tight_layout=True)

# select a subset of points to plot
wsel = select_subset(zs, axes, fig, verbose=True)

sc = ax.scatter(xs[wsel], ys[wsel], c=zs[wsel], s=2, edgecolors='none', cmap='plasma')
ax.set(xlabel='V - I [mag]', ylabel='V [mag]',
       title=f'{len(tab):,} stars in the Small Magellanic Cloud',
       xlim=(-1.5, 1.5), ylim=(14, 27))
ax.invert_yaxis()
_ = fig.colorbar(sc, ax=ax)
Plotting 208665 of 761835 points
../../../_images/fe154406a0c62fb8cdba734d7c0a8f6e5529b8cec7929641535fea6d8f7061fe.png

About this Notebook#

If you have comments or questions on this notebook, please contact us through the Archive Help Desk e-mail at archive@stsci.edu.

Author(s): Rick White, Trenton McKinney
Last Updated: May 2023


Top of Page Space Telescope Logo