JWST SI Keyword Search for Exoplanet Spectra#


This tutorial will illustrate how to use the MAST API to search for JWST science data by values of FITS header keywords, and then retrieve all products for the corresponding observations. Searching by SI Keyword values and accessing all data products is not supported in the MAST Portal, nor with the astroquery.mast Observations class by itself. Rather, we will be using astroquery.mast’s Mast class to make direct calls to the MAST API.

Specifically, this tutorial will show you how to:

  • Use the Mast class of astroquery.mast to search for JWST science files by values of FITS header keywords

  • Construct a unique set of Observation IDs to perform a search with the astroquery.mast Observations class

  • Fetch the unique data products associated with the Observations

  • Filter the results for science products

  • Download a bash script to retrieve the filtered products

Here are key distinctions between the two search methods with astroquery.mast:
  • Advanced Search for Observations: Uses the Observations class to search for data products that match certain metadata values. The available metadata upon which to conduct such a search is limited to coordinates, timestamps, and a modest set of instrument configuration information. Returns MAST Observations objects, which are collections of all levels of products (all formats) and all ancillary data products.
  • SI Keyword Search: Uses the Mast class to search for FITS products that match values of user-specified keywords, where the set of possible keywords is very large. Returns only FITS products, and finds only the highest level of calibrated products (generally, L-2b and L-3).

Connecting files that match keyword values to observations is not difficult, but it is a little convoluted. First, we'll use the API to perform a Science Instrument (SI) Keyword Search to find matching product files. The names of these files contain the MAST Observation ID as a substring. Then we can use those IDs to perform an advanced Observation search for the matching Observations.
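As a rough illustration of the file-name-to-ID step, the sketch below derives candidate Observation IDs by dropping the final underscore-separated suffix from each file name. The assumption that the ID is everything before that last suffix, the helper name obs_id_from_filename, and the second and third file names are illustrative and not taken from this tutorial.

```python
# Illustrative file names; only the first appears elsewhere in this tutorial.
filenames = [
    "jw02734001001_04101_00001-seg004_nis_rate.fits",
    "jw02734001001_04101_00001-seg004_nis_rateints.fits",
    "jw02734001001_04101_00001-seg003_nis_rate.fits",
]


def obs_id_from_filename(filename):
    """Drop the trailing '_<suffix>.fits' part (assumed naming convention)."""
    return filename.rsplit("_", 1)[0]


# A set removes the duplicates produced by files that share an Observation.
obs_ids = sorted({obs_id_from_filename(name) for name in filenames})
print(obs_ids)
```

Three files collapse to two candidate IDs here, which is why the later Observation search is run on a unique set of IDs rather than on the raw file list.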

Here are the steps in the process:


  • Part I: Keyword Search for Exoplanet Spectra

  • Part II: Convert to Observation Search

  • Part III: Download Data Products

  • Additional Resources


The following packages are needed for this tutorial:

  • astropy.io allows us to open the .fits files that we download

  • astropy.table holds the results of our product query and finds the unique files

  • astropy.time creates Time objects and converts between time representations

  • astroquery.mast constructs the queries, retrieves tables of results, and retrieves data products

  • matplotlib.pyplot is a common plotting tool that we’ll use to look at our results

from astropy.io import fits
from astropy.table import unique, vstack
from astropy.time import Time
from astroquery.mast import Mast, Observations

import matplotlib.pyplot as plt

I: Keyword Search for Exoplanet Spectra#

This example shows how to search for NIRISS spectral time-series observations (TSO) of transiting exoplanets. The data are from Commissioning or Early Release Science programs, and are therefore public.

Specify Search Criteria#

The criteria for SI Keyword searches consist of FITS header keyword name/value pairs. Learn more about SI keywords from the JWST Keyword Dictionary, and about the supported set of keyword values that can be queried. With this kind of query it is necessary to construct a specific structure to hold the query parameters.

Two helper routines follow. The first translates a simple dictionary (one that is easy to customize in Python) to the required JSON-style syntax. The second creates a min/max pair of parameters for date-time stamps which, like all parameters that vary continuously, must be expressed as a range of values in a dictionary.

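def set_params(parameters):
    '''Translate a {name: values} dictionary to the MAST filter syntax'''
    return [{"paramName": p, "values": v} for p, v in parameters.items()]

def set_mjd_range(min_date, max_date):
    '''Set time range in MJD given limits expressed as ISO-8601 dates'''
    # Argument names avoid shadowing the built-in min() and max()
    return {
        "min": Time(min_date, format='isot').mjd,
        "max": Time(max_date, format='isot').mjd
    }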

Add a Date Range#

A date range is specified here (though it is not strictly needed) to illustrate how to express these common parameters. For historical reasons, the astroquery.mast parameter names for timestamps come in pairs: one with a similar name to the corresponding FITS keyword (e.g. date_obs), and another with the string _mjd appended (e.g. date_obs_mjd). The values are equivalent, but are expressed in ISO-8601 and MJD representations, respectively.

Change or add keywords and values in the keywords dictionary below to customize your criteria. Note that multiple discrete-valued parameters are given in a list. As a reminder, if you are unsure of your keyword and keyword value options, see the Field Descriptions of JWST Instrument Keywords and the JWST Keyword Dictionary.

# Looking for NIRISS SOSS commissioning and ERS data taken between June 1 and August 4, 2022
keywords = {'category': ['COM','ERS'],
            'exp_type': ['NIS_SOSS'],
            'tsovisit': ['T'],
            'date_obs_mjd': [set_mjd_range('2022-06-01','2022-08-04')]}

# Restructuring the keywords dictionary to the MAST syntax
params = {'columns': '*',
          'filters': set_params(keywords)}

The following cell displays the constructed parameter object to illustrate the syntax for the query, which is described formally here.
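To make the structure concrete without a running notebook, the sketch below rebuilds the same parameter object using only the standard library and prints it as JSON. The helper iso_date_to_mjd is a hypothetical stand-in for the astropy Time conversion used above; it assumes date-only inputs (no time of day) and counts whole days from the MJD epoch of 1858-11-17.

```python
import json
from datetime import date


def set_params(parameters):
    """Translate a {name: values} dictionary to the list-of-dicts filter syntax."""
    return [{"paramName": p, "values": v} for p, v in parameters.items()]


def iso_date_to_mjd(iso_date):
    """Hypothetical stand-in for Time(...).mjd: whole days since 1858-11-17."""
    return float((date.fromisoformat(iso_date) - date(1858, 11, 17)).days)


keywords = {
    "category": ["COM", "ERS"],
    "exp_type": ["NIS_SOSS"],
    "tsovisit": ["T"],
    # Continuous parameters are given as a min/max dictionary inside a list.
    "date_obs_mjd": [{"min": iso_date_to_mjd("2022-06-01"),
                      "max": iso_date_to_mjd("2022-08-04")}],
}

params = {"columns": "*", "filters": set_params(keywords)}
print(json.dumps(params, indent=2))
```

Each entry of the filters list pairs one paramName with its list of accepted values, and the date range appears as a single dictionary holding the min and max MJD values.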


III: Download the Data Products#

Next we’ll download the data products that are connected to each Observation. In order to do this, we’ll need to query for our desired data products using the list of Observations we obtained above.

Query for Data Products#

Here we take care to fetch the products from Observations a few at a time (in batches) to avoid server timeouts, which can happen if one or more of the matched Observations contains a large number of files. A larger batch size will execute faster, but increases the risk of a server timeout. A batch size of five is significantly faster than “one at a time”, while keeping the risk of timeout low.

The following bit of Python magic splits one long list into a list of smaller lists, each of which has a size no larger than batch_size.

batch_size = 5
batches = [matched_obs[i:i+batch_size] for i in range(0, len(matched_obs), batch_size)]
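To see what the slicing does, here is the same expression applied to a toy list of twelve placeholder IDs (the names in matched_ids are hypothetical and stand in for the matched Observations).

```python
# Twelve placeholder IDs standing in for the rows of matched Observations.
matched_ids = [f"obs_{n:02d}" for n in range(12)]

batch_size = 5
# Step through the list in strides of batch_size; the last slice may be shorter.
batches = [matched_ids[i:i + batch_size]
           for i in range(0, len(matched_ids), batch_size)]

for batch in batches:
    print(len(batch), batch)
```

The final batch simply holds whatever is left over, so no item is dropped or repeated and no special handling is needed when the list length is not a multiple of batch_size.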

Now fetch the constituent products in a list of tables.

t = [Observations.get_product_list(obs) for obs in batches]

We need to stack the individual tables and extract a unique set of file names. Observations often have many files in common (e.g., guide-star files) and this will avoid any duplicates.

products = unique(vstack(t), keys='productFilename')
print(f'  Number of unique products: {len(products)}')

Display the resulting list of files if you like.


Filter the Data Products#

If there is a subset of products of interest (or a set of products you would like to exclude), there are a number of ways to select it. The cell below applies a filter to select only calibration level 2 and 3 spectral products classified as SCIENCE, plus the INFO files that define product associations; it also excludes guide-star products. See the full set of Products Field Descriptions for all queryable fields.

# Retrieve level 2 and 3 SCIENCE and INFO products of type spectrum.
filtered_products = Observations.filter_products(products,
                                                 productType=['SCIENCE', 'INFO'],
                                                 dataproduct_type='spectrum',
                                                 calib_level=[2, 3])

Display selected columns of the filtered products, if you like.

filtered_products['description','dataURI', 'calib_level', 'size', 'proposal_id']
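The filtering step combines its criteria as AND between fields and OR within the list of values given for one field. The pure-Python sketch below mimics that behavior on a few made-up rows; filter_rows and the sample rows are hypothetical and are not part of astroquery.

```python
def filter_rows(rows, **filters):
    """Keep rows that satisfy every filter; a list matches if any value matches."""
    def matches(row):
        for column, accepted in filters.items():
            if not isinstance(accepted, (list, tuple)):
                accepted = [accepted]
            if row[column] not in accepted:
                return False
        return True
    return [row for row in rows if matches(row)]


# Made-up product rows using the column names from the filter cell above.
rows = [
    {"productFilename": "a_x1dints.fits", "productType": "SCIENCE",
     "dataproduct_type": "spectrum", "calib_level": 2},
    {"productFilename": "a_uncal.fits", "productType": "SCIENCE",
     "dataproduct_type": "spectrum", "calib_level": 1},
    {"productFilename": "a_asn.json", "productType": "INFO",
     "dataproduct_type": "spectrum", "calib_level": 3},
    {"productFilename": "gs_cal.fits", "productType": "AUXILIARY",
     "dataproduct_type": "image", "calib_level": 2},
]

kept = filter_rows(rows,
                   productType=["SCIENCE", "INFO"],
                   dataproduct_type="spectrum",
                   calib_level=[2, 3])
print([row["productFilename"] for row in kept])
```

The level 1 file fails the calib_level test and the image row fails two tests, so only the level 2 and 3 spectral SCIENCE and INFO rows remain.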

Download the Data Products#

We’ll go over your options for data downloads in the sections below. Note that for public data, you will not need to log in.

Optional: MAST Login#

If you intend to retrieve data that are protected by an Exclusive Access Period (EAP), you will need to be both authorized and authenticated. You can authenticate by presenting a valid Auth.MAST token with the login function. (See MAST User Accounts for more information about whether you need to log in.)

This step is unnecessary if you are only retrieving public data.

If you have arrived at this point, wish to retrieve EAP products, and have not established a token, you need to do the following:
  • Create a token here: Auth.MAST
  • Cut/paste the token string in response to the prompt that will appear when downloading the script.
Defining the token string as an environment variable will not work for an already-running notebook.
# Observations.login()

Retrieve Files#

Now let’s fetch the products. The example below shows how to retrieve a bash script (rather than downloading the files directly) to retrieve our entire list at once. Scripts are a much better choice if the number or size of files in the download manifest is large (more than 100 files or 10GB).

# Downloading via a bash script: curl_flag=True fetches a script instead of the files.

manifest = Observations.download_products(filtered_products,
                                          curl_flag=True)
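The rule of thumb above (more than 100 files or 10GB) can be written as a small check. This is only a sketch: prefer_script is a hypothetical helper, and it assumes file sizes are given in bytes and that 10GB means 10 × 10^9 bytes.

```python
def prefer_script(sizes_bytes, max_files=100, max_total_bytes=10 * 10**9):
    """Return True when the manifest exceeds either rule-of-thumb limit."""
    return len(sizes_bytes) > max_files or sum(sizes_bytes) > max_total_bytes


small_manifest = [10_000_000] * 20    # 20 files of about 10 MB each
large_manifest = [500_000_000] * 30   # 30 files of about 500 MB each (15 GB)

print(prefer_script(small_manifest), prefer_script(large_manifest))
```

The sizes list could be filled from the size column of the filtered products table, assuming that column is expressed in bytes.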

In the interest of time (and not crashing our servers), we will download one small product from our list above. Let’s download a reasonably sized (~10MB) file. The file we choose is raw spectral data, so additional extraction would be needed for scientific analysis.

# We are fixing the file for reproducibility

Let’s actually visualize the raw data from which you can extract the spectrum:

# Read in the "SCI" data from the fits file
sci = fits.getdata("jw02734001001_04101_00001-seg004_nis_rate.fits", 1)

plt.figure(figsize=(15, 10))

We are, in effect, seeing the cleaned spectrum on the detector; if you adjust the scaling you might be able to see the spectrum of order three in the lower left corner.

Additional Resources#

The links below take you to documentation that you might find useful when constructing queries.

About this notebook#

This notebook was developed by Archive Sciences Branch staff: chiefly Dick Shaw, with additional editing from Jenny Medina and Thomas Dutkiewicz.

For support, please contact the Archive HelpDesk at archive@stsci.edu, or through the JWST HelpDesk Portal.

Last updated: May 2023
