JWST SI Keyword Search for Exoplanet Spectra#

Introduction#

This tutorial will illustrate how to use the MAST API to search for JWST science data by values of FITS header keywords, and then retrieve all products for the corresponding observations. Searching by SI Keyword values and accessing all data products is not supported in the MAST Portal, nor with the astroquery.mast Observations class by itself. Rather, we will be using astroquery.mast’s Mast class to make direct calls to the MAST API.

Specifically, this tutorial will show you how to:

  • Use the Mast class of astroquery.mast to search for JWST science files by values of FITS header keywords

  • Construct a unique set of Observation IDs to perform a search with the astroquery.mast Observations class

  • Fetch the unique data products associated with the Observations

  • Filter the results for science products

  • Download a bash script to retrieve the filtered products

Here are key distinctions between the two search methods with astroquery.mast:
  • Advanced Search for Observations: Uses the Observations class to search for data products that match certain metadata values. The available metadata upon which to conduct such a search is limited to coordinates, timestamps, and a modest set of instrument configuration information. Returns MAST Observations objects, which are collections of all levels of products (all formats) and all ancillary data products.
  • SI Keyword Search: Uses the Mast class to search for FITS products that match values of user-specified keywords, where the set of possible keywords is very large. Returns only FITS products, and finds only the highest levels of calibrated products (generally, L-2b and L-3).

Connecting files that match keyword values to observations is not difficult, but it is a little convoluted. First, you’ll use the API to perform a Science Instrument (SI) Keyword Search to find matching product files. The names of these files contain the MAST Observation ID as a sub-string. Then we can use the IDs to perform an advanced Observation search for matching Observations.
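As a rough sketch of the name-to-ID step, the snippet below strips the trailing product suffix from a few file names. It assumes the suffix is everything after the last underscore, which holds for the exposure-level names shown here but is not guaranteed for every JWST product. The helper name `obs_id_from_filename` is made up for this illustration.

```python
def obs_id_from_filename(filename):
    """Drop the trailing '_<suffix>.<ext>' from a product file name.

    Assumption: the Observation ID is everything before the last underscore.
    """
    return filename.rsplit('_', 1)[0]


# File names taken from the product listings later in this tutorial
filenames = [
    'jw02734002001_04101_00001-seg003_nis_rate.fits',
    'jw02734002001_04101_00001-seg003_nis_x1dints.fits',
    'jw02734002001_04101_00001-seg002_nis_rateints.fits',
]

# A set removes the duplicates that arise when several files share an Observation
obs_ids = sorted({obs_id_from_filename(f) for f in filenames})
print(obs_ids)
```

Three file names collapse to two Observation IDs here, which is why a unique set is built before the Observation search.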

Here are the steps in the process:

Imports

Part I: Keyword Search for Exoplanet Spectra

Part II: Convert to Observation Search

Part III: Download Data Products

Additional Resources

Imports#

The following packages are needed for this tutorial:

  • astropy.io allows us to open the .fits files that we download

  • astropy.table holds the results of our product query and finds the unique files

  • astropy.time creates Time objects and converts between time representations

  • astroquery.mast constructs the queries, retrieves tables of results, and retrieves data products

  • matplotlib.pyplot is a common plotting tool that we’ll use to look at our results

from astropy.io import fits
from astropy.table import unique, vstack
from astropy.time import Time
from astroquery.mast import Mast, Observations

import matplotlib.pyplot as plt

I: Keyword Search for Exoplanet Spectra#

This example shows how to search for NIRISS spectral time-series observations (TSO) taken of transiting exoplanets. The data are from Commissioning or Early Release Science programs, and are therefore public.

Specify Search Criteria#

The criteria for SI Keyword searches consist of FITS header keyword name/value pairs. Learn more about SI keywords from the JWST Keyword Dictionary, and about the supported set of keyword values that can be queried. With this kind of query it is necessary to construct a specific structure to hold the query parameters.

The first of the following helper routines translates a simple dictionary (one that is easy to customize in Python) into the required JSON-style syntax. The second creates a min/max pair of parameters for date-time stamps, which, as with all parameters that vary continuously, must be expressed as a range of values in a dictionary.

def set_params(parameters):
    '''Translate a {name: values} dictionary into the MAST filter syntax'''
    return [{"paramName": p, "values": v} for p, v in parameters.items()]

def set_mjd_range(start, end):
    '''Set time range in MJD given limits expressed as ISO-8601 dates'''
    return {
        "min": Time(start, format='isot').mjd,
        "max": Time(end, format='isot').mjd
    }

Add a Date Range#

A date range is specified here (though it is not strictly needed) to illustrate how to express these common parameters. For historical reasons, the astroquery.mast parameter names for timestamps come in pairs: one with a similar name to the corresponding FITS keyword (e.g. date_obs), and another with the string _mjd appended (e.g. date_obs_mjd). The values are equivalent, but are expressed in ISO-8601 and MJD representations, respectively.

Change or add keywords and values to the keywords dictionary below to customize your criteria. Note that multiple, discrete-valued parameters are given in a list. As a reminder, if you are unsure of your keyword and keyword value options, see the Field Descriptions of JWST Instrument Keywords and JWST Keyword Dictionary.
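To see how the ISO-8601 and MJD representations relate, here is a small standard-library sketch of the conversion. It is not what astropy's Time does internally: it ignores leap seconds and time scales, so treat it as an illustration for dates at or near day boundaries. The function name `iso_to_mjd` is hypothetical.

```python
from datetime import datetime

# MJD counts days from 1858-11-17T00:00:00
MJD_EPOCH = datetime(1858, 11, 17)


def iso_to_mjd(iso_string):
    """Convert an ISO-8601 date or date-time string to MJD (leap seconds ignored)."""
    delta = datetime.fromisoformat(iso_string) - MJD_EPOCH
    return delta.total_seconds() / 86400.0


print(iso_to_mjd('2022-06-01'))   # 59731.0
print(iso_to_mjd('2022-08-04'))   # 59795.0
```

These two values are the min and max that appear in the constructed query parameters for the date range used in this tutorial.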

# Looking for NIRISS SOSS commissioning and ERS data taken between June 1st and August 4th
keywords = {'category': ['COM','ERS'],
            'exp_type': ['NIS_SOSS'],
            'tsovisit': ['T'],
            'date_obs_mjd': [set_mjd_range('2022-06-01','2022-08-04')]
           }

# Restructuring the keywords dictionary to the MAST syntax
params = {'columns': '*',
          'filters': set_params(keywords)
         }

The following cell displays the constructed parameter object to illustrate the syntax for the query, which is described formally here.

params
{'columns': '*',
 'filters': [{'paramName': 'category', 'values': ['COM', 'ERS']},
  {'paramName': 'exp_type', 'values': ['NIS_SOSS']},
  {'paramName': 'tsovisit', 'values': ['T']},
  {'paramName': 'date_obs_mjd', 'values': [{'min': 59731.0, 'max': 59795.0}]}]}

III: Download the Data Products#

Next we’ll download the data products that are connected to each Observation. In order to do this, we’ll need to query for our desired data products using the list of Observations we obtained above.

Query for Data Products#

Here we take care to fetch the products from Observations a few at a time (in batches) to avoid server timeouts. This can happen if there are a large number of files in one or more of the matched Observations. A larger batch size will execute faster, but increases the risk of a server timeout. A batch size of five is significantly faster than “one at a time”, while keeping the risk of timeout low.

The following bit of Python magic splits one long list into a list of smaller lists, each of which has a size no larger than batch_size.

batch_size = 5
batches = [matched_obs[i:i+batch_size] for i in range(0, len(matched_obs), batch_size)]
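To see what that slicing does, the same expression can be tried on a toy list. The twelve IDs below are dummy strings standing in for the matched Observations, and the names `toy_obs` and `toy_batches` are chosen so as not to overwrite the real variables.

```python
# Hypothetical stand-in for the matched Observations: twelve dummy IDs
toy_obs = [f'obs_{n:02d}' for n in range(12)]

batch_size = 5
toy_batches = [toy_obs[i:i + batch_size] for i in range(0, len(toy_obs), batch_size)]

# Every batch is full except possibly the last one
print([len(b) for b in toy_batches])   # [5, 5, 2]
```

No element is dropped or repeated: concatenating the batches gives back the original list.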

Now fetch the constituent products in a list of tables.

t = [Observations.get_product_list(obs) for obs in batches]

We need to stack the individual tables and extract a unique set of file names. Observations often have many files in common (e.g., guide-star files) and this will avoid any duplicates.

products = unique(vstack(t), keys='productFilename')
print(f'  Number of unique products: {len(products)}')
  Number of unique products: 850
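To make the de-duplication concrete without querying the archive, here is a plain-Python analogue of the stack-then-unique step. The toy "tables" are lists of dictionaries with invented obsID labels; only the file names echo the listings in this tutorial. The sketch keeps the first row seen for each productFilename and returns the rows ordered by that key, which we assume mirrors the default behavior of astropy's unique. The helper `stack_and_deduplicate` is hypothetical.

```python
def stack_and_deduplicate(tables, key='productFilename'):
    """Concatenate row lists, keeping the first row seen for each key value."""
    first_seen = {}
    for table in tables:
        for row in table:
            first_seen.setdefault(row[key], row)
    # Return the rows ordered by the key
    return [first_seen[k] for k in sorted(first_seen)]


# Two toy product lists that share a guide-star file
batch_a = [
    {'productFilename': 'jw02734002001_gs-track_2022172034940_cal.fits', 'obsID': 'A'},
    {'productFilename': 'jw02734002001_04101_00001-seg002_nis_rate.fits', 'obsID': 'A'},
]
batch_b = [
    {'productFilename': 'jw02734002001_gs-track_2022172034940_cal.fits', 'obsID': 'B'},
    {'productFilename': 'jw02734002001_04101_00001-seg003_nis_rate.fits', 'obsID': 'B'},
]

toy_products = stack_and_deduplicate([batch_a, batch_b])
print(len(toy_products))   # 3
```

Four input rows become three because the shared guide-star file is counted once.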

Display the resulting list of files if you like.

products
Table length=850
obsID | obs_collection | dataproduct_type | obs_id | description | type | dataURI | productType | productGroupDescription | productSubGroupDescription | productDocumentationURL | project | prvversion | proposal_id | productFilename | size | parent_obsid | dataRights | calib_level
str9 | str4 | str8 | str50 | str64 | str1 | str81 | str9 | str28 | str13 | str1 | str7 | str6 | str4 | str63 | int64 | str9 | str6 | int64
84343195 | JWST | image | jw01091002001_02101_00004-seg001_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_image2_00001_asn.json | INFO | -- | ASN | -- | CALJWST | 1.10.1 | 1091 | jw01091-o002_20240112t191830_image2_00001_asn.json | 1428 | 118370229 | PUBLIC | 2
84343178 | JWST | image | jw01091002001_02101_00003-seg001_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_image2_00002_asn.json | INFO | -- | ASN | -- | CALJWST | 1.10.1 | 1091 | jw01091-o002_20240112t191830_image2_00002_asn.json | 1428 | 118370229 | PUBLIC | 2
84343036 | JWST | image | jw01091002001_02101_00002-seg001_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_image2_00003_asn.json | INFO | -- | ASN | -- | CALJWST | 1.10.1 | 1091 | jw01091-o002_20240112t191830_image2_00003_asn.json | 1428 | 118370229 | PUBLIC | 2
84343098 | JWST | image | jw01091002001_02101_00001-seg001_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_image2_00004_asn.json | INFO | -- | ASN | -- | CALJWST | 1.10.1 | 1091 | jw01091-o002_20240112t191830_image2_00004_asn.json | 1428 | 118370229 | PUBLIC | 2
84343158 | JWST | spectrum | jw01091002001_03101_00001-seg005_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00002_asn.json | INFO | -- | ASN | -- | CALJWST | 1.12.5 | 1091 | jw01091-o002_20240112t191830_tso-spec2_00002_asn.json | 1898 | 118370229 | PUBLIC | 2
84343085 | JWST | spectrum | jw01091002001_03101_00001-seg004_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00003_asn.json | INFO | -- | ASN | -- | CALJWST | 1.12.5 | 1091 | jw01091-o002_20240112t191830_tso-spec2_00003_asn.json | 1898 | 118370229 | PUBLIC | 2
84343207 | JWST | spectrum | jw01091002001_03101_00001-seg003_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00004_asn.json | INFO | -- | ASN | -- | CALJWST | 1.12.5 | 1091 | jw01091-o002_20240112t191830_tso-spec2_00004_asn.json | 1898 | 118370229 | PUBLIC | 2
84343217 | JWST | spectrum | jw01091002001_03101_00001-seg002_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00005_asn.json | INFO | -- | ASN | -- | CALJWST | 1.12.5 | 1091 | jw01091-o002_20240112t191830_tso-spec2_00005_asn.json | 1898 | 118370229 | PUBLIC | 2
84343025 | JWST | spectrum | jw01091002001_03101_00001-seg001_nis | source/target (L3) : association generator | S | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00006_asn.json | INFO | -- | ASN | -- | CALJWST | 1.12.5 | 1091 | jw01091-o002_20240112t191830_tso-spec2_00006_asn.json | 1898 | 118370229 | PUBLIC | 2
118370229 | JWST | spectrum | jw01091-o002_t001_niriss_clear-gr700xd-substrip256 | source/target (L3) : association generator | D | mast:JWST/product/jw01091-o002_20240112t191830_tso3_00002_asn.json | INFO | Minimum Recommended Products | ASN | -- | CALJWST | 1.12.5 | 1091 | jw01091-o002_20240112t191830_tso3_00002_asn.json | 4796 | 118370229 | PUBLIC | 3
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172034940_cal.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | 1.10.1 | 2734 | jw02734002001_gs-track_2022172034940_cal.fits | 2494080 | 118972907 | PUBLIC | 2
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172034940_stream.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | -- | 2734 | jw02734002001_gs-track_2022172034940_stream.fits | 800640 | 118972907 | PUBLIC | 1
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172034940_uncal.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | -- | 2734 | jw02734002001_gs-track_2022172034940_uncal.fits | 915840 | 118972907 | PUBLIC | 1
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172035054_cal.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | 1.10.1 | 2734 | jw02734002001_gs-track_2022172035054_cal.fits | 2494080 | 118972907 | PUBLIC | 2
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172035054_stream.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | -- | 2734 | jw02734002001_gs-track_2022172035054_stream.fits | 800640 | 118972907 | PUBLIC | 1
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172035054_uncal.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | -- | 2734 | jw02734002001_gs-track_2022172035054_uncal.fits | 915840 | 118972907 | PUBLIC | 1
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172035743_cal.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | 1.10.1 | 2734 | jw02734002001_gs-track_2022172035743_cal.fits | 2502720 | 118972907 | PUBLIC | 2
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172035743_stream.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | -- | 2734 | jw02734002001_gs-track_2022172035743_stream.fits | 800640 | 118972907 | PUBLIC | 1
87599585 | JWST | image | jw02734002001_02101_00001-seg001_nis | FGS guide star tracking | S | mast:JWST/product/jw02734002001_gs-track_2022172035743_uncal.fits | AUXILIARY | -- | GS-TRACK | -- | CALJWST | -- | 2734 | jw02734002001_gs-track_2022172035743_uncal.fits | 918720 | 118972907 | PUBLIC | 1
118972907 | JWST | spectrum | jw02734-o002_t002_niriss_clear-gr700xd-substrip256 | source/target (L3) : association pool | D | mast:JWST/product/jw02734_20240112t200404_pool.csv | INFO | Minimum Recommended Products | POOL | -- | CALJWST | 1.12.5 | 2734 | jw02734_20240112t200404_pool.csv | 7985 | 118972907 | PUBLIC | 3

Filter the Data Products#

If there is a subset of products of interest (or a set of products you would like to exclude), there are a number of ways to select it. The cell below applies a filter to select only calibration level 2 and 3 spectral products classified as SCIENCE, plus the INFO files that define product associations; it also excludes guide-star products. See the full set of Products Field Descriptions for all queryable fields.

# Retrieve level 2 and 3 SCIENCE and INFO products of type spectrum.
filtered_products = Observations.filter_products(products,
                                                 productType=['SCIENCE', 'INFO'],
                                                 dataproduct_type='spectrum',
                                                 calib_level=[2, 3],
                                                )
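The way the criteria combine can be illustrated with a plain-Python analogue. The sketch assumes the usual filter semantics: a row must satisfy every keyword (AND), and a list of values means "any of these" (OR). The function `filter_rows` and the toy rows are invented for illustration and are not part of astroquery.

```python
def filter_rows(rows, **criteria):
    """Keep rows that satisfy every criterion (AND); a list of values means 'any of' (OR)."""
    def matches(row):
        for column, wanted in criteria.items():
            allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if row[column] not in allowed:
                return False
        return True
    return [row for row in rows if matches(row)]


# Toy rows patterned on the product table above
rows = [
    {'productType': 'SCIENCE', 'dataproduct_type': 'spectrum', 'calib_level': 2},
    {'productType': 'INFO', 'dataproduct_type': 'spectrum', 'calib_level': 3},
    {'productType': 'AUXILIARY', 'dataproduct_type': 'image', 'calib_level': 2},   # guide star
    {'productType': 'SCIENCE', 'dataproduct_type': 'spectrum', 'calib_level': 1},  # uncalibrated
]

kept = filter_rows(rows,
                   productType=['SCIENCE', 'INFO'],
                   dataproduct_type='spectrum',
                   calib_level=[2, 3])
print(len(kept))   # 2
```

The guide-star row drops out because its productType is AUXILIARY, which is why no explicit exclusion is needed in the real filter.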

Display selected columns of the filtered products, if you like.

filtered_products['description','dataURI', 'calib_level', 'size', 'proposal_id']
Table length=106
description | dataURI | calib_level | size | proposal_id
str64 | str81 | int64 | int64 | str4
source/target (L3) : association generator | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00002_asn.json | 2 | 1898 | 1091
source/target (L3) : association generator | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00003_asn.json | 2 | 1898 | 1091
source/target (L3) : association generator | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00004_asn.json | 2 | 1898 | 1091
source/target (L3) : association generator | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00005_asn.json | 2 | 1898 | 1091
source/target (L3) : association generator | mast:JWST/product/jw01091-o002_20240112t191830_tso-spec2_00006_asn.json | 2 | 1898 | 1091
source/target (L3) : association generator | mast:JWST/product/jw01091-o002_20240112t191830_tso3_00002_asn.json | 3 | 4796 | 1091
target (L3) : spectroscopic white-light curve | mast:JWST/product/jw01091-o002_t001_niriss_clear-gr700xd-substrip256_whtlt.ecsv | 3 | 96672 | 1091
exposure/target (L2b/L3): 1D extracted spectrum per integration | mast:JWST/product/jw01091-o002_t001_niriss_clear-gr700xd-substrip256_x1dints.fits | 3 | 776148480 | 1091
exposure (L2b): 3D calibrated exposure | mast:JWST/product/jw01091002001_03101_00001-seg001_nis_calints.fits | 2 | 2424464640 | 1091
exposure (L2c): 3D Calibrated data per integration | mast:JWST/product/jw01091002001_03101_00001-seg001_nis_o002_crfints.fits | 2 | 2424464640 | 1091
... | ... | ... | ... | ...
exposure (L2c): 3D Calibrated data per integration | mast:JWST/product/jw02734002001_04101_00001-seg002_nis_o002_crfints.fits | 2 | 1291996800 | 2734
exposure (L2a): 2D count rate averaged over integrations | mast:JWST/product/jw02734002001_04101_00001-seg002_nis_rate.fits | 2 | 10546560 | 2734
exposure (L2a): 3D countrate per integration | mast:JWST/product/jw02734002001_04101_00001-seg002_nis_rateints.fits | 2 | 1048636800 | 2734
exposure/target (L2b/L3): 1D extracted spectrum per integration | mast:JWST/product/jw02734002001_04101_00001-seg002_nis_x1dints.fits | 2 | 88646400 | 2734
exposure (L2b): 3D calibrated exposure | mast:JWST/product/jw02734002001_04101_00001-seg003_nis_calints.fits | 2 | 1040348160 | 2734
exposure (L2c): 3D Calibrated data per integration | mast:JWST/product/jw02734002001_04101_00001-seg003_nis_o002_crfints.fits | 2 | 1040348160 | 2734
exposure (L2a): 2D count rate averaged over integrations | mast:JWST/product/jw02734002001_04101_00001-seg003_nis_rate.fits | 2 | 10546560 | 2734
exposure (L2a): 3D countrate per integration | mast:JWST/product/jw02734002001_04101_00001-seg003_nis_rateints.fits | 2 | 838929600 | 2734
exposure/target (L2b/L3): 1D extracted spectrum per integration | mast:JWST/product/jw02734002001_04101_00001-seg003_nis_x1dints.fits | 2 | 70928640 | 2734
source/target (L3) : association pool | mast:JWST/product/jw02734_20240112t200404_pool.csv | 3 | 7985 | 2734

Download the Data Products#

We’ll go over your options for data downloads in the sections below. Note that for public data, you will not need to log in.

Optional: MAST Login#

If you intend to retrieve data that are protected by an Exclusive Access Period (EAP), you will need to be both authorized and authenticated. You can authenticate by presenting a valid Auth.MAST token with the login function. (See MAST User Accounts for more information about whether you need to log in.)

This step is unnecessary if you are only retrieving public data.

If you have arrived at this point, wish to retrieve EAP products, and have not established a token, you need to do the following:
  • Create a token here: Auth.MAST
  • Copy and paste the token string in response to the prompt that will appear when downloading the script.

Defining the token string as an environment variable will not work for an already-running notebook.

# Observations.login()

Retrieve Files#

Now let’s fetch the products. The example below shows how to retrieve a bash script (rather than direct file download) to retrieve our entire list at once. Scripts are a much better choice if the number or size of files in the download manifest is large (more than 100 files or 10GB).

# Downloading via a bash script.

manifest = Observations.download_products(filtered_products,
                                          curl_flag=True
                                         )
Downloading URL https://mast.stsci.edu/api/v0.1/Download/bundle.sh to ./mastDownload_20240220205911.sh ...
 [Done]
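The 100-file and 10 GB rule of thumb mentioned above can be turned into a quick check before choosing a download method. The helper `prefer_script` and its default thresholds are a sketch based on that rule, not an astroquery feature; the sizes are a handful of values copied from the filtered product listing.

```python
def prefer_script(sizes_in_bytes, max_files=100, max_bytes=10e9):
    """Return True when a download exceeds either rule-of-thumb limit."""
    return len(sizes_in_bytes) > max_files or sum(sizes_in_bytes) > max_bytes


# Sizes (bytes) of a few of the filtered products listed above
sizes = [2424464640, 2424464640, 776148480, 1291996800, 1048636800]

total_gb = sum(sizes) / 1e9
print(f'{len(sizes)} files, {total_gb:.2f} GB, script preferred: {prefer_script(sizes)}')
```

With a real product table, the same function should accept the size column directly, for example prefer_script(filtered_products['size']).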

In the interest of time (and not crashing our servers), we will download one small product from our list above. Let’s download a reasonably sized (~10MB) file. The file we choose is raw spectral data, so additional extraction would be needed for scientific analysis.

# We are fixing the file for reproducibility
Observations.download_file("mast:JWST/product/jw02734001001_04101_00001-seg004_nis_rate.fits");
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02734001001_04101_00001-seg004_nis_rate.fits to /home/runner/work/mast_notebooks/mast_notebooks/notebooks/JWST/SI_keyword_exoplanet_search/jw02734001001_04101_00001-seg004_nis_rate.fits ...
 [Done]

Let’s actually visualize the raw data from which you can extract the spectrum:

# Read in the "SCI" data from the fits file
sci = fits.getdata("jw02734001001_04101_00001-seg004_nis_rate.fits", 1)

plt.figure(figsize=(15, 10))
plt.imshow(sci);

We are, in effect, seeing the cleaned spectrum on the detector; if you adjust the scaling you might be able to see the spectrum of order three in the lower left corner.
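One common way to adjust the scaling is to clip the display range to percentiles of the pixel values, so that a few very bright pixels do not wash out faint features. The sketch below computes nearest-rank percentile limits in plain Python on a small synthetic array, so it runs without extra dependencies. The function name `percentile_limits` and the 1%/99% defaults are arbitrary choices for illustration.

```python
import math


def percentile_limits(image, low=1.0, high=99.0):
    """Nearest-rank percentile limits of a 2D list of numbers, ignoring NaNs."""
    values = sorted(v for row in image for v in row if not math.isnan(v))

    def pick(percent):
        index = round(percent / 100.0 * (len(values) - 1))
        return values[index]

    return pick(low), pick(high)


# Synthetic 10x10 "detector": a smooth ramp plus one very bright pixel and one NaN
image = [[float(x + 10 * y) for x in range(10)] for y in range(10)]
image[0][0] = float('nan')
image[9][9] = 1.0e6

vmin, vmax = percentile_limits(image)
print(vmin, vmax)
```

For the real data, limits like these can be passed to the plot as plt.imshow(sci, vmin=vmin, vmax=vmax); with NumPy available, np.nanpercentile(sci, [1, 99]) gives comparable limits far more efficiently.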

Additional Resources#

The links below take you to documentation that you might find useful when constructing queries.

About this notebook#

This notebook was developed by Archive Sciences Branch staff: chiefly Dick Shaw, with additional editing from Jenny Medina and Thomas Dutkiewicz.

For support, please contact the Archive HelpDesk at archive@stsci.edu, or through the JWST HelpDesk Portal.

Last updated: May 2023
