MAST Query#

Use case: Perform MAST queries for NIRSpec data.
Data: None
Tools: astroquery.
Cross-instrument: all instruments.
Documentation: This notebook is part of STScI’s larger post-pipeline Data Analysis Tools Ecosystem.

By the end of this tutorial, you will:

1) Be able to perform MAST queries for NIRSpec data 
2) Understand MAST search options for NIRSpec MOS, IFU, and Fixed Slit data
3) Filter MAST queries by various parameters
4) Perform potential duplication checks for your targets

Introduction#

MAST (Barbara A. Mikulski Archive for Space Telescopes) provides a means of retrieving data through a variety of methods. The MAST Portal allows users to interactively search for particular observations and filter by keywords. An in-depth tutorial is available to aid users in accessing MAST in this manner. In some scenarios, particularly with large queries, it may be more efficient to use Astroquery. Some broad capabilities including positional queries, metadata queries, and catalog queries are described in the documentation. General documentation on the MAST API is also available.

One reason users may want to search the archive is to find potential duplicate observations for their proposals. As described in the JWST Duplicate Observations Policy, an observation may duplicate another if it observes the same astronomical source or field with the same instrument and with similar sensitivity and spectral range. More instrument-specific details can be found in the documentation.

The following topics are covered in this notebook:

  1. How to submit a NIRSpec MAST Query using Python

    • How to perform a broad search for all NIRSpec Data

    • How to perform a search using one query filter parameter

    • How to perform a search using multiple query filter parameters

    • Information on available search parameters

  2. How to perform checks for potential duplication issues with any given targets by comparing with pre-existing MAST Data (NOTE: This may not include scheduled observations, and results should be confirmed in APT)

    • Define an input catalog describing a potential observation

    • Query MAST within a search radius for observations with overlapping wavelengths and similar exposure times

    • Examine potential for duplicate observations

Although this notebook is designed for NIRSpec users, other JWST instruments follow similar structures for MAST queries. More information that may be relevant to additional instruments can be found on the Astroquery web page.

Main Content#

# Import MAST
from astroquery.mast import Mast

Perform a search for all NIRSpec Data#

This first example demonstrates how to use Mast.service_request_async(), which builds and executes a Mashup query based upon a service and parameters. More information and a description of the options can be found in the documentation. The response can be parsed into a list of filenames, as shown below.

# Set up the service for whichever instrument you want to query. Format with the first letter capitalized only.
service = "Mast.Jwst.Filtered.Nirspec"

# Enter query parameters, or leave empty (as shown here) to retrieve all data for that instrument.
# Leaving 'columns' as an asterisk includes all data columns. Replacing 'columns' with COUNT_BIG(*) will 
#     return the number of files in the search.
parameters = {"columns": "*",
               "filters": []}

# Retrieve the MAST response (this step may take a few seconds)
response = Mast.service_request_async(service, parameters)
# Gather the results
results = response[0].json()['data']
# Take the filename from each result if you prefer a list of filenames
filenames = []
for result in results:
    filename = result['filename'].split('.')[0]
    filenames.append(filename)
# Print the first ten filenames
print(filenames[:10])
['jw01864-c1003_t003_nirspec_g395h-f290lp_s3d', 'jw01864-c1003_t003_nirspec_g395h-f290lp_x1d', 'jw01864-c1003_t002_nirspec_g395h-f290lp_s3d', 'jw01864-c1003_t002_nirspec_g395h-f290lp_x1d', 'jw01864-c1000_t002_nirspec_g395h-f290lp_s3d', 'jw01864-c1000_t002_nirspec_g395h-f290lp_x1d', 'jw01947-c1008_t008_nirspec_g395m-f290lp_s3d', 'jw01947-c1008_t008_nirspec_g395m-f290lp_x1d', 'jw01893-c1003_t004_nirspec_prism-clear_s3d', 'jw01893-c1003_t004_nirspec_prism-clear_x1d']
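
The filenames printed above begin with `jw` followed by a five-digit program ID and end with a product-type suffix such as `s3d` or `x1d`. As a minimal sketch, assuming that pattern holds for the files of interest, the results can be tallied by program and by product type. The helper name `summarize_filenames` is hypothetical, and the sample list reuses filenames shown in this notebook rather than making a MAST call.

```python
from collections import Counter


def summarize_filenames(filenames):
    """Count filenames per program ID and per product-type suffix."""
    programs = Counter()
    suffixes = Counter()
    for name in filenames:
        # The five characters after the 'jw' prefix hold the program ID
        programs[name[2:7]] += 1
        # The text after the last underscore is the product type (e.g. 's3d')
        suffixes[name.rsplit('_', 1)[-1]] += 1
    return programs, suffixes


# Stand-in data copied from outputs shown in this notebook
sample = [
    'jw01864-c1003_t003_nirspec_g395h-f290lp_s3d',
    'jw01864-c1003_t003_nirspec_g395h-f290lp_x1d',
    'jw01947-c1008_t008_nirspec_g395m-f290lp_s3d',
    'jw04291007001_03101_00001_nrs1_rate',
]
programs, suffixes = summarize_filenames(sample)
print(dict(programs))
print(dict(suffixes))
```

The same function could be applied to the full `filenames` list built in the cell above.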

Use one query parameter:#

# Enter one query parameter:
service = "Mast.Jwst.Filtered.Nirspec"
one_parameter = {"columns": "*",
               "filters": [{"paramName": "exp_type",
                            "values": ['NRS_MSASPEC']
                            }]
             }
response_one_parameter = Mast.service_request_async(service, one_parameter)
results_one_parameter = response_one_parameter[0].json()['data']
# Print the first ten filenames
for result_one_parameter in results_one_parameter[:10]:
    print(result_one_parameter['filename'].split('.')[0])
jw01181-c1009_s00761_nirspec_f170lp-g235m_s2d
jw01181-c1009_s00761_nirspec_f170lp-g235m_x1d
jw01181-c1009_s00761_nirspec_f290lp-g395m_s2d
jw01181-c1009_s00761_nirspec_f290lp-g395m_x1d
jw01181-c1009_s00849_nirspec_f170lp-g235m_s2d
jw01181-c1009_s00849_nirspec_f170lp-g235m_x1d
jw01181-c1009_s00849_nirspec_f290lp-g395m_s2d
jw01181-c1009_s00849_nirspec_f290lp-g395m_x1d
jw01181-c1009_s00917_nirspec_f170lp-g235m_s2d
jw01181-c1009_s00917_nirspec_f170lp-g235m_x1d

Use multiple query parameters:#

# Use multiple query parameters
service = "Mast.Jwst.Filtered.Nirspec"
multiple_parameters = {"columns": "*",
               "filters": [{"paramName": "apername",
                            "values": ['NRS1_FULL', 'NRS_FULL_MSA']
                            },
                            {"paramName": "detector",
                             "values": ['NRS1']
                            },
                            {"paramName": "filter",
                             "values": ['F290LP', 'F170LP']
                            },
                            {"paramName": "exp_type",
                             "values": ['NRS_IFU', 'NRS_MSASPEC', 'NRS_BRIGHTOBJ','NRS_WATA']
                            },
                            {"paramName": "readpatt",
                             "values": ['NRS', 'NRSRAPID']
                            }
                            ]}
response_multiple_parameters = Mast.service_request_async(service, multiple_parameters)
results_multiple_parameters = response_multiple_parameters[0].json()['data']
# Print the first ten filenames
for result_multiple_parameters in results_multiple_parameters[:10]:
    # In addition to the filename, additional parameters can be printed. See below for a list of available options.
    print(result_multiple_parameters['filename'].split('.')[0] + '     ' + result_multiple_parameters['visitype'] )
jw04291007001_03101_00001_nrs1_rate     PRIME_TARGETED_FIXED
jw04291007001_03101_00001_nrs1_rateints     PRIME_TARGETED_FIXED

A list of available filtering parameters in the Archive:#

The following parameters are available as filters in the MAST query service. To use one, supply its name as the 'paramName' and the options you would like to match as the 'values'.

Any of these keys can also be used to access a component of a result other than the filename. For instance, instead of printing the filename with result['filename'] as shown above, one could print the PI name with result['pi_name'].
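
Writing the nested 'paramName'/'values' structure by hand becomes repetitive when many filters are involved. The sketch below shows one way to generate it from keyword arguments. The function name `build_parameters` is hypothetical and not part of astroquery; it only assembles the same dictionary layout used in the cells above.

```python
def build_parameters(columns="*", **filters):
    """Build the parameters dictionary for a filtered MAST service request."""
    filter_list = []
    for param_name, values in filters.items():
        # Accept either a single value or a list of values
        if isinstance(values, (str, int, float)):
            values = [values]
        filter_list.append({"paramName": param_name, "values": list(values)})
    return {"columns": columns, "filters": filter_list}


parameters = build_parameters(exp_type="NRS_MSASPEC",
                              filter=["F290LP", "F170LP"])
print(parameters)
```

The resulting dictionary can be passed as the second argument to Mast.service_request_async() in the same way as the hand-written dictionaries above.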

# Dictionary keys of all of the NIRSpec data (without any filters):
all_keys = []
for result in results:
    for key in result.keys():
        # Compare whole key names; a substring test on a joined string could skip keys
        if key not in all_keys:
            all_keys.append(key)
print(", ".join(all_keys))  # print all keys separated by commas
ArchiveFileID, filename, fileSetName, productLevel, act_id, apername, asnpool, asntable, bartdelt, bendtime, bkgdtarg, bkglevel, bkgsub, bmidtime, bstrtime, category, cont_id, datamode, dataprob, date, date_mjd, date_end, date_end_mjd, date_obs, date_obs_mjd, detector, drpfrms1, drpfrms3, duration, effexptm, effinttm, eng_qual, exp_type, expcount, expend, expmid, exposure, expripar, expstart, fastaxis, filter, frmdivsr, gainfact, gdstarid, groupgap, gs_dec, gs_mag, gs_order, gs_ra, gsendtim, gsendtim_mjd, gsstrttm, gsstrttm_mjd, gs_udec, gs_umag, gs_ura, helidelt, hendtime, hga_move, hmidtime, hstrtime, instrume, intarget, is_psf, lamp, mu_dec, mu_epoch, mu_epoch_mjd, mu_ra, nexposur, nextend, nframes, ngroups, nints, nresets, nrststrt, nsamples, numdthpt, nwfsest, obs_id, observtn, obslabel, origin, pcs_mode, pi_name, pps_aper, prd_ver, program, prop_dec, prop_ra, pwfseet, readpatt, sca_num, scicat, sdp_ver, selfref, seq_id, slowaxis, subarray, subcat, subsize1, subsize2, substrt1, substrt2, targ_dec, targ_ra, targname, targoopp, targprop, targtype, targudec, targura, telescop, template, tframe, tgroup, timesys, title, tsample, tsovisit, visit_id, visitend, visitend_mjd, visitgrp, visitsta, visitype, vststart, vststart_mjd, xoffset, yoffset, zerofram, errtype, rois, roiw, wpower, wtype, datamodl, exp_only, exsegnum, exsegtot, intstart, intend, date_beg, date_beg_mjd, obsfoldr, sctarate, opmode, osf_file, expsteng, expsteng_mjd, masterbg, scatfile, srctyapt, tcatfile, texptime, patt_num, pattsize, patttype, pridtpts, subpxpts, crowdfld, engqlptg, oss_ver, noutputs, gs_v3_pa, dirimage, pixfrac, pxsclrt, segmfile, va_dec, va_ra, compress, bkgmeth, targcat, targdesc, gsc_ver, primecrs, extncrs, tmeasure, s_region, cal_ver, cal_vcs, crds_ctx, crds_ver, fcsrlpos, focuspos, fxd_slit, grating, gwa_pxav, gwa_pyav, gwa_tilt, gwa_xp_v, gwa_xtil, gwa_yp_v, gwa_ytil, is_imprt, msaconid, msametfl, msametid, msastate, preimage, rma_pos, subpxpat, nrs_norm, nrs_ref, pattstrt, 
dithpnts, nod_type, spat_num, spec_num, fileSize, checksum, ingestStartDate, ingestStartDate_mjd, ingestCompletionDate, ingestCompletionDate_mjd, FileTypeID, publicReleaseDate, publicReleaseDate_mjd, isRestricted, isItar, isStale, FileSetId, dataURI

Note that some of the above parameters are specific to a particular observation mode. For instance, ‘msaconid’, ‘msametfl’, ‘msametid’, and ‘msastate’ correspond to MOS observations.

# All options for a given parameter (eg, exp_type):
exp_type_list = []
for x in results:
    if x['exp_type'] not in exp_type_list:
        exp_type_list.append(x['exp_type'])
for exp_type in exp_type_list:
    print(exp_type)
NRS_IFU
NRS_MSASPEC
NRS_LAMP
NRS_MIMF
NRS_BRIGHTOBJ
NRS_MSATA
NRS_TACONFIRM
NRS_DARK
NRS_FIXEDSLIT
NRS_WATA
NRS_IMAGE
NRS_CONFIRM
NRS_AUTOWAVE
NRS_FOCUS
NRS_AUTOFLAT

Potential Duplication Checks#

Duplication for NIRSpec might occur if there is a target within the duplication search radius that not only uses the same filter and grating but also has an exposure time within a factor of four of that of the previous observation. In wide field imaging, the field overlap must be greater than 50% to be considered a duplicate. Observers can find more information in Identifying Potential Duplicate Observations.
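
The instrument-setup and exposure-time parts of this criterion can be written as a small check. This is a simplified sketch, not the official policy test: the function name `is_potential_duplicate` and the dictionary layout are assumptions made for illustration, and the positional (search radius) condition is left out.

```python
def is_potential_duplicate(proposed, archived, factor=4):
    """Return True if two observations share a setup and have similar exposure times."""
    same_setup = (proposed['grating'] == archived['grating']
                  and proposed['filters'] == archived['filters'])
    # Exposure times count as similar when they differ by at most `factor`
    ratio = archived['exposure_time'] / proposed['exposure_time']
    similar_exposure = 1 / factor <= ratio <= factor
    return same_setup and similar_exposure


proposed = {'grating': 'G395H', 'filters': 'F290LP', 'exposure_time': 950}
similar = {'grating': 'G395H', 'filters': 'F290LP', 'exposure_time': 3000}
deeper = {'grating': 'G395H', 'filters': 'F290LP', 'exposure_time': 5000}
other_setup = {'grating': 'G140H', 'filters': 'F070LP', 'exposure_time': 950}

print(is_potential_duplicate(proposed, similar))
print(is_potential_duplicate(proposed, deeper))
print(is_potential_duplicate(proposed, other_setup))
```

Any match found this way is only a candidate and still needs to be confirmed in APT.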

In this demonstration, an input catalog is used to search an area with a radius equal to the longest side of the aperture, query for those observations matching the wavelength and exposure time criteria, and find sources in the archive which might be duplications for any target in the catalog.

While Mast.service_request_async(), shown above, is a useful method for filtering data by metadata parameters, Observations.query_criteria() is used in this case to search a particular region of the sky with additional parameters.

Duplication search radii are dependent on aperture sizes, which are described in the NIRSpec Fixed Slits JDox page. The longest side of the slit is used for the fixed slit duplication search radius. The following cell contains duplication search radii for each NIRSpec aperture in degrees.

# Default duplication search radii
mos_radius = 180/3600  # arcseconds/3600 to get degrees
mos_slit_radius = 0.46/3600  # arcseconds/3600 to get degrees
ifu_radius = 3/3600  # arcseconds/3600 to get degrees
fs_S1600A1_radius = 1.6/3600  # arcseconds/3600 to get degrees
fs_S200A1_radius = 3.3/3600  # arcseconds/3600 to get degrees
fs_S200A2_radius = 3.3/3600  # arcseconds/3600 to get degrees
fs_S400A1_radius = 3.8/3600  # arcseconds/3600 to get degrees
fs_S200B1_radius = 3.3/3600  # arcseconds/3600 to get degrees

The search area to use in the potential duplicate observation search is determined by the longest side of the aperture as defined above.

search_area = {"MSA": mos_radius, "IFU": ifu_radius, "S1600A1": fs_S1600A1_radius, "S200A1": fs_S200A1_radius,
               "S200A2": fs_S200A2_radius, "S400A1": fs_S400A1_radius, "S200B1": fs_S200B1_radius}

The value labeled ‘instrument_name’ in this case includes both the instrument and mode. This dictionary is used to determine the appropriate name to use in the query.

instrument_name = {"MSA": "NIRSPEC/MSA", "IFU": "NIRSPEC/IFU", "S1600A1": "NIRSPEC/SLIT", "S200A1": "NIRSPEC/SLIT",
                   "S200A2": "NIRSPEC/SLIT", "S400A1": "NIRSPEC/SLIT", "S200B1": "NIRSPEC/SLIT"}

Set up an input catalog containing the target catalog number, aperture, RA, DEC, grating, filter, exposure time, search area, and instrument/mode name for each target. This format was chosen so that additional targets can easily be added to the input catalog list. Although these are the primary considerations for duplication checking, additional criteria can also be added using this dictionary format. It may be useful to check which observation fields are available for filtering NIRSpec data using Astroquery.

Note that NIRSpec FS or IFU spectroscopic observations can be considered duplications of MOS observations of the same astronomical target with similar wavelength coverage and resolution. Thus, it may be helpful to query for MOS observations as well when searching for FS or IFU observation potential duplicates.

# Input Catalog to use in query

target1_number = "1"
target1_aperture = "MSA"
target1_RA = 53.13
target1_DEC = -27.8
target1_grating = "G395H"
target1_filters = "F290LP"
target1_exposure_time = 950
target1_search_area = search_area[target1_aperture]
target1_instrument_name = instrument_name[target1_aperture]
target1 = {'number': target1_number, 'RA': target1_RA, 'DEC': target1_DEC, 'grating': target1_grating, 
           'filters': target1_filters, 'exposure_time': target1_exposure_time, 'search_area': target1_search_area,
           'instrument_name': target1_instrument_name}

target2_number = "2"
target2_aperture = "IFU"
target2_RA = 68.73091
target2_DEC = 24.48140
target2_grating = "G395H"
target2_filters = "F290LP"
target2_exposure_time = 22.749
target2_search_area = search_area[target2_aperture]
target2_instrument_name = instrument_name[target2_aperture]
target2 = {'number': target2_number, 'RA': target2_RA, 'DEC': target2_DEC, 'grating': target2_grating,
           'filters': target2_filters, 'exposure_time': target2_exposure_time, 'search_area': target2_search_area,
           'instrument_name': target2_instrument_name} 
           
target3_number = "3"
target3_aperture = "S200B1"
target3_RA = 5.130
target3_DEC = -36.895
target3_grating = "G140H"
target3_filters = "F070LP"
target3_exposure_time = 1880.22
target3_search_area = search_area[target3_aperture]
target3_instrument_name = instrument_name[target3_aperture]
target3 = {'number': target3_number, 'RA': target3_RA, 'DEC': target3_DEC, 'grating': target3_grating,
           'filters': target3_filters, 'exposure_time': target3_exposure_time, 'search_area': target3_search_area,
           'instrument_name': target3_instrument_name} 

input_catalog = [target1, target2, target3]

Import MAST’s Observations package in order to query based on particular criteria other than position or target name. This package also allows you to search by ‘proposal_pi’ or other observation fields. In the case of duplication checking, this query format is particularly useful because it allows users to search within a minimum and maximum RA and Dec value in addition to other parameters such as filters and gratings.
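
A minimum/maximum RA and Dec range defines a box on the sky. Because lines of constant RA converge toward the poles, a fixed RA half-width covers a smaller angle at high declination. The sketch below shows one optional refinement, which is not the approach taken in this notebook's query: it widens the RA range by 1/cos(Dec). The function name `search_box` is hypothetical, the radius is taken in arcseconds, and RA wraparound at 0/360 degrees and the poles are not handled.

```python
import math


def search_box(ra_deg, dec_deg, radius_arcsec):
    """Return [min, max] RA and Dec ranges (degrees) around a position."""
    radius_deg = radius_arcsec / 3600.0  # 3600 arcseconds per degree
    # Widen the RA range so the box spans the same angle on the sky
    ra_half_width = radius_deg / math.cos(math.radians(dec_deg))
    s_ra = [ra_deg - ra_half_width, ra_deg + ra_half_width]
    s_dec = [dec_deg - radius_deg, dec_deg + radius_deg]
    return s_ra, s_dec


s_ra, s_dec = search_box(53.13, -27.8, 180)
print(s_ra)
print(s_dec)
```

The two returned lists have the same [minimum, maximum] form as the s_ra and s_dec arguments passed to Observations.query_criteria().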

from astroquery.mast import Observations
import numpy as np

Next, a dictionary is generated using the target’s catalog number as a key which contains observation tables for each target in the input catalog.

# Generate dictionary of observation tables
table = {}
for target in input_catalog:  # Loop through all targets in catalog
    key=target['number']
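    # Perform query using information in the input catalog
    # Note: per the InputWarning in the output below, 'grating' is not recognized as a criterion and is skipped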
    table[key] = Observations.query_criteria(instrument_name = target['instrument_name'],
                     s_ra=[(target['RA']-target['search_area']),(target['RA']+target['search_area'])],
                     s_dec=[target['DEC']-target['search_area'],target['DEC']+target['search_area']],
                     grating=target['grating'],
                     filters=target['filters'],
                     t_exptime=[(target['exposure_time'])/4,(target['exposure_time'])*4])
    # Perform query for MOS data since FS and IFU observations can also be duplicates of MOS observations
    if target['instrument_name'] != "NIRSPEC/MSA":
        table[key+"MOS"] = Observations.query_criteria(instrument_name = "NIRSPEC/MSA",
                         s_ra=[target['RA']-target['search_area'],target['RA']+target['search_area']],
                         s_dec=[target['DEC']-target['search_area'],target['DEC']+target['search_area']],
                         grating=target['grating'],
                         filters=target['filters'],
                         t_exptime=[(target['exposure_time'])/4,(target['exposure_time'])*4]
                         )
    print("""The observation table below contains potential duplicate observations for 
          target {} in the input catalog.""".format(key))
    print(table[key][:5])  # Remove the [:5] indexing to display all results
    print()
WARNING: InputWarning: Filter grating does not exist. This filter will be skipped. [astroquery.mast.discovery_portal]
The observation table below contains potential duplicate observations for 
          target 1 in the input catalog.
intentType obs_collection provenance_name ... srcDen   obsid     objID  
---------- -------------- --------------- ... ------ --------- ---------
   science           JWST         CALJWST ...    nan 184082682 432756645
   science           JWST         CALJWST ...    nan 184077491 432756772
   science           JWST         CALJWST ...    nan 184077510 432756812
   science           JWST         CALJWST ...    nan 184086672 432756908
   science           JWST         CALJWST ...    nan 184085888 432757102
WARNING: NoResultsWarning: Query returned no results. [astroquery.mast.discovery_portal]
The observation table below contains potential duplicate observations for 
          target 2 in the input catalog.
intentType obs_collection provenance_name ... srcDen   obsid     objID  
---------- -------------- --------------- ... ------ --------- ---------
   science           JWST         CALJWST ...    nan 174587286 532806367
   science           JWST         CALJWST ...    nan 174587254 532806410
   science           JWST         CALJWST ...    nan 174587266 532806430
   science           JWST         CALJWST ...    nan 174587305 532806485
   science           JWST         CALJWST ...    nan 174587324 532806560
The observation table below contains potential duplicate observations for 
          target 3 in the input catalog.
intentType obs_collection provenance_name ... srcDen   obsid     objID  
---------- -------------- --------------- ... ------ --------- ---------
   science           JWST         CALJWST ...    nan 202703412 430536499
   science           JWST         CALJWST ...    nan 202703511 430536516
   science           JWST         CALJWST ...    nan 202703591 430536535
   science           JWST         CALJWST ...    nan 202703639 430536558
   science           JWST         CALJWST ...    nan 202703744 430536574

Because it is important to check these observations in APT to determine more robustly whether or not these potential duplicate observations would indeed be considered duplications, this tool can be used to retrieve the proposal IDs that can then be searched in APT.

# Print a list of the proposal IDs generated in the full table
proposal_ids=[]
for target in input_catalog:  # Go through every target in the input catalog
    for proposal_id in table[target['number']]['proposal_id']:  # Loop through proposal IDs for each observation
        if int(proposal_id) not in proposal_ids:  # Remove duplicates
            proposal_ids.append(int(proposal_id))
print("It may be helpful to check these proposal IDs in APT to compare proposal details:", proposal_ids)
It may be helpful to check these proposal IDs in APT to compare proposal details: [1212, 1287, 1286, 3215, 1180, 1282, 1222]
# Get list of Product Group ID (obsid rather than Observation ID / obs_id)
obs_ids=[]
for target in input_catalog:  # Go through every target in the input catalog
    for obs_id in table[target['number']]['obsid']:  # Loop through product group IDs for each observation
        if obs_id not in obs_ids:  # Remove duplicates
            obs_ids.append(obs_id)
print('Product group IDs:', obs_ids[:10])
Product group IDs: [np.str_('184082682'), np.str_('184077491'), np.str_('184077510'), np.str_('184086672'), np.str_('184085888'), np.str_('184081863'), np.str_('184077795'), np.str_('184077831'), np.str_('184081418'), np.str_('184083268')]
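
The two loops above follow the same pattern: gather one column across several tables and drop repeats while keeping the first-seen order. A minimal sketch of that pattern as a reusable helper is given here, using plain lists as stand-ins for table columns. The name `unique_in_order` is hypothetical; passing `convert=str` would also turn NumPy string values like those printed above into plain Python strings.

```python
def unique_in_order(columns, convert=str):
    """Collect values from several columns, dropping repeats but keeping order."""
    seen = {}  # dictionaries preserve insertion order
    for column in columns:
        for value in column:
            seen.setdefault(convert(value), None)
    return list(seen)


# Stand-in columns; in the notebook these would come from table[...]['proposal_id']
proposal_columns = [['1212', '1287', '1212'], ['1286', '1287']]
unique_ids = unique_in_order(proposal_columns, convert=int)
print(unique_ids)
```
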
# Download data products based on a Product Group ID such as '2003839997'
# Data will download to a folder named "mastDownload" in present working directory unless a download directory is given as a parameter.
# Warning: This cell may take a few minutes to run.
product_group_ID = '2003839997'
manifest = Observations.download_products(product_group_ID, download_dir=None)
WARNING: NoResultsWarning: No products to download. [astroquery.mast.observations]

Note that if you would like to access proprietary data, you may need to log in first. More information can be found in the documentation on Accessing Proprietary Data.

Additional Resources#

A notebook is available which shows a target field by querying position. Another notebook provides details on Mashup. These notebooks may be useful for more general JWST duplication checking, as they provide information such as uploading a target list of files or visualizing results in Aladin.

As noted in more detail in Identifying Potential Duplicate Observations, it is essential to check these targets in APT.

If you use astroquery, please cite the paper Ginsburg, Sipőcz, Brasseur et al 2019.

About this Notebook:#

Author: Teagan King, Science Support Analyst

Date Updated: March 12, 2021

