Astroquery: Exploring Metadata from the James Webb Space Telescope#


Learning Goals#

By the end of this tutorial, you will:

  • Understand how to use the astroquery.mast module to access metadata from the James Webb Space Telescope (JWST).

  • Run metadata queries based on coordinates, an object name, or non-positional criteria.

  • Use optional search parameters to further refine query results.

Automated testing has found an error in this Notebook. The authors have been notified and are working on the issue; in the meantime, please use this as a reference only.

Table of Contents#

  • Introduction

  • Querying MAST for JWST Metadata

    • Setup

    • Optional Search Parameters

    • Query by Object Name

    • Query by Region

    • Query by Criteria

  • Additional Resources

  • Exercise Solutions

Introduction#

Welcome! This tutorial focuses on using the astroquery.mast module to search for metadata from the James Webb Space Telescope (JWST). Launched in December of 2021, JWST is an advanced space observatory designed to observe at infrared wavelengths.

The Mikulski Archive for Space Telescopes (MAST) hosts publicly accessible data products from space telescopes like JWST. astroquery.mast provides access to a broad set of JWST metadata, including header keywords, proposal information, and observational parameters. The available metadata can also be found using the MAST JWST Search interface.

Please note that astroquery.mast.MastMissions and the MAST JWST Search API do not yet support data product downloads.

Imports#

This notebook uses the following packages:

  • astroquery.mast to query the MAST Archive

  • astropy.coordinates to assign coordinates of interest

from astroquery.mast import MastMissions
from astropy.coordinates import SkyCoord

Querying MAST for JWST Metadata#

Setup#

In order to make queries on JWST metadata, we will have to perform some setup. First, we will instantiate an object of the MastMissions class and assign its mission to be 'jwst'. Its service is set to the default of 'search'.

# Create MastMissions object and assign mission to 'jwst'
missions = MastMissions(mission='jwst')

print(f'Mission: {missions.mission}')
print(f'Service: {missions.service}')

When writing queries, keyword arguments can be used to specify output characteristics (see the following section) and filter on values like instrument, exposure type, and proposal ID. The available column names for a mission are returned by the get_column_list function. Below, we will print out the name, data type, and description for the first 10 columns in JWST metadata.

# Get available columns for JWST mission
columns = missions.get_column_list()
columns[:10]

Optional Search Parameters#

Before we dive into the actual queries, it’s important to know how we can refine our results with optional keyword arguments. The following parameters are available:

  • radius: For positional searches only. Only return results within a certain distance from an object or set of coordinates. Default is 3 arcminutes.

  • limit: The maximum number of results to return. Default is 5000.

  • offset: Skip the first n results. Useful for paging through results.

  • select_cols: A list of columns to be returned in the response.

As we walk through different types of queries, we will see these parameters in action!
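One way to picture how limit and offset work together is as a slice over the full result set. The sketch below is a minimal, offline illustration that uses a plain Python list in place of a MAST response. The fetch_page and iter_pages helpers and the dataset names are hypothetical, invented here for illustration; they are not part of astroquery.

```python
from typing import Iterator, List, Sequence


def fetch_page(rows: Sequence[str], limit: int, offset: int) -> List[str]:
    """Hypothetical stand-in for a query: skip `offset` rows, return at most `limit`."""
    return list(rows[offset:offset + limit])


def iter_pages(rows: Sequence[str], limit: int) -> Iterator[List[str]]:
    """Yield successive pages by advancing `offset` until a page comes back empty."""
    offset = 0
    while True:
        page = fetch_page(rows, limit=limit, offset=offset)
        if not page:
            break
        yield page
        offset += limit


# Ten made-up dataset names standing in for query results
fake_results = [f"dataset_{i:02d}" for i in range(10)]
pages = list(iter_pages(fake_results, limit=4))
print([len(page) for page in pages])
```

With the real service, the same pattern should apply: pass limit and offset to a query method and increase offset by limit on each call until no rows are returned.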

Query by Object Name#

We’ve reached our first query! We can use object names to perform metadata queries using the query_object function.

To start, let’s query for the Messier 1 object, a supernova remnant in the Taurus constellation. You may know it better as the Crab Nebula!

# Query for Messier 1 ('M1')
results = missions.query_object('M1')

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

There were 250 total results, meaning that the target coordinates of 250 JWST datasets fall within the default 3-arcminute radius of the Crab Nebula. Now, let’s try refining our search a bit more.

  • Each dataset is associated with a celestial coordinate, given by targ_ra (right ascension) and targ_dec (declination). By default, the query returns all datasets that fall within 3 arcminutes from the object’s coordinates. Let’s set the radius parameter to be 1 arcminute instead.

  • Say that we’re not interested in the first 4 results. We can assign offset to skip a certain number of rows.

  • By default, a subset of recommended columns is returned for each query. However, we can specify exactly which columns to return using the select_cols keyword argument. The ArchiveFileID column is included automatically.

# Refined query for Messier 1 ('M1')
results = missions.query_object('M1',
                                radius=1,  # Search within a 1 arcminute radius
                                offset=4,  # Skip the first 4 results
                                select_cols=['fileSetName', 'targprop', 'date_obs'])  # Select certain columns

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

Exercise 1#

Now it’s your turn! Try querying for the Whirlpool Galaxy object. Search within a radius of 1 arcminute, skip the first 300 results, and select the fileSetName and opticalElements columns.

# # Query for Whirlpool Galaxy
# results = missions.query_object(...)  # Write your query!

# # Display the first 5 results
# print(f'Total number of results: {len(results)}')
# results[:5]

Query by Region#

The missions object also allows us to query by a region in the sky. By passing a set of coordinates to the query_region function, we can return datasets that fall within a given radius of that point. This type of search is also known as a cone search.

# Create coordinate object
coords = SkyCoord(210.80227, 54.34895, unit=('deg'))

# Query for results within 10 arcminutes of coords
results = missions.query_region(coords, radius=10)

# Display results
print(f'Total number of results: {len(results)}')
results[:5]

395 JWST datasets fall within our cone search. In other words, their target coordinates are within 10 arcminutes of the coordinates that we defined.
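To make the cone search condition concrete, the sketch below computes the angular separation between two sky positions with the haversine formula and compares it to a radius in arcminutes. It uses only the standard library. The helper names and the second coordinate are made up for illustration, and the MAST service may compute separations differently.

```python
import math


def separation_arcmin(ra1, dec1, ra2, dec2):
    """Angular separation in arcminutes between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 60


def in_cone(center, target, radius_arcmin):
    """True if `target` lies within `radius_arcmin` of `center` (both in degrees)."""
    return separation_arcmin(*center, *target) <= radius_arcmin


center = (210.80227, 54.34895)  # Same coordinates as the query above
nearby = (210.80227, 54.44895)  # Hypothetical target 0.1 degrees (6 arcminutes) north
print(in_cone(center, nearby, radius_arcmin=10))
print(in_cone(center, nearby, radius_arcmin=5))
```

The hypothetical target sits 6 arcminutes from the center, so it is inside a 10 arcminute cone but outside a 5 arcminute one.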

Exercise 2#

JWST has observed the star Vega, which has a right ascension of 279.23473 degrees and a declination of 38.78369 degrees. Use the query_region function to search for datasets within 15 arcminutes of Vega. Select the fileSetName, targprop, targ_ra, and targ_dec columns.

# # Vega coordinates
# vega = SkyCoord(_, _, unit=('deg'))  # Fill in with Vega's coordinates

# # Query for datasets around Vega
# results = missions.query_region(...)  # Write your query!

# # Display the first 5 results
# print(f'Total number of results: {len(results)}')
# results[:5]

Query by Criteria#

In some cases, we may want to run queries with non-positional parameters. To accomplish this, we use the query_criteria function.

For any of our query functions, we can filter our results by the value of columns in the dataset.

Let’s say that we only want observations from JWST’s Near Infrared Camera (NIRCam) instrument, and that we only want datasets connected to program number 1189.

# Query with column criteria
results = missions.query_criteria(instrume='NIRCAM',  # From Near Infrared Camera
                                  program=1189,
                                  select_cols=['fileSetName', 'instrume', 'exp_type', 'program', 'pi_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

To exclude a certain value from the results, we can prepend the value with !.

Let’s run the same query as above, but this time, we will filter out datasets coming from the NIRCam instrument.

# Filtered query, excluding NIRCam datasets
results = missions.query_criteria(program=1189,
                                  instrume='!NIRCAM',  # Exclude datasets from the NIRCam instrument
                                  select_cols=['fileSetName', 'instrume', 'exp_type', 'program', 'pi_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

We can also use wildcards for more advanced filtering. Let’s reuse the query from above and add an exposure type filter for fixed-slit (FS) spectroscopy.

# Filtered query with wildcard
results = missions.query_criteria(program=1189,
                                  instrume='!NIRCAM',  # Exclude datasets from the NIRCam instrument
                                  exp_type='*FIXEDSLIT*', # Any exposure type that contains 'FIXEDSLIT'
                                  select_cols=['fileSetName', 'instrume', 'exp_type', 'program', 'pi_name'])

# Display the first 10 results
print(f'Total number of results: {len(results)}')
results[:10]
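To get a feel for how the ! prefix and the * wildcard behave, here is a small offline sketch built on Python’s fnmatch module. The matches helper is hypothetical and only mimics the syntax described above; it makes no claim about how MAST evaluates criteria internally (for example, whether matching is case sensitive).

```python
from fnmatch import fnmatchcase


def matches(value: str, criterion: str) -> bool:
    """True if `value` satisfies a criterion like 'NIRCAM', '!NIRCAM', or '*FIXEDSLIT*'."""
    negate = criterion.startswith('!')
    pattern = criterion[1:] if negate else criterion
    hit = fnmatchcase(value, pattern)  # '*' matches any run of characters
    return not hit if negate else hit


# A few sample exposure type strings to filter locally
exp_types = ['NRS_FIXEDSLIT', 'NRS_IFU', 'MIR_LRS-FIXEDSLIT', 'NRC_IMAGE']
fixed_slit = [e for e in exp_types if matches(e, '*FIXEDSLIT*')]
not_nrc_image = [e for e in exp_types if matches(e, '!NRC_IMAGE')]
print(fixed_slit)
print(not_nrc_image)
```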

To filter by multiple values for a single column, we use a string of the values delimited by commas.

To illustrate this, we will use a slightly different query. We query for datasets that have a fixed-slit spectroscopy exposure type and targets with moving coordinates (targtype='MOVING'). We will add another filter to match three different last names for principal investigators (PIs).

# Filtered query with multiple values
results = missions.query_criteria(exp_type='*FIXEDSLIT*', # Any exposure type that contains 'FIXEDSLIT'
                                  targtype='MOVING',  # Only return moving targets
                                  pi_name='Stansberry, Parker, Lunine',  # Last name of PI can be any of these 3 values
                                  select_cols=['fileSetName', 'targtype', 'instrume', 'exp_type', 'program', 'pi_name'])

# Display the first 10 results
print(f'Total number of results: {len(results)}')
results[:10]
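The comma-delimited syntax can be read as "match any of these values". The sketch below shows that reading on a local list. The matches_any helper and the extra name 'Smith' are made up for illustration, and the sketch assumes exact matches on each option, which may differ from how the service compares values.

```python
def matches_any(value: str, criterion: str) -> bool:
    """True if `value` equals any of the comma-separated options in `criterion`."""
    options = [option.strip() for option in criterion.split(',')]
    return value in options


pi_names = ['Stansberry', 'Smith', 'Lunine', 'Parker']
selected = [name for name in pi_names
            if matches_any(name, 'Stansberry, Parker, Lunine')]
print(selected)
```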

For columns with numeric or date values, we can filter using comparison operators:

  • <: Return values less than or before the given number/date

  • >: Return values greater than or after the given number/date

  • <=: Return values less than or equal to the given number/date

  • >=: Return values greater than or equal to the given number/date

As an example, let’s write a query to return all datasets with an observation date before February 1, 2022.

# Query using comparison operator
results = missions.query_criteria(date_obs='<2022-02-01',  # Must be observed before February 1, 2022
                                  select_cols=['fileSetName', 'program', 'date_obs'])

# Display results
print(f'Total number of results: {len(results)}')
results

For numeric or date data types, we can also filter with ranges, using the syntax '#..#', where each # stands for a number or date.

Let’s write a query that uses range syntax to return datasets that belong to a program number between 1150 and 1155. We will also select for exposure durations that are greater than or equal to 100 seconds.

# Query using range operator
results = missions.query_criteria(program='1150..1155', # Program number between 1150 and 1155
                                  duration='>=100',  # Exposure duration is greater than or equal to 100 seconds
                                  select_cols=['fileSetName', 'program', 'duration'])

# Display results
print(f'Total number of results: {len(results)}')
results
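The comparison and range syntax can be explored offline with a small evaluator. The sketch below handles numeric values only; matches_numeric is a hypothetical helper, and treating both range endpoints as included is an assumption made here rather than documented behavior of the service.

```python
import operator

# Check two-character operators first so '<=' is not read as '<'
_OPERATORS = [('<=', operator.le), ('>=', operator.ge),
              ('<', operator.lt), ('>', operator.gt)]


def matches_numeric(value: float, criterion: str) -> bool:
    """Evaluate a numeric criterion such as '<1200', '>=100', or '1150..1155'."""
    criterion = criterion.strip()
    if '..' in criterion:
        low, high = criterion.split('..', 1)
        return float(low) <= value <= float(high)  # Assumes both endpoints are included
    for symbol, compare in _OPERATORS:
        if criterion.startswith(symbol):
            return compare(value, float(criterion[len(symbol):]))
    return value == float(criterion)  # No operator: exact match


programs = [1148, 1150, 1153, 1155, 1160]
in_range = [p for p in programs if matches_numeric(p, '1150..1155')]
below = [p for p in programs if matches_numeric(p, '<1153')]
print(in_range)
print(below)
```

Note that a criterion of '>100' excludes the value 100 itself, while '>=100' includes it.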

Exercise 3#

It’s time to apply all that you’ve learned! Write a non-positional query based on the following:

  • Fixed targets (HINT: targtype='FIXED')

  • Instrument is Mid-Infrared Instrument (MIRI) or Fine Guidance Sensor (FGS)

  • Proposal type should NOT include General Observers (GO)

  • Exposure type includes the string 'IMAGE'

  • Right ascension is between 70 and 75 degrees

  • Program number is less than 1200

  • Skip the first 5 entries

  • Select the following columns: fileSetName, targtype, instrume, proposal_type, exp_type, targ_ra, program

# # A non-positional query with column criteria
# results = missions.query_criteria(...)  # Write your query here!

# # Display results
# print(f'Total number of results: {len(results)}')
# results

Additional Resources#

Exercise Solutions#

Exercise 1#

# Query for Whirlpool Galaxy
results = missions.query_object('Whirlpool',
                                radius=1,  # Search radius of 1 arcminute
                                offset=300,  # Skip the first 300 rows
                                select_cols=['fileSetName', 'opticalElements'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

Exercise 2#

# Vega coordinates
vega = SkyCoord(279.23473, 38.78369, unit=('deg'))

# Query for datasets around Vega
results = missions.query_region(vega,
                                radius=15,  # Search radius of 15 arcminutes
                                select_cols=['fileSetName', 'targprop', 'targ_ra', 'targ_dec'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

Exercise 3#

# A non-positional query with column criteria
results = missions.query_criteria(targtype='FIXED',  # Fixed target
                                  instrume='MIRI, FGS',  # Instrument is MIRI or FGS
                                  proposal_type='!GO',  # Not from a general observer proposal
                                  exp_type='*IMAGE*',  # Contains the string "IMAGE"
                                  targ_ra='70..75',  # Between 70 and 75
                                  program='<1200',  # Less than 1200
                                  offset=5,  # Skip the first 5 results
                                  select_cols=['fileSetName', 'targtype', 'instrume', 'proposal_type', 
                                               'exp_type', 'targ_ra', 'program'])

# Display results
print(f'Total number of results: {len(results)}')
results

Citations#

If you use astroquery for published research, please cite the authors. Follow these links for more information about citing astroquery:

About this Notebook#

Author(s): Sam Bianco
Keyword(s): Tutorial, JWST, Astroquery, MastMissions
First published: June 2024
Last updated: June 2024

