Searching for Mission-Specific Data with Astroquery


Learning Goals

By the end of this tutorial, you will:

  • Understand how to use the astroquery.mast module to access mission dataset metadata from MAST.

  • Run metadata queries based on coordinates, an object name, or non-positional criteria.

  • Filter and download data products associated with datasets of interest.

  • Search for datasets from multiple missions and among High Level Science Products (HLSPs).

Table of Contents

  • Introduction

  • Imports

  • Querying for Datasets from Missions-MAST

    • Search Parameters

    • Query by Object Name

    • Query by Region

    • Query by Criteria

  • Getting Data Products

    • Performing a Product Query

    • Filtering Data Products

  • Downloading Products

    • Exclusive Data Access

  • Switching Missions

  • Exercises

  • Exercise Solutions

  • Additional Resources

Introduction

Welcome! This tutorial explores the capabilities of the astroquery.mast.MastMissions class, a versatile tool for accessing and working with datasets hosted by the Mikulski Archive for Space Telescopes (MAST). MastMissions is a Python wrapper for the MAST Search API, which allows you to search for mission-specific dataset metadata and data products. This data is also findable through the MAST Search UI.

The following missions/products are available for search as of January 2025:

In this notebook, we will walk through the basic workflow for searching datasets, retrieving data products, and downloading data products. This workflow will look very similar to the one used with the astroquery.mast.Observations class, detailed in our “Searching MAST using astroquery.mast” notebook. There are a few key differences to note, and you should use the class that is best suited for your unique goals:

  • API: MastMissions uses the MAST Search API, while Observations uses the MAST Portal API.

  • Collection: MastMissions can only perform queries on a single collection, or “mission”, at a time. Observations uses the Common Archive Observation Model (CAOM) and can run queries across every available observational collection at the same time.

  • Filter Keywords: MastMissions has an extensive selection of mission-specific keywords to use while writing queries. Observations is limited to the fields described by the CAOM and has no criteria with mission-specific meaning.

In summary, MastMissions is well-suited for fast, mission-specific queries that might require a more extensive selection of filter keywords. Observations is better for broader, multi-mission searches.

Imports

This notebook uses the following packages:

  • astropy to handle astronomical units and coordinate systems

  • astroquery.mast to query the MAST Archive

import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.mast import MastMissions

Querying for Datasets from Missions-MAST

In order to make queries on Missions-MAST metadata, we will have to perform some setup. We will initialize an object of the astroquery.mast.MastMissions class and assign its mission attribute. The object can be used to search mission dataset metadata by object name, sky position, or other criteria.

The default value for mission is hst, meaning that queries will be run on Hubble dataset metadata. The searchable metadata for Hubble encompasses all information that was previously accessible through the original HST web search form. The metadata for Hubble and all other available missions is also available through the MAST Search UI.

Later in the tutorial, we will learn how to change the mission attribute to make queries on other missions.

# Create MastMissions object to search for Hubble datasets
missions = MastMissions(mission='hst')
missions.mission
'hst'

Search Parameters

When writing queries, keyword arguments can be used to specify output characteristics and filter on fields like instrument, exposure type, and proposal ID. The available column names for a mission are returned by the get_column_list function. Below, we will print out the name, data type, and description for the first 10 columns in HST metadata.

# Get available columns for HST mission
columns = missions.get_column_list()
columns[:10]
Table length=10
name | data_type | description
str22 | str9 | str226
search_pos | string | Search Position (RA and Dec)
sci_data_set_name | string | Data set name, the first character indicates instrument; L=COS; I=WFC3; J=ACS; N=NICMOS; O=STIS; U=WFPC2; W=WFPC; X=FOC; Y=FOS; Z=GHRS; F=FGS; V=HSP; nine-character name (e.g. J8BA7JCAQ, O4140Q020)
sci_targname | string | Target name designated by the observer for the HST proposal; Uppercase; No blank characters; Spaces sometimes filled with - ; (e.g. A901-FIELD-25, NGC4486-POS1, 0537-441INCA221-36, ALPHA-CEN)
sci_hapnum | boolean | Reports if there are any Hubble Advanced Products (HAP), enter 0 for no, 1 for yes
sci_haspnum | boolean | Reports if there are any Hubble Advanced Spectral Products (HASP), enter 0 for no, 1 for yes
sci_instrume | string | Instrument used (e.g. ACS, COS, FGS, FOC, FOS, HRS, HSP, NICMOS, STIS, WFC3, WFPC, WFPC2)
sci_aper_1234 | string | Aperture configuration; WFPC2 (e.g. PC1, WF3, WFALL); ACS (e.g. WFC, HRC, SBC); STIS (e.g. 25MAMA, F25QTZ)
sci_spec_1234 | string | The filter(s) or grating(s) used (e.g. G160L, G270M, G230LB, F300W)
sci_actual_duration | float | Exposure time in seconds
sci_start_time | datetime | Observation start time; The earliest in-flight data is available from Apr 24 1990
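The table returned by get_column_list can be filtered like any other table. As a small sketch, the hypothetical helper below collects the names of columns with a given data_type. It only assumes that each row supports row['name'] and row['data_type'], which holds for astropy Table rows as well as plain dicts, so here it is exercised on a few rows copied from the output above.

```python
def columns_of_type(columns, data_type):
    """Return the names of columns whose data_type matches (hypothetical helper)."""
    return [row['name'] for row in columns if row['data_type'] == data_type]


# A few rows copied from the get_column_list() output above
sample_columns = [
    {'name': 'sci_targname', 'data_type': 'string'},
    {'name': 'sci_hapnum', 'data_type': 'boolean'},
    {'name': 'sci_actual_duration', 'data_type': 'float'},
    {'name': 'sci_start_time', 'data_type': 'datetime'},
]

print(columns_of_type(sample_columns, 'datetime'))  # ['sci_start_time']
```

With a live connection, columns_of_type(missions.get_column_list(), 'datetime') would list the date columns available for the comparison and range syntax covered later in this tutorial.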

We can refine our results even further with optional keyword arguments. The following parameters are available:

  • radius: For positional searches only. Only return results within a certain distance from an object or set of coordinates. Default is 3 arcminutes.

  • limit: The maximum number of results to return. Default is 5000.

  • offset: Skip the first n results. Useful for paging through results.

  • sort_by: A list of field names to sort by.

  • sort_desc: A list of booleans (one for each field specified in sort_by), indicating whether each field should be sorted in descending order (True) or ascending order (False).

  • select_cols: A list of columns to be returned in the response.

As we walk through different types of queries, we will see these parameters in action!
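One pattern these parameters enable is paging: limit caps the size of each response, and offset skips the rows already seen. The sketch below is a hypothetical helper, not part of astroquery. It accepts any callable that takes limit and offset keywords and keeps requesting pages until a short page comes back. It assumes the server returns rows in a stable order between calls, which is why passing sort_by is a good idea in practice. An in-memory stand-in replaces the real query so the example runs offline.

```python
def paginate(query, page_size):
    """Yield successive pages from query(limit=..., offset=...) until a page is short.

    `query` is any callable that accepts `limit` and `offset` keyword arguments
    and returns a sized sequence of rows (hypothetical helper).
    """
    offset = 0
    while True:
        page = query(limit=page_size, offset=offset)
        if len(page) == 0:
            break
        yield page
        if len(page) < page_size:
            break  # a short page means there are no more rows
        offset += page_size


# Offline stand-in for a MAST query: 12 fake row IDs
fake_rows = list(range(12))


def fake_query(limit, offset):
    return fake_rows[offset:offset + limit]


pages = list(paginate(fake_query, page_size=5))
print([len(p) for p in pages])  # [5, 5, 2]
```

Against MAST, the callable could be something like lambda limit, offset: missions.query_criteria(sci_pep_id=15879, sort_by=['sci_data_set_name'], limit=limit, offset=offset).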

Query by Object Name

We’ve reached our first query! We can use object names to perform metadata queries using the query_object function.

To start, let’s query for the Messier 1 object, a supernova remnant in the Taurus constellation. You may know it better as the Crab Nebula!

# Query for Messier 1 ('M1')
results = missions.query_object('M1')

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 741
Table masked=True length=5
search_possci_data_set_namesci_targnamesci_hapnumsci_haspnumsci_instrumesci_aper_1234sci_spec_1234sci_actual_durationsci_start_timesci_pep_idsci_pi_last_namesci_rasci_decsci_refnumsci_central_wavelengthsci_release_datesci_stop_timesci_preview_namescp_scan_typesci_hlspang_sep
str15str9str30int64int64str6str17str26float64str26int64str12float64float64int64float64str26str26str9str18int64str19
83.6324 22.0174J9FX01041CRAB10ACSWFC1-POL120VF606W;POL120V2300.02005-09-06T21:26:46.43300010526HESTER83.6337538176822.0185914189555934.78962006-09-07T05:15:19.8030002005-09-06T22:08:13.497000J9FX01041----0.10383123118898147
83.6324 22.0174J9FX02041CRAB10ACSWFC1-POL120VF606W;POL120V2300.02005-09-15T22:52:34.47000010526HESTER83.6337539443322.0185915334555934.78862006-09-16T09:52:13.4170002005-09-15T23:34:01.537000J9FX02041----0.1038410702845166
83.6324 22.0174J9FX03041CRAB10ACSWFC1-POL120VF606W;POL120V2300.02005-09-25T21:05:47.46300010526HESTER83.6337540831522.018591658155934.78712006-09-26T06:08:15.5000002005-09-25T21:47:14.527000J9FX03041----0.10385181961996841
83.6324 22.0174J9FX04041CRAB10ACSWFC1-POL120VF606W;POL120V2300.02005-10-02T22:33:28.45700010526HESTER83.6337541870222.0185917507855934.78662006-10-03T07:13:23.7870002005-10-02T23:14:55.550000J9FX04041----0.10385983838178592
83.6324 22.0174J9FX05041CRAB10ACSWFC1-POL120VF606W;POL120V2000.02005-10-12T20:52:39.50300010526HESTER83.6337543475422.0185918930555934.78522006-10-13T02:32:04.7100002005-10-12T21:29:06.567000J9FX05041----0.10387219098736124

There were 741 total results, meaning that hundreds of HST datasets targeted the Crab Nebula. Now, let’s refine our search a bit more.

  • Each dataset is associated with a celestial coordinate, given by sci_ra (right ascension) and sci_dec (declination). By default, the query returns all datasets that fall within 3 arcminutes of the object’s coordinates. Let’s set the radius parameter to 1 arcminute instead; a plain number passed as radius is interpreted in arcminutes.

  • Say that we’re not interested in the first 4 results. We can set offset to skip a certain number of rows.

  • By default, a subset of recommended columns is returned for each query. However, we can specify exactly which columns to return using the select_cols keyword argument. Certain columns are included automatically, depending on the mission; for HST, the output below also contains search_pos, sci_data_set_name, sci_targname, and ang_sep.

# Refined query for Messier 1 ('M1')
results = missions.query_object('M1',
                                radius=1,  # Search within a 1 arcminute radius
                                offset=4,  # Skip the first 4 results
                                select_cols=['sci_start_time', 'sci_pi_last_name'])  # Select certain columns

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 453
Table masked=True length=5
search_pos | sci_data_set_name | sci_targname | sci_start_time | sci_pi_last_name | ang_sep
str15 | str9 | str22 | str26 | str12 | str19
83.6324 22.0174 | J9FX05041 | CRAB | 2005-10-12T20:52:39.503000 | HESTER | 0.10387219098736124
83.6324 22.0174 | J9FX06041 | CRAB | 2005-10-22T20:40:28.510000 | HESTER | 0.10388689143244417
83.6324 22.0174 | J9FX07041 | CRAB | 2005-10-30T20:36:14.527000 | HESTER | 0.10390159755961832
83.6324 22.0174 | J9FX01011 | CRAB | 2005-09-06T16:41:38.433000 | HESTER | 0.10392027927762577
83.6324 22.0174 | J9FX01021 | CRAB | 2005-09-06T18:16:07.427000 | HESTER | 0.10392027927762577

Query by Region

The missions object also allows us to query by a region in the sky. By passing in a set of coordinates to the query_region function, we can return datasets that fall within a certain radius value of that point. This type of search is also known as a cone search.

# Create coordinate object
coords = SkyCoord(210.80227, 54.34895, unit='deg')

# Query for results within 10 arcseconds of coordinates
results = missions.query_region(coords, 
                                radius=10 * u.arcsec)

# Display results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 19
Table masked=True length=5
search_possci_data_set_namesci_targnamesci_hapnumsci_haspnumsci_instrumesci_aper_1234sci_spec_1234sci_actual_durationsci_start_timesci_pep_idsci_pi_last_namesci_rasci_decsci_refnumsci_central_wavelengthsci_release_datesci_stop_timesci_preview_namescp_scan_typesci_hlspang_sep
str18str9str16int64int64str6str4str5float64str26int64str8float64float64int64float64str19str26str9str1int64str20
210.80227 54.34895OBQU01050NUCLEUS+HODGE60200STIS52X2G140L186.02012-05-24T07:51:40.55300012556GORDON210.801851267654.3487915152611425.02013-05-24T10:09:142012-05-24T07:54:46.553000OBQU01050----0.017460048037303017
210.80227 54.34895OBQU010H0NUCLEUS+HODGE60200STIS52X2G230L186.02012-05-24T09:17:38.57000012556GORDON210.801851267654.3487915152612376.02013-05-24T13:15:152012-05-24T09:20:44.570000OBQU010H0----0.017460048037303017
210.80227 54.34895OBQU01030NUCLEUS+HODGE60200STIS52X2G140L186.02012-05-24T07:43:20.55300012556GORDON210.802000295854.3492839127211425.02013-05-24T10:08:322012-05-24T07:46:26.553000OBQU01030----0.022143836477276503
210.80227 54.34895OBQU010F0NUCLEUS+HODGE60200STIS52X2G230L186.02012-05-24T09:09:18.57000012556GORDON210.802000295854.3492839127212376.02013-05-24T13:15:052012-05-24T09:12:24.570000OBQU010F0----0.022143836477276503
210.80227 54.34895W1000501TNGC5457-NUC00WFPCP6F555W100.01992-07-16T06:31:16.5170003639WESTPHAL210.80286554.34933666666667145479.01993-07-16T13:33:261992-07-16T06:32:56.517000W1000501T----0.031163986021028572

The above datasets fall within our cone search. In other words, their target coordinates are within 10 arcseconds of the coordinates that we defined.
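The ang_sep column reports each dataset’s distance from the search position. As a check, the stdlib-only sketch below computes the great-circle separation with the haversine formula and applies it to the first row above (search position 210.80227, 54.34895; dataset sci_ra 210.8018512676, sci_dec 54.34879151526). The result agrees with the listed ang_sep of about 0.01746, which indicates that ang_sep is expressed in arcminutes. The function name is our own; astropy’s SkyCoord.separation is the usual tool for this in practice.

```python
import math


def angular_separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle distance between two (RA, Dec) points given in degrees, in arcminutes."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the tiny separations in a cone search
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 60


# Search position vs. the first dataset returned by the cone search above
sep = angular_separation_arcmin(210.80227, 54.34895, 210.8018512676, 54.34879151526)
print(f'{sep:.5f} arcmin')
print(sep <= 10 / 60)  # is the dataset inside the 10 arcsecond cone?
```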

Query by Criteria

In some cases, we may want to run queries with non-positional parameters. To accomplish this, we use the query_criteria function.

For any of our query functions, we can filter our results by the value of columns in the dataset.

Let’s say that we want datasets from HST’s Wide Field Camera 3 (WFC3) instrument that use the F555W filter. We are only interested in datasets connected to proposal number 15879.

# Query with column criteria
results = missions.query_criteria(sci_instrume='WFC3',  # From Wide Field Camera 3
                                  sci_spec_1234='F555W',  # Uses F555W filter
                                  sci_pep_id=15879,  # Proposal number 15879
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 98
Table masked=True length=5
sci_data_set_name | sci_instrume | sci_spec_1234 | sci_pep_id | sci_pi_last_name
str9 | str6 | str5 | int64 | str5
IE37NAHHQ | WFC3 | F555W | 15879 | RIESS
IE37NAHIQ | WFC3 | F555W | 15879 | RIESS
IE37NBARQ | WFC3 | F555W | 15879 | RIESS
IE37NBASQ | WFC3 | F555W | 15879 | RIESS
IE37NBATQ | WFC3 | F555W | 15879 | RIESS

To exclude and filter out a certain value from the results, we can prepend the value with !.

Let’s run the same query as above, but this time, we will filter out datasets that use the F555W filter.

# Filtered query, excluding datasets using F555W filter
results = missions.query_criteria(sci_instrume='WFC3', 
                                  sci_spec_1234='!F555W',  # Excludes datasets that use F555W filter
                                  sci_pep_id=15879,
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 236
Table masked=True length=5
sci_data_set_name | sci_instrume | sci_spec_1234 | sci_pep_id | sci_pi_last_name
str9 | str6 | str5 | int64 | str5
IE37NAHAQ | WFC3 | F153M | 15879 | RIESS
IE37NAHEQ | WFC3 | F153M | 15879 | RIESS
IE37NAHFQ | WFC3 | F160W | 15879 | RIESS
IE37NAHGQ | WFC3 | F814W | 15879 | RIESS
IE37NAHJQ | WFC3 | F814W | 15879 | RIESS

We can also use wildcards on string criteria for more advanced filtering. Wildcards are special characters used in search patterns to represent unknown characters, allowing for flexible matching of strings. The wildcard character is *: it matches any number of characters (including none) before, after, or between the existing characters, depending on its placement.

Let’s use the same query from above, but we will add the condition that the target name must contain the string “GEM”.

# Filtered query with wildcard
results = missions.query_criteria(sci_instrume='WFC3', 
                                  sci_spec_1234='!F555W',
                                  sci_pep_id=15879,
                                  sci_targname='*GEM*',  # Must contain the string 'GEM'
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 5
Table masked=True length=5
sci_data_set_name | sci_targname | sci_instrume | sci_spec_1234 | sci_pep_id | sci_pi_last_name
str9 | str7 | str6 | str5 | int64 | str5
IE37OBBWQ | V-W-GEM | WFC3 | F153M | 15879 | RIESS
IE37OBCGQ | V-W-GEM | WFC3 | F153M | 15879 | RIESS
IE37OBCHQ | V-W-GEM | WFC3 | F160W | 15879 | RIESS
IE37OBCIQ | V-W-GEM | WFC3 | F814W | 15879 | RIESS
IE37OBCLQ | V-W-GEM | WFC3 | F814W | 15879 | RIESS
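To build intuition for a pattern before sending it to MAST, you can try it locally. The sketch below uses Python’s fnmatch module, whose * likewise matches any run of characters, including an empty one; note how the trailing * in *GEM* matches nothing in V-W-GEM. This is only a local approximation: fnmatch also treats ? and [...] specially, and we have not confirmed how the MAST Search API handles those characters or letter case.

```python
from fnmatch import fnmatchcase

# Target names seen in this tutorial's query outputs
target_names = ['V-W-GEM', 'CRAB', 'NGC5457-NUC', 'ABELL209']

pattern = '*GEM*'
matches = [name for name in target_names if fnmatchcase(name, pattern)]
print(matches)  # ['V-W-GEM']
```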

To filter by multiple values for a single column, we use a string of the values delimited by commas.

To illustrate this, we will use a slightly different query. We query for WFC3 datasets from proposal 15879 that use either the F153M filter or the F160W filter.

# Filtered query with multiple values
results = missions.query_criteria(sci_instrume='WFC3', 
                                  sci_spec_1234='F153M, F160W',  # Uses either F153M filter OR F160W filter
                                  sci_pep_id=15879,
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]
Total number of results: 138
Table masked=True length=5
sci_data_set_name | sci_instrume | sci_spec_1234 | sci_pep_id | sci_pi_last_name
str9 | str6 | str5 | int64 | str5
IE37NAHAQ | WFC3 | F153M | 15879 | RIESS
IE37NAHEQ | WFC3 | F153M | 15879 | RIESS
IE37NAHFQ | WFC3 | F160W | 15879 | RIESS
IE37NBAMQ | WFC3 | F160W | 15879 | RIESS
IE37NBANQ | WFC3 | F160W | 15879 | RIESS
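When the list of values comes from elsewhere in your code, it can be convenient to build the comma-delimited string programmatically. The helper below is hypothetical: it strips stray whitespace and drops duplicates while keeping the original order. The query above passed 'F153M, F160W' with a space after the comma and still matched both filters, so spacing appears not to matter, but the helper emits none to be safe.

```python
def any_of(values):
    """Join values into the comma-delimited string used for OR-matching (hypothetical helper)."""
    cleaned = []
    for value in values:
        text = str(value).strip()
        if text and text not in cleaned:  # skip blanks and repeats
            cleaned.append(text)
    return ','.join(cleaned)


filters = any_of(['F153M', ' F160W', 'F153M'])
print(filters)  # F153M,F160W
```

The result can then be passed as a criterion, for example missions.query_criteria(sci_instrume='WFC3', sci_spec_1234=filters, sci_pep_id=15879).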

For columns with numeric or date values, we can filter using comparison operators:

  • <: Return values less than or before the given number/date

  • >: Return values greater than or after the given number/date

  • <=: Return values less than or equal to the given number/date

  • >=: Return values greater than or equal to the given number/date

As an example, let’s write a query to return all datasets with an observation date before May 1, 1990. These were some of Hubble’s first observations! We’ll use the optional sort_by and sort_desc keywords to sort our results in reverse chronological order.

# Query using comparison operator
results = missions.query_criteria(sci_start_time='<1990-05-01',  # Must be observed before May 1, 1990
                                  select_cols=['sci_start_time', 'sci_pep_id'],
                                  sort_by=['sci_start_time'],  # Sort by observation start time
                                  sort_desc=[True])  # Sort in descending order

# Display the first 10 results
print(f'Total number of results: {len(results)}')
results[:10]
Total number of results: 196
Table masked=True length=10
sci_data_set_name | sci_start_time | sci_pep_id
str9 | str26 | int64
W0340Y01R | 1990-04-24T23:18:15.253000 | 1476
X1680B01T | 1990-04-24T23:17:59.053000 | 4107
X1681201T | 1990-04-24T23:17:59.053000 | 4107
X16I1N01T | 1990-04-24T23:17:59.053000 | 3801
X14W0401T | 1990-04-24T23:17:59.053000 | 3504
X14W0402T | 1990-04-24T23:17:59.053000 | 3504
X14W0403T | 1990-04-24T23:17:59.053000 | 3504
X14W0404T | 1990-04-24T23:17:59.053000 | 3504
X14W0405T | 1990-04-24T23:17:59.053000 | 3504
X14W0406T | 1990-04-24T23:17:59.053000 | 3504
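Because comparison criteria are plain strings, building them from Python date objects instead of typing them by hand helps avoid malformed dates. A minimal sketch, assuming the API accepts ISO 8601 dates as in the query above; the helper name is hypothetical.

```python
from datetime import date

OPERATORS = {'<', '>', '<=', '>='}


def compare(operator, value):
    """Build a comparison criterion such as '<1990-05-01' (hypothetical helper)."""
    if operator not in OPERATORS:
        raise ValueError(f'unsupported operator: {operator!r}')
    if isinstance(value, date):
        value = value.isoformat()  # ISO 8601, e.g. 1990-05-01 for a date
    return f'{operator}{value}'


print(compare('<', date(1990, 5, 1)))  # <1990-05-01
print(compare('>=', 5000))             # >=5000
```

For example, missions.query_criteria(sci_start_time=compare('<', date(1990, 5, 1))) reproduces the query above, and compare('>', 1) yields the '>1' string used for sci_hlsp later in this tutorial.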

For numeric or date data types, we can also filter with ranges. This requires the following syntax: '#..#'.

Let’s write a query that uses range syntax to return datasets that have an exposure time between 5000 and 5005 seconds.

# Query using range operator
results = missions.query_criteria(sci_actual_duration='5000..5005',  # Exposure duration is between 5000 and 5005 seconds
                                  select_cols=['sci_pep_id', 'sci_actual_duration'])

# Display the first 10 results
print(f'Total number of results: {len(results)}')
results[:10]
Total number of results: 133
Table masked=True length=10
sci_data_set_name | sci_actual_duration | sci_pep_id
str9 | float64 | int64
LC1203020 | 5000.768 | 12936
J95S15010 | 5000.0 | 10418
IC2B02090 | 5005.0 | 12875
IC3A05040 | 5000.0 | 13018
IC3A06040 | 5000.0 | 13018
IC3A07040 | 5000.0 | 13018
IE9T01020 | 5000.0 | 16274
IE9T02020 | 5000.0 | 16274
IE9T03020 | 5000.0 | 16274
IE9T04020 | 5000.0 | 16274
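The results above include exposures of exactly 5000.0 and 5005.0 seconds, which suggests that both ends of a range are inclusive. A small hypothetical helper can assemble range strings for numbers or dates; it assumes, as stated above, that date columns accept the same '#..#' syntax.

```python
from datetime import date


def between(low, high):
    """Build an inclusive range criterion such as '5000..5005' (hypothetical helper)."""
    if low > high:
        raise ValueError('low must not exceed high')
    # Dates become ISO 8601 strings; numbers are used as-is
    parts = [v.isoformat() if isinstance(v, date) else str(v) for v in (low, high)]
    return '..'.join(parts)


print(between(5000, 5005))                           # 5000..5005
print(between(date(2012, 6, 1), date(2012, 7, 31)))  # 2012-06-01..2012-07-31
```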

Wow, there are a lot of tips and tricks for writing queries! Here’s a quick summary:

  • To exclude a certain value from the results, prepend the value with !.

  • Wildcards are special characters used in search patterns to represent unknown characters, allowing for flexible matching of strings. The wildcard character is *, and it matches any number of characters (including none) before, after, or between the existing characters, depending on its placement.

  • To filter by multiple values for a single column, use a string of values delimited by commas.

  • For columns with numeric or date data types, filter using comparison operators (<, >, <=, >=).

  • For columns with numeric or date data types, select a range with the syntax '#..#'.

Getting Data Products

Performing a Product Query

Each observation returned from a MAST query can have one or more associated data products. For example, a JWST observation might return an uncalibrated file, a guide-star file, and the actual science data.

To keep things reproducible, we’ll run another criteria query for datasets that use Hubble’s Advanced Camera for Surveys (ACS) instrument. We are interested in datasets connected to proposal number 12451 that are associated with more than one High Level Science Product (HLSP).

# Query ACS datasets from proposal 12451 that have HLSPs
datasets = missions.query_criteria(sci_pep_id=12451,  # Proposal number 12451
                                   sci_instrume='ACS',  # Use ACS instrument
                                   sci_hlsp='>1')  # Associated with more than one HLSP

# Display results
print(f'Total number of results: {len(datasets)}')
datasets[:5]
Total number of results: 29
Table masked=True length=5
sci_data_set_namesci_targnamesci_hapnumsci_haspnumsci_instrumesci_aper_1234sci_spec_1234sci_actual_durationsci_start_timesci_pep_idsci_pi_last_namesci_rasci_decsci_refnumsci_central_wavelengthsci_release_datesci_stop_timesci_preview_namescp_scan_typesci_hlsp
str9str18int64int64str6str7str14float64str26int64str7float64float64int64float64str19str26str9str1int64
JBTAA0010ABELL20910ACSWFC-FIXF625W;CLEAR2L1032.02012-06-28T14:11:54.65300012451POSTMAN22.96891666667-13.61122222222366311.84812012-06-28T17:17:472012-06-28T14:31:45.717000JBTAA0010--12
JBTAA0020ABELL20910ACSWFC-FIXF850LP;CLEAR2L1035.02012-06-28T14:34:44.67000012451POSTMAN22.96887414019-13.61118826361369031.4592012-06-28T17:47:502012-06-28T14:55:15.703000JBTAA0020--12
JBTAA1010ABELL20910ACSWFC-FIXF475W;CLEAR2L1032.02012-06-28T15:47:39.67700012451POSTMAN22.96891666667-13.61122222222364746.95652012-06-28T19:30:262012-06-28T16:07:30.710000JBTAA1010--12
JBTAA1020ABELL20910ACSWFC-FIXF775W;CLEAR2L1031.02012-06-28T16:10:33.66000012451POSTMAN22.96887414019-13.61118826361367693.46882012-06-28T19:10:462012-06-28T16:31:00.693000JBTAA1020--12
JBTAA3010ABELL20910ACSWFC-FIXF606W;CLEAR2L1032.02012-07-15T16:19:51.68000012451POSTMAN22.96891666667-13.61122222222365921.89112012-07-15T19:08:212012-07-15T16:39:42.713000JBTAA3010--12

The get_product_list function accepts a table of datasets or a list of dataset IDs and returns a table containing the associated data products. Let’s fetch the data products for the first three datasets in the table above.

# Get a list of data products
products = missions.get_product_list(datasets[:3])

# Display results
print(f'Total number of products: {len(products)}')
products[:5]
Total number of products: 309
Table length=5
product_key | access | dataset | instrument_name | filters | filename | uri | authz_primary_identifier | authz_secondary_identifier | file_suffix | category | size | type
str66 | str6 | str9 | str6 | str14 | str56 | str66 | str42 | str3 | str15 | str14 | int64 | str9
JBTAA0010_jbtaa0010_asn.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_asn.fits | JBTAA0010/jbtaa0010_asn.fits | JBTAA0010 | ASN | ASN | AUX | 11520 | science
JBTAA0010_jbtaa0010_trl.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_trl.fits | JBTAA0010/jbtaa0010_trl.fits | JBTAA0010 | CAL | TRL | AUX | 679680 | science
JBTAA0010_jbtaa0010_drc.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_drc.fits | JBTAA0010/jbtaa0010_drc.fits | JBTAA0010 | CAL | DRC | CALIBRATED | 214660800 | science
JBTAA0010_jbtaa0010_spt.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_spt.fits | JBTAA0010/jbtaa0010_spt.fits | JBTAA0010 | CAL | SPT | UNCALIBRATED | 86400 | science
JBTAA0010_jbtaa0010_log.txt | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_log.txt | JBTAA0010/jbtaa0010_log.txt | JBTAA0010 | CAL | LOG | AUX | 224739 | science

Some products can be associated with multiple datasets, and this table may contain duplicates. To return a list of products with only unique filenames, use the get_unique_product_list function.

# Get products with unique filenames
unique_products = missions.get_unique_product_list(datasets[:3])

# Display results
unique_products[:5]
INFO: 26 of 309 products were duplicates. Only returning 283 unique product(s). [astroquery.mast.utils]
INFO: To return all products, use `MastMissions.get_product_list` [astroquery.mast.missions]
Table length=5
product_key | access | dataset | instrument_name | filters | filename | uri | authz_primary_identifier | authz_secondary_identifier | file_suffix | category | size | type
str66 | str6 | str9 | str6 | str14 | str56 | str66 | str42 | str3 | str15 | str14 | int64 | str9
JBTAA0010_17717071j_osc.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | 17717071j_osc.fits | JBTAA0010/17717071j_osc.fits | | | OSC | REFERENCE | 17280 | reference
JBTAA0010_25g1256nj_bpx.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | 25g1256nj_bpx.fits | JBTAA0010/25g1256nj_bpx.fits | | | BPX | REFERENCE | 23040 | reference
JBTAA0010_37g1550cj_mdz.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | 37g1550cj_mdz.fits | JBTAA0010/37g1550cj_mdz.fits | | | MDZ | REFERENCE | 247680 | reference
JBTAA0010_4af1559ij_imp.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | 4af1559ij_imp.fits | JBTAA0010/4af1559ij_imp.fits | | | IMP | REFERENCE | 953280 | reference
JBTAA0010_4bb1536cj_idc.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | 4bb1536cj_idc.fits | JBTAA0010/4bb1536cj_idc.fits | | | IDC | REFERENCE | 285120 | reference
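Conceptually, get_unique_product_list keeps one row per filename. The sketch below shows that idea on plain dicts. It is not astroquery’s implementation, and the real function may choose which duplicate to keep, or order its output, differently (the table above, for instance, appears sorted by filename). The rows, including the shared file name, are hypothetical.

```python
def unique_by_filename(products):
    """Keep the first row seen for each filename (sketch, not astroquery's implementation)."""
    seen = set()
    unique = []
    for row in products:
        if row['filename'] not in seen:
            seen.add(row['filename'])
            unique.append(row)
    return unique


# Hypothetical rows: two datasets that both reference the same calibration file
rows = [
    {'dataset': 'JBTAA0010', 'filename': 'jbtaa0010_asn.fits'},
    {'dataset': 'JBTAA0010', 'filename': 'shared_reference.fits'},
    {'dataset': 'JBTAA0020', 'filename': 'jbtaa0020_asn.fits'},
    {'dataset': 'JBTAA0020', 'filename': 'shared_reference.fits'},
]

unique_rows = unique_by_filename(rows)
print(f'{len(rows) - len(unique_rows)} of {len(rows)} products were duplicates')
```

For an astropy Table, astropy.table.unique(products, keys='filename') performs a similar reduction.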

Filtering Data Products

These datasets returned quite a few products! We are not interested in all of them, and luckily, we have a handy function to filter them for us. filter_products allows you to filter based on file extension (extension) and any of the other product fields.

A quick note on filtering: criteria on different fields are combined with AND, while multiple values given for the same field are combined with OR. For example, the filter below will return FITS products that are “science” type and have a file_suffix of either “ASN” (association files) or “JIF” (jitter information files).

# Filter products 
filtered = missions.filter_products(products,
                                    extension='fits',  # FITS file extension
                                    type='science',  # Science data
                                    file_suffix=['ASN', 'JIF'])  # Association files OR jitter information files

# Display results
filtered
Table length=6
product_key | access | dataset | instrument_name | filters | filename | uri | authz_primary_identifier | authz_secondary_identifier | file_suffix | category | size | type
str66 | str6 | str9 | str6 | str14 | str56 | str66 | str42 | str3 | str15 | str14 | int64 | str9
JBTAA0010_jbtaa0010_asn.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_asn.fits | JBTAA0010/jbtaa0010_asn.fits | JBTAA0010 | ASN | ASN | AUX | 11520 | science
JBTAA0010_jbtaa0010_jif.fits | PUBLIC | JBTAA0010 | ACS | F625W;CLEAR2L | jbtaa0010_jif.fits | JBTAA0010/jbtaa0010_jif.fits | JBTAA0010 | OMS | JIF | JITTER/SUPPORT | 60480 | science
JBTAA0020_jbtaa0020_asn.fits | PUBLIC | JBTAA0020 | ACS | F850LP;CLEAR2L | jbtaa0020_asn.fits | JBTAA0020/jbtaa0020_asn.fits | JBTAA0020 | ASN | ASN | AUX | 11520 | science
JBTAA0020_jbtaa0020_jif.fits | PUBLIC | JBTAA0020 | ACS | F850LP;CLEAR2L | jbtaa0020_jif.fits | JBTAA0020/jbtaa0020_jif.fits | JBTAA0020 | OMS | JIF | JITTER/SUPPORT | 60480 | science
JBTAA1010_jbtaa1010_asn.fits | PUBLIC | JBTAA1010 | ACS | F475W;CLEAR2L | jbtaa1010_asn.fits | JBTAA1010/jbtaa1010_asn.fits | JBTAA1010 | ASN | ASN | AUX | 11520 | science
JBTAA1010_jbtaa1010_jif.fits | PUBLIC | JBTAA1010 | ACS | F475W;CLEAR2L | jbtaa1010_jif.fits | JBTAA1010/jbtaa1010_jif.fits | JBTAA1010 | OMS | JIF | JITTER/SUPPORT | 60480 | science
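The AND/OR rule is easy to state and easy to get backwards, so here is a small sketch that mimics it on plain dicts. It is not astroquery’s filter_products code; it assumes that extension matches the end of the filename and that every other keyword names a product field. The sample rows are trimmed copies of products listed earlier in this section.

```python
def filter_rows(rows, extension=None, **criteria):
    """Keep rows that satisfy every criterion (AND); a list of values for one field is an OR."""
    def matches(row):
        if extension is not None and not row['filename'].endswith('.' + extension):
            return False
        for field, allowed in criteria.items():
            if isinstance(allowed, str):
                allowed = [allowed]  # a single value behaves like a one-item OR list
            if row[field] not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# Trimmed copies of products listed above for dataset JBTAA0010
rows = [
    {'filename': 'jbtaa0010_asn.fits', 'file_suffix': 'ASN', 'type': 'science'},
    {'filename': 'jbtaa0010_trl.fits', 'file_suffix': 'TRL', 'type': 'science'},
    {'filename': 'jbtaa0010_jif.fits', 'file_suffix': 'JIF', 'type': 'science'},
    {'filename': 'jbtaa0010_log.txt', 'file_suffix': 'LOG', 'type': 'science'},
    {'filename': '17717071j_osc.fits', 'file_suffix': 'OSC', 'type': 'reference'},
]

kept = filter_rows(rows, extension='fits', type='science', file_suffix=['ASN', 'JIF'])
print([row['filename'] for row in kept])  # ['jbtaa0010_asn.fits', 'jbtaa0010_jif.fits']
```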

Downloading Products

The download_products function accepts a table of products like the one above and will download the products to your local machine. By default, products will be downloaded into the current working directory, in a subdirectory called mastDownload. The full local filepaths will have the form mastDownload/<mission>/<Dataset ID>/file. You can change the download directory using the download_dir parameter.

# Download products using filtered product Table
manifest = missions.download_products(filtered[:2])

# Display results
manifest
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=JBTAA0010%2Fjbtaa0010_asn.fits to mastDownload/hst/JBTAA0010/jbtaa0010_asn.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=JBTAA0010%2Fjbtaa0010_jif.fits to mastDownload/hst/JBTAA0010/jbtaa0010_jif.fits ...
 [Done]
Table length=2
Local Path | Status | Message | URL
object | str8 | object | object
mastDownload/hst/JBTAA0010/jbtaa0010_asn.fits | COMPLETE | None | None
mastDownload/hst/JBTAA0010/jbtaa0010_jif.fits | COMPLETE | None | None
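The manifest paths follow the mastDownload/<mission>/<Dataset ID>/file pattern described above, so you can predict where a product will land before downloading it, for example to check whether it is already on disk. A sketch using pathlib; the helper is hypothetical, and we assume that a custom download_dir simply replaces the current directory as the base of this layout.

```python
from pathlib import Path


def expected_local_path(mission, dataset, filename, download_dir='.'):
    """Predict where download_products places a file (hypothetical helper)."""
    return Path(download_dir) / 'mastDownload' / mission / dataset / filename


path = expected_local_path('hst', 'JBTAA0010', 'jbtaa0010_asn.fits')
print(path.as_posix())  # mastDownload/hst/JBTAA0010/jbtaa0010_asn.fits
print(path.exists())    # True only if the file was already downloaded here
```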

For a more streamlined workflow, the function also accepts dataset IDs and product filters.

# Download products using dataset IDs and product filters
manifest = missions.download_products(['JBTAA0010', 'JBTAA0020'],
                                      extension='fits',
                                      type='science',
                                      file_suffix=['ASN', 'JIF'])

# Display results
manifest
INFO: Found cached file mastDownload/hst/JBTAA0010/jbtaa0010_asn.fits with expected size 11520. [astroquery.query]
INFO: Found cached file mastDownload/hst/JBTAA0010/jbtaa0010_jif.fits with expected size 60480. [astroquery.query]
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=JBTAA0020%2Fjbtaa0020_asn.fits to mastDownload/hst/JBTAA0020/jbtaa0020_asn.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=JBTAA0020%2Fjbtaa0020_jif.fits to mastDownload/hst/JBTAA0020/jbtaa0020_jif.fits ...
 [Done]
Table length=4
Local Path | Status | Message | URL
object | str8 | object | object
mastDownload/hst/JBTAA0010/jbtaa0010_asn.fits | COMPLETE | None | None
mastDownload/hst/JBTAA0010/jbtaa0010_jif.fits | COMPLETE | None | None
mastDownload/hst/JBTAA0020/jbtaa0020_asn.fits | COMPLETE | None | None
mastDownload/hst/JBTAA0020/jbtaa0020_jif.fits | COMPLETE | None | None

To download a single data product file, use the download_file function with a MAST URI as input. The default is to download the file to the current working directory, but you can specify the download directory or filepath with the local_path keyword argument.

# Download a single data product
result = missions.download_file('JBTAA0010/jbtaa0010_asn.fits')

# Display result
result
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=JBTAA0010%2Fjbtaa0010_asn.fits to jbtaa0010_asn.fits ...
 [Done]
('COMPLETE', None, None)
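The download logs above show how a MAST URI maps onto a retrieval URL: the URI is percent-encoded (the / becomes %2F) and passed as the product_name query parameter. The sketch below reproduces that mapping with urllib.parse, which can be handy for sharing a link. The URL pattern is inferred from the logs rather than from a documented guarantee, so prefer download_file in code.

```python
from urllib.parse import urlencode

# Base URL as it appears in the download logs above (inferred; may change between API versions)
BASE = 'https://mast.stsci.edu/search/{mission}/api/v0.1/retrieve_product'


def retrieval_url(mission, uri):
    """Build the retrieval URL seen in the logs for a MAST URI (hypothetical helper)."""
    # urlencode percent-encodes '/' as %2F, matching the logged URLs
    return BASE.format(mission=mission) + '?' + urlencode({'product_name': uri})


url = retrieval_url('hst', 'JBTAA0010/jbtaa0010_asn.fits')
print(url)
```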

Exclusive Data Access

Some data may not be publicly available and will require authentication and authorization. To download proprietary data with Astroquery, you will need a MyST Account with proper permissions. You will also need to provide an API token.

You can use the login function to authenticate yourself. After uncommenting and executing the following cell, you should be prompted to enter your token.

# missions.login()

You can also provide a token to a MastMissions object upon initialization using the mast_token parameter. However, remember to be cautious with your API token. You should not share the token or check it into source control. For the best security, we recommend using the login method to authenticate yourself.
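One way to follow this advice while still using mast_token is to read the token from an environment variable, so it never appears in the notebook itself. A minimal sketch: the variable name MAST_API_TOKEN is our choice for this example, and astroquery is imported only when an authenticated object is actually requested.

```python
import os


def read_token(var_name='MAST_API_TOKEN'):
    """Return the token stored in an environment variable, or None if unset (hypothetical helper)."""
    token = os.environ.get(var_name, '').strip()
    return token or None


def authenticated_missions(mission='hst', var_name='MAST_API_TOKEN'):
    """Create a MastMissions object using a token taken from the environment."""
    token = read_token(var_name)
    if token is None:
        raise RuntimeError(f'set {var_name} or call login() to authenticate interactively')
    from astroquery.mast import MastMissions  # deferred so read_token works without astroquery
    return MastMissions(mission=mission, mast_token=token)
```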

Switching Missions

As mentioned previously, each MastMissions object can only make queries and download products from a single collection at a time. This collection can be changed through the object’s mission attribute, which is case-insensitive. This allows users to query multiple collections with the same object.

To demonstrate, we’ll create a new MastMissions object and initialize the mission to be 'JWST'. This will perform queries on dataset metadata from the James Webb Space Telescope.

multi_mission = MastMissions(mission='JWST')
multi_mission.mission
'jwst'

Next, we’ll query for JWST datasets around NGC 346, a young star cluster in the Small Magellanic Cloud. We’ll use a radius of 0.2 arcminutes.

# Query JWST for NGC 346
results = multi_mission.query_object('NGC 346',
                                     radius=0.2)  # Search within a 0.2 arcminute radius

# Display results
print(f'Total number of datasets: {len(results)}')
results[:5]
Total number of datasets: 169
Table masked=True length=5
ArchiveFileID | fileSetName | productLevel | targprop | targ_ra | targ_dec | instrume | exp_type | opticalElements | date_obs | duration | program | observtn | visit | publicReleaseDate | pi_name | proposal_type | proposal_cycle | targtype | access | ang_sep
int64 | str25 | str14 | str22 | float64 | float64 | str7 | str13 | str76 | str27 | float64 | int64 | int64 | int64 | str19 | str17 | str3 | int64 | str5 | str16 | float64
143912893 | jw01227001003_02101_00001 | 1b, 2a, 2b, 2c | NGC-346 | 14.77060458333333 | -72.16920833333336 | MIRI | MIR_IMAGE | F770W | 2022-10-10T07:43:23.1330000 | 77.701 | 1227 | 1 | 3 | 2023-10-11T03:40:56 | Meixner, Margaret | GTO | 1 | FIXED | PUBLIC | 0.0
143912790 | jw01227001003_02101_00002 | 1b, 2a, 2b, 2c | NGC-346 | 14.77060458333333 | -72.16920833333336 | MIRI | MIR_IMAGE | F770W | 2022-10-10T07:46:04.0930000 | 77.701 | 1227 | 1 | 3 | 2023-10-11T03:39:18 | Meixner, Margaret | GTO | 1 | FIXED | PUBLIC | 0.0
143912742 | jw01227001003_02101_00003 | 1b, 2a, 2b, 2c | NGC-346 | 14.77060458333333 | -72.16920833333336 | MIRI | MIR_IMAGE | F770W | 2022-10-10T07:48:42.3010000 | 77.701 | 1227 | 1 | 3 | 2023-10-11T03:39:23 | Meixner, Margaret | GTO | 1 | FIXED | PUBLIC | 0.0
143912909 | jw01227001003_02101_00004 | 1b, 2a, 2b, 2c | NGC-346 | 14.77060458333333 | -72.16920833333336 | MIRI | MIR_IMAGE | F770W | 2022-10-10T07:51:20.4450000 | 77.701 | 1227 | 1 | 3 | 2023-10-11T03:40:28 | Meixner, Margaret | GTO | 1 | FIXED | PUBLIC | 0.0
143912806 | jw01227001003_02103_00001 | 1b, 2a, 2b, 2c | NGC-346 | 14.77060458333333 | -72.16920833333336 | MIRI | MIR_IMAGE | F1000W | 2022-10-10T07:56:20.1650000 | 99.901 | 1227 | 1 | 3 | 2023-10-11T03:43:08 | Meixner, Margaret | GTO | 1 | FIXED | PUBLIC | 0.0

This query returned 169 JWST datasets. Now, let’s try it with a different data collection. We’ll reassign the mission attribute on the multi_mission object to be 'ullyses' and run the same query.

multi_mission.mission = 'ullyses'
multi_mission.mission
'ullyses'
# Query ULLYSES for NGC 346
results = multi_mission.query_object('NGC 346',
                                     radius=0.2)  # Search within a 0.2 arcminute radius

# Display results
print(f'Total number of datasets: {len(results)}')
results[:5]
Total number of datasets: 2
Table masked=True length=2
search_pos | target_id | names_search | target_name_hlsp | simbad_link | target_classification | targ_ra | targ_dec | host_galaxy_name | spectral_type | bmv0_mag | u_mag | b_mag | v_mag | gaia_g_mean_mag | star_mass | instrument | grating | filter | observation_id | ang_sep
str17 | int64 | str48 | str14 | str66 | str12 | float64 | float64 | str3 | str4 | float64 | float64 | float64 | float64 | float64 | float64 | str4 | str5 | str1 | str39 | str19
14.76833 -72.1775 | 55 | NGC346 MPG 396,Cl* NGC346 MPG 396,NGC346-MPG-396 | NGC346-MPG-396 | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Cl*+NGC346+MPG+396 | Mid O Dwarf | 14.762184457428036 | -72.17637442586573 | SMC | O7 V | -0.27 | 13.09 | 14.17 | 14.39 | 14.382007 | -- | COS | G160M | -- | hlsp_ullyses_hst_cos_ngc346-mpg-396_uv | 0.13152388757468492
14.76833 -72.1775 | 57 | NGC346 MPG 487,Cl* NGC346 MPG 487,NGC346-MPG-487 | NGC346-MPG-487 | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Cl*+NGC346+MPG+487 | Late O Dwarf | 14.778136436564196 | -72.17813531865009 | SMC | O8 V | -0.27 | 13.3 | 14.31 | 14.53 | 14.499837 | 25.7 | STIS | E140M | -- | hlsp_ullyses_hst_stis_ngc346-mpg-487_uv | 0.18407398702435046

Notice that this query returned only two datasets, and that its result table has an entirely different set of columns from the JWST table. This is because each query is performed on a different data collection, and each collection has its own metadata columns!
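A quick way to see how different the two schemas are is to compare their column names. The sketch below hard-codes the column names shown in the JWST and ULLYSES result tables above. With live astropy tables, you could pass each table's colnames list instead. The helper compare_columns is hypothetical.

```python
JWST_COLUMNS = [
    "ArchiveFileID", "fileSetName", "productLevel", "targprop", "targ_ra",
    "targ_dec", "instrume", "exp_type", "opticalElements", "date_obs",
    "duration", "program", "observtn", "visit", "publicReleaseDate",
    "pi_name", "proposal_type", "proposal_cycle", "targtype", "access", "ang_sep",
]
ULLYSES_COLUMNS = [
    "search_pos", "target_id", "names_search", "target_name_hlsp", "simbad_link",
    "target_classification", "targ_ra", "targ_dec", "host_galaxy_name",
    "spectral_type", "bmv0_mag", "u_mag", "b_mag", "v_mag", "gaia_g_mean_mag",
    "star_mass", "instrument", "grating", "filter", "observation_id", "ang_sep",
]


def compare_columns(cols_a, cols_b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of column names."""
    a, b = set(cols_a), set(cols_b)
    return sorted(a & b), sorted(a - b), sorted(b - a)


shared, only_jwst, only_ullyses = compare_columns(JWST_COLUMNS, ULLYSES_COLUMNS)
print("Shared columns:", shared)  # Shared columns: ['ang_sep', 'targ_dec', 'targ_ra']
```

Only the target coordinates and ang_sep are shared. Even the instrument column is spelled differently (instrume versus instrument), so code that handles several missions should not assume a common schema.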

Exercises

Exercise 1: It’s time to apply all that you’ve learned and try your hand at writing a MastMissions query! Write a non-positional query based on the following:

  • Image observations

  • Instrument should NOT include the Cosmic Origins Spectrograph (COS)

  • Filter used is F150W, F105W, or F110W

  • Declination is greater than 0 degrees

  • Exposure time is between 1000 and 2000 seconds

  • Target name contains the string “GAL”

  • Skip the first 5 entries

  • Sort by exposure time in descending order

  • Limit the results to 3 datasets

# A non-positional query with column criteria
# results = missions.query_criteria(...)  # Write your query here!

# Display results
# results

Exercise 2: Using your results from Exercise 1, download the association table data products for the 3 datasets (HINT: file_suffix = 'ASN'). You can fetch, filter, and download the products as three separate steps, or use the streamlined workflow built into download_products.

# Fetch products from 3 datasets
# products = missions.get_product_list(...)

# Filter products
# filtered = missions.filter_products(...)

# Download products
# missions.download_products(...)

Exercise 3: Use a new MastMissions object and the mission attribute to search for datasets around the coordinate “22h57m39s -29d37m20s” from both HST and JWST. Use a radius of 0.1 arcminutes.

# Create new MastMissions object
# m = MastMissions()

# Create sky coordinate object
# coord = SkyCoord(...)

# Query HST metadata for region
# results = m.query_region(...)

# Display the first 5 results
# print(f'Total number of datasets: {len(results)}')
# results[:5]

# Switch mission to JWST
# ...

# Query JWST metadata for region
# results = m.query_region(...)

# Display the first 5 results
# print(f'Total number of datasets: {len(results)}')
# results[:5]

Exercise Solutions

Exercise 1:

# A non-positional query with column criteria
results = missions.query_criteria(sci_obs_type='IMAGE',
                                  sci_instrume='!COS',
                                  sci_spec_1234='F150W, F105W, F110W',
                                  sci_dec='>0',
                                  sci_actual_duration='1000..2000',
                                  sci_targname='*GAL*',
                                  offset=5,
                                  sort_by=['sci_actual_duration'],
                                  sort_desc=[True],
                                  limit=3)

# Display results
results
WARNING: MaxResultsWarning: Maximum results returned, may not include all sources within radius. [astroquery.mast.missions]
Table masked=True length=3
sci_data_set_name | sci_targname | sci_hapnum | sci_haspnum | sci_instrume | sci_aper_1234 | sci_spec_1234 | sci_actual_duration | sci_start_time | sci_pep_id | sci_pi_last_name | sci_ra | sci_dec | sci_refnum | sci_central_wavelength | sci_release_date | sci_stop_time | sci_preview_name | scp_scan_type | sci_hlsp
str9 | str27 | int64 | int64 | str6 | str6 | str5 | float64 | str26 | int64 | str6 | float64 | float64 | int64 | float64 | str19 | str26 | str9 | str1 | int64
N4A705010 | GAL-CLUS-0026+1653-ARCC | 0 | 0 | NICMOS | NIC1 | F110W | 1151.9038 | 1998-01-08T03:44:07.490000 | 7425 | TURNER | 6.657118050129 | 17.16004527511 | 0 | 11292.4 | 1999-01-08T19:47:33 | 1998-01-08T04:04:06.490000 | N4A705010 | -- | --
N4A703010 | GAL-CLUS-0026+1653-ARCD | 0 | 0 | NICMOS | NIC1 | F110W | 1151.9038 | 1997-10-31T14:20:31.493000 | 7425 | TURNER | 6.64289440364 | 17.16556874169 | 0 | 11292.4 | 1998-10-31T21:30:52 | 1997-10-31T14:40:30.493000 | N4A703010 | -- | --
IF9V31030 | VIRGO-INTERGALACTIC-FIELD-E | 1 | 0 | WFC3 | IR-FIX | F110W | 1058.805543 | 2023-12-14T15:40:25.307000 | 17510 | GREGG | 187.6896916667 | 12.99041111111 | 0 | 11534.459 | 2023-12-14T21:58:47 | 2023-12-14T16:08:42.237000 | IF9V31030 | -- | --
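The criteria values in this solution are ordinary strings that follow a small syntax:

- A leading ! excludes a value.
- A leading > compares.
- low..high gives a range.
- \* acts as a wildcard.
- Commas separate alternatives.

If you build queries programmatically, small helpers like the hypothetical ones below keep that syntax in one place. They only reproduce the strings used above and do not check them against what the MAST Search API accepts.

```python
def exclude(value):
    """Negated criterion, e.g. exclude('COS') -> '!COS'."""
    return f"!{value}"


def greater_than(value):
    """Comparison criterion, e.g. greater_than(0) -> '>0'."""
    return f">{value}"


def value_range(low, high):
    """Range criterion, e.g. value_range(1000, 2000) -> '1000..2000'."""
    return f"{low}..{high}"


def contains(text):
    """Wildcard criterion matching text anywhere, e.g. contains('GAL') -> '*GAL*'."""
    return f"*{text}*"


def any_of(*values):
    """Comma-separated alternatives, e.g. any_of('F150W', 'F105W') -> 'F150W, F105W'."""
    return ", ".join(values)


# Rebuild the Exercise 1 criteria with the helpers
criteria = {
    "sci_obs_type": "IMAGE",
    "sci_instrume": exclude("COS"),
    "sci_spec_1234": any_of("F150W", "F105W", "F110W"),
    "sci_dec": greater_than(0),
    "sci_actual_duration": value_range(1000, 2000),
    "sci_targname": contains("GAL"),
}
print(criteria)

# With astroquery installed, the dictionary can be unpacked into the query:
# results = missions.query_criteria(**criteria, offset=5,
#                                   sort_by=['sci_actual_duration'],
#                                   sort_desc=[True], limit=3)
```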

Exercise 2:

# As 3 separate steps
# Fetch products from first 3 datasets
products = missions.get_product_list(results)

# Filter products
filtered = missions.filter_products(products,
                                    file_suffix='ASN')

# Download products
missions.download_products(filtered)
WARNING: MaxResultsWarning: Maximum results returned, may not include all sources within radius. [astroquery.mast.missions]
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=IF9V31030%2Fif9v31030_asn.fits to mastDownload/hst/IF9V31030/if9v31030_asn.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=N4A703010%2Fn4a703010_asn.fits to mastDownload/hst/N4A703010/n4a703010_asn.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/search/hst/api/v0.1/retrieve_product?product_name=N4A705010%2Fn4a705010_asn.fits to mastDownload/hst/N4A705010/n4a705010_asn.fits ...
 [Done]
Table length=3
Local Path | Status | Message | URL
object | str8 | object | object
mastDownload/hst/IF9V31030/if9v31030_asn.fits | COMPLETE | None | None
mastDownload/hst/N4A703010/n4a703010_asn.fits | COMPLETE | None | None
mastDownload/hst/N4A705010/n4a705010_asn.fits | COMPLETE | None | None
# Streamlined: pass dataset names and filters directly to download_products
missions.download_products(results['sci_data_set_name'].tolist(),
                           file_suffix='ASN')
WARNING: MaxResultsWarning: Maximum results returned, may not include all sources within radius. [astroquery.mast.missions]
WARNING: MaxResultsWarning: Maximum results returned, may not include all sources within radius. [astroquery.mast.missions]
WARNING: MaxResultsWarning: Maximum results returned, may not include all sources within radius. [astroquery.mast.missions]
INFO: Found cached file mastDownload/hst/IF9V31030/if9v31030_asn.fits with expected size 11520. [astroquery.query]
INFO: Found cached file mastDownload/hst/N4A703010/n4a703010_asn.fits with expected size 11520. [astroquery.query]
INFO: Found cached file mastDownload/hst/N4A705010/n4a705010_asn.fits with expected size 11520. [astroquery.query]
Table length=3
Local Path | Status | Message | URL
object | str8 | object | object
mastDownload/hst/IF9V31030/if9v31030_asn.fits | COMPLETE | None | None
mastDownload/hst/N4A703010/n4a703010_asn.fits | COMPLETE | None | None
mastDownload/hst/N4A705010/n4a705010_asn.fits | COMPLETE | None | None
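Both workflows return a manifest with Local Path, Status, Message, and URL columns. The check below is our own sketch. It flags any rows whose status is not COMPLETE. It only indexes rows by column name, so it should work on the astropy table returned by download_products as well as on the plain dictionaries used here as stand-ins. The helper name incomplete_downloads is hypothetical.

```python
def incomplete_downloads(manifest):
    """Return (local_path, status, message) for each row whose Status is not 'COMPLETE'.

    Rows only need to support indexing by column name, so this works on an
    astropy Table returned by download_products as well as on plain dicts.
    """
    problems = []
    for row in manifest:
        if row["Status"] != "COMPLETE":
            problems.append((row["Local Path"], row["Status"], row["Message"]))
    return problems


# Stand-in rows copied from the manifest above; with astroquery you would pass
# the table returned by missions.download_products(...) instead.
manifest = [
    {"Local Path": "mastDownload/hst/IF9V31030/if9v31030_asn.fits",
     "Status": "COMPLETE", "Message": None, "URL": None},
    {"Local Path": "mastDownload/hst/N4A703010/n4a703010_asn.fits",
     "Status": "COMPLETE", "Message": None, "URL": None},
    {"Local Path": "mastDownload/hst/N4A705010/n4a705010_asn.fits",
     "Status": "COMPLETE", "Message": None, "URL": None},
]

print(incomplete_downloads(manifest))  # [] because every row is COMPLETE
```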

Exercise 3:

# Create new MastMissions object
m = MastMissions()

# Create sky coordinate object
coord = SkyCoord('22h57m39s -29d37m20s')

# Query HST metadata for region
results = m.query_region(coord,
                         radius=0.1)

# Display the first 5 results
print(f'Total number of datasets: {len(results)}')
results[:5]
Total number of datasets: 476
Table masked=True length=5
search_pos | sci_data_set_name | sci_targname | sci_hapnum | sci_haspnum | sci_instrume | sci_aper_1234 | sci_spec_1234 | sci_actual_duration | sci_start_time | sci_pep_id | sci_pi_last_name | sci_ra | sci_dec | sci_refnum | sci_central_wavelength | sci_release_date | sci_stop_time | sci_preview_name | scp_scan_type | sci_hlsp | ang_sep
str18 | str9 | str19 | int64 | int64 | str6 | str12 | str13 | float64 | str26 | int64 | str9 | float64 | float64 | int64 | float64 | str26 | str26 | str9 | str1 | int64 | str20
344.4125 -29.62222 | N4N92Q040 | HD216956 | 0 | 0 | NICMOS | NIC2-FIX | F222M | 255.923 | 1998-08-06T01:24:53.733000 | 7894 | HENRY | 344.4125879875 | -29.62214772201 | 8 | 22181.699 | 2000-07-04T20:12:32 | 1998-08-06T01:29:20.733000 | N4N92Q040 | -- | -- | 0.006314126839181508
344.4125 -29.62222 | N4N92Q030 | HD216956 | 0 | 0 | NICMOS | NIC2-FIX | F207M | 255.923 | 1998-08-06T01:19:59.733000 | 7894 | HENRY | 344.4125879884 | -29.62214772055 | 8 | 20824.1 | 2000-07-04T20:12:32 | 1998-08-06T01:24:26.733000 | N4N92Q030 | -- | -- | 0.00631422112268177
344.4125 -29.62222 | N4N92Q020 | HD216956 | 0 | 0 | NICMOS | NIC2-FIX | F180M | 127.92868 | 1998-08-06T01:17:16.733000 | 7894 | HENRY | 344.412587989 | -29.62214771973 | 8 | 17971.1 | 2000-07-04T20:12:32 | 1998-08-06T01:19:35.733000 | N4N92Q020 | -- | -- | 0.006314277659432727
344.4125 -29.62222 | N4N92Q010 | HD216956 | 0 | 0 | NICMOS | NIC2-FIX | F110W | 127.92868 | 1998-08-06T01:14:30.733000 | 7894 | HENRY | 344.4125879895 | -29.62214771891 | 8 | 11284.7 | 2000-07-04T20:12:32 | 1998-08-06T01:16:49.733000 | N4N92Q010 | -- | -- | 0.006314330409832213
344.4125 -29.62222 | Z3EJ0102T | HD216956 | 0 | 0 | HRS | 2.0 | MIRROR-A1 | 0.0 | 1996-09-30T08:18:34.937000 | 6627 | LALLEMENT | 344.4123659004 | -29.6221209983 | 0 | -- | 1997-09-30T22:45:42 | 1996-09-30T08:22:46.937000 | -- | -- | -- | 0.00917640457522127
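The search_pos column above shows where the region search was centered, 344.4125 -29.62222 in decimal degrees. The stdlib sketch below reproduces that conversion from the sexagesimal string passed to SkyCoord. It handles only the simple "HHhMMmSSs ±DDdMMmSSs" form used in this exercise. SkyCoord parses many more formats and should be preferred in practice. The function name sexagesimal_to_degrees is hypothetical.

```python
import re

# Matches e.g. "22h57m39s -29d37m20s"; the Dec sign is captured separately so
# that values like "-00d30m00s" keep their sign.
_SEXAGESIMAL = re.compile(
    r"^\s*(\d+)h(\d+)m([\d.]+)s\s+([+-]?)(\d+)d(\d+)m([\d.]+)s\s*$"
)


def sexagesimal_to_degrees(text):
    """Convert 'HHhMMmSS.Ss +DDdMMmSS.Ss' to (ra_deg, dec_deg)."""
    match = _SEXAGESIMAL.match(text)
    if match is None:
        raise ValueError(f"unsupported coordinate string: {text!r}")
    rh, rm, rs, sign, dd, dm, ds = match.groups()
    ra_hours = int(rh) + int(rm) / 60 + float(rs) / 3600
    dec_abs = int(dd) + int(dm) / 60 + float(ds) / 3600
    dec = -dec_abs if sign == "-" else dec_abs
    return ra_hours * 15.0, dec  # 1 hour of right ascension = 15 degrees


ra, dec = sexagesimal_to_degrees("22h57m39s -29d37m20s")
print(f"{ra:.4f} {dec:.5f}")  # 344.4125 -29.62222
```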
# Switch mission to JWST
m.mission = 'JWST'

# Query JWST metadata for region
results = m.query_region(coord,
                         radius=0.1)

# Display the first 5 results
print(f'Total number of datasets: {len(results)}')
results[:5]
Total number of datasets: 67
Table masked=True length=5
ArchiveFileID | fileSetName | productLevel | targprop | targ_ra | targ_dec | instrume | exp_type | opticalElements | date_obs | duration | program | observtn | visit | publicReleaseDate | pi_name | proposal_type | proposal_cycle | targtype | access | ang_sep
int64 | str25 | str14 | str25 | float64 | float64 | str6 | str13 | str48 | str27 | float64 | int64 | int64 | int64 | str19 | str20 | str3 | int64 | str5 | str16 | float64
139989292 | jw01193006001_02101_00001 | 1b, 2a, 2b, 2c | FOMALHAUT | 344.4150901092018 | -29.62327928996265 | MIRI | MIR_IMAGE | F2550W | 2022-10-21T20:17:08.6000000 | 269.102 | 1193 | 6 | 1 | 2023-10-21T22:14:59 | Beichman, Charles A. | GTO | 1 | FIXED | PUBLIC | 0.0
139989097 | jw01193006001_02101_00002 | 1b, 2a, 2b, 2c | FOMALHAUT | 344.4150901103719 | -29.62327929047184 | MIRI | MIR_IMAGE | F2550W | 2022-10-21T20:22:59.8950000 | 269.102 | 1193 | 6 | 1 | 2023-10-21T22:12:51 | Beichman, Charles A. | GTO | 1 | FIXED | PUBLIC | 0.0
139993460 | jw01193006001_02101_00003 | 1b, 2a, 2b, 2c | FOMALHAUT | 344.415090111542 | -29.62327929098103 | MIRI | MIR_IMAGE | F2550W | 2022-10-21T20:28:51.1910000 | 269.102 | 1193 | 6 | 1 | 2023-10-21T22:12:48 | Beichman, Charles A. | GTO | 1 | FIXED | PUBLIC | 0.0
139989372 | jw01193006001_02101_00004 | 1b, 2a, 2b, 2c | FOMALHAUT | 344.4150901127151 | -29.62327929149152 | MIRI | MIR_IMAGE | F2550W | 2022-10-21T20:34:43.3830000 | 269.102 | 1193 | 6 | 1 | 2023-10-21T22:13:16 | Beichman, Charles A. | GTO | 1 | FIXED | PUBLIC | 0.0
139989260 | jw01193007001_02101_00001 | 1b, 2a, 2b | FOMALHAUT-COPY-MIRI-CORON | 344.4150901161078 | -29.62327929296791 | MIRI | MIR_TACQ | FND | 2022-10-21T20:53:55.8600000 | 1.296 | 1193 | 7 | 1 | 2023-10-22T07:32:01 | Beichman, Charles A. | GTO | 1 | FIXED | PUBLIC | 0.0
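The switch-and-query steps in this solution can be written as a loop over mission names. The helper below is a hypothetical sketch. It assumes only what this tutorial shows: a MastMissions-like client exposes a writable mission attribute and a query_region(coordinates, radius=...) method. It returns the result tables keyed by mission name.

```python
def query_region_per_mission(client, coordinates, missions, radius=0.1):
    """Run the same region query against several missions with one client.

    client is expected to behave like astroquery.mast.MastMissions: a writable
    `mission` attribute and a `query_region(coordinates, radius=...)` method.
    Returns a dict mapping each mission name (as reported by client.mission)
    to its result table.
    """
    original = client.mission
    results = {}
    try:
        for name in missions:
            client.mission = name
            results[client.mission] = client.query_region(coordinates, radius=radius)
    finally:
        client.mission = original  # leave the client as we found it
    return results


# Usage with astroquery (network access required):
# from astropy.coordinates import SkyCoord
# from astroquery.mast import MastMissions
# per_mission = query_region_per_mission(MastMissions(),
#                                        SkyCoord('22h57m39s -29d37m20s'),
#                                        ['HST', 'JWST'])
# for name, table in per_mission.items():
#     print(name, len(table))
```

Restoring the original mission in the finally block is a design choice. Code that reuses the same client afterward will still query the collection it started with, even if one of the queries raises an error.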

Additional Resources

Citations

If you use astroquery for published research, please cite the authors. Follow these links for more information about citing astroquery:

About this Notebook

Author(s): Sam Bianco
Keyword(s): Tutorial, Astroquery, MastMissions
First published: January 2025
Last updated: January 2025

