Large Downloads in astroquery.mast


For some programs stored in the MAST Archive, you may encounter issues when downloading data via the MAST Portal because of the large number of files involved. This applies particularly to JWST programs that use Wide-Field Slitless Spectroscopy. It is preferable, and often necessary, to use an API to retrieve this data instead. In this tutorial, we’ll use seemingly innocuous observations that expand into a considerable number of related files.

To that end, this notebook will demonstrate:

Searching the MAST Archive for observations using the astroquery.mast API
Retrieving associated data products without causing a timeout error
Downloading the desired subset of data products

Table of Contents

Imports
Search the MAST Archive
Retrieve Associated Products
Filter and Download Products
Further Reading

Imports

In order to run this notebook, we need:

astroquery.mast to access the MAST Archive
astropy.table to hold the results of our queries, combine them, and then filter them for unique products

from astroquery.mast import Observations
from astropy.table import unique, vstack

Search the MAST Archive

The first step to downloading the data is finding the observations we’re interested in. This is easiest to do using query_criteria, which allows us to specify criteria such as RA/Dec, filters, exposure time, and any other fields listed here.

In this example, we use query_criteria to find NIRCam imaging observations from JWST Program 1073. When querying for JWST data, setting obs_collection='JWST' greatly increases the speed of the search by reducing the number of potential matches. The same applies to every mission available in MAST, including HST.

matched_obs = Observations.query_criteria(
        obs_collection='JWST',
        proposal_id='1073',
        instrument_name='NIRCAM/IMAGE' # Be sure to specify the full "instrument/mode" configuration!
        )

# This displays selected columns from the observation table, as a sanity check
columns = ['dataproduct_type', 'filters', 'calib_level', 't_exptime', 'proposal_pi', 'intentType', 'obsid', 'instrument_name']
matched_obs[columns][:5]

Table length=5

dataproduct_type	filters	calib_level	t_exptime	proposal_pi	intentType	obsid	instrument_name
str5	str5	int64	float64	str19	str7	str9	str12
image	F277W	3	343.576	Koekemoer, Anton M.	science	83254380	NIRCAM/IMAGE
image	F115W	3	343.576	Koekemoer, Anton M.	science	83254391	NIRCAM/IMAGE
image	F277W	3	343.576	Koekemoer, Anton M.	science	75900624	NIRCAM/IMAGE
image	F150W	3	343.576	Koekemoer, Anton M.	science	75914186	NIRCAM/IMAGE
image	F150W	3	343.576	Koekemoer, Anton M.	science	118344942	NIRCAM/IMAGE

The full search returns 15 observations; only the first five are displayed above. Keep this number in mind as we search for associated products.

Retrieve Associated Products

Each observation has associated data products. Which products are of interest to you depends on how you intend to use the data; more on this in the section below. For now, let’s retrieve all the products by requesting them in small “chunks”.

Note: It is wise to avoid requesting all of the products simultaneously. This is extremely likely to take an enormous amount of time, fail, or worse, do both, ultimately giving you a headache. MAST offers no medical advice, but we are decidedly anti-headache. Requesting products in groups of five offers the best balance between speed and reliability.
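Chunking bounds the size of each request, but an individual request can still time out. One way to guard against that is to wrap each chunk request in a small retry loop. The sketch below is plain Python and is not part of astroquery: the helper names chunked and fetch_in_chunks, and the retry count and delay values, are hypothetical choices for illustration only.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each of length `size` (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_chunks(
    items: Sequence[T],
    fetch: Callable[[Sequence[T]], R],
    size: int = 5,
    retries: int = 3,
    base_delay: float = 1.0,
) -> List[R]:
    """Call `fetch` once per chunk; retry a failing chunk with exponential backoff.

    Hypothetical helper: `retries` and `base_delay` are illustrative defaults,
    not values recommended by MAST.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    results: List[R] = []
    for chunk in chunked(items, size):
        for attempt in range(retries):
            try:
                results.append(fetch(chunk))
                break  # this chunk succeeded; move on to the next one
            except Exception:  # in practice, narrow this to network/timeout errors
                if attempt == retries - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(base_delay * 2 ** attempt)  # waits base_delay, then 2x, 4x, ...
    return results
```

Under these assumptions, the get_product_list list comprehension below could be written as t = fetch_in_chunks(matched_obs, Observations.get_product_list, size=5), because slicing an astropy Table also returns a Table.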

# Split the observations into "chunks" of size five
sz_chunk = 5
chunks = [matched_obs[i:i+sz_chunk] for i in range(0, len(matched_obs), sz_chunk)]

# Get the list of products for each chunk
t = [Observations.get_product_list(chunk) for chunk in chunks]

# Keep only the unique files
files = unique(vstack(t), keys='productFilename')

# How many files are there? How large are they?
print(f"There are {len(files)} unique files, which are {sum(files['size'])/10**9:.1f} GB in size.")

There are 6768 unique files, which are 299.6 GB in size.

Now the issue with requesting all of the products simultaneously is clear: there are more than 6,000 unique files associated with our 15 observations.

Running this search on the MAST Portal results in over 30,000 files, since the Portal does not exclude duplicate results; that is nearing the limit of what the Portal can load. One of the advantages of using the API is avoiding this large number of duplicates.
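The difference between those two counts comes from duplicates: the same file can be associated with more than one observation. The plain-Python sketch below illustrates what unique(..., keys='productFilename') does conceptually, and how the size total printed earlier is expressed in decimal gigabytes. The sample rows and helper names are hypothetical; astropy's unique groups rows by the key column, so its output order may differ from this order-preserving version.

```python
from typing import Dict, Iterable, List


def unique_by_key(rows: Iterable[Dict], key: str = "productFilename") -> List[Dict]:
    """Keep the first row seen for each value of `key`, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def total_gb(rows: Iterable[Dict], key: str = "size") -> float:
    """Sum the byte counts in `key` and convert to decimal gigabytes (1 GB = 10**9 bytes)."""
    return sum(row[key] for row in rows) / 10**9


# Hypothetical product rows: the same file reached through two different observations.
sample_rows = [
    {"productFilename": "a_uncal.fits", "size": 2_000_000_000},
    {"productFilename": "a_uncal.fits", "size": 2_000_000_000},
    {"productFilename": "b_cal.fits", "size": 500_000_000},
]
deduped = unique_by_key(sample_rows)
print(len(sample_rows), total_gb(sample_rows))  # counts and sizes with duplicates
print(len(deduped), total_gb(deduped))          # after removing duplicates
```

Counting duplicates inflates both the file count and the apparent download size, which is why deduplicating before downloading matters.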

Filter and Download Products

If you are trying to download proprietary data, you will need to log in. This requires a MAST token, which you can create on the auth.mast website. If you have not set it as an environment variable, you will have to enter it at the login prompt below.

In this example, we want to download the uncalibrated products, which we select below using the productSubGroupDescription field. You can find the other available product filters, including product type and file size, here. Examples are also included, commented out, in the cell below.

An additional option we use is the curl_flag parameter. Rather than downloading the data immediately, it downloads a curl script that fetches the files when you run it. This option is off by default, but it is more robust than a direct download and is highly recommended when downloading a large number of files. You can run the script with bash mastDownload_dddd.sh, replacing dddd with the timestamp in your file’s name.
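download_products passes filter keywords such as productSubGroupDescription on to Observations.filter_products. As we understand the astroquery documentation, filters on different columns are combined with AND, while a list of values for a single column is combined with OR. The sketch below mimics that behaviour on plain dictionaries so the logic can be checked offline; filter_rows and the sample rows are hypothetical and are not the astroquery implementation.

```python
from typing import Dict, Iterable, List, Sequence, Union

Scalar = Union[str, int, float]


def filter_rows(rows: Iterable[Dict], **filters: Union[Scalar, Sequence[Scalar]]) -> List[Dict]:
    """Keep rows that match every filter (AND across columns).

    A list of values for one column matches if any value matches (OR within a column).
    """
    allowed = {
        column: [value] if isinstance(value, (str, int, float)) else list(value)
        for column, value in filters.items()
    }
    return [
        row for row in rows
        if all(row.get(column) in values for column, values in allowed.items())
    ]


# Hypothetical product rows, loosely modelled on a JWST product list.
sample_products = [
    {"productFilename": "x_uncal.fits", "productSubGroupDescription": "UNCAL", "calib_level": 1},
    {"productFilename": "x_rate.fits", "productSubGroupDescription": "RATE", "calib_level": 2},
    {"productFilename": "x_cal.fits", "productSubGroupDescription": "CAL", "calib_level": 2},
]
uncal_only = filter_rows(sample_products, productSubGroupDescription="UNCAL")
print([row["productFilename"] for row in uncal_only])
```

Passing several columns narrows the selection, while passing a list for one column widens it; keep this in mind when un-commenting the extra filters in the cell below.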

# Un-comment below if downloading data during its exclusive access period.
# Observations.login()

manifest = Observations.download_products(
           files,
           productSubGroupDescription='UNCAL',
           curl_flag=True
           #, dataproduct_type='image'  # lowercase, matching the values shown in the tables above
           #, calib_level = [2]
           )

Downloading URL https://mast.stsci.edu/api/v0.1/Download/bundle.sh to ./mastDownload_20250327192734.sh ...

 [Done]
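The cell above only writes the script; nothing is downloaded until it runs. If you prefer to launch it from Python rather than a shell, a minimal sketch follows. It assumes the mastDownload_<timestamp>.sh naming shown in the output above, where the fixed-width timestamp makes name order match time order; latest_curl_script and run_curl_script are hypothetical helpers, not astroquery functions.

```python
import subprocess
from pathlib import Path
from typing import Optional


def latest_curl_script(directory: str = ".") -> Optional[Path]:
    """Return the most recent mastDownload_*.sh script in `directory`, or None if there is none.

    Assumes names end in a fixed-width timestamp (e.g. mastDownload_20250327192734.sh),
    so sorting by name also sorts by creation time.
    """
    scripts = sorted(Path(directory).glob("mastDownload_*.sh"))
    return scripts[-1] if scripts else None


def run_curl_script(script: Path) -> None:
    """Run the download script with bash from its own folder; raise CalledProcessError on failure."""
    subprocess.run(["bash", script.name], cwd=script.parent, check=True)
```

For example, after confirming that latest_curl_script('.') returned a path rather than None, passing it to run_curl_script starts the download. Using check=True means a failed script raises subprocess.CalledProcessError instead of failing silently.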

All of the code in this notebook is available as a ‘companion script’, for further convenience.

Further Reading

For a full explanation of product levels and the processing pipeline, see Science Data Products
JWST Archive Manual
Astropy and the relevant Table object

About this Notebook

Authors: Thomas Dutkiewicz, Dick Shaw
Keywords: Downloads, astroquery, MAST
Last Updated: Aug 2022
Next Review Date: Feb 2023
