Accessing JWST Proposal Data in astroquery.mast

Learning Goals

This tutorial is aimed at researchers of any level looking for specific observations from a particular program ID. It will cover the basics of authentication, data search, and data downloads.

By the end of this tutorial, you will:

  • Know how to log in and out to access data in astroquery.
  • Be able to search for data based on proposal ID.
  • Download filtered data products from the MAST archive.

Table of Contents

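The workflow for this notebook consists of:

  • Imports
  • Logging in and out
  • Searching for Data by ID
  • Data Products
  • Filtering and Downloading Data
  • Additional Resources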

Imports

We need to import Observations from astroquery.mast to access the MAST archive:

In [1]:
from astroquery.mast import Observations

Logging in and out

Most data in the MAST archive is public and can be accessed without logging in. However, some data is restricted during its exclusive access period (EAP), when it is available only to the PI team. During this period, your team will need to log in to access the data.

To begin, make sure that you have an authorized MyST account.

To access data programmatically, we also need an API token. To create and view the tokens associated with your account, visit https://auth.mast.stsci.edu/tokens.

There are several ways to enter your token, including:

  1. Responding manually to the prompt from Observations.login() (must be repeated every time)
  2. Python keyring, either through the keyring library or through Observations.login(store_token=True)
  3. Storing it in the bash environment variable $MAST_API_TOKEN

This flexibility can be overwhelming at first, so let's look at examples of these methods below.

In [2]:
# Option 1: Respond to prompt
Observations.login()
INFO: MAST API token accepted, welcome Thomas Dutkiewicz [astroquery.mast.auth]

This works well for infrequent API users, but storing the token is far more convenient for repeated logins. You can store the token using keyring or use the built-in store_token flag:

In [3]:
# Option 2: Store Token
Observations.login(store_token=True)
INFO: MAST API token accepted, welcome Thomas Dutkiewicz [astroquery.mast.auth]

Using store_token=True lets us log in automatically, without re-entering the token, for as long as the token remains valid. Note that tokens expire after 10 days of inactivity or 60 days after creation, whichever comes first. Once a token expires, create a new one and log in with Observations.login(reenter_token=True) to overwrite the old token with the new one.
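The expiry rule above is a "whichever comes first" calculation. The sketch below makes it concrete for planning when to refresh a token. The function name token_expiry is our own, the 10- and 60-day limits are the values stated in this tutorial (MAST may change them, so check the token page for the current policy), and we assume inactivity is counted from the token's last use.

```python
from datetime import datetime, timedelta

# Limits as stated in this tutorial (mid-2022); treat them as assumptions.
INACTIVITY_LIMIT = timedelta(days=10)
MAX_LIFETIME = timedelta(days=60)


def token_expiry(created: datetime, last_used: datetime) -> datetime:
    """Expiry is 10 days after last use or 60 days after creation, whichever is earlier."""
    return min(last_used + INACTIVITY_LIMIT, created + MAX_LIFETIME)


created = datetime(2022, 7, 1)
# Used regularly: the 60-day lifetime is the binding limit
print(token_expiry(created, last_used=datetime(2022, 8, 25)))  # 2022-08-30 00:00:00
# Not used since July 5: the 10-day inactivity limit is the binding limit
print(token_expiry(created, last_used=datetime(2022, 7, 5)))   # 2022-07-15 00:00:00
```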

The third option is to store the token in the bash environment variable $MAST_API_TOKEN. How to set an environment variable varies from system to system; for more details, you can check out this guide (links to a non-STScI site).
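After setting the variable, it is worth confirming that Python actually sees it; a common problem is setting it in a different shell session from the one that launched the notebook. The helper token_from_environment is a hypothetical name of ours. Passing the value to Observations.login(token=...) is shown only as a comment because it needs astroquery and network access; astroquery may also read MAST_API_TOKEN by itself, so the explicit hand-off simply makes the token's source obvious.

```python
import os


def token_from_environment(var_name="MAST_API_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset or blank."""
    value = os.environ.get(var_name, "").strip()
    return value or None


token = token_from_environment()
if token is None:
    print("MAST_API_TOKEN is not visible to this Python process")
else:
    # Avoid printing the token itself; it is a credential.
    print("Found MAST_API_TOKEN (value hidden)")
    # Requires astroquery and network access:
    # from astroquery.mast import Observations
    # Observations.login(token=token)
```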

Let's take a minute to verify that our login was successful:

In [4]:
session_info = Observations.session_info()
attrib.first_name: Thomas
attrib.last_name: Dutkiewicz
attrib.display_name: Thomas Dutkiewicz
attrib.department: Archive Sci App Branch
attrib.email: tdutkiewicz@stsci.edu
attrib.Jwstcalengdataaccess: true
anon: False
session: None
token: bef7564f...

You should see all of your information above. (Some details were omitted from this example to protect security and privacy.) If not, verify that your token and MyST account are active.
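If you want a script to check this automatically rather than by eye, you can inspect the returned session information. This sketch assumes that Observations.session_info() returns a dictionary containing the anon key printed above (False while logged in, True after logging out). is_logged_in is a hypothetical helper, and the sample dictionaries are abbreviated from this notebook's own outputs.

```python
def is_logged_in(info):
    """True when the session info describes a non-anonymous session."""
    # A missing 'anon' key is treated as anonymous, to be safe.
    return info.get("anon", True) is False


# Abbreviated versions of the logged-in and logged-out outputs in this notebook
logged_in = {"anon": False, "session": None}
logged_out = {"ezid": "anonymous", "anon": True, "scopes": [], "session": None, "token": None}
print(is_logged_in(logged_in), is_logged_in(logged_out))  # True False

# In the notebook (requires a network connection):
# is_logged_in(Observations.session_info())
```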

And of course, if the need arises, we can log out:

In [5]:
Observations.logout()
session_info = Observations.session_info()
eppn:
ezid: anonymous
anon: True
scopes: []
session: None
token: None

Searching for Data by ID

We can use a program ID to query the MAST archive for data. In the example below, we'll use 2733 as the ID. This is the program that produced the stunning images of the Southern Ring Nebula!

In [6]:
# Let's get a list of all observations associated with this proposal
obs_list = Observations.query_criteria(proposal_id=2733)

# We can choose the columns we want to display in our table
disp_col = ['dataproduct_type','calib_level','obs_id',
            'target_name','filters','proposal_pi', 'obs_collection']
obs_list[disp_col].show_in_notebook()
Out[6]:
Table length=11
idx  dataproduct_type  calib_level  obs_id                                target_name                                           filters  proposal_pi            obs_collection
---  ----------------  -----------  ------------------------------------  ----------------------------------------------------  -------  ---------------------  --------------
  0  image             3            STSCI_PR_2022-033                     NGC 3132 | Southern Ring Nebula | Eight-Burst Nebula  --       --                     OPO
  1  image             3            jw02733-o002_t001_miri_f1130w         NGC 3132                                              F1130W   Pontoppidan, Klaus M.  JWST
  2  image             3            jw02733-o002_t001_miri_f770w          NGC 3132                                              F770W    Pontoppidan, Klaus M.  JWST
  3  image             3            jw02733-o001_t001_nircam_clear-f090w  NGC 3132                                              F090W    Pontoppidan, Klaus M.  JWST
  4  image             3            jw02733-o002_t001_miri_f1800w         NGC 3132                                              F1800W   Pontoppidan, Klaus M.  JWST
  5  image             3            jw02733-o001_t001_nircam_clear-f356w  NGC 3132                                              F356W    Pontoppidan, Klaus M.  JWST
  6  image             3            jw02733-o001_t001_nircam_f405n-f444w  NGC 3132                                              F444W    Pontoppidan, Klaus M.  JWST
  7  image             3            jw02733-o002_t001_miri_f1280w         NGC 3132                                              F1280W   Pontoppidan, Klaus M.  JWST
  8  image             3            jw02733-o001_t001_nircam_f444w-f470n  NGC 3132                                              F444W    Pontoppidan, Klaus M.  JWST
  9  image             3            jw02733-o001_t001_nircam_clear-f187n  NGC 3132                                              F187N    Pontoppidan, Klaus M.  JWST
 10  image             3            jw02733-o001_t001_nircam_clear-f212n  NGC 3132                                              F212N    Pontoppidan, Klaus M.  JWST

We have limited the display columns in the above table for conciseness. For a complete list of observation fields (the columns in the above table) and their descriptions, read here.

We can verify that we have the right observation by looking at the 'proposal_pi' column above. The first observation is a press release image from the Webb Science Launch; this is why it is marked as part of the "Office of Public Outreach" (OPO) collection.
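Beyond the proposal_pi column, the JWST obs_id values themselves encode the program number. The sketch below parses the level 3 IDs shown in the table, assuming the layout jw<program>-o<observation>_t<target>_<instrument>_<optical elements>. This is a simplified reading of the JWST naming convention (it ignores other association and source ID forms), and parse_l3_obs_id and belongs_to_program are our own names. Non-JWST rows, such as the OPO press release, simply fail to match.

```python
import re

# Assumed (simplified) layout of JWST level 3 observation IDs, e.g.
# "jw02733-o002_t001_miri_f1130w" or "jw02733-o001_t001_nircam_clear-f090w"
L3_OBS_ID = re.compile(
    r"^jw(?P<program>\d{5})-o(?P<observation>\d{3})_t(?P<target>\d{3})"
    r"_(?P<instrument>[a-z]+)_(?P<optical_elements>[a-z0-9-]+)$"
)


def parse_l3_obs_id(obs_id):
    """Return the named parts of a level 3 obs_id, or None if it does not match."""
    match = L3_OBS_ID.match(obs_id)
    return match.groupdict() if match else None


def belongs_to_program(obs_id, program):
    """True if the obs_id parses and its program number equals `program`."""
    parts = parse_l3_obs_id(obs_id)
    return parts is not None and int(parts["program"]) == program


print(belongs_to_program("jw02733-o002_t001_miri_f1130w", 2733))  # True
print(belongs_to_program("STSCI_PR_2022-033", 2733))              # False
# In the notebook: [belongs_to_program(oid, 2733) for oid in obs_list['obs_id']]
```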

Data Products

Level 3 products are the result of combining and processing multiple lower-level products. The two kinds of product are organized differently: level 3 products are target-based (sometimes called source-based), while level 1 and 2 products are directly associated with a single exposure. A great starting point for understanding JWST files and the processing pipeline is the Jdox website.
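The exposure-based level 1 and 2 products carry that link in their filenames. As a sketch, assuming the Jdox naming convention jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<suffix>.fits (program, observation, visit, visit group, parallel sequence, activity, exposure), a regular expression can split a name into its parts. The optional o002 token covers files such as the _crf products in the filtered product list; we read it as the level 3 association the file was prepared for, but that interpretation, like the helper name parse_exposure_filename, is our assumption.

```python
import re

# Assumed JWST exposure filename layout:
# jw<program><obs><visit>_<visit group><parallel seq><activity>_<exposure>_<detector>[_<assoc>]_<suffix>.fits
EXPOSURE_FILE = re.compile(
    r"^jw(?P<program>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})"
    r"_(?P<visit_group>\d{2})(?P<parallel_seq>\d)(?P<activity>[0-9a-z]{2})"
    r"_(?P<exposure>\d{5})_(?P<detector>[a-z0-9]+)"
    r"(?:_(?P<association>o\d{3}))?"  # optional, e.g. "o002" on _crf files
    r"_(?P<suffix>[a-z0-9]+)\.fits$"
)


def parse_exposure_filename(filename):
    """Split a level 1/2 JWST filename into named parts, or return None if it does not match."""
    match = EXPOSURE_FILE.match(filename)
    return match.groupdict() if match else None


parts = parse_exposure_filename("jw02733002001_02103_00001_mirimage_cal.fits")
print(parts["program"], parts["observation"], parts["exposure"], parts["suffix"])
# 02733 002 00001 cal
```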

For level 3 observations, there are likely many associated level 1 and 2 data products. Let's take a look at how many products are associated with the second observation from our search above.

In [7]:
# We explicitly select the 2nd observation by its obs_id in this cell.

mask = (obs_list['obs_id'] == 'jw02733-o002_t001_miri_f1130w')
data_products = Observations.get_product_list(obs_list[mask])
print(len(data_products))
3470

This observation has over 3000 associated data products! This is not uncommon for a JWST level 3 observation. In the next section, we'll take a look at how we can filter down the number of results before we download them.

Filtering and Downloading Data

Filtering

You can apply filter keyword arguments to download only the data products that meet your criteria. Available filters are mrp_only (minimum recommended products), extension (file extension), and any of the product fields listed here, such as calib_level (calibration level).

In this example, let's filter for only the level 2, calibrated exposures. It is important that we also keep only "SCIENCE" products (productType="SCIENCE"); otherwise, our results will include guide star acquisition images.
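The astroquery documentation describes filter_products as combining different keywords with AND, while the values inside one keyword's list are combined with OR. The pure-Python sketch below mimics that rule on a list of dictionaries so you can reason about a filter offline. It is not astroquery's implementation; filter_rows is our own name, and the three rows are toy data for illustration only, not real MAST results.

```python
def filter_rows(rows, **filters):
    """Keep rows matching every filter (AND); a list of values matches any one of them (OR)."""

    def matches(row):
        for column, wanted in filters.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


# Toy rows for illustration only (not real MAST query results)
rows = [
    {"productFilename": "science_cal.fits", "calib_level": 2, "productType": "SCIENCE"},
    {"productFilename": "science_uncal.fits", "calib_level": 1, "productType": "SCIENCE"},
    {"productFilename": "guide_star.fits", "calib_level": 2, "productType": "AUXILIARY"},
]
kept = filter_rows(rows, calib_level=[2], productType="SCIENCE")
print([row["productFilename"] for row in kept])  # ['science_cal.fits']
```

Dropping productType from the call would keep both level 2 rows, which mirrors why the SCIENCE filter matters in the cell below.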

In [8]:
filtered_prod = Observations.filter_products(data_products, calib_level=[2], productType="SCIENCE")

# Again, we choose columns of interest for convenience
disp_col = ['obsID','dataproduct_type','productFilename','size','calib_level']
filtered_prod[disp_col].show_in_notebook(display_length=10)
Out[8]:
Table length=80
idx  obsID     dataproduct_type  productFilename                                       size  calib_level
---  --------  ----------------  ------------------------------------------------  --------  -----------
  0  87599771  image             jw02733002001_02103_00001_mirimage_o002_crf.fits  55048320  2
  1  87599771  image             jw02733002001_02103_00001_mirimage_cal.fits       29684160  2
  2  87599771  image             jw02733002001_02103_00001_mirimage_i2d.fits       29445120  2
  3  87599771  image             jw02733002001_02103_00001_mirimage_rate.fits      21191040  2
  4  87599771  image             jw02733002001_02103_00001_mirimage_rateints.fits  42336000  2
  5  87602206  image             jw02733002001_02103_00002_mirimage_o002_crf.fits  55048320  2
  6  87602206  image             jw02733002001_02103_00002_mirimage_cal.fits       29684160  2
  7  87602206  image             jw02733002001_02103_00002_mirimage_i2d.fits       29445120  2
  8  87602206  image             jw02733002001_02103_00002_mirimage_rate.fits      21191040  2
  9  87602206  image             jw02733002001_02103_00002_mirimage_rateints.fits  42336000  2
 10  87602200  image             jw02733002001_02103_00003_mirimage_o002_crf.fits  55048320  2
 11  87602200  image             jw02733002001_02103_00003_mirimage_cal.fits       29684160  2
 12  87602200  image             jw02733002001_02103_00003_mirimage_i2d.fits       29445120  2
 13  87602200  image             jw02733002001_02103_00003_mirimage_rate.fits      21191040  2
 14  87602200  image             jw02733002001_02103_00003_mirimage_rateints.fits  42336000  2
 15  87599767  image             jw02733002001_02103_00004_mirimage_o002_crf.fits  55048320  2
 16  87599767  image             jw02733002001_02103_00004_mirimage_cal.fits       29684160  2
 17  87599767  image             jw02733002001_02103_00004_mirimage_i2d.fits       29445120  2
 18  87599767  image             jw02733002001_02103_00004_mirimage_rate.fits      21191040  2
 19  87599767  image             jw02733002001_02103_00004_mirimage_rateints.fits  42336000  2
 20  87600168  image             jw02733002001_02103_00005_mirimage_o002_crf.fits  55048320  2
 21  87600168  image             jw02733002001_02103_00005_mirimage_cal.fits       29684160  2
 22  87600168  image             jw02733002001_02103_00005_mirimage_i2d.fits       29445120  2
 23  87600168  image             jw02733002001_02103_00005_mirimage_rate.fits      21191040  2
 24  87600168  image             jw02733002001_02103_00005_mirimage_rateints.fits  42336000  2
 25  87602147  image             jw02733002001_02103_00006_mirimage_o002_crf.fits  55048320  2
 26  87602147  image             jw02733002001_02103_00006_mirimage_cal.fits       29684160  2
 27  87602147  image             jw02733002001_02103_00006_mirimage_i2d.fits       29445120  2
 28  87602147  image             jw02733002001_02103_00006_mirimage_rate.fits      21191040  2
 29  87602147  image             jw02733002001_02103_00006_mirimage_rateints.fits  42336000  2
 30  87602171  image             jw02733002001_02103_00007_mirimage_o002_crf.fits  55048320  2
 31  87602171  image             jw02733002001_02103_00007_mirimage_cal.fits       29684160  2
 32  87602171  image             jw02733002001_02103_00007_mirimage_i2d.fits       29445120  2
 33  87602171  image             jw02733002001_02103_00007_mirimage_rate.fits      21191040  2
 34  87602171  image             jw02733002001_02103_00007_mirimage_rateints.fits  42336000  2
 35  87602190  image             jw02733002001_02103_00008_mirimage_o002_crf.fits  55048320  2
 36  87602190  image             jw02733002001_02103_00008_mirimage_cal.fits       29684160  2
 37  87602190  image             jw02733002001_02103_00008_mirimage_i2d.fits       29445120  2
 38  87602190  image             jw02733002001_02103_00008_mirimage_rate.fits      21191040  2
 39  87602190  image             jw02733002001_02103_00008_mirimage_rateints.fits  42336000  2
 40  87602196  image             jw02733002002_02103_00001_mirimage_o002_crf.fits  55048320  2
 41  87602196  image             jw02733002002_02103_00001_mirimage_cal.fits       29684160  2
 42  87602196  image             jw02733002002_02103_00001_mirimage_i2d.fits       29445120  2
 43  87602196  image             jw02733002002_02103_00001_mirimage_rate.fits      21191040  2
 44  87602196  image             jw02733002002_02103_00001_mirimage_rateints.fits  42336000  2
 45  87600176  image             jw02733002002_02103_00002_mirimage_o002_crf.fits  55048320  2
 46  87600176  image             jw02733002002_02103_00002_mirimage_cal.fits       29684160  2
 47  87600176  image             jw02733002002_02103_00002_mirimage_i2d.fits       29445120  2
 48  87600176  image             jw02733002002_02103_00002_mirimage_rate.fits      21191040  2
 49  87600176  image             jw02733002002_02103_00002_mirimage_rateints.fits  42336000  2
 50  87600445  image             jw02733002002_02103_00003_mirimage_o002_crf.fits  55048320  2
 51  87600445  image             jw02733002002_02103_00003_mirimage_cal.fits       29684160  2
 52  87600445  image             jw02733002002_02103_00003_mirimage_i2d.fits       29445120  2
 53  87600445  image             jw02733002002_02103_00003_mirimage_rate.fits      21191040  2
 54  87600445  image             jw02733002002_02103_00003_mirimage_rateints.fits  42336000  2
 55  87599751  image             jw02733002002_02103_00004_mirimage_o002_crf.fits  55048320  2
 56  87599751  image             jw02733002002_02103_00004_mirimage_cal.fits       29684160  2
 57  87599751  image             jw02733002002_02103_00004_mirimage_i2d.fits       29445120  2
 58  87599751  image             jw02733002002_02103_00004_mirimage_rate.fits      21191040  2
 59  87599751  image             jw02733002002_02103_00004_mirimage_rateints.fits  42336000  2
 60  87599752  image             jw02733002002_02103_00005_mirimage_o002_crf.fits  55048320  2
 61  87599752  image             jw02733002002_02103_00005_mirimage_cal.fits       29684160  2
 62  87599752  image             jw02733002002_02103_00005_mirimage_i2d.fits       29445120  2
 63  87599752  image             jw02733002002_02103_00005_mirimage_rate.fits      21191040  2
 64  87599752  image             jw02733002002_02103_00005_mirimage_rateints.fits  42336000  2
 65  87600443  image             jw02733002002_02103_00006_mirimage_o002_crf.fits  55048320  2
 66  87600443  image             jw02733002002_02103_00006_mirimage_cal.fits       29684160  2
 67  87600443  image             jw02733002002_02103_00006_mirimage_i2d.fits       29445120  2
 68  87600443  image             jw02733002002_02103_00006_mirimage_rate.fits      21191040  2
 69  87600443  image             jw02733002002_02103_00006_mirimage_rateints.fits  42336000  2
 70  87600439  image             jw02733002002_02103_00007_mirimage_o002_crf.fits  55048320  2
 71  87600439  image             jw02733002002_02103_00007_mirimage_cal.fits       29684160  2
 72  87600439  image             jw02733002002_02103_00007_mirimage_i2d.fits       29445120  2
 73  87600439  image             jw02733002002_02103_00007_mirimage_rate.fits      21191040  2
 74  87600439  image             jw02733002002_02103_00007_mirimage_rateints.fits  42336000  2
 75  87602208  image             jw02733002002_02103_00008_mirimage_o002_crf.fits  55048320  2
 76  87602208  image             jw02733002002_02103_00008_mirimage_cal.fits       29684160  2
 77  87602208  image             jw02733002002_02103_00008_mirimage_i2d.fits       29445120  2
 78  87602208  image             jw02733002002_02103_00008_mirimage_rate.fits      21191040  2
 79  87602208  image             jw02733002002_02103_00008_mirimage_rateints.fits  42336000  2

Well, that was effective! We now have 80 files, instead of over 3000.
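The 80 rows follow a regular pattern: 16 exposures (00001 to 00008 in each of visits 001 and 002 of observation 002), each with the same five products (_o002_crf, _cal, _i2d, _rate, and _rateints). Counting filenames by their pipeline suffix makes such a breakdown visible at a glance. count_by_suffix is a hypothetical helper; in the notebook you would pass it filtered_prod['productFilename']. The products table also has a productSubGroupDescription column that, for JWST, appears to hold these suffixes in upper case. If so, adding productSubGroupDescription="CAL" to filter_products would narrow the list to the 16 _cal files, but check the values in your own table before relying on that.

```python
from collections import Counter


def count_by_suffix(filenames):
    """Count products by pipeline suffix: the last '_' token before the file extension."""
    return Counter(name.rsplit(".", 1)[0].rsplit("_", 1)[-1] for name in filenames)


# The five products listed for the first exposure in the filtered table
example = [
    "jw02733002001_02103_00001_mirimage_o002_crf.fits",
    "jw02733002001_02103_00001_mirimage_cal.fits",
    "jw02733002001_02103_00001_mirimage_i2d.fits",
    "jw02733002001_02103_00001_mirimage_rate.fits",
    "jw02733002001_02103_00001_mirimage_rateints.fits",
]
print(count_by_suffix(example))
# In the notebook: count_by_suffix(filtered_prod['productFilename'])
```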

As a final check before we proceed to the download, let's find the total file size of our results:

In [9]:
total = sum(filtered_prod['size'])
print('{:.2f} GB'.format(total/10**9))
2.84 GB
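The 2.84 GB figure can be checked by hand from the table: every exposure has the same five product sizes, and there are 80 / 5 = 16 exposures. The sketch below repeats that arithmetic with the sizes copied from the table. It also prints the binary (GiB) value, because the cell above divides by 10**9 while some tools report sizes in units of 2**30 bytes, which can make the same download look smaller.

```python
# Sizes in bytes of the five products for one exposure, copied from the table above
per_exposure = {
    "o002_crf": 55048320,
    "cal": 29684160,
    "i2d": 29445120,
    "rate": 21191040,
    "rateints": 42336000,
}
n_exposures = 80 // len(per_exposure)  # 16 exposures in the filtered list

total_bytes = sum(per_exposure.values()) * n_exposures
print("{:.2f} GB".format(total_bytes / 10**9))   # 2.84 GB (decimal, as in the cell above)
print("{:.2f} GiB".format(total_bytes / 2**30))  # 2.65 GiB (binary)
```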

For downloads larger than 1 GB, we highly recommend following the steps in Downloading via Curl Script rather than downloading the data directly.

Downloading Data Directly

We'll use the filtered product list to select our downloads. This method immediately sends a request to the MAST archive and downloads the data into a mastDownload folder in the current working directory.

Note: As written, the cell below downloads only the first five files. This reduces download time for the purposes of the tutorial while still demonstrating a successful download.

In [10]:
# Don't forget to login, if accessing non-public data! You can un-comment the line below:
# Observations.login()

# You can download all of the products by removing the '[:5]' from the line below:
manifest = Observations.download_products(filtered_prod[:5])
print(manifest['Status'])
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002001_02103_00001_mirimage_o002_crf.fits to ./mastDownload/JWST/jw02733002001_02103_00001_mirimage/jw02733002001_02103_00001_mirimage_o002_crf.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002001_02103_00001_mirimage_cal.fits to ./mastDownload/JWST/jw02733002001_02103_00001_mirimage/jw02733002001_02103_00001_mirimage_cal.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002001_02103_00001_mirimage_i2d.fits to ./mastDownload/JWST/jw02733002001_02103_00001_mirimage/jw02733002001_02103_00001_mirimage_i2d.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002001_02103_00001_mirimage_rate.fits to ./mastDownload/JWST/jw02733002001_02103_00001_mirimage/jw02733002001_02103_00001_mirimage_rate.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002001_02103_00001_mirimage_rateints.fits to ./mastDownload/JWST/jw02733002001_02103_00001_mirimage/jw02733002001_02103_00001_mirimage_rateints.fits ... [Done]
 Status
--------
COMPLETE
COMPLETE
COMPLETE
COMPLETE
COMPLETE
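With five files you can read the Status column by eye, but for a full download it helps to list anything that did not finish so you can retry it. This sketch assumes the manifest returned by download_products has "Local Path" and "Status" columns; since astropy table rows can be indexed by column name, the same function should accept manifest directly. Any status other than COMPLETE (for example ERROR or SKIPPED; the exact set of values is an assumption here) is reported. incomplete_downloads is our own name, and the second toy row is hypothetical.

```python
def incomplete_downloads(manifest_rows):
    """Return the local paths of downloads whose status is not COMPLETE."""
    return [row["Local Path"] for row in manifest_rows if row["Status"] != "COMPLETE"]


# Toy rows shaped like the manifest; the second row is hypothetical, for illustration only.
toy_manifest = [
    {
        "Local Path": "./mastDownload/JWST/jw02733002001_02103_00001_mirimage/"
                      "jw02733002001_02103_00001_mirimage_cal.fits",
        "Status": "COMPLETE",
    },
    {"Local Path": "./mastDownload/JWST/example/example_rate.fits", "Status": "ERROR"},
]
print(incomplete_downloads(toy_manifest))  # ['./mastDownload/JWST/example/example_rate.fits']
# In the notebook: incomplete_downloads(manifest)
```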

Downloading via Curl Script

Rather than downloading the files directly, we can instead download a curl script. You can run the script at any time to download your data.

This method supports larger data volumes (and downloads more quickly!) than a traditional portal download.

In [11]:
manifest = Observations.download_products(filtered_prod, curl_flag=True)
Downloading URL https://mast.stsci.edu/api/v0.1/Download/bundle.sh to ./mastDownload_20221010200759.sh ... [Done]

You can run the script in your terminal by navigating to the desired download location and typing bash [filename].sh. On Windows, this requires Cygwin or another program that can run bash scripts. You may be prompted for your API token.
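If you prefer to stay in Python, you can locate the newest curl script before running it. This sketch assumes the naming pattern seen in the output above, mastDownload_<YYYYmmddHHMMSS>.sh, whose fixed-width timestamps sort correctly as plain strings. The helper names are ours, and the actual run is left as a comment because it starts the download.

```python
import glob
import os
import re

# Pattern seen in the output above, e.g. mastDownload_20221010200759.sh
SCRIPT_NAME = re.compile(r"^mastDownload_(\d{14})\.sh$")


def pick_latest_script(filenames):
    """Return the curl script with the newest timestamp, or None if there is none."""
    candidates = [name for name in filenames if SCRIPT_NAME.match(os.path.basename(name))]
    # Same prefix + fixed-width timestamp, so the largest basename is the newest script.
    return max(candidates, key=os.path.basename, default=None)


def latest_script_in(directory="."):
    """Search one directory for MAST curl scripts and return the newest one."""
    return pick_latest_script(glob.glob(os.path.join(directory, "mastDownload_*.sh")))


script = latest_script_in()
if script:
    print("Run it with: bash", script)
    # To run it from Python instead (this starts the download):
    # import subprocess; subprocess.run(["bash", script], check=True)
```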

Additional Resources

Within the current directory, there is a companion script that combines all of the code from this notebook. It runs in the terminal with two arguments: the program ID and whether to download a curl script.
For example, you might run python3 companion_script.py 2733 True to download the above data via a curl script.
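The companion script's code is not reproduced here, but a command line of this shape (a program ID plus a True/False flag) has one common pitfall: bool("False") is True in Python, so a string flag needs an explicit conversion. The argparse sketch below is a hypothetical illustration of parsing those two arguments, not the companion script's actual code; str_to_bool and parse_args are our own names.

```python
import argparse


def str_to_bool(text):
    """Convert 'True'/'False'-style strings; bool('False') would wrongly give True."""
    lowered = text.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {text!r}")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Fetch JWST products for one program (sketch).")
    parser.add_argument("program_id", type=int, help="proposal/program ID, e.g. 2733")
    parser.add_argument("use_curl", type=str_to_bool, help="True to request a curl script")
    return parser.parse_args(argv)


args = parse_args(["2733", "True"])
print(args.program_id, args.use_curl)  # 2733 True
```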

For additional details about astroquery.mast, see the readthedocs page.

About this Notebook

For additional questions, comments, or feedback, please email archive@stsci.edu.

Authors: Thomas Dutkiewicz, Susan Mullally
Keywords: JWST, MAST, authentication
Last Updated: Jul 2022
Next Review: Jan 2023

Citations

If you use astroquery for published research, please cite the authors.