Accessing JWST Proposal Data in astroquery.mast#

Learning Goals#

This tutorial is aimed at researchers of any level looking for specific observations from a particular program ID. It will cover the basics of authentication, data search, and data downloads.

By the end of this tutorial, you will:

  • Know how to log in and out to access data with astroquery.

  • Be able to search for data based on proposal ID.

  • Download filtered data products from the MAST Archive.


Table of Contents#

The workflow for this notebook consists of:

  • Logging in/out

  • Searching for Data by ID

  • Data Products

  • Filtering and Downloading Data

    • Filtering

    • Downloading Directly

    • Downloading via Curl Script

  • Additional Resources

Imports#

We need to import Observations from astroquery.mast to access the MAST Archive:

from astroquery.mast import Observations

Logging in and out#

Most data in the MAST Archive is public and can be accessed without logging in. However, some data is restricted during its ‘exclusive access period’ (EAP), when it is available only to the PI team. During this period, members of the team must sign in to access the data.

To begin, you should make sure that you have an authorized MyST Account.

To access data programmatically, we will also need an API token. To create and view the tokens associated with your account, visit https://auth.mast.stsci.edu/tokens.

There are several ways to enter your token, including:

  1. Manual response to prompt from Observations.login() (must be done every time)

  2. Python keyring; either through the keyring library or Observations.login

  3. Storing it in the bash environment variable $MAST_API_TOKEN

This flexibility can be overwhelming at first; let’s take a look at some examples of these methods below.

# Option 1: Respond to prompt. Uncomment the line below
#Observations.login()

This works well for infrequent API users, but storing the token is far more convenient for repeated logins. You can store the token using keyring or use the built-in store_token flag:

# Option 2: Store Token. Uncomment the line below
#Observations.login(store_token=True)

Using ‘store_token’ will allow us to automatically log in, without needing to re-enter the token, for as long as the token remains valid. Note that tokens expire after 10 days of inactivity, or 60 days after creation, whichever comes first. Once it expires, you should use reenter_token=True to overwrite the old token with the new one.
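To make the "whichever comes first" rule concrete, here is a small sketch that computes when a token would lapse. The two lifetimes are the ones quoted above; the function name and the example dates are hypothetical and not part of astroquery.

```python
from datetime import datetime, timedelta

# Lifetimes quoted in the text above.
INACTIVITY_LIMIT = timedelta(days=10)
MAX_LIFETIME = timedelta(days=60)


def token_expiry(created, last_used):
    """Return the earlier of the two expiry deadlines for a token."""
    return min(created + MAX_LIFETIME, last_used + INACTIVITY_LIMIT)


# Hypothetical dates, purely for illustration.
created = datetime(2024, 7, 1)
last_used = datetime(2024, 7, 20)
print(token_expiry(created, last_used))  # 2024-07-30 00:00:00
```

In this example the inactivity limit is reached a month before the 60-day limit, so a token that is used only occasionally will usually expire for that reason.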

The third option is to store the token as the bash environment variable $MAST_API_TOKEN. This method varies from system to system; for more details, you can check out this guide (links to a non-STScI site).
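As a rough sketch of the third option, the helpers below read the variable from Python and pass it to Observations.login explicitly, so that a missing token is reported instead of triggering a prompt. The helper names are hypothetical; only the variable name $MAST_API_TOKEN and the token argument of Observations.login are assumed from astroquery.

```python
import os


def token_from_env(var_name="MAST_API_TOKEN"):
    """Return the token stored in the environment, or None if it is unset or empty."""
    token = os.environ.get(var_name, "").strip()
    return token or None


def login_with_env_token():
    """Log in to MAST with the environment token; return True if a login was attempted."""
    token = token_from_env()
    if token is None:
        print("MAST_API_TOKEN is not set; continuing anonymously.")
        return False
    # Imported here so token_from_env() can be used without astroquery installed.
    from astroquery.mast import Observations
    Observations.login(token=token)
    return True
```

Calling login_with_env_token() at the top of a script keeps the token itself out of the source code, which is the main appeal of this option.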

Let’s take a minute to verify that our login was successful:

session_info = Observations.session_info()
eppn: 
ezid: anonymous
anon: True
scopes: []
session: None
token: None

You should see all of your information above. If not, verify that your token and MyST account are active.
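If you would rather check the login status in code than read the printout, a minimal sketch could look like the following. It assumes that session_info returns a dictionary whose keys match the fields printed above; the helper name is hypothetical.

```python
def is_authenticated(info):
    """Return True if a session_info dictionary describes a logged-in user."""
    # Treat a missing 'anon' flag as anonymous, to stay on the safe side.
    return not info.get("anon", True)


# The anonymous session shown in the output above, written out as a dictionary.
anonymous = {"eppn": "", "ezid": "anonymous", "anon": True,
             "scopes": [], "session": None, "token": None}
print(is_authenticated(anonymous))  # False
```

In the notebook, this would be called as is_authenticated(Observations.session_info(verbose=False)), where verbose=False is expected to suppress the printed summary.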

And of course, if the need arises, we can log out:

Observations.logout()
session_info = Observations.session_info()
eppn: 
ezid: anonymous
anon: True
scopes: []
session: None
token: None

Searching for Data by ID#

We can use a program ID to query the MAST Archive for data. In the example below, we’ll use 2733 as the ID. This is the program that produced the stunning images of the Southern Ring Nebula!

# Let's get a list of all observations associated with this proposal
obs_list = Observations.query_criteria(proposal_id=2733)

# We can choose the columns we want to display in our table
disp_col = ['dataproduct_type','calib_level','obs_id',
            'target_name','filters','proposal_pi', 'obs_collection']
obs_list[disp_col].show_in_notebook()
WARNING: AstropyDeprecationWarning: show_in_notebook() is deprecated as of 6.1 and to create
         interactive tables it is recommended to use dedicated tools like:
         - https://github.com/bloomberg/ipydatagrid
         - https://docs.bokeh.org/en/latest/docs/user_guide/interaction/widgets.html#datatable
         - https://dash.plotly.com/datatable [warnings]
Table length=12
idx  dataproduct_type  calib_level  obs_id                                target_name                                           filters      proposal_pi            obs_collection
0    image             3            STSCI_PR_2022-059                     Southern Ring Nebula | NGC 3132                       --           --                     OPO
1    image             3            STSCI_PR_2022-033                     NGC 3132 | Southern Ring Nebula | Eight-Burst Nebula  --           --                     OPO
2    image             3            jw02733-o002_t001_miri_f1130w         NGC-3132                                              F1130W       Pontoppidan, Klaus M.  JWST
3    image             3            jw02733-o002_t001_miri_f770w          NGC-3132                                              F770W        Pontoppidan, Klaus M.  JWST
4    image             3            jw02733-o002_t001_miri_f1800w         NGC-3132                                              F1800W       Pontoppidan, Klaus M.  JWST
5    image             3            jw02733-o002_t001_miri_f1280w         NGC-3132                                              F1280W       Pontoppidan, Klaus M.  JWST
6    image             3            jw02733-o001_t001_nircam_clear-f187n  NGC-3132                                              F187N        Pontoppidan, Klaus M.  JWST
7    image             3            jw02733-o001_t001_nircam_clear-f090w  NGC-3132                                              F090W        Pontoppidan, Klaus M.  JWST
8    image             3            jw02733-o001_t001_nircam_clear-f356w  NGC-3132                                              F356W        Pontoppidan, Klaus M.  JWST
9    image             3            jw02733-o001_t001_nircam_clear-f212n  NGC-3132                                              F212N        Pontoppidan, Klaus M.  JWST
10   image             3            jw02733-o001_t001_nircam_f405n-f444w  NGC-3132                                              F444W;F405N  Pontoppidan, Klaus M.  JWST
11   image             3            jw02733-o001_t001_nircam_f444w-f470n  NGC-3132                                              F444W;F470N  Pontoppidan, Klaus M.  JWST

We have limited the display columns in the above table for conciseness. For a complete list of observation fields (the columns in the above table) and their descriptions, read here.

We can verify that we have the right program by looking at the 'proposal_pi' column above. The first two rows are press release images from the Webb Science Launch; this is why they are marked as part of the “Office of Public Outreach” (OPO) collection.
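The JWST obs_id values in the table follow a regular pattern, which can be handy for picking out an instrument or filter without another query. The sketch below splits an obs_id into its parts. The regular expression is an assumption inferred from the twelve rows above rather than an official specification, and parse_obs_id is a hypothetical helper.

```python
import re

# Assumed pattern for level-3 JWST obs_id values, inferred from the table above:
# jw<program>-o<observation>_t<target>_<instrument>_<optical elements>
OBS_ID_PATTERN = re.compile(
    r"^jw(?P<program>\d{5})-o(?P<observation>\d{3})_t(?P<target>\d{3})"
    r"_(?P<instrument>[a-z]+)_(?P<elements>.+)$"
)


def parse_obs_id(obs_id):
    """Split a level-3 obs_id into its parts, or return None if it does not match."""
    match = OBS_ID_PATTERN.match(obs_id)
    if match is None:
        return None
    parts = match.groupdict()
    parts["program"] = int(parts["program"])
    return parts


print(parse_obs_id("jw02733-o002_t001_miri_f1130w"))
print(parse_obs_id("STSCI_PR_2022-059"))  # None: press release entries use another scheme
```

Because the press release rows return None, the same helper can also be used to separate JWST observations from OPO entries in a mixed result table.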

Data Products#

Level 3 products are the result of combining and processing multiple lower level products. These two categories are distinct; level 3 products are target-based (sometimes called source-based), while levels 2 and 1 are directly associated with an exposure. A great starting point to understand JWST files and the processing pipeline is available on the Jdox website.

For level 3 observations, it’s likely that there are many associated (levels 2 and 1) data products. Let’s take a look at how many products are associated with the first JWST observation (the MIRI F1130W image) from our search above.

# We explicitly select this observation by its obs_id, rather than by row index.

mask = (obs_list['obs_id'] == 'jw02733-o002_t001_miri_f1130w')
data_products = Observations.get_product_list(obs_list[mask])
print(len(data_products))
3458

This produces over 3000 data products associated with this observation! This is not uncommon for a JWST level-3 observation. In the next section, we’ll take a look at how we can filter down the number of results before we download them.

Filtering and Downloading Data#

Filtering#

You can apply filter keyword arguments to download only data products that meet your given criteria. Available filters are “mrp_only” (minimum recommended products), “extension” (file extension), “calib_level” (calibration level), and all product fields listed here.

In this example, let’s try filtering for only the level 2, calibrated exposures. It is important that we also filter by “SCIENCE” type products; otherwise, our results will include guide star acquisition images.

filtered_prod = Observations.filter_products(data_products, calib_level=[2], productType="SCIENCE")

# Again, we choose columns of interest for convenience
disp_col = ['obsID','dataproduct_type','productFilename','size','calib_level']
filtered_prod[disp_col].show_in_notebook(display_length=10)
WARNING: AstropyDeprecationWarning: show_in_notebook() is deprecated as of 6.1 and to create
         interactive tables it is recommended to use dedicated tools like:
         - https://github.com/bloomberg/ipydatagrid
         - https://docs.bokeh.org/en/latest/docs/user_guide/interaction/widgets.html#datatable
         - https://dash.plotly.com/datatable [warnings]
Table length=80
idx  obsID     dataproduct_type  productFilename                                   size      calib_level
0    87599751  image             jw02733002002_02103_00004_mirimage_rate.fits      21191040  2
1    87599751  image             jw02733002002_02103_00004_mirimage_cal.fits       29687040  2
2    87599751  image             jw02733002002_02103_00004_mirimage_o002_crf.fits  29687040  2
3    87599751  image             jw02733002002_02103_00004_mirimage_rateints.fits  42336000  2
4    87599751  image             jw02733002002_02103_00004_mirimage_i2d.fits       29445120  2
5    87599752  image             jw02733002002_02103_00005_mirimage_cal.fits       29687040  2
6    87599752  image             jw02733002002_02103_00005_mirimage_rate.fits      21191040  2
7    87599752  image             jw02733002002_02103_00005_mirimage_rateints.fits  42336000  2
8    87599752  image             jw02733002002_02103_00005_mirimage_o002_crf.fits  29687040  2
9    87599752  image             jw02733002002_02103_00005_mirimage_i2d.fits       29445120  2
10   87599767  image             jw02733002001_02103_00004_mirimage_cal.fits       29687040  2
11   87599767  image             jw02733002001_02103_00004_mirimage_i2d.fits       29445120  2
12   87599767  image             jw02733002001_02103_00004_mirimage_rate.fits      21191040  2
13   87599767  image             jw02733002001_02103_00004_mirimage_o002_crf.fits  29687040  2
14   87599767  image             jw02733002001_02103_00004_mirimage_rateints.fits  42336000  2
15   87599771  image             jw02733002001_02103_00001_mirimage_o002_crf.fits  29687040  2
16   87599771  image             jw02733002001_02103_00001_mirimage_rateints.fits  42336000  2
17   87599771  image             jw02733002001_02103_00001_mirimage_cal.fits       29687040  2
18   87599771  image             jw02733002001_02103_00001_mirimage_i2d.fits       29445120  2
19   87599771  image             jw02733002001_02103_00001_mirimage_rate.fits      21191040  2
20   87600168  image             jw02733002001_02103_00005_mirimage_o002_crf.fits  29687040  2
21   87600168  image             jw02733002001_02103_00005_mirimage_rate.fits      21191040  2
22   87600168  image             jw02733002001_02103_00005_mirimage_i2d.fits       29445120  2
23   87600168  image             jw02733002001_02103_00005_mirimage_rateints.fits  42336000  2
24   87600168  image             jw02733002001_02103_00005_mirimage_cal.fits       29687040  2
25   87600176  image             jw02733002002_02103_00002_mirimage_i2d.fits       29445120  2
26   87600176  image             jw02733002002_02103_00002_mirimage_cal.fits       29687040  2
27   87600176  image             jw02733002002_02103_00002_mirimage_rateints.fits  42336000  2
28   87600176  image             jw02733002002_02103_00002_mirimage_rate.fits      21191040  2
29   87600176  image             jw02733002002_02103_00002_mirimage_o002_crf.fits  29687040  2
30   87600439  image             jw02733002002_02103_00007_mirimage_o002_crf.fits  29687040  2
31   87600439  image             jw02733002002_02103_00007_mirimage_cal.fits       29687040  2
32   87600439  image             jw02733002002_02103_00007_mirimage_i2d.fits       29445120  2
33   87600439  image             jw02733002002_02103_00007_mirimage_rateints.fits  42336000  2
34   87600439  image             jw02733002002_02103_00007_mirimage_rate.fits      21191040  2
35   87600443  image             jw02733002002_02103_00006_mirimage_cal.fits       29687040  2
36   87600443  image             jw02733002002_02103_00006_mirimage_rate.fits      21191040  2
37   87600443  image             jw02733002002_02103_00006_mirimage_o002_crf.fits  29687040  2
38   87600443  image             jw02733002002_02103_00006_mirimage_i2d.fits       29445120  2
39   87600443  image             jw02733002002_02103_00006_mirimage_rateints.fits  42336000  2
40   87600445  image             jw02733002002_02103_00003_mirimage_i2d.fits       29445120  2
41   87600445  image             jw02733002002_02103_00003_mirimage_cal.fits       29687040  2
42   87600445  image             jw02733002002_02103_00003_mirimage_rateints.fits  42336000  2
43   87600445  image             jw02733002002_02103_00003_mirimage_o002_crf.fits  29687040  2
44   87600445  image             jw02733002002_02103_00003_mirimage_rate.fits      21191040  2
45   87602147  image             jw02733002001_02103_00006_mirimage_rateints.fits  42336000  2
46   87602147  image             jw02733002001_02103_00006_mirimage_o002_crf.fits  29687040  2
47   87602147  image             jw02733002001_02103_00006_mirimage_i2d.fits       29445120  2
48   87602147  image             jw02733002001_02103_00006_mirimage_cal.fits       29687040  2
49   87602147  image             jw02733002001_02103_00006_mirimage_rate.fits      21191040  2
50   87602171  image             jw02733002001_02103_00007_mirimage_rate.fits      21191040  2
51   87602171  image             jw02733002001_02103_00007_mirimage_i2d.fits       29445120  2
52   87602171  image             jw02733002001_02103_00007_mirimage_rateints.fits  42336000  2
53   87602171  image             jw02733002001_02103_00007_mirimage_cal.fits       29687040  2
54   87602171  image             jw02733002001_02103_00007_mirimage_o002_crf.fits  29687040  2
55   87602190  image             jw02733002001_02103_00008_mirimage_rateints.fits  42336000  2
56   87602190  image             jw02733002001_02103_00008_mirimage_rate.fits      21191040  2
57   87602190  image             jw02733002001_02103_00008_mirimage_i2d.fits       29445120  2
58   87602190  image             jw02733002001_02103_00008_mirimage_o002_crf.fits  29687040  2
59   87602190  image             jw02733002001_02103_00008_mirimage_cal.fits       29687040  2
60   87602196  image             jw02733002002_02103_00001_mirimage_cal.fits       29687040  2
61   87602196  image             jw02733002002_02103_00001_mirimage_rateints.fits  42336000  2
62   87602196  image             jw02733002002_02103_00001_mirimage_i2d.fits       29445120  2
63   87602196  image             jw02733002002_02103_00001_mirimage_rate.fits      21191040  2
64   87602196  image             jw02733002002_02103_00001_mirimage_o002_crf.fits  29687040  2
65   87602200  image             jw02733002001_02103_00003_mirimage_o002_crf.fits  29687040  2
66   87602200  image             jw02733002001_02103_00003_mirimage_rate.fits      21191040  2
67   87602200  image             jw02733002001_02103_00003_mirimage_rateints.fits  42336000  2
68   87602200  image             jw02733002001_02103_00003_mirimage_cal.fits       29687040  2
69   87602200  image             jw02733002001_02103_00003_mirimage_i2d.fits       29445120  2
70   87602206  image             jw02733002001_02103_00002_mirimage_o002_crf.fits  29687040  2
71   87602206  image             jw02733002001_02103_00002_mirimage_rateints.fits  42336000  2
72   87602206  image             jw02733002001_02103_00002_mirimage_i2d.fits       29445120  2
73   87602206  image             jw02733002001_02103_00002_mirimage_cal.fits       29687040  2
74   87602206  image             jw02733002001_02103_00002_mirimage_rate.fits      21191040  2
75   87602208  image             jw02733002002_02103_00008_mirimage_i2d.fits       29445120  2
76   87602208  image             jw02733002002_02103_00008_mirimage_o002_crf.fits  29687040  2
77   87602208  image             jw02733002002_02103_00008_mirimage_cal.fits       29687040  2
78   87602208  image             jw02733002002_02103_00008_mirimage_rateints.fits  42336000  2
79   87602208  image             jw02733002002_02103_00008_mirimage_rate.fits      21191040  2

Well, that was effective! We now have 80 files, instead of over 3000.
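To see which kinds of files make up those 80 results, one option is to count the trailing tag in each file name. This is a minimal sketch; product_suffix is a hypothetical helper, and it assumes the product type is always the text between the last underscore and the extension, as it is in the table above.

```python
from collections import Counter


def product_suffix(filename):
    """Return the trailing product-type tag of a file name, e.g. 'cal' for '..._cal.fits'."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension
    return stem.rsplit("_", 1)[-1]      # keep the text after the last underscore


# A few file names copied from the table above.
sample = [
    "jw02733002002_02103_00004_mirimage_rate.fits",
    "jw02733002002_02103_00004_mirimage_cal.fits",
    "jw02733002002_02103_00004_mirimage_o002_crf.fits",
    "jw02733002002_02103_00005_mirimage_cal.fits",
]
print(Counter(product_suffix(name) for name in sample))
```

Applied to filtered_prod['productFilename'], this should report 16 files for each of the five suffixes (rate, rateints, cal, crf, i2d), since the table lists 16 exposures with five products each.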

As a final check before we proceed to the download, let’s find the total file size of our results:

total = sum(filtered_prod['size'])
print('{:.2f} GB'.format(total/10**9))
2.44 GB

For downloads larger than a GB, it is highly recommended that you follow the steps in Downloading via Curl Script rather than attempting to download the data directly.
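The rule of thumb above can be written as a small check. In this sketch the 1 GB threshold comes from the recommendation above, while the function name and its return format are hypothetical.

```python
GB = 10**9  # decimal gigabyte, matching the calculation above


def choose_download_method(sizes_in_bytes, threshold_gb=1.0):
    """Return (total size in GB, suggested method) for a collection of file sizes."""
    total_gb = sum(sizes_in_bytes) / GB
    method = "curl script" if total_gb > threshold_gb else "direct download"
    return total_gb, method


# Sizes (bytes) of the five products of one exposure, from the table above;
# the filtered list holds 16 such exposures.
one_exposure = [21191040, 29687040, 29687040, 42336000, 29445120]
total_gb, method = choose_download_method(one_exposure * 16)
print(f"{total_gb:.2f} GB -> {method}")  # 2.44 GB -> curl script
```

In the notebook, the same function could be called on filtered_prod['size'] directly.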

Downloading Data Directly#

We’ll use the filtered product list to select our downloads. This method will immediately send a request to the MAST Archive, and download the data to this notebook’s folder.

Note: As written, the cell below downloads only the first five files. This reduces download time for the purposes of the tutorial while still demonstrating a successful download.

# Don't forget to login, if accessing non-public data! You can un-comment the line below:
# Observations.login()

# You can download all of the products by removing the '[:5]' from the line below:
manifest = Observations.download_products(filtered_prod[:5])
print(manifest['Status'])
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002002_02103_00004_mirimage_rate.fits to ./mastDownload/JWST/jw02733002002_02103_00004_mirimage/jw02733002002_02103_00004_mirimage_rate.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002002_02103_00004_mirimage_cal.fits to ./mastDownload/JWST/jw02733002002_02103_00004_mirimage/jw02733002002_02103_00004_mirimage_cal.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002002_02103_00004_mirimage_o002_crf.fits to ./mastDownload/JWST/jw02733002002_02103_00004_mirimage/jw02733002002_02103_00004_mirimage_o002_crf.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002002_02103_00004_mirimage_rateints.fits to ./mastDownload/JWST/jw02733002002_02103_00004_mirimage/jw02733002002_02103_00004_mirimage_rateints.fits ...
 [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733002002_02103_00004_mirimage_i2d.fits to ./mastDownload/JWST/jw02733002002_02103_00004_mirimage/jw02733002002_02103_00004_mirimage_i2d.fits ...
 [Done]
 Status 
--------
COMPLETE
COMPLETE
COMPLETE
COMPLETE
COMPLETE
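For longer downloads, it may be easier to list only the files that did not finish than to scan the whole Status column. The sketch below assumes the manifest has 'Local Path' and 'Status' columns and that any status other than COMPLETE (for example ERROR) needs attention; the function name and the example rows are hypothetical.

```python
def unfinished_downloads(manifest):
    """Return (local path, status) pairs for rows whose status is not COMPLETE."""
    return [
        (row["Local Path"], row["Status"])
        for row in manifest
        if row["Status"] != "COMPLETE"
    ]


# Hypothetical manifest rows for illustration; a real manifest is an astropy Table.
example = [
    {"Local Path": "./a_rate.fits", "Status": "COMPLETE"},
    {"Local Path": "./a_cal.fits", "Status": "ERROR"},
]
print(unfinished_downloads(example))  # [('./a_cal.fits', 'ERROR')]
```

An empty list from unfinished_downloads(manifest) would indicate that every requested file was downloaded.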

Downloading via Curl Script#

Rather than downloading the files directly, we can instead download a curl script. You can run the script at any time to download your data.

This method supports larger data volumes (and downloads more quickly!) than a traditional portal download.

manifest = Observations.download_products(filtered_prod, curl_flag=True)
Downloading URL https://mast.stsci.edu/api/v0.1/Download/bundle.sh to ./mastDownload_20240710223201.sh ...
 [Done]

You can run the script in your terminal by navigating to the desired download location and typing bash [filename].sh. For Windows users, this will require cygwin or other programs that support bash scripts. You may be prompted for your API token.

Additional Resources#

Within the current directory, there is a companion script that unifies all of the code from this notebook. It runs in the terminal with two arguments: the program ID, and whether to download a curl script instead of the files themselves. For example, you might run python3 companion_script.py 2733 True to download the above data via a curl script.

For additional details about astroquery.mast, see the readthedocs page.

About this Notebook#

For additional questions, comments, or feedback, please email archive@stsci.edu.

Authors: Thomas Dutkiewicz, Susan Mullally
Keywords: JWST, MAST, authentication
Last Updated: Jul 2022
Next Review: Jan 2023

Citations#

If you use astroquery for published research, please cite the authors.

