Large downloads in astroquery.mast from AWS to local storage#

Introduction#

Several MAST datasets are now available from the Registry of Open Data on AWS, a cloud data storage service. These datasets include data from JWST, HST, TESS, Pan-STARRS, GALEX, and Kepler/K2. In this notebook, you’ll learn how to download data in bulk to your local machine’s storage from two large survey missions, GALEX and Pan-STARRS (PS1).

To give some more context on the missions we’ll be focusing on:

  • Galaxy Evolution Explorer (GALEX) was a NASA mission led by the California Institute of Technology, whose primary goal was to investigate how star formation in galaxies evolved from the early Universe up to the present. GALEX used microchannel plate detectors to obtain direct images in the near-UV (NUV) and far-UV (FUV), and a grism to disperse light for low-resolution spectroscopy.

  • Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is a system for wide-field astronomical imaging developed and operated by the Institute for Astronomy at the University of Hawaii. Pan-STARRS1 (PS1) is the first part of Pan-STARRS to be completed. The PS1 survey used a 1.8-meter telescope and its 1.4-gigapixel camera to image the sky in five broadband filters (g, r, i, z, y).

Learning Goals#

In this notebook, you will:

  • Learn how to download MAST data from our AWS cloud repositories to your local machine’s storage.

  • Make targeted queries to MAST using parameters such as right ascension, declination, and mission.

  • Filter the resulting products using parameters such as productType, productSubGroupDescription, productGroupDescription, and mrp_only.

Imports#

We only need one import for this notebook!

  • astroquery.mast.Observations to access the MAST API

from astroquery.mast import Observations

Using astroquery to query MAST’s multi-mission database#

  • The Observations API from astroquery.mast can be used to query MAST’s multi-mission database, an instance of the Common Archive Observation Model housing structured metadata from multiple missions in a unified database, from legacy missions to currently operational missions. In this database, individual data products are organized under “observations”.

  • Note: for certain missions like JWST and HST, there is also a MastMissions API from astroquery.mast that can be used to query mission-specific metadata beyond that which can be made to conform to the Common Archive Observation Model. This notebook will not demonstrate the capabilities of the MastMissions API. Refer, instead, to Searching for Mission-Specific Data with Astroquery.

First, let’s turn on access to MAST’s datasets in the AWS cloud. Downloading from AWS is sometimes faster, and it is always preferred because it reduces the load on MAST’s on-premises servers.

Observations.enable_cloud_dataset()
INFO: Using the S3 STScI public dataset [astroquery.mast.cloud]

The Four-Step Data Download Process#

Retrieving MAST data from AWS to your local machine can be performed with the following four-step process:

  • Step 1: Retrieve observation metadata matching your query criteria

  • Step 2: Retrieve metadata for the individual data products that comprise those observations

  • Step 3: (Optional) Filter the data products based on further product-level criteria

  • Step 4: Download the files from AWS to your local machine
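The four steps above can be sketched as one small function. This is only an outline under stated assumptions: the function name bulk_download and its arguments are hypothetical, and the Observations object is passed in as a parameter purely so that the sketch can be exercised without a network connection. The four methods it calls are the ones used in the rest of this notebook.

```python
def bulk_download(observations, criteria, product_filters=None, cloud_only=True):
    """Run the four-step process in order and return the download manifest.

    `observations` is expected to behave like astroquery.mast.Observations.
    Passing it as an argument is a choice made for this sketch only.
    """
    # Step 1: observation metadata matching the query criteria.
    obs = observations.query_criteria(**criteria)

    # Step 2: metadata for the data products that comprise those observations.
    products = observations.get_product_list(obs)

    # Step 3 (optional): keep only products matching product-level criteria.
    if product_filters:
        products = observations.filter_products(products, **product_filters)

    # Step 4: download the remaining files.
    return observations.download_products(products, cloud_only=cloud_only)


# Example call (needs astroquery and network access, so it is left commented out):
# from astroquery.mast import Observations
# Observations.enable_cloud_dataset()
# manifest = bulk_download(
#     Observations,
#     criteria={"s_ra": [30.2, 31.2], "s_dec": [-10.25, -9.25], "obs_collection": ["GALEX"]},
#     product_filters={"productType": "SCIENCE", "mrp_only": True},
# )
```

The sections below walk through each step separately, which makes it easier to inspect the intermediate tables before committing to a large download.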

Step 1#

First, let’s retrieve observations in a sky coordinate range of interest. One way to search observations by coordinates is by giving Observations.query_criteria() a box defining the search area, consisting of two coordinates for the right ascension range and two coordinates for the declination range. You can also supply any number of missions, and various other metadata constraints. In this case, we’ll retrieve both GALEX and Pan-STARRS observations. The output is an astropy table.

obs = Observations.query_criteria(
    s_ra=[30.2, 31.2],
    s_dec=[-10.25, -9.25],
    obs_collection=["GALEX", "PS1"],
)
print(f'We retrieved {len(obs)} observations.')
We retrieved 317 observations.
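If you prefer to think of the search area as a center plus a half-width, the two ranges can be computed rather than typed by hand. The helper below is a minimal sketch: box_ranges is a hypothetical name, not part of astroquery, and it makes two simplifying assumptions that are flagged in the comments (no scaling of the RA width by cos(dec), and no support for boxes that cross RA = 0/360).

```python
def box_ranges(ra_center, dec_center, half_width):
    """Return ([ra_min, ra_max], [dec_min, dec_max]) in degrees.

    Assumptions of this sketch:
      * the RA range is a plain coordinate range (not scaled by cos(dec));
      * the box must not cross RA = 0/360.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")

    ra_range = [ra_center - half_width, ra_center + half_width]
    if ra_range[0] < 0.0 or ra_range[1] > 360.0:
        raise ValueError("box crosses RA = 0/360; split it into two queries")

    # Clamp declination to the valid range.
    dec_range = [max(dec_center - half_width, -90.0),
                 min(dec_center + half_width, 90.0)]
    return ra_range, dec_range


# The box used above corresponds to a center of (30.7, -9.75) and a half-width of 0.5:
# s_ra, s_dec = box_ranges(30.7, -9.75, 0.5)
# obs = Observations.query_criteria(s_ra=s_ra, s_dec=s_dec, obs_collection=["GALEX", "PS1"])
```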

If you would like to filter on other criteria, the full list of queryable columns is shown below.

# NOTE: Use this line if you want to get all the parameters and their details.
# Observations.get_metadata("observations").pprint(max_lines=-1, max_width=-1)

# NOTE: Use this line just to get the parameters and their names.
Observations.get_metadata('observations')['Column Name'].pprint(max_lines=-1)
     Column Name     
---------------------
           intentType
       obs_collection
      provenance_name
      instrument_name
              project
              filters
    wavelength_region
          target_name
target_classification
               obs_id
                 s_ra
                s_dec
          proposal_id
          proposal_pi
            obs_title
     dataproduct_type
          calib_level
                t_min
                t_max
        t_obs_release
            t_exptime
               em_min
               em_max
                objID
             s_region
              jpegURL
             distance
                obsid
           dataRights
               mtFlag
               srcDen
              dataURL
        proposal_type
      sequence_number

Step 2#

Now, we can retrieve the individual data products organized under those observations.

prod = Observations.get_product_list(obs)
print(f'We retrieved {len(prod)} data products.')
# prod is another astropy table
We retrieved 6149 data products.

Step 3#

Now we can use Observations.filter_products() to filter for specific data products. This function can filter on obs_collection (mission), productType, productSubGroupDescription, productGroupDescription, and mrp_only, among numerous other parameters described on the product field descriptions page.

Setting mrp_only=True requests only the data products that MAST identifies as the “Minimum Recommended Products” (MRP) for each observation. For example, in Pan-STARRS (PS1), limiting your results to MRP products excludes the individual-epoch warp images and various other ancillary files.

For GALEX, the possible values for these parameters include:

  • productType: AUXILIARY, CATALOG, INFO, PREVIEW, SCIENCE, THUMBNAIL

  • productSubGroupDescription: Catalog Only, Imaging Only, Spectra Only, Spectral Image Strips Only, Whole Field Images Only

  • productGroupDescription: Minimum Recommended Products

  • mrp_only: True, False

For Pan-STARRS (PS1), the possible values for these parameters include:

  • productType: AUXILIARY, CATALOG, INFO, SCIENCE

  • productSubGroupDescription: N/A

  • productGroupDescription: Minimum Recommended Products

  • mrp_only: True, False

Note that productSubGroupDescription and productGroupDescription are generally not needed when filtering for Pan-STARRS products.
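When several filters are combined, it helps to know how they interact. As we understand the astroquery documentation, multiple values given for one keyword are OR'd together, and different keywords are AND'd together. The stand-alone sketch below mimics that rule on plain dictionaries. It is an illustration only, not astroquery's implementation, and the function name matches and the sample rows are hypothetical.

```python
def matches(product, **filters):
    """Return True if `product` satisfies every filter.

    Each filter value may be a single value or a list of allowed values.
    Values within one keyword are OR'd; keywords are AND'd.
    """
    for column, allowed in filters.items():
        # Treat a single value as a one-element list of allowed values.
        if not isinstance(allowed, (list, tuple, set)):
            allowed = [allowed]
        if product[column] not in allowed:
            return False
    return True


# Hypothetical stand-in rows, using column names from the product table.
rows = [
    {"obs_collection": "GALEX", "productType": "SCIENCE"},
    {"obs_collection": "GALEX", "productType": "PREVIEW"},
    {"obs_collection": "PS1", "productType": "CATALOG"},
]

# Keep GALEX rows whose productType is SCIENCE *or* CATALOG.
kept = [r for r in rows
        if matches(r, obs_collection="GALEX", productType=["SCIENCE", "CATALOG"])]
```

Here kept contains only the first row: the second fails the productType filter and the third fails the obs_collection filter.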

Let’s start with GALEX:

filt_prod_galex = Observations.filter_products(
    prod,
    obs_collection="GALEX",
    productType="SCIENCE",
    productSubGroupDescription="Imaging Only",
    productGroupDescription="Minimum Recommended Products",
    mrp_only=True
)

# Shows how many files are left after applying the filter.
print(f'We are left with {len(filt_prod_galex)} files.')

# Shows the filtered table of data products.
display(filt_prod_galex)
We are left with 7 files.
Table masked=True length=7
obsID | obs_collection | dataproduct_type | obs_id | description | type | dataURI | productType | productGroupDescription | productSubGroupDescription | productDocumentationURL | project | prvversion | proposal_id | productFilename | size | parent_obsid | dataRights | calib_level | filters
str7 | str5 | str8 | str43 | str139 | str1 | str166 | str9 | str28 | str56 | str1 | str3 | str3 | str3 | str68 | int64 | str7 | str10 | int64 | str3
665 | GALEX | image | 2436590472420917248 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-fd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | MIS | -- | -- | MISDR1_18032_0666-fd-int.fits.gz | 9559896 | 665 | PUBLIC | 2 | FUV
665 | GALEX | image | 2436590472420917248 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | MIS | -- | -- | MISDR1_18032_0666-nd-int.fits.gz | 17603914 | 665 | PUBLIC | 2 | NUV
4923 | GALEX | image | 3209978155506860032 | Intensity map (J2000) | C | mast:GALEX/url/data/GR7/pipe/01-vsn/25697-GI5_028097_W1_18085_0274/d/01-main/0007-img/07-try/GI5_028097_W1_18085_0274-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | GII | -- | 177 | GI5_028097_W1_18085_0274-nd-int.fits.gz | 17139488 | 4923 | PUBLIC | 2 | NUV
29153 | GALEX | image | 6380521092288610304 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-fd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg03-fd-int.fits.gz | 1455226 | 29153 | PUBLIC | 2 | FUV
29153 | GALEX | image | 6380521092288610304 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg03-nd-int.fits.gz | 7683367 | 29153 | PUBLIC | 2 | NUV
29161 | GALEX | image | 6380521100878544896 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-fd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg11-fd-int.fits.gz | 1234059 | 29161 | PUBLIC | 2 | FUV
29161 | GALEX | image | 6380521100878544896 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg11-nd-int.fits.gz | 6783917 | 29161 | PUBLIC | 2 | NUV
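Before starting a bulk download, it can be worth checking how much data the filtered table represents. The product table has a size column (in bytes), so a rough total is just a sum. The helpers below are a sketch with hypothetical names. They accept any iterable of rows that support row["size"], which includes the rows of an astropy table, and they assume the size column is populated for every row. The sample sizes are the seven GALEX values as read from the table output above.

```python
def total_size_bytes(products):
    """Sum the `size` column (bytes) over all product rows."""
    return sum(int(row["size"]) for row in products)


def human_readable(n_bytes):
    """Format a byte count using decimal (powers of 1000) units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# Sizes of the seven filtered GALEX products, as read from the output above.
galex_sizes = [9559896, 17603914, 17139488, 1455226, 7683367, 1234059, 6783917]
sample_rows = [{"size": s} for s in galex_sizes]

total = total_size_bytes(sample_rows)
print(human_readable(total))  # 61.5 MB

# With a real product table the call would be:
# print(human_readable(total_size_bytes(filt_prod_galex)))
```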

And now let’s try Pan-STARRS:

filt_prod_ps1 = Observations.filter_products(
    prod,
    obs_collection="PS1",
    mrp_only=True
)

# Show how many files are left after applying the filter.
print(f'We are left with {len(filt_prod_ps1)} files.')

# Show the first few rows of the filtered table of data products.
display(filt_prod_ps1[0:3])
We are left with 30 files.
Table masked=True length=3
obsID | obs_collection | dataproduct_type | obs_id | description | type | dataURI | productType | productGroupDescription | productSubGroupDescription | productDocumentationURL | project | prvversion | proposal_id | productFilename | size | parent_obsid | dataRights | calib_level | filters
str7 | str5 | str8 | str43 | str139 | str1 | str166 | str9 | str28 | str56 | str1 | str3 | str3 | str3 | str68 | int64 | str7 | str10 | int64 | str3
1972016 | PS1 | image | rings.v3.skycell.1062.048.stk.g | stack data image | C | mast:PS1/product/rings.v3.skycell.1062.048.stk.g.unconv.fits | SCIENCE | Minimum Recommended Products | -- | -- | 3PI | pv3 | -- | rings.v3.skycell.1062.048.stk.g.unconv.fits | 66795840 | 1972016 | PUBLIC | 3 | g
1972017 | PS1 | image | rings.v3.skycell.1062.048.stk.i | stack data image | C | mast:PS1/product/rings.v3.skycell.1062.048.stk.i.unconv.fits | SCIENCE | Minimum Recommended Products | -- | -- | 3PI | pv3 | -- | rings.v3.skycell.1062.048.stk.i.unconv.fits | 65960640 | 1972017 | PUBLIC | 3 | i
1972018 | PS1 | image | rings.v3.skycell.1062.048.stk.r | stack data image | C | mast:PS1/product/rings.v3.skycell.1062.048.stk.r.unconv.fits | SCIENCE | Minimum Recommended Products | -- | -- | 3PI | pv3 | -- | rings.v3.skycell.1062.048.stk.r.unconv.fits | 67515840 | 1972018 | PUBLIC | 3 | r
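Since only the first few rows are displayed, a quick tally by column value can confirm what the remaining files contain, for example how many products there are per filter. The helper below is a small sketch using only the standard library. count_by is a hypothetical name, and the sample rows are stand-ins that reuse the filter values of the three rows shown above.

```python
from collections import Counter


def count_by(products, column):
    """Count product rows by the value found in `column`."""
    return Counter(str(row[column]) for row in products)


# Stand-in rows with the filter values of the three products displayed above.
sample_rows = [{"filters": "g"}, {"filters": "i"}, {"filters": "r"}]
counts = count_by(sample_rows, "filters")

# With the real table the call would be:
# print(count_by(filt_prod_ps1, "filters"))
```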

Step 4#

Download the files you need to your local computer using Observations.download_products! For example, for GALEX:

Observations.download_products(filt_prod_galex, cloud_only=True)
Downloading URL s3://stpubdata/galex/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-fd-int.fits.gz to ./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-fd-int.fits.gz ...
 [Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-nd-int.fits.gz to ./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-nd-int.fits.gz ...
 [Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-fd-int.fits.gz to ./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-fd-int.fits.gz ...
 [Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-nd-int.fits.gz to ./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-nd-int.fits.gz ...
 [Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-fd-int.fits.gz to ./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-fd-int.fits.gz ...
 [Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-nd-int.fits.gz to ./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-nd-int.fits.gz ...
 [Done]
Downloading URL s3://stpubdata/galex/GR7/pipe/01-vsn/25697-GI5_028097_W1_18085_0274/d/01-main/0007-img/07-try/GI5_028097_W1_18085_0274-nd-int.fits.gz to ./mastDownload/GALEX/3209978155506860032/GI5_028097_W1_18085_0274-nd-int.fits.gz ...
 [Done]
Table length=7
Local Path | Status | Message | URL
str80 | str8 | object | object
./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-fd-int.fits.gz | COMPLETE | None | None
./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-nd-int.fits.gz | COMPLETE | None | None
./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-fd-int.fits.gz | COMPLETE | None | None
./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-nd-int.fits.gz | COMPLETE | None | None
./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-fd-int.fits.gz | COMPLETE | None | None
./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-nd-int.fits.gz | COMPLETE | None | None
./mastDownload/GALEX/3209978155506860032/GI5_028097_W1_18085_0274-nd-int.fits.gz | COMPLETE | None | None

Note that because you called Observations.enable_cloud_dataset() earlier, download_products will try to fetch every file from the AWS S3 bucket regardless of what you set for cloud_only. That setting only controls what happens when a file cannot be found in AWS: with the default cloud_only=False, astroquery falls back to downloading the file from MAST’s on-premises servers; with cloud_only=True, as above, astroquery skips the file.
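Because cloud_only=True can silently leave files out, it is worth checking the manifest that download_products returns. The sketch below lists every row whose Status is not COMPLETE. The function name unfinished is hypothetical, and the "SKIPPED" status in the sample data is an assumption about what astroquery reports for files it skips, so check the values in your own manifest.

```python
def unfinished(manifest):
    """Return the manifest rows whose Status is anything other than COMPLETE."""
    return [row for row in manifest if str(row["Status"]) != "COMPLETE"]


# Stand-in manifest rows using the column names shown in the output above.
# The second row and its "SKIPPED" status are hypothetical.
sample_manifest = [
    {"Local Path": "./mastDownload/GALEX/a.fits.gz", "Status": "COMPLETE"},
    {"Local Path": "./mastDownload/GALEX/b.fits.gz", "Status": "SKIPPED"},
]

for row in unfinished(sample_manifest):
    print(f"{row['Local Path']}: {row['Status']}")

# With a real download the call would be:
# manifest = Observations.download_products(filt_prod_galex, cloud_only=True)
# problems = unfinished(manifest)
```

If any rows come back, one option is to retry just those products with cloud_only=False so that astroquery can fall back to MAST's own servers.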

About this Notebook#

  • Authors: Yingquan Li, Bernie Shao, Adrian Lucy

  • Keywords: GALEX, Pan-STARRS, Bulk Download, Python, AWS

  • Updated On: 2025-04-23

  • References: Missions Mast Search

For support, please contact the Archive HelpDesk at archive@stsci.edu.
