Large downloads in astroquery.mast from AWS to local storage#
Introduction#
Several MAST datasets are now available from the Registry of Open Data on AWS, a cloud data storage service. These datasets include data from JWST, HST, TESS, Pan-STARRS, GALEX, and Kepler/K2. In this notebook, you’ll learn how to download data in bulk to your local machine’s storage from two large survey missions, GALEX and Pan-STARRS (PS1).
To give some more context on the missions we’ll be focusing on:
Galaxy Evolution Explorer (GALEX) was a NASA mission led by the California Institute of Technology, whose primary goal was to investigate how star formation in galaxies evolved from the early Universe up to the present. GALEX used microchannel plate detectors to obtain direct images in the near-UV (NUV) and far-UV (FUV), and a grism to disperse light for low resolution spectroscopy.
Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is a system for wide-field astronomical imaging developed and operated by the Institute for Astronomy at the University of Hawaii. Pan-STARRS1 (PS1) is the first part of Pan-STARRS to be completed. The PS1 survey used a 1.8 meter telescope and its 1.4 Gigapixel camera to image the sky in five broadband filters (g, r, i, z, y).
Learning Goals#
In this notebook, you will:
Learn how to download MAST data from our AWS cloud repositories to your local machine’s storage.
Make targeted queries to MAST using parameters such as right ascension, declination, and more.
Filter the resulting products by using parameters such as productType, productSubGroupDescription, productGroupDescription, mrp_only, and more.
Imports#
We only need one import for this notebook: astroquery.mast.Observations, to access the MAST API.
from astroquery.mast import Observations
Using astroquery to query MAST’s multi-mission database#
The Observations API from astroquery.mast can be used to query MAST’s multi-mission database, an instance of the Common Archive Observation Model housing structured metadata from multiple missions in a unified database, from legacy missions to currently operational missions. In this database, individual data products are organized under “observations”.
Note: for certain missions like JWST and HST, there is also a MastMissions API from astroquery.mast that can be used to query mission-specific metadata beyond that which can be made to conform to the Common Archive Observation Model. This notebook will not demonstrate the capabilities of the MastMissions API. Refer, instead, to Searching for Mission-Specific Data with Astroquery.
First, let’s turn on access to MAST’s datasets in the AWS cloud. Downloading from AWS is sometimes faster, and is always preferred because it reduces the load on MAST’s on-premises servers.
Observations.enable_cloud_dataset()
INFO: Using the S3 STScI public dataset [astroquery.mast.cloud]
The Four-Step Data Download Process#
Retrieving MAST data from AWS to your local machine can be performed with the following four-step process:
Step 1: Retrieve observation metadata matching your query criteria
Step 2: Retrieve metadata for the individual data products that comprise those observations
Step 3: (Optional) Filter the data products based on further product-level criteria
Step 4: Download the files from AWS to your local machine
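The four steps above can be chained into one helper. The following is only a sketch: the function name download_from_cloud and its arguments are hypothetical, and the api argument is assumed to behave like astroquery.mast.Observations (it only needs the four methods used in this notebook). The early return for an empty query is an addition for safety, not something the notebook itself does.

```python
def download_from_cloud(api, criteria, product_filters, cloud_only=True):
    """Run the four steps in order and return the download manifest.

    `api` is assumed to expose query_criteria, get_product_list,
    filter_products and download_products, as astroquery.mast.Observations
    does. `criteria` and `product_filters` are plain dicts of keyword
    arguments. This helper is hypothetical, not part of astroquery.
    """
    # Step 1: observation metadata matching the query criteria
    obs = api.query_criteria(**criteria)
    if len(obs) == 0:
        # Nothing matched, so there is nothing to download.
        return None

    # Step 2: metadata for the data products under those observations
    products = api.get_product_list(obs)

    # Step 3: optional product-level filtering
    filtered = api.filter_products(products, **product_filters)

    # Step 4: download the remaining files
    return api.download_products(filtered, cloud_only=cloud_only)


# Intended usage (requires network access, so it is left commented out):
# from astroquery.mast import Observations
# Observations.enable_cloud_dataset()
# manifest = download_from_cloud(
#     Observations,
#     {"s_ra": [30.2, 31.2], "s_dec": [-10.25, -9.25], "obs_collection": ["GALEX"]},
#     {"productType": "SCIENCE", "mrp_only": True},
# )
```

Passing the API object in as an argument keeps the helper easy to exercise with a stand-in object before running it against MAST.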
Step 1#
First, let’s retrieve observations in a sky coordinate range of interest. One way to search observations by coordinates is by giving Observations.query_criteria() a box defining the search area, consisting of two coordinates for the right ascension range and two coordinates for the declination range. You can also supply any number of missions, and various other metadata constraints. In this case, we’ll retrieve both GALEX and Pan-STARRS observations. The output is an astropy table.
obs = Observations.query_criteria(
    s_ra=[30.2, 31.2],
    s_dec=[-10.25, -9.25],
    obs_collection=["GALEX", "PS1"],
)
print(f'We retrieved {len(obs)} observations.')
We retrieved 317 observations.
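If you think of your search area as a center plus a half-width rather than as two ranges, a small helper can build the s_ra and s_dec lists for you. This is a minimal sketch: box_ranges is a hypothetical name, all values are assumed to be in degrees, and the box is a plain coordinate box with no cos(declination) correction and no handling of the RA = 0/360 wrap.

```python
def box_ranges(ra_center, dec_center, half_width):
    """Return ([ra_min, ra_max], [dec_min, dec_max]) in degrees.

    Hypothetical helper: builds a simple coordinate box around a center.
    Declination is clipped to [-90, 90]; boxes that would cross
    RA = 0/360 are rejected rather than silently wrapped.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")

    ra_min = ra_center - half_width
    ra_max = ra_center + half_width
    if ra_min < 0.0 or ra_max > 360.0:
        raise ValueError("box crosses RA = 0/360; split it into two queries")

    dec_min = max(dec_center - half_width, -90.0)
    dec_max = min(dec_center + half_width, 90.0)
    return [ra_min, ra_max], [dec_min, dec_max]


# A 1-degree box centered on (RA, Dec) = (30.7, -9.75)
s_ra, s_dec = box_ranges(30.7, -9.75, 0.5)
print(s_ra, s_dec)
```

The resulting lists can be passed straight to Observations.query_criteria(s_ra=s_ra, s_dec=s_dec, ...); the center and half-width used here correspond to the box queried above.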
If you would like to filter on other parameters, the cell below lists the columns you can use as query criteria.
# NOTE: Use this line if you want to see all the parameters and their details.
# Observations.get_metadata("observations").pprint(max_lines=-1, max_width=-1)
# NOTE: Use this line to see just the parameter names.
Observations.get_metadata('observations')['Column Name'].pprint(max_lines=-1)
Column Name
---------------------
intentType
obs_collection
provenance_name
instrument_name
project
filters
wavelength_region
target_name
target_classification
obs_id
s_ra
s_dec
proposal_id
proposal_pi
obs_title
dataproduct_type
calib_level
t_min
t_max
t_obs_release
t_exptime
em_min
em_max
objID
s_region
jpegURL
distance
obsid
dataRights
mtFlag
srcDen
dataURL
proposal_type
sequence_number
Step 2#
Now, we can retrieve the individual data products organized under those observations.
prod = Observations.get_product_list(obs)
print(f'We retrieved {len(prod)} data products.')
# prod is another astropy table
We retrieved 6149 data products.
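A query over a larger area can return far more observations than the 317 here. One option in that case is to request products in batches, so that each request stays small and a failure only affects one batch. The sketch below is not part of astroquery: batches is a hypothetical helper that relies only on len() and slicing, which both Python lists and astropy tables support.

```python
def batches(table, size):
    """Yield consecutive slices of `table`, each with at most `size` rows.

    Works with anything that supports len() and slicing, such as a list
    or an astropy Table. Hypothetical helper, not an astroquery function.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(table), size):
        yield table[start:start + size]


# Demonstration on a plain list
for chunk in batches(list(range(7)), 3):
    print(chunk)

# Intended usage with the observation table (needs network access):
# from astropy.table import vstack
# prod = vstack([Observations.get_product_list(chunk)
#                for chunk in batches(obs, 100)])
```

The batch size of 100 in the commented usage is an arbitrary illustration, not a recommended value.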
Step 3#
Now we can use Observations.filter_products() to filter for specific data products. This function can filter on obs_collection (mission), productType, productSubGroupDescription, productGroupDescription, and mrp_only, among numerous other parameters described on the product field descriptions page.
Setting mrp_only=True requests only the data products identified by MAST as the main “Minimum Recommended” products in each observation. For example, in Pan-STARRS (PS1), limiting your results to MRP products excludes the individual-epoch warp images and various other ancillary files.
For GALEX, the possible values for these parameters include:
productType: AUXILIARY, CATALOG, INFO, PREVIEW, SCIENCE, THUMBNAIL
productSubGroupDescription: Catalog Only, Imaging Only, Spectra Only, Spectral Image Strips Only, Whole Field Images Only
productGroupDescription: Minimum Recommended Products
mrp_only: True, False
For Pan-STARRS (PS1), the possible values for these parameters include:
productType: AUXILIARY, CATALOG, INFO, SCIENCE
productSubGroupDescription: N/A
productGroupDescription: Minimum Recommended Products
mrp_only: True, False
Note that productSubGroupDescription and productGroupDescription are generally not needed when filtering for Pan-STARRS products.
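When you pass several filters at once, it helps to know how they combine. The astroquery documentation describes the behavior as AND between different filters and OR among the values given for a single filter. The sketch below mimics that rule on plain dictionaries so you can see it in isolation. It is an illustration only, not astroquery’s implementation; the function name matches and the sample rows are made up for the example, and the special boolean mrp_only is not modeled.

```python
def matches(product, **filters):
    """Return True if `product` passes every filter.

    Each filter value may be a single value or a list of accepted values:
    values within one filter are OR-ed, separate filters are AND-ed.
    Illustrative only; this is not how astroquery is implemented.
    """
    for key, accepted in filters.items():
        # Treat a lone string (or any non-iterable) as a one-item list.
        if isinstance(accepted, str) or not hasattr(accepted, "__iter__"):
            accepted = [accepted]
        if product.get(key) not in accepted:
            return False
    return True


# Made-up rows using column names from the product tables
products = [
    {"obs_collection": "GALEX", "productType": "SCIENCE"},
    {"obs_collection": "GALEX", "productType": "PREVIEW"},
    {"obs_collection": "PS1", "productType": "SCIENCE"},
    {"obs_collection": "PS1", "productType": "AUXILIARY"},
]

# PS1 AND (SCIENCE OR AUXILIARY)
kept = [p for p in products
        if matches(p, obs_collection="PS1", productType=["SCIENCE", "AUXILIARY"])]
print(len(kept))
```

With Observations.filter_products(), the equivalent call would pass productType=["SCIENCE", "AUXILIARY"] as a list in the same way.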
Let’s start with GALEX:
filt_prod_galex = Observations.filter_products(
    prod,
    obs_collection="GALEX",
    productType="SCIENCE",
    productSubGroupDescription="Imaging Only",
    productGroupDescription="Minimum Recommended Products",
    mrp_only=True,
)
# Shows how many files are left after applying the filter.
print(f'We are left with {len(filt_prod_galex)} files.')
# Shows the filtered table of data products.
display(filt_prod_galex)
We are left with 7 files.
obsID | obs_collection | dataproduct_type | obs_id | description | type | dataURI | productType | productGroupDescription | productSubGroupDescription | productDocumentationURL | project | prvversion | proposal_id | productFilename | size | parent_obsid | dataRights | calib_level | filters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str7 | str5 | str8 | str43 | str139 | str1 | str166 | str9 | str28 | str56 | str1 | str3 | str3 | str3 | str68 | int64 | str7 | str10 | int64 | str3 |
665 | GALEX | image | 2436590472420917248 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-fd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | MIS | -- | -- | MISDR1_18032_0666-fd-int.fits.gz | 9559896 | 665 | PUBLIC | 2 | FUV |
665 | GALEX | image | 2436590472420917248 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | MIS | -- | -- | MISDR1_18032_0666-nd-int.fits.gz | 17603914 | 665 | PUBLIC | 2 | NUV |
4923 | GALEX | image | 3209978155506860032 | Intensity map (J2000) | C | mast:GALEX/url/data/GR7/pipe/01-vsn/25697-GI5_028097_W1_18085_0274/d/01-main/0007-img/07-try/GI5_028097_W1_18085_0274-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | GII | -- | 177 | GI5_028097_W1_18085_0274-nd-int.fits.gz | 17139488 | 4923 | PUBLIC | 2 | NUV |
29153 | GALEX | image | 6380521092288610304 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-fd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg03-fd-int.fits.gz | 1455226 | 29153 | PUBLIC | 2 | FUV |
29153 | GALEX | image | 6380521092288610304 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg03-nd-int.fits.gz | 7683367 | 29153 | PUBLIC | 2 | NUV |
29161 | GALEX | image | 6380521100878544896 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-fd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg11-fd-int.fits.gz | 1234059 | 29161 | PUBLIC | 2 | FUV |
29161 | GALEX | image | 6380521100878544896 | Intensity map (J2000) | C | mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-nd-int.fits.gz | SCIENCE | Minimum Recommended Products | Imaging Only | -- | AIS | -- | -- | AIS_273_sg11-nd-int.fits.gz | 6783917 | 29161 | PUBLIC | 2 | NUV |
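Before starting a bulk download, it is worth checking how much data you are about to pull. The sketch below sums the size column, assuming its values are file sizes in bytes with none missing; total_size_mb is a hypothetical helper, and the numbers are copied from the seven GALEX rows in the table above.

```python
def total_size_mb(sizes):
    """Sum file sizes given in bytes and return megabytes (10**6 bytes).

    Hypothetical helper; assumes every entry is a valid, unmasked integer.
    """
    return sum(int(s) for s in sizes) / 1e6


# Sizes of the seven filtered GALEX products listed above
galex_sizes = [9559896, 17603914, 17139488, 1455226, 7683367, 1234059, 6783917]
print(f"{total_size_mb(galex_sizes):.1f} MB")  # prints "61.5 MB"

# Intended usage with the filtered table:
# total_size_mb(filt_prod_galex["size"])
```

For a real bulk download the total may run to many gigabytes, so comparing this figure against your free disk space first can save a failed run.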
And now let’s try Pan-STARRS:
filt_prod_ps1 = Observations.filter_products(
    prod,
    obs_collection="PS1",
    mrp_only=True,
)
# Show how many files are left after applying the filter.
print(f'We are left with {len(filt_prod_ps1)} files.')
# Show the first few rows of the filtered table of data products.
display(filt_prod_ps1[0:3])
We are left with 30 files.
obsID | obs_collection | dataproduct_type | obs_id | description | type | dataURI | productType | productGroupDescription | productSubGroupDescription | productDocumentationURL | project | prvversion | proposal_id | productFilename | size | parent_obsid | dataRights | calib_level | filters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str7 | str5 | str8 | str43 | str139 | str1 | str166 | str9 | str28 | str56 | str1 | str3 | str3 | str3 | str68 | int64 | str7 | str10 | int64 | str3 |
1972016 | PS1 | image | rings.v3.skycell.1062.048.stk.g | stack data image | C | mast:PS1/product/rings.v3.skycell.1062.048.stk.g.unconv.fits | SCIENCE | Minimum Recommended Products | -- | -- | 3PI | pv3 | -- | rings.v3.skycell.1062.048.stk.g.unconv.fits | 66795840 | 1972016 | PUBLIC | 3 | g |
1972017 | PS1 | image | rings.v3.skycell.1062.048.stk.i | stack data image | C | mast:PS1/product/rings.v3.skycell.1062.048.stk.i.unconv.fits | SCIENCE | Minimum Recommended Products | -- | -- | 3PI | pv3 | -- | rings.v3.skycell.1062.048.stk.i.unconv.fits | 65960640 | 1972017 | PUBLIC | 3 | i |
1972018 | PS1 | image | rings.v3.skycell.1062.048.stk.r | stack data image | C | mast:PS1/product/rings.v3.skycell.1062.048.stk.r.unconv.fits | SCIENCE | Minimum Recommended Products | -- | -- | 3PI | pv3 | -- | rings.v3.skycell.1062.048.stk.r.unconv.fits | 67515840 | 1972018 | PUBLIC | 3 | r |
Step 4#
Download the files you need to your local computer using Observations.download_products(). For example, for GALEX:
Observations.download_products(filt_prod_galex, cloud_only=True)
Downloading URL s3://stpubdata/galex/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-fd-int.fits.gz to ./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-fd-int.fits.gz ...
[Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-nd-int.fits.gz to ./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-nd-int.fits.gz ...
[Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-fd-int.fits.gz to ./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-fd-int.fits.gz ...
[Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-nd-int.fits.gz to ./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-nd-int.fits.gz ...
[Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-fd-int.fits.gz to ./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-fd-int.fits.gz ...
[Done]
Downloading URL s3://stpubdata/galex/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg11-nd-int.fits.gz to ./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-nd-int.fits.gz ...
[Done]
Downloading URL s3://stpubdata/galex/GR7/pipe/01-vsn/25697-GI5_028097_W1_18085_0274/d/01-main/0007-img/07-try/GI5_028097_W1_18085_0274-nd-int.fits.gz to ./mastDownload/GALEX/3209978155506860032/GI5_028097_W1_18085_0274-nd-int.fits.gz ...
[Done]
Local Path | Status | Message | URL |
---|---|---|---|
str80 | str8 | object | object |
./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-fd-int.fits.gz | COMPLETE | None | None |
./mastDownload/GALEX/2436590472420917248/MISDR1_18032_0666-nd-int.fits.gz | COMPLETE | None | None |
./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-fd-int.fits.gz | COMPLETE | None | None |
./mastDownload/GALEX/6380521092288610304/AIS_273_sg03-nd-int.fits.gz | COMPLETE | None | None |
./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-fd-int.fits.gz | COMPLETE | None | None |
./mastDownload/GALEX/6380521100878544896/AIS_273_sg11-nd-int.fits.gz | COMPLETE | None | None |
./mastDownload/GALEX/3209978155506860032/GI5_028097_W1_18085_0274-nd-int.fits.gz | COMPLETE | None | None |
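With hundreds of files, scanning the manifest by eye is impractical. The sketch below counts the values in the Status column and lists any paths that are not marked COMPLETE. The helper names are hypothetical, and the sample rows are illustrative: only COMPLETE appears in the output above, so the "ERROR" value here is a stand-in for whatever non-complete status a real run might report.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Return a dict mapping each status string to how often it occurs."""
    return dict(Counter(str(s) for s in statuses))


def not_complete(paths, statuses):
    """Return the local paths whose status is anything other than COMPLETE."""
    return [str(p) for p, s in zip(paths, statuses) if str(s) != "COMPLETE"]


# Illustrative manifest columns (the second status is made up)
paths = ["./mastDownload/a.fits.gz", "./mastDownload/b.fits.gz", "./mastDownload/c.fits.gz"]
statuses = ["COMPLETE", "ERROR", "COMPLETE"]

print(summarize_statuses(statuses))
print(not_complete(paths, statuses))

# Intended usage with the table returned by download_products:
# manifest = Observations.download_products(filt_prod_galex, cloud_only=True)
# not_complete(manifest["Local Path"], manifest["Status"])
```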
Note that, because you called Observations.enable_cloud_dataset() earlier, download_products will attempt to download every file from the AWS S3 bucket no matter what you set for cloud_only. However, if you leave cloud_only=False as per the default, astroquery will download a file from MAST’s on-premises server if it can’t find the file in AWS. If you set cloud_only=True as above, astroquery will skip downloading any file that it can’t find in AWS.
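The behavior described above can be restated as a small decision function. This is only a paraphrase of the paragraph in code, assuming cloud access has already been enabled; choose_source and its return labels are hypothetical and do not correspond to anything in astroquery.

```python
def choose_source(found_in_aws, cloud_only=False):
    """Say where a file would come from once cloud access is enabled.

    Returns "aws" if the file exists in the S3 bucket, "mast" if it does
    not and falling back to the on-premises server is allowed, and "skip"
    if it does not and cloud_only is set. Hypothetical labels.
    """
    if found_in_aws:
        return "aws"
    return "skip" if cloud_only else "mast"


for found in (True, False):
    for cloud_only in (False, True):
        print(found, cloud_only, choose_source(found, cloud_only))
```

In other words, cloud_only only matters for files that are missing from AWS: it decides between falling back to MAST and skipping the file.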
About this Notebook#
Authors: Yingquan Li, Bernie Shao, Adrian Lucy
Keywords: GALEX, Pan-STARRS, Bulk Download, Python, AWS
Updated On: 2025-04-23
References: Missions Mast Search
For support, please contact the Archive HelpDesk at archive@stsci.edu.
