Inputting User Data using the Hubble Advanced Spectral Products Script#

This Notebook is designed to walk you through downloading and using the Hubble Advanced Spectral Products (HASP) co-add script.#

Learning Goals:#

By the end of this tutorial, you will learn how to:

  • Set up a conda environment

  • Download the HASP wrapper script

  • Use astroquery.mast to download COS and STIS data

  • Run the co-add script

  • Examine the co-added output

  • Run the script with multiple threshold values and examine the output

Table of Contents#

0. Introduction

1. Downloading HST Spectroscopic Data

- 1.1 Using astroquery to Download STIS Data

2. Running the Co-add Script

3. Working with Co-added Data Products

- 3.1 Inspecting the Output Files

- 3.2 Viewing Co-added STIS Data

- 3.3 Putting it all together with COS Data

4. Downloading HASP Data Products

- 4.1 Downloading Data Products using astroquery

5. Changing the Threshold Flag

- 5.1 Using Astroquery to Download Additional Data

- 5.2 Running the Co-add Script with Multiple Threshold Values

- 5.3 Analyzing the Co-added Spectra of Different Threshold Values

0. Introduction#

The Hubble Advanced Spectral Products (HASP) code is a script that co-adds spectra of the same target within programs. This software is able to co-add data taken with the spectrographs onboard the Hubble Space Telescope (HST): the Space Telescope Imaging Spectrograph (STIS) and the Cosmic Origins Spectrograph (COS). The Hubble Spectroscopic Legacy Archive (HSLA) uses this script to co-add these instruments’ data from The Mikulski Archive for Space Telescopes (MAST) to create high-quality spectra with broad wavelength coverage (whenever possible, from the ultraviolet to the near-infrared) that are publicly available to the scientific community. These custom co-addition notebooks will instruct users on how to produce their own co-adds in cases where the MAST archive data needs special processing or is rejected by the default filters used in the co-add script.

The script first co-adds the observations for each grating of a given program, then combines all gratings for the observation set. Finally, it co-adds the spectra of each observation set in the program to produce a fully co-added spectrum for each target in a program. Please check out the COS 2024-01 ISR for more information about HASP.

This notebook will show users how to download data from MAST, run the co-add script, understand the output files and inspect the abutted data by plotting flux as a function of wavelength. It will also show users how to change the flux threshold flag.

Please check out our Setup.ipynb notebook before running this tutorial to learn how to install and run the co-add code.

Imports#

We will be using multiple libraries to retrieve and analyze data. We will use:

  • pathlib Path to create product and data directories

  • astroquery.mast Observations to download COS and STIS data

  • shutil to perform directory and file operations

  • os to interact with the operating system

  • astropy.io fits to work with FITS files

  • matplotlib.pyplot to plot abutted spectra

  • glob to work with multiple files in our directories

  • subprocess to run our script in the notebook with varying threshold flag values

  • numpy to help analyze our data

  • scipy.interpolate interp1d to interpolate our data

We recommend creating a HASP-specific conda environment when co-adding spectra. You can check out our Setup.ipynb notebook to create such an environment. Alternatively, you can download the required dependencies to run this notebook with the terminal command:

pip install -r requirements.txt

This will install the dependencies necessary to run this notebook. Let’s import all of the packages that we will use in this notebook and print our conda environment by running the next cell:

import os
from pathlib import Path
from astroquery.mast import Observations
import shutil
import glob as glob
from astropy.io import fits
import matplotlib.pyplot as plt
import subprocess
import numpy as np
from scipy.interpolate import interp1d

print("Currently active conda environment:", os.environ.get("CONDA_PREFIX"))

For this tutorial, we will create data folders to contain the data downloaded from MAST (one for STIS and another for COS). We will also create products folders, again one for each instrument, to contain the HASP script output, i.e., the co-added spectra.

# Creating the data download directories for COS and STIS
stis_data_dir = Path("./stis_data/")
cos_data_dir = Path("./cos_data/")

# Creating the products directory to hold the output
stis_products_dir = Path("./stis_products/")
cos_products_dir = Path("./cos_products/")

# If the directory doesn't exist, then create it
stis_data_dir.mkdir(exist_ok=True)
cos_data_dir.mkdir(exist_ok=True)

stis_products_dir.mkdir(exist_ok=True)
cos_products_dir.mkdir(exist_ok=True)

1. Downloading HST Spectroscopic Data#

Now that we have a conda environment created and the co-add code downloaded, we can start downloading data using the Observations class from the Python package astroquery.mast. Here we will download two datasets, one taken with STIS and the other taken with COS.

1.1 Using Astroquery to Download STIS Data#

We will be downloading STIS data for the white dwarf GD71; this object is a well-known primary white dwarf standard. We will specifically download data from Program 7656, which has observations of GD71 using the gratings G230L and G140L.

We can start with querying the MAST database for the STIS program’s data. This will give us a list of all observations for the program.

# Querying the data on MAST for our target
gd71_query = Observations.query_criteria(
    proposal_id=7656,
    target_name="GD71",
    dataproduct_type="SPECTRUM",
    provenance_name="CALSTIS"
)

Now that we have queried the observations for Program 7656, we can get a list that contains the data products for these observations:

# Getting a product list for our query
gd71_products = Observations.get_product_list(
    gd71_query
)

Let’s print out this list to see the associated data files:

print(gd71_products["productFilename"])

As you can see, we have a very long list of different types of data products for our program. Luckily, we don’t need all of these files to run the wrapper. We only need to download the following COS and/or STIS files:

  • X1D - the one-dimensional extracted product spectra.

  • SX1 - the one-dimensional extracted spectra from combined or cosmic-ray rejected images. This file is only produced with STIS data.

We will specify that we want to download only these files with the productSubGroupDescription parameter. We will also specify the directory that will contain the downloaded data products. Below, we download the STIS files for the program.

Observations.download_products(
    gd71_products,
    download_dir=str(stis_data_dir),
    productSubGroupDescription=["X1D", "SX1"]
)

When we downloaded the data using astroquery, it created a directory ./stis_data/mastDownload/HST, with separate folders for each different dataset ID. The script will need all of our newly downloaded data product files in a single directory, so we must move all STIS files to our ./stis_data directory. We will create a function to consolidate our data, since we will be utilizing it for two additional datasets later on in the tutorial.

def consolidate_files(data_path):
    '''
    Consolidate all files to single directory; necessary for HASP script run.
    ---------------
    Input:
    str data_path : ./mastDownload/HST folders paths; files to be moved here
    ---------------
    Output:
    None. Files moved to data_path. ./mastDownload/HST directory is deleted.
    '''
    # The path to all obs_id folders
    mast_path = f"{data_path}/mastDownload/HST/"

    try:
        # Check if mastDownload exists
        if not os.path.exists(mast_path):
            print(f"Directory {mast_path} doesn't exist.")
            return

        # Get a list of the obs_id paths in mastDownload
        obs_id_dirs = os.listdir(mast_path)

        # Iterate through each obs_id folder and move the files
        for obs_id in obs_id_dirs:
            obs_id_path = os.path.join(mast_path, obs_id)
            files = glob.glob(obs_id_path + "/*fits")

            for file in files:
                file_path = Path(file)
                new_path = data_path / file_path.name
                shutil.move(file, new_path)

        # Now we can remove the mastDownload directory
        if os.path.exists(mast_path):
            shutil.rmtree(f"{data_path}/mastDownload")

    except Exception as e:
        print(f"An error occurred: {e}")

Now, using the function to move our STIS files to a single directory:

consolidate_files(stis_data_dir)
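As an optional sanity check (assuming the download and consolidation cells above ran successfully), you can list the FITS files now sitting directly in ./stis_data:

```python
from pathlib import Path

# The same data directory created earlier in the notebook
stis_data_dir = Path("./stis_data/")

# After consolidation, every FITS file should sit directly in this directory
fits_files = sorted(p.name for p in stis_data_dir.glob("*.fits"))
print(f"{len(fits_files)} FITS files consolidated:")
for name in fits_files:
    print(" ", name)
```

If any dataset is missing here, the co-add script will simply not include it, so this is a good moment to catch incomplete downloads.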

Now we can run the co-add script!

2. Running the Co-add Script#

Now that we’ve downloaded the GD71 STIS data, we can run the co-add script. Currently, the co-add code abuts spectra for a single program. Run the script by using the next cell’s command.

Note: Make sure that you are in the hasp-env conda environment that we created at the beginning of the Setup.ipynb notebook.

The -i parameter is the input directory (i.e., where the FITS files are located). -o is the directory that will contain the newly created co-added products. Note that if you want to exclude certain data files from the co-add, you can simply remove them from the input directory. There is more information about this (and the other flags) in our Setup.ipynb notebook.

!swrapper -i ./stis_data -o ./stis_products

We have now created the co-added products for Program 7656 using the wrapper!

3. Working with Co-added Data Products#

With the newly co-added files in the ./stis_products/output directory, we can begin to inspect the data.

3.1 Inspecting the Output Files#

The following information about the output file naming conventions is in our Setup.ipynb notebook. If you’re already familiar with this, you can skip to Section 3.2, where we view our co-added spectra.

Let’s look at the ./stis_products/output directory to see the newly abutted spectra. Currently, the script outputs abutted products for a single program.

The script produces several files of abutted spectra. It first creates co-added spectra for each grating of a single observation set:

hst_programID_instrument_targetname_grating_obset_cspec.fits

It then co-adds the spectra of all gratings for a single observation set:

hst_programID_instrument_targetname_allGratings_obset_cspec.fits

Finally, it co-adds all abutted observation sets’ spectra to create a final co-added product for a single target:

hst_programID_instrument_targetname_allGratings_cspec.fits

An example of this is below. These filenames are the output files for our STIS GD71 dataset that is co-added in this notebook. Here, the programID is 7656, the instrument is STIS, and the targetname is gd71.

| Step | Filename | Description |
|------|----------|-------------|
| 1 | hst_7656_stis_gd71_g140l_o4a520_cspec.fits | Co-adds all G140L observations for the observation set O4A520. |
| 2 | hst_7656_stis_gd71_g140l-g230l-g430l-g750l_o4a520_cspec.fits | Co-adds all observations taken with every grating for the observation set O4A520. |
| 3 | hst_7656_stis_gd71_g140l-g230l-g430l-g750l_o4a5_cspec.fits | Co-adds all GD71 observations at each grating for the program, O4A5. |

Note: HST file naming conventions use a combination of three letters and/or numbers to uniquely associate files with a PI’s proposal ID and program ID, meaning that o4a5 in hst_7656_stis_gd71_g140l-g230l-g430l-g750l_o4a5_cspec.fits is essentially the program ID for our example. More information is available on the MAST HST file naming convention page.
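As an illustrative sketch of the naming pattern described above (a hypothetical helper, not part of HASP), you can split a coadd filename into its fields:

```python
def parse_hasp_filename(filename):
    """Split a HASP coadd filename into its naming-convention fields."""
    # Strip the extension, then split on underscores; the last field is "cspec"
    parts = filename.removesuffix(".fits").split("_")
    telescope, program_id, instrument, target = parts[0], parts[1], parts[2], parts[3]
    gratings = parts[4].split("-")
    # Observation-set level files carry an extra obs_set field before "cspec";
    # the final target-level coadd does not
    obs_set = parts[5] if len(parts) == 7 else None
    return {"telescope": telescope, "program_id": program_id,
            "instrument": instrument, "target": target,
            "gratings": gratings, "obs_set": obs_set}

info = parse_hasp_filename(
    "hst_7656_stis_gd71_g140l-g230l-g430l-g750l_o4a5_cspec.fits"
)
print(info)
```

This is only a convenience for reading filenames at a glance; the same metadata is stored more reliably in the FITS headers of the products themselves.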

3.2 Viewing Co-added STIS Data#

Let’s take a look at the co-added spectra that we just created. We will create a plot of flux as a function of wavelength using matplotlib.pyplot.

With the current version of the HASP script, the fully abutted filename should be:

hst_7656_stis_gd71_sg140l-sg230l-g430l-g750l_o4a5_cspec.fits

Double check your products folder to make sure the name of your fully co-added spectra is the same as above, otherwise the subsequent cells in this notebook will not run since the pathname won’t exist. Update and run the cell below with the full co-add filename.

stis_coadd_filename = "hst_7656_stis_gd71_sg140l-sg230l-g430l-g750l_o4a5_cspec.fits"
with fits.open(f"./{stis_products_dir}/{stis_coadd_filename}") as hdul:
    # Getting the file's data
    gd71_data = hdul[1].data

    # Getting the wavelength and flux data for the abutted file
    wavelength = gd71_data["WAVELENGTH"]
    flux = gd71_data["FLUX"]

    plt.figure(figsize=(12, 6))

    # Plotting the spectra
    plt.scatter(wavelength, flux,
                # Setting the size of the data points
                s=1)

    # Formatting the plot by adding titles
    plt.title("STIS GD71, abutted")
    plt.xlabel(r'Wavelength [$\AA$]')
    plt.ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ \AA^{-1}$]')

    # Saving the figure to the ./stis_products_dir
    plt.savefig(f"{stis_products_dir}/gd71_stis.png")

    # Showing the plot below
    plt.show()

3.3 Putting it all together with COS data#

Let’s combine all of the STIS work from above and do another example, this time using COS data instead of STIS. We will use the same target, GD71, and download the data from Program 11479, which has observations of GD71 using the gratings G230L, G185M, G225M, and G285M.

# Querying the COS data in MAST and getting the product list
gd71_products = Observations.get_product_list(
    Observations.query_criteria(
        proposal_id=11479,
        target_name="GD71",
        dataproduct_type="SPECTRUM"
    )
)

# Downloading the data to the ./cos_data directory
Observations.download_products(
    gd71_products,
    download_dir=str(cos_data_dir),
    productSubGroupDescription=["X1D", "SX1"]
)

# Consolidating all of our files to a single directory
consolidate_files(cos_data_dir)

Now we run the wrapper script:

!swrapper -i ./cos_data -o ./cos_products

Similar to our STIS example, ensure that the filename is updated to the current HASP version by running and/or updating the next cell, then plot the newly abutted COS spectra:

cos_coadd_filename = "hst_11479_cos_gd71_cg230l-g185m-g225m-g285m_laad_cspec.fits"
with fits.open(f"./{cos_products_dir}/{cos_coadd_filename}") as hdul:
    # Getting the file's data
    gd71_data = hdul[1].data

    # Getting the wavelength and flux data for the abutted file
    wavelength = gd71_data["WAVELENGTH"]
    flux = gd71_data["FLUX"]

    plt.figure(figsize=(12, 6))

    # Plotting the spectra
    plt.scatter(wavelength, flux,
                s=1)

    # Formatting the plot
    plt.title("COS GD71, abutted")
    plt.xlabel(r'Wavelength [$\AA$]')
    plt.ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ \AA^{-1}$]')

    # Saving the figure to ./cos_products
    plt.savefig(f"{cos_products_dir}/gd71_cos.png")

    # Showing the figure below
    plt.show()

We have now created and plotted the abutted COS spectra. Just for fun, let’s compare the abutted STIS and COS products!

# Getting the COS and STIS data
stis_hdul = fits.open(f"{stis_products_dir}/{stis_coadd_filename}")
stis_data = stis_hdul[1].data

cos_hdul = fits.open(f"{cos_products_dir}/{cos_coadd_filename}")
cos_data = cos_hdul[1].data

# Getting the flux and wavelengths for both files
stis_wavelength = stis_data["WAVELENGTH"]
stis_flux = stis_data["FLUX"]

cos_wavelength = cos_data["WAVELENGTH"]
cos_flux = cos_data["FLUX"]

plt.figure(figsize=(12, 6))

# Plotting datapoints for STIS
plt.scatter(stis_wavelength, stis_flux,
            # Setting the size of the datapoints
            s=1,
            # Setting the color for the STIS datapoints
            color="red",
            # Adding a label for the legend
            label="STIS")

# Plotting the datapoints for COS
plt.scatter(cos_wavelength, cos_flux,
            s=1,
            color="blue",
            label="COS")

# Formatting the plot by adding labels
plt.title("STIS & COS GD71, abutted")
plt.xlabel(r'Wavelength [$\AA$]')
plt.ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ \AA^{-1}$]')

# Adding a legend to the plot
plt.legend()

# Saving the figure to our general directory
plt.savefig("./gd71_cos_stis.png")

# Showing the plot below
plt.show()

stis_hdul.close()
cos_hdul.close()

4. Downloading HASP Data Products#

As of January 2024, the HASP data products are available for download via astroquery. The products will also be available for query and download on the official HASP web search form (still a work in progress at the time of this notebook’s publishing).

Users can download coadded data themselves; we will show an example of this with a STIS coadd.

4.1 Downloading Data Products using astroquery#

We can use astroquery.mast’s Observations module to download data products, similar to how we downloaded our STIS and COS data in Section 1.1 and Section 3.3, respectively. Let’s download GD71 coadds for the STIS example that we created in the previous section.

stis_query = Observations.query_criteria(
    proposal_id=7656,
    target_name="GD71",
    dataproduct_type="SPECTRUM",
)

display(stis_query)

We can see that our query returned a total of 11 datasets. Our HASP coadd is at the bottom of the table, with the value of project being HASP instead of HST. We will add this criterion to our query to isolate our product.

stis_query = Observations.query_criteria(
    proposal_id=7656,
    target_name="GD71",
    dataproduct_type="SPECTRUM",
    project="HASP"
)

display(stis_query)

Now, let’s see our product list:

stis_prodlist = Observations.get_product_list(
    stis_query
)

# Print number of product files that can be downloaded
print(f"Number of files: {len(stis_prodlist)}\n")

# Printing the entire product list
stis_prodlist.pprint_include_names = ("productFilename", "obs_id", "description", "productType", "productSubGroupDescription", "proposal_id")

display(stis_prodlist)

We have 68 files that we can download for our dataset! There are many different file types: auxiliary files such as support (SPT), jitter (JIT), and wavecal exposure (WAV) files, previews of the spectra in .PNG format, and our science files, to name a few. Note that files whose productSubGroupDescription is not CSPEC are used by CALSTIS during data calibration, meaning they are not produced by the HASP coadd script. The science files consist of the SX1 files, which we downloaded when running the coadd script, and the coadds themselves (both intermediate and final). Let’s create directories to store the files, then download just the Minimum Recommended Products by setting mrp_only = True.

Note that the minimum recommended product filetypes differ depending on whether you are downloading a HASP dataset or a non-HASP dataset; e.g., a HASP download will not include the SX1 files, but a science dataset download would.
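If you only want the coadd products themselves, you can narrow the product list before downloading. astroquery also provides Observations.filter_products for this; below is a minimal stand-alone sketch of the same idea using plain dictionaries (the rows here are made-up stand-ins; in the notebook you would filter stis_prodlist):

```python
# Stand-in product rows; in the notebook, iterate over stis_prodlist instead
products = [
    {"productSubGroupDescription": "CSPEC",
     "productFilename": "hst_7656_stis_gd71_g140l_o4a520_cspec.fits"},
    {"productSubGroupDescription": "SX1",
     "productFilename": "example_sx1.fits"},
    {"productSubGroupDescription": "JIT",
     "productFilename": "example_jit.fits"},
]

# Keep only the HASP coadd (CSPEC) science products
cspec_products = [
    row for row in products
    if row["productSubGroupDescription"] == "CSPEC"
]

print(f"CSPEC files to download: {len(cspec_products)}")
```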

# To store our astroquery example STIS and COS data
stis_astroquery = Path("./stis_astroquery/")
stis_astroquery.mkdir(exist_ok=True)
# Downloading the products to our new directory
Observations.download_products(
    stis_prodlist,
    download_dir=str(stis_astroquery),
    mrp_only=True
)
# Putting all files to the stis_astroquery directory, rather than MAST's nested directories
consolidate_files(stis_astroquery)

We’ve downloaded the HASP products, so let’s plot them now:

%matplotlib widget

# The name of the full coadd (same filename that we have in Section 3.2)
stis_coadd_filename = "hst_7656_stis_gd71_sg140l-sg230l-g430l-g750l_o4a5_cspec.fits"
stis_coadd_path = f"./{stis_astroquery}/{stis_coadd_filename}"

full_coadd_hdul = fits.open(stis_coadd_path)

# List of coadd files for a single grating (ignoring the full coadd above)
grat_coadds = [f for f in glob.glob(f"./{stis_astroquery}/*_o4a5_*") if f != stis_coadd_path]

# Starting to plot
plt.figure(figsize=(12, 6))

# Plotting the coadd for each grating
for file in grat_coadds:
    file_hdul = fits.open(file)
    data = file_hdul[1].data
    wl = data["WAVELENGTH"].flatten()
    flux = data["FLUX"].flatten()

    plt.plot(wl, flux,
             lw=2,
             label=file.split("/")[-1])
    
    file_hdul.close()
    
# Plotting full coadd over the plot
data = full_coadd_hdul[1].data

wl = data["WAVELENGTH"].flatten()
flux = data["FLUX"].flatten()

plt.plot(wl, flux,
         lw=1,
         color="black",
         alpha=0.6,
         label="Full Coadd")

# Formatting the plot
plt.title("STIS GD71 Coadd from Astroquery")
plt.xlabel(r'Wavelength [$\AA$]')
plt.ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ \AA^{-1}$]')

plt.legend()

# Setting ylimit to better show spectral features
plt.ylim(0, 3e-12)

plt.show()

full_coadd_hdul.close()
# Closing the plot to save memory
plt.close()

Click the rectangular box on the left-hand side of the plot to zoom into specific portions of the plot – you will be able to see each grating’s coadd against the final full coadd. Cool!

5. Changing the Threshold Flag#

The flux threshold flag is an optional flag that changes the number of files that are co-added based on the flux of each spectrum. When determining which files to abut, the script first creates a general co-add using all of the input files, then iterates through each file to calculate its scaled median deviation. The script rejects a file if:

\[ \left\langle \frac{F_{x1d}(i) - F_{coadd}(i)}{\sigma_{x1d}(i)} \right\rangle < \frac{C_{thresh}}{\sqrt{N_{pix}}} \]

In the equation, \(F_{x1d}(i) - F_{coadd}(i)\) is the deviation of the input spectrum from the co-add, \(\sigma_{x1d}(i)\) is the uncertainty per wavelength bin of the input spectrum, and \(N_{pix}\) is the number of wavelength bins for a given mode. Essentially, the scaled median deviation quantifies how far a spectrum departs from the co-add in a way that is less sensitive to outliers and size variations. The script rejects all files containing segments whose scaled median deviation is less than (i.e. more negative than) the threshold (the default is -50). For example, raising the threshold to -25 will abut fewer files than the default threshold of -50. Likewise, lowering the threshold to -100 will include more files in the dataset to be co-added.

An important thing to note about changing the threshold is that including more files increases the SNR of your co-added spectra but decreases the flux accuracy. The threshold value may be altered if a user’s science case is not dependent on the accuracy of a dataset’s absolute flux. For example, if you have a large dataset of a bright target (i.e. a standard star), then you can lower the threshold to include more files in the abutment since the spread of data quality between each file is minimal. Conversely, if you have a dataset of a dim target, or a dataset with many poor observations, then you will want to raise the threshold to only include the best data out of the dataset.

More information about the flux checking algorithm can be found in Section 2.3.2 of the ISR COS 2024-01.
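To make the rejection rule concrete, here is a toy numerical sketch of the criterion above (synthetic arrays and a simplified check, not the actual HASP internals):

```python
import numpy as np


def passes_threshold(flux_x1d, flux_coadd, sigma_x1d, thresh=-50.0):
    """Toy version of the HASP flux check: keep the file if the scaled
    median deviation is not more negative than thresh / sqrt(N_pix)."""
    n_pix = len(flux_x1d)
    scaled_median_dev = np.median((flux_x1d - flux_coadd) / sigma_x1d)
    return scaled_median_dev >= thresh / np.sqrt(n_pix)


rng = np.random.default_rng(7)
coadd = np.ones(1024)                  # a flat reference co-add
sigma = np.full(1024, 0.1)             # per-bin uncertainty

good = coadd + rng.normal(0, 0.1, 1024)        # consistent with the co-add
low = coadd - 0.5 + rng.normal(0, 0.1, 1024)   # systematically low flux

print(passes_threshold(good, coadd, sigma))    # kept
print(passes_threshold(low, coadd, sigma))     # rejected
```

With 1024 bins the default cutoff is \(-50/\sqrt{1024} \approx -1.56\); the consistent spectrum sits near zero deviation and is kept, while the systematically low spectrum sits near -5 and is rejected.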

5.1 Using Astroquery to Download Additional Data#

We will first download an additional dataset to illustrate how the co-add code changes the abutment when the threshold flag is changed. We will download data for the spectroscopic binary BD+17D4708 from Program 9631. We will run the script with three different threshold values: t = -1, -2, -50.

First we will query and download the data using Observations from astroquery.mast:

# Querying our data from MAST
bd_query = Observations.query_criteria(
    proposal_id=9631,
    target_name="BD+17D4708",
    dataproduct_type="SPECTRUM",
    provenance_name=["CALCOS", "CALSTIS"]
)

print(f"The query returned {len(bd_query)} datasets, printed below:")
print(bd_query)

Let’s create a new directory and download our data.

# Creating the data download directories
bd_data_dir = Path("./bd_data/")
bd_products_dir = Path("./bd_products/")

# If the directory doesn't exist, then create it
bd_data_dir.mkdir(exist_ok=True)
bd_products_dir.mkdir(exist_ok=True)

Now we get the product list and download our data to our directory ./bd_data.

# Getting the product list from our query
bd_prod = Observations.get_product_list(
    bd_query
)

# Downloading the products, but only the necessary X1D and SX1 files
Observations.download_products(
    bd_prod,
    download_dir=str(bd_data_dir),
    productSubGroupDescription=["X1D", "SX1"]
)

5.2 Running the Co-add Script with Different Threshold Values#

Let’s first organize our data so that all data files are in a single directory instead of separate observation set directories.

consolidate_files(bd_data_dir)

Now that we’ve downloaded our data and organized it to be properly run by the script, we can run the code using different threshold values. Below is a function that runs the script using inputted threshold values, which will allow us to run the code in a single cell rather than multiple.

def run_script(indir, outdir, thresh):
    '''
    Run the hasp wrapper script with different thresholds.
    -------------------
    Input:
    str indir : Directory that holds all un-abutted data.
    str outdir : Directory that will store output folders and files.
    list thresh : List of threshold values used for script.
    --------------------
    Output:
    Wrapper script is run using different threshold values.
    A folder will be created for each value used, containing abutted products.
    '''
    # Looping through thresh vals to run through script
    for val in thresh:
        # Creating a folder for each thresh val
        output = os.path.join(outdir, "out" + str(val))
        if not os.path.exists(output):
            os.mkdir(output)

        print(f"Running script where threshold = {str(val)}")

        # Specifying the arguments for the script
        arguments = ["-i", indir, "-o", output, "-t", val]

        # Creating a list composed of the script commands and arguments
        command = ["python", "-m", "hasp.wrapper"] + arguments

        # Running the script with the arguments above
        subprocess.run(command, check=True)

Let’s create a variable called thresholds that will consist of the different threshold values that we will use to help us visualize the relationship between number of files co-added and the final co-added product. We will run the script using each of these values.

Note: feel free to run the cells with more threshold values!

# Remember that the default value is t = -50
thresholds = ["-1", "-2", "-50"]

Let’s run the script on our data now using the run_script function, using the thresholds defined above.

run_script(bd_data_dir, bd_products_dir, thresholds)

5.3 Analyzing the Co-added Spectra of Different Threshold Values#

Now that we’ve run the script on our dataset using different threshold values, we can start to analyze the differences between them.

The run_script function created one folder of abutted products for each threshold value (e.g., ./out-50). Each folder contains the fully co-added spectrum. At the time of this notebook, the fully co-added filename is:

hst_9631_stis_bdp17d4708_g230lb-g430l-g750l_o8h1_cspec.fits

Before you continue, make sure that the fully co-added filename is valid, as the filenaming structure may have changed since this notebook was published. Update and run the cell below.

bd_coadd_name = "hst_9631_stis_bdp17d4708_g230lb-g430l-g750l_o8h1_cspec.fits"

We can see how many files were used in the co-add by running the next cell.

# Iterating through the threshold values
for thresh in thresholds:
    # Path to the co-added file
    path_to_coadd = f"{bd_products_dir}/out{thresh}/{bd_coadd_name}"
    coadd_hdul = fits.open(path_to_coadd)

    # Opening the file to check num files, and printing
    numfiles = len(coadd_hdul[2].data["FILENAME"])

    # Printing the number of files for the given threshold value
    print(f"The number of files co-added for t = {thresh} is {numfiles} files")

    coadd_hdul.close()

As we can see, raising the threshold from t = -50 to t = -1 reduced the number of files used from 12 to only 6. Since the lower threshold (t = -50) uses more data, we should see an increase in SNR.
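As a rough rule of thumb (assuming exposures of comparable depth and uncorrelated noise, which is an idealization), SNR grows roughly as the square root of the number of co-added exposures, so halving the file count costs about a factor of √2 ≈ 1.41:

```python
import numpy as np

# Files co-added at the default (t = -50) vs. strict (t = -1) thresholds above
n_default, n_strict = 12, 6

# Idealized SNR scaling: proportional to sqrt(number of co-added exposures)
expected_ratio = np.sqrt(n_default / n_strict)
print(f"Expected SNR ratio (t=-50 vs t=-1): ~{expected_ratio:.2f}x")
```

The actual gain varies across wavelength, since the rejected files do not contribute equally to every grating, which is exactly what the plots below let us inspect.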

Looking at SNR:#

We’ll plot the SNR of both coadds below.

%matplotlib inline
# A list of the non-default thresholds used.
nondefault_thresholds = [thresh for thresh in thresholds if thresh != "-50"]

# List of colors to be used when plotting
colors = ['blue',
          'red']

# Creating our figure: 1 column and len(thresholds) rows
figure, ax = plt.subplots(len(nondefault_thresholds), 1,
                          figsize=(10, 9))

# Iterating through threshold values to compare their SNR to default output
for i, thresh in enumerate(nondefault_thresholds):

    # The SNR and wavelength data for the default threshold co-add output
    default_path = f"{bd_products_dir}/out-50/{bd_coadd_name}"
    default_hdul = fits.open(default_path)
    data50 = default_hdul[1].data
    wavelength50 = data50["WAVELENGTH"]
    snr50 = data50["SNR"]

    # Getting wavelength and SNR for the current iteration's threshold value
    curr_path = f"{bd_products_dir}/out{thresh}/{bd_coadd_name}"
    curr_hdul = fits.open(curr_path)
    data = curr_hdul[1].data
    wavelength = data["WAVELENGTH"]
    snr = data["SNR"]

    # This is the number of files used in the particular co-add
    numfiles = len(fits.open(curr_path)[2].data["FILENAME"])
    numfiles_default = len(fits.open(default_path)[2].data["FILENAME"])

    ax[i].scatter(wavelength50, snr50,
                  label=f"t = -50, {numfiles_default} files co-added",
                  s=1,
                  color="black")

    ax[i].scatter(wavelength, snr,
                  label=f"t = {thresh}, {numfiles} files co-added",
                  color=colors[i],
                  s=1,
                  alpha=0.4)

    ax[i].legend()

    # Highlighting the region of the plot with the biggest SNR change
    ax[i].axvspan(3050, 5700,
                  alpha=0.3,
                  color="grey")

    # Adding formatting
    ax[i].set_title(f"SNR vs Wavelength for t = {thresh}, -50",
                    weight="bold")
    ax[i].set_xlabel(r'Wavelength [$\AA$]')
    ax[i].set_ylabel("SNR")

    default_hdul.close()
    curr_hdul.close()

figure.tight_layout()

# Saving the figure to ./bd_products
figure.savefig(f"{bd_products_dir}/snr_wavelength.png")

We can clearly see above that using more files in the co-add substantially increases SNR. This is especially apparent in the grey highlighted section of the top plot, where we set t = -1. That threshold value used 6 files, whereas the default threshold value of t = -50 used 12.

Looking at Flux:#

We’re going to create two different plots to analyze our flux data: a standard flux vs. wavelength plot and a differential plot. We will also calculate the percent change in flux between each threshold value and the default.

A differential plot shows the absolute differences between two datasets; in our case, we will be seeing the flux difference at each wavelength between one threshold value’s co-add and the default co-add. However, we will first need to interpolate the spectra to a common wavelength grid. Essentially, we will be adjusting the wavelength values so the two datasets can share the same wavelength points.

# Thresholds used that aren't the default
nondefault_thresholds = [thresh for thresh in thresholds if thresh != "-50"]

for i, thresh in enumerate(nondefault_thresholds):
    # Getting the data for the co-add that uses default threshold of t = -50
    default_path = f"{bd_products_dir}/out-50/{bd_coadd_name}"
    default_hdul = fits.open(default_path)
    thresh_data50 = default_hdul[1].data

    wavelength50 = thresh_data50["WAVELENGTH"]
    flux50 = thresh_data50["FLUX"]

    # Getting data for the co-add that uses the current iteration's threshold
    curr_path = f"{bd_products_dir}/out{thresh}/{bd_coadd_name}"
    curr_hdul = fits.open(curr_path)
    thresh_data_curr = curr_hdul[1].data

    wavelength_curr = thresh_data_curr["WAVELENGTH"]
    flux_curr = thresh_data_curr["FLUX"]

    # Getting the number of files used in each co-add from the provenance
    # extension (these counts go in the plot legends); reuse the HDULs we
    # already opened rather than opening the files again
    numfiles = len(curr_hdul[2].data["FILENAME"])
    numfiles_default = len(default_hdul[2].data["FILENAME"])

    # Minimum wavelength value in dataset
    minwave = min(wavelength50[0].min(), wavelength_curr[0].min())

    # Maximum wavelength value in dataset
    maxwave = max(wavelength50[0].max(), wavelength_curr[0].max())

    # Creating a common wavelength grid with 1 Angstrom spacing
    common_wavelength = np.arange(start=minwave,
                                  stop=maxwave,
                                  step=1)

    # Interpolating the default + current threshold co-add onto new grid
    interp_flux50 = interp1d(wavelength50[0],
                             flux50,
                             kind='linear',
                             fill_value="extrapolate")(common_wavelength)
    
    interp_flux_curr = interp1d(wavelength_curr[0],
                                flux_curr,
                                kind='linear',
                                fill_value="extrapolate")(common_wavelength)

    # Creating two subplots, one flux vs wavelength and one differential plot
    fig, [ax0, ax1] = plt.subplots(2, 1,
                                   figsize=(10, 9))

    # Plotting the top plot, a.k.a. the flux vs wavelength for t = -50
    ax0.scatter(common_wavelength, interp_flux50,
                label=f"t = -50, {numfiles_default} files",
                color="black",
                s=1)

    # Plotting flux vs wavelength on same plot for current co-add
    ax0.scatter(common_wavelength, interp_flux_curr,
                label=f"t = {thresh}, {numfiles} files",
                color=colors[i],
                s=1,
                alpha=0.3)

    # Calculating difference between current co-add and default flux values
    flux_diff = interp_flux_curr - interp_flux50

    # Plotting the differential plot
    ax1.scatter(common_wavelength, flux_diff,
                color="black",
                s=1)

    # Calculating the percent difference and putting it on the plot
    percent_difference = abs(interp_flux50 - interp_flux_curr) / ((interp_flux50 + interp_flux_curr) / 2) * 100

    # Calculating the mean percent difference (ignoring NaNs)
    percent_difference = round(np.nanmean(percent_difference), 2)

    # Adding a text box that displays the mean percent difference
    ax0.text(0.02, 0.9,
             f"Average percent difference: {str(percent_difference)}%",
             transform=ax0.transAxes,
             weight="bold",
             bbox=dict(facecolor=colors[i],
                       alpha=0.3))
    
    # Setting the y-axis limits; the differential plot is centered on zero
    # so that negative flux differences remain visible
    ax0.set_ylim(0, 1e-12)
    ax1.set_ylim(-1e-12, 1e-12)

    # Adding formatting
    ax0.set_title(f"Flux vs Wavelength for t = {thresh}, -50",
                  weight="bold")

    ax0.set_xlabel(r'Wavelength [$\AA$]')
    ax0.set_ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ \AA^{-1}$]')

    # Putting the legend on the flux plot
    ax0.legend()

    ax1.set_title(f"Absolute Difference for t = {thresh}, -50",
                  weight="bold")
    
    ax1.set_xlabel(r'Wavelength [$\AA$]')
    ax1.set_ylabel(r'Flux Difference [$erg\ s^{-1}\ cm^{-2}\ \AA^{-1}$]')

    fig.tight_layout()

    # Saving one figure per threshold so earlier iterations aren't overwritten
    fig.savefig(f"{bd_products_dir}/flux_differential{thresh}.png")

    fig.show()

    default_hdul.close()
    curr_hdul.close()

Congrats on completing the notebook!#

There are more tutorial notebooks for custom co-addition cases in this repo; check them out!#

About this Notebook#

Author: Sierra Gomez (sigomez@stsci.edu)

Updated on: 01/29/2024

This tutorial was created in compliance with the STScI style guides, citing the Jupyter guide in particular.

Citations#

If you use the following packages for published research, please cite the authors. Follow these links for more information about citations:
