Run spec2 pipeline#

The dispersed images for WFSS will be run through the spec2 pipeline after the image2 and image3 pipelines have been run on the corresponding direct images. As mentioned in the previous notebook, it is extremely helpful for this step if you have a source catalog that only includes the sources that you would like to look at. Any additional sources will take extra time to calibrate.

Use case: After creating a custom source catalog, spec2 should be run on the dispersed WFSS images.
Data: JWST/NIRISS images and spectra from program 2079 observation 004. These should be stored in a single directory, data, and can be downloaded using the notebook 00_niriss_mast_query_data_setup.ipynb.
Tools: astropy, crds, glob, jdaviz, json, jwst, matplotlib, numpy, os, pandas, shutil, urllib, zipfile
Cross-instrument: NIRISS

Contents

  • Imports & Data Setup

  • Custom Spec2 Pipeline Run

    • Spec2 Association Files

    • Run Spec2

    • Examine the Outputs of Spec2

  • Explore Data Further

    • Find a Source

    • Limit the Number of Extracted Sources

    • Final Visualization

Authors: Rachel Plesha (rplesha@stsci.edu), Camilla Pacifici (cpacifici@stsci.edu), JWebbinar notebooks.
First Published: May 2024
Last edited: August 2024
Last tested: This notebook was last tested with JWST pipeline version 1.12.5 and the CRDS context jwst_1225.pmap.

Imports & Data Setup#

CRDS Documentation

# Update the CRDS path to your local directory
%env CRDS_PATH=crds_cache
%env CRDS_SERVER_URL=https://jwst-crds.stsci.edu
import glob
import json
import os
import shutil
import urllib
import zipfile
import numpy as np
import pandas as pd

from astropy.table import Table
from astropy.io import fits
from astropy import constants as const

from matplotlib import pyplot as plt
from matplotlib import patches
%matplotlib widget
# %matplotlib inline

from jwst.pipeline import Spec2Pipeline

Check which version of the JWST pipeline you are using. To see the latest available version of the pipeline, or to install a previous version, check GitHub. Also verify which CRDS version you are using; the CRDS documentation explains how to set a specific context to use in the JWST pipeline. If either of these values differs from the "Last tested" note above, there may be differences in this notebook's results.

import jwst
import crds
print('JWST Pipeline Version:', jwst.__version__)
print('CRDS Context:', crds.get_context_name('jwst'))
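By default, CRDS uses the latest operational context. If you want to reproduce the environment this notebook was last tested with, you can pin the context through the CRDS_CONTEXT environment variable; a minimal sketch, left commented out since most users will want the default:

# optional: pin the CRDS context to the version noted above instead of
# the latest operational context (the default)
# %env CRDS_CONTEXT=jwst_1225.pmap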

Data Setup#

The data directory, data_dir, should contain all of the association and rate files in a single, flat directory. custom_run_image3 should match the output directory used for the custom image3 calibration in the notebook 01_niriss_wfss_image2_image3.ipynb.

For spec2, we need the rate files that we downloaded as well as the segmentation maps and source catalogs from image3. Because of that, we will create a new directory (defined by custom_run_spec2), change into it, and copy the relevant data over.

For a regular workflow, it is likely easier to leave all files in a single directory compared to the multiple directories we make here.

Open up our data information file, which makes it easier to parse the rate files we need to copy over, since we only want the dispersed images, i.e., those observed with GR150R or GR150C.

data_dir = 'data'
if not os.path.exists(data_dir):
    os.mkdir(data_dir)
    
custom_run_spec2 = 'custom_spec2' # saving files here in this notebook
custom_run_image3 = 'custom_image3_calibrated' # results of custom image3 calibration

# if the directories don't exist yet, make them
for custom_dir in [custom_run_spec2, custom_run_image3]:
    if not os.path.exists(os.path.join(data_dir, custom_dir)):
        os.mkdir(os.path.join(data_dir, custom_dir))
# if you have not downloaded the data from notebook 00 or have not run notebook 01, run this cell. Otherwise, feel free to skip it.

# Download uncalibrated data from Box into the data directory:
boxlink = 'https://data.science.stsci.edu/redirect/JWST/jwst-data_analysis_tools/niriss_wfss_advanced/niriss_wfss_advanced_02_input.zip'
boxfile = os.path.basename(boxlink)
urllib.request.urlretrieve(boxlink, boxfile)

zf = zipfile.ZipFile(boxfile, 'r')
zf.extractall(path=data_dir)

# move the files downloaded from the box file into the top level data directory
box_download_dir = os.path.join(data_dir, boxfile.split('.zip')[0])

for filename in glob.glob(os.path.join(box_download_dir, '*')):
    if '.csv' in filename:
        # move to the current directory
        os.rename(filename, os.path.basename(filename))
    elif '_segm.fits' in filename or '_cat.ecsv' in filename:
        # move the image3 products (segmentation maps and source catalogs) to the spec2 directory
        os.rename(filename, os.path.join(data_dir, custom_run_spec2, os.path.basename(filename)))
    elif '_i2d.fits' in filename:
        # move image3 products to their directory, too
        os.rename(filename, os.path.join(data_dir, custom_run_image3, os.path.basename(filename)))
    else:
        # move to the data directory 
        os.rename(filename, os.path.join(data_dir, os.path.basename(filename)))
        
# remove unnecessary files now
os.remove(boxfile)
os.rmdir(box_download_dir)
# From the csv file we created earlier, find a list of all of the grism observations we will want to calibrate with spec2
listrate_file = 'list_ngdeep_rate.csv'
rate_df = pd.read_csv(listrate_file)
# copy all of the grism rate files
gr150r_files = list(rate_df[rate_df['FILTER'] == 'GR150R']['FILENAME'])
gr150c_files = list(rate_df[rate_df['FILTER'] == 'GR150C']['FILENAME'])

for grism_rate in gr150r_files + gr150c_files:
    if os.path.exists(grism_rate):
        shutil.copy(grism_rate, os.path.join(data_dir, custom_run_spec2, os.path.basename(grism_rate)))
    else:
        print(f'{grism_rate} does not exist. Not able to copy file.')
# copy all of the spec2 asn files
for asn in glob.glob(os.path.join(data_dir, '*spec2*asn*.json')):
    if os.path.exists(asn):
        shutil.copy(asn, os.path.join(data_dir, custom_run_spec2, os.path.basename(asn)))
    else:
        print(f'{asn} does not exist. Not able to copy file.')
# copy all of the necessary image3 output files
cats = glob.glob(os.path.join(data_dir, custom_run_image3, '*source*_cat.ecsv')) # copy both the source-match and source118 catalogs
segm = glob.glob(os.path.join(data_dir, custom_run_image3, '*_segm.fits'))

for image3_file in cats + segm:
    if os.path.exists(image3_file):
        shutil.copy(image3_file, os.path.join(data_dir, custom_run_spec2, os.path.basename(image3_file)))
    else:
        print(f'{image3_file} does not exist. Not able to copy file.')
cwd = os.getcwd() # get the current working directory 
if cwd != data_dir: # if you are not already in the location of the data, change into it
    try:
        os.chdir(data_dir)
    except FileNotFoundError:
        print(f"Not able to change into: {data_dir}.\nRemaining in: {cwd}")
        pass

if not os.path.exists(custom_run_spec2):
    os.mkdir(custom_run_spec2)

os.chdir(custom_run_spec2)

new_cwd = os.getcwd()
print('Now in:', new_cwd)
# directories for calibrating later
source_outdir = 'new_catalog_calibrated'
if not os.path.exists(source_outdir):
    os.mkdir(source_outdir)


param_outdir = 'parameter_input_calibrated'
if not os.path.exists(param_outdir):
    os.mkdir(param_outdir)

Custom Spec2 Pipeline Run#

Because this is a custom spec2 pipeline run, the first step is to make sure the correct source catalog is being used. The catalog defines the spectral trace cutouts, and therefore tells the pipeline where to extract each spectrum.

Spec2 Association Files#

As with the imaging part of the pipeline, there are association files for spec2. These are a bit more complex in that they need to include the science (WFSS) data, direct image, source catalog, and segmentation map as members. For the science data, the rate files are used as inputs, similarly to image2. Also like image2, there should be one asn file for each dispersed-image dither position in an observing sequence. In this case, that should match the number of rate files where FILTER=GR150R or FILTER=GR150C. For this program and observation, there are three dithers per grism, and both GR150R and GR150C are used, totaling six exposures per observing sequence, with five observing sequences in the observation using the blocking filters F115W -> F115W -> F150W -> F150W -> F200W.
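For orientation, a spec2 WFSS association file has roughly the shape sketched below, trimmed to the fields discussed here. The filenames are illustrative, and the exptype labels follow the jwst association conventions as we understand them:

{
    "asn_type": "spec2",
    "products": [
        {
            "name": "jw02079004002_11101_00001_nis",
            "members": [
                {"expname": "jw02079004002_11101_00001_nis_rate.fits", "exptype": "science"},
                {"expname": "jw02079-o004_..._i2d.fits", "exptype": "direct_image"},
                {"expname": "jw02079-o004_..._cat.ecsv", "exptype": "sourcecat"},
                {"expname": "jw02079-o004_..._segm.fits", "exptype": "segmap"}
            ]
        }
    ]
}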

Looking in a Spec2 Association File#

spec2_asns = np.sort(glob.glob('*spec2*asn*.json'))
print(len(spec2_asns), 'Spec2 ASN files')
# number of asn files should match the number of dispersed images -- 30 for obs 004
print(len(rate_df[(rate_df['FILTER'] == 'GR150R') | (rate_df['FILTER'] == 'GR150C')]), 'Dispersed image rate files')
# look at one of the association files
asn_data = json.load(open(spec2_asns[0]))
for key, data in asn_data.items():
    print(f"{key} : {data}")
# in particular, take a closer look at the product filenames within the association file:
for product in asn_data['products']:
    for key, value in product.items():
        if key == 'members':
            print(f"{key}:")
            for member in value:
                print(f"    {member['expname']} : {member['exptype']}")
        else:
            print(f"{key}: {value}")

Modify the Association File to use Custom Source Catalog#

What is currently in the association files points to the pipeline's default source catalog. In the previous notebook, we created a custom source catalog (with the suffix source-match_cat.ecsv) that we want to use instead, so we need to point the association files to that catalog.

new_source_ext = 'source-match_cat.ecsv'

# loop through all of the spec2 association files in the current directory
for asn in spec2_asns:
    asn_data = json.load(open(asn))
    for product in asn_data['products']: # for every product, check the members        
        for member in product['members']: # there are multiple members per product
            if member['exptype'] == 'sourcecat':
                cat_in_asn = member['expname']
                # check that we haven't already updated the source catalog name
                if new_source_ext not in cat_in_asn:
                    new_cat = cat_in_asn.replace('cat.ecsv', new_source_ext)
                    # actually update the association file member
                    if os.path.exists(new_cat):
                        member['expname'] = new_cat
                    else:
                        print(f"{new_cat} does not exist in the currenty working directory")

    # write out the new association file
    with open(asn, 'w', encoding='utf-8') as f:
        json.dump(asn_data, f, ensure_ascii=False, indent=4)
# double check that things still look ok:
asn_check = json.load(open(spec2_asns[0]))
for product in asn_check['products']:
    for key, value in product.items():
        if key == 'members':
            print(f"{key}:")
            for member in value:
                print(f"    {member['expname']} : {member['exptype']}")
        else:
            print(f"{key}: {value}")

Alternatively, open the .json files in your favorite text editor and manually edit each of the catalog names.

Run spec2#

Like the image pipeline, we can supply custom parameters for steps in the pipeline.

More information about everything that is run during the spec2 stage of the pipeline can be found here.

To start, we are only going to run spec2 on one association file because spec2 can take a while to run if there are many sources. We are saving the outputs to the directory we are currently in, which should be the custom_spec2 directory made above.

test_asn = spec2_asns[0]
print(f'Calibrating: {test_asn}')
# check if the calibrated file already exists
asn_data = json.load(open(test_asn))
x1d_file = f"{asn_data['products'][0]['name']}_x1d.fits"

if os.path.exists(x1d_file):
    print(x1d_file, ': x1d file already exists.')
else:
    spec2 = Spec2Pipeline.call(test_asn, save_results=True)

Examining the Outputs of Spec2#

The outputs of spec2 are cal.fits and x1d.fits files. Here we do a quick look into some important parts of these files.

asn_example = json.load(open(spec2_asns[0]))
rate_file = asn_example['products'][0]['members'][0]['expname']
source_cat = asn_example['products'][0]['members'][2]['expname']
cal_file = rate_file.replace('rate', 'cal')
x1d_file = rate_file.replace('rate', 'x1d')
# open all of the files to look at
rate_hdu = fits.open(rate_file)
cal_hdu = fits.open(cal_file)
x1d_hdu = fits.open(x1d_file)
cat = Table.read(source_cat)

# first look at how many sources we expect from the catalog
print(f'There are {len(cat)} sources identified in the current catalog.\n')

# then look at how long the cal and x1d files are for comparison
print(f'The x1d file has {len(x1d_hdu)} extensions & the cal file has {len(cal_hdu)} extensions')

Note that the 0th and final extension in each file do not contain science data, but the remaining extensions correspond to each source. The x1d file contains the extracted spectrum for each source, while the cal file contains the 2D cutout information in seven extensions for each source (SCI, DQ, ERR, WAVELENGTH, VAR_POISSON, VAR_RNOISE, VAR_FLAT).

This in part is why it is so important to have a refined source catalog. If there are sources that are not useful for your research, there is no reason to create a cutout and extract them.

Notice that there are more sources in the catalog than there are extracted sources in the files. This is because the pipeline defaults to extracting only the 100 brightest sources. To change this behavior, supply the pipeline with the parameter wfss_nbright, which we do below.
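As a quick sanity check of that bookkeeping, the source counts implied by the extension counts of the two files should agree; a small sketch using the HDU lists opened above:

# infer the number of extracted sources from the extension counts;
# both files have a primary (0th) extension and a final non-science extension
n_x1d_sources = len(x1d_hdu) - 2
n_cal_sources = (len(cal_hdu) - 2) // 7  # 7 extensions (SCI, DQ, ERR, ...) per source
print(f'{n_x1d_sources} sources in the x1d file; {n_cal_sources} sources in the cal file')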

x1d_hdu.info()
cal_hdu.info()

The x1d file is a BinTable, so there are additional columns contained inside each of the data extensions. The x1d further reading page describes what is contained in each column, but we can also take a quick look at the columns of one of the data extensions directly.

# a quick look at the columns also available in the x1d file
print(x1d_hdu[1].data.columns)

Explore Spec2 Data Further#

Find a Source in the Spec2 Data#

Each extension of the cal and x1d files has a source ID in the header. These values should match with the values in the source catalog.

def find_source_ext(x1d_hdu, cal_hdu, source_id, info=True):
    # x1d extension first
    x1d_source_ids = np.array([x1d_hdu[ext].header['SOURCEID'] for ext in range(len(x1d_hdu))[1:-1]]) # cut off the 0th and last data extensions
    wh_x1d = np.where(x1d_source_ids == source_id)[0][0] + 1 # need to add 1 for the primary header
    
    # look for cal extension, too, but only in the SCI extension; 
    # fill in with a source ID of -999 for all other extensions to get the right extension value
    cal_source_ids = np.array([cal_hdu[ext].header['SOURCEID'] if cal_hdu[ext].header['EXTNAME'] == 'SCI'
                               else -999 for ext in range(len(cal_hdu))[1:-1]]) 
    wh_cal = np.where(cal_source_ids == source_id)[0][0] + 1 # need to add 1 for the primary header

    if info:
        print(f"All source IDs in x1d file:\n{x1d_source_ids}\n")
        print(f"Extension {wh_x1d} in {x1d_hdu[0].header['FILENAME']} contains the data for source {source_id} from our catalog")
        print(f"Extension {wh_cal} in {cal_hdu[0].header['FILENAME']} contains the data for source {source_id} from our catalog")

    return wh_x1d, wh_cal
# Let's look for source 118 that we identified in the previous notebook.
source_id = 118
wh_x1d_118, wh_cal_118 = find_source_ext(x1d_hdu, cal_hdu, source_id)

x1d_data_118 = x1d_hdu[wh_x1d_118].data 
cal_data_118 = cal_hdu[wh_cal_118].data

First, let’s look at the extraction box as it appears on the rate image. This will help us get a global feel for how much we can trust the pipeline’s extraction.

Note: In this example, the extraction box isn’t fully centered around the spectrum. There is an ongoing calibration effort to better account for differences in the spectral trace shape across the NIRISS detector. Updates about the status of this calibration can be found on the NIRISS JDox.

# fill in the nan values from the bad pixels with zero to make it easier to look at
rate_data = rate_hdu[1].data
rate_data[np.isnan(rate_data)] = 0

# extraction box parameters from the header of the cal data:
cal_header = cal_hdu[wh_cal_118].header
sx_left = cal_header['SLTSTRT1']
swidth = cal_header['SLTSIZE1']
sx_right = cal_header['SLTSTRT1'] + swidth
sy_bottom = cal_header['SLTSTRT2']
sheight = cal_header['SLTSIZE2']
sy_top = cal_header['SLTSTRT2'] + sheight

# plot the rate file and the extraction box
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.imshow(rate_data, origin='lower', vmin=0.2, vmax=1, aspect='auto') # the scaling may need some adjustment
ax2.imshow(rate_data, origin='lower', vmin=0.2, vmax=1, aspect='auto') # the scaling may need some adjustment

rectangle = patches.Rectangle((sx_left, sy_bottom), swidth, sheight, edgecolor='darkorange', facecolor="None", linewidth=1)
ax1.add_patch(rectangle)
ax1.set_title(rate_file)

rectangle2 = patches.Rectangle((sx_left, sy_bottom), swidth, sheight, edgecolor='darkorange', facecolor="None", linewidth=2)
ax2.add_patch(rectangle2)
ax2.set_xlim(sx_left-30, sx_right+30)
ax2.set_ylim(sy_bottom-30, sy_top+30)
ax2.set_title(f'Source {source_id} Zoom in')

plt.suptitle(f"{cal_hdu[0].header['FILTER']} {cal_hdu[0].header['PUPIL']}")

We can then take a look at the extracted spectrum in this box both in the cal file and the x1d file. In the extracted spectrum below you can see the [OII] and H\(\beta\) emission lines from the galaxy.

Note: The upturned edge effects seen in the 1-D spectrum are due to interpolation at the edges of the extraction box for the current flux calibration. This is also part of an ongoing calibration effort.

Additional note: The default flux units from the pipeline are Jansky (Jy). However, in these extracted spectra we show units of erg/s/cm^2/Angstrom. To turn this conversion off, set convert=False in plot_cutout_and_spectrum.

def Fnu_to_Flam(wave_micron, flux_jansky):
    # convert Jansky flux units to erg/s/cm^2/Angstrom with an input wavelength in microns
    f_lambda = 1E-21 * flux_jansky * (const.c.value) / (wave_micron**2) # erg/s/cm^2/Angstrom
    return f_lambda
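If you want to double-check this conversion, astropy's unit machinery performs the same transformation; a quick cross-check at an arbitrary wavelength of 2 microns (not used elsewhere in this notebook):

from astropy import units as u

# the two printed values should agree
f_lam = (1 * u.Jy).to(u.erg / u.s / u.cm**2 / u.AA,
                      equivalencies=u.spectral_density(2 * u.micron))
print(f_lam.value, 'vs', Fnu_to_Flam(2.0, 1.0))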
def plot_cutout_and_spectrum(cal_data, x1d_data, cal_file, x1d_file, source_id, convert=True):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 5))
    
    # plot the cal image
    ax1.imshow(cal_data, origin='lower', vmin=0, vmax=np.nanmax(cal_data)*.01, aspect='auto')
    ax1.set_title(os.path.basename(cal_file))
    
    # plot the spectrum
    wave = x1d_data['WAVELENGTH'].ravel()
    flux = x1d_data['FLUX'].ravel()

    if convert:
        flux = Fnu_to_Flam(wave, flux)
        fluxunit = 'erg/s/cm^2/Angstrom'
    else:
        fluxunit = 'Jy'

    ax2.plot(wave, flux)
    ax2.set_title(os.path.basename(x1d_file))

    edge_buffer = int(len(flux) * .25)
    max_flux = np.nanmax(flux[edge_buffer:edge_buffer*-1])
    ax2.set_ylim(0,  max_flux+(max_flux*0.1)) # cutting the flux of the edges & adding 10% buffer to the limits
    ax2.set_xlabel('Wavelength (Microns)')
    ax2.set_ylabel(f'Flux ({fluxunit})')

    if fits.getval(cal_file, 'FILTER') == 'GR150C':
        ax1.set_xlabel('X Pixels (<--- dispersion direction)')
        ax1.set_ylabel('Y Pixels')
    else:
        ax1.set_xlabel('X Pixels')
        ax1.set_ylabel('Y Pixels (<--- dispersion direction)')        
    
    plt.suptitle(f"{fits.getval(cal_file, 'FILTER')} {fits.getval(cal_file, 'PUPIL')} Source {source_id}", x=0.5, y=1)
plot_cutout_and_spectrum(cal_data_118, x1d_data_118, cal_file, x1d_file, source_id)

Limit the Number of Extracted Sources#

Because it takes so long to extract so many sources, let’s see if we can pare down the number of sources being extracted. We’ll do this with parameter inputs (for bright point sources) and with a further refined source catalog.

Limit Extraction Using Parameter Inputs#

For this calibration, we explicitly set parameters for the following step:

  • extract_2d, where we set wfss_nbright, which limits the number of bright sources that are extracted, and wfss_mmag_extract, which sets a limit on the faintest magnitude we want extracted. Further Reading

In this case, we’ll limit the extractions to only the 10 brightest objects.

# check if the calibrated file already exists
asn_data = json.load(open(spec2_asns[0]))
x1d_file = os.path.join(param_outdir, f"{asn_data['products'][0]['name']}_x1d.fits")

if os.path.exists(x1d_file):
    print(x1d_file, ': x1d file already exists.')
else:
    spec2 = Spec2Pipeline.call(spec2_asns[0],
                               steps={'extract_2d': {'wfss_nbright': 10}, },
                               save_results=True,
                               output_dir=param_outdir)
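The bullet above also mentions wfss_mmag_extract; a sketch combining both parameters is below, left commented out since re-running it would recalibrate into param_outdir. The magnitude limit of 22 is purely an example value:

# illustrative only: extract at most the 10 brightest sources AND skip
# anything fainter than magnitude 22
# spec2 = Spec2Pipeline.call(spec2_asns[0],
#                            steps={'extract_2d': {'wfss_nbright': 10,
#                                                  'wfss_mmag_extract': 22.0}},
#                            save_results=True,
#                            output_dir=param_outdir)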
# open the x1d to examine how many sources were extracted
x1ds = glob.glob(os.path.join(param_outdir, '*x1d.fits*'))
with fits.open(x1ds[0]) as temp_x1d:
    print(f'The x1d file has {len(temp_x1d)-2} extracted sources')

Limit Extraction Using the Source Catalog#

In the last notebook we limited the catalog a couple of different ways. Here, let’s limit the catalog to a specific magnitude range, and then use that new catalog to extract the data.

source_match_cats = np.sort(glob.glob('*source-match_cat.ecsv'))
source_match_cat = Table.read(source_cat)

# look at the range of magnitudes in the catalog
mags = source_match_cat['isophotal_vegamag']
min_vegmag = mags.min()
max_vegmag = mags.max()
print(f"Magnitude range: {min_vegmag} -  {max_vegmag}")

# source 118 should have a Vega mag of ~21.18
source_id = 118
source_mag = source_match_cat[source_match_cat['label'] == source_id]['isophotal_vegamag'][0]
print(f"Magnitude for source in previous notebook (source {source_id}) : {source_mag}")
# limit the catalog to sources within a specific magnitude range (chosen to include our ~21.18 mag example source)
min_mag = 21.1
max_mag = 21.2
mag_cat = source_match_cat[(source_match_cat['isophotal_vegamag'] >= min_mag) & (source_match_cat['isophotal_vegamag'] <= max_mag)]

mag_cat['label', 'xcentroid', 'ycentroid', 'sky_centroid', 'is_extended', 'isophotal_abmag', 'isophotal_vegamag']
mag_source_ext = 'mag-limit_cat.ecsv'

new_cat = Table(mag_cat) # turn the selected rows back into a standalone astropy Table

# save the new catalog with a unique name
new_cat_name = source_cat.replace('cat.ecsv', mag_source_ext)
new_cat.write(new_cat_name, overwrite=True)
print('Saved:', new_cat_name)

Once we have a source catalog that we’ve limited to specific sources, let’s match the remaining catalogs to those same sources.

for cat_name in source_match_cats:

    if cat_name == source_cat:
        # the base one has already been saved
        continue

    # match the source IDs between the current catalog and the base catalog above
    cat = Table.read(cat_name)
    cat.add_index('label')
    new_cat = cat.loc[list(mag_cat['label'])]

    # check to ensure the sources are all there
    print(repr(new_cat['label', 'xcentroid', 'ycentroid', 'sky_centroid', 'is_extended', 'isophotal_abmag', 'isophotal_vegamag']))
    
    # save the new catalog with a unique name
    new_cat_name = cat_name.replace('cat.ecsv', mag_source_ext)
    new_cat.write(new_cat_name, overwrite=True)
    print('Saved:', new_cat_name)
    print()

Once the new catalogs have been made, we have to update the association files to use them. Note: the association files will need to be updated again if you want to re-calibrate with the previous source catalogs.

new_source_ext = mag_source_ext

# loop through all of the spec2 association files in the current directory
for asn in spec2_asns:
    asn_data = json.load(open(asn))
    for product in asn_data['products']: # for every product, check the members        
        for member in product['members']: # there are multiple members per product
            if member['exptype'] == 'sourcecat':
                cat_in_asn = member['expname']
                # check that we haven't already updated the source catalog name
                if new_source_ext not in cat_in_asn:
                    new_cat = cat_in_asn.replace('cat.ecsv', new_source_ext)
                    # actually update the association file member
                    if os.path.exists(new_cat):
                        member['expname'] = new_cat
                    else:
                        print(f"{new_cat} does not exist in the currenty working directory")

    # write out the new association file
    with open(asn, 'w', encoding='utf-8') as f:
        json.dump(asn_data, f, ensure_ascii=False, indent=4)
# double check that the source catalog has been changed
for spec2_asn in spec2_asns:
    asn_check = json.load(open(spec2_asn))
    for product in asn_check['products']:
        for key, value in product.items():
            if key == 'members':
                for member in value:
                    if member['exptype'] == 'sourcecat':
                        print(f"    {member['exptype']}: {member['expname']}")
            else:
                print(f"{key}: {value}")

Now when we calibrate everything, each file should take a lot less time because there are a limited number of sources. However, we will calibrate all of the files in this visit, so this cell might still take a bit of time to run.

# calibrate with the new source catalog
for spec2_asn in spec2_asns:
    # check if the calibrated file already exists
    asn_data = json.load(open(spec2_asn))
    x1d_file = os.path.join(source_outdir, f"{asn_data['products'][0]['name']}_x1d.fits")
    
    if os.path.exists(x1d_file):
        print(x1d_file, ': x1d file already exists.')
    else:
        spec2 = Spec2Pipeline.call(spec2_asn, save_results=True, output_dir=source_outdir)

Final Visualization#

Now that everything has been calibrated, it’s useful to look at all of the extracted files. The cal and x1d files from spec2 are extracted at each dither step, which is shown below. Spec3 then turns the individual dither x1d files into a single combined spectrum per grism and filter for each source.

Note that for GR150R data, the dispersion direction is in the -Y direction, and for GR150C data, the dispersion direction is in the -X direction.

Look at all of the Files for a Single Source#

# Explore the new Data
x1ds = np.sort(glob.glob(os.path.join(source_outdir, "*x1d.fits")))

# get a list of all of the source IDs from the first file to look at for this example
sources = [fits.getval(x1ds[0], 'SOURCEID', ext=ext) for ext in range(len(fits.open(x1ds[0])))[1:-1]]
source_id = 118

# plot each x1d/cal file
for i, x1d_file in enumerate(x1ds):
    cal_file = x1d_file.replace('x1d.fits', 'cal.fits')
    with fits.open(x1d_file) as x1d_hdu, fits.open(cal_file) as cal_hdu:

        try:
            wh_x1d, wh_cal = find_source_ext(x1d_hdu, cal_hdu, source_id, info=False)
        except IndexError:
            # this means the source isn't in this observation
            continue

        x1d_data = x1d_hdu[wh_x1d].data 
        cal_data = cal_hdu[wh_cal].data

    plot_cutout_and_spectrum(cal_data, x1d_data, cal_file, x1d_file, source_id)

Overplot these files on top of each other to compare. The two grisms are shown with different line styles to draw attention to any differences that could be due to the calibration, including contamination, and each blocking filter is shown in a different color.

# Overplot the different grisms on top of each other for the same source
x1ds = np.sort(glob.glob(os.path.join(source_outdir, "*x1d.fits")))

# get a list of all of the source IDs from the first file to look at for this example
sources = [fits.getval(x1ds[0], 'SOURCEID', ext=ext) for ext in range(len(fits.open(x1ds[0])))[1:-1]]
source_id = 118

# create a figure with three panels
src118_fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(11, 9), sharex=True, sharey=True)

ls_dict = {'GR150R': '--',
           'GR150C': '-',
           }

color_dict = {'F115W': '#e1cb00',
              'F150W': '#32b45c',
              'F200W': '#099ab4',
              }

max_fluxes = []
all_waves = []
# plot each x1d file
for i, x1d_file in enumerate(x1ds):
    with fits.open(x1d_file) as x1d_hdu:
        try:
            wh_x1d, wh_cal = find_source_ext(x1d_hdu, cal_hdu, source_id, info=False)
        except IndexError:
            # this means the source isn't in this observation
            continue

        x1d_data = x1d_hdu[wh_x1d].data 
        grism = x1d_hdu[0].header['FILTER']
        filter = x1d_hdu[0].header['PUPIL']

    wave = x1d_data['WAVELENGTH'].ravel()
    flux = x1d_data['FLUX'].ravel()

    flux = Fnu_to_Flam(wave, flux)
    fluxunits = 'erg/s/cm^2/Angstrom'

    ax1.plot(wave, flux, color=color_dict[filter], ls=ls_dict[grism], alpha=0.7)
    if grism == 'GR150C':
        ax2.plot(wave, flux, color=color_dict[filter], ls=ls_dict[grism], alpha=0.7)
    else:
        ax3.plot(wave, flux, color=color_dict[filter], ls=ls_dict[grism], alpha=0.7)

    # save the maximum fluxes for plotting, removing any edge effects
    edge_buffer = int(len(flux) * .25)
    max_fluxes.append(np.nanmax(flux[edge_buffer:edge_buffer*-1]))
    all_waves.extend(wave)

# plot limits & labels
ax1.set_ylim(0, np.max(max_fluxes))
ax1.set_xlim(np.min(all_waves), np.max(all_waves))

src118_fig.suptitle(f'Source {source_id}')
ax1.set_title('GR150R & GR150C')
ax2.set_title('GR150C')
ax3.set_title('GR150R')

for ax in [ax1, ax2, ax3]:
    ax.set_xlabel('Wavelength (Microns)')
    ax.set_ylabel(f'Flux ({fluxunits})')

# label for each of the filters
for filt, color in color_dict.items():
    ax1.plot(0, 0, color=color, label=filt)

ax1.legend(bbox_to_anchor=(1, 1.05))
src118_fig.tight_layout()

Look at all of the sources for a single file#

Note that some extracted sources might not show anything interesting. If this is the case, go back to your source catalog and images to ensure that you have the correct source identified and that the target is centered in the cutout. This notebook uses simple examples to create and limit the source catalog, so it might not always show the most scientifically interesting sources.

x1d_file = os.path.join(source_outdir, 'jw02079004002_11101_00002_nis_x1d.fits') # this is a nice spectrum of 118 above
cal_file = x1d_file.replace('x1d.fits', 'cal.fits')
with fits.open(x1d_file) as x1d_hdu, fits.open(cal_file) as cal_hdu:

    for ext in range(len(x1d_hdu))[1:-1]:

        source_id = x1d_hdu[ext].header['SOURCEID']

        try:
            wh_x1d, wh_cal = find_source_ext(x1d_hdu, cal_hdu, source_id, info=False)
        except IndexError:
            # this means the source isn't in this observation
            continue
    
        x1d_data = x1d_hdu[wh_x1d].data 
        cal_data = cal_hdu[wh_cal].data
    
        plot_cutout_and_spectrum(cal_data, x1d_data, cal_file, x1d_file, source_id)

Visualize where the extracted sources are on the dispersed image, and how that compares to the direct image#

# setting up the figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 5), sharex=True, sharey=True)

# grism data

# for the dispersed image plot
# x1d_file and cal_file keep the same root filename as the single-source example above
rate_file = os.path.basename(x1d_file.replace('x1d.fits', 'rate.fits'))

# fill in the nan values from the bad pixels with zero to make it easier to look at
with fits.open(rate_file) as rate_hdu:
    rate_data = rate_hdu[1].data
rate_data[np.isnan(rate_data)] = 0

# plot the rate file and the extraction box
ax1.imshow(rate_data, origin='lower', vmin=0, vmax=np.nanmax(rate_data)*0.01, aspect='auto')

with fits.open(x1d_file) as x1d_hdu, fits.open(cal_file) as cal_hdu:

    for ext in range(len(x1d_hdu))[1:-1]:

        source_id = x1d_hdu[ext].header['SOURCEID']

        try:
            wh_x1d, wh_cal = find_source_ext(x1d_hdu, cal_hdu, source_id, info=False)
        except IndexError:
            # this means the source isn't in this observation
            continue
    
        x1d_data = x1d_hdu[wh_x1d].data 
        cal_data = cal_hdu[wh_cal].data
    
        # extraction box parameters from the header of the cal data:
        cal_header = cal_hdu[wh_cal].header
        sx_left = cal_header['SLTSTRT1']
        swidth = cal_header['SLTSIZE1']
        sx_right = cal_header['SLTSTRT1'] + swidth
        sy_bottom = cal_header['SLTSTRT2']
        sheight = cal_header['SLTSIZE2']
        sy_top = cal_header['SLTSTRT2'] + sheight
        
        rectangle = patches.Rectangle((sx_left, sy_bottom), swidth, sheight, edgecolor='darkred', facecolor="None", linewidth=1)
        ax1.add_patch(rectangle)
        ax1.text(sx_left, sy_top+10, source_id, fontsize=12, color='darkred')
        
    ax1.set_title(f"{os.path.basename(x1d_file).split('_nis')[0]}: {cal_hdu[0].header['FILTER']} {cal_hdu[0].header['PUPIL']}")

# imaging data
asn_data = json.load(open(fits.getval(x1d_file, 'ASNTABLE')))
i2d_name = asn_data['products'][0]['members'][1]['expname']
cat_name = asn_data['products'][0]['members'][2]['expname']
with fits.open(os.path.join('../../', data_dir, custom_run_image3, i2d_name)) as i2d:
    ax2.imshow(i2d[1].data, origin='lower', aspect='auto', vmin=0, vmax=np.nanmax(i2d[1].data)*0.01)
    ax2.set_title(f"{os.path.basename(i2d_name).split('_i2d')[0]}")

# also plot the associated catalog
cat = Table.read(cat_name)
extended_sources = cat[cat['is_extended'] == 1] # 1 is True; i.e. is extended
point_sources = cat[cat['is_extended'] == 0] # 0 is False; i.e. is a point source

for color, sources in zip(['darkred', 'black'], [extended_sources, point_sources]):
    # plotting the sources
    ax2.scatter(sources['xcentroid'], sources['ycentroid'], s=40, facecolors='None', edgecolors=color, alpha=0.9, lw=2)

    # adding source labels 
    for i, source_num in enumerate(sources['label']):
        ax2.annotate(source_num, 
                     (sources['xcentroid'][i]+0.5, sources['ycentroid'][i]+0.5), 
                     fontsize=12,
                     color=color)

fig.tight_layout()

Continue to explore further, including using the spec3 stage of the pipeline!
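As a starting point for spec3, here is a minimal sketch, assuming level-3 association files are available alongside the spec2 ones (the '*spec3*asn*.json' filename pattern is an assumption):

from jwst.pipeline import Spec3Pipeline

# minimal sketch: run spec3 on a level-3 association to combine the
# per-dither spectra into one spectrum per source
spec3_asns = np.sort(glob.glob('*spec3*asn*.json'))
if len(spec3_asns) > 0:
    spec3 = Spec3Pipeline.call(spec3_asns[0], save_results=True)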
