
NIRISS SOSS Pipeline Notebook#

Authors: R. Cooper, A. Carter, N. Espinoza, T. Baines
Last Updated: November 7, 2025
Pipeline Version: 1.20.2 (Build 12.1)

Purpose:
This notebook provides a framework for processing Near-Infrared Imager and Slitless Spectrograph (NIRISS) Single Object Slitless Spectrograph (SOSS) data through all three James Webb Space Telescope (JWST) pipeline stages. Data is assumed to be located in one observation folder according to paths set up below. It should not be necessary to edit any cells other than in the Configuration section unless modifying the standard pipeline processing options.

Data:
This notebook uses an example dataset from Early Release Observations (ERO) Program 2734 (PI: K. Pontoppidan). This program consists of time series observations (TSOs) of the confirmed exoplanets HAT-P-18b and WASP-96b, intended to demonstrate the power and precision of the JWST TSO modes. In this notebook, we will reduce the NIRISS SOSS observations of the transiting exoplanet WASP-96b.

Example input data to use will be downloaded automatically unless disabled (i.e., to use local files instead).

JWST pipeline version and CRDS context:
This notebook was written for the pipeline version and associated build context specified above. Information about this and other contexts can be found on the JWST Calibration Reference Data System (CRDS) server. If you use a different pipeline version, please refer to the table here to determine which context to use. To learn more about the differences between pipeline versions, read the relevant documentation.

Please note that pipeline software development is a continuous process, so results in some cases may be slightly different if a subsequent version is used. For optimal results, users are strongly encouraged to reprocess their data using the most recent pipeline version and associated CRDS context, taking advantage of bug fixes and algorithm improvements. Any known issues for this build are noted in the notebook.

Visualization:
This notebook uses the batman package (PyPI: batman-package) for analysis of the SOSS data. Some versions of this package may be incompatible with certain Python versions and CPU architectures. If issues are encountered with this package, it can be disabled in the ‘Package Imports’ section below; the function that makes the final data product visualization must then be called without the modelparams argument.

Updates:
This notebook is regularly updated as improvements are made to the pipeline. Find the most up to date version of this notebook at: https://github.com/spacetelescope/jwst-pipeline-notebooks/

Recent Changes:
November 07, 2025: original notebook released


Table of Contents#

  1. Configuration

  2. Package Imports

  3. Demo Mode Setup (ignore if not using demo data)

  4. Directory Setup

  5. Detector1 Pipeline

  6. Spec2 Pipeline

  7. TSO3 Pipeline

  8. Visualize the results


1. Configuration#


Install dependencies and parameters#

To make sure that the pipeline version is compatible with the steps discussed below and that the required dependencies and packages are installed, you can create a fresh conda environment and install the packages listed in the provided requirements.txt file:

conda create -n niriss_soss_pipeline python=3.13
conda activate niriss_soss_pipeline
pip install -r requirements.txt

Set the basic parameters to use with this notebook. These will affect what data are used, where the data are located (if already on disk), and which pipeline modules are run on these data. The parameters are:

  • demo_mode

  • directories with data

  • pipeline modules

# Basic import necessary for configuration
import os
Note that demo_mode must be set appropriately below.

Set demo_mode = True to run in demonstration mode. In this mode this notebook will download example data from the Barbara A. Mikulski Archive for Space Telescopes (MAST) and process it through the pipeline. This will all happen in a local directory unless modified in Section 3 below.

Set demo_mode = False if you want to process your own data that has already been downloaded and provide the location of the data.

# Set parameters for demo_mode and processing steps.

# -----------------------------Demo Mode---------------------------------
demo_mode = True

if demo_mode:
    print('Running in demonstration mode using online example data!')

# --------------------------User Mode Directories------------------------
# If demo_mode = False, look for user data in these paths
if not demo_mode:
    # Set directory paths for processing specific data; these will need
    # to be changed to your local directory setup (below are given as
    # examples)
    basedir = os.path.join(os.getcwd(), '')

    # Point to location of science observation data.
    # Assumes uncalibrated data in sci_dir/uncal/ and results in stage1,
    # stage2, stage3 directories
    sci_dir = os.path.join(basedir, 'JWSTData/PID_2734/')

# --------------------------Set Processing Steps--------------------------
# Individual pipeline stages can be turned on/off here.  Note that a later
# stage won't be able to run unless data products have already been
# produced from the prior stage.

# Science processing
dodet1 = True  # calwebb_detector1
dospec2 = True  # calwebb_spec2
dotso3 = True  # calwebb_tso3
doviz = True  # Visualize calwebb_tso3 output

Set CRDS context and server#

Before importing CRDS and JWST modules, we need to configure our environment. This includes defining a CRDS cache directory in which to keep the reference files that will be used by the calibration pipeline.

If the root directory for the local CRDS cache directory has not been set already, it will be set to create one in the home directory.

# ------------------------Set CRDS context and paths----------------------
# Each version of the calibration pipeline is associated with a specific CRDS
# context file. The pipeline will select the appropriate context file behind
# the scenes while running. However, if you wish to override the default context
# file and run the pipeline with a different context, you can set that using
# the CRDS_CONTEXT environment variable. Here we show how this is done,
# although we leave the line commented out in order to use the default context.
# If you wish to specify a different context, uncomment the line below.
#os.environ['CRDS_CONTEXT'] = 'jwst_1464.pmap'  # CRDS context for 1.20.2

# Check whether the local CRDS cache directory has been set.
# If not, set it to the user home directory
if (os.getenv('CRDS_PATH') is None):
    os.environ['CRDS_PATH'] = os.path.join(os.path.expanduser('~'), 'crds')
    
# Check whether the CRDS server URL has been set.  If not, set it.
if (os.getenv('CRDS_SERVER_URL') is None):
    os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'

# Echo CRDS path in use
print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")

2. Package Imports#

# Use the entire available screen width for this notebook
from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))
# Basic system utilities for interacting with files
# ----------------------General Imports------------------------------------
import glob
import time
import json
from pathlib import Path
from collections import defaultdict

# Numpy for calculations
import numpy as np

# Pandas for loading data into tables
import pandas as pd

# Astroquery for downloading demo files
from astroquery.mast import Observations

# For visualizing data
import matplotlib.pyplot as plt
from astropy.visualization import (ManualInterval, LogStretch,
                                   ImageNormalize, simple_norm)
from astropy.stats import sigma_clip
from astropy.time import Time
import batman # Transit modeling

# For file manipulation
from astropy.io import fits

# For JWST calibration pipeline
import jwst
import crds

from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Spec2Pipeline
from jwst.pipeline import Tso3Pipeline

# JWST pipeline utilities
from jwst import datamodels
from jwst.associations import asn_from_list  # Tools for creating association files
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base  # Definition of a Lvl3 association file

# Echo pipeline version and CRDS context in use
print(f"JWST Calibration Pipeline Version: {jwst.__version__}")
print(f"Using CRDS Context: {crds.get_context_name('jwst')}")

Define convenience functions#

These functions are used within the notebook and assist with selecting certain kinds of input data.

# Sort files into types: TA, spectrum, and F277W
def sort_files(files):
    tafiles = []
    scifiles = []
    f277wfiles = []
    for file in files:
        exptype = fits.getval(file, 'EXP_TYPE')
        filt = fits.getval(file, 'FILTER')
        if ((exptype == 'NIS_TACQ') | (exptype == 'NIS_TACONFIRM')):
            tafiles.append(file)
        if ((exptype == 'NIS_SOSS') & (filt == 'CLEAR')):
            scifiles.append(file)
        if ((exptype == 'NIS_SOSS') & (filt == 'F277W')):
            f277wfiles.append(file)
        
    return tafiles, scifiles, f277wfiles
# Start a timer to keep track of runtime
time0 = time.perf_counter()

3. Demo Mode Setup (ignore if not using demo data)#


If running in demonstration mode, set up the program information to retrieve the uncalibrated data automatically from MAST using astroquery. MAST has a dedicated service for JWST data retrieval, so the archive can be searched by instrument keywords rather than just filenames or proposal IDs.

The list of searchable keywords for filtered JWST MAST queries is here.

For this notebook, we will examine a single TSO of the target, which uses the GR700XD/CLEAR pupil/filter combination. Note that the TSO data are typically split into multiple files to facilitate data processing; for more information, see the documentation about Segmented Products.

We will start with the uncalibrated data products. The files we are interested in are named jw02734002001_04101_00001-segNNN_nis_uncal.fits, where NNN refers to the segment number.

More information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html

# Set up the program information and paths for demo program
if demo_mode:
    print('Running in demonstration mode and will download example data from MAST!')
    
    # --------------Program and observation information--------------
    program = '02734'
    instr = 'NIRISS/SOSS'
    filt_pupil = 'CLEAR;GR700XD'
    targname = 'WASP-96'

    # --------------Program and observation directories--------------
    data_dir = os.path.join('.', 'nis_soss_demo_data')
    sci_dir = os.path.join(data_dir, 'PID_2734')
    uncal_dir = os.path.join(sci_dir, 'uncal')  # Uncalibrated pipeline inputs should be here

    # Create the directories if they do not exist
    if not os.path.isdir(data_dir):
        os.mkdir(data_dir)

    if not os.path.exists(uncal_dir):
        os.makedirs(uncal_dir)

Identify list of science (SCI) uncalibrated files associated with visits.

# Obtain a list of observation IDs for the specified demo program
if demo_mode:
    # Science data
    sci_obs_id_table = Observations.query_criteria(instrument_name=[instr],
                                                   proposal_id=[program],
                                                   filters=[filt_pupil],  # Data for specific filter/pupil
                                                   obs_id=['jw' + program + '*'],
                                                   target_name=targname)
# Turn the list of visits into a list of uncalibrated data files
if demo_mode:
    # Define types of files to select
    file_dict = {'uncal': {'product_type': 'SCIENCE',
                           'productSubGroupDescription': 'UNCAL',
                           'calib_level': [1]}}

    # Science files
    sci_files = []

    # Loop over visits identifying uncalibrated files that are associated
    # with them
    for exposure in sci_obs_id_table:
        products = Observations.get_product_list(exposure)
        for filetype, query_dict in file_dict.items():
            filtered_products = Observations.filter_products(products, productType=query_dict['product_type'],
                                                             productSubGroupDescription=query_dict['productSubGroupDescription'],
                                                             calib_level=query_dict['calib_level'])
            sci_files.extend(filtered_products['dataURI'])

    print(f"Science files selected for downloading: {len(sci_files)}")

Download all the uncal files and place them into the appropriate directories.

Warning: If this notebook is halted during this step, the downloaded files may be incomplete and cause crashes later on!

if demo_mode:
    for filename in sci_files:
        sci_manifest = Observations.download_file(filename,
                                                  local_path=os.path.join(uncal_dir, Path(filename).name))
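
As an optional sanity check (a minimal sketch, not part of the standard workflow), we can confirm that each expected file exists locally and is not empty before proceeding:

# Optional: verify that the downloaded uncal files exist and are not empty
if demo_mode:
    for filename in sci_files:
        local_file = os.path.join(uncal_dir, Path(filename).name)
        if (not os.path.exists(local_file)) or (os.path.getsize(local_file) == 0):
            print(f"Warning: {local_file} is missing or empty; re-run the download cell above.")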

4. Directory Setup#


Set up detailed paths to input/output stages here.

# Define output subdirectories to keep science data products organized
# -----------------------------Science Directories------------------------------
uncal_dir = os.path.join(sci_dir, 'uncal')  # Uncalibrated pipeline inputs should be here
det1_dir = os.path.join(sci_dir, 'stage1')  # calwebb_detector1 pipeline outputs will go here
spec2_dir = os.path.join(sci_dir, 'stage2')  # calwebb_spec2 pipeline outputs will go here
tso3_dir = os.path.join(sci_dir, 'stage3')  # calwebb_tso3 pipeline outputs will go here

# We need to check that the desired output directories exist, and if not create them
# Ensure filepaths for input data exist
if not os.path.exists(uncal_dir):
    os.makedirs(uncal_dir)

if not os.path.exists(det1_dir):
    os.makedirs(det1_dir)
if not os.path.exists(spec2_dir):
    os.makedirs(spec2_dir)
if not os.path.exists(tso3_dir):
    os.makedirs(tso3_dir)

Print the exposure parameters of all potential input files:

uncal_files = sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits')))

for file in uncal_files:
    model = datamodels.open(file)
    # print file name
    print(model.meta.filename)
    # Print out exposure info
    print(f"Instrument: {model.meta.instrument.name}")
    print(f"Filter: {model.meta.instrument.filter}")
    print(f"Pupil: {model.meta.instrument.pupil}")
    print(f"Exposure type: {model.meta.exposure.type}")
    print(f"Total number of integrations: {model.meta.exposure.nints}")
    if model.meta.exposure.nints != 1:
        print(f"Integration range: {model.meta.exposure.integration_start}-{model.meta.exposure.integration_end}")
    print(f"Exposure start time (UTC): {Time(model.meta.exposure.start_time, format='mjd').fits}")
    print(f"Number of groups: {model.meta.exposure.ngroups}")
    print(f"Readout pattern: {model.meta.exposure.readpatt}")
    print("\n")
    model.close()

Since this is a NIRISS SOSS observation, the first four files are target acquisition (TA) exposures. Target acquisition is performed in a 64x64 pixel subarray before the target is moved to its position in the science subarray. The TA exposures have exposure type NIS_TACQ or NIS_TACONFIRM and use the F480M filter. These exposures, particularly the final confirmation image, can be helpful for diagnosing potential problems with the data. For more information about the SOSS TA procedure, see the NIRISS TA documentation.

The following three exposures are our time series observation, split into three segments: seg001 through seg003 in the filenames. These exposures use the CLEAR/GR700XD filter/pupil combination and consist of 280 integrations in total, each composed of 14 groups up the ramp, corresponding to a total exposure time of 6.41 hours. Each exposure uses the NISRAPID readout pattern.
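
(For reference, this total is consistent with an approximate SUBSTRIP256 frame time of 5.494 s: $280 \times (14 + 1) \times 5.494 \ \mathrm{s} \approx 23{,}000 \ \mathrm{s} \approx 6.4$ hours, where the extra frame per integration accounts for the detector reset; see the JDox exposure time tables for exact values.)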

The final exposure uses the F277W filter and was obtained because it is useful for masking order-zero sources and for isolating the first spectral order in the $2.4 \ \mu m-2.8 \ \mu m$ wavelength range, where the first and second orders overlap significantly in the CLEAR exposures. We will process the F277W exposure through Stage 1 in this notebook, but the background subtraction step in the second stage does not currently work with these exposures.

For more information about how JWST exposures are defined by up-the-ramp sampling, see the Understanding Exposure Times JDox article.

In this notebook, we will focus on processing the CLEAR/GR700XD exposures (though we will also process the single F277W exposure through Stage 1), so we can update the list of uncalibrated files to remove the TA exposures:
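
The cell below is a minimal sketch of this filtering, using the sort_files helper defined above; we keep the F277W exposure so that it is also processed through Stage 1.

# Remove the TA exposures, keeping the science (CLEAR/GR700XD) and F277W exposures
tafiles, scifiles, f277wfiles = sort_files(uncal_files)
uncal_files = sorted(scifiles + f277wfiles)
print(f"Keeping {len(uncal_files)} input files for Detector1 processing")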

# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")

5. Detector1 Pipeline#

Run the data through the Detector1 stage of the pipeline to apply detector-level calibrations and create countrate data products in which slopes are fitted to the integration ramps. The _rateints.fits products are 3D (nintegrations x nrows x ncols) and contain the fitted ramp slopes for each integration. 2D countrate products (_rate.fits; nrows x ncols), averaged over all integrations, are also created.

The following Detector1 steps are available for NIRISS SOSS:

  1. group_scale

  2. dq_init

  3. saturation

  4. superbias

  5. refpix

  6. linearity

  7. dark_current

  8. jump

  9. clean_flicker_noise

  10. ramp_fit

  11. gain_scale

By default, these Detector1 steps are currently skipped for NIRISS SOSS exposures: group_scale, clean_flicker_noise, and gain_scale.

Each observing mode of JWST has different requirements when it comes to correcting for detector effects. The clean_flicker_noise step was designed to remove 1/f noise from calibrated ramp images, but SOSS users have found its performance insufficient due to the lack of non-illuminated background pixels in the SOSS subarrays. A more rigorous group-level subtraction is likely needed, and is currently in development by the SOSS team. By default, this step is currently skipped.

It is also unclear whether TSO science benefits from the dark current step in its current implementation. In the following example we leave the step on, but it can easily be turned off as shown.

For more information about each step and a full list of step arguments, please refer to the official documentation on ReadtheDocs.

# Set up a dictionary to define how the Detector1 pipeline should be configured

# Boilerplate dictionary setup
det1dict = defaultdict(dict)

# Step names are copied here for reference
det1_steps = ['group_scale', 'dq_init', 'saturation', 'superbias', 'refpix',
              'linearity', 'dark_current', 'jump', 'clean_flicker_noise',
              'ramp_fit', 'gain_scale']

# Overrides for whether or not certain steps should be skipped
# Optionally, skip the dark step
# det1dict['dark_current']['skip'] = True
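# Optionally, run the clean_flicker_noise step (skipped by default for SOSS;
# see the discussion of its current limitations above)
# det1dict['clean_flicker_noise']['skip'] = False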

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#det1dict['dq_init']['override_mask'] = 'myfile.fits'  # Bad pixel mask
#det1dict['saturation']['override_saturation'] = 'myfile.fits'  # Saturation
#det1dict['linearity']['override_linearity'] = 'myfile.fits'  # Linearity
#det1dict['dark_current']['override_dark'] = 'myfile.fits'  # Dark current subtraction
#det1dict['jump']['override_gain'] = 'myfile.fits'  # Gain used by jump step
#det1dict['ramp_fit']['override_gain'] = 'myfile.fits'  # Gain used by ramp fitting step
#det1dict['jump']['override_readnoise'] = 'myfile.fits'  # Read noise used by jump step
#det1dict['ramp_fit']['override_readnoise'] = 'myfile.fits'  # Read noise used by ramp fitting step

# Turn on multi-core processing (off by default). Choose what fraction of cores to use (quarter, half, or all)
det1dict['jump']['maximum_cores'] = 'half'

Run Detector1 stage of pipeline:

# Run Detector1 stage of pipeline, specifying:
# output directory to save *_rateints.fits files
# save_results flag set to True so the *rateints.fits files are saved

if dodet1:
    for uncal in uncal_files:
        rate_result = Detector1Pipeline.call(uncal,
                                             output_dir=det1_dir,
                                             steps=det1dict,
                                             save_results=True)
else:
    print('Skipping Detector1 processing')
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime for Detector1: {time1 - time0:0.0f} seconds")

Exploring the data#

Identify *_rateints.fits files and verify which pipeline steps were run and which calibration reference files were applied.

if dodet1:
    # find rateints files
    rateints_files = sorted(glob.glob(os.path.join(det1_dir, '*_rateints.fits')))
    
    # Read in the first file as datamodel as an example
    rateints = datamodels.open(rateints_files[0])
    
    # Check which steps were run
    for step, status in rateints.meta.cal_step.instance.items():
        print(f"{step}: {status}")

Check which CRDS version and reference files were used to calibrate the dataset:

if dodet1:
    for key, val in rateints.meta.ref_file.instance.items():
        print(f"{key}:")
        for param in rateints.meta.ref_file.instance[key]:
            print(f"\t{rateints.meta.ref_file.instance[key][param]}")

6. Spec2 Pipeline#

In the Spec2 stage of the pipeline, each exposure is corrected with further instrumental calibrations, and then a 1D spectrum is extracted. The steps applied to NIRISS SOSS exposures are, in order:

  1. assign_wcs

  2. background

  3. srctype

  4. straylight

  5. flat_field

  6. pathloss

  7. extract_1d

  8. photom

Note that while most JWST spectroscopic modes perform the photometric calibration (photom) followed by spectral extraction (extract_1d), the order of these steps is switched for SOSS so that the overlapping spectral orders can be disentangled before they are converted to flux units, as required for the Algorithm to Treat Order ContAmination (ATOCA).

The Spec2 Pipeline can produce *_x1d.fits or *_x1dints.fits files, depending on whether the input files are *_rate.fits or *_rateints.fits. These products are the 1D extracted spectra which will be used as input to the following pipeline stage. In this case, we are interested in the time series of observations, so we want to preserve the multiple integrations by using the *_rateints.fits files as input.

The Spec2 Pipeline can also save *_cal.fits or *_calints.fits, which are 2D or 3D fully calibrated images, again depending on the dimensions of the input.

For more information about each step and a full list of step arguments, please refer to the official documentation on ReadtheDocs.

time_spec2 = time.perf_counter()
# Set up a dictionary to define how the Spec2 pipeline should be configured.

# Boilerplate dictionary setup
spec2dict = defaultdict(dict)

# Step names are copied here for reference
spec2steps = ['assign_wcs', 
              'bkg_subtract',
              'srctype',
              'straylight',
              'flat_field',
              'pathloss',
              'extract_1d',
              'photom']

# Overrides for whether or not certain steps should be skipped (example)
# spec2dict['bkg_subtract']['skip'] = True

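# Percentile parameter used by the SOSS background subtraction; see the
# bkg_subtract step documentation for details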
spec2dict['bkg_subtract']['soss_source_percentile'] = 50.0

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#spec2dict['flat_field']['override_flat'] = 'myfile.fits'  # Pixel flatfield

Find and sort the input files, ensuring use of absolute paths. At this point we will discard the F277W exposure from the list of files, as it cannot currently be processed through Stage 2 (the sort_files helper also filters out any TA exposures, if present).

# Use files from the detector1 output folder
rateints_files = sorted(glob.glob(os.path.join(det1_dir, 'jw*rateints.fits')))

for ii in range(len(rateints_files)):
    rateints_files[ii] = os.path.abspath(rateints_files[ii])

# Discard any TA exposures and the final F277W exposure from the list
tafiles, scifiles, f277wfiles = sort_files(rateints_files)
rateints_files = scifiles

print(f"Found {str(len(rateints_files))} science files")
# Run Spec2 stage of pipeline, specifying:
# output directory to save files

if dospec2:
    for rateints in rateints_files:
        calints_result = Spec2Pipeline.call(rateints,
                                            output_dir=spec2_dir,
                                            steps=spec2dict,
                                            save_results=True)
else:
    print("Skipping Spec2 processing.")
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Spec2: {time1 - time_spec2:0.0f} seconds")

Verify which pipeline steps were run:

if dospec2:
    # Identify *_calints.fits files
    calints_files = sorted(glob.glob(os.path.join(spec2_dir, '*_calints.fits')))
    # Restrict to selected filter if applicable

    # Read in the first file as datamodel as an example
    calints = datamodels.open(calints_files[0])
    
    # Check which steps were run
    for step, status in calints.meta.cal_step.instance.items():
        print(f"{step}: {status}")

Check which reference files were used to calibrate the dataset:

if dospec2:
    for key, val in calints.meta.ref_file.instance.items():
        print(f"{key}:")
        for param in calints.meta.ref_file.instance[key]:
            print(f"\t{calints.meta.ref_file.instance[key][param]}")

7. TSO3 Pipeline#

In the TSO3 stage of the pipeline, an association of calibrated TSO exposures is used to produce calibrated time-series spectra of the target. By default, this stage consists of three steps for SOSS TSOs:

  1. outlier_detection

  2. extract_1d

  3. white_light

This stage also includes an optional pixel replacement step (pixel_replace) that is off by default for SOSS, but can be enabled to reduce the noise introduced by bad pixels in the spectrum. To run it, uncomment the relevant line in the tso3dict below.

In order to run the TSO3 stage, an Association file must first be created to instruct the pipeline to process the segments of the time series together.

time_tso3 = time.perf_counter()
# Set up a dictionary to define how the TSO3 pipeline should be configured
# Boilerplate dictionary setup
tso3dict = defaultdict(dict)

# Options for each step (example)
#tso3dict['outlier_detection']['snr'] = '5.0 4.0' # Signal-to-noise thresholds for bad pixel identification
#tso3dict['extract_1d']['soss_atoca'] = False  # Turn off the ATOCA algorithm for order contamination (default=True)
#tso3dict['white_light']['min_wavelength'] = 0.8  # Minimum wavelength from which to sum the flux array
tso3dict['extract_1d']['save_results'] = True
tso3dict['white_light']['save_results'] = True
# tso3dict['pixel_replace']['skip'] = False # Run the pixel replacement step

# Overrides for whether or not certain steps should be skipped (example)
#tso3dict['white_light']['skip'] = True

# Overrides for various reference files (example)
# Files should be in the base local directory or provide full path
#tso3dict['extract_1d']['override_extract1d'] = 'myx1dfile.fits'  # override spectral extraction parameters

Find and sort all of the input files, ensuring use of absolute paths

# TSO3 takes the calints.fits files output by Spec2

calints_files = sorted(glob.glob(os.path.join(spec2_dir, 'jw*calints.fits')))
calints_files = [os.path.abspath(calints) for calints in calints_files]

print(f'Found {str(len(calints_files))} science files to process')

Create Association Files#

An association file lists the exposures to calibrate together in Stage 3 of the pipeline. Note that an association file is available for download from MAST, with a filename of *_asn.json, though it may require additional manipulation.

Here we create a Level 3 association file that will tell the pipeline to process the segments of the time series together. Since we only have a single time series, we only need one association file.

Note that the final output products will have a rootname that is specified by the product_name in the association file.

# Create Level 3 Association
if dotso3:
    # Get the program, target name from the header of one file
    hdr = fits.getheader(calints_files[0])
    program = hdr['PROGRAM']
    name = hdr['TARGNAME']
    # Create and save the asn file to the TSO3 directory
    asnfile = os.path.join(tso3_dir, f'level3_{program}_asn.json')
    asn = asn_from_list.asn_from_list(calints_files, rule=DMS_Level3_Base, product_name=name)
    asn.data['asn_type'] = 'tso3'
    asn.data['program'] = program

    with open(asnfile, 'w') as f:
        f.write(asn.dump()[1])
    if os.path.exists(asnfile):
        print(rf"Level 3 association successfully created and saved to: {asnfile}")
# Examine the ASN file.
if dotso3:
    with open(asnfile, 'r') as f_obj:
        asnfile_data = json.load(f_obj)
    expanded_json = json.dumps(asnfile_data, indent=2)
    print(expanded_json)

Run TSO3 stage of the pipeline#

From the three segments listed in the association file, the TSO3 stage of the pipeline will produce:

  • *_crfints.fits files from the outlier_detection step containing cosmic-ray-flagged images with updated DQ arrays for each input calints file

  • *_x1dints.fits file from the extract_1d step containing extracted spectra for all integrations in the input exposures

  • *_whtlt.ecsv file from the white_light step containing an ASCII catalog of wavelength-integrated white-light flux as a function of time

# Run Stage 3
if dotso3:
    tso3_result = Tso3Pipeline.call(asnfile,
                                    save_results=True,
                                    output_dir=tso3_dir,
                                    steps=tso3dict)
else:
    print('Skipping TSO3 processing')
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for TSO3: {time1 - time_tso3:0.0f} seconds")

Verify which pipeline steps were run#

if dotso3:
    # Identify *x1dints.fits file and open as datamodel
    x1dints = glob.glob(os.path.join(tso3_dir, "*_x1dints.fits"))[0]

    with datamodels.open(x1dints) as x1d_model:
        # Check which steps were run
        for step, status in x1d_model.meta.cal_step.instance.items():
            print(f"{step}: {status}")

Check which reference files were used to calibrate the dataset

if dotso3:
    for key, val in x1d_model.meta.ref_file.instance.items():
        print(f"{key}:")
        for param in x1d_model.meta.ref_file.instance[key]:
            print(f"\t{x1d_model.meta.ref_file.instance[key][param]}")

8. Visualize the results#

Define plotting functions#

def prepare_spectra(filelist, order=[1, 2]):
    """
    Join segmented 1D spectra and their corresponding timestamps
    and wavelengths to form single arrays. If only one file given, 
    just extract quantities and return them in suitable shape for plotting.
    Handles x1dints products from either spec2 or tso3.

    Notes
    -----
    The contents of the returned dictionary are structured as follows:
        specdict = {
                    "ORDER1" : {"FLUX" : arr(nints, npix),
                                "FLUX_ERROR" : arr(nints, npix),
                                "WAVELENGTH" : arr(nints, npix),
                                "INT_TIMES" : arr(nints),
                                }
                    "ORDER2" : ...
                    }
    
    Parameters
    ----------
    filelist: str or list of str
        Name(s) of x1dints file(s) to join.
    order: int or list of int
        Spectral order(s) to extract. Default is [1, 2].

    Returns
    -------
    specdict: dict
        Nested dictionary containing a dict of spectral data for each order. See Notes for contents.
    """
    
    # Check inputs
    if isinstance(filelist, str):
        filelist = [filelist]
    if isinstance(order, int):
        order = [order]
    # Prepare a dictionary for outputs
    specdict = defaultdict(dict)
    # Find out how many orders are contained in the dataproducts
    ordlst = []
    with fits.open(filelist[0]) as hdulist:
        for ext in hdulist:
            try:
                ordlst.append(ext.header["SPORDER"])
            except KeyError:
                continue
    ordersinfile = max(ordlst)
    # Loop over the *requested* orders
    for nord in np.arange(order[0] - 1, order[-1]):
        first = True
        # Loop over input files
        for seg in filelist:
            with datamodels.open(seg) as model:
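                # The spec entries are assumed to be interleaved by order within
                # each integration, so take every Nth entry (N = orders per file)
                # to collect this order across all integrations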
                specidxlist = np.arange(len(model.spec))
                idx = specidxlist[nord::ordersinfile]
                for i in idx:
                    segflux = model.spec[i].spec_table.FLUX
                    segfluxerr = model.spec[i].spec_table.FLUX_ERROR
                    seginttimes = model.spec[i].spec_table['TDB-MID']
                    # wavelength array for each integration is the same, but we include them all for completeness
                    segwavel = model.spec[i].spec_table.WAVELENGTH
                    if first:
                        fullflux = segflux
                        fullfluxerr = segfluxerr
                        fulltimes = seginttimes
                        fullwavels = segwavel
                        first = False
                    else:
                        fullflux = np.concatenate((fullflux, segflux), axis=0)
                        fullfluxerr = np.concatenate((fullfluxerr, segfluxerr), axis=0)
                        fullwavels = np.concatenate((fullwavels, segwavel), axis=0)
                        fulltimes = np.concatenate((fulltimes, seginttimes), axis=0)
                        
        # Populate the dictionary
        specdict[f'ORDER{nord+1}']['FLUX'] = fullflux
        specdict[f'ORDER{nord+1}']['FLUX_ERROR'] = fullfluxerr
        specdict[f'ORDER{nord+1}']['WAVELENGTH'] = fullwavels
        specdict[f'ORDER{nord+1}']['INT_TIMES'] = fulltimes
        
        print(f"Order {nord+1} dimensions: {fullflux.shape[0]} integrations, {fullflux.shape[1]} wavelengths")
    return specdict
def display_soss_im(filename, intnum=0, groupnum=-1, lognormalize=True):
    """
    Display a SOSS image.

    Can display 2D (e.g. rate), 3D (e.g. calints), 4D (e.g. uncal) images.
    Defaults to show the first integration of 3D files; first integration/
    last group of 4D files.

    Parameters
    ----------
    filename: str
        Name of file to display
    intnum: int
        Integration number in each file to display. Default is 0.
    groupnum: int
        Group number to display, if relevant. Default is -1 (last group).
    lognormalize: bool
        If True (default), display the image with a logarithmic stretch;
        if False, use a linear stretch.
    """

    # Open file as a datamodel
    with datamodels.open(filename) as model:
        # Read the data
        data = model.data
        # Get some other useful info
        units = model.meta.bunit_data
        filt = model.meta.instrument.filter
        pupil = model.meta.instrument.pupil
        targname = model.meta.target.catalog_name
        
        if len(data.shape) == 2:
            data = np.expand_dims(data, axis=0)
        # Check inputs
        if isinstance(intnum, int) and -1 <= intnum < data.shape[0]: # if intnum is between -1 and the first dimension
            data = data[intnum] # now 2d 
        else:
            raise ValueError(f"Invalid integration number '{intnum}' for data with shape {data.shape}")
        if len(data.shape) == 3:
            if isinstance(groupnum, int) and -1 <= groupnum < data.shape[0]:
                data = data[groupnum]
            else:
                raise ValueError(f"Invalid group number '{groupnum}' for data with shape {data.shape}")
    # Make the figure
    fig, ax = plt.subplots(1, figsize=(12, 5))
    if lognormalize is True:
        # sigma_clipped_data = sigma_clip(data)
        # vmin = np.nanmin(sigma_clipped_data)
        # vmax = np.nanmax(sigma_clipped_data)
        vmin = np.nanmin(data)
        vmax = np.nanmax(data)
        # norm = simple_norm(data, vmin=vmin, vmax=vmax)
        norm = ImageNormalize(data,
                              interval=ManualInterval(vmin=vmin-(vmin), vmax=vmax+(vmin)),
                              stretch=LogStretch())
        im = ax.imshow(data, origin='lower', norm=norm)
    else:
        im = ax.imshow(data, origin='lower')
    titlestring = f"{os.path.basename(filename)}: {targname} {pupil}/{filt}"
    ax.set_title(titlestring)
    ax.set_xlabel('X pixel')
    ax.set_ylabel('Y pixel')
    fig.colorbar(im, orientation='horizontal', label=units)
def display_spectrum(filename, intnum=0, order=1):
    """
    Display a SOSS spectrum.

    This function takes either a single file or a list of files; in the
    latter case it assumes the files form a single TSO and concatenates
    the segments to produce a single set of extracted measurements.
    Can display an averaged spectrum ('x1d') or a single integration of a 
    multi-integration spectrum ('x1dints').

    Parameters
    ----------
    filename: str or list of str
        Name of file to display
    intnum: int
        Integration number in each file to display. Default is 0.
    order: int
        Which spectral order to plot. Options are 1, 2, and 3.
    """
    
    # Check inputs
    if isinstance(order, int):
        order = [order]
    # If multiple segments, join the spectra
    # We assume they are in the correct order already...
    specdict = prepare_spectra(filename, order=order)
    
    # Set up the plot.
    fig, ax = plt.subplots(1, figsize=(10, 5))
    
    for ordr in order:
        orderdict = specdict[f"ORDER{ordr}"]
        flux = orderdict['FLUX']
        fluxerr = orderdict['FLUX_ERROR']
        wavel = orderdict['WAVELENGTH']
        ax.errorbar(wavel[intnum], flux[intnum], yerr=fluxerr[intnum], label=f"Order {ordr}")
    plt.legend()
    ax.set_xlabel(r'Wavelength [$\mu$m]')
    ax.set_ylabel('Flux [MJy]')
    ax.set_title(os.path.basename(filename))
def spectrum_timeseries(specfiles, normalize=True):
    """
    Display a SOSS spectrum timeseries.

    This function creates a plot of flux as a function of both wavelength and time,
    to visualize how the flux changes over the duration of the time series. The flux
    at each wavelength is normalized by default to emphasize the transit.

    Parameters
    ----------
    specfiles: str or list of str
        Name of file(s) containing a time-series of spectra
    normalize: bool
        If True, normalize the flux at each wavelength by an approximate out-of-transit flux
        Default=True
    """
    
    # Get data 
    specdict = prepare_spectra(specfiles)
    
    flux = specdict["ORDER1"]["FLUX"]
    wavel = specdict["ORDER1"]["WAVELENGTH"][0] 
    times = specdict["ORDER1"]["INT_TIMES"]

    fig, ax = plt.subplots(1, figsize=(7, 7))

    if normalize:

        # Average flux at each wavelength for first 100 integrations (assumed out-of-transit)
        first100 = np.nanmean(flux[:100, :], axis=0)
        # Normalize the flux to emphasize the transit
        flux = flux/first100
        sigma_clipped_data = sigma_clip(flux)
        vmin = np.nanmin(sigma_clipped_data)
        vmax = np.nanmax(sigma_clipped_data)
        norm = simple_norm(sigma_clipped_data, vmin=vmin, vmax=vmax)
    # Trim last row/column for mesh plotting
    fluxtrim = flux[:-1, :-1].T

    # Map times in MJD to time from midtransit
    # Transit ephemeris from Carter et al. 2024 (doi:10.1038/s41550-024-02292-x)
    T_c = 2459787.5567843 - 2400000.5
    
    reltime = (times - T_c) * 24
    if normalize:
        mesh = ax.pcolormesh(reltime, wavel, fluxtrim, norm=norm)
    else:
        mesh = ax.pcolormesh(reltime, wavel, fluxtrim)
    fig.colorbar(mesh, label='Relative Flux')
    ax.set_xlabel('Time from Mid-Transit [hr]')
    ax.set_ylabel(r'Wavelength [$\mu$m]')
def display_lightcurve(whtlt_file, order=[1, 2], modelparams=None):
    """
    Display a SOSS whitelight curve.

    This function displays the total flux over all wavelengths for 
    each integration in the time series, for each order. Optionally, 
    we can pass `batman` model parameters to overplot a transit model.

    Parameters
    ----------
    whtlt_file: str
        Name of file(s) containing a time-series of spectra
    order: int
        Which spectral order to plot. Options are 1, 2, and 3.
    modelparams: batman.TransitParams()
        Transit model parameters to plot with the data
    """
    # Read the file into a pandas dataframe
    whitelight_df = pd.read_csv(whtlt_file, on_bad_lines='skip', comment='#', sep=r'\s+')
    # Check inputs
    if isinstance(order, int):
        order = [order]
    # Set up the figure
    fig, axs = plt.subplots(len(order), 1, figsize=(7, 5), sharex=True)
    if not isinstance(axs, np.ndarray):
        axs = [axs]
    
    times = np.array(whitelight_df['BJD_TDB'])
    if modelparams is not None:
        # Plot a simple transit model using the supplied best-fit parameters
        transitmodel = batman.TransitModel(modelparams, times)    # initializes model
        modelflux = transitmodel.light_curve(modelparams)         # calculates light curve
        reltime = (times - modelparams.t0)*24
    
    for ii, ordr in enumerate(order):
        flux = whitelight_df[f'whitelight_flux_order_{ordr}']
        # Normalize to have an out-of-transit flux of ~1
        fluxnorm = flux/np.nanmean(flux[:100])
        
        axs[ii].set_ylabel('Relative Flux')
        axs[ii].set_title(f'Order {ordr}')
        if modelparams is not None:
            axs[ii].plot(reltime, fluxnorm, label='Data')
            axs[ii].plot(reltime, modelflux, 'k-', label='Model')
            axs[-1].set_xlabel('Time to Mid-Transit [hr]')
        else:
            axs[ii].plot(times, fluxnorm, label='Data')
            axs[-1].set_xlabel('BJD [TDB]')
    
    plt.suptitle(os.path.basename(whtlt_file))
    plt.legend()

Gather the data#

if doviz:
    # Get all the filenames
    # The TA exposures were not processed through the pipeline stages above,
    # so we examine them directly from the raw (uncal) files
    uncalfiles = sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits')))
    tafiles, _, _ = sort_files(uncalfiles)
    # Science products from each pipeline stage
    rateintsfiles = sorted(glob.glob(os.path.join(det1_dir, '*rateints.fits')))
    _, scifiles, f277wfiles = sort_files(rateintsfiles)
    rateintsfiles = scifiles
    calintsfiles = sorted(glob.glob(os.path.join(spec2_dir, '*calints.fits')))
    x1d_spec2files = sorted(glob.glob(os.path.join(spec2_dir, '*x1dints.fits')))
    x1d_tso3file = sorted(glob.glob(os.path.join(tso3_dir, '*x1dints.fits')))[0]
    whtltfile = sorted(glob.glob(os.path.join(tso3_dir, '*whtlt.ecsv')))[0]

Examine Target Acquisition images#

We will not do any analysis of the TA images in this instance, but we will check them briefly to confirm that the source appears at the center of the final TACONFIRM image. These 4D uncal files contain a single integration composed of 13 groups up the ramp, so we will look at the last group of each file.

if (doviz & (len(tafiles) > 0)):
    # Prepare the figure
    fig, axs = plt.subplots(1, 4, figsize=(12, 3))
    for ii, ta_fn in enumerate(tafiles):
        # Plot the data
        with datamodels.open(ta_fn) as ta_dm:
            # Use the last group of the (single) integration in each 4D uncal file
            ta_im = ta_dm.data[0, -1]
            # Exclude outlier pixels
            sigma_clipped_data = sigma_clip(ta_im)
            imshape = ta_im.shape
            vmin = np.nanmin(sigma_clipped_data)
            vmax = np.nanmax(sigma_clipped_data)
            # Normalize, adjusting the limits a bit
            norm = simple_norm(sigma_clipped_data, vmin=vmin - (0.5*abs(vmin)), vmax=vmax + (0.5*abs(vmax)))

            im = axs[ii].imshow(ta_im, origin='lower', norm=norm)
            units = ta_dm.meta.bunit_data
            axs[ii].set_title(ta_dm.meta.exposure.type)
            axs[ii].scatter(imshape[1]/2, imshape[0]/2, marker='x', color='k')

            axs[ii].set_axis_off()

    fig.subplots_adjust(bottom=.2)
    cbar_ax = fig.add_axes([0.12, 0.0, 0.78, 0.12])
    cbar = fig.colorbar(im, cax=cbar_ax, orientation='horizontal')
    cbar.set_label(units, fontsize=12)

We can see that the source is being dithered in the first three images, and is nicely centered in the final image, as indicated by the black ‘x’ at the center of the array. The white pixels are bad pixels (due to saturation, outliers, or other data quality issues) that the Detector1 pipeline flags and replaces with NaNs in its output products.

Examine Detector1 products#

We will begin by looking at the output of the first pipeline stage, the rateints files. As the 2D spectra over the duration of the time series will look very similar, we will only plot the first.

if doviz:
    display_soss_im(rateintsfiles[0])

We are displaying these images on a logarithmic scale to be able to see the features of both the cross-dispersed spectral traces and the background. To see them on a linear scale, pass the argument lognormalize=False.
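
For example, to show the same integration with a linear stretch:

if doviz:
    display_soss_im(rateintsfiles[0], lognormalize=False)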

Orders 1, 2, and 3 are clearly visible in this image, along with some faint background sources. Order 1 extends across the full subarray and covers wavelengths from 0.9 $\mu m$ to 2.8 $\mu m$. The Order 2 trace, covering 0.6 $\mu m$ to 1.4 $\mu m$, overlaps with Order 1 at the longer-wavelength end of the traces. Order 3 peaks at ~0.6 $\mu m$, though it is very faint.

For bright targets that would saturate the detector with the longer readout times of the SUBSTRIP256 subarray, users can elect to use the SUBSTRIP96 subarray. The 96-pixel vertical dimension of this subarray covers only Order 1. For more information about the optics and spectral traces, see the GR700XD grism documentation.

Although we did not extract a spectrum from it, it is interesting to look at the F277W spectral image:

if (doviz & (len(f277wfiles) > 0)):
    display_soss_im(f277wfiles[0])

Compared to the CLEAR image above, we can clearly see that the Order 1 trace extends over only a small portion of the subarray, and Orders 2 and 3 are entirely absent. This is expected, because the left side of the subarray (in DMS orientation) corresponds to the longer wavelengths that SOSS is sensitive to, and the F277W filter covers only 2.413 $\mu m$ to 3.143 $\mu m$. This allows users to isolate the first order spectrum in the range where the first and second orders overlap considerably.

We can also see several potentially contaminating zeroth-order sources in the image (the Order 0 for our target does not fall on the detector). Characterizing these sources in the F277W image can allow users to model and remove them in the full wavelength coverage spectra, although this capability is not currently included in the pipeline. Best practices for obtaining an F277W exposure are included in the SOSS Recommended Strategies.

The 1/f noise (vertical striping) is also more apparent in this exposure because of the lower signal relative to the background.

Examine Spec2 products#

Next we will look at the outputs of the second pipeline stage: the calibrated calints images and the preliminary x1dints extracted spectra. For more information about the structure and contents of these data products, see the documentation about x1d/x1dints and calints files.

if doviz:
    display_soss_im(calintsfiles[0])

The SOSS sky background is characterized by a smooth rising gradient towards longer wavelengths, with a sharp discontinuity at around X=700 (corresponding to 2.1 $\mu m$ in Order 1) caused by the edges of the pick-off mirror (POM). The background varies with sky position and relative pointing due to the contribution of zodiacal light. Because of the variation in relative intensity on either side of the discontinuity, an empirical model cannot be simply linearly scaled to match a given observation. However, it is important to remove the background for most SOSS science use cases, so in the absence of a contemporaneous background exposure, the Spec2 bkg_subtract step finds the best match for each exposure from a library of empirical models scaled independently on either side of the discontinuity.

We can see that the background is considerably lower in the background-subtracted calints image compared to the previous rateints image, and the discontinuity is not as prominent, though some structure remains. The NIRISS team intends to continue improving the background subtraction algorithm in the future.

Now, let’s look at one of the extracted spectra.

if doviz:
    display_spectrum(x1d_spec2files[0], order=1)

Note that not all of the “features” of this spectrum are absorption lines; many are due to bad pixels in the spectrum. As mentioned in the TSO3 Pipeline section, there is an optional pixel replacement step that may reduce this type of noise.

This is the first integration of the time series, but we are especially interested in how the spectrum changes over the duration of the observations. Let’s look at the spectrum as a function of both wavelength and time:

if doviz:
    spectrum_timeseries(x1d_spec2files)

In this plot, the flux at each wavelength is normalized so that the decrease in flux caused by the transiting exoplanet is emphasized.

If we pass the argument normalize=False to the plotting function, the transit is much less obvious but we can see some interesting spectral features, as well as how the flux decreases at longer wavelengths.

if doviz:
    spectrum_timeseries(x1d_spec2files, normalize=False)

Examine TSO3 products#

Lastly, we will examine the products of the third and final pipeline stage: the final x1dints spectra and the white light (whtlt) curve for orders 1 and 2.

if doviz:
    display_spectrum(x1d_tso3file, order=1)
if doviz:
    display_spectrum(x1d_tso3file, order=2)

By eye, there is not much difference apparent between the Stage 2 x1dints files and the Stage 3 x1dints files; the main difference is that the outlier detection step has now been run. However, it is worth noting that in Order 2, measurements beyond ~1 $\mu m$ are not reliable due to contamination from Order 1.

We can also make the 2d transit plot again, using the final extracted spectrum:

if doviz:
    spectrum_timeseries(x1d_tso3file)

Lastly, let’s look at the white light curve. In this file, the flux at each wavelength has been summed to give us a single value at each timestamp.

We will plot the white light curve for orders 1 and 2, along with a simple transit model using the batman package (Kreidberg 2015). We adopt all of our planetary parameters from McGruder et al. 2023, except for the ratio of the planet radius to the stellar radius ($R_{p}/R_{*}$), which is from Kokori et al. 2023.

if doviz:
    # Set up the model parameters
    params = batman.TransitParams()
    params.t0 = 2456258.06272 - 2400000.5       # time of inferior conjunction
    params.per = 3.4252567                        # orbital period
    params.rp = 0.1175                            # planet radius (in units of stellar radii)
    params.a = 9.13                               # semi-major axis (in units of stellar radii)
    params.inc = 85.45                            # orbital inclination (in degrees)
    params.ecc = 0.                               # eccentricity
    params.w = 90.                                # longitude of periastron (in degrees)
    params.u = [0.16, 0.26]                       # limb darkening coefficients [u1, u2]
    params.limb_dark = "quadratic"                # limb darkening model

    # Display the light curves and models
    display_lightcurve(whtltfile, order=[1, 2], modelparams=params)

Since the model parameters were derived from a different reduction that used the same data, it makes sense that this model looks like a pretty good fit to the data by eye. Further model fitting is beyond the scope of this notebook, but can also be done using batman or a variety of other open-source software packages.

