
This notebook currently fails to execute; use it as a reference only.

NIRCam Imaging Pipeline Notebook#

Authors: B. Hilbert, based on the NIRISS imaging notebook by R. Diaz
Last Updated: May 5, 2025
Pipeline Version: 1.18.1 (Build 11.3)

Purpose:
This notebook provides a framework for processing generic Near-Infrared Camera (NIRCam) Imaging data through all three James Webb Space Telescope (JWST) pipeline stages. Data is assumed to be located in a folder structure following the paths set up below. It should not be necessary to edit any cells other than in the Configuration section unless modifying the standard pipeline processing options.

Data:
This example uses a dataset from Program ID 2739 (PI: Pontoppidan), a Cycle 1 Outreach program. We focus on the data from Observation 001 Visit 002, in which M16, the “Pillars of Creation,” was observed. The example input data will be downloaded automatically unless disabled (i.e., to use local files instead).

JWST pipeline version and CRDS context:
This notebook was written for the calibration pipeline version given above. It sets the CRDS context to the latest context in the JWST Calibration Reference Data System (CRDS) associated with that pipeline version. If you use a different pipeline version or CRDS context, please read the relevant release notes (for the pipeline and for CRDS) for possibly relevant changes.

Updates:
This notebook is regularly updated as improvements are made to the pipeline. Find the most up to date version of this notebook at: https://github.com/spacetelescope/jwst-pipeline-notebooks/

Recent Changes:
September 5, 2024: original notebook created
November 11, 2024: Comment out line to set the context
November 18, 2024: Do not require both SW and LW user-provided data
November 22, 2024: Updates to workflow when skipping pipeline modules
January 31, 2025: Update to build 11.2, update Jdaviz Links Control to Orientation call
February 25, 2025: Add optional call to clean_flicker_noise
April 2, 2025: Update Jdaviz call to work with Jdaviz 4.2.1
May 5, 2025: Updated to jwst 1.18.0 (no significant changes)


Table of Contents#

  1. Configuration

  2. Package Imports

  3. Demo Mode Setup (ignore if not using demo data)

  4. Directory Setup

  5. Detector1 Pipeline

  6. Image2 Pipeline

  7. Image3 Pipeline

  8. Visualize the resampled images

  9. Visualize Detected Sources

  10. Notes


1. Configuration#


Set basic configuration for running the notebook.

Install dependencies and parameters#

To make sure that the pipeline version is compatible with the steps discussed below and that the required dependencies and packages are installed, you can create a fresh conda environment and install the provided requirements.txt file:

conda create -n nircam_imaging_pipeline python=3.11
conda activate nircam_imaging_pipeline
pip install -r requirements.txt
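
To confirm that the environment matches the version listed at the top of this notebook, you can check the installed pipeline version (an optional sanity check):

python -c "import jwst; print(jwst.__version__)"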

Set the basic parameters to use with this notebook. These affect which data are used, where the data are located (if already on disk), and which pipeline modules are run on the data. The parameters are:

  • demo_mode

  • directories with data

  • pipeline modules

# Basic import necessary for configuration
import os
Note that demo_mode must be set appropriately below.

Set demo_mode = True to run in demonstration mode. In this mode this notebook will download example data from the Barbara A. Mikulski Archive for Space Telescopes (MAST) and process it through the pipeline. This will all happen in a local directory unless modified in Section 3 below.

Set demo_mode = False if you want to process your own data that has already been downloaded and provide the location of the data.

# Set parameters for demo_mode, data directories, and processing steps.

# -----------------------------Demo Mode---------------------------------
demo_mode = True

if demo_mode:
    print('Running in demonstration mode using online example data!')

# --------------------------User Mode Directories------------------------
# If demo_mode = False, look for user data in these paths
if not demo_mode:
    # Set directory paths for processing specific data; these will need
    # to be changed to your local directory setup (below are given as
    # examples)
    user_home_dir = os.path.expanduser('~')

    # Point to where science observation data are
    # Assumes uncalibrated data in <sci_dir>/uncal/ and results in stage1,
    # stage2, stage3 directories
    sci_dir = os.path.join(user_home_dir, 'PID2739/Obs001/')

# --------------------------Set Processing Steps--------------------------
# Individual pipeline stages can be turned on/off here.  Note that a later
# stage won't be able to run unless data products have already been
# produced from the prior stage.

# Science processing
dodet1 = True  # calwebb_detector1
doimage2 = True  # calwebb_image2
doimage3 = True  # calwebb_image3

Set CRDS context and server#

Before importing CRDS and JWST modules, we need to configure our environment. This includes defining a CRDS cache directory in which to keep the reference files that will be used by the calibration pipeline.

If the root directory for the local CRDS cache directory has not been set already, it will be set to create one in the home directory.

# ------------------------Set CRDS context and paths----------------------

# Each version of the calibration pipeline is associated with a specific CRDS
# context file. The pipeline will select the appropriate context file behind
# the scenes while running. However, if you wish to override the default context
# file and run the pipeline with a different context, you can set that using
# the CRDS_CONTEXT environment variable. Here we show how this is done,
# although we leave the line commented out in order to use the default context.
# If you wish to specify a different context, uncomment the line below.
#%env CRDS_CONTEXT jwst_1293.pmap

# Check whether the local CRDS cache directory has been set.
# If not, set it to the user home directory
if (os.getenv('CRDS_PATH') is None):
    os.environ['CRDS_PATH'] = os.path.join(os.path.expanduser('~'), 'crds')
# Check whether the CRDS server URL has been set.  If not, set it.
if (os.getenv('CRDS_SERVER_URL') is None):
    os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'

# Echo CRDS path in use
print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")
if os.getenv('CRDS_CONTEXT'):
    print(f"CRDS CONTEXT: {os.environ['CRDS_CONTEXT']}")

2. Package Imports#

# Use the entire available screen width for this notebook
from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))
# Basic system utilities for interacting with files
# ----------------------General Imports------------------------------------
import glob
import time
from pathlib import Path

# Numpy for doing calculations
import numpy as np

# To display full output of a cell, not just the last result
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# -----------------------Astroquery Imports--------------------------------
# ASCII files, and downloading demo files
from astroquery.mast import Observations

# Astropy routines for visualizing detected sources:
from astropy.table import Table
from astropy.coordinates import SkyCoord

# ------------ Pipeline and Visualization Imports ------------------------

# for JWST calibration pipeline
import jwst
import crds

from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Image2Pipeline
from jwst.pipeline import Image3Pipeline

# JWST pipeline utilities
from asdf import AsdfFile
from jwst import datamodels
from jwst.associations import asn_from_list  # Tools for creating association files
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base  # Definition of a Lvl3 association file

# For visualizing images
from jdaviz import Imviz

# Echo pipeline version and CRDS context in use
print(f"JWST Calibration Pipeline Version: {jwst.__version__}")
print(f"Using CRDS Context: {crds.get_context_name('jwst')}")
# Start a timer to keep track of runtime
time0 = time.perf_counter()

3. Demo Mode Setup (ignore if not using demo data)#


If running in demonstration mode, set up the program information to retrieve the uncalibrated data automatically from MAST using astroquery. MAST allows the flexibility of searching by proposal ID and observation ID instead of by filenames.

For illustrative purposes, we focus on data taken using the NIRCam F200W and F444W filters and start with uncalibrated data products. The files are named jw02739001002_02105_0000<dither>_nrc<det>_uncal.fits, where dither refers to the dither step number and det is the detector name. Throughout this notebook we will refer to the F200W data as SW data and the F444W data as LW data.

More information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html
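
As an illustration of this convention, the short sketch below splits one of the demo filenames into its components. The slicing follows the naming-convention documentation linked above; the snippet is purely illustrative.

# Illustrative breakdown of a JWST uncal filename (for demonstration only)
fname = 'jw02739001002_02105_00001_nrca2_uncal.fits'
root = fname.split('_uncal')[0]
obs_id, visit_group_seq, exposure, detector = root.split('_')
print(f"Program: {obs_id[2:7]}, Observation: {obs_id[7:10]}, Visit: {obs_id[10:13]}")
print(f"Visit group/parallel/activity: {visit_group_seq}, Exposure (dither): {exposure}, Detector: {detector}")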

# Set up the program information and paths for demo program
if demo_mode:
    print('Running in demonstration mode and will download example data from MAST!')
    program = "02739"
    sci_observtn = "001"
    
    data_dir = os.path.join('.', 'nrc_im_demo_data')
    download_dir = data_dir
    sci_dir = os.path.join(data_dir, 'Obs' + sci_observtn)
    uncal_dir = os.path.join(sci_dir, 'uncal')

    # Ensure the filepaths for input data exist (makedirs also creates
    # any missing parent directories, including data_dir)
    if not os.path.exists(uncal_dir):
        os.makedirs(uncal_dir)

Identify the list of science (SCI) uncalibrated files associated with the visits.

We work one filter at a time so that we can more easily filter by detector and keep only a subset of the module A files.

First download the F200W data.

# Obtain a list of observation IDs for the specified demo program
if demo_mode:
    # Science data
    sci_obs_id_table = Observations.query_criteria(instrument_name=["NIRCAM/IMAGE"],
                                                   provenance_name=["CALJWST"],  # Executed observations
                                                   filters=['F200W'],  # Data for Specific Filter
                                                   obs_id=['jw' + program + '-o' + sci_observtn + '*']
                                                   )
if demo_mode:
    sci_obs_id_table
# Turn the list of visits into a list of uncalibrated data files
if demo_mode:
    # Define types of files to select
    file_dict = {'uncal': {'product_type': 'SCIENCE',
                           'productSubGroupDescription': 'UNCAL',
                           'calib_level': [1]}}

    # Science files
    sci_files_to_download = []
    # Loop over visits identifying uncalibrated files that are associated
    # with them
    for exposure in sci_obs_id_table:
        products = Observations.get_product_list(exposure)
        for filetype, query_dict in file_dict.items():
            filtered_products = Observations.filter_products(products, productType=query_dict['product_type'],
                                                             productSubGroupDescription=query_dict['productSubGroupDescription'],
                                                             calib_level=query_dict['calib_level'])
            sci_files_to_download.extend(filtered_products['dataURI'])

    # To limit data volume, keep only files from visit 002, dithers 1 and 2, and detectors NRCA2 and NRCA4
    sw_sci_files_to_download = [fname for fname in sci_files_to_download if 'jw02739001002_02105' in fname and 
                                ('nrca2' in fname or 'nrca4' in fname) and ('00001' in fname or '00002' in fname)]
    sw_sci_files_to_download = sorted(sw_sci_files_to_download)
    print(f"Science files selected for downloading: {len(sw_sci_files_to_download)}")
# List the SW files to download
if demo_mode:
    sw_sci_files_to_download

Now repeat the process for the F444W data.

# Obtain a list of observation IDs for the specified demo program
if demo_mode:
    # Science data
    sci_obs_id_table = Observations.query_criteria(instrument_name=["NIRCAM/IMAGE"],
                                                   provenance_name=["CALJWST"],  # Executed observations
                                                   filters=['F444W'],  # Data for Specific Filter
                                                   obs_id=['jw' + program + '-o' + sci_observtn + '*']
                                                   )
if demo_mode:
    sci_obs_id_table
# Turn the list of visits into a list of uncalibrated data files
if demo_mode:
    # Define types of files to select
    file_dict = {'uncal': {'product_type': 'SCIENCE',
                           'productSubGroupDescription': 'UNCAL',
                           'calib_level': [1]}}

    # Science files
    sci_files_to_download = []
    # Loop over visits identifying uncalibrated files that are associated
    # with them
    for exposure in sci_obs_id_table:
        products = Observations.get_product_list(exposure)
        for filetype, query_dict in file_dict.items():
            filtered_products = Observations.filter_products(products, productType=query_dict['product_type'],
                                                             productSubGroupDescription=query_dict['productSubGroupDescription'],
                                                             calib_level=query_dict['calib_level'])
            sci_files_to_download.extend(filtered_products['dataURI'])

    # To limit data volume, keep only files from visit 002, dithers 1 and 2, and only A-module
    lw_sci_files_to_download = [fname for fname in sci_files_to_download if 'jw02739001002_02105' in fname and 
                                'nrca' in fname and ('00001' in fname or '00002' in fname)]
    lw_sci_files_to_download = sorted(lw_sci_files_to_download)
    print(f"Science files selected for downloading: {len(lw_sci_files_to_download)}")
# List the LW files to download
if demo_mode:
    lw_sci_files_to_download
# Full list of science files to download
if demo_mode:
    sci_files_to_download = sw_sci_files_to_download + lw_sci_files_to_download
    sci_files_to_download

Download all the uncal files and place them into the appropriate directories.

Warning: If this notebook is halted during this step, the downloaded files may be incomplete and cause crashes later on!
# Download the demo data if it does not already exist
if demo_mode:
    for filename in sci_files_to_download:
        sci_manifest = Observations.download_file(filename,
                                                  local_path=os.path.join(uncal_dir, Path(filename).name))
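
As a guard against the incomplete-download failure mode described in the warning above, the optional check below (a minimal sketch) verifies that each downloaded file opens cleanly before proceeding.

# Optional sanity check: confirm each downloaded uncal file opens cleanly
if demo_mode:
    from astropy.io import fits
    for downloaded in sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits'))):
        try:
            with fits.open(downloaded) as hdulist:
                hdulist.verify()
        except Exception as err:
            print(f'Problem with {downloaded}: {err}. Consider re-downloading it.')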

4. Directory Setup#


Set up detailed paths to input/output stages here.

# Define output subdirectories to keep science data products organized
uncal_dir = os.path.join(sci_dir, 'uncal')  # Uncalibrated pipeline inputs should be here
det1_dir = os.path.join(sci_dir, 'stage1')  # calwebb_detector1 pipeline outputs will go here
image2_dir = os.path.join(sci_dir, 'stage2')  # calwebb_image2 pipeline outputs will go here
image3_dir = os.path.join(sci_dir, 'stage3')  # calwebb_image3 pipeline outputs will go here

# We need to check that the desired output directories exist, and if not
# create them
if not os.path.exists(det1_dir):
    os.makedirs(det1_dir)
if not os.path.exists(image2_dir):
    os.makedirs(image2_dir)
if not os.path.exists(image3_dir):
    os.makedirs(image3_dir)

Look at the first file to determine exposure parameters and practice using JWST datamodels#

# List uncal files
uncal_files = sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits')))
    
# Separate SW from LW files
sw_uncal_files = [uncfile for uncfile in uncal_files if 'long' not in uncfile]
lw_uncal_files = [uncfile for uncfile in uncal_files if 'long' in uncfile]

colnames = ('Instrument', 'Filter', 'Pupil', 'Number of Integrations', 'Number of Groups',
            'Readout pattern', 'Dither position number')
dtypes = ('S7', 'S10', 'S10', 'i4', 'i4', 'S15', 'i4')
meta_check = Table(names=(colnames), dtype=dtypes)

# Open example files and get metadata for display
if len(sw_uncal_files) > 0:
    sw_examine = datamodels.open(sw_uncal_files[0])
    sw_row = [sw_examine.meta.instrument.name, sw_examine.meta.instrument.filter,
              sw_examine.meta.instrument.pupil, sw_examine.meta.exposure.nints,
              sw_examine.meta.exposure.ngroups, sw_examine.meta.exposure.readpatt,
              sw_examine.meta.dither.position_number]
    meta_check.add_row(sw_row)

if len(lw_uncal_files) > 0:
    lw_examine = datamodels.open(lw_uncal_files[0])
    lw_row = [lw_examine.meta.instrument.name, lw_examine.meta.instrument.filter,
              lw_examine.meta.instrument.pupil, lw_examine.meta.exposure.nints,
              lw_examine.meta.exposure.ngroups, lw_examine.meta.exposure.readpatt,
              lw_examine.meta.dither.position_number]
    meta_check.add_row(lw_row)

# Print out exposure info
meta_check

The table above shows basic exposure information from the first shortwave as well as the first longwave file. When using the demo data, we confirm that the data file is for the NIRCam instrument using the F200W and F444W filters in the Filter Wheel crossed with the CLEAR filter in the Pupil Wheel. This observation uses the BRIGHT1 readout pattern, 8 groups per integration, and 1 integration per exposure. This data file is the 1st dither position in this exposure sequence. For more information about how JWST exposures are defined by up-the-ramp sampling, see the Understanding Exposure Times JDox article.

This metadata will be the same for all exposures in this observation, except for the dither position number.
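
If you would like to confirm this rather than rely on the first file of each channel, an optional loop like the one below (a minimal sketch reusing the column definitions above) gathers the same metadata from every uncal file.

# Optional: collect the same metadata from every uncal file
meta_all = Table(names=colnames, dtype=dtypes)
for uncfile in uncal_files:
    with datamodels.open(uncfile) as model:
        meta_all.add_row([model.meta.instrument.name, model.meta.instrument.filter,
                          model.meta.instrument.pupil, model.meta.exposure.nints,
                          model.meta.exposure.ngroups, model.meta.exposure.readpatt,
                          model.meta.dither.position_number])
meta_all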

# Print out the time benchmark
time_det1 = time.perf_counter()
print(f"Runtime so far: {time_det1 - time0:0.0f} seconds")

5. Detector1 Pipeline#

Run the datasets through the Detector1 stage of the pipeline to apply detector level calibrations and create a countrate data product where slopes are fitted to the integration ramps. These *_rate.fits products are 2D (nrows x ncols), averaged over all integrations. 3D countrate data products (*_rateints.fits) are also created (nintegrations x nrows x ncols) which have the fitted ramp slopes for each integration.

By default, all steps in the Detector1 stage of the pipeline are run for NIRCam except the ipc correction step and the gain_scale step. Note that the persistence step has been turned off by default starting with CRDS context jwst_1264.pmap. This step does not automatically correct the science data for persistence. The persistence step creates a *_trapsfilled.fits file which is a model that records the number of traps filled at each pixel at the end of an exposure. This file would be used as an input to the persistence step, via the input_trapsfilled argument, to correct the subsequent science exposure for persistence. Since persistence is not well calibrated for NIRCam, the step has been turned off in order to speed up calibration and to not create empty *_trapsfilled.fits files. This step can be turned on when running the pipeline in Python by doing:

rate_result = Detector1Pipeline.call(uncal, steps={'persistence': {'skip': False}})

or, as shown in the cell below, using a dictionary.

As of CRDS context jwst_1155.pmap, the jump step of the Detector1 stage of the pipeline will remove signal associated with snowballs in the NIRCam imaging mode. This correction is turned on using the parameter expand_large_events=True. This and other parameters related to the snowball correction are specified in the pars-jumpstep parameter reference file. Users may wish to alter these parameters to optimize removal of snowball residuals; available parameters are discussed in the Detection and Flagging of Showers and Snowballs in JWST Technical Report (Regan 2023).

# Set up a dictionary to define how the Detector1 pipeline should be configured

# Boilerplate dictionary setup
det1dict = {}
det1dict['group_scale'], det1dict['dq_init'], det1dict['saturation'] = {}, {}, {}
det1dict['ipc'], det1dict['superbias'], det1dict['refpix'] = {}, {}, {}
det1dict['linearity'], det1dict['persistence'], det1dict['dark_current'] = {}, {}, {}
det1dict['charge_migration'], det1dict['jump'], det1dict['clean_flicker_noise'] = {}, {}, {}
det1dict['ramp_fit'], det1dict['gain_scale'] = {}, {}

# Overrides for whether or not certain steps should be skipped
# skipping the persistence step
det1dict['persistence']['skip'] = True

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#det1dict['dq_init']['override_mask'] = 'myfile.fits' # Bad pixel mask
#det1dict['saturation']['override_saturation'] = 'myfile.fits'  # Saturation
#det1dict['reset']['override_reset'] = 'myfile.fits'  # Reset
#det1dict['linearity']['override_linearity'] = 'myfile.fits'  # Linearity
#det1dict['rscd']['override_rscd'] = 'myfile.fits'  # RSCD
#det1dict['dark_current']['override_dark'] = 'myfile.fits'  # Dark current subtraction
#det1dict['jump']['override_gain'] = 'myfile.fits'  # Gain used by jump step
#det1dict['ramp_fit']['override_gain'] = 'myfile.fits'  # Gain used by ramp fitting step
#det1dict['jump']['override_readnoise'] = 'myfile.fits'  # Read noise used by jump step
#det1dict['ramp_fit']['override_readnoise'] = 'myfile.fits'  # Read noise used by ramp fitting step

# Turn on multi-core processing (This is off by default). Choose what fraction
# of cores to use (quarter, half, all, or an integer number)
det1dict['jump']['maximum_cores'] = 'half'

# Explicitly turn on the snowball correction (even though it is on by default)
det1dict['jump']['expand_large_events'] = True
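
# Other snowball-related jump step parameters can be tuned as well. The values
# below are illustrative only, not recommendations; see the jump step
# documentation for the defaults in your pipeline version.
#det1dict['jump']['min_jump_area'] = 15  # Min area (pixels) of a CR group treated as a large event
#det1dict['jump']['expand_factor'] = 2.0  # Factor by which flagged snowball regions are grown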

# Turn on 1/f correction if desired
# For guidance see https://jwst-docs.stsci.edu/known-issues-with-jwst-data/1-f-noise
#det1dict['clean_flicker_noise']['skip'] = False
#det1dict['clean_flicker_noise']['fit_method'] = 'median' # 'median' or 'fft'
#det1dict['clean_flicker_noise']['background_method'] = 'median' # 'median' or 'model'
#det1dict['clean_flicker_noise']['fit_by_channel'] = False

Run the Detector1 pipeline on all input data, regardless of filter.

# Run Detector1 stage of pipeline, specifying:
# output directory to save *_rate.fits files
# save_results flag set to True so the rate files are saved
if dodet1:
    for uncal in uncal_files:
        rate_result = Detector1Pipeline.call(uncal, output_dir=det1_dir, steps=det1dict, save_results=True)
else:
    print('Skipping Detector1 processing')
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Detector1: {time1 - time_det1:0.0f} seconds")

Exploring the data#

Identify *_rate.fits files and verify which pipeline steps were run and which calibration reference files were applied.

The header contains information about which calibration steps were completed and skipped and which reference files were used to process the data.

if dodet1:
    # find rate files
    rate_files = sorted(glob.glob(os.path.join(det1_dir, '*_rate.fits')))

    # Read in file as datamodel
    rate_f = datamodels.open(rate_files[0])

    # Check which steps were run
    rate_f.meta.cal_step.instance
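
The same step-status information is also recorded in the primary FITS header, in keywords beginning with S_ (e.g., S_DARK, S_JUMP). A quick, optional way to inspect those keywords directly:

# Optional: inspect the step-status keywords directly from the FITS header
if dodet1:
    from astropy.io import fits
    header = fits.getheader(rate_files[0])
    for key in header:
        if key.startswith('S_'):
            print(f'{key}: {header[key]}')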

For this particular rate file, show which reference files were used to calibrate the dataset. Note that these files will be different for each NIRCam detector.

if dodet1:
    rate_f.meta.ref_file.instance

6. Image2 Pipeline#

In the Image2 stage of the pipeline, calibrated unrectified data products are created (*_cal.fits or *_calints.fits files, depending on whether the input files are *_rate.fits or *_rateints.fits).

In this pipeline processing stage, the world coordinate system (WCS) is assigned, the data are flat fielded, and a photometric calibration is applied to convert from units of countrate (ADU/s) to surface brightness (MJy/sr).
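
Because the photom step converts the data to surface brightness units, summed pixel values can later be converted to flux density using the average pixel area stored in the metadata. A minimal sketch of that conversion is shown below; cal_model and pixel_sum_mjy_sr are hypothetical placeholders for an open calibrated datamodel and a measured pixel sum.

# Sketch: convert a summed surface brightness (MJy/sr over some pixels) to Jy
#pixar_sr = cal_model.meta.photometry.pixelarea_steradians  # Average sr per pixel
#flux_jy = pixel_sum_mjy_sr * pixar_sr * 1e6  # MJy -> Jy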

By default, the background subtraction step and the resampling step are turned off for NIRCam. Background subtraction is turned off because there is no background template for the imaging mode, and the local background is subtracted as part of the photometry performed in the source catalog step of the Image3 pipeline.

By default, the resampling step occurs during the Image3 stage. While it can be run on individual images in the Image2 stage, e.g., to prepare for generating a source catalog for each image, the default behavior is to run the step only in the Image3 stage, where multiple images are combined into a final mosaic after the outlier detection step flags bad pixels.

To turn on the resampling step in the Image2 stage, uncomment the line in the dictionary below which sets image2dict['resample']['skip'] = False.

time_image2 = time.perf_counter()
# Set up a dictionary to define how the Image2 pipeline should be configured.

# Boilerplate dictionary setup
image2dict = {}
image2dict['assign_wcs'], image2dict['flat_field'] = {}, {}
image2dict['photom'], image2dict['resample'] = {}, {}

# Overrides for whether or not certain steps should be skipped (example)
#image2dict['resample']['skip'] = False

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#image2dict['assign_wcs']['override_distortion'] = 'myfile.asdf' # Spatial distortion (ASDF file)
#image2dict['assign_wcs']['override_filteroffset'] = 'myfile.asdf' # Imager filter offsets (ASDF file)
#image2dict['assign_wcs']['override_specwcs'] = 'myfile.asdf' # Spectral distortion (ASDF file)
#image2dict['assign_wcs']['override_wavelengthrange'] = 'myfile.asdf' # Wavelength channel mapping (ASDF file)
#image2dict['flat_field']['override_flat'] = 'myfile.fits' # Pixel flatfield
#image2dict['photom']['override_photom'] = 'myfile.fits' # Photometric calibration array

Find and sort all of the input files, ensuring use of absolute paths

sstring = os.path.join(det1_dir, 'jw*rate.fits')  # Use files from the detector1 output folder
rate_files = sorted(glob.glob(sstring))
rate_files = [os.path.abspath(fname) for fname in rate_files]

print(f"Found  {len(rate_files)} science files")
# List rate files
rate_files

Run the Image2 pipeline on all of the rate files, regardless of filter. Note that if you have exposures with multiple integrations and you wish to keep the integrations separate, you should call the pipeline on the *rateints.fits files, rather than the *rate.fits files.

# Run Image2 stage of pipeline, specifying:
# output directory to save *_cal.fits files
# save_results flag set to True so the cal files are saved

if doimage2:
    for rate in rate_files:
        cal_result = Image2Pipeline.call(rate, output_dir=image2_dir, steps=image2dict, save_results=True)
else:
    print("Skipping Image2 processing.")
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Image2: {time1 - time_image2:0.0f} seconds")

Verify which pipeline steps were run#

if doimage2:
    # Identify *_cal.fits file products
    cal_files = sorted(glob.glob(os.path.join(image2_dir, '*_cal.fits')))

    # Select first file to gather information
    cal_f = datamodels.open(cal_files[0])

    # Check which steps were run:
    cal_f.meta.cal_step.instance

Check which reference files were used to calibrate the first file. Some of these will be detector-dependent.

if doimage2:
    cal_f.meta.ref_file.instance

7. Image3 Pipeline#

In the Image3 stage of the pipeline, the individual *_cal.fits files for each filter are combined into a single distortion-corrected image. Unlike the previous stages, we must run the Image3 stage separately for the files from each filter and channel (i.e., shortwave vs. longwave).

First, we need to create association files to inform the pipeline which files are linked together for each filter.

By default, the Image3 stage of the pipeline performs the following steps on NIRCam data:

  • tweakreg - creates source catalogs of pointlike sources for each input image. The source catalog for each input image is compared to each other to derive coordinate transforms to align the images relative to each other.

    • tweakreg has many input parameters that can be adjusted to improve the image alignment in cases where the default values do not perform well.

    • One tweakreg parameter that is not set by default but can be very useful is abs_refcat. When this parameter is set to GAIADR3, the tweakreg step performs an absolute astrometric correction of the data using the GAIA data release 3 catalog. In cases where multiple unsaturated GAIA stars are present in the input images, this can improve the absolute astrometric alignment. However, in sparse or very crowded fields, this can potentially result in poor performance, so users are encouraged to check astrometric accuracy and revisit this step if necessary.

    • As of pipeline version 1.14.0, the default source-finding algorithm in the tweakreg step is IRAFStarFinder. Other options include DAOStarFinder, whose results are not as good in cases where the PSF is undersampled, such as in the blue filters of the NIRCam shortwave channel, and the photutils segmentation SourceFinder, which does not assume sources are point-like.

  • skymatch - measures the background level from the sky to use as input into the subsequent outlier detection and resample steps.

  • outlier detection - flags any remaining cosmic rays, bad pixels, or other artifacts not already flagged during the Detector1 stage of the pipeline, using all input images to create a median image so that outliers in individual images can be identified.

  • resample - resamples each input image based on its WCS and distortion information and creates a single undistorted image.

  • source catalog - creates a catalog of detected sources along with photometric results and morphologies (i.e., point-like vs extended). These catalogs are generally useful for quick checks, but optimization is likely needed for specific science cases. Users may wish to experiment with changing the snr_threshold and deblend options. Modifications to the following parameters will not significantly improve data quality and it is advised to keep them at their default values: aperture_ee1, aperture_ee2, aperture_ee3, ci1_star_threshold, ci2_star_threshold.

time_image3 = time.perf_counter()
# Set up a dictionary to define how the Image3 pipeline should be configured

# Boilerplate dictionary setup
image3dict = {}
image3dict['assign_mtwcs'], image3dict['tweakreg'], image3dict['skymatch'] = {}, {}, {}
image3dict['outlier_detection'], image3dict['resample'], image3dict['source_catalog'] = {}, {}, {}

# Overrides for whether or not certain steps should be skipped (example)
#image3dict['outlier_detection']['skip'] = True

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#image3dict['source_catalog']['override_apcorr'] = 'myfile.fits'  # Aperture correction parameters
#image3dict['source_catalog']['override_abvegaoffset'] = 'myfile.asdf'  # Data to convert from AB to Vega magnitudes (ASDF file)

# Turn on alignment to GAIA in the tweakreg step
# For data such as these demo data, where there are some heavily saturated stars in the
# field of view, alignment to GAIA sometimes does not work well because tweakreg
# does a poor job of finding the centroids of the saturated sources.
#image3dict['tweakreg']['abs_refcat'] = 'GAIADR3'

Find and sort all of the input files, ensuring use of absolute paths. Keep files for the two filters separated.

# Science Files need the cal.fits files
sw_sstring = os.path.join(image2_dir, 'jw*nrc??_cal.fits')     # shortwave files. Detectors a1-a4, b1-b4
lw_sstring = os.path.join(image2_dir, 'jw*nrc*long_cal.fits')  # longwave files. Detectors along, blong 

# Identify SW and LW cal files
sw_cal_files = sorted(glob.glob(sw_sstring))
lw_cal_files = sorted(glob.glob(lw_sstring))

# Expand the relative paths into absolute paths
sw_cal_files = [os.path.abspath(fname) for fname in sw_cal_files]
lw_cal_files = [os.path.abspath(fname) for fname in lw_cal_files]

print(f'Found {len(sw_cal_files)} shortwave science files to process')
print(f'Found {len(lw_cal_files)} longwave science files to process')

Create Association File#

An association file lists the files to calibrate together in Stage3 of the pipeline. Note that association files are available for download from MAST, with filenames of *_asn.json. Here we show how to create an association file that points to the data products created in the steps above. This is useful in cases where you want to work with a set of data different from that in the association files from MAST.

Note that the output products will have a rootname that is specified by the product_name in the association file. For this tutorial, the rootnames of the output products will be image3_sw for filter F200W and image3_lw for filter F444W.

# List of data to use
print('List of SW cal files to use:')
sw_cal_files
print('\nList of LW cal files to use:')
lw_cal_files
# Create Level 3 Association for SW products
do_swimage3 = False
if doimage3:
    if len(sw_cal_files) > 0:
        # Only create an association file if there are SW data files to process
        do_swimage3 = True
        sw_product_name = 'image3_sw'
        sw_association = asn_from_list.asn_from_list(sw_cal_files,
                                                     rule=DMS_Level3_Base,
                                                     product_name=sw_product_name)
    
        sw_association.data['asn_type'] = 'image3'
        program = datamodels.open(sw_cal_files[0]).meta.observation.program_number
        sw_association.data['program'] = program
    
        # Format association as .json file
        sw_asn_filename, sw_serialized = sw_association.dump(format="json")

        # Write out association file
        sw_association_im3 = os.path.join(sci_dir, sw_asn_filename)
        with open(sw_association_im3, "w") as fd:
            fd.write(sw_serialized)
# Create Level 3 Associations for LW products
do_lwimage3 = False
if doimage3:
    if len(lw_cal_files) > 0:
        # Only create an association file if there are LW data files to process
        do_lwimage3 = True
        lw_product_name = 'image3_lw'
        lw_association = asn_from_list.asn_from_list(lw_cal_files,
                                                     rule=DMS_Level3_Base,
                                                     product_name=lw_product_name)
    
        lw_association.data['asn_type'] = 'image3'
        program = datamodels.open(lw_cal_files[0]).meta.observation.program_number
        lw_association.data['program'] = program
    
        # Format association as .json file
        lw_asn_filename, lw_serialized = lw_association.dump(format="json")

        # Write out association file. Note that you can use your own filename in
        # place of lw_asn_filename and everything will still work.
        lw_association_im3 = os.path.join(sci_dir, lw_asn_filename)
        with open(lw_association_im3, "w") as fd:
            fd.write(lw_serialized)
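
Association files are plain JSON, so you can read one back to confirm which exposures it contains. The optional check below lists the members of the LW association.

# Optional: read the association file back in and list its members
import json
if doimage3 and do_lwimage3:
    with open(lw_association_im3) as fd:
        asn_data = json.load(fd)
    for product in asn_data['products']:
        print(product['name'])
        for member in product['members']:
            print('  ', member['expname'], member['exptype'])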

Run Image3 stage of the pipeline#

For each set of grouped exposures in an association file, the Image3 stage of the pipeline will produce:

  • a *_crf.fits file produced by the outlier_detection step, where the DQ array marks the pixels flagged as outliers,

  • a final combined, rectified image with name *_i2d.fits,

  • a source catalog with name *_cat.ecsv,

  • a segmentation map file (*_segm.fits), which has integer values at the pixel locations where a source is detected; the pixel values match the source ID numbers in the catalog.

Run Image3 on the LW data#

# Run Stage3 on the LW data
if doimage3 and do_lwimage3:
    lw_i2d_result = Image3Pipeline.call(lw_association_im3, output_dir=image3_dir, steps=image3dict, save_results=True)
else:
    print('Skipping Image3 LW processing')

Some users wish to resample data from multiple filters onto the same WCS and pixel grid in order to create color images or help with subsequent analyses. To do that, we’ll save the gWCS from the *_i2d.fits file created from the LW data above into an asdf file.

if doimage3 and do_lwimage3:
    # First we identify the dataset and read it using datamodels.
    lw_i2d_file = os.path.join(image3_dir, f'{lw_product_name}_i2d.fits')
    lw_data = datamodels.open(lw_i2d_file)
    
    # Pull out the resulting gWCS and save it in an asdf file
    tree = {"wcs": lw_data.meta.wcs}
    wcs_file = AsdfFile(tree)
    gwcs_filename = os.path.join(image3_dir, 'lw_gwcs.asdf')
    print(f'Saving gWCS into {gwcs_filename}')
    wcs_file.write_to(gwcs_filename)

    # Get the size of the mosaic image
    ysize, xsize = lw_data.data.shape

Run Image3 on the SW data#

Prepare to call the Image3 pipeline on the SW data. If you wish to resample the SW data onto the same pixel grid as the LW data above, uncomment the lines below. This will tell the resample step to use the gWCS and the array size from the LW data when resampling the SW data.

# Uncomment this cell in order to resample the SW data onto the same pixel grid as the LW data
#if doimage3:
#    image3dict['resample']['output_wcs'] = gwcs_filename
#    image3dict['resample']['output_shape'] = (xsize, ysize)
if doimage3 and do_swimage3:
    sw_i2d_result = Image3Pipeline.call(sw_association_im3, output_dir=image3_dir, steps=image3dict, save_results=True)
else:
    print('Skipping Image3 SW processing')
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Image3: {time1 - time_image3:0.0f} seconds")

Verify which pipeline steps were run#

# Identify *_i2d file and open as datamodel
if doimage3:
    if do_swimage3:
        sw_i2d_file = os.path.join(image3_dir, f'{sw_product_name}_i2d.fits')
        i2d_sw_model = datamodels.open(sw_i2d_file)
        step_check_model = i2d_sw_model
        
    if do_lwimage3:
        lw_i2d_file = os.path.join(image3_dir, f'{lw_product_name}_i2d.fits')
        i2d_lw_model = datamodels.open(lw_i2d_file)
        step_check_model = i2d_lw_model

    # Check which steps were run. This should be the same regardless of whether
    # a sw or lw file is used.
    step_check_model.meta.cal_step.instance

Check which reference files were used to calibrate the dataset

if doimage3:
    step_check_model.meta.ref_file.instance

8. Visualize the resampled images#

If you specified that the LW and SW outputs should be resampled onto the same pixel grid, you should be able to open the two i2d files and overlay them and see that the sources and pixel grids line up. If there is any misalignment, you may need to adjust tweakreg parameters in the calls to the Image3 pipeline in order to improve the alignment.

Below we use the Imviz tool within the jdaviz package to visualize the images, one filter at a time.

# Create an Imviz instance and set up default viewer for the F200W data
if doimage3 and do_swimage3:
    imviz_sw_i2d = Imviz()
    viewer_sw_i2d = imviz_sw_i2d.default_viewer

    # Read in the science array for our visualization dataset:
    i2d_sw_science = i2d_sw_model.data

    # Load the dataset into Imviz
    imviz_sw_i2d.load_data(i2d_sw_science)

    # Visualize the dataset:
    imviz_sw_i2d.show()

Remember that in this mosaic we have only two detectors: NRCA2 and NRCA4 (left and right, respectively). The dither is not large enough to cover the gap between the detectors, so that gap is still visible in the mosaic.

if doimage3 and do_swimage3:
    viewer_sw_i2d.stretch = 'sqrt'
    viewer_sw_i2d.set_colormap('Viridis')
    viewer_sw_i2d.cuts = '95%'
# Create an Imviz instance and set up default viewer for the F444W data
if doimage3 and do_lwimage3:
    imviz_lw_i2d = Imviz()
    viewer_lw_i2d = imviz_lw_i2d.default_viewer

    # Read in the science array for our visualization dataset:
    i2d_lw_science = i2d_lw_model.data

    # Load the dataset into Imviz
    imviz_lw_i2d.load_data(i2d_lw_science)

    # Visualize the dataset:
    imviz_lw_i2d.show()
if doimage3 and do_lwimage3:
    viewer_lw_i2d.stretch = 'sqrt'
    viewer_lw_i2d.set_colormap('Viridis')
    viewer_lw_i2d.cuts = '95%'

Overlaying the LW and SW images#

Let’s try putting the SW and LW images on top of one another to create a color image. This should work regardless of whether you resampled the two images onto the same pixel grid.

Let’s get the data first

if doimage3 and do_swimage3 and do_lwimage3:
    imviz_color = Imviz()
    viewer_color = imviz_color.default_viewer

    # Load the datasets into Imviz
    imviz_color.load_data(sw_i2d_file, data_label='sw')
    imviz_color.load_data(lw_i2d_file, data_label='lw')

    # Link images by WCS
    imviz_color.link_data(align_by='wcs')

Now define some options to make the picture look nice.

# Set the colors for the two images. 
if doimage3 and do_swimage3 and do_lwimage3:
    plot_options = imviz_color.plugins['Plot Options']
    plot_options.image_color_mode = 'Color'
    img_settings = {'sw': {'image_color': '#61d3e1',
                           #'stretch_vmin': 0,
                           #'stretch_vmax': 4,
                           #'image_opacity': 0.32,
                           #'image_contrast': 0.69,
                           #'image_bias': 0.39
                           },
                    'lw': {'image_color': '#ff767c',
                           #'stretch_vmin': 0,
                           #'stretch_vmax': 16,
                           #'image_opacity': 0.4,
                           #'image_contrast': 0.94,
                           #'image_bias': 0.74
                           }
                    }

Populate the imviz instance with the settings in the cell above and visualize the dataset

# Now populate the imviz instance with the settings in the cell above.
if doimage3 and do_swimage3 and do_lwimage3:
    for layer, settings in img_settings.items():
        plot_options.layer = f'{layer}[DATA]'
        for k, v in settings.items():
            setattr(plot_options, k, v)
# Visualize the dataset
if doimage3 and do_swimage3 and do_lwimage3:
    imviz_color.show()

9. Visualize Detected Sources#

Using the source catalogs created by the Image3 stage of the pipeline, mark the detected sources, using different markers for point sources and extended sources. The source catalogs are saved in the stage3 directory as image3_sw_cat.ecsv and image3_lw_cat.ecsv. This time, we will provide the i2d filename to the Imviz load_data function, rather than just the array of pixel values. This way, Imviz will be able to make use of the WCS in the file, which allows the sources in the source catalog to be accurately marked in the display.

Read in catalog file and identify point/extended sources#

if doimage3:
    if do_swimage3:
        sw_catalog_file = sw_i2d_file.replace('i2d.fits', 'cat.ecsv')
        sw_catalog = Table.read(sw_catalog_file)
    
        # To identify point/extended sources, use the 'is_extended' column in the source catalog
        sw_pt_src, = np.where(~sw_catalog['is_extended'])
        sw_ext_src, = np.where(sw_catalog['is_extended'])
    
        # Define coordinates of point and extended sources
        sw_pt_coord = Table({'coord': [SkyCoord(ra=sw_catalog['sky_centroid'][sw_pt_src].ra,
                                                dec=sw_catalog['sky_centroid'][sw_pt_src].dec)]})
        sw_ext_coord = Table({'coord': [SkyCoord(ra=sw_catalog['sky_centroid'][sw_ext_src].ra,
                                                 dec=sw_catalog['sky_centroid'][sw_ext_src].dec)]})

    if do_lwimage3:
        lw_catalog_file = lw_i2d_file.replace('i2d.fits', 'cat.ecsv')
        lw_catalog = Table.read(lw_catalog_file)

        # To identify point/extended sources, use the 'is_extended' column in the source catalog
        lw_pt_src, = np.where(~lw_catalog['is_extended'])
        lw_ext_src, = np.where(lw_catalog['is_extended'])

        # Define coordinates of point and extended sources
        lw_pt_coord = Table({'coord': [SkyCoord(ra=lw_catalog['sky_centroid'][lw_pt_src].ra,
                                                dec=lw_catalog['sky_centroid'][lw_pt_src].dec)]})
        lw_ext_coord = Table({'coord': [SkyCoord(ra=lw_catalog['sky_centroid'][lw_ext_src].ra,
                                                 dec=lw_catalog['sky_centroid'][lw_ext_src].dec)]})

Mark the extended and point sources on the images#

Display the image with sources indicated by circles. Point sources will be marked by small pink circles and extended sources will be marked by white circles. Looking at the entire mosaic, there are so many sources found that it’s hard to see much of anything. To get a clearer view, try zooming in on various areas using the magnifying glass icon on the banner immediately above the image.

First we visualize the data without the point sources.

# Read in SW i2d file to Imviz
if doimage3 and do_swimage3:
    imviz_sw_cat = Imviz()
    viewer_sw_cat = imviz_sw_cat.default_viewer
    imviz_sw_cat.load_data(sw_i2d_file)

    # Adjust settings for viewer
    viewer_sw_cat.stretch = 'sqrt'
    viewer_sw_cat.set_colormap('Viridis')
    viewer_sw_cat.cuts = '95%'

    imviz_sw_cat.show()

Now we add the point sources

# Add marker for point sources:
if doimage3 and do_swimage3:
    viewer_sw_cat.marker = {'color': 'pink', 'markersize': 50, 'fill': False}

    viewer_sw_cat.add_markers(sw_pt_coord, use_skycoord=True, marker_name='point_sources')

    # Add marker for extended sources:
    viewer_sw_cat.marker = {'color': 'white', 'markersize': 100, 'fill': False}

    viewer_sw_cat.add_markers(sw_ext_coord, use_skycoord=True, marker_name='extended_sources')

We do the same with the LW file. First we visualize the data.

# Repeat using the LW file
if doimage3 and do_lwimage3:
    imviz_lw_cat = Imviz()
    viewer_lw_cat = imviz_lw_cat.default_viewer
    imviz_lw_cat.load_data(lw_i2d_file)

    # Adjust settings for viewer
    viewer_lw_cat.stretch = 'sqrt'
    viewer_lw_cat.set_colormap('Viridis')
    viewer_lw_cat.cuts = '95%'

    imviz_lw_cat.show()

Now we mark the point sources

# Add marker for point sources:
if doimage3 and do_lwimage3:
    viewer_lw_cat.marker = {'color': 'pink', 'markersize': 50, 'fill': False}

    viewer_lw_cat.add_markers(lw_pt_coord, use_skycoord=True, marker_name='point_sources')

    # Add marker for extended sources:
    viewer_lw_cat.marker = {'color': 'white', 'markersize': 100, 'fill': False}

    viewer_lw_cat.add_markers(lw_ext_coord, use_skycoord=True, marker_name='extended_sources')

10. Notes#

  • Note that the strategy presented in this notebook for placing the SW data onto the same pixel grid as the LW data can be applied to data from any two datasets, regardless of filter or channel. By saving the gWCS from the first dataset into an asdf file and providing that file to the Image3 call with the second dataset, the resulting i2d images will be aligned onto the same pixel grid.

  • If you notice poor alignment across tiles within a single i2d image, or between i2d images that you expect to be aligned, try adjusting the parameters in the tweakreg step (see the sketch below). With these, you can customize which sources tweakreg identifies and uses for the alignment.
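
For reference, a few illustrative tweakreg overrides are sketched below. The parameter values are examples only, not recommendations; they would be added to image3dict before the Image3Pipeline.call in Section 7.

# Illustrative tweakreg overrides (example values only)
#image3dict['tweakreg']['snr_threshold'] = 20  # Only use higher-significance sources for alignment
#image3dict['tweakreg']['starfinder'] = 'dao'  # Switch the source-finding algorithm
#image3dict['tweakreg']['fitgeometry'] = 'shift'  # Restrict the fit to simple shifts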

