
This notebook currently fails to execute; use it as a reference only.
NIRCam Imaging Pipeline Notebook#
Authors: B. Hilbert, based on the NIRISS imaging notebook by R. Diaz
Last Updated: May 5, 2025
Pipeline Version: 1.18.1 (Build 11.3)
Purpose:
This notebook provides a framework for processing generic Near-Infrared
Camera (NIRCam) Imaging data through all three James Webb Space Telescope
(JWST) pipeline stages. Data is assumed to be located in a folder structure
following the paths set up below. It should not be necessary to edit
any cells other than in the Configuration section unless
modifying the standard pipeline processing options.
Data:
This notebook is set up to use an example dataset from
Program ID
2739 (PI: Pontoppidan), a Cycle 1 Outreach program.
We focus on the data from Observation 001 Visit 002, in which M16, the
“Pillars of Creation”, was observed.
Example input data will be downloaded automatically unless this is
disabled (e.g., to use local files instead).
JWST pipeline version and CRDS context:
This notebook was written for the calibration pipeline version given
above. It sets the CRDS context to the latest context in the JWST
Calibration Reference Data System (CRDS) associated with that
pipeline version. If you use different pipeline versions or
CRDS context, please read the relevant release notes
(here for pipeline,
here for CRDS) for possibly relevant
changes.
Updates:
This notebook is regularly updated as improvements are made to the
pipeline. Find the most up-to-date version of this notebook at:
https://github.com/spacetelescope/jwst-pipeline-notebooks/
Recent Changes:
Sept 5, 2024: original notebook created
Nov 11, 2024: Comment out line to set the context
Nov 18, 2024: Do not require both SW and LW user-provided data
November 22, 2024: Updates to workflow when skipping pipeline modules
January 31, 2025: Update to build 11.2, update JDAViz Links Control to Orientation call
February 25, 2025: Add optional call to clean_flicker_noise
April 02, 2025: Update JDAviz call to work with JDAviz 4.2.1
May 5, 2025: Updated to jwst 1.18.0 (no significant changes)
Table of Contents#
Configuration
Package Imports
Demo Mode Setup (ignore if not using demo data)
Directory Setup
Detector1 Pipeline
Image2 Pipeline
Image3 Pipeline
Visualize the resampled images
Visualize Detected Sources
Notes
1. Configuration#
Set basic configuration for running the notebook.
Install dependencies and parameters#
To make sure that the pipeline version is compatible with the steps
discussed below and that the required dependencies and packages are
installed, you can create a fresh conda environment and install the provided
requirements.txt
file:
conda create -n nircam_imaging_pipeline python=3.11
conda activate nircam_imaging_pipeline
pip install -r requirements.txt
Set the basic parameters to use with this notebook. These affect which data are used, where the data are located (if already on disk), and which pipeline modules are run on the data. The parameters are:
demo_mode
directories with data
pipeline modules
# Basic import necessary for configuration
import os
demo_mode
must be set appropriately below.
Set demo_mode = True
to run in demonstration mode. In this
mode this notebook will download example data from the Barbara A.
Mikulski Archive for Space Telescopes
(MAST) and process it through
the pipeline. This will all happen in a local directory unless modified in
Section 3 below.
Set demo_mode = False
if you want to process your own data
that has already been downloaded and provide the location of the data.
# Set parameters for demo_mode, channel, band, data mode directories, and
# processing steps.
# -----------------------------Demo Mode---------------------------------
demo_mode = True
if demo_mode:
print('Running in demonstration mode using online example data!')
# --------------------------User Mode Directories------------------------
# If demo_mode = False, look for user data in these paths
if not demo_mode:
# Set directory paths for processing specific data; these will need
# to be changed to your local directory setup (below are given as
# examples)
user_home_dir = os.path.expanduser('~')
# Point to where science observation data are
# Assumes uncalibrated data in <sci_dir>/uncal/ and results in stage1,
# stage2, stage3 directories
sci_dir = os.path.join(user_home_dir, 'PID2739/Obs001/')
# --------------------------Set Processing Steps--------------------------
# Individual pipeline stages can be turned on/off here. Note that a later
# stage won't be able to run unless data products have already been
# produced from the prior stage.
# Science processing
dodet1 = True # calwebb_detector1
doimage2 = True # calwebb_image2
doimage3 = True # calwebb_image3
Set CRDS context and server#
Before importing CRDS
and JWST
modules, we need
to configure our environment. This includes defining a CRDS cache
directory in which to keep the reference files that will be used by the
calibration pipeline.
If the root directory for the local CRDS cache directory has not been set already, it will be set to create one in the home directory.
# ------------------------Set CRDS context and paths----------------------
# Each version of the calibration pipeline is associated with a specific CRDS
# context file. The pipeline will select the appropriate context file behind
# the scenes while running. However, if you wish to override the default context
# file and run the pipeline with a different context, you can set that using
# the CRDS_CONTEXT environment variable. Here we show how this is done,
# although we leave the line commented out in order to use the default context.
# If you wish to specify a different context, uncomment the line below.
#%env CRDS_CONTEXT jwst_1293.pmap
# Check whether the local CRDS cache directory has been set.
# If not, set it to the user home directory
if (os.getenv('CRDS_PATH') is None):
os.environ['CRDS_PATH'] = os.path.join(os.path.expanduser('~'), 'crds')
# Check whether the CRDS server URL has been set. If not, set it.
if (os.getenv('CRDS_SERVER_URL') is None):
os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'
# Echo CRDS path in use
print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")
if os.getenv('CRDS_CONTEXT'):
print(f"CRDS CONTEXT: {os.environ['CRDS_CONTEXT']}")
2. Package Imports#
# Use the entire available screen width for this notebook
from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))
# Basic system utilities for interacting with files
# ----------------------General Imports------------------------------------
import glob
import time
from pathlib import Path
# Numpy for doing calculations
import numpy as np
# To display the full output of a cell, not just the last result
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# -----------------------Astroquery Imports--------------------------------
# ASCII files, and downloading demo files
from astroquery.mast import Observations
# Astropy routines for visualizing detected sources:
from astropy.table import Table
from astropy.coordinates import SkyCoord
# ------------ Pipeline and Visualization Imports -----------------------
# for JWST calibration pipeline
import jwst
import crds
from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Image2Pipeline
from jwst.pipeline import Image3Pipeline
# JWST pipeline utilities
from asdf import AsdfFile
from jwst import datamodels
from jwst.associations import asn_from_list # Tools for creating association files
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base # Definition of a Lvl3 association file
# For visualizing images
from jdaviz import Imviz
# Echo pipeline version and CRDS context in use
print(f"JWST Calibration Pipeline Version: {jwst.__version__}")
print(f"Using CRDS Context: {crds.get_context_name('jwst')}")
# Start a timer to keep track of runtime
time0 = time.perf_counter()
3. Demo Mode Setup (ignore if not using demo data)#
If running in demonstration mode, set up the program information to
retrieve the uncalibrated data automatically from MAST using
astroquery.
MAST allows for flexibility of searching by the proposal ID and the
observation ID instead of just filenames.
For illustrative purposes, we focus on data taken using the NIRCam
F200W and F444W filters
and start with uncalibrated data products. The files are named
jw02739001002_02105_0000<dither>_nrc<det>_uncal.fits, where <dither> refers to the
dither step number and <det> is the detector name. Throughout this notebook we will
refer to data taken with the F200W filter as SW data and with F444W as LW data.
More information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html
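As an illustrative sketch, the components of one of these filenames can be pulled apart with basic string operations (the variable names below are our own, not pipeline conventions):
# Split a level-1 filename into its components
fname = 'jw02739001002_02105_00001_nrca2_uncal.fits'
stem = fname.replace('_uncal.fits', '')
obs_id, visit_group_act, expnum, detector = stem.split('_')
program = obs_id[2:7]       # '02739'
observation = obs_id[7:10]  # '001'
visit = obs_id[10:13]       # '002'
print(program, observation, visit, expnum, detector)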
# Set up the program information and paths for demo program
if demo_mode:
print('Running in demonstration mode and will download example data from MAST!')
program = "02739"
sci_observtn = "001"
data_dir = os.path.join('.', 'nrc_im_demo_data')
download_dir = data_dir
sci_dir = os.path.join(data_dir, 'Obs' + sci_observtn)
uncal_dir = os.path.join(sci_dir, 'uncal')
# Ensure filepaths for input data exist
if not os.path.exists(uncal_dir):
os.makedirs(uncal_dir)
# Create directory if it does not exist
if not os.path.isdir(data_dir):
os.mkdir(data_dir)
Identify list of science (SCI) uncalibrated files associated with visits.
First download the F200W data.
# Obtain a list of observation IDs for the specified demo program
if demo_mode:
# Science data
sci_obs_id_table = Observations.query_criteria(instrument_name=["NIRCAM/IMAGE"],
provenance_name=["CALJWST"], # Executed observations
filters=['F200W'], # Data for Specific Filter
obs_id=['jw' + program + '-o' + sci_observtn + '*']
)
if demo_mode:
sci_obs_id_table
# Turn the list of visits into a list of uncalibrated data files
if demo_mode:
# Define types of files to select
file_dict = {'uncal': {'product_type': 'SCIENCE',
'productSubGroupDescription': 'UNCAL',
'calib_level': [1]}}
# Science files
sci_files_to_download = []
# Loop over visits identifying uncalibrated files that are associated
# with them
for exposure in (sci_obs_id_table):
products = Observations.get_product_list(exposure)
for filetype, query_dict in file_dict.items():
filtered_products = Observations.filter_products(products, productType=query_dict['product_type'],
productSubGroupDescription=query_dict['productSubGroupDescription'],
calib_level=query_dict['calib_level'])
sci_files_to_download.extend(filtered_products['dataURI'])
# To limit data volume, keep only files from visit 002, dithers 1 and 2, and only A-module
sw_sci_files_to_download = [fname for fname in sci_files_to_download if 'jw02739001002_02105' in fname and
('nrca2' in fname or 'nrca4' in fname) and ('00001' in fname or '00002' in fname)]
sw_sci_files_to_download = sorted(sw_sci_files_to_download)
print(f"Science files selected for downloading: {len(sw_sci_files_to_download)}")
# List the SW files to download
if demo_mode:
sw_sci_files_to_download
Now repeat the process for the F444W data.
# Obtain a list of observation IDs for the specified demo program
if demo_mode:
# Science data
sci_obs_id_table = Observations.query_criteria(instrument_name=["NIRCAM/IMAGE"],
provenance_name=["CALJWST"], # Executed observations
filters=['F444W'], # Data for Specific Filter
obs_id=['jw' + program + '-o' + sci_observtn + '*']
)
if demo_mode:
sci_obs_id_table
# Turn the list of visits into a list of uncalibrated data files
if demo_mode:
# Define types of files to select
file_dict = {'uncal': {'product_type': 'SCIENCE',
'productSubGroupDescription': 'UNCAL',
'calib_level': [1]}}
# Science files
sci_files_to_download = []
# Loop over visits identifying uncalibrated files that are associated
# with them
for exposure in (sci_obs_id_table):
products = Observations.get_product_list(exposure)
for filetype, query_dict in file_dict.items():
filtered_products = Observations.filter_products(products, productType=query_dict['product_type'],
productSubGroupDescription=query_dict['productSubGroupDescription'],
calib_level=query_dict['calib_level'])
sci_files_to_download.extend(filtered_products['dataURI'])
# To limit data volume, keep only files from visit 002, dithers 1 and 2, and only A-module
lw_sci_files_to_download = [fname for fname in sci_files_to_download if 'jw02739001002_02105' in fname and
'nrca' in fname and ('00001' in fname or '00002' in fname)]
lw_sci_files_to_download = sorted(lw_sci_files_to_download)
print(f"Science files selected for downloading: {len(lw_sci_files_to_download)}")
# List the LW files to download
if demo_mode:
lw_sci_files_to_download
# Full list of the science files to download
if demo_mode:
sci_files_to_download = sw_sci_files_to_download + lw_sci_files_to_download
sci_files_to_download
Download all the uncal files and place them into the appropriate directories.
# Download the demo data if it does not already exist
if demo_mode:
for filename in sci_files_to_download:
sci_manifest = Observations.download_file(filename,
local_path=os.path.join(uncal_dir, Path(filename).name))
4. Directory Setup#
Set up detailed paths to input/output stages here.
# Define output subdirectories to keep science data products organized
uncal_dir = os.path.join(sci_dir, 'uncal') # Uncalibrated pipeline inputs should be here
det1_dir = os.path.join(sci_dir, 'stage1') # calwebb_detector1 pipeline outputs will go here
image2_dir = os.path.join(sci_dir, 'stage2') # calwebb_image2 pipeline outputs will go here
image3_dir = os.path.join(sci_dir, 'stage3') # calwebb_image3 pipeline outputs will go here
# We need to check that the desired output directories exist, and if not
# create them
if not os.path.exists(det1_dir):
os.makedirs(det1_dir)
if not os.path.exists(image2_dir):
os.makedirs(image2_dir)
if not os.path.exists(image3_dir):
os.makedirs(image3_dir)
Look at the first file to determine exposure parameters and practice using JWST datamodels#
# List uncal files
uncal_files = sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits')))
# Separate SW from LW files
sw_uncal_files = [uncfile for uncfile in uncal_files if 'long' not in uncfile]
lw_uncal_files = [uncfile for uncfile in uncal_files if 'long' in uncfile]
colnames = ('Instrument', 'Filter', 'Pupil', 'Number of Integrations', 'Number of Groups',
'Readout pattern', 'Dither position number')
dtypes = ('S7', 'S10', 'S10', 'i4', 'i4', 'S15', 'i4')
meta_check = Table(names=(colnames), dtype=dtypes)
# Open example files and get metadata for display
if len(sw_uncal_files) > 0:
sw_examine = datamodels.open(sw_uncal_files[0])
sw_row = [sw_examine.meta.instrument.name, sw_examine.meta.instrument.filter,
sw_examine.meta.instrument.pupil, sw_examine.meta.exposure.nints,
sw_examine.meta.exposure.ngroups, sw_examine.meta.exposure.readpatt,
sw_examine.meta.dither.position_number]
meta_check.add_row(sw_row)
if len(lw_uncal_files) > 0:
lw_examine = datamodels.open(lw_uncal_files[0])
lw_row = [lw_examine.meta.instrument.name, lw_examine.meta.instrument.filter,
lw_examine.meta.instrument.pupil, lw_examine.meta.exposure.nints,
lw_examine.meta.exposure.ngroups, lw_examine.meta.exposure.readpatt,
lw_examine.meta.dither.position_number]
meta_check.add_row(lw_row)
# Print out exposure info
meta_check
The table above shows basic exposure information from the first shortwave file as well as the first longwave file. When using the demo data, we confirm that the files are for the NIRCam instrument, using the F200W and F444W filters in the Filter Wheel crossed with the CLEAR filter in the Pupil Wheel. This observation uses the BRIGHT1 readout pattern, 8 groups per integration, and 1 integration per exposure. These files are from the first dither position in the exposure sequence. For more information about how JWST exposures are defined by up-the-ramp sampling, see the Understanding Exposure Times JDox article.
This metadata will be the same for all exposures in this observation, except for the dither position number.
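As a quick check of that claim, a minimal sketch (reusing the datamodels calls from the previous cell) loops over all of the uncal files and prints the metadata that should vary:
# Confirm that only the dither position differs between exposures
for uncfile in uncal_files:
    with datamodels.open(uncfile) as model:
        print(os.path.basename(uncfile), model.meta.instrument.filter,
              model.meta.exposure.readpatt, model.meta.dither.position_number)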
# Print out the time benchmark
time_det1 = time.perf_counter()
print(f"Runtime so far: {time_det1 - time0:0.0f} seconds")
5. Detector1 Pipeline#
Run the datasets through the Detector1 stage of the pipeline to apply detector-level calibrations and create a countrate data product, where slopes are fitted to the integration ramps. These *_rate.fits products are 2D (nrows x ncols), averaged over all integrations. 3D countrate data products (*_rateints.fits) are also created (nintegrations x nrows x ncols), which contain the fitted ramp slopes for each integration.
By default, all steps in the Detector1 stage of the pipeline are run for NIRCam except the ipc correction step and the gain_scale step. Note that the persistence step has been turned off by default starting with CRDS context jwst_1264.pmap. This step does not automatically correct the science data for persistence. The persistence step creates a *_trapsfilled.fits file, which is a model that records the number of traps filled at each pixel at the end of an exposure. This file would be used as an input to the persistence step, via the input_trapsfilled argument, to correct the subsequent science exposure for persistence. Since persistence is not well calibrated for NIRCam, the step has been turned off in order to speed up calibration and to avoid creating empty *_trapsfilled.fits files. This step can be turned on when running the pipeline in Python by doing:
rate_result = Detector1Pipeline.call(uncal, steps={'persistence': {'skip': False}})
or as indicated in the cell below using a dictionary.
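For completeness, here is a hedged sketch of correcting a subsequent exposure for persistence; the trapsfilled filename below is hypothetical:
# Hypothetical example: use the trapsfilled model from a preceding exposure
#rate_result = Detector1Pipeline.call(uncal,
#                                     steps={'persistence': {'skip': False,
#                                                            'input_trapsfilled': 'previous_exposure_trapsfilled.fits'}})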
As of CRDS context jwst_1155.pmap and later, the jump step of the Detector1 stage of the pipeline will remove signal associated with snowballs in the NIRCam imaging mode. This correction is turned on using the parameter expand_large_events=True. This and other parameters related to the snowball correction are specified in the pars-jumpstep parameter reference file. Users may wish to alter these parameters to optimize removal of snowball residuals. Available parameters are discussed in the Detection and Flagging of Showers and Snowballs in JWST technical report (Regan 2023).
# Set up a dictionary to define how the Detector1 pipeline should be configured
# Boilerplate dictionary setup
det1dict = {}
det1dict['group_scale'], det1dict['dq_init'], det1dict['saturation'] = {}, {}, {}
det1dict['ipc'], det1dict['superbias'], det1dict['refpix'] = {}, {}, {}
det1dict['linearity'], det1dict['persistence'], det1dict['dark_current'] = {}, {}, {}
det1dict['charge_migration'], det1dict['jump'], det1dict['clean_flicker_noise'] = {}, {}, {}
det1dict['ramp_fit'], det1dict['gain_scale'] = {}, {}
# Overrides for whether or not certain steps should be skipped
# skipping the persistence step
det1dict['persistence']['skip'] = True
# Overrides for various reference files
# Files should be in the base local directory or provide full path
#det1dict['dq_init']['override_mask'] = 'myfile.fits' # Bad pixel mask
#det1dict['saturation']['override_saturation'] = 'myfile.fits' # Saturation
#det1dict['linearity']['override_linearity'] = 'myfile.fits' # Linearity
#det1dict['dark_current']['override_dark'] = 'myfile.fits' # Dark current subtraction
#det1dict['jump']['override_gain'] = 'myfile.fits' # Gain used by jump step
#det1dict['ramp_fit']['override_gain'] = 'myfile.fits' # Gain used by ramp fitting step
#det1dict['jump']['override_readnoise'] = 'myfile.fits' # Read noise used by jump step
#det1dict['ramp_fit']['override_readnoise'] = 'myfile.fits' # Read noise used by ramp fitting step
# Turn on multi-core processing (This is off by default). Choose what fraction
# of cores to use (quarter, half, all, or an integer number)
det1dict['jump']['maximum_cores'] = 'half'
# Explicitly turn on snowball correction. (Even though it is on by default)
det1dict['jump']['expand_large_events'] = True
# Turn on 1/f correction if desired
# For guidance see https://jwst-docs.stsci.edu/known-issues-with-jwst-data/1-f-noise
#det1dict['clean_flicker_noise']['skip'] = False
#det1dict['clean_flicker_noise']['fit_method'] = 'median' # 'median' or 'fft'
#det1dict['clean_flicker_noise']['background_method'] = 'median' # 'median' or 'model'
#det1dict['clean_flicker_noise']['fit_by_channel'] = False
Run the Detector1 pipeline on all input data, regardless of filter.
# Run Detector1 stage of pipeline, specifying:
# output directory to save *_rate.fits files
# save_results flag set to True so the rate files are saved
if dodet1:
for uncal in uncal_files:
rate_result = Detector1Pipeline.call(uncal, output_dir=det1_dir, steps=det1dict, save_results=True)
else:
print('Skipping Detector1 processing')
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Detector1: {time1 - time_det1:0.0f} seconds")
Exploring the data#
Identify the *_rate.fits files and verify which pipeline steps were run and which calibration reference files were applied.
The header contains information about which calibration steps were completed or skipped, and which reference files were used to process the data.
if dodet1:
# find rate files
rate_files = sorted(glob.glob(os.path.join(det1_dir, '*_rate.fits')))
# Read in file as datamodel
rate_f = datamodels.open(rate_files[0])
# Check which steps were run
rate_f.meta.cal_step.instance
For this particular rate file, show which reference files were used to calibrate the dataset. Note that these files will be different for each NIRCam detector.
if dodet1:
rate_f.meta.ref_file.instance
6. Image2 Pipeline#
In the Image2 stage of the pipeline, calibrated unrectified data products are created (*_cal.fits or *_calints.fits files, depending on whether the input files are *_rate.fits or *_rateints.fits).
In this pipeline processing stage, the world coordinate system (WCS) is assigned, the data are flat fielded, and a photometric calibration is applied to convert from units of countrate (ADU/s) to surface brightness (MJy/sr).
By default, the background subtraction step and the resampling step are turned off for NIRCam. The background subtraction is turned off since there is no background template for the imaging mode, and the local background is subtracted as part of the photometry performed in the source catalog step in the Image3 pipeline.
The resampling step occurs during the Image3 stage by default. While the resampling step can be run on individual images in the Image2 stage, e.g., to prepare for generating a source catalog for each image, the default behavior is to run the step only in the Image3 stage, where multiple images are combined into a final mosaic after the outlier detection step flags bad pixels. To turn on the resampling step in the Image2 stage, uncomment the line in the dictionary below which sets image2dict['resample']['skip'] = False
time_image2 = time.perf_counter()
# Set up a dictionary to define how the Image2 pipeline should be configured.
# Boilerplate dictionary setup
image2dict = {}
image2dict['assign_wcs'], image2dict['flat_field'] = {}, {}
image2dict['photom'], image2dict['resample'] = {}, {}
# Overrides for whether or not certain steps should be skipped (example)
#image2dict['resample']['skip'] = False
# Overrides for various reference files
# Files should be in the base local directory or provide full path
#image2dict['assign_wcs']['override_distortion'] = 'myfile.asdf' # Spatial distortion (ASDF file)
#image2dict['assign_wcs']['override_filteroffset'] = 'myfile.asdf' # Imager filter offsets (ASDF file)
#image2dict['assign_wcs']['override_specwcs'] = 'myfile.asdf' # Spectral distortion (ASDF file)
#image2dict['assign_wcs']['override_wavelengthrange'] = 'myfile.asdf' # Wavelength channel mapping (ASDF file)
#image2dict['flat_field']['override_flat'] = 'myfile.fits' # Pixel flatfield
#image2dict['photom']['override_photom'] = 'myfile.fits' # Photometric calibration array
Find and sort all of the input files, ensuring use of absolute paths.
sstring = os.path.join(det1_dir, 'jw*rate.fits') # Use files from the detector1 output folder
rate_files = sorted(glob.glob(sstring))
rate_files = [os.path.abspath(fname) for fname in rate_files]
print(f"Found {len(rate_files)} science files")
# List rate files
rate_files
Run the Image2 pipeline on all of the rate files, regardless of filter. Note that if you have exposures with multiple integrations and you wish to keep the integrations separate, you should call the pipeline on the *rateints.fits files, rather than the *rate.fits files.
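For reference, here is a minimal sketch of that alternative, assuming *_rateints.fits files were saved by Detector1 into det1_dir:
# Run Image2 on per-integration products instead of the averaged rate files
if doimage2:
    rateints_files = sorted(glob.glob(os.path.join(det1_dir, 'jw*rateints.fits')))
    for rateints in rateints_files:
        cal_result = Image2Pipeline.call(rateints, output_dir=image2_dir, steps=image2dict, save_results=True)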
# Run Image2 stage of pipeline, specifying:
# output directory to save *_cal.fits files
# save_results flag set to True so the rate files are saved
if doimage2:
for rate in rate_files:
cal_result = Image2Pipeline.call(rate, output_dir=image2_dir, steps=image2dict, save_results=True)
else:
print("Skipping Image2 processing.")
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Image2: {time1 - time_image2:0.0f} seconds")
Verify which pipeline steps were run#
if doimage2:
# Identify *_cal.fits file products
cal_files = sorted(glob.glob(os.path.join(image2_dir, '*_cal.fits')))
# Select first file to gather information
cal_f = datamodels.open(cal_files[0])
# Check which steps were run:
cal_f.meta.cal_step.instance
Check which reference files were used to calibrate the first file. Some of these will be detector-dependent.
if doimage2:
cal_f.meta.ref_file.instance
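As an optional check, here is a minimal sketch inspecting the photometric calibration recorded by the photom step (attribute names follow the jwst datamodels schema):
if doimage2:
    # Data units after photom, and the countrate -> MJy/sr conversion factor
    print(cal_f.meta.bunit_data)
    print(cal_f.meta.photometry.conversion_megajanskys)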
7. Image3 Pipeline#
In the Image3 stage of the pipeline, the individual *_cal.fits files for each filter are combined into a single distortion-corrected image. Unlike the previous stages, we must run the Image3 stage separately for the files from each filter, as well as for each channel (i.e., shortwave vs. longwave).
First, we need to create associations to inform the pipeline which files are linked together for each filter.
By default, the Image3 stage of the pipeline performs the following steps on NIRCam data:
tweakreg - creates a source catalog of point-like sources for each input image. The source catalogs of the input images are compared to one another to derive coordinate transforms that align the images relative to each other. tweakreg has many input parameters that can be adjusted to improve the image alignment in cases where the default values do not perform well. One tweakreg parameter that is not set by default but can be very useful is abs_refcat. When this parameter is set to GAIADR3, the tweakreg step performs an absolute astrometric correction of the data using the GAIA data release 3 catalog. In cases where multiple unsaturated GAIA stars are present in the input images, this can improve the absolute astrometric alignment. However, in sparse or very crowded fields, this can potentially result in poor performance, so users are encouraged to check the astrometric accuracy and revisit this step if necessary. As of pipeline version 1.14.0, the default source finding algorithm in the tweakreg step is IRAFStarFinder. Other options include DAOStarFinder, whose results are not as good in cases where the PSF is undersampled, such as in the blue filters of the NIRCam shortwave channel, and the photutils segmentation SourceFinder, which does not assume sources are point-like.
skymatch - measures the background level from the sky, to be used as input to the subsequent outlier detection and resample steps.
outlier detection - flags any remaining cosmic rays, bad pixels, or other artifacts not already flagged during the Detector1 stage of the pipeline, using all input images to create a median image so that outliers in individual images can be identified.
resample - resamples each input image based on its WCS and distortion information and creates a single undistorted image.
source catalog - creates a catalog of detected sources along with photometric results and morphologies (i.e., point-like vs. extended). These catalogs are generally useful for quick checks, but optimization is likely needed for specific science cases. Users may wish to experiment with changing the snr_threshold and deblend options. Modifications to the following parameters will not significantly improve data quality, and it is advised to keep them at their default values: aperture_ee1, aperture_ee2, aperture_ee3, ci1_star_threshold, ci2_star_threshold.
time_image3 = time.perf_counter()
# Set up a dictionary to define how the Image3 pipeline should be configured
# Boilerplate dictionary setup
image3dict = {}
image3dict['assign_mtwcs'], image3dict['tweakreg'], image3dict['skymatch'] = {}, {}, {}
image3dict['outlier_detection'], image3dict['resample'], image3dict['source_catalog'] = {}, {}, {}
# Overrides for whether or not certain steps should be skipped (example)
#image3dict['outlier_detection']['skip'] = True
# Overrides for various reference files
# Files should be in the base local directory or provide full path
#image3dict['source_catalog']['override_apcorr'] = 'myfile.fits' # Aperture correction parameters
#image3dict['source_catalog']['override_abvegaoffset'] = 'myfile.asdf' # Data to convert from AB to Vega magnitudes (ASDF file)
# Turn on alignment to GAIA in the tweakreg step
# For data such as these demo data, where there are some heavily saturated stars in the field
# of view, alignment to GAIA sometimes does not work well due to tweakreg doing a poor job
# finding the centroids of the sources.
#image3dict['tweakreg']['abs_refcat'] = 'GAIADR3'
Find and sort all of the input files, ensuring use of absolute paths. Keep files for the two filters separated.
# Science Files need the cal.fits files
sw_sstring = os.path.join(image2_dir, 'jw*nrc??_cal.fits') # shortwave files. Detectors a1-a4, b1-b4
lw_sstring = os.path.join(image2_dir, 'jw*nrc*long_cal.fits') # longwave files. Detectors along, blong
# Identify SW and LW cal files
sw_cal_files = sorted(glob.glob(sw_sstring))
lw_cal_files = sorted(glob.glob(lw_sstring))
# Expand the relative paths into absolute paths
sw_cal_files = [os.path.abspath(fname) for fname in sw_cal_files]
lw_cal_files = [os.path.abspath(fname) for fname in lw_cal_files]
print(f'Found {len(sw_cal_files)} shortwave science files to process')
print(f'Found {len(lw_cal_files)} longwave science files to process')
Create Association File#
An association file lists the files to calibrate together in Stage3 of the pipeline. Note that association files are available for download from MAST, with filenames of *_asn.json. Here we show how to create an association file to point to the data products created in the steps above. This is useful in cases where you want to work with a set of data that is different from that in the association files from MAST.
Note that the output products will have a rootname that is specified by the product_name in the association file. For this tutorial, the rootnames of the output products will be image3_sw for filter F200W and image3_lw for filter F444W.
# List of data to use
print('List of SW cal files to use:')
sw_cal_files
print('\nList of LW cal files to use:')
lw_cal_files
# Create Level 3 Association for SW products
do_swimage3 = False
if doimage3:
if len(sw_cal_files) > 0:
# Only create an association file if there are SW data files to process
do_swimage3 = True
sw_product_name = 'image3_sw'
sw_association = asn_from_list.asn_from_list(sw_cal_files,
rule=DMS_Level3_Base,
product_name=sw_product_name)
sw_association.data['asn_type'] = 'image3'
program = datamodels.open(sw_cal_files[0]).meta.observation.program_number
sw_association.data['program'] = program
# Format association as .json file
sw_asn_filename, sw_serialized = sw_association.dump(format="json")
# Write out association file
sw_association_im3 = os.path.join(sci_dir, sw_asn_filename)
with open(sw_association_im3, "w") as fd:
fd.write(sw_serialized)
# Create Level 3 Associations for LW products
do_lwimage3 = False
if doimage3:
if len(lw_cal_files) > 0:
# Only create an association file if there are LW data files to process
do_lwimage3 = True
lw_product_name = 'image3_lw'
lw_association = asn_from_list.asn_from_list(lw_cal_files,
rule=DMS_Level3_Base,
product_name=lw_product_name)
lw_association.data['asn_type'] = 'image3'
program = datamodels.open(lw_cal_files[0]).meta.observation.program_number
lw_association.data['program'] = program
# Format association as .json file
lw_asn_filename, lw_serialized = lw_association.dump(format="json")
# Write out association file. Note that you can use your own filename in
# place of lw_asn_filename and everything will still work.
lw_association_im3 = os.path.join(sci_dir, lw_asn_filename)
with open(lw_association_im3, "w") as fd:
fd.write(lw_serialized)
Run Image3 stage of the pipeline#
For each set of grouped exposures in an association file, the Image3 stage of the pipeline will produce:
a *_crf.fits file, produced by the outlier_detection step, where the DQ array marks the pixels flagged as outliers,
a final combined, rectified image with name *_i2d.fits,
a source catalog with name *_cat.ecsv,
a segmentation map file (*_segm.fits), which has integer values at the pixel locations where a source is detected, with the pixel values matching the source ID number in the catalog.
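Once the Image3 calls below have completed, a minimal sketch like this can be used to list those products:
# List the Image3 output products described above
for pattern in ['*_crf.fits', '*_i2d.fits', '*_cat.ecsv', '*_segm.fits']:
    print(pattern, sorted(glob.glob(os.path.join(image3_dir, pattern))))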
Run Image3 on the LW data#
# Run Stage3 on the LW data
if doimage3 and do_lwimage3:
lw_i2d_result = Image3Pipeline.call(lw_association_im3, output_dir=image3_dir, steps=image3dict, save_results=True)
else:
print('Skipping Image3 LW processing')
Some users wish to resample data from multiple filters onto the same WCS and pixel grid in order to create color images or help with subsequent analyses. To enable that, we’ll save the gWCS from the *_i2d.fits file created from the LW data above into an asdf file.
if doimage3 and do_lwimage3:
# First we identify the dataset and read it using datamodels.
lw_i2d_file = os.path.join(image3_dir, f'{lw_product_name}_i2d.fits')
lw_data = datamodels.open(lw_i2d_file)
# Pull out the resulting gWCS and save it in an asdf file
tree = {"wcs": lw_data.meta.wcs}
wcs_file = AsdfFile(tree)
gwcs_filename = os.path.join(image3_dir, 'lw_gwcs.asdf')
print(f'Saving gWCS into {gwcs_filename}')
wcs_file.write_to(gwcs_filename)
# Get the size of the mosaic image
ysize, xsize = lw_data.data.shape
Run Image3 on the SW data#
Prepare to call the Image3 pipeline on the SW data. If you wish to resample the SW data onto the same pixel grid as the LW data above, uncomment the lines below. This will tell the resample step to use the gWCS and the array size from the LW data when resampling the SW data.
# Uncomment this cell in order to resample the SW data onto the same pixel grid as the LW data
#if doimage3:
# image3dict['resample']['output_wcs'] = gwcs_filename
# image3dict['resample']['output_shape'] = (xsize, ysize)
if doimage3 and do_swimage3:
sw_i2d_result = Image3Pipeline.call(sw_association_im3, output_dir=image3_dir, steps=image3dict, save_results=True)
else:
print('Skipping Image3 SW processing')
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.0f} seconds")
print(f"Runtime for Image3: {time1 - time_image3:0.0f} seconds")
Verify which pipeline steps were run#
# Identify *_i2d file and open as datamodel
if doimage3:
if do_swimage3:
sw_i2d_file = os.path.join(image3_dir, f'{sw_product_name}_i2d.fits')
i2d_sw_model = datamodels.open(sw_i2d_file)
step_check_model = i2d_sw_model
if do_lwimage3:
lw_i2d_file = os.path.join(image3_dir, f'{lw_product_name}_i2d.fits')
i2d_lw_model = datamodels.open(lw_i2d_file)
step_check_model = i2d_lw_model
# Check which steps were run. This should be the same regardless of whether
# a sw or lw file is used.
step_check_model.meta.cal_step.instance
Check which reference files were used to calibrate the dataset
if doimage3:
step_check_model.meta.ref_file.instance
8. Visualize the resampled images#
If you specified that the LW and SW outputs should be resampled onto the same pixel grid, you should be able to open the two i2d files and overlay them and see that the sources and pixel grids line up. If there is any misalignment, you may need to adjust tweakreg parameters in the calls to the Image3 pipeline in order to improve the alignment.
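If adjustments are needed, the following is a hedged sketch of tweakreg overrides one might experiment with before re-running the Image3 cells; the values shown are illustrative assumptions, not recommendations:
# Example tweakreg adjustments (illustrative values only)
#image3dict['tweakreg']['snr_threshold'] = 10 # S/N threshold for source detection
#image3dict['tweakreg']['minobj'] = 10 # Minimum number of matched sources for a valid fit
#image3dict['tweakreg']['searchrad'] = 1.0 # Search radius (arcsec) for matching sources
#image3dict['tweakreg']['fitgeometry'] = 'rshift' # Fit shifts and rotation only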
Below we use the Imviz tool within the jdaviz package to visualize the images, one filter at a time.
# Create an Imviz instance and set up default viewer for the F200W data
if doimage3 and do_swimage3:
imviz_sw_i2d = Imviz()
viewer_sw_i2d = imviz_sw_i2d.default_viewer
# Read in the science array for our visualization dataset:
i2d_sw_science = i2d_sw_model.data
# Load the dataset into Imviz
imviz_sw_i2d.load_data(i2d_sw_science)
# Visualize the dataset:
imviz_sw_i2d.show()
Remember that in this mosaic we have only two detectors: NRCA2 and NRCA4 (left and right, respectively). The dither is not large enough to cover the gap between the detectors, and so that gap is still visible in the mosaic.
if doimage3 and do_swimage3:
viewer_sw_i2d.stretch = 'sqrt'
viewer_sw_i2d.set_colormap('Viridis')
viewer_sw_i2d.cuts = '95%'
# Create an Imviz instance and set up default viewer for the F444W data
if doimage3 and do_lwimage3:
imviz_lw_i2d = Imviz()
viewer_lw_i2d = imviz_lw_i2d.default_viewer
# Read in the science array for our visualization dataset:
i2d_lw_science = i2d_lw_model.data
# Load the dataset into Imviz
imviz_lw_i2d.load_data(i2d_lw_science)
# Visualize the dataset:
imviz_lw_i2d.show()
if doimage3 and do_lwimage3:
viewer_lw_i2d.stretch = 'sqrt'
viewer_lw_i2d.set_colormap('Viridis')
viewer_lw_i2d.cuts = '95%'
Overlaying the LW and SW images#
Let’s try putting the SW and LW images on top of one another to create a color image. This should work regardless of whether you resampled the two images onto the same pixel grid.
Let’s get the data first
if doimage3 and do_swimage3 and do_lwimage3:
imviz_color = Imviz()
viewer_color = imviz_color.default_viewer
# Load the datasets into Imviz
imviz_color.load_data(sw_i2d_file, data_label='sw')
imviz_color.load_data(lw_i2d_file, data_label='lw')
# Link images by WCS
imviz_color.link_data(align_by='wcs')
Now define some options to make the picture look nice.
# Set the colors for the two images.
if doimage3 and do_swimage3 and do_lwimage3:
plot_options = imviz_color.plugins['Plot Options']
plot_options.image_color_mode = 'Color'
img_settings = {'sw': {'image_color': '#61d3e1',
#'stretch_vmin': 0,
#'stretch_vmax': 4,
#'image_opacity': 0.32,
#'image_contrast': 0.69,
#'image_bias': 0.39
},
'lw': {'image_color': '#ff767c',
#'stretch_vmin': 0,
#'stretch_vmax': 16,
#'image_opacity': 0.4,
#'image_contrast': 0.94,
#'image_bias': 0.74
}
}
Populate the imviz instance with the settings in the cell above and visualize the dataset
# Now populate the imviz instance with the settings in the cell above.
if doimage3 and do_swimage3 and do_lwimage3:
for layer, settings in img_settings.items():
plot_options.layer = f'{layer}[DATA]'
for k, v in settings.items():
setattr(plot_options, k, v)
# Visualize the dataset
if doimage3 and do_swimage3 and do_lwimage3:
imviz_color.show()
9. Visualize Detected Sources#
Using the source catalogs created by the Image3 stage of the pipeline, mark the detected sources, using different markers for point sources and extended sources. The source catalogs are saved in stage3/image3_sw_cat.ecsv and stage3/image3_lw_cat.ecsv. This time, we will provide the i2d filename to the imviz load_data function, rather than just the array of pixel values. This way, imviz will be able to make use of the WCS in the file, which allows the sources in the source catalog to be accurately marked in the display.
Read in catalog file and identify point/extended sources#
if doimage3:
if do_swimage3:
sw_catalog_file = sw_i2d_file.replace('i2d.fits', 'cat.ecsv')
sw_catalog = Table.read(sw_catalog_file)
# To identify point/extended sources, use the 'is_extended' column in the source catalog
sw_pt_src, = np.where(~sw_catalog['is_extended'])
sw_ext_src, = np.where(sw_catalog['is_extended'])
# Define coordinates of point and extended sources
sw_pt_coord = Table({'coord': [SkyCoord(ra=sw_catalog['sky_centroid'][sw_pt_src].ra,
dec=sw_catalog['sky_centroid'][sw_pt_src].dec)]})
sw_ext_coord = Table({'coord': [SkyCoord(ra=sw_catalog['sky_centroid'][sw_ext_src].ra,
dec=sw_catalog['sky_centroid'][sw_ext_src].dec)]})
if do_lwimage3:
lw_catalog_file = lw_i2d_file.replace('i2d.fits', 'cat.ecsv')
lw_catalog = Table.read(lw_catalog_file)
# To identify point/extended sources, use the 'is_extended' column in the source catalog
lw_pt_src, = np.where(~lw_catalog['is_extended'])
lw_ext_src, = np.where(lw_catalog['is_extended'])
# Define coordinates of point and extended sources
lw_pt_coord = Table({'coord': [SkyCoord(ra=lw_catalog['sky_centroid'][lw_pt_src].ra,
dec=lw_catalog['sky_centroid'][lw_pt_src].dec)]})
lw_ext_coord = Table({'coord': [SkyCoord(ra=lw_catalog['sky_centroid'][lw_ext_src].ra,
dec=lw_catalog['sky_centroid'][lw_ext_src].dec)]})
Mark the extended and point sources on the images#
Display the image with sources indicated by circles. Point sources will be marked by small pink circles and extended sources will be marked by white circles. Looking at the entire mosaic, there are so many sources found that it’s hard to see much of anything. To get a clearer view, try zooming in on various areas using the magnifying glass icon on the banner immediately above the image.
First we visualize the data without the point sources.
# Read in SW i2d file to Imviz
if doimage3 and do_swimage3:
imviz_sw_cat = Imviz()
viewer_sw_cat = imviz_sw_cat.default_viewer
imviz_sw_cat.load_data(sw_i2d_file)
# Adjust settings for viewer
viewer_sw_cat.stretch = 'sqrt'
viewer_sw_cat.set_colormap('Viridis')
viewer_sw_cat.cuts = '95%'
imviz_sw_cat.show()
Now we add the point sources
# Add marker for point sources:
if doimage3 and do_swimage3:
viewer_sw_cat.marker = {'color': 'pink', 'markersize': 50, 'fill': False}
viewer_sw_cat.add_markers(sw_pt_coord, use_skycoord=True, marker_name='point_sources')
# Add marker for extended sources:
viewer_sw_cat.marker = {'color': 'white', 'markersize': 100, 'fill': False}
viewer_sw_cat.add_markers(sw_ext_coord, use_skycoord=True, marker_name='extended_sources')
We do the same with the LW file. First we visualize the data.
# Repeat using the LW file
if doimage3 and do_lwimage3:
imviz_lw_cat = Imviz()
viewer_lw_cat = imviz_lw_cat.default_viewer
imviz_lw_cat.load_data(lw_i2d_file)
# Adjust settings for viewer
viewer_lw_cat.stretch = 'sqrt'
viewer_lw_cat.set_colormap('Viridis')
viewer_lw_cat.cuts = '95%'
imviz_lw_cat.show()
Now we mark the point sources
# Add marker for point sources:
if doimage3 and do_lwimage3:
viewer_lw_cat.marker = {'color': 'pink', 'markersize': 50, 'fill': False}
viewer_lw_cat.add_markers(lw_pt_coord, use_skycoord=True, marker_name='point_sources')
# Add marker for extended sources:
viewer_lw_cat.marker = {'color': 'white', 'markersize': 100, 'fill': False}
viewer_lw_cat.add_markers(lw_ext_coord, use_skycoord=True, marker_name='extended_sources')
10. Notes#
Note that the strategy presented in this notebook for placing the SW data onto the same pixel grid as the LW data can be applied to any two datasets, regardless of filter or channel. By saving the gWCS from the first dataset into an asdf file and providing that file to the Image3 call with the second dataset, the resulting i2d images will be aligned onto the same pixel grid (see the sketch below).
If you notice poor alignment across tiles within a single i2d image, or between i2d images that you expect to be aligned, try adjusting the parameters in the tweakreg step. With these, you can customize which sources tweakreg identifies and uses for the alignment.
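A hedged sketch of that strategy, with hypothetical filenames standing in for the two datasets:
# Save the gWCS and array size of the reference dataset (dataset A)
ref_model = datamodels.open('dataset_a_i2d.fits')  # hypothetical filename
AsdfFile({'wcs': ref_model.meta.wcs}).write_to('dataset_a_gwcs.asdf')
ny, nx = ref_model.data.shape
# Resample dataset B onto the same pixel grid as dataset A
image3dict['resample']['output_wcs'] = 'dataset_a_gwcs.asdf'
image3dict['resample']['output_shape'] = (nx, ny)
Image3Pipeline.call('dataset_b_asn.json', output_dir=image3_dir, steps=image3dict, save_results=True)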
