Correcting for Scattered Light in WFC3/IR Exposures: Using calwf3
to Mask Bad Reads#
Learning Goals#
This notebook shows one of two available methods to correct for time-variable background
(TVB) due to scattered light from observing close to the Earth’s limb. This method illustrates how to mask bad reads in the RAW image and then reprocess with calwf3
, and it may be used for rejecting anomalous reads occurring either at the beginning or at the end of an exposure.
By the end of this tutorial, you will:
Compute and plot the difference between IMA reads to identify the reads affected by TVB.
Reprocess a single exposure with
calwf3
by excluding the first few reads which are affected by scattered light.Compare the original FLT to the reprocessed FLT image.
Compare the original DRZ to the reprocessed DRZ image.
A second method (manually subtracting bad reads from the final IMA read) can be found in the notebook Correcting for Scattered Light in WFC3/IR Exposures: Manually Subtracting Bad Reads (O’Connor 2023). This provides a method for removing anomalies such as satellite trails which appear in the middle of an IMA exposure. One caveat of the second method is that it does not perform the ‘ramp fitting’ step and therefore the calibrated FLT products will still contain cosmic rays.
Please note that this method may leave large sky residuals in regions corresponding to WFC3/IR ‘blobs’, flagged in the FLT data quality (DQ) array as a value of 512. This method is therefore recommended for observations acquired using an WFC3-IR-DITHER-BLOB dither (or POSTARG equivalent) to step over blob regions. Software such as AstroDrizzle may then be used to combine the FLT exposures, while excluding pixels with 512 flags (or any bit-wise combination of 512 and another flag e.g. 576 = 512 + 64).
Table of Contents#
Introduction
1. Imports
2. Downloading Data
3. Identifying Reads with Time Variable Background
4. Querying CRDS for the Reference File
5. Reprocessing the Observation
5.1 Re-running
calwf3
5.2 Comparing FLT Products
6. Drizzling Nominal and Reprocessed FLT Products
7. Conclusions
Additional Resources
About this Notebook
Citations
This notebook currently fails to execute, use as reference only
Introduction #
Observations with strong time variability in the sky background during a MULTIACCUM ramp can corrupt the WFC3 calwf3
‘linear ramp fit’ and cosmic-ray identification algorithm (CRCORR, Section 3.3.10 of the Data Handbook). The CRCORR
algorithm assumes that a given pixel sees a constant count rate throughout the read from a
combination of sources and diffuse background (i.e., the integrated signal “ramps” are linear). A background
varying strongly with time (i.e. where the “ramps” are non-linear) can trigger the cosmic-ray
(CR) flagging thresholds in calwf3
, causing the algorithm to identify most or all of the pixels as a CR at any given read.
In this notebook we will examine an observation affected by strong TVB,
specifically due to Earth limb scattered light (Section 7.10 of the WFC3 Data Handbook) affecting the first few reads and producing a spatially variable background. We will run calwf3
while rejecting the affected reads to create an improved FLT product with a flat background and an improved ramp fit. One caveat is that the new FLT will
have a reduced total exposure time, given the rejection of some number of reads, and
therefore a lower signal-to-noise ratio.
Please see the notebook WFC3/IR IMA Visualization with An Example of Time Variable Background (O’Connor 2023) for a walkthrough of how to identify a TVB in due to scattered light.
1. Imports #
This notebook assumes you have created the virtual environment in WFC3 Library’s installation instructions.
We import:
os for setting environment variables
glob for finding lists of files
shutil for managing directories
numpy for handling array functions
matplotlib.pyplot for plotting data
astropy.io fits for accessing FITS files
astroquery.mast Observations for downloading data
wfc3tools
calwf3
for calibrating WFC3 dataginga for finding min/max outlier pixels
stwcs for updating World Coordinate System images
drizzlepac for combining images with AstroDrizzle
We import the following modules:
ima_visualization_and_differencing to take the difference between reads, plot the ramp, and visualize the difference in images
import os
import glob
import shutil
import numpy as np
from matplotlib import pyplot as plt
from astropy.io import fits
from astroquery.mast import Observations
from wfc3tools import calwf3
from ginga.util.zscale import zscale
from stwcs import updatewcs
from drizzlepac import astrodrizzle
import ima_visualization_and_differencing as diff
%matplotlib inline
2. Downloading Data#
The following commands query MAST for the necessary data products and download them to the current directory. Here we obtain WFC3/IR observations from HST Frontier Fields program 14037, Visit BB. We specifically want the observation icqtbbbxq
, as it is strongly affected by Earth limb scattered light. The data products requested are the RAW, IMA, and FLT files. For an example of TVB at the end of an exposure, we include an alternate dataset (OBS_ID = ICXT27020
, file_id = icxt27hkq
) in which the Earth limb affects the reads at the end of the MULTIACCUM sequence (SCI,1 through SCI,5).
Warning: this cell may take a few minutes to complete.
OBS_ID = 'ICQTBB020' # Earth-limb at the start
# OBS_ID = 'ICXT27020' # Earth-limb at the end
data_list = Observations.query_criteria(obs_id=OBS_ID)
file_id = "icqtbbbxq"
# file_id = 'icxt27hkq'
Observations.download_products(data_list['obsid'], project='CALWF3',
obs_id=file_id, mrp_only=False,
productSubGroupDescription=['RAW', 'IMA', 'FLT'])
Now, we will move our IMA and FLT files to a separate directory called “orig/” so they are not overwritten when we run calwf3
. We leave our RAW file in the working directory for later use.
if not os.path.exists('orig/'):
os.mkdir('orig/')
shutil.copy(f'mastDownload/HST/{file_id}/{file_id}_ima.fits', f'orig/{file_id}_ima.fits')
shutil.copy(f'mastDownload/HST/{file_id}/{file_id}_flt.fits', f'orig/{file_id}_flt.fits')
raw_file = f'mastDownload/HST/{file_id}/{file_id}_raw.fits'
remove_files_list = glob.glob(f'./{file_id}_*.fits')
for rm_file in remove_files_list:
os.remove(rm_file)
shutil.copy(raw_file, f'{file_id}_raw.fits')
3. Identifying Reads with Time Variable Background#
In this section, we show how to identify the reads impacted by TVB by examining the difference in count rate between reads. This section was taken from the WFC3/IR IMA Visualization with An Example of Time Variable Background (O’Connor 2023) notebook, which includes a more comprehensive walkthrough of identifying time variable background in WFC3/IR images.
Here we implement a technique to examine the count rate difference between consecutive reads. In this case, we first convert from count rate (electrons/second) back to counts (electrons) before taking the difference, as shown in equation 3 from WFC3 ISR 2018-05.
We compare sky values in different regions of the detector (left side, right side, and full frame). If you would like to specify your own regions for the left and right sides of your image, you can change the lhs_region
and rhs_region
parameters. Each region must be specified as a dictionary including the four “corners” (x0, x1, y0, and y1) of the region you would like to select. You may want to avoid the edges of the detector which have a large number of bad pixels and higher flat field errors.
fig = plt.figure(figsize=(20, 10))
ima_filepath = f'orig/{file_id}_ima.fits'
path, filename = os.path.split(ima_filepath)
cube, integ_time = diff.read_wfc3(ima_filepath)
lhs_region = {"x0": 50, "x1": 250, "y0": 100, "y1": 900}
rhs_region = {"x0": 700, "x1": 900, "y0": 100, "y1": 900}
# Please use a limit that makes sense for your own data, when running your images through this notebook.
cube[np.abs(cube) > 3] = np.nan
diff_cube = diff.compute_diff_imas(cube, integ_time, diff_method="instantaneous")
median_diff_fullframe, median_diff_lhs, median_diff_rhs = (
diff.get_median_fullframe_lhs_rhs(
diff_cube,
lhs_region=lhs_region,
rhs_region=rhs_region))
plt.rc('xtick', labelsize=20)
plt.rc('ytick', labelsize=20)
plt.rcParams.update({'font.size': 30})
plt.rcParams.update({'lines.markersize': 15})
diff.plot_ramp(ima_filepath, integ_time, median_diff_fullframe, median_diff_lhs, median_diff_rhs)
plt.ylim(0.5, 2.5)
_ = plt.title(filename)
Here, we utilize a few functions from our module ima_visualization_and_differencing
. We use read_wfc3
to grab the IMA data from all reads and corresponding integration times. We also implement upper and lower limits on our pixel values to exclude sources when plotting our ramp. We take the instantaneous difference using compute_diff_imas
, which computes the difference as described in the equation above. Finally, we use plot_ramp
to plot the median count rate from the left side, right side, and full frame image.
For our scattered light exposure, we see zodiacal light at a level of ~0.9e-/s in later reads, with the scattered light component affecting the first several reads where the median count rate for the left side (orange triangles) is larger than the right side (green triangles). We can visualize this in 2 dimensions in the panel plot below, using the plot_ima_difference_subplots
.
In the panel plot, we see that sources (small galaxies) are visible in the difference images using this new method. Note that this may complicate the analysis of the spatial background (e.g. left versus right) for images with extended targets, such as large galaxies. In this case, users may wish to adjust the regions of the detector used for the ramp plots. We therefore recommend inspecting both the panel plots as well as the ramp fits for diagnosing any issues with the data.
fig = plt.figure(figsize=(20, 10))
ima_filepath = f'orig/{file_id}_ima.fits'
lhs_region = {"x0": 50, "x1": 250, "y0": 100, "y1": 900}
rhs_region = {"x0": 700, "x1": 900, "y0": 100, "y1": 900}
diff.plot_ima_difference_subplots(ima_filepath,
difference_method='instantaneous',
lhs_region=lhs_region,
rhs_region=rhs_region)
In this figure, we see that the ratio of instantaneous rate for the left versus right side of the image is ~1.0 for all but the first few reads (which are affected by scattered light). We choose to exclude reads with a ratio greater than 1.1 from the exposure and reprocess the image with calwf3
. While this reduces the total exposure from 1403 to 1000 seconds, it removes the spatial component from the sky background and allows for a more accurate ‘up-the-ramp’ fit with calwf3
.
4. Querying CRDS for the Reference File#
Before running calwf3
, we need to set some environment variables.
We will point to a subdirectory called crds_cache
using the IREF environment variable, which is used for WFC3 reference files. Other instruments use other variables, e.g., JREF for ACS.
os.environ['CRDS_SERVER_URL'] = 'https://hst-crds.stsci.edu'
os.environ['CRDS_SERVER'] = 'https://hst-crds.stsci.edu'
os.environ['CRDS_PATH'] = './crds_cache'
os.environ['iref'] = './crds_cache/references/hst/wfc3/'
The code block below will query CRDS for the best reference files currently available for our dataset and update the header keywords to point to these new files. We will use the Python package os
to run terminal commands. In the terminal, the line would be:
…where ‘filename’ is the name of your fits file.
Warning: this cell may take a few minutes to complete.
raw_file = f'{file_id}_raw.fits'
print(f"Querying CRDS for the reference file associated with {raw_file}.")
command_line_input = 'crds bestrefs --files {:} --sync-references=1 --update-bestrefs'.format(raw_file)
os.system(command_line_input)
5. Reprocessing the Observation#
As discussed in the introduction to this notebook, the accuracy of the ramp fit performed during the CRCORR cosmic-ray rejection step of the pipeline determines the quality of the calibrated WFC3/IR FLT data products. Given that a time variable background can compromise the quality of the ramp fit, observations affected by TVB will likely result in poor-quality calibrated FLT images (see the Appendix of WFC3 ISR 2021-01).
To address poorly calibrated FLT images where some reads are affected by scattered light TVB (as in our example observation), we can remove these reads and re-run calwf3
to produce cleaner FLT images. We choose to exclude reads where the ratio of background signal is greater than 1.1 e-/s (see the notebook WFC3/IR IMA Visualization with An Example of Time Variable Background (O’Connor 2023) for a more complete demonstration of how we find this ratio).
The following reprocessing example is replacing section 7.10.1 of the WFC3 Data Handbook.
5.1 Re-running calwf3
#
Below, we select our excluded reads (in this case reads 11-15), which are at the beginning of the exposure. We set the DQ value to 1024 for these reads, prompting calwf3
to flag all pixels in these reads as bad, effectively rejecting the reads.
raw_filepath = f'{file_id}_raw.fits'
# Remove existing products or calwf3 will die
flt_filepath = raw_filepath.replace('raw', 'flt')
ima_filepath = raw_filepath.replace('raw', 'ima')
for filepath in [flt_filepath, ima_filepath]:
if os.path.exists(filepath):
os.remove(filepath)
reads = np.arange(11, 16) # numpy arange creates an array including the start value (11) but excluding the stop value (16), so the array is actually 11-15.
for read in reads:
fits.setval(raw_filepath, extver=read, extname='DQ', keyword='pixvalue', value=1024)
calwf3(raw_filepath)
5.2 Comparing FLT Products #
Please move your reprocessed images to a new directory with a different name each time you run this notebook.
reprocessed_flt = f'{file_id}_flt.fits'
reprocessed_ima = f'{file_id}_ima.fits'
original_flt = f'orig/{file_id}_flt.fits'
original_ima = f'orig/{file_id}_ima.fits'
Now, we can compare our original and reprocessed FLT products.
image_new = fits.getdata(reprocessed_flt)
image_old = fits.getdata(original_flt)
fig = plt.figure(figsize=(20, 7))
fig
rows = 1
columns = 2
# add the total exptime in the title
ax1 = fig.add_subplot(rows, columns, 1)
ax1.set_title("Reprocessed FLT image", fontsize=20)
im1 = plt.imshow(image_new, vmin=0.8, vmax=1.3, origin='lower', cmap='Greys_r')
ax1.tick_params(axis='both', labelsize=10)
cbar1 = plt.colorbar(im1, ax=ax1)
cbar1.ax.tick_params(labelsize=10)
ax2 = fig.add_subplot(rows, columns, 2)
ax2.set_title("Original FLT image", fontsize=20)
im2 = plt.imshow(image_old, vmin=0.8, vmax=1.3, origin='lower', cmap='Greys_r')
ax2.tick_params(axis='both', labelsize=10)
cbar2 = plt.colorbar(im2, ax=ax2)
cbar2.ax.tick_params(labelsize=10)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)
This new image was produced by calwf3
after masking the first 5 reads (not including the zero read) in the RAW file, reducing the effective exposure time from 1403 to 1000 seconds. While the total exposure is reduced from 1403 seconds to 1000 seconds (thus decreasing the overall S/N of the image), the background in the reprocessed image is now uniform over the entire field of view. We can see that the new FLT image is free of the Earth limb scattered light visible in the old FLT image.
Finally, note that the reprocessed FLT product now includes a larger sky background in science pixels corresponding to IR “blobs”, regions of reduced sensitivity due particulate matter on the channel selection mechanism (CSM) mirror (section 7.5 of the WFC3 Data Handbook).
The method for correcting WFC3/IR images described in the notebook Correcting for Scattered Light in WFC3/IR Exposures: Manually Subtracting Bad Reads (O’Connor 2023) provides FLT products without blobs, but that do include cosmic rays.
We update the FLT header keywords “EXPTIME” (in the primary header) and “SAMPTIME” (in the science header) to reflect the new total exposure time.
with fits.open(reprocessed_flt, mode='update') as image_new, fits.open(original_ima) as ima_orig:
hdr = ima_orig[0].header
NSAMP = hdr['NSAMP']
hdr1 = ima_orig[1].header
integ_time = np.zeros(shape=(NSAMP))
for i in range(1, NSAMP+1):
integ_time[i-1] = ima_orig[('TIME', i)].header['PIXVALUE']
integ_time = integ_time[::-1]
dt = np.diff(integ_time)
final_time = integ_time[-1]
if (len(reads) > 0):
for read in reads:
index = len(integ_time)-read-1 # because the reads are stored in reverse order
final_time -= dt[index]
print(f'The final exposure time after reprocessing is {final_time}.')
image_new[0].header['EXPTIME'] = final_time
6. Drizzling Nominal and Reprocessed FLT Products #
In our example we use an exposure (icqtbbbxq
) from image association ICQTBB020
acquired in visit BB of program 14037. This visit consists of two orbits of two exposures each, and we now download the three other FLTs in the visit (icqtbbc0q_flt.fits
, icqtbbbrq_flt.fits
, icqtbbbtq_flt.fits
) and the pipeline drizzled DRZ product.
To produce a clean DRZ image (without blob residuals), we can drizzle the four FLTs together (from the nominal exposures and the reprocessed exposure), replacing pixels flagged as blobs with those from the dithered (nominal) image.
data_list = Observations.query_criteria(obs_id=OBS_ID)
Observations.download_products(data_list['obsid'],
project='CALWF3',
mrp_only=False,
productSubGroupDescription=['FLT', 'DRZ'])
nominal_file_ids = ["icqtbbc0q", "icqtbbbrq", "icqtbbbtq"]
# nominal_file_ids = ["icxt27hoq"]
nominal_list = []
for nominal_file_id in nominal_file_ids:
shutil.copy(f'mastDownload/HST/{nominal_file_id}/{nominal_file_id}_flt.fits', f'{nominal_file_id}_flt.fits')
nominal_list.append(f'{nominal_file_id}_flt.fits')
print(nominal_list)
Next, we update the image World Coordinate System of the reprocessed image in preparation for drizzling.
updatewcs.updatewcs(reprocessed_flt, use_db=True)
Finally, we combine the four FLT images with AstroDrizzle while exluding pixels flagged as blobs and replacing those pixels with those from dithered frames.
astrodrizzle.AstroDrizzle('icqtbb*flt.fits', output='f140w', preserve=False, build=False, context=False, skymethod='match', driz_separate=False, median=False, blot=False, driz_cr=False, final_bits='16')
# astrodrizzle.AstroDrizzle('icxt27*flt.fits', output='f105w', preserve=False, build=False, context=False, skymethod='match', driz_separate=False, median=False, blot=False, driz_cr=False, final_bits='16')
Comparing the new DRZ image made with the reprocessed FLT product against the original pipeline DRZ image, we see that the new DRZ image no longer includes scattered light but has a slightly lower S/N due the reduced total exposure time from 1403 to 1000 seconds.
with fits.open("f140w_drz_sci.fits") as DRZ_image, fits.open('mastDownload/HST/icqtbb020/icqtbb020_drz.fits') as Orig_DRZ:
fig = plt.figure(figsize=(20, 10))
rows = 1
columns = 2
ax1 = fig.add_subplot(rows, columns, 1)
ax1.set_title("Reprocessed DRZ Image", fontsize=20)
vmin, vmax = zscale(Orig_DRZ[1].data)
im1 = plt.imshow(DRZ_image[0].data, vmin=vmin, vmax=vmax, origin='lower', cmap='Greys_r')
_ = plt.colorbar()
ax2 = fig.add_subplot(rows, columns, 2)
ax2.set_title("Original Pipeline DRZ Image", fontsize=20)
im2 = plt.imshow(Orig_DRZ[1].data, vmin=vmin, vmax=vmax, origin='lower', cmap='Greys_r')
_ = plt.colorbar()
7. Conclusions #
Congratulations, you have completed the notebook.
You should now be familiar with how to reprocess an observation affected by Earth limb scattered light by removing the first few reads and rerunning calwf3
.
Thank you for following along!
Additional Resources #
Below are some additional resources that may be helpful. Please send any questions through the HST Help Desk.
About this Notebook #
Author: Anne O’Connor, Jennifer Mack, Annalisa Calamida, Harish Khandrika – WFC3 Instrument
Updated On: 2023-12-26
Citations #
If you use the following tools for published research, please cite the authors. Follow these links for more information about citing the tools:
If you use this notebook, or information from the WFC3 Data Handbook, Instrument Handbook, or WFC3 ISRs for published research, please cite them:
Citing this notebook: Please cite the primary author and year, and hyperlink the notebook or WFC3 Library