This Notebook is designed to walk the user (you) through: Filtering Cosmic Origins Spectrograph (COS) `TIME-TAG` data taken during the day from data taken during the night:
1. Processing a spectrum from a filtered dataset
- 1.1. Filtering the TIME-TAG
data
- 1.2. Creating a new association file
The Cosmic Origins Spectrograph (COS) is an ultraviolet spectrograph on-board the Hubble Space Telescope(HST) with capabilities in the near ultraviolet (NUV) and far ultraviolet (FUV).
This tutorial aims to prepare you to work with the COS data of your choice by walking you through filtering TIME-TAG
datapoints obtained by COS.
In particular, this tutorial will walk you through separating datapoints obtained during the Hubble Space Telescope's "night" - when the sun is below the geometric horizon from the observatory's point of view - from datapoints taken during the observatory's "day" - when the sun is above this horizon.
You may wish to disaggregate photons from these two time periods, as data taken during the day can be subject to different and higher background noise conditions, as well as more intense geocoronal Lyman-alpha or Oxygen-I emission lines from the Earth’s atmosphere (See Data Handbook 5.4.2).
This type of data separation is possible with the TIME-TAG
data obtained by the COS photon-counting detectors, because each individual encounter with a photon is recorded with its own metadata such as the time of the encounter, and can be linked to the physical position of Hubble at that time.
costools timefilter
to select TIME-TAG
datapoints by their metadata parameterscalcos
to re-process the datanumpy
to handle array functionsastropy.io fits
and astropy.table Table
for accessing FITS filesglob
and os
for working with system filesmatplotlib.pyplot
and gridspec
for plotting dataastroquery.mast Mast
and Observations
for finding and downloading data from the MAST archivepathlib Path
for managing system pathsNew versions of CalCOS
are currently incompatible with astroconda. To create a Python environment capable of running all the data analyses in these COS Notebooks, please see Section 1 of our Notebook tutorial on setting up an environment.
# Import for: filtering TIME TAG data i.e. by sun altitude
from costools import timefilter
# Import for: processing COS data
import calcos
# Import for: array manipulation
import numpy as np
# Import for: reading fits files
from astropy.io import fits
from astropy.table import Table
# Import for: dealing with system files
import glob
import os
# Import for: plotting
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Import for: downloading the data
from astroquery.mast import Observations
#Import for: working with system paths
from pathlib import Path
The following tasks in the costools package can be run with TEAL: splittag timefilter x1dcorr
# These will be important directories for the Notebook
datadir = Path('./data')
outputdir = Path('./output/')
intermediate_dir = Path('./intermediate/')
plotsdir = Path('./output/plots')
# Make the directories if they don't exist
datadir.mkdir(exist_ok=True), outputdir.mkdir(exist_ok=True), plotsdir.mkdir(exist_ok=True), intermediate_dir.mkdir(exist_ok=True)
(None, None, None, None)
We choose the exposure with obs_id: lbry01i6q
, because we happen to know it contains data taken both in the observatory's night and day and shows strong airglow lines. For more information on downloading COS data, see our notebook tutorial on downloading COS data.
filename_root = 'lbry01i6q'
pl = Observations.get_product_list(Observations.query_criteria(proposal_id = '12604', obs_id = 'lbry01*')) # Create product list of all observations with that obs_id root
pl_mask = ((pl['productSubGroupDescription'] == "CORRTAG_A") | (pl['productSubGroupDescription'] == "CORRTAG_B")) & (pl['obs_id'] == filename_root) # create mask with only the corrtag files
Observations.download_products(pl[pl_mask], download_dir = str(datadir)) # Download those corrtag files
file_locations_orig = glob.glob('./data/**/*corrtag*.fits', recursive=True)
for file_orig in file_locations_orig:
os.rename(file_orig, './data/'+os.path.basename(file_orig)) # Aggregates the data from out of long convoluted filepaths
# NOTE - can cause problems if you have any non-unique filenames
# for large datasets don't aggregate your data in this simple way that can't handle repeat names
file_locations_a = glob.glob('./data/**/*corrtag_a.fits', recursive=True) # Finds all the FUVA files
file_locations_b = glob.glob('./data/**/*corrtag_b.fits', recursive=True) # Finds all the FUVB files
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/lbry01i6q_corrtag_a.fits to data/mastDownload/HST/lbry01i6q/lbry01i6q_corrtag_a.fits ... [Done] Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/lbry01i6q_corrtag_b.fits to data/mastDownload/HST/lbry01i6q/lbry01i6q_corrtag_b.fits ... [Done]
TIME-TAG
data¶The costools
package contains the TimelineFilter
class, which - upon instantiation - filters by the parameters you give it. In other words, you don't have to run any functions or methods aside from instantiating the class. This is done by passing the following parameters to timefilter.TimelineFilter
:
input
: the path to the input TIME-TAG
file. (string)output
: the path to the output TIME-TAG
file. (string)filter
: a comparison filter (see examples below). Data points who meet the filter criteria will be filtered out by turning on their "bad time interval" data quality flag. (string)Only TIME-TAG
data has the timing data necessary to be processed using timefilter
. We must first ensure that all the data we wish to process is TIME-TAG
data:
def check_time_tag(filepath):
"""
Function which yields descriptive errors if the input file is not a COS TIME-TAG file.
Inputs:
filepath (str): path to your data
"""
instrume, imagetyp = fits.getval(filepath, keyword='INSTRUME'), fits.getval(filepath, keyword='IMAGETYP')
assert instrume == 'COS', "DataTypeError: These files are not from COS"
assert imagetyp == 'TIME-TAG', "DataTypeError: These files are not TIME-TAG files. You cannot process them with timefilter."
for i, file in enumerate(file_locations_a + file_locations_b):
check_time_tag(file)
Below, we will filter by the sun's altitude above the geometric horizon from the point-of-view of the Hubble Observatory. Daytime is whenever this altitude is greater than 0 degrees.
The filtering parameters are explained in the Section 5.4.2 of the COS Data Handbook: Filtering Time-Tag Data.
All of the available filters are:
Property | Variable name |
---|---|
Time of photon encounter [32 ms bins] | time |
Observatory latitude/longitude | longitude /latitude |
Sun's altitude | sun_alt |
Sun-Earth-HST angle | sun_zd |
Observing target's current altitude | target_alt |
Radial velocity of HST towards target | radial_vel |
Shift along dispersion axis | shift1 |
Total $\frac{counts}{sec}$ in a box across the aperture about $Ly_\alpha$ | ly_alpha |
Total $\frac{counts}{sec}$ in a box across the aperture about $O I 1304$ | oi_1304 |
Total $\frac{counts}{sec}$ in a box across the aperture about $O I 1356$ | oi_1356 |
Detector dark current rate | darkrate |
... or any combination of these filters. There is one more filter keyword which behaves slightly differently: saa
. By specifying saa <n>
where $0\le n \le 32$, you can exclude the $n^{th}$ model contour of the South Atlantic Anomaly (SAA) as modeled by costools.saamodel
.
The following comparison operators are allowed:
Symbol | Example | Meaning ("Filter out points taken when...") |
---|---|---|
== |
time == 0 |
Time of photon interaction is equal to 0 seconds |
!= |
longitude != 3 |
HST's longitude is not equal to 3˚ |
<= |
sun_alt >= 0 |
Sun's altitude is greater than or equal to 0˚ |
>= |
target_alt <= 90 |
Target's altitude is less than or equal to 90˚ |
< |
radial_vel < 0.1 |
Radial velocity of HST towards target is less than 10 $\frac{km}{s}$ |
> |
ly_a > 5 |
Total $\frac{counts}{sec}$ in a box across the aperture about $Ly_\alpha$ is greater than 5 $\frac{counts}{sec}$ |
Combination | oi_1304 > 100 AND sun_alt <= 10 OR target_alt < 30 |
Total irradiance about 1304 Å is greater than 100 AND the Sun's altitude is less than or equal to 10˚ OR the target's altitude is below 30˚ |
Note that in the combination example above, the "OR" takes precedence over "AND". If a count does not satisfy both of the first 2 statements, but it does satisfy statement 3, it will be considered to meet the filter, its "bad time interval" data quality flag will be turned on, and it will not be allowed to contribute to the output spectrum.
Note also that the TimelineFilter
class will not overwrite existing files, so if you wish to filter the files using the following code more than once, you must delete the files you have already created in the intermediate
directory. To ease this process, the first several lines in the following cell attempt to delete any previously created files.
# If the files already exist, delete them so you may run the filter again:
if (intermediate_dir / "filtered_corrtag_a.fits").exists(): # Check if you have created a filtered FUVA file
print(f"File exists; deleting {intermediate_dir / 'filtered_corrtag_a.fits'}\n")
(intermediate_dir / "filtered_corrtag_a.fits").unlink() # if so, delete the file so we can re-create it with TimelineFilter
if (intermediate_dir / "filtered_corrtag_b.fits").exists(): # Repeat the check-delete process above with FUVB file
print(f"File exists; deleting {intermediate_dir / 'filtered_corrtag_b.fits'}\n")
(intermediate_dir / "filtered_corrtag_b.fits").unlink()
# Filter the files based on your criteria:
for afile, bfile in zip(file_locations_a,file_locations_b): # This must be repeated for all exposure files, so loop through both FUVA and FUVB
timefilter.TimelineFilter(input=afile, output=str(intermediate_dir / "filtered_corrtag_a.fits"), # Run TimelineFilter on FUVA file
filter = "sun_alt > 0.", verbose=True) # removes daytime data, i.e. where the Sun's altitude above the horizon is 0 degrees
timefilter.TimelineFilter(bfile, str(intermediate_dir / "filtered_corrtag_b.fits"), # Run TimelineFilter on FUVB file
"sun_alt > 0.")
Input file /Users/nkerman/Projects/notebooks/notebooks/COS/DayNight/data/lbry01i6q_corrtag_a.fits sun_alt > 0. flagged as bad EXPTIME changed from 1176.192 to 137.18394 New GTI extension appended Writing to /Users/nkerman/Projects/notebooks/notebooks/COS/DayNight/intermediate/filtered_corrtag_a.fits
In order to run the CalCOS
pipeline on your newly filtered data, an association (asn
) file must be made, instructing the pipeline where to look for the filtered TIME-TAG
data files. Association files, (and how to create and edit them,) are discussed in detail in our Notebook Tutorial on Association Files.
In the next cells, we build two new association files from scratch:
# First, create asn for the newly filtered-by-sun_alt files (this and next cell):
# This dict properly assigns the type of exposure in a way that CalCOS will recognize
type_dict = {'WAVECAL' : 'EXP-AWAVE',
'EXTERNAL/SCI' : 'EXP-FP'}
files = glob.glob(str(intermediate_dir/"*_corrtag_a.fits")) # Finds all the corrtag_a (just FUVA) files downloaded above
# These file locations, names will differ depending on the output name above
Note that the above cell only found the segment A files (i.e. from the COS FUVA detector.) You only need the A segment in the ASN, and CalCOS will find the associated B segment data for you, if the files are in the same directory.
We can now build the asn
file:
for f in files:
# Adding the file details to the association table
rootnames = [f] # MEMNAME
types = [type_dict[fits.getval(f, 'EXPTYPE')]] # MEMTYPE
included = [True] # MEMPRSNT
# Adding the ASN details to the end of the association table
# the rootname needs to be the full name, not just the rootname
asn_root = os.path.basename(f.split('corrtag')[0][:-1]) # remove the extra "_" at the end of string
rootnames.append(asn_root.upper()) # MEMNAME
types.append('PROD-FP') # MEMTYPE
included.append(True) # MEMPRSNT
# Putting together the fits table
# 40 is the number of characters allowed in this '40A' field. If your rootname is longer than 40,
# you will need to increase this
c1 = fits.Column(name='MEMNAME', array=np.array(rootnames), format='40A')
c2 = fits.Column(name='MEMTYPE', array=np.array(types), format='14A')
c3 = fits.Column(name='MEMPRSNT', format='L', array=included)
t = fits.BinTableHDU.from_columns([c1, c2, c3])
# Writing the fits table
t.writeto(asn_root.lower()+'_asn.fits', overwrite=True)
print('Saved: {}'.format(asn_root.lower()+'_asn.fits' ), "in the cwd directory")
Saved: filtered_asn.fits in the cwd directory
We can see the contents of this new association file below:
Table.read("filtered_asn.fits")
MEMNAME | MEMTYPE | MEMPRSNT |
---|---|---|
bytes40 | bytes14 | bool |
intermediate/filtered_corrtag_a.fits | EXP-FP | True |
FILTERED | PROD-FP | True |
Note that in the "MEMNAME" column, we define the location of the corrtag file we wish to process in terms of a local path (the path from the current working directory from which we run our code to the file in the intermediate
directory.) It is also possible to simply give the "MEMNAME" column the file's rootname (i.e filtered_corrtag_a
) as long as this file and the association files are in the same directory.
# In this cell, we make an association file for the unfiltered data files:
for f in file_locations_a:
# Adding the file details to the association table
rootnames = [f] # MEMNAME
types = [type_dict[fits.getval(f, 'EXPTYPE')]] # MEMTYPE
included = [True] # MEMPRSNT
# Adding the ASN details to the end of the association table
# the rootname needs to be the full name, not just the rootname
asn_root = os.path.basename(f.split('corrtag')[0][:-1]) # removing the extra "_" at end of string
rootnames.append(asn_root.upper()) # MEMNAME
types.append('PROD-FP') # MEMTYPE
included.append(True) # MEMPRSNT
# Putting together the fits table
c1 = fits.Column(name='MEMNAME', array=np.array(rootnames), format='40A')
c2 = fits.Column(name='MEMTYPE', array=np.array(types), format='14A')
c3 = fits.Column(name='MEMPRSNT', format='L', array=included)
t = fits.BinTableHDU.from_columns([c1, c2, c3])
# Writing the fits table
t.writeto(asn_root.lower()+'_asn.fits', overwrite=True)
print('Saved: {}'.format(asn_root.lower()+'_asn.fits' ), "in the current working directory")
Saved: lbry01i6q_asn.fits in the current working directory
Again, we can see the contents of this association file below:
Table.read(f"{filename_root}_asn.fits")
MEMNAME | MEMTYPE | MEMPRSNT |
---|---|---|
bytes40 | bytes14 | bool |
./data/lbry01i6q_corrtag_a.fits | EXP-FP | True |
LBRY01I6Q | PROD-FP | True |
CalCOS
pipeline¶Now we need to reduce the data using the CalCOS
pipeline. If you have not already checked out our tutorial on Running the CalCOS
pipeline, it contains vital information and is highly recommended.
[If you have not yet downloaded the files as in Section 3 of "Setup.ipynb", click to skip this cell](#skipcellF)
If you ran Section 3 of our Notebook on "Setting up your environment", you likely do not need to download more reference files.
You can instead simply point to the reference files you downloaded, using the crds bestrefs
command, as shown in the following three steps. Run these steps from your command line if and only if you already have the reference files in a local cache.
1. The following sets environment variable for crds to look for the reference data online:
$ export CRDS_SERVER_URL=https://hst-crds.stsci.edu
2. The following tells crds where to save the files it downloads - set this to the directory where you saved the crds_cache as in Section 3 of our Notebook on "Setup":
$ export CRDS_PATH=${HOME}/crds_cache
3. The following will update the data files you downloaded so that they will be processed with the reference files you previously downloaded.
Note that these reference files are continually updated and for documentation and to find the newest reference files, see the CRDS website.
$ crds bestrefs --files data/*.fits --update-bestrefs --new-context '<the imap or pmap file you used to download the reference files>'
Assuming everything ran successfully, you can now skip the next several cells to running the pipeline.
If you have not yet downloaded the reference files, you will need to do so, as shown below:
Unless we are connected to the STScI network, or already have the reference files on our machine, we will need to download the reference files and tell the pipeline where to look for the flat files, bad-pixel files, etc.
Caution!
The process in the following two cells can take a long time and strain network resources! If you have already downloaded up-to-date COS reference files, avoid doing so again.
Instead, keep these crds files in an accessible location, and point an environment variable lref
to this directory. For instance, if your lref files are on your username's home directory, in a subdirectory called crds_cache
, give Jupyter the following command then skip the next 2 cells:
%env lref /Users/<your username>/crds_cache/references/hst/cos/
Again, only run the following two cells if you have not downloaded these files before: In the next two cells, we will setup an environment of reference files, download the files, and save the output of the crds download process in a log file:
%%capture cap --no-stderr
%env lref ./data/reference/references/hst/cos/
%env CRDS_SERVER_URL https://hst-crds.stsci.edu
%env CRDS_PATH ./data/reference/
# The next line depends on your context and pmap file
# You can find the latest ("Operational") pmap file at https://hst-crds.stsci.edu
!crds bestrefs --files **/*corrtag*.fits --sync-references=2 --update-bestrefs --new-context 'hst_0940.pmap'
with open(str(outputdir/'crds_output_1.txt'), 'w') as f: # This file now contains the output of the last cell
f.write(cap.stdout)
And now, to run the pipeline itself:
Again, because we wish to compare against the unfiltered data, we must run the pipeline twice:
You can ignore any AstropyDeprecationWarning
s that pop up
%%capture cap --no-stderr
# First, run with the "filtered" data with only time-tag datapoints allowed by the filter
calcos.calcos('./filtered_asn.fits', outdir=str(outputdir / "filtered_data_outs"), verbosity=0)
WARNING: AstropyDeprecationWarning: The following keywords are now recognized as special column-related attributes and should be set via the Column objects: TCDLTn, TCRPXn, TCRVLn, TCTYPn, TCUNIn. In future, these values will be dropped from manually specified headers automatically and replaced with values generated based on the Column objects. [astropy.io.fits.hdu.table]
%%capture cap --no-stderr
# Now, run CalCOS with the initial "full" data - with all time-tag datapoints
calcos.calcos(f'./{filename_root}_asn.fits', outdir=str(outputdir / "full_data_outs"), verbosity=0)
Excellent! We're essentially done at this point.
Let's read in both of the processed spectra (the x1dsum
files) and plot the spectra against one another.
You can ignore any UnitsWarning
s that pop up about formatting.
# Read in the wvln, flux, flux error of the *UNfiltered* spectrum file
unfilt_tab = Table.read("./output/full_data_outs/"+filename_root+"_x1dsum.fits")['WAVELENGTH','FLUX', 'ERROR']
# Read in the wvln, flux, flux error of the *filtered* spectrum file
filt_tab = Table.read("./output/filtered_data_outs/filtered_x1dsum.fits")['WAVELENGTH','FLUX', 'ERROR']
combo_dict_f = {'WAVELENGTH':[], 'FLUX':[], 'ERROR':[]} # Convert to dict and combine segments
combo_dict_u = {'WAVELENGTH':[], 'FLUX':[], 'ERROR':[]}
for row in filt_tab[::-1]: # reverse segments FUVA and B for filtered data
for key in row.colnames:
combo_dict_f[key]+=(list(row[key]))
for row in unfilt_tab[::-1]: # reverse segments FUVA and B for UNfiltered data
for key in row.colnames:
combo_dict_u[key]+=(list(row[key]))
WARNING: UnitsWarning: 'erg /s /cm**2 /angstrom' contains multiple slashes, which is discouraged by the FITS standard [astropy.units.format.generic] WARNING: UnitsWarning: 'count /s /pixel' contains multiple slashes, which is discouraged by the FITS standard [astropy.units.format.generic]
# Set up the figure:
fig = plt.figure(figsize=(15, 15))
gs = gridspec.GridSpec(3,1, height_ratios=[3, 1,1])
ax0 = plt.subplot(gs[0])
ax1 = plt.subplot(gs[1])
ax2 = plt.subplot(gs[2])
# Plot the data in 3 subplots
ax0.scatter(combo_dict_u["WAVELENGTH"], combo_dict_u["FLUX"], s = 2, alpha = 1, c = 'C1',label = "Unfiltered")
ax0.scatter(combo_dict_f["WAVELENGTH"], combo_dict_f["FLUX"], s = 2, alpha = 1, c = 'C2', label = "Filtered to SUN_ALT<0")
ax1.scatter(combo_dict_u["WAVELENGTH"], combo_dict_u["FLUX"], s = 2, alpha = 1, c = 'C1',label = "Unfiltered")
ax2.scatter(combo_dict_f["WAVELENGTH"], combo_dict_f["FLUX"], s = 2, alpha = 1, c = 'C2', label = "Filtered to SUN_ALT<0")
# Format the figure
ax0.set_xticks([])
ax1.set_xticks([])
ax0.legend(fontsize=20)
ax1.legend(fontsize=20)
ax2.legend(fontsize=20)
ax0.set_ylim(-2E-14, 6.5E-13)
ax1.set_ylim(-2E-14, 6.5E-13)
ax2.set_ylim(ax1.get_ylim())
ax0.set_title("Fig 2.1\nSpectra Filtered and Unfiltered\nby Solar altitude (`SUN_ALT`)", size = 35)
ax0.set_ylabel("Flux [$\dfrac{erg}{cm\ s\ \AA}$]",
fontsize=30, y=0.0, horizontalalignment='center')
ax2.set_xlabel("Wavelength [$\AA$]", fontsize=30)
plt.tight_layout()
plt.savefig(str(plotsdir / 'compare_spectra_sunalt.png'), dpi=300)
We can see that the filtered spectrum largely follows the unfiltered spectrum; however, we significantly reduce the Lyman-alpha light from airglow around 1215 Å.
Because we filter out many of the datapoints used to calculate the spectrum, we can see a significant reduction in precision in flux space (visible as a banding in the y-axis). This can come about because with few datapoints, the bands represent wavelengths which received (0,1,2...n) discrete photons. The banding can sometimes be more pronounced at longer, redder wavelengths.
Below, let's make one final plot: a segment of the spectrum around Lyman-alpha (~1215 Å) showing the errors in flux:
fig, (ax0) = plt.subplots(nrows=1, ncols=1, figsize=(15,8))
ax0.errorbar(combo_dict_u["WAVELENGTH"], combo_dict_u["FLUX"], combo_dict_u["ERROR"],
linestyle = '', markersize = 2, alpha = 1, c = 'C1',label = "Unfiltered")
ax0.errorbar(combo_dict_f["WAVELENGTH"], combo_dict_f["FLUX"], combo_dict_f["ERROR"],
linestyle = '', markersize = 2, alpha = 1, c = 'C2', label = "Filtered to SUN_ALT<0")
ax0.legend(fontsize = 20)
ax0.set_xlabel("Wavelength [$\AA$]", fontsize = 30)
ax0.set_ylabel("Flux [$\dfrac{erg}{cm\ s\ \AA}$]", fontsize = 30, horizontalalignment='center')
ax0.set_title("Fig 2.2\nSpectra with Errors of Filtered and Unfiltered by `SUN_ALT`", size = 30)
ax0.set_xlim(1213,1218)
plt.tight_layout()
plt.savefig(str(plotsdir / 'ebar_compare_spectra_sunalt.png'), dpi = 300)
With substantially fewer datapoints, our filtered dataset has larger errors.
We can, however, understand why we might want to filter by this, or another sun_alt
filter. For instance, if most of your exposure was taken at night, but the last 10% was taken after the sun had risen and induced an atmospheric line which interferes with your data, it would often be necessary to filter out this last 10% of the exposure.
costools.timefilter
to separate data taken during HST's day from data taken during the observatory's nightAuthor: Nat Kerman nkerman@stsci.edu
Contributors: Elaine Mae Frazer
Updated On: 2021-09-03
This tutorial was generated to be in compliance with the STScI style guides and would like to cite the Jupyter guide in particular.
If you use astropy
, matplotlib
, astroquery
, or numpy
for published research, please cite the
authors. Follow these links for more information about citations: