# Learning Goals¶

This Notebook is designed to walk the user (you) through creating or altering the association (asn) file used by the Cosmic Origins Spectrograph (COS) pipeline to determine which data to process:

1. Examining an association file

2. Editing an existing association file

- 2.1. Removing an exposure

3. Creating an entirely new association file

- 3.1. Simplest method

# 0. Introduction¶

The Cosmic Origins Spectrograph (COS) is an ultraviolet spectrograph on-board the Hubble Space Telescope (HST) with capabilities in the near ultraviolet (NUV) and far ultraviolet (FUV).

This tutorial aims to prepare you to alter the association file used by the CalCOS pipeline. Association files are FITS files containing a binary table extension, which lists the science and calibration exposures that the pipeline will process together.

We'll demonstrate creating an asn file in two ways: First, we'll demonstrate editing an existing asn file to add or remove an exposure. Second, we'll show how to create an entirely new asn file.

## We will import the following packages:¶

• numpy to handle array functions
• astropy.io fits and astropy.table Table for accessing FITS files
• glob, os, and shutil for working with system files
• glob helps us search for filenames
• os and shutil for moving files and deleting folders, respectively
• astroquery.mast Observations for finding and downloading data from the MAST archive
• datetime for updating fits headers with today's date
• pathlib Path for managing system paths

These Python packages come installed as standard with the STScI conda distribution. For more information, see our Notebook tutorial on setting up an environment.

In [1]:
# Import for: array manipulation
import numpy as np

# Import for: reading fits files
from astropy.io import fits
from astropy.table import Table

# Import for: system files
import glob
import os
import shutil

from astroquery.mast import Observations

# Import for: changing modification date in a fits header
import datetime

# Import for: working with system paths
from pathlib import Path


## We will also define a few directories we will need:¶

In [2]:
# These will be important directories for the Notebook

datadir = Path('./data/')
outputdir = Path('./output/')
plotsdir = Path('./output/plots/')

# Make the directories if they don't already exist
datadir.mkdir(exist_ok=True), outputdir.mkdir(exist_ok=True), plotsdir.mkdir(exist_ok=True)

Out[2]:
(None, None, None)

## And we will need to download the data we wish to filter and analyze¶

We choose the exposures with the association obs_ids: ldif01010 and ldif02010 because we happen to know that some of the exposures in these groups failed, which gives us a real-world use case for editing an association file. Both ldif01010 and ldif02010 are far-ultraviolet (FUV) datasets on the quasi-stellar object (QSO) PDS 456.

In [3]:
pl = Observations.get_product_list(Observations.query_criteria(obs_id='ldif0*10')) # Search for the correct obs_ids and get the product list
fpl = Observations.filter_products(pl,
                                   productSubGroupDescription=['RAWTAG_A', 'RAWTAG_B', 'ASN']) # Filter to rawtag and asn files in the product list
Observations.download_products(fpl, download_dir='data') # Download the filtered products into ./data/

for gfile in glob.glob("**/ldif*/*.fits", recursive=True): # Move all fits files in this set to the base data directory
    shutil.move(gfile, Path('./data/') / Path(gfile).name)

Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01010_asn.fits to data/mastDownload/HST/ldif01010/ldif01010_asn.fits ... [Done]


# 1. Examining an association file¶

Above, we downloaded two association files and their rawtag data files. We will begin by searching for the association files and reading one of them (LDIF01010). We could just as easily pick ldif02010.

In [4]:
asnfiles = glob.glob("**/*ldif*asn*", recursive=True) # There will be two (ldif01010_asn.fits and ldif02010_asn.fits)
asnfile = asnfiles[0] # We want to work primarily with ldif01010_asn.fits

asn_contents = Table.read(asnfile) # Gets the contents of the asn file
asn_contents # Display these contents

Out[4]:
Table length=5
MEMNAME    MEMTYPE   MEMPRSNT
bytes14    bytes14   bool
LDIF02NMQ  EXP-FP    1
LDIF02NSQ  EXP-FP    1
LDIF02NUQ  EXP-FP    1
LDIF02NWQ  EXP-FP    1
LDIF02010  PROD-FP   1

We see that the association file has five rows: four exposures denoted with MEMTYPE = EXP-FP, and one product with MEMTYPE = PROD-FP.

In the cell below, we examine a bit about each of the exposures as a diagnostic:

In [5]:
for memname, memtype in zip(asn_contents['MEMNAME'], asn_contents["MEMTYPE"]): # Cycle through each file in the asn table
    memname = memname.lower() # Filenames on disk are lower case
    if memtype == 'EXP-FP': # We only want to look at the exposure files
        rt_a = glob.glob(f"**/*{memname}*rawtag_a*", recursive=True)[0] # Find the actual filepath of the memname for rawtag_a and rawtag_b
        rt_b = glob.glob(f"**/*{memname}*rawtag_b*", recursive=True)[0]

        # Now print all these diagnostics:
        print(f"Association {fits.getheader(rt_a)['ASN_ID']} has {memtype} exposure {memname.upper()} with "
              f"exposure time {fits.getheader(rt_a, ext=1)['EXPTIME']} seconds at "
              f"cenwave {fits.getheader(rt_a)['CENWAVE']} Å and FP-POS {fits.getheader(rt_a)['FPPOS']}.")

Association LDIF02010 has EXP-FP exposure LDIF02NMQ with exposure time 1653.0 seconds at cenwave 1280 Å and FP-POS 1.
Association LDIF02010 has EXP-FP exposure LDIF02NSQ with exposure time 1334.0 seconds at cenwave 1280 Å and FP-POS 2.
Association LDIF02010 has EXP-FP exposure LDIF02NUQ with exposure time 1334.0 seconds at cenwave 1280 Å and FP-POS 3.
Association LDIF02010 has EXP-FP exposure LDIF02NWQ with exposure time 2783.0 seconds at cenwave 1280 Å and FP-POS 4.


We notice that something seems amiss with exposure LDIF01TYQ: This file has an exposure time of 0.0 seconds - something has gone wrong. In this case, there was a guide star acquisition failure as described on the data preview page.

In the next section, we will correct this lack of data by replacing the bad exposure with an exposure from the other association group.

# 2. Editing an existing association file¶

## 2.1. Removing an exposure¶

We know that at least one of our exposures - ldif01tyq - is not suited for combination into the final product. It has an exposure time of 0.0 seconds, in this case from a guide star acquisition failure. This is a generalizable issue, as you may often know an exposure is "bad" for other reasons: perhaps it was taken with the shutter closed, or with anomalously high background noise, or for any number of other reasons you may wish to exclude it from your data. To do this, we will need to alter our existing association file before we re-run CalCOS.

We again see the contents of our main association file below. Note that True/False and 1/0 are essentially interchangeable in the MEMPRSNT column.

In [6]:
Table.read(asnfiles[0])

Out[6]:
Table length=5
MEMNAME    MEMTYPE   MEMPRSNT
bytes14    bytes14   bool
LDIF02NMQ  EXP-FP    1
LDIF02NSQ  EXP-FP    1
LDIF02NUQ  EXP-FP    1
LDIF02NWQ  EXP-FP    1
LDIF02010  PROD-FP   1

We can set the MEMPRSNT value to False or 0 for our bad exposure:

In [7]:
with fits.open(asnfile, mode='update') as hdulist: # We need to change values with the asnfile opened and in 'update' mode
    tbdata = hdulist[1].data # This is where the table data is read into
    for expfile in tbdata: # Check if each file is one of the bad ones
        if expfile['MEMNAME'] in ['LDIF01TYQ']:
            expfile['MEMPRSNT'] = False # If so, set MEMPRSNT to False AKA 0

Table.read(asnfile) # Display the table after the edit


Out[7]:
Table length=5
MEMNAME    MEMTYPE   MEMPRSNT
bytes14    bytes14   bool
LDIF02NMQ  EXP-FP    1
LDIF02NSQ  EXP-FP    1
LDIF02NUQ  EXP-FP    1
LDIF02NWQ  EXP-FP    1
LDIF02010  PROD-FP   1

We removed the failed exposure taken with FP-POS = 1. Usually we want to combine one of each of the four fixed-pattern noise positions (FP-POS), so let's add the FP-POS = 1 exposure from the other association group.

In the cell below, we determine which exposure from the other association group was taken with FP-POS = 1.

• It does this by looping through the files listed in the other group's association file, and reading each file's header to check whether its FPPOS value equals 1.
• It also prints some diagnostic information about all of the exposure files.
In [8]:
asn_contents_2 = Table.read(asnfiles[1]) # Read the contents of the 2nd asn file

for memname, memtype in zip(asn_contents_2['MEMNAME'], asn_contents_2["MEMTYPE"]): # Loop through each file in the asn table
    memname = memname.lower() # Convert file names to lower case letters, as in actual filenames
    if memtype == 'EXP-FP': # We only want to look at the exposure files
        rt_a = glob.glob(f"**/*{memname}*rawtag_a*", recursive=True)[0] # Search for the actual filepath of the memname for rawtag_a
        rt_b = glob.glob(f"**/*{memname}*rawtag_b*", recursive=True)[0] # Search for the actual filepath of the memname for rawtag_b
        fppos = fits.getheader(rt_a)['FPPOS'] # Read this exposure's FP-POS from its header
        # Now print all these diagnostics:
        print(f"Association {fits.getheader(rt_a)['ASN_ID']} has {memtype} exposure {memname.upper()} with "
              f"exptime {fits.getheader(rt_a, ext=1)['EXPTIME']} seconds at cenwave {fits.getheader(rt_a)['CENWAVE']} Å and FP-POS {fppos}.")
        if fppos == 1: # This is the FP-POS we are looking for
            print(f"^^^ The one above this has the FP-POS we are looking for ({memname.upper()})^^^\n")
            asn2_fppos1_name = memname.upper() # Save the right file basename in a variable

Association LDIF01010 has EXP-FP exposure LDIF01TYQ with exptime 0.0 seconds at cenwave 1280 Å and FP-POS 1.
^^^ The one above this has the FP-POS we are looking for (LDIF01TYQ)^^^

Association LDIF01010 has EXP-FP exposure LDIF01U0Q with exptime 1404.0 seconds at cenwave 1280 Å and FP-POS 2.
Association LDIF01010 has EXP-FP exposure LDIF01U2Q with exptime 1404.0 seconds at cenwave 1280 Å and FP-POS 3.
Association LDIF01010 has EXP-FP exposure LDIF01U4Q with exptime 2923.0 seconds at cenwave 1280 Å and FP-POS 4.


It's a slightly different procedure to add a new exposure to the list rather than remove one.

Here we want to read the table in the fits association file into an astropy Table. We can then add a row into the right spot, filling it with the new file's MEMNAME, MEMTYPE, and MEMPRSNT. Finally, we have to save this table into the existing fits association file.

In [9]:
asn_orig_table = Table.read(asnfile) # Read in original data from the file
asn_orig_table.insert_row(len(asn_orig_table) - 1, [asn2_fppos1_name, 'EXP-FP', 1]) # Add a row with the right name after all the original EXP-FP's
new_table = fits.BinTableHDU(asn_orig_table) # Turn this into a fits Binary Table HDU

with fits.open(asnfile, mode='update') as hdulist: # We need to change data with the asnfile opened and in 'update' mode
hdulist[1].data = new_table.data  # Change the orig file's data to the new table data we made


Now, we can see there is a new row with our exposure from the other asn file group.

In [10]:
Table.read(asnfile)

Out[10]:
Table length=6
MEMNAME    MEMTYPE   MEMPRSNT
bytes14    bytes14   bool
LDIF02NMQ  EXP-FP    1
LDIF02NSQ  EXP-FP    1
LDIF02NUQ  EXP-FP    1
LDIF02NWQ  EXP-FP    1
LDIF01TYQ  EXP-FP    1
LDIF02010  PROD-FP   1

Excellent. In the next section we will create a new association file from scratch.

# 3. Creating an entirely new association file¶

For the sake of demonstration, we will generate a new association file with four exposure members: even-numbered FP-POS (2, 4) from the first original association (LDIF01010), and odd-numbered FP-POS (1, 3) from the second original association (LDIF02010).

From section 2, we see that this corresponds to:

| Original asn | Name      | FP-POS |
|--------------|-----------|--------|
| LDIF02010    | LDIF02NMQ | 1      |
| LDIF01010    | LDIF01U0Q | 2      |
| LDIF02010    | LDIF02NUQ | 3      |
| LDIF01010    | LDIF01U4Q | 4      |

## 3.1. Simplest method¶

Below, we manually build up an association file from the three necessary columns:

1. MEMNAME
2. MEMTYPE
3. MEMPRSNT
In [11]:
# Adding the exposure file details to the association table
new_asn_memnames = ['LDIF02NMQ','LDIF01U0Q','LDIF02NUQ','LDIF01U4Q'] # MEMNAME
types = ['EXP-FP', 'EXP-FP', 'EXP-FP', 'EXP-FP'] # MEMTYPE
included = [True, True, True, True] # MEMPRSNT

# Adding the ASN details to the end of the association table
new_asn_memnames.append('ldifcombo'.upper()) # MEMNAME column
types.append('PROD-FP') # MEMTYPE column
included.append(True) # MEMPRSNT column

# Putting together the fits table
#   40 is the number of characters allowed in this field with the MEMNAME format = 40A.
#    If your rootname is longer than 40, you will need to increase this
c1 = fits.Column(name='MEMNAME', array=np.array(new_asn_memnames), format='40A')
c2 = fits.Column(name='MEMTYPE', array=np.array(types), format='14A')
c3 = fits.Column(name='MEMPRSNT', format='L', array=included)
asn_table = fits.BinTableHDU.from_columns([c1, c2, c3])

# Writing the fits table
asn_table.writeto(outputdir / 'ldifcombo_asn.fits', overwrite=True)

print('Saved: '+ 'ldifcombo_asn.fits'+ f" in the output directory: {outputdir}")

Saved: ldifcombo_asn.fits in the output directory: output


Examining the file we have created:

We see that the data looks correct - exactly the table we want!

In [12]:
Table.read(outputdir / 'ldifcombo_asn.fits')

Out[12]:
Table length=5
MEMNAME    MEMTYPE   MEMPRSNT
bytes40    bytes14   bool
LDIF02NMQ  EXP-FP    True
LDIF01U0Q  EXP-FP    True
LDIF02NUQ  EXP-FP    True
LDIF01U4Q  EXP-FP    True
LDIFCOMBO  PROD-FP   True

However, the 0th and 1st FITS headers no longer contain useful information about the data:

In [13]:
fits.getheader(outputdir / 'ldifcombo_asn.fits', ext=0)

Out[13]:
SIMPLE  =                    T / conforms to FITS standard
BITPIX  =                    8 / array data type
NAXIS   =                    0 / number of array dimensions
EXTEND  =                    T                                                  
In [14]:
fits.getheader(outputdir / 'ldifcombo_asn.fits', ext=1)

Out[14]:
XTENSION= 'BINTABLE'           / binary table extension
BITPIX  =                    8 / array data type
NAXIS   =                    2 / number of array dimensions
NAXIS1  =                   55 / length of dimension 1
NAXIS2  =                    5 / length of dimension 2
PCOUNT  =                    0 / number of group parameters
GCOUNT  =                    1 / number of groups
TFIELDS =                    3 / number of table fields
TTYPE1  = 'MEMNAME '
TFORM1  = '40A     '
TTYPE2  = 'MEMTYPE '
TFORM2  = '14A     '
TTYPE3  = 'MEMPRSNT'
TFORM3  = 'L       '                                                            

We can instead build up a new file with our old file's fits header, and alter it to reflect our changes.

We first build a new association file, a piecewise combination of our original file's headers and our new table:

In [15]:
with fits.open(asnfile, mode='readonly') as hdulist: # Open up the old asn file
    hdulist.info() # Shows the first hdu is empty except for the header we want
    hdu0 = hdulist[0] # We want to directly copy over the old 0th header/data-unit AKA "hdu":
    #   essentially a section of data and its associated metadata, called a "header"
    #   see https://fits.gsfc.nasa.gov/fits_primer.html for info on fits structures
    d0 = hdulist[0].data # gather the data from the header/data unit to allow the readout
    h1 = hdulist[1].header # gather the header from the 1st header/data unit to copy to our new file

hdu1 = fits.BinTableHDU.from_columns([c1, c2, c3], header=h1) # Put together new 1st hdu from old header and new data

new_HDUlist = fits.HDUList([hdu0, hdu1]) # New HDUList from old HDU 0 and new combined HDU 1
new_HDUlist.writeto(outputdir / 'ldifcombo_2_asn.fits', overwrite=True) # Write this out to a new file
new_asnfile = outputdir / 'ldifcombo_2_asn.fits' # Path to this new file
print('\nSaved: ' + 'ldifcombo_2_asn.fits' + f" in the output directory: {outputdir}")

Filename: data/ldif02010_asn.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
0  PRIMARY       1 PrimaryHDU      43   ()
1  ASN           1 BinTableHDU     24   6R x 3C   [14A, 14A, L]

Saved: ldifcombo_2_asn.fits in the output directory: output


Now we edit the relevant values in our fits headers that are different from the original.

Note: a different FITS file may well have other header values you wish to change. It is highly recommended to examine your FITS headers.
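One quick way to see which keywords you have changed (or still need to change) is to compare two headers key by key; a sketch using two small in-memory headers standing in for the real files' headers (the keyword values below are placeholders):

```python
from astropy.io import fits

old_hdr = fits.Header()
old_hdr['ROOTNAME'] = 'ldif02010'
old_hdr['ASN_TAB'] = 'ldif02010_asn.fits'

new_hdr = old_hdr.copy()
new_hdr['ROOTNAME'] = 'ldifcombo_2'  # pretend we already edited this one

# List the keywords whose values differ between the two headers
changed = [key for key in old_hdr if old_hdr[key] != new_hdr[key]]
print(changed)
```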

In [16]:
date = datetime.date.today() # Find today's date
# Below, make a dict of the header values we want to change, mapping each keyword to
#   [new value, extension the value lives in, 2nd extension if it applies]
keys_to_change = {'DATE': [f'{date.year}-{date.month}-{date.day}', 0],
                  'FILENAME': ['ldifcombo_2_asn.fits', 0],
                  'ROOTNAME': ['ldifcombo_2', 0, 1],
                  'ASN_ID': ['ldifcombo_2', 0],
                  'ASN_TAB': ['ldifcombo_2_asn.fits', 0],
                  'ASN_PROD': ['False', 0],
                  'EXTVER': [2, 1],
                  'EXPNAME': ['ldifcombo_2', 1]}
# Actually change the values below (verbosely):
for keyword, val in keys_to_change.items():
    print(f"Editing {keyword} in Extension {val[1]}")
    fits.setval(filename=new_asnfile, keyword=keyword, value=val[0], ext=val[1])
    # Below is necessary as some keys are repeated in both headers ('ROOTNAME')
    if len(val) > 2:
        print(f"Editing {keyword} in Extension {val[2]}")
        fits.setval(filename=new_asnfile, keyword=keyword, value=val[0], ext=val[2])

Editing DATE in Extension 0
Editing FILENAME in Extension 0
Editing ROOTNAME in Extension 0
Editing ROOTNAME in Extension 1
Editing ASN_ID in Extension 0
Editing ASN_TAB in Extension 0
Editing ASN_PROD in Extension 0
Editing EXTVER in Extension 1
Editing EXPNAME in Extension 1


And now we have created our new association file. The file is now ready to be used in the CalCOS pipeline!

If you're interested in testing your file by running it through the CalCOS pipeline, you may wish to run the test_asn.py file included in this subdirectory of the GitHub repository. i.e. from the command line:

$ python test_asn.py


Note that you must first...

1. Have created the file by running this Notebook
2. Alter line 21 of test_asn.py to set the lref directory to wherever you have your cache of CRDS reference files (see our Setup Notebook).

If the test runs successfully, it will create a plot in the subdirectory ./output/plots/.
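As an aside, CalCOS locates reference files through the lref environment variable, so you can also point it at your CRDS cache from the shell before running the script (the path below is a placeholder for your own cache location):

```shell
# Point CalCOS at your CRDS reference-file cache (placeholder path),
# then run test_asn.py in this same shell session
export lref="/path/to/crds_cache/references/hst/lref/"
```

Run `python test_asn.py` afterwards in the same session so the variable is still set.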

## Congratulations! You finished this Notebook!¶

There are more COS data walkthrough Notebooks on different topics. You can find them here.

If you use astropy, matplotlib, astroquery, or numpy for published research, please cite the authors. See each project's website for more information about citations.