Modifying or Creating an Association File

Learning Goals

This Notebook is designed to walk the user (you) through creating or altering the association (asn) file used by the Cosmic Origins Spectrograph (COS) pipeline to determine which data to process:

1. Examining an association file

2. Editing an existing association file

- 2.1. Removing an exposure

- 2.2. Adding an exposure

3. Creating an entirely new association file

- 3.1. Simplest method

- 3.2. With fits header metadata

0. Introduction

The Cosmic Origins Spectrograph (COS) is an ultraviolet spectrograph on-board the Hubble Space Telescope (HST) with capabilities in the near ultraviolet (NUV) and far ultraviolet (FUV).

This tutorial aims to prepare you to alter the association file used by the CalCOS pipeline. Association files are FITS files containing a binary table extension, which lists the science and calibration exposures that the pipeline will process together.

We'll demonstrate creating an asn file in two ways: First, we'll demonstrate editing an existing asn file to add or remove an exposure. Second, we'll show how to create an entirely new asn file.
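Before working with real files, it may help to see the shape of what an association table holds. Below is a rough sketch using plain Python stand-ins rather than a real FITS file; the member names are taken from the LDIF01010 association used later in this tutorial.

```python
# A sketch of the logical content of an association table: each row names a
# member, its type (science exposure vs. final product), and whether it is present.
asn_rows = [
    {"MEMNAME": "LDIF01TYQ", "MEMTYPE": "EXP-FP", "MEMPRSNT": True},   # a science exposure
    {"MEMNAME": "LDIF01U0Q", "MEMTYPE": "EXP-FP", "MEMPRSNT": True},   # a science exposure
    {"MEMNAME": "LDIF01010", "MEMTYPE": "PROD-FP", "MEMPRSNT": True},  # the combined product the pipeline will create
]

# An association lists exactly one product row alongside its exposure rows:
products = [row["MEMNAME"] for row in asn_rows if row["MEMTYPE"] == "PROD-FP"]
print(products)  # ['LDIF01010']
```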

We will import the following packages:

  • numpy to handle array functions
  • astropy.io fits and astropy.table Table for accessing FITS files
  • glob, os, and shutil for working with system files
    • glob helps us search for filenames
    • os and shutil for moving files and deleting folders, respectively
  • astroquery.mast Mast and Observations for finding and downloading data from the MAST archive
  • datetime for updating fits headers with today's date
  • pathlib Path for managing system paths

These Python packages are installed by default with the STScI conda distribution. For more information, see our Notebook tutorial on setting up an environment.

In [1]:
# Import for: array manipulation
import numpy as np

# Import for: reading fits files
from astropy.io import fits                                            
from astropy.table import Table

# Import for: system files
import glob
import os
import shutil

# Import for: downloading the data
from astroquery.mast import Observations

# Import for: changing modification date in a fits header
import datetime

# Import for: working with system paths
from pathlib import Path

We will also define a few directories we will need:

In [2]:
# These will be important directories for the Notebook

datadir = Path('./data/')
outputdir = Path('./output/')
plotsdir = Path('./output/plots/')

# Make the directories if they don't already exist
for directory in (datadir, outputdir, plotsdir):
    directory.mkdir(exist_ok=True)

And we will need to download the data we wish to filter and analyze.

We choose the exposures with the association obs_ids: ldif01010 and ldif02010 because we happen to know that some of the exposures in these groups failed, which gives us a real-world use case for editing an association file. Both ldif01010 and ldif02010 are far-ultraviolet (FUV) datasets on the quasi-stellar object (QSO) PDS 456.

For more information on downloading COS data, see our Notebook tutorial on downloading COS data.

In [3]:
pl = Observations.get_product_list(Observations.query_criteria(obs_id='ldif0*10')) # search for the correct obs_ids and get the product list 
fpl = Observations.filter_products(pl,
                                   productSubGroupDescription=['RAWTAG_A', 'RAWTAG_B','ASN']) # filter to rawtag and asn files in the product list

Observations.download_products(fpl, download_dir=str(datadir)) # Download these chosen products
for gfile in glob.glob("**/ldif*/*.fits", recursive=True): # Move all fits files in this set to the base data directory
    os.rename(gfile,datadir / os.path.basename(gfile))
shutil.rmtree(datadir / 'mastDownload') # Delete the now-empty, nested mastDownload directory
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01010_asn.fits to data/mastDownload/HST/ldif01010/ldif01010_asn.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01tyq_rawtag_a.fits to data/mastDownload/HST/ldif01tyq/ldif01tyq_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01tyq_rawtag_b.fits to data/mastDownload/HST/ldif01tyq/ldif01tyq_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01u0q_rawtag_a.fits to data/mastDownload/HST/ldif01u0q/ldif01u0q_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01u0q_rawtag_b.fits to data/mastDownload/HST/ldif01u0q/ldif01u0q_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01u2q_rawtag_a.fits to data/mastDownload/HST/ldif01u2q/ldif01u2q_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01u2q_rawtag_b.fits to data/mastDownload/HST/ldif01u2q/ldif01u2q_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01u4q_rawtag_a.fits to data/mastDownload/HST/ldif01u4q/ldif01u4q_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif01u4q_rawtag_b.fits to data/mastDownload/HST/ldif01u4q/ldif01u4q_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02010_asn.fits to data/mastDownload/HST/ldif02010/ldif02010_asn.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nmq_rawtag_a.fits to data/mastDownload/HST/ldif02nmq/ldif02nmq_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nmq_rawtag_b.fits to data/mastDownload/HST/ldif02nmq/ldif02nmq_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nsq_rawtag_a.fits to data/mastDownload/HST/ldif02nsq/ldif02nsq_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nsq_rawtag_b.fits to data/mastDownload/HST/ldif02nsq/ldif02nsq_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nuq_rawtag_a.fits to data/mastDownload/HST/ldif02nuq/ldif02nuq_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nuq_rawtag_b.fits to data/mastDownload/HST/ldif02nuq/ldif02nuq_rawtag_b.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nwq_rawtag_a.fits to data/mastDownload/HST/ldif02nwq/ldif02nwq_rawtag_a.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/ldif02nwq_rawtag_b.fits to data/mastDownload/HST/ldif02nwq/ldif02nwq_rawtag_b.fits ... [Done]

1. Examining an association file

Above, we downloaded two association files and their rawtag data files. We will begin by searching for the association files and reading one of them (LDIF01010). We could just as easily pick ldif02010.

In [4]:
asnfiles = sorted(glob.glob("**/*ldif*asn*", recursive=True)) # There will be two (ldif01010_asn.fits and ldif02010_asn.fits); sorting makes the order deterministic
asnfile = asnfiles[0] # We want to work primarily with ldif01010_asn.fits

asn_contents = Table.read(asnfile) # Gets the contents of the asn file
asn_contents # Display these contents
Out[4]:
Table length=5
 MEMNAME   MEMTYPE   MEMPRSNT
 bytes14   bytes14     bool
LDIF01TYQ   EXP-FP       1
LDIF01U0Q   EXP-FP       1
LDIF01U2Q   EXP-FP       1
LDIF01U4Q   EXP-FP       1
LDIF01010  PROD-FP       1

We see that the association file has five rows: four exposures denoted with the MEMTYPE = EXP-FP, and a product with MEMTYPE = PROD-FP.

In the cell below, we examine a bit about each of the exposures as a diagnostic:

In [5]:
for memname, memtype in zip(asn_contents['MEMNAME'], asn_contents["MEMTYPE"]): # Cycles through each file in asn table
    memname = memname.lower() # Find file names in lower case letters
    if memtype == 'EXP-FP': # We only want to look at the exposure files
        rt_a = (glob.glob(f"**/*{memname}*rawtag_a*", recursive=True))[0] # Find the actual filepath of the memname for rawtag_a and rawtag_b
        rt_b = (glob.glob(f"**/*{memname}*rawtag_b*", recursive=True))[0]

        # Now print all these diagnostics:
        print(f"Association {(fits.getheader(rt_a))['ASN_ID']} has {memtype} exposure {memname.upper()} with \
exposure time {(fits.getheader(rt_a, ext=1))['EXPTIME']} seconds at cenwave {(fits.getheader(rt_a, ext=0))['CENWAVE']} \
Å and FP-POS {(fits.getheader(rt_a, ext=0))['FPPOS']}.")
Association LDIF01010 has EXP-FP exposure LDIF01TYQ with exposure time 0.0 seconds at cenwave 1280 Å and FP-POS 1.
Association LDIF01010 has EXP-FP exposure LDIF01U0Q with exposure time 1404.0 seconds at cenwave 1280 Å and FP-POS 2.
Association LDIF01010 has EXP-FP exposure LDIF01U2Q with exposure time 1404.0 seconds at cenwave 1280 Å and FP-POS 3.
Association LDIF01010 has EXP-FP exposure LDIF01U4Q with exposure time 2923.0 seconds at cenwave 1280 Å and FP-POS 4.

We notice that something seems amiss with exposure LDIF01TYQ: This file has an exposure time of 0.0 seconds - something has gone wrong. In this case, there was a guide star acquisition failure as described on the data preview page.

In the next section, we will correct this lack of data by replacing the bad exposure with an exposure from the other association group.

2. Editing an existing association file

2.1. Removing an exposure

We know that at least one of our exposures - ldif01tyq - is not suited for combination into the final product. It has an exposure time of 0.0 seconds, in this case from a guide star acquisition failure. This is a generalizable issue, as you may often know an exposure is "bad" for many reasons: perhaps it was taken with the shutter closed, or with anomalously high background noise, or for any number of other reasons we may wish to exclude it from our data. To exclude it, we will need to alter our existing association file before we re-run CalCOS.

We again see the contents of our main association file below. Note that True/False and 1/0 are essentially interchangeable in the MEMPRSNT column.
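This interchangeability is plain Python behavior: MEMPRSNT is stored as a FITS logical ("L") column, and Python's bool is a subclass of int, so True/False compare equal to 1/0:

```python
# bool is a subclass of int in Python, so MEMPRSNT values can be written or
# compared as either True/False or 1/0.
assert isinstance(True, int)
assert True == 1 and False == 0
print("True/False and 1/0 are interchangeable")
```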

In [6]:
Table.read(asnfiles[0])
Out[6]:
Table length=5
 MEMNAME   MEMTYPE   MEMPRSNT
 bytes14   bytes14     bool
LDIF01TYQ   EXP-FP       1
LDIF01U0Q   EXP-FP       1
LDIF01U2Q   EXP-FP       1
LDIF01U4Q   EXP-FP       1
LDIF01010  PROD-FP       1

We can set the MEMPRSNT value to False or 0 for our bad exposure:

In [7]:
with fits.open(asnfile, mode='update') as hdulist: # We need to change values with the asnfile opened and in 'update' mode
    tbdata = hdulist[1].data # This is where the table data is read into
    for expfile in tbdata: # Check if each file is one of the bad ones
        if expfile['MEMNAME'] in ['LDIF01TYQ']:
            expfile['MEMPRSNT'] = False # If so, set MEMPRSNT to False AKA 0
            
Table.read(asnfile) # Re-read the table to see the change
Out[7]:
Table length=5
 MEMNAME   MEMTYPE   MEMPRSNT
 bytes14   bytes14     bool
LDIF01TYQ   EXP-FP       0
LDIF01U0Q   EXP-FP       1
LDIF01U2Q   EXP-FP       1
LDIF01U4Q   EXP-FP       1
LDIF01010  PROD-FP       1

2.2. Adding an exposure

We removed the failed exposure taken with FP-POS = 1. Usually we want to combine one of each of the four fixed-pattern noise positions (FP-POS), so let's add the FP-POS = 1 exposure from the other association group.
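The bookkeeping above can be sketched without any FITS I/O: given the FP-POS of each remaining good exposure (values from this tutorial), check which of positions 1-4 still need filling.

```python
# FP-POS values of the good exposures left in LDIF01010 after dropping LDIF01TYQ
good_exposures = {"LDIF01U0Q": 2, "LDIF01U2Q": 3, "LDIF01U4Q": 4}

# Which of the four FP-POS settings are not yet covered?
missing_fppos = sorted(set(range(1, 5)) - set(good_exposures.values()))
print(missing_fppos)  # [1] -> we need to borrow an FP-POS = 1 exposure
```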

In the cell below, we determine which exposure from LDIF02010 was taken with FP-POS = 1.

  • It does this by looping through the files listed in LDIF02010's association file, and then reading in that file's header to check if its FPPOS value equals 1.
  • It also prints some diagnostic information about all of the exposure files.
In [8]:
asn_contents_2 = Table.read(asnfiles[1]) # Reads the contents of the 2nd asn file

for memname, memtype in zip(asn_contents_2['MEMNAME'], asn_contents_2["MEMTYPE"]): # Loops through each file in asn table for `LDIF02010`
    memname = memname.lower() # Convert file names to lower case letters, as in actual filenames
    if memtype == 'EXP-FP': # We only want to look at the exposure files
        rt_a = (glob.glob(f"**/*{memname}*rawtag_a*", recursive=True))[0] # Search for the actual filepath of the memname for rawtag_a 
        rt_b = (glob.glob(f"**/*{memname}*rawtag_b*", recursive=True))[0] # Search for the actual filepath of the memname for rawtag_b 
        # Now print all these diagnostics:
        print(f"Association {(fits.getheader(rt_a))['ASN_ID']} has {memtype} exposure {memname.upper()} with \
exptime {(fits.getheader(rt_a, ext=1))['EXPTIME']} seconds at cenwave {(fits.getheader(rt_a, ext=0))['CENWAVE']} Å and FP-POS {(fits.getheader(rt_a, ext=0))['FPPOS']}.")

        if (fits.getheader(rt_a, ext=0))['FPPOS'] == 1:
            print(f"^^^ The one above this has the FP-POS we are looking for ({memname.upper()})^^^\n")
            asn2_fppos1_name = memname.upper() # Save the right file basename in a variable
Association LDIF02010 has EXP-FP exposure LDIF02NMQ with exptime 1653.0 seconds at cenwave 1280 Å and FP-POS 1.
^^^ The one above this has the FP-POS we are looking for (LDIF02NMQ)^^^

Association LDIF02010 has EXP-FP exposure LDIF02NSQ with exptime 1334.0 seconds at cenwave 1280 Å and FP-POS 2.
Association LDIF02010 has EXP-FP exposure LDIF02NUQ with exptime 1334.0 seconds at cenwave 1280 Å and FP-POS 3.
Association LDIF02010 has EXP-FP exposure LDIF02NWQ with exptime 2783.0 seconds at cenwave 1280 Å and FP-POS 4.

It's a slightly different procedure to add a new exposure to the list rather than remove one.

Here we want to read the table in the fits association file into an astropy Table. We can then add a row into the right spot, filling it with the new file's MEMNAME, MEMTYPE, and MEMPRSNT. Finally, we have to save this table into the existing fits association file.

In [9]:
asn_orig_table = Table.read(asnfile) # Read in original data from the file
asn_orig_table.insert_row(len(asn_orig_table) - 1, [asn2_fppos1_name, 'EXP-FP', 1]) # Add a row with the right name after all the original EXP-FPs
new_table = fits.BinTableHDU(asn_orig_table) # Turn this into a fits Binary Table HDU 

with fits.open(asnfile, mode='update') as hdulist: # We need to change data with the asnfile opened and in 'update' mode
    hdulist[1].data = new_table.data  # Change the orig file's data to the new table data we made

Now, we can see there is a new row with our exposure from the other asn file group: LDIF02NMQ.

In [10]:
Table.read(asnfile)
Out[10]:
Table length=6
 MEMNAME   MEMTYPE   MEMPRSNT
 bytes14   bytes14     bool
LDIF01TYQ   EXP-FP       0
LDIF01U0Q   EXP-FP       1
LDIF01U2Q   EXP-FP       1
LDIF01U4Q   EXP-FP       1
LDIF02NMQ   EXP-FP       1
LDIF01010  PROD-FP       1

Excellent. In the next section we will create a new association file from scratch.

3. Creating an entirely new association file

For the sake of demonstration, we will generate a new association file with four exposure members: the even-numbered FP-POS (2, 4) exposures from the first original association (LDIF01010), and the odd-numbered FP-POS (1, 3) exposures from the second original association (LDIF02010).

From Section 2, we see that this corresponds to:

Name       Original asn  FP-POS
LDIF02NMQ  LDIF02010     1
LDIF01U0Q  LDIF01010     2
LDIF02NUQ  LDIF02010     3
LDIF01U4Q  LDIF01010     4

3.1. Simplest method

Below, we manually build up an association file from the three necessary columns:

  1. MEMNAME
  2. MEMTYPE
  3. MEMPRSNT
In [11]:
# Adding the exposure file details to the association table
new_asn_memnames = ['LDIF02NMQ','LDIF01U0Q','LDIF02NUQ','LDIF01U4Q'] # MEMNAME
types = ['EXP-FP', 'EXP-FP', 'EXP-FP', 'EXP-FP'] # MEMTYPE
included = [True, True, True, True] # MEMPRSNT

# Adding the ASN details to the end of the association table
new_asn_memnames.append('ldifcombo'.upper()) # MEMNAME column
types.append('PROD-FP') # MEMTYPE column
included.append(True) # MEMPRSNT column

# Putting together the fits table
#   40 is the number of characters allowed in this field with the MEMNAME format = 40A. 
#    If your rootname is longer than 40, you will need to increase this
c1 = fits.Column(name='MEMNAME', array=np.array(new_asn_memnames), format='40A') 
c2 = fits.Column(name='MEMTYPE', array=np.array(types), format='14A')
c3 = fits.Column(name='MEMPRSNT', format='L', array=included)
asn_table = fits.BinTableHDU.from_columns([c1, c2, c3])

# Writing the fits table
asn_table.writeto(outputdir / 'ldifcombo_asn.fits', overwrite=True)

print(f"Saved: ldifcombo_asn.fits in the output directory: {outputdir}")
Saved: ldifcombo_asn.fits in the output directory: output

Examining the file we have created:

We see that the data looks correct - exactly the table we want!

In [12]:
Table.read(outputdir / 'ldifcombo_asn.fits')
Out[12]:
Table length=5
 MEMNAME   MEMTYPE   MEMPRSNT
 bytes40   bytes14     bool
LDIF02NMQ   EXP-FP     True
LDIF01U0Q   EXP-FP     True
LDIF02NUQ   EXP-FP     True
LDIF01U4Q   EXP-FP     True
LDIFCOMBO  PROD-FP     True

However, the 0th and 1st fits headers no longer contain useful information about the data:

In [13]:
fits.getheader(outputdir / 'ldifcombo_asn.fits', ext=0)
Out[13]:
SIMPLE  =                    T / conforms to FITS standard                      
BITPIX  =                    8 / array data type                                
NAXIS   =                    0 / number of array dimensions                     
EXTEND  =                    T                                                  
In [14]:
fits.getheader(outputdir / 'ldifcombo_asn.fits', ext=1)
Out[14]:
XTENSION= 'BINTABLE'           / binary table extension                         
BITPIX  =                    8 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                   55 / length of dimension 1                          
NAXIS2  =                    5 / length of dimension 2                          
PCOUNT  =                    0 / number of group parameters                     
GCOUNT  =                    1 / number of groups                               
TFIELDS =                    3 / number of table fields                         
TTYPE1  = 'MEMNAME '                                                            
TFORM1  = '40A     '                                                            
TTYPE2  = 'MEMTYPE '                                                            
TFORM2  = '14A     '                                                            
TTYPE3  = 'MEMPRSNT'                                                            
TFORM3  = 'L       '                                                            

3.2. With fits header metadata

We can instead build up a new file with our old file's fits header, and alter it to reflect our changes.

We first build a new association file, a piecewise combination of our original file's headers and our new table:

In [15]:
with fits.open(asnfile, mode='readonly') as hdulist: # Open up the old asn file
    hdulist.info() # Shows the first hdu is empty except for the header we want
    hdu0 = hdulist[0] # We want to directly copy over the old 0th header/data-unit AKA "hdu":
                        # essentially a section of data and its associated metadata, called a "header"
                        # see https://fits.gsfc.nasa.gov/fits_primer.html for info on fits structures
    d0 = hdulist[0].data # gather the data from the header/data unit to allow the readout
    h1 = hdulist[1].header # gather the header from the 1st header/data unit to copy to our new file
    
hdu1 = fits.BinTableHDU.from_columns([c1, c2, c3], header=h1) # Put together new 1st hdu from old header and new data

new_HDUlist = fits.HDUList([hdu0,hdu1]) # New HDUList from old HDU 0 and new combined HDU 1
new_HDUlist.writeto(outputdir / 'ldifcombo_2_asn.fits', overwrite=True) # Write this out to a new file
new_asnfile = outputdir / 'ldifcombo_2_asn.fits' # Path to this new file
print(f"\nSaved: ldifcombo_2_asn.fits in the output directory: {outputdir}")
Filename: data/ldif01010_asn.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU      43   ()      
  1  ASN           1 BinTableHDU     24   6R x 3C   [14A, 14A, L]   

Saved: ldifcombo_2_asn.fits in the output directory: output

Now we edit the relevant values in our fits headers that are different from the original.

Note: It is possible that a generic fits file may have different values you may wish to change. It is highly recommended to examine your fits headers.

In [16]:
date = datetime.date.today() # Find today's date
# Below, make a dict of what header values we want to change, corresponding to [new value, extension the value lives in, 2nd extension if applies]
keys_to_change = {'DATE':[f'{date.year}-{date.month}-{date.day}',0], 'FILENAME':['ldifcombo_2_asn.fits',0],
                      'ROOTNAME':['ldifcombo_2',0,1], 'ASN_ID':['ldifcombo_2',0], 'ASN_TAB':['ldifcombo_2_asn.fits',0], 'ASN_PROD':['False',0],
                     'EXTVER':[2,1], 'EXPNAME':['ldifcombo_2',1]}
# Actually change the values below (verbosely):
for keyval in keys_to_change.items():
    print(f"Editing {keyval[0]} in Extension {keyval[1][1]}")
    fits.setval(filename=new_asnfile, keyword=keyval[0], value=keyval[1][0], ext=keyval[1][1])
    # Below is necessary as some keys are repeated in both headers ('ROOTNAME')
    if len(keyval[1])>2:
        print(f"Editing {keyval[0]} in Extension {keyval[1][2]}")
        fits.setval(filename=new_asnfile, keyword=keyval[0], value= keyval[1][0], ext=keyval[1][2])
Editing DATE in Extension 0
Editing FILENAME in Extension 0
Editing ROOTNAME in Extension 0
Editing ROOTNAME in Extension 1
Editing ASN_ID in Extension 0
Editing ASN_TAB in Extension 0
Editing ASN_PROD in Extension 0
Editing EXTVER in Extension 1
Editing EXPNAME in Extension 1
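As a quick sanity check on edits like these, you can compare which keywords differ between the old and new headers. The sketch below uses plain dicts with illustrative values rather than the real headers:

```python
# Hypothetical "before"/"after" values for a few of the keywords edited above
old_hdr = {"FILENAME": "ldif01010_asn.fits", "ROOTNAME": "ldif01010", "ASN_ID": "ldif01010"}
new_hdr = {"FILENAME": "ldifcombo_2_asn.fits", "ROOTNAME": "ldifcombo_2", "ASN_ID": "ldifcombo_2"}

# Keywords whose values changed, sorted for a stable readout
changed = sorted(key for key in old_hdr if old_hdr[key] != new_hdr[key])
print(changed)  # ['ASN_ID', 'FILENAME', 'ROOTNAME']
```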

And now we have created our new association file. The file is now ready to be used in the CalCOS pipeline!

If you're interested in testing your file by running it through the CalCOS pipeline, you may wish to run the test_asn.py file included in this subdirectory of the GitHub repository, e.g. from the command line:

$ python test_asn.py

Note that you must first...

  1. Have created the file by running this Notebook
  2. Alter line 21 of test_asn.py to set the lref directory to wherever you have your cache of CRDS reference files (see our Setup Notebook).

If the test runs successfully, it will create a plot in the subdirectory ./output/plots/.

Congratulations! You finished this Notebook!

There are more COS data walkthrough Notebooks on different topics. You can find them here.


About this Notebook

Author: Nat Kerman: nkerman@stsci.edu

Contributors: Elaine Mae Frazer

Updated On: 2021-07-06

This tutorial was generated in compliance with the STScI style guides; we cite the Jupyter guide in particular.

Citations

If you use astropy, matplotlib, astroquery, or numpy for published research, please cite the authors. See each package's website for more information about citations.

