ASDF Example#

Use case: Create ASDF (Advanced Scientific Data Format) file from FITS file.
Data: CANDELS image of the COSMOS field.
Tools: asdf, gwcs, astrocut.
Cross-intrument: all instruments.
Documentation: This notebook is part of a STScI’s larger post-pipeline Data Analysis Tools Ecosystem.

Introduction#

JWST data files make use of the Advanced Scientific Data Format. The ASDF metadata are stored in a FITS extension. The JWST pipline software reads and writes these from the in-memory datamodels.

However, it is relatively straightforward to read and write a pure ASDF file, skipping FITS and datamodels entirely. This notebook illustrates some aspects of ASDF using a FITS file as a starting point.

Imports#

  • astrocut for getting the data via the astrocut service at MAST

  • the fits library from astropy for reading in the FITS file

  • the astropy coordinates SkyCoord object for dealing with celestial coordinates

  • matplotlib for making plots

  • asdf and the AsdfFile object

  • the astropy Table object for a notebook-friendly view of the header

  • Items from the modeling, coordinates and wcs libraries for an example of converting world coordinate system information from FITS keywords to a gwcs data structure.

from astrocut import fits_cut
from astropy.io import fits
from astropy.coordinates import SkyCoord
import matplotlib.pyplot as plt
import asdf
from asdf import AsdfFile

# For example 4
from astropy.table import Table

# For example 6
from astropy.modeling import models
from astropy import coordinates as coord
from astropy import units as u
from astropy.wcs import WCS
from gwcs.wcstools import wcs_from_fiducial
%matplotlib inline

Get the data#

We’ll grab a cutout from the CANDELS observations of COSMOS using astroquery.

url = "https://archive.stsci.edu/pub/hlsp/candels/cosmos/cos-tot/v1.0/"
input_files = [url + "hlsp_candels_hst_acs_cos-tot-sect23_f606w_v1.0_drz.fits"]
center_coord = SkyCoord("150.0946 2.38681", unit='deg')
cutout_size = [100, 100]
cutout_file = fits_cut(input_files, center_coord, cutout_size, single_outfile=True)
print(cutout_file)

Read in the FITS file, look at its structure and display the data#

cutout_hdulist = fits.open(cutout_file)
cutout_hdulist.info()

Pull apart the FITS components, for convenience later on.

data = cutout_hdulist[1].data
header0 = cutout_hdulist[0].header
header1 = cutout_hdulist[1].header
plt.imshow(data)
header1

Example 1: Store just the key-value pairs of the metadata in ASDF#

The basic asdf data structure is a dictionary. The astropy FITS header object acts like a python dictionary. We can copy it into a pure dictionary, which will be useful when we want to add the data.

tree1 = {**header1} 

One more line of code to turn it into asdf.

myfile = AsdfFile(tree1)

We won’t save this to a file yet. First, let’s inspect the tree.

myfile.tree

Example 2: Save the FITS header annotations#

The fits header includes comments for many of the keywords. It’s a bit clunky, but we can save those by storing a tuple instead of just the value in the dictionary. While we’re at it, let’s toss some of the FITS keywords that aren’t useful.

toss_these = ['XTENSION', 'BITPIX', 'NAXIS', 'NAXIS1', 'NAXIS2', 'CHECKSUM',
              'DATASUM', 'EXTNAME', 'FILETYPE', 'PCOUNT', 'GCOUNT',
              'IRAF-BPX', 'IRAF-MIN', 'IRAF-MAX', 'IRAFNAME', 'IRAFTYPE']
annotated_tree = {}
for card in header1.cards:
    if card[0] not in toss_these:
        annotated_tree[card[0]] = (card[1], card[2])
myfile = AsdfFile(annotated_tree)
myfile.tree

The value is now in the first element of each tuple. For example, to get just the value of CRVAL1, we could do something like the following.

myfile['CRVAL1'][0]
# Update O_EXT_NM key since it was blank and causing crashing while saving below.

myfile.tree['O_EXT_NM'] = 'Original Name Filler'

Example 3: View the file as a searchable table#

For both FITS and ASDF, it can sometimes be painful to search through a long header. This example shows one way of putting the ASDF file into an Astropy table, and then using the show_in_notebook method to provide an interactive tabular view. You can then search by keyword or sort by column (by clicking on the headers).

In this example, we pull apart two-element tuples into values and comments. For the other data structures, we just put them into the comment column.

def tree_to_table(tree):
    keys = list(tree.keys())
    values, other = ([] for i in range(2))
    for k in keys:
        try:
            values += [tree[k][0]]
            other += [tree[k][1]]
        except Exception as e:
            values += [None]
            other += [tree[k]]
            print("An error occured ", e)
    return Table([keys, values, other], names=['key', 'value', 'comment or data structure'])
t = tree_to_table(myfile.tree)
t.show_in_notebook()

Example 4: Add the data and write to a file#

So that’s the header. Now we just need to add the data to the dictionary. We can use any descriptive key we like. Maybe we should call it data.

myfile['data'] = data # Equivalent to myfile.tree['data'] = data
myfile['data']
type(myfile)
myfile.info()
myfile.write_to('myfile.asdf')

Read the asdf file from disk and look at the tree and the data

ff = asdf.open('myfile.asdf')
ff.tree
plt.imshow(ff['data'])

Example 5: Storing multiple extensions#

The perspicacious reader will have noticed that the previous examples only dealt with extension 1 of the FITS file, leaving the primary header out of the ASDF file. There is no prescribed way to arrange the multiple extensions of a FITS file into an ASDF file. One option would be to create a separate dictionary for each extension and then make a dictionary of these, e.g.:

    ext1, ext2 = dict(**header0), dict(**header1)
    tree = {'ext1':ext1, 'ext2':ext2}

In this case, that would be a bit silly, because the only information of potential value in the ASDF file might be the ORIGIN, DATE, PROC_VER, RA_OBJ and DEC_OBJ. However, looking at the extension, there is an ORIGIN there that will conflict with the ORIGIN. (Somewhat amusingly, they have different meanings, according to the comments.)

header0

One solution to the naming conflict might be to stuff this extra information into its own namespace, as a sub-item of the original dictionary.

keywords = ['ORIGIN', 'DATE', 'PROCVER', 'RA_OBJ', 'DEC_OBJ']
primary_header = {}
for card in header0.cards:
    if card[0] in keywords:
        primary_header[card[0]] = (card[1], card[2])
ff.tree['primary_header'] = primary_header
ff.tree

Example 6: Converting from FITS WCS keywords to a gwcs object#

This is a bit complicated, and not particularly advantageous for this example.

The generalized world-coordinate system, gwcs package is built to support complex mappings between detector coordinates and coordinates. In this case it’s overkill. But it does illustrate saving a complex data object to an asdf file. The gwcs package extends asdf to specify the wcs object. In doing so, it makes extensive use of the transforms that are defined in the asdf standard.

This example follows the instructions in the gwcs documentation. There is no rotation in this example, so we can follow the example that makes use of convenience function wcs_from_fiducial.

Down the road there should be some convenience methods for converting common FITS WCSs and dealing gracefully with the (unfortunately quite common) inconsistencies.

For convenience, let’s use and astropy wcs object to grab the world-coordinate system information. This particular example reveals a problem that is quite common in FITS files: redundant and potentially inconsistent WCS information. In this case, the file is using both the PC matrix approach and the CD matrix approach. We’ll take the PC matrix as “truth”.

fitswcs = WCS(header1)
print(fitswcs)

Grab some values from the wcs for improved readability of cells further down

crval1, crval2 = fitswcs.wcs.crval
cunit1, cunit2 = [u.Unit(cu) for cu in fitswcs.wcs.cunit]
pcmatrix = fitswcs.wcs.pc
cunit1, cunit2

To create a WCS from a pointing on the sky, as a minimum pass a sky coordinate and a projection to the function.

fiducial = coord.SkyCoord(crval1*cunit1, crval2*cunit2, frame='icrs')
tan = models.Pix2Sky_TAN()

Create a pipeline for the coordinate transformations. In this case apply a shift and then a rescaling. The function wcs_from_fiducial prepends these onto the sky projection.

trans = models.Shift(-crval1) & models.Shift(-crval2) |\
        models.Scale(-pcmatrix[0, 0]) & models.Scale(pcmatrix[1, 1])
wcsobj = wcs_from_fiducial(fiducial, projection=tan, transform=trans)
wcsobj

Get rid of the now obsolete FITS WCS keywords from the ASDF header. It’s blissfully short, and one could argue devoid of almost anything useful.

fits_wcs_keywords = [
    'CRPIX1', 'CRVAL1', 'CTYPE1', 'CD1_1', 'CD2_1', 'CRPIX2', 'CRVAL2', 
    'CTYPE2', 'CD1_2', 'CD2_2', 'WCSAXES', 'PC1_1', 'PC2_2', 'CDELT1', 
    'CDELT2', 'CUNIT1', 'CUNIT2', 'LONPOLE', 'LATPOLE', 'RADESYS']
[ff.tree.pop(old_kw, None) for old_kw in fits_wcs_keywords]

Add the wcs object to the tree. The gwcs package takes care of the machinery for serializing this object to ASDF.

ff.tree['wcs'] = wcsobj
t = tree_to_table(ff.tree)
t.show_in_notebook()