JWST Engineering Data Retrieval#

This tutorial shows how to retrieve JWST engineering data and use it in a Python session. The Engineering Data chapter of the JWST Archive Manual describes how data from thousands of engineering telemetry points on JWST are stored in the Engineering Database (EDB) as timeseries. These data may be searched by means of an identifier, or mnemonic.

Some quantities of interest require more than one mnemonic (a tuple) for meaningful analysis. This tutorial illustrates how to retrieve a tuple of mnemonics and visualize the result. In the following example, timeseries will be retrieved for the mnemonics SA_ZADUCMDX and SA_ZADUCMDY, which are the x- and y-angles on the sky of the JWST Fine Steering Mirror (FSM). For more details on constructing a mnemonic, see the Engineering Data page.

Note that this folder includes a companion script. After you complete the tutorial, the script offers a compact, customizable way to download the data.

The workflow consists of:

- Setup
- Downloading Data
  - Define the attributes for the mnemonics of interest
  - Construct the filenames to contain the mnemonic timeseries
  - Call the web service to fetch the data and return files containing the timeseries
  - Prepare the data for analysis
- Visualize the data
  - Split the data into mini-series at time boundaries
  - Plot the timeseries
- Additional resources

Setup#

Begin by importing the relevant Python libraries to retrieve data.

os for handling file separators, i.e. “/” on Unix-like machines and “\” on Windows
urllib to complete the web request
pathlib to create a directory for the downloaded files
pandas for convenient data manipulation

import os
import urllib.error
import urllib.request
from pathlib import Path
import pandas as pd

Below is a function to connect to the EDB web service and retrieve the data files. It will be used later in this tutorial.

def download_edb_datafiles(filenames, folder, prefix=''):
    '''
    Download filenames to directory
    
    Parameters
    ----------
    filenames : iterable
        List of string-valued file names to contain the desired mnemonic timeseries
    folder: str
        Directory (relative to cwd) in which to write output files
    prefix: str
        Prefix in MAST server URL (blank except for developer testing)
        
    Returns
    -------
    int
       0 if every file downloaded successfully, 1 if any download failed
    '''
            
    Path(folder).mkdir(exist_ok=True)
    
    urlStr = "https://mast.stsci.edu{}/api/v0.1/Download/file?uri=mast:jwstedb/{}"
    status = 0
    for fname in filenames:
        print(
            f"Downloading File: mast:jwstedb/{fname}\n",
            f" To: {folder}/{fname}",
        )
        url = urlStr.format(prefix, fname)
        try:
            urllib.request.urlretrieve(url, filename=f"{folder}/{fname}")
        except urllib.error.HTTPError:
            print("  ***Error downloading file***")
            status = 1
    
    return status

Downloading Data#

To download data, you’ll need to format the request correctly. That requires defining mnemonics, naming files to match, and then calling the web service to begin the download.

Define Mnemonic Parameters#

Next, define the parameters of each mnemonic of interest. The parameters are:

The mnemonic name
Start time
End time

The start and end times are in UTC and have a “compact” ISO-8601 formatting: yyyymmddThhmmss, where the T is a literal character. The definitions can be stored in multiple ways: here they will be stored in a Python dictionary, which could be stored in an external .yaml file. In the companion script they are stored in an external .csv file.
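The compact timestamps can also be generated rather than typed by hand. The following is a minimal sketch, assuming only the yyyymmddThhmmss format described above. The helper name edb_time_range is hypothetical and is not part of the tutorial or the companion script.

```python
from datetime import datetime, timedelta

# Compact ISO-8601 layout described above: yyyymmddThhmmss (literal "T").
EDB_TIME_FORMAT = "%Y%m%dT%H%M%S"


def edb_time_range(start, hours):
    """Return a start/end dictionary covering `hours` hours from `start` (UTC)."""
    end = start + timedelta(hours=hours)
    return {
        't_start': start.strftime(EDB_TIME_FORMAT),
        't_end': end.strftime(EDB_TIME_FORMAT),
    }


# Hypothetical usage: a three-hour window starting at midnight on 2022 July 01.
example_times = edb_time_range(datetime(2022, 7, 1), hours=3)
print(example_times)
```

Working from datetime objects keeps the arithmetic (for example, rolling over midnight) out of the string handling.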

Since the mnemonics of interest are a tuple, the start/end times are the same: from 00:00 to 03:00 on 2022 July 01. Define these times first, followed by the full parameter dictionary.

times = { 
         't_start': '20220701T000000',
         't_end':   '20220701T030000'
        }
mnemonics = {
            'SA_ZADUCMDX': times,
            'SA_ZADUCMDY': times
           }
for m, v in mnemonics.items():
    print(m, v)

SA_ZADUCMDX {'t_start': '20220701T000000', 't_end': '20220701T030000'}
SA_ZADUCMDY {'t_start': '20220701T000000', 't_end': '20220701T030000'}
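The same dictionary could be built from an external definition file instead of being hard-coded. The sketch below assumes a CSV layout with columns named mnemonic, t_start, and t_end. That layout and the function name load_mnemonics are hypothetical, so the companion script's actual file format may differ.

```python
import csv
import io

# Hypothetical CSV content; a real file would be opened with open(path, newline='').
csv_text = """mnemonic,t_start,t_end
SA_ZADUCMDX,20220701T000000,20220701T030000
SA_ZADUCMDY,20220701T000000,20220701T030000
"""


def load_mnemonics(handle):
    """Build a {mnemonic: {'t_start': ..., 't_end': ...}} dictionary from CSV rows."""
    reader = csv.DictReader(handle)
    return {
        row['mnemonic']: {'t_start': row['t_start'], 't_end': row['t_end']}
        for row in reader
    }


mnemonics_from_csv = load_mnemonics(io.StringIO(csv_text))
for name, span in mnemonics_from_csv.items():
    print(name, span)
```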

Construct File Names#

The key to fetching data from the web service is to construct file names to contain the data for each mnemonic. The web service will parse the file names to determine how to query the engineering database and retrieve the timeseries of interest.

The file names have the form:

`<mnemonic_name>-<t_start>-<t_end>.csv`

Use a list comprehension to construct the list of file names; these will be passed to the function that calls the web service.

fnames = ['-'.join([m, v['t_start'], v['t_end']]) + '.csv' for m, v in mnemonics.items()]
print(fnames)

['SA_ZADUCMDX-20220701T000000-20220701T030000.csv', 'SA_ZADUCMDY-20220701T000000-20220701T030000.csv']
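Because the web service derives its query from the file name, a malformed timestamp is worth catching before the request is sent. The following sketch validates the pieces locally and then assembles the name in the form shown above. The helper build_edb_filename is hypothetical, and the checks reflect only the format described in this tutorial, not any validation the service itself performs.

```python
from datetime import datetime

EDB_TIME_FORMAT = "%Y%m%dT%H%M%S"  # yyyymmddThhmmss


def parse_edb_time(text):
    """Parse a compact timestamp, rejecting anything that does not round-trip."""
    parsed = datetime.strptime(text, EDB_TIME_FORMAT)  # raises ValueError if malformed
    if parsed.strftime(EDB_TIME_FORMAT) != text:
        raise ValueError(f"timestamp is not in yyyymmddThhmmss form: {text!r}")
    return parsed


def build_edb_filename(mnemonic, t_start, t_end):
    """Return '<mnemonic_name>-<t_start>-<t_end>.csv' after basic sanity checks."""
    if parse_edb_time(t_end) <= parse_edb_time(t_start):
        raise ValueError("t_end must be later than t_start")
    return f"{mnemonic}-{t_start}-{t_end}.csv"


example_name = build_edb_filename('SA_ZADUCMDX', '20220701T000000', '20220701T030000')
print(example_name)
```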

Call the Webservice#

Set the (optional) output folder name prior to the webservice call.

# Sub-directory where the data files will be written:
subdir = 'edb-data'

Now call the EDB web service. The files containing data will be written to your local storage, in the specified subdirectory.

The web service may take a long time (or time out), depending upon the quantity of data in the timeseries within the chosen date range. The three-hour time interval in this example returns over 40,000 rows.

status = download_edb_datafiles(fnames, folder=subdir)

Downloading File: mast:jwstedb/SA_ZADUCMDX-20220701T000000-20220701T030000.csv
  To: edb-data/SA_ZADUCMDX-20220701T000000-20220701T030000.csv

Downloading File: mast:jwstedb/SA_ZADUCMDY-20220701T000000-20220701T030000.csv
  To: edb-data/SA_ZADUCMDY-20220701T000000-20220701T030000.csv
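Since a request can time out, it may help to retry a failed download a few times before giving up. The sketch below shows one way to do that with a small wrapper. The name fetch_with_retries and its parameters are hypothetical, and the retry policy (a fixed number of attempts with a growing pause) is an assumption rather than a MAST recommendation.

```python
import time


def fetch_with_retries(fetch, retries=3, wait=2.0):
    """Call fetch() until it succeeds or the allowed attempts are used up.

    fetch is any zero-argument callable that performs the download.
    Network failures raised by urllib are subclasses of OSError.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except OSError as err:
            if attempt == retries:
                raise  # Out of attempts: let the caller see the error.
            print(f"  Attempt {attempt} failed ({err}); retrying")
            time.sleep(wait * attempt)  # Pause a little longer each time.
```

For example, `fetch_with_retries(lambda: urllib.request.urlretrieve(url, filename=path))` would wrap the call made inside download_edb_datafiles. Note that urllib.error.HTTPError is also a subclass of OSError, so under this sketch a permanent failure such as a bad file name would be retried as well.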

Prepare the Data for Analysis#

Create a list of Pandas dataframes from the mnemonics data that were just written to disk.

df = [pd.read_csv(subdir+os.path.sep+f) for f in fnames]

Make sure the sizes of the dataframes are equal, and take a look at the first dataframe.

print('Dataframes have the same size? {}'.format(len(df[0]) == len(df[1])))
df[0]

Dataframes have the same size? True

	theTime	MJD	euvalue	sqldataType
0	2022-06-30 23:59:59.839000	59760.999998	0.151660	float
1	2022-07-01 00:00:00.095000	59761.000001	0.150761	float
2	2022-07-01 00:00:00.351000	59761.000004	0.150312	float
3	2022-07-01 00:00:00.607000	59761.000007	0.149759	float
4	2022-07-01 00:00:00.863000	59761.000010	0.148798	float
...	...	...	...	...
42185	2022-07-01 02:59:59.189000	59761.124991	0.140328	float
42186	2022-07-01 02:59:59.445000	59761.124994	0.139679	float
42187	2022-07-01 02:59:59.701000	59761.124997	0.139530	float
42188	2022-07-01 02:59:59.957000	59761.125000	0.138832	float
42189	2022-07-01 03:00:00.213000	59761.125002	0.138384	float

42190 rows × 4 columns
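Each row reports the sample time twice: as a timestamp (theTime) and as a Modified Julian Date (MJD). The sketch below converts an MJD value back to a calendar time using the standard MJD epoch of 1858 November 17, assuming the MJD column is on the same time scale as theTime. The helper name mjd_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17 00:00


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive datetime."""
    return MJD_EPOCH + timedelta(days=mjd)


# Compare the first row of the table above: MJD 59760.999998 vs. its timestamp.
converted = mjd_to_datetime(59760.999998)
reported = datetime(2022, 6, 30, 23, 59, 59, 839000)
offset = abs((converted - reported).total_seconds())
print(converted, f"(differs from theTime by {offset:.3f} s)")
```

An MJD displayed to six decimal places resolves only about 0.09 seconds (one millionth of a day), so a small offset from theTime is expected when converting the displayed value.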

Visualize the Data Tuple#

Create an x-y plot for analysis. This is easy to do by plotting the Pandas dataframes. It is more interesting to add color to indicate successive moves of the FSM. Begin by loading NumPy and the Bokeh plotting utilities.

from bokeh.io import output_notebook
import bokeh.plotting as bp
from bokeh.models import ColorBar, FixedTicker, Span
from bokeh.palettes import Spectral10 as cm
from bokeh.transform import linear_cmap

import numpy as np

# The following method is needed for bokeh display in a Notebook.
# Note that it does not activate the display. This happens in the 'Plot Timeseries' section.
output_notebook()

Identify Subseries in the Data#

Engineering data may contain periods of sampling between observations where the returned values do not change. The following function attempts to break up the timeseries by looking for these stretches of unchanging values.

def find_breaks(x_data, y_data, max_flats=5):
    """
    Parameters
    ----------
    x_data : pandas.DataFrame
        X-axis timeseries data.
    y_data : pandas.DataFrame 
        Y-axis timeseries data.
    max_flats : int, default=5
        After this many data points with unchanging values, timeseries data will be broken up.
        
    Returns
    -------
    list of pandas.DataFrame
        Each DataFrame contains a continuous set of changing EDB timeseries data with X/Y-paired values.
    """
    
    # Get the MJD and position values out of the DataFrames.
    x_vals = x_data['euvalue'].values
    x_dates = x_data['MJD'].values
    y_vals = y_data['euvalue'].values
    y_dates = y_data['MJD'].values
    
    # Combine the x and y data into a single DataFrame.
    xy_frame = pd.DataFrame(data=x_dates, columns=['MJD'])
    xy_frame['timestamp'] = x_data['theTime'].values
    xy_frame['x_value'] = x_vals
    xy_frame['y_value'] = y_vals
    
    # Scan the timeseries data to look for flat periods of no reading change.
    results = []
    m = 0
    flat = 0
    recording = True
    
    for n in range(1, len(x_vals)):
        
        # Make sure the x and y timestamps match.
        if x_dates[n] == y_dates[n]:
            
            # Calculate the change from the previous position to the current one.
            x_diff = np.abs(x_vals[n-1] - x_vals[n])
            y_diff = np.abs(y_vals[n-1] - y_vals[n])
            
            # Multiple points with no change will stop recording and store the current series.
            if (x_diff == 0 and y_diff == 0):
                flat += 1
                if not recording:
                    continue
                elif flat >= max_flats:
                    size = (n-max_flats) - m
                    if size > 1:
                        results.append(xy_frame[m:n-(max_flats)])
                    recording = False
                    
            # Start recording if changes detected.
            elif (x_diff > 0 or y_diff > 0) and not recording:
                flat = 0
                m = n
                recording = True
    
    # Capture the final series if still recording.
    if recording and (n - m) > 1:
        results.append(xy_frame[m:])
    
    print("returning {} timeseries".format(len(results)))
    
    return results
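The segmentation idea is easier to follow on a toy sequence. The sketch below is a simplified, pure-Python variant written for illustration, not a copy of find_breaks: it works on a single list, and it resets its counter whenever a value changes, so only consecutive unchanged samples count toward a break. The name split_on_flats is hypothetical.

```python
def split_on_flats(values, max_flats=3):
    """Split a list into segments separated by runs of unchanged samples."""
    segments = []
    start = 0          # Index where the current segment begins.
    run = 0            # Count of consecutive samples equal to their predecessor.
    recording = True   # False while inside a flat stretch.

    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            run += 1
            if recording and run >= max_flats:
                # Close the segment, dropping the repeated samples at its end.
                segment = values[start:i - max_flats + 1]
                if len(segment) > 1:
                    segments.append(segment)
                recording = False
        else:
            run = 0
            if not recording:
                # A change after a flat stretch starts a new segment.
                start = i
                recording = True

    # Keep the trailing segment if it was still being recorded.
    if recording and len(values) - start > 1:
        segments.append(values[start:])
    return segments


toy = [0, 1, 2, 2, 2, 2, 3, 4, 5, 5]
print(split_on_flats(toy, max_flats=3))  # → [[0, 1, 2], [3, 4, 5, 5]]
```

The final pair of 5s is kept because two repeated samples do not reach the max_flats threshold.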

Report the start/end times of each identified subseries. Since there are many of them, we print only the last result as a sample.

split_series = find_breaks(df[0], df[1], max_flats=5)
for ss in split_series:
    v = ss['timestamp'].values

# Inserting this print statement into the for loop will print all 41 timeseries
print("    {0} - {1}".format(v[0], v[-1]))

returning 41 timeseries
    2022-07-01 02:57:26.101000 - 2022-07-01 03:00:00.213000

Plot the Segmented Timeseries#

The following function plots a single subseries of the x/y paired data and applies a color gradient based on the associated time stamps.

def plot_x_v_y_color(data):
    """
    Plot x-versus-y timeseries data with color mapping based on the timing.
    
    Parameters
    ----------
    data : pandas.DataFrame
        A combined x & y timeseries data set.
    """
    
    mjd = data['MJD']
    n_ticks = 10
    
    # Create a bokeh.plotting figure object.
    n = bp.figure(height=600, width=900, match_aspect=True)
    
    # Set up a linear color map based on the MJD data.
    mapper = linear_cmap(field_name='MJD', palette=cm, low=min(mjd), high=max(mjd))
    
    # Add lines to make 0 axis a bit more obvious.
    lw = 1.3
    vline = Span(location=0, dimension='height', line_color='black', line_width=lw)
    hline = Span(location=0, dimension='width', line_color='black', line_width=lw)
    n.renderers.extend([vline, hline])
    
    # Add a circle plot of x vs y with the color map applied.
    radius = (data['x_value'].max() - data['x_value'].min()) / 100  # Standardize the radius of points
    n.circle(source=data, x='x_value', y='y_value', fill_alpha=0.6, fill_color=mapper, line_color=None, radius=radius)
    
    # Translate legend values from MJD to time stamps.
    # Use a step of at least 1 so that series shorter than n_ticks do not raise a ValueError.
    indices = list(range(0, len(mjd), max(1, len(mjd) // n_ticks)))
    tick_dict = {mjd.values[x]: data['timestamp'].values[x] for x in indices}
    ticks = FixedTicker(ticks=list(tick_dict.keys()))
    
    # Add some labels to our axes
    n.xaxis.axis_label = "FSM x-axis commanded angle (arcsec)"
    n.yaxis.axis_label = "FSM y-axis commanded angle (arcsec)"
    
    # Add a color bar legend for the MJD data.
    color_bar = ColorBar(color_mapper=mapper['transform'], 
                         width=12,
                         ticker=ticks,
                         major_label_overrides=tick_dict,
                         location=(0, 0), 
                         label_standoff=45,
                         )
    n.add_layout(color_bar, 'right')
    
    # Display the figure.
    bp.show(n)
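The color bar labels in the function above come from sampling the series at a regular stride. A short sketch of that index selection follows, written so that the stride never drops to zero. The helper name tick_indices is hypothetical.

```python
def tick_indices(n_points, n_ticks=10):
    """Return roughly evenly spaced sample indices to use as color bar ticks."""
    # range() raises ValueError for a step of 0, so keep the stride at 1 or more.
    step = max(1, n_points // n_ticks)
    return list(range(0, n_points, step))


print(tick_indices(100))  # Every tenth sample of a 100-point series.
print(tick_indices(5))    # A short series falls back to every sample.
```

With a stride computed by integer division, a series whose length is not a multiple of n_ticks can yield one more tick than requested, which is usually harmless for a legend.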

In the following command you can update the index to change which split timeseries you are plotting. Once the plot renders, use the plot control tools in the upper right to pan, zoom, and save the plot.

plot_x_v_y_color(split_series[0])

len(split_series)

Additional Resources#

The JWST Engineering Database Portal
The Engineering Data chapter of the JWST Archive Manual.
For more information about retrieving parameter metadata, see the API for JWST Metadata Page. As an example, you might run a query to confirm the units on the above figure should be in arcseconds.

About this Notebook#

This notebook was developed by MAST staff, chiefly Dick Shaw, Peter Forshay, and Bernie Shiao. Additional editing was provided by Thomas Dutkiewicz.

Last updated: October 2023

For support, please contact the Archive HelpDesk at archive@stsci.edu.
