JWST Engineering Data Retrieval

This tutorial shows you how to retrieve JWST engineering data and use it in a Python session. The Engineering Data chapter of the JWST Archive Manual describes how data from thousands of engineering telemetry points on JWST are stored as timeseries in the Engineering Database (EDB). Each telemetry point is identified by a name, or mnemonic, which you use to search for its data.

Some quantities of interest require more than one mnemonic (a tuple) for meaningful analysis. This tutorial illustrates how to retrieve a tuple of mnemonics and visualize the result. In the following example, timeseries are retrieved for the mnemonics SA_ZADUCMDX and SA_ZADUCMDY, which are the x- and y-axis angles on the sky of the JWST Fine Steering Mirror (FSM). For more details on constructing a mnemonic, see the Engineering Data page.

Note that this folder includes a companion script. Once you have completed the tutorial, it offers a compact, customizable way to download the data.

The workflow consists of:

  • Setup

  • Downloading Data

    • Define the attributes for the mnemonics of interest

    • Construct the filenames to contain the mnemonic timeseries

    • Call the web service to fetch the data and return files containing the timeseries

    • Prepare the data for analysis

  • Visualize the data

    • Split the data into mini-series at time boundaries

    • Plot the timeseries

  • Additional resources

Setup

Begin by importing the relevant Python libraries to retrieve data.

  • os for handling file separators, i.e. “/” on Unix-like machines and “\” on Windows

  • urllib to complete the web request

  • pathlib to create a directory for the downloaded files

  • pandas for convenient data manipulation

import os
import urllib.error
import urllib.request
from pathlib import Path
import pandas as pd

Below is a function to connect to the EDB web service and retrieve the data files. It will be used later in this tutorial.

def download_edb_datafiles(filenames, folder, prefix=''):
    '''
    Download EDB files from MAST into a local directory.

    Parameters
    ----------
    filenames : iterable of str
        File names encoding the desired mnemonic timeseries
        (<mnemonic_name>-<t_start>-<t_end>.csv)
    folder : str
        Directory (relative to cwd) in which to write output files
    prefix : str
        Prefix in MAST server URL (blank except for developer testing)

    Returns
    -------
    int
        0 if every file was downloaded, 1 if any download failed
    '''

    Path(folder).mkdir(parents=True, exist_ok=True)

    urlStr = "https://mast.stsci.edu{}/api/v0.1/Download/file?uri=mast:jwstedb/{}"
    status = 0
    for fname in filenames:
        print(
            f"Downloading File: mast:jwstedb/{fname}\n",
            f" To: {folder}/{fname}",
        )
        url = urlStr.format(prefix, fname)
        try:
            urllib.request.urlretrieve(url, filename=os.path.join(folder, fname))
        except urllib.error.URLError:
            # URLError covers HTTP errors (its subclass HTTPError) and network failures.
            print("  ***Error downloading file***")
            status = 1

    return status
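You can check the URL pattern used inside `download_edb_datafiles` without contacting the server. The sketch below factors that pattern into a small helper. The name `edb_file_url` is hypothetical and is not part of any MAST package. It only builds strings, so it is a cheap way to dry-run a list of file names before starting a long download.

```python
# Hypothetical helper (not part of any MAST package): reproduces the URL
# pattern used inside download_edb_datafiles without contacting the server.
EDB_URL_TEMPLATE = "https://mast.stsci.edu{}/api/v0.1/Download/file?uri=mast:jwstedb/{}"

def edb_file_url(fname, prefix=''):
    """Return the MAST download URL for one EDB file name."""
    return EDB_URL_TEMPLATE.format(prefix, fname)

# Dry run: list the URLs that would be requested, without downloading anything.
for fname in ['SA_ZADUCMDX-20220701T000000-20220701T030000.csv',
              'SA_ZADUCMDY-20220701T000000-20220701T030000.csv']:
    print(edb_file_url(fname))
```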

Downloading Data

To download data, you need to format the request correctly. That requires defining the mnemonics, naming files to match, and then calling the web service to start the download.

Define Mnemonic Parameters

Next, define the parameters of each mnemonic of interest. The parameters are:

  • The mnemonic name

  • Start time

  • End time

The start and end times are in UTC and use a “compact” ISO-8601 format: yyyymmddThhmmss, where T is a literal character. The definitions can be stored in several ways. Here they are kept in a Python dictionary, which could instead be loaded from an external .yaml file. The companion script stores them in an external .csv file.
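If you prefer to compute a time window rather than type it, Python's standard `datetime` module can produce and parse this compact format. The minimal sketch below assumes the helper names `to_compact` and `from_compact`, which are illustrative only. It treats times as naive datetimes that you already know to be in UTC.

```python
from datetime import datetime, timedelta

# strftime/strptime format for the compact ISO-8601 form yyyymmddThhmmss.
COMPACT_FMT = "%Y%m%dT%H%M%S"

def to_compact(dt):
    """Format a naive datetime (assumed to be UTC) in compact ISO-8601 form."""
    return dt.strftime(COMPACT_FMT)

def from_compact(s):
    """Parse a compact ISO-8601 string into a naive datetime (UTC assumed)."""
    return datetime.strptime(s, COMPACT_FMT)

# Build the three-hour window used in this tutorial.
start = datetime(2022, 7, 1, 0, 0, 0)
end = start + timedelta(hours=3)
print(to_compact(start), to_compact(end))   # 20220701T000000 20220701T030000
print(from_compact('20220701T030000') - from_compact('20220701T000000'))  # 3:00:00
```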

Since the mnemonics of interest form a tuple, they share the same start/end times: from 00:00 to 03:00 on 2022 July 01. Define these times first, followed by the full parameter dictionary.

times = { 
         't_start': '20220701T000000',
         't_end':   '20220701T030000'
        }
mnemonics = {
            'SA_ZADUCMDX': times,
            'SA_ZADUCMDY': times
           }
for m,v in mnemonics.items():
    print(m, v)
SA_ZADUCMDX {'t_start': '20220701T000000', 't_end': '20220701T030000'}
SA_ZADUCMDY {'t_start': '20220701T000000', 't_end': '20220701T030000'}
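The same dictionary can also be built from a .csv file, which is how the companion script stores its definitions. The sketch below assumes a column layout of `mnemonic,t_start,t_end`. That layout is our guess, and the companion script's actual column names may differ. The helper name `load_mnemonics` is also hypothetical. An in-memory string stands in for the file so the example runs on its own.

```python
import csv
import io

# Hypothetical CSV layout; the companion script's real columns may differ.
csv_text = """mnemonic,t_start,t_end
SA_ZADUCMDX,20220701T000000,20220701T030000
SA_ZADUCMDY,20220701T000000,20220701T030000
"""

def load_mnemonics(fileobj):
    """Build {mnemonic: {'t_start': ..., 't_end': ...}} from CSV rows."""
    reader = csv.DictReader(fileobj)
    return {row['mnemonic']: {'t_start': row['t_start'], 't_end': row['t_end']}
            for row in reader}

mnemonics_from_csv = load_mnemonics(io.StringIO(csv_text))
for m, v in mnemonics_from_csv.items():
    print(m, v)
```

For a real file, pass an open file handle instead, e.g. `with open('mnemonics.csv', newline='') as f: mnemonics = load_mnemonics(f)`. Here `mnemonics.csv` is a placeholder name.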

Construct File Names

The key to fetching data from the web service is to construct a file name for each mnemonic's data. The web service parses each file name to determine how to query the Engineering Database and retrieve the timeseries of interest.

The file names have the form:

`<mnemonic_name>-<t_start>-<t_end>.csv`
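To see why this pattern is easy to parse, the sketch below splits a file name back into its three fields. It only illustrates the naming convention and is not the web service's own parser. The function name `parse_edb_filename` is hypothetical. The sketch assumes that neither the mnemonic nor the compact timestamps contain a hyphen, which holds for the names used here.

```python
from pathlib import PurePath

def parse_edb_filename(fname):
    """Split '<mnemonic>-<t_start>-<t_end>.csv' into its three fields."""
    stem = PurePath(fname).stem          # drop the .csv suffix
    mnemonic, t_start, t_end = stem.split('-')
    return mnemonic, t_start, t_end

print(parse_edb_filename('SA_ZADUCMDX-20220701T000000-20220701T030000.csv'))
# ('SA_ZADUCMDX', '20220701T000000', '20220701T030000')
```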

Use a list comprehension to construct the file names; these are passed to the download function.

fnames = ['-'.join([m, v['t_start'], v['t_end']]) + '.csv' for m,v in mnemonics.items()]
print(fnames)
['SA_ZADUCMDX-20220701T000000-20220701T030000.csv', 'SA_ZADUCMDY-20220701T000000-20220701T030000.csv']

Call the Web Service

Set the (optional) output folder name before calling the web service.

# Sub-directory where the data files will be written:
subdir = 'edb-data'

Now call the EDB web service. The files containing data will be written to your local storage, in the specified subdirectory.

The web service may take a long time to respond (or may time out), depending on the quantity of data in the timeseries within the chosen date range. The three-hour interval in this example returns over 40,000 rows for each mnemonic.

status = download_edb_datafiles(fnames, folder=subdir)
Downloading File: mast:jwstedb/SA_ZADUCMDX-20220701T000000-20220701T030000.csv
  To: edb-data/SA_ZADUCMDX-20220701T000000-20220701T030000.csv
Downloading File: mast:jwstedb/SA_ZADUCMDY-20220701T000000-20220701T030000.csv
  To: edb-data/SA_ZADUCMDY-20220701T000000-20220701T030000.csv
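Because a long request can fail or time out, you may want to retry downloads. The sketch below shows a minimal retry loop with exponential backoff. `with_retries` is a hypothetical helper, not part of `urllib` or any MAST package. `download_edb_datafiles` catches errors internally, so you would wrap the individual call instead, e.g. `with_retries(lambda: urllib.request.urlretrieve(url, filename=path))`. Note that `urlretrieve` has no timeout argument. For a per-request timeout you would read from `urllib.request.urlopen(url, timeout=...)` instead.

```python
import time
import urllib.error

def with_retries(fetch, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    Hypothetical helper. Waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error if every attempt fails.
    OSError covers urllib.error.URLError (and its subclass HTTPError) as
    well as socket timeouts.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Demonstration with a fake fetch that fails twice and then succeeds;
# sleep is replaced by list.append so the waits are recorded, not slept.
calls = []

def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise urllib.error.URLError("simulated failure")
    return "ok"

waits = []
result = with_retries(flaky_fetch, attempts=3, base_delay=2.0, sleep=waits.append)
print(result, waits)   # ok [2.0, 4.0]
```

Injecting `sleep` keeps the helper testable. In normal use you would leave it at its default of `time.sleep`.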

Prepare the Data for Analysis

Create a list of pandas DataFrames from the mnemonic data that were just written to disk.

df = [pd.read_csv(os.path.join(subdir, f)) for f in fnames]

Make sure the sizes of the dataframes are equal, and take a look at the first dataframe.

print('Dataframes have the same size? {}'.format(len(df[0]) == len(df[1])))
df[0]
Dataframes have the same size? True
       theTime                     MJD           euvalue   sqldataType
0      2022-06-30 23:59:59.839000  59760.999998  0.151660  float
1      2022-07-01 00:00:00.095000  59761.000001  0.150761  float
2      2022-07-01 00:00:00.351000  59761.000004  0.150312  float
3      2022-07-01 00:00:00.607000  59761.000007  0.149759  float
4      2022-07-01 00:00:00.863000  59761.000010  0.148798  float
...    ...                         ...           ...       ...
42185  2022-07-01 02:59:59.189000  59761.124991  0.140328  float
42186  2022-07-01 02:59:59.445000  59761.124994  0.139679  float
42187  2022-07-01 02:59:59.701000  59761.124997  0.139530  float
42188  2022-07-01 02:59:59.957000  59761.125000  0.138832  float
42189  2022-07-01 03:00:00.213000  59761.125002  0.138384  float

42190 rows × 4 columns
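Later in this tutorial, the x and y rows are paired by position, and rows whose MJD values disagree are skipped. If two mnemonics were ever sampled at different times, an inner merge on the time column is a more robust way to pair them. The sketch below shows this with two tiny stand-in files. The timestamps are copied from the table above, but the values are made up for illustration. The `x_value`/`y_value` names mirror those used later and are otherwise our choice. The sketch also parses `theTime` into pandas datetimes.

```python
import io
import pandas as pd

# Stand-ins for two EDB files (values made up; column names match the real
# files). The y file is missing the middle sample.
x_csv = """theTime,MJD,euvalue,sqldataType
2022-07-01 00:00:00.095000,59761.000001,0.15,float
2022-07-01 00:00:00.351000,59761.000004,0.16,float
2022-07-01 00:00:00.607000,59761.000007,0.17,float
"""
y_csv = """theTime,MJD,euvalue,sqldataType
2022-07-01 00:00:00.095000,59761.000001,-0.05,float
2022-07-01 00:00:00.607000,59761.000007,-0.07,float
"""

x = pd.read_csv(io.StringIO(x_csv), parse_dates=['theTime'])
y = pd.read_csv(io.StringIO(y_csv), parse_dates=['theTime'])

# Keep only timestamps present in both series (inner join on theTime).
xy = pd.merge(x[['theTime', 'MJD', 'euvalue']],
              y[['theTime', 'euvalue']],
              on='theTime', how='inner', suffixes=('_x', '_y'))
xy = xy.rename(columns={'euvalue_x': 'x_value', 'euvalue_y': 'y_value'})
print(xy)
```

The merge uses the timestamp column rather than the floating-point MJD column, because exact equality on floats is fragile.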

Visualize the Data Tuple

Create an x-y plot for analysis. A basic plot is easy to make directly from the pandas DataFrames, but adding color to indicate successive moves of the FSM makes it more informative. Begin by importing NumPy and several Bokeh plotting modules.

from bokeh.io import output_notebook
import bokeh.plotting as bp
from bokeh.models import ColorBar, FixedTicker, Span
from bokeh.palettes import Spectral10 as cm
from bokeh.transform import linear_cmap

import numpy as np

# output_notebook() is needed for Bokeh to display figures inline in a Notebook.
# It does not display anything itself; the figure is shown in the 'Plot the Segmented Timeseries' section.
output_notebook()

Identify Subseries in the Data

Engineering data may contain periods of sampling between observations where the returned values do not change. The following function attempts to break up the timeseries by looking for these stretches of unchanging values.

def find_breaks(x_data, y_data, max_flats=5):
    """
    Parameters
    ----------
    x_data : pandas.DataFrame
        X-axis timeseries data.
    y_data : pandas.DataFrame 
        Y-axis timeseries data.
    max_flats : int, default=5
        After this many data points with unchanging values, timeseries data will be broken up.
        
    Returns
    -------
    list of pandas.DataFrame
        Each DataFrame contains a continuous set of changing EDB timeseries data with X/Y-paired values.
    """
    
    # Get the MJD and position values out of the DataFrames.
    x_vals = x_data['euvalue'].values
    x_dates = x_data['MJD'].values
    y_vals = y_data['euvalue'].values
    y_dates = y_data['MJD'].values
    
    # Combine the x and y data into a single DataFrame.
    xy_frame = pd.DataFrame(data=x_dates, columns=['MJD'])
    xy_frame['timestamp'] = x_data['theTime'].values
    xy_frame['x_value'] = x_vals
    xy_frame['y_value'] = y_vals
    
    # Scan the timeseries data to look for flat periods of no reading change.
    results = []
    m = 0
    flat = 0
    recording = True
    
    for n in range(1, len(x_vals)):
        
        # Make sure the x and y timestamps match.
        if x_dates[n] == y_dates[n]:
            
            # Calculate the distance from the current positions to the following.
            x_diff = np.abs(x_vals[n-1] - x_vals[n])
            y_diff = np.abs(y_vals[n-1] - y_vals[n])
            
            # Multiple points with no change will stop recording and store the current series.
            if (x_diff == 0 and y_diff == 0):
                flat += 1
                if not recording:
                    continue
                elif flat >= max_flats:
                    size = (n-max_flats) - m
                    if size > 1:
                        results.append(xy_frame[m:n-(max_flats)])
                    recording = False
                    
            # Start recording if changes detected.
            elif (x_diff > 0 or y_diff > 0) and not recording:
                flat = 0
                m = n
                recording = True
    
    # Capture the final series if still recording.
    if recording and (n - m) > 1:
        results.append(xy_frame[m:])
    
    print("returning {} timeseries".format(len(results)))
    
    return results
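The loop in `find_breaks` resets its `flat` counter only when a new subseries starts. As a result, isolated unchanged samples inside a subseries also count toward `max_flats`. If you want to split strictly on runs of consecutive unchanged samples, the vectorized NumPy variant below is one option. It was written for this page as an alternative, and the name `split_on_flat_runs` is hypothetical. It assumes the x and y values are already paired row by row, and it may return a different number of subseries than `find_breaks`.

```python
import numpy as np
import pandas as pd

def split_on_flat_runs(xy, max_flats=5, min_len=2):
    """Split paired x/y data at runs of >= max_flats consecutive repeated samples.

    Hypothetical alternative to find_breaks. xy must have 'x_value' and
    'y_value' columns with one row per paired sample. Samples inside a long
    flat run are dropped; each remaining stretch with at least min_len rows
    is returned as a DataFrame slice.
    """
    if len(xy) == 0:
        return []
    x = xy['x_value'].to_numpy()
    y = xy['y_value'].to_numpy()

    # flat[i] is True when sample i repeats sample i-1 in both coordinates.
    flat = np.zeros(len(xy), dtype=bool)
    flat[1:] = (x[1:] == x[:-1]) & (y[1:] == y[:-1])

    # Give each run of equal flat values its own id, then look up run lengths.
    run_id = np.concatenate(([0], np.cumsum(flat[1:] != flat[:-1])))
    run_len = np.bincount(run_id)[run_id]

    # Only samples in flat runs of at least max_flats are treated as idle.
    idle = flat & (run_len >= max_flats)

    # The cumulative idle count is constant across each stretch of active
    # samples and changes between stretches, so it labels the stretches.
    seg_id = np.cumsum(idle)
    active = ~idle

    results = []
    for seg in np.unique(seg_id[active]):
        rows = np.flatnonzero(active & (seg_id == seg))
        if len(rows) >= min_len:
            results.append(xy.iloc[rows[0]:rows[-1] + 1])
    return results

# Synthetic example: x holds still at 0.2 for six samples in the middle.
demo = pd.DataFrame({
    'x_value': [0.0, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.4, 0.5],
    'y_value': [0.0] * 11,
})
pieces = split_on_flat_runs(demo, max_flats=5)
print([p['x_value'].tolist() for p in pieces])
# [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]
```

To try it on the tutorial data, build the paired frame the same way `find_breaks` does. For example, use `pd.DataFrame({'MJD': df[0]['MJD'], 'timestamp': df[0]['theTime'], 'x_value': df[0]['euvalue'], 'y_value': df[1]['euvalue']})`, assuming the two files have matching timestamps row by row.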

Report the start/end times of each identified subseries. Since there are many of them, we print only the last result as a sample.

split_series = find_breaks(df[0], df[1], max_flats=5)
for ss in split_series:
    v = ss['timestamp'].values

# Only the last subseries is printed; move this print statement inside the
# for loop to list the start/end times of all 41 subseries.
print("    {0} - {1}".format(v[0], v[-1]))
returning 41 timeseries
    2022-07-01 02:57:26.101000 - 2022-07-01 03:00:00.213000
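Printing every range quickly becomes unwieldy. A compact alternative is to tabulate the subseries. The sketch below uses a hypothetical helper, `summarize_subseries`, which expects DataFrames with a `timestamp` column like those returned by `find_breaks`. The demonstration uses a small synthetic list so that it runs on its own.

```python
import pandas as pd

def summarize_subseries(series_list):
    """Tabulate start, end, duration, and sample count of each subseries.

    Each element must have a 'timestamp' column of time strings or datetimes.
    """
    rows = []
    for i, ss in enumerate(series_list):
        t = pd.to_datetime(ss['timestamp'])
        rows.append({'subseries': i,
                     'start': t.iloc[0],
                     'end': t.iloc[-1],
                     'duration_s': (t.iloc[-1] - t.iloc[0]).total_seconds(),
                     'n_samples': len(ss)})
    return pd.DataFrame(rows)

# Synthetic demonstration with two short subseries.
demo_series = [
    pd.DataFrame({'timestamp': ['2022-07-01 00:00:00.095000',
                                '2022-07-01 00:00:00.351000']}),
    pd.DataFrame({'timestamp': ['2022-07-01 00:01:00.000000',
                                '2022-07-01 00:01:30.000000',
                                '2022-07-01 00:02:00.000000']}),
]
summary = summarize_subseries(demo_series)
print(summary)
```

With the tutorial data, `summarize_subseries(split_series)` gives one row per subseries. Sorting by `duration_s` or `n_samples` is a convenient way to pick an interesting index to plot.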

Plot the Segmented Timeseries

The following function plots a single subseries of the x/y paired data and applies a color gradient based on the associated timestamps.

def plot_x_v_y_color(data):
    """
    Plot x-versus-y timeseries data with color mapping based on the timing.
    
    Parameters
    ----------
    data : pandas.DataFrame
        A combined x & y timeseries data set.
    """
    
    mjd = data['MJD']
    n_ticks = 10
    
    # Create a bokeh.plotting figure object.
    n = bp.figure(height=600, width=900, match_aspect=True)
    
    # Set up a linear color map based on the MJD data.
    mapper = linear_cmap(field_name='MJD', palette=cm, low=min(mjd), high=max(mjd))
    
    # Add lines to make 0 axis a bit more obvious.
    lw = 1.3
    vline = Span(location=0, dimension='height', line_color='black', line_width=lw)
    hline = Span(location=0, dimension='width', line_color='black', line_width=lw)
    n.renderers.extend([vline, hline])
    
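    # Add a circle-marker plot of x vs y with the color map applied.
    # scatter(marker='circle') draws screen-sized circle markers; recent Bokeh
    # releases deprecate using circle() for this purpose.
    n.scatter(source=data, x='x_value', y='y_value', marker='circle', fill_alpha=0.6, fill_color=mapper, line_color=None)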
    
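    # Translate legend values from MJD to time stamps, using about n_ticks evenly
    # spaced samples (the step is at least 1 so short subseries still work).
    indices = list(range(0, len(mjd), max(1, len(mjd) // n_ticks)))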
    tick_dict = {mjd.values[x]: data['timestamp'].values[x] for x in indices}
    ticks = FixedTicker(ticks=list(tick_dict.keys()))
    
    # Add some labels to our axes
    n.xaxis.axis_label = "FSM x-axis commanded angle (arcsec)"
    n.yaxis.axis_label = "FSM y-axis commanded angle (arcsec)"
    
    # Add a color bar legend for the MJD data.
    color_bar = ColorBar(color_mapper=mapper['transform'], 
                         width=12,
                         ticker=ticks,
                         major_label_overrides=tick_dict,
                         location=(0,0), 
                         label_standoff=45,
                         )
    n.add_layout(color_bar, 'right')
    
    # Display the figure.
    bp.show(n)
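`linear_cmap` sets up a Bokeh `LinearColorMapper`, which spreads the palette evenly between `low` and `high` (here the first and last MJD of the subseries). The NumPy sketch below approximates that binning so you can see which palette entry a value receives. It is an illustration written for this page, not Bokeh's code. Bokeh's handling of values exactly at the edges or outside the range may differ. The function name `palette_index` is hypothetical, and the demonstration uses normalized values rather than real MJDs.

```python
import numpy as np

def palette_index(values, low, high, n_colors):
    """Approximate linear color binning: map [low, high] onto n_colors bins."""
    values = np.asarray(values, dtype=float)
    if high == low:
        # Degenerate range: every value gets the first color.
        return np.zeros(values.shape, dtype=int)
    frac = (values - low) / (high - low)
    # Scale to bin numbers and clamp so that `high` lands in the last bin.
    return np.clip(np.floor(frac * n_colors), 0, n_colors - 1).astype(int)

# Spectral10 has 10 colors, so indices run from 0 to 9.
print(palette_index([0.0, 0.33, 0.66, 1.0], low=0.0, high=1.0, n_colors=10))
# [0 3 6 9]
```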

To plot a different subseries, change the index in the following command. Once the plot renders, use the plot tools in the upper right to pan, zoom, and save the plot.

plot_x_v_y_color(split_series[0])
len(split_series)
41

Additional Resources

About this Notebook

This notebook was developed by MAST staff, chiefly Dick Shaw, Peter Forshay, and Bernie Shiao. Additional editing was provided by Thomas Dutkiewicz.

Last updated: October 2023

For support, please contact the Archive HelpDesk at archive@stsci.edu.


