JWST Engineering Data Retrieval
This tutorial shows you how to retrieve JWST engineering data and use it in a Python session. The Engineering Data chapter of the JWST Archive Manual describes how data from thousands of engineering telemetry points on JWST are stored in the Engineering Database (EDB) as timeseries. These data may be searched by means of an identifier, or mnemonic.
Some quantities of interest require more than one mnemonic (a tuple) for meaningful analysis. This tutorial illustrates how to retrieve a tuple of mnemonics and visualize the result. In the following example, timeseries will be retrieved for the mnemonics `SA_ZADUCMDX` and `SA_ZADUCMDY`, which are the x- and y-angles on the sky of the JWST Fine Steering Mirror (FSM). For more details on constructing a mnemonic, see the Engineering Data page.
Note that this folder includes a companion script; once you have completed the tutorial, you can use it as a compact, customizable way to download the data.
The workflow consists of:
- Setup
- Downloading Data
  - Define the attributes for the mnemonics of interest
  - Construct the filenames to contain the mnemonic timeseries
  - Call the web service to fetch the data and return files containing the timeseries
- Prepare the data for analysis
- Visualize the data
  - Split the data into mini-series at time boundaries
  - Plot the timeseries
- Additional resources
Setup
Begin by importing the relevant Python libraries to retrieve data:
- `os` for handling file separators, i.e. `/` on Unix-like machines and `\` on Windows
- `urllib` to complete the web request
- `pathlib` to create a directory for the downloaded files
- `pandas` for convenient data manipulation
```python
import os
import urllib.error
import urllib.request
from pathlib import Path

import pandas as pd
```
The function below connects to the EDB web service and retrieves the data files; it is used later in this tutorial.
```python
def download_edb_datafiles(filenames, folder, prefix=''):
    '''
    Download EDB timeseries files into a local directory.

    Parameters
    ----------
    filenames : iterable
        List of string-valued file names to contain the desired mnemonic timeseries
    folder : str
        Directory (relative to cwd) in which to write output files
    prefix : str
        Prefix in MAST server URL (blank except for developer testing)

    Returns
    -------
    int
        Overall status: 0 if every file was retrieved, 1 if any download failed
    '''
    # Create the output directory if it does not already exist.
    Path(folder).mkdir(parents=True, exist_ok=True)
    urlStr = "https://mast.stsci.edu{}/api/v0.1/Download/file?uri=mast:jwstedb/{}"
    status = 0
    for fname in filenames:
        outpath = os.path.join(folder, fname)
        print(
            f"Downloading File: mast:jwstedb/{fname}\n",
            f" To: {outpath}",
        )
        url = urlStr.format(prefix, fname)
        try:
            urllib.request.urlretrieve(url, filename=outpath)
        # URLError covers HTTPError (its subclass) as well as network failures.
        except urllib.error.URLError:
            print("   ***Error downloading file***")
            status = 1
    return status
```
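To see which request `download_edb_datafiles` issues for a given file, you can format the same URL template without contacting MAST. The helper `edb_download_url` below is hypothetical (it is not part of any MAST package); the template string is copied from the function above, and `urllib.parse` is used only to show that the file name travels in the `uri` query parameter.

```python
from urllib.parse import parse_qs, urlsplit

# Template copied from download_edb_datafiles; the first slot is the developer prefix.
URL_TEMPLATE = "https://mast.stsci.edu{}/api/v0.1/Download/file?uri=mast:jwstedb/{}"


def edb_download_url(fname, prefix=''):
    """Return the MAST download URL for one EDB file name (hypothetical helper)."""
    return URL_TEMPLATE.format(prefix, fname)


url = edb_download_url('SA_ZADUCMDX-20220701T000000-20220701T030000.csv')
print(url)
# The requested file is identified by the 'uri' query parameter.
print(parse_qs(urlsplit(url).query)['uri'])
```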
Downloading Data
To download data, you’ll need to format the request correctly. That requires defining mnemonics, naming files to match, and then calling the web service to begin the download.
Define Mnemonic Parameters
Next, define the parameters of each mnemonic of interest. The parameters are:
- The mnemonic name
- Start time
- End time
The start and end times are in UTC and use a “compact” ISO-8601 format, `yyyymmddThhmmss`, where the `T` is a literal character. The definitions can be stored in several ways: here they are stored in a Python dictionary, which could also be kept in an external `.yaml` file. In the companion script they are stored in an external `.csv` file.
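Rather than typing the compact timestamps by hand, you can generate and validate them with Python's standard `datetime` module. This is a minimal sketch: `to_compact` and `from_compact` are hypothetical helper names, and the datetimes are assumed to already be in UTC (naive `datetime` objects carry no time zone).

```python
from datetime import datetime, timedelta

# yyyymmddThhmmss, with "T" as a literal separator.
COMPACT_FMT = '%Y%m%dT%H%M%S'


def to_compact(dt):
    """Format a datetime (assumed to be UTC) as a compact ISO-8601 string."""
    return dt.strftime(COMPACT_FMT)


def from_compact(s):
    """Parse a compact ISO-8601 string; raises ValueError if it is malformed."""
    return datetime.strptime(s, COMPACT_FMT)


start = datetime(2022, 7, 1, 0, 0, 0)
times = {
    't_start': to_compact(start),
    't_end': to_compact(start + timedelta(hours=3)),
}
print(times)
```

Parsing a string back with `from_compact` before a request is a cheap way to catch a mistyped time.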
Because the two mnemonics form a tuple, they share the same start and end times: from 00:00 to 03:00 UTC on 2022 July 01. Define these times first, followed by the full parameter dictionary.
```python
times = {
    't_start': '20220701T000000',
    't_end': '20220701T030000'
}
mnemonics = {
    'SA_ZADUCMDX': times,
    'SA_ZADUCMDY': times
}
for m, v in mnemonics.items():
    print(m, v)
```
SA_ZADUCMDX {'t_start': '20220701T000000', 't_end': '20220701T030000'}
SA_ZADUCMDY {'t_start': '20220701T000000', 't_end': '20220701T030000'}
Construct File Names
The key to fetching data from the web service is to construct file names to contain the data for each mnemonic. The web service will parse the file names to determine how to query the engineering database and retrieve the timeseries of interest.
The file names have the form:
`<mnemonic_name>-<t_start>-<t_end>.csv`
Use a list comprehension to construct the list of file names; these will be passed to the function that calls the web service.
```python
fnames = ['-'.join([m, v['t_start'], v['t_end']]) + '.csv' for m, v in mnemonics.items()]
print(fnames)
```
['SA_ZADUCMDX-20220701T000000-20220701T030000.csv', 'SA_ZADUCMDY-20220701T000000-20220701T030000.csv']
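The web service parses these file names on its side. As an illustration only (this is not the service's code), a client-side check that a name splits back into mnemonic, start time, and end time can catch typos before a request is sent. `parse_edb_filename` is a hypothetical helper; it splits from the right so that only the two hyphens in front of the timestamps are used.

```python
def parse_edb_filename(fname):
    """Split '<mnemonic>-<t_start>-<t_end>.csv' into its parts (hypothetical helper)."""
    suffix = '.csv'
    if not fname.endswith(suffix):
        raise ValueError(f"not a .csv file name: {fname}")
    # rsplit from the right: the timestamps themselves contain no hyphens.
    parts = fname[:-len(suffix)].rsplit('-', 2)
    if len(parts) != 3:
        raise ValueError(f"expected <mnemonic>-<t_start>-<t_end>.csv, got {fname}")
    mnemonic, t_start, t_end = parts
    return mnemonic, t_start, t_end


print(parse_edb_filename('SA_ZADUCMDX-20220701T000000-20220701T030000.csv'))
```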
Call the Web Service
Choose a name for the output folder before calling the web service.
```python
# Sub-directory where the data files will be written:
subdir = 'edb-data'
```
Now call the EDB web service. The data files will be written to your local storage, in the specified subdirectory.
```python
status = download_edb_datafiles(fnames, folder=subdir)
```
Downloading File: mast:jwstedb/SA_ZADUCMDX-20220701T000000-20220701T030000.csv
To: edb-data/SA_ZADUCMDX-20220701T000000-20220701T030000.csv
Downloading File: mast:jwstedb/SA_ZADUCMDY-20220701T000000-20220701T030000.csv
To: edb-data/SA_ZADUCMDY-20220701T000000-20220701T030000.csv
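`download_edb_datafiles` returns a single status flag (0 when every download completed without a caught error), but it does not inspect the files themselves. A minimal follow-up check is sketched below under the assumption that a missing or zero-byte file indicates a failed retrieval; `check_downloads` is a hypothetical helper, and a file that holds only a header row would still pass it.

```python
import os


def check_downloads(filenames, folder):
    """Return names of files that are missing or empty in `folder` (hypothetical helper)."""
    problems = []
    for fname in filenames:
        path = os.path.join(folder, fname)
        # A missing or zero-byte file is treated as a failed retrieval.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(fname)
    return problems
```

In the notebook you might then run `if status != 0 or check_downloads(fnames, subdir): print('Some downloads need attention')` before moving on.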
Prepare the Data for Analysis
Create a list of pandas dataframes from the mnemonic data that were just written to disk.
```python
# os.path.join inserts the correct file separator for the current platform.
df = [pd.read_csv(os.path.join(subdir, f)) for f in fnames]
```
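By default `pd.read_csv` leaves the `theTime` column as strings. If you need real timestamps, for example to compute elapsed times, you can parse them on read. The sketch below also converts timestamps to Modified Julian Date, using the standard MJD epoch of 1858-11-17 00:00, as a sanity check against the `MJD` column. `read_edb_csv` and `mjd_from_time` are hypothetical helpers, and the column names are taken from the dataframe output below.

```python
import io

import pandas as pd

# MJD 0.0 corresponds to 1858-11-17 00:00 (MJD = JD - 2400000.5).
MJD_EPOCH = pd.Timestamp('1858-11-17')


def read_edb_csv(path_or_buffer):
    """Read one EDB timeseries file, parsing 'theTime' into datetimes (hypothetical helper)."""
    return pd.read_csv(path_or_buffer, parse_dates=['theTime'])


def mjd_from_time(times):
    """Convert a Series of datetimes to Modified Julian Date in days."""
    return (times - MJD_EPOCH) / pd.Timedelta(days=1)


# Two rows copied from the dataframe output below, used here as an in-memory file.
sample = io.StringIO(
    "theTime,MJD,euvalue,sqldataType\n"
    "2022-06-30 23:59:59.839000,59760.999998,0.151660,float\n"
    "2022-07-01 00:00:00.095000,59761.000001,0.150761,float\n"
)
frame = read_edb_csv(sample)
# Largest disagreement (in days) between the computed and the reported MJD.
print((mjd_from_time(frame['theTime']) - frame['MJD']).abs().max())
```

The rest of this tutorial expects `theTime` as strings (the plotting function uses them as color-bar tick labels), so keep any parsed copy separate from `df`.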
Make sure the sizes of the dataframes are equal, and take a look at the first dataframe.
```python
print('Dataframes have the same size? {}'.format(len(df[0]) == len(df[1])))
df[0]
```
Dataframes have the same size? True
| | theTime | MJD | euvalue | sqldataType |
|---|---|---|---|---|
| 0 | 2022-06-30 23:59:59.839000 | 59760.999998 | 0.151660 | float |
| 1 | 2022-07-01 00:00:00.095000 | 59761.000001 | 0.150761 | float |
| 2 | 2022-07-01 00:00:00.351000 | 59761.000004 | 0.150312 | float |
| 3 | 2022-07-01 00:00:00.607000 | 59761.000007 | 0.149759 | float |
| 4 | 2022-07-01 00:00:00.863000 | 59761.000010 | 0.148798 | float |
| ... | ... | ... | ... | ... |
| 42185 | 2022-07-01 02:59:59.189000 | 59761.124991 | 0.140328 | float |
| 42186 | 2022-07-01 02:59:59.445000 | 59761.124994 | 0.139679 | float |
| 42187 | 2022-07-01 02:59:59.701000 | 59761.124997 | 0.139530 | float |
| 42188 | 2022-07-01 02:59:59.957000 | 59761.125000 | 0.138832 | float |
| 42189 | 2022-07-01 03:00:00.213000 | 59761.125002 | 0.138384 | float |
42190 rows × 4 columns
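Equal lengths suggest, but do not guarantee, that the two timeseries are sampled at the same instants. A stricter check is to join the two dataframes on their timestamps. The sketch below uses `pd.merge` with an inner join, so any sample present in only one file is dropped. `pair_xy` is a hypothetical helper; it assumes simultaneous x and y samples carry identical `theTime` strings, and its output columns are named to match those used by `plot_x_v_y_color` later in this tutorial.

```python
import pandas as pd


def pair_xy(x_df, y_df):
    """Inner-join x and y EDB timeseries on 'theTime' (hypothetical helper).

    Returns columns timestamp, MJD, x_value, y_value; rows whose timestamp
    appears in only one input are dropped.
    """
    merged = pd.merge(
        x_df[['theTime', 'MJD', 'euvalue']],
        y_df[['theTime', 'euvalue']],
        on='theTime',
        how='inner',
        suffixes=('_x', '_y'),  # only 'euvalue' appears in both inputs
    )
    return merged.rename(columns={'theTime': 'timestamp',
                                  'euvalue_x': 'x_value',
                                  'euvalue_y': 'y_value'})
```

For the data above, `len(pair_xy(df[0], df[1])) == len(df[0])` is a quick way to confirm that every sample was paired.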
Visualize the Data Tuple
A basic x-y plot can be made directly from the pandas dataframes; adding color to indicate the time order of successive FSM moves makes it more informative. Begin by importing NumPy and several Bokeh plotting modules.
```python
from bokeh.io import output_notebook
import bokeh.plotting as bp
from bokeh.models import ColorBar, FixedTicker, Span
from bokeh.palettes import Spectral10 as cm
from bokeh.transform import linear_cmap
import numpy as np

# The following call is needed for Bokeh display in a notebook.
# It does not display anything by itself; the plot appears in the 'Plot the Segmented Timeseries' section.
output_notebook()
```
Identify Subseries in the Data
Engineering data may contain periods of sampling between observations where the returned values do not change. The following function attempts to break up the timeseries by looking for these stretches of unchanging values.
```python
def find_breaks(x_data, y_data, max_flats=5):
    """
    Split paired x/y EDB timeseries into subseries separated by flat stretches.

    Parameters
    ----------
    x_data : pandas.DataFrame
        X-axis timeseries data.
    y_data : pandas.DataFrame
        Y-axis timeseries data.
    max_flats : int, default=5
        After this many data points with unchanging values, timeseries data will be broken up.

    Returns
    -------
    list of pandas.DataFrame
        Each DataFrame contains a continuous set of changing EDB timeseries data with X/Y-paired values.
    """
    # Get the MJD and position values out of the DataFrames.
    x_vals = x_data['euvalue'].values
    x_dates = x_data['MJD'].values
    y_vals = y_data['euvalue'].values
    y_dates = y_data['MJD'].values

    # Combine the x and y data into a single DataFrame.
    xy_frame = pd.DataFrame(data=x_dates, columns=['MJD'])
    xy_frame['timestamp'] = x_data['theTime'].values
    xy_frame['x_value'] = x_vals
    xy_frame['y_value'] = y_vals

    # Scan the timeseries data to look for flat periods of no reading change.
    # Note: `flat` is reset only when recording resumes, so while recording it
    # counts every unchanged sample since then, not only consecutive ones.
    results = []
    m = 0
    flat = 0
    recording = True
    for n in range(1, len(x_vals)):
        # Only compare samples whose x and y timestamps match; others are skipped.
        if x_dates[n] == y_dates[n]:
            # Change in position from the previous sample to the current one.
            x_diff = np.abs(x_vals[n-1] - x_vals[n])
            y_diff = np.abs(y_vals[n-1] - y_vals[n])
            # Multiple points with no change will stop recording and store the current series.
            if (x_diff == 0 and y_diff == 0):
                flat += 1
                if not recording:
                    continue
                elif flat >= max_flats:
                    size = (n-max_flats) - m
                    if size > 1:
                        results.append(xy_frame[m:n-(max_flats)])
                    recording = False
            # Start recording if changes detected.
            elif (x_diff > 0 or y_diff > 0) and not recording:
                flat = 0
                m = n
                recording = True

    # Capture the final series if still recording.
    if recording and (n - m) > 1:
        results.append(xy_frame[m:])

    print("returning {} timeseries".format(len(results)))
    return results
```
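For comparison, the same idea can be expressed with vectorized NumPy operations. This is a simplified alternative, not a drop-in replacement: `split_on_flats` is a hypothetical helper that discards only runs of *consecutive* unchanged samples, whereas `find_breaks` resets its counter only when recording resumes, so the two can return different numbers of subseries. It also returns `(start, stop)` index pairs rather than dataframes.

```python
import numpy as np


def split_on_flats(x, y, max_flats=5):
    """
    Return (start, stop) index pairs of segments separated by flat runs (hypothetical helper).

    A sample is "flat" when neither x nor y changed since the previous sample.
    Runs of at least `max_flats` consecutive flat samples are discarded, and the
    remaining contiguous samples form the segments (slice with data[start:stop]).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        return []
    flat = np.zeros(len(x), dtype=bool)
    flat[1:] = (np.diff(x) == 0) & (np.diff(y) == 0)
    # Give each run of equal `flat` values an id, then look up each run's length.
    run_id = np.cumsum(np.r_[True, flat[1:] != flat[:-1]])
    run_len = np.bincount(run_id)[run_id]
    keep = ~(flat & (run_len >= max_flats))
    # Edges of contiguous kept stretches: +1 marks a start, -1 marks a stop.
    edges = np.diff(np.r_[0, keep.astype(int), 0])
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    # Keep only segments with more than one sample, mirroring the size check in find_breaks.
    return [(int(a), int(b)) for a, b in zip(starts, stops) if b - a > 1]
```

Applied to the notebook data, `split_on_flats(df[0]['euvalue'], df[1]['euvalue'])` would give index pairs that can slice a combined x/y dataframe with `.iloc[start:stop]`.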
Report the start and end times of the identified subseries. Since there are many of them, only the last one is printed here as a sample.
```python
split_series = find_breaks(df[0], df[1], max_flats=5)
# Print the time span of the last subseries only.
# To print all 41, loop over split_series and print each one inside the loop.
v = split_series[-1]['timestamp'].values
print(" {0} - {1}".format(v[0], v[-1]))
```
returning 41 timeseries
2022-07-01 02:57:26.101000 - 2022-07-01 03:00:00.213000
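Printing only the last subseries hides how the others are distributed. A compact alternative is to tabulate every subseries. `summarize_series` below is a hypothetical helper; it assumes each subseries has a `timestamp` column of strings that `pd.to_datetime` can parse, as produced by `find_breaks`.

```python
import pandas as pd


def summarize_series(series_list):
    """Tabulate start, end, duration and length of each subseries (hypothetical helper)."""
    rows = []
    for i, ss in enumerate(series_list):
        t = pd.to_datetime(ss['timestamp'])
        rows.append({
            'index': i,
            'start': t.iloc[0],
            'end': t.iloc[-1],
            'duration_s': (t.iloc[-1] - t.iloc[0]).total_seconds(),
            'n_points': len(ss),
        })
    return pd.DataFrame(rows)
```

For example, `summarize_series(split_series).sort_values('duration_s', ascending=False)` would list the longest subseries first, which can help you choose an index to plot.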
Plot the Segmented Timeseries
The following function plots a single subseries of the x/y paired data and applies a color gradient based on the associated time stamps.
```python
def plot_x_v_y_color(data):
    """
    Plot x-versus-y timeseries data with color mapping based on the timing.

    Parameters
    ----------
    data : pandas.DataFrame
        A combined x & y timeseries data set with 'MJD', 'timestamp',
        'x_value' and 'y_value' columns, as returned by find_breaks.
    """
    mjd = data['MJD']
    n_ticks = 10

    # Create a bokeh.plotting figure object.
    n = bp.figure(height=600, width=900, match_aspect=True)

    # Set up a linear color map based on the MJD data.
    mapper = linear_cmap(field_name='MJD', palette=cm, low=min(mjd), high=max(mjd))

    # Add lines to make the zero axes a bit more obvious.
    lw = 1.3
    vline = Span(location=0, dimension='height', line_color='black', line_width=lw)
    hline = Span(location=0, dimension='width', line_color='black', line_width=lw)
    n.renderers.extend([vline, hline])

    # Add circle markers of x vs y with the color map applied
    # (scatter replaces the circle() call deprecated in recent Bokeh releases).
    n.scatter(source=data, x='x_value', y='y_value', marker='circle',
              fill_alpha=0.6, fill_color=mapper, line_color=None)

    # Translate legend values from MJD to time stamps; the step is at least 1 for short series.
    step = max(1, len(mjd) // n_ticks)
    indices = list(range(0, len(mjd), step))
    tick_dict = {mjd.values[x]: data['timestamp'].values[x] for x in indices}
    ticks = FixedTicker(ticks=list(tick_dict.keys()))

    # Add some labels to our axes.
    n.xaxis.axis_label = "FSM x-axis commanded angle (arcsec)"
    n.yaxis.axis_label = "FSM y-axis commanded angle (arcsec)"

    # Add a color bar legend for the MJD data.
    color_bar = ColorBar(color_mapper=mapper['transform'],
                         width=12,
                         ticker=ticks,
                         major_label_overrides=tick_dict,
                         location=(0, 0),
                         label_standoff=45,
                         )
    n.add_layout(color_bar, 'right')

    # Display the figure.
    bp.show(n)
```
In the following command you can update the index to change which split timeseries you are plotting. Once the plot renders, use the plot control tools in the upper right to pan, zoom, and save the plot.
```python
plot_x_v_y_color(split_series[0])
```
```python
len(split_series)
```
41
Additional Resources
The Engineering Data chapter of the JWST Archive Manual.
For more information about retrieving parameter metadata, see the API for JWST Metadata page. As an example, you might run a query to confirm that the units in the above figure are arcseconds.
About this Notebook
This notebook was developed by MAST staff, chiefly Dick Shaw, Peter Forshay, and Bernie Shiao. Additional editing was provided by Thomas Dutkiewicz.
Last updated: October 2023
For support, please contact the Archive HelpDesk at archive@stsci.edu.