Advanced Monitor Options
========================

In :doc:`creating_monitors` we outlined how to create a very simple monitor that produces a simple plot.
In this section we will dive into the options available for creating a more complex Monitor.

.. note::

    There are no further options for creating DataModels, so the previous section should be referenced for the
    creation of new DataModels.

Running the monitoring steps manually
-------------------------------------

If there is a need to run the monitoring steps manually, ``BaseMonitor`` includes the following methods that can be
called independently:

- ``initialize_data``: retrieve the data defined by the data model, set the ``data`` attribute, and create hover text.
- ``run_analysis``: execute the ``track`` method, find outliers if defined, and set notifications.
- ``plot``: create the plotly figure (creates the html output).
- ``write_figure``: save the plotly figure to an html file.
- ``notify``: send notifications.

.. note::

    If the monitoring steps are run individually, they must be executed in logical order.
    For example, ``initialize_data`` must be executed first, followed by ``run_analysis`` if the intent is to only
    execute the analysis portion of the monitor.

Notifications
-------------

The ``monitorframe`` framework provides support for email notifications upon execution of a monitor.
There are two steps for activating email notifications:

1. Define ``notification_settings`` in the new monitor class. Required keys:

   - *active*: turn the notifications on or off
   - *username*: user that's used for sending the messages
   - *recipients*: additional users that should be notified of results

2. Define the message that the monitor should send in ``set_notification``.

For example:

.. code-block:: python

    class MyMonitor(BaseMonitor):
        data_model = MyNewModel

        notification_settings = dict(
            active=True,
            username='user',
            recipients=['other@stsci.edu', 'other2@stsci.edu']  # recipients can also be a single string if there's only one
        )

        plottype = 'line'
        x = 'col1'
        y = 'col2'

        def track(self):
            """Measure the mean of the first column"""
            return self.data.col1.mean()  # Remember that data is a pandas DataFrame!

        def set_notification(self):
            return f'The mean of col1 is {self.results}!'  # The return value of track is stored in the results attribute!

.. note::

    ``set_notification`` should return a string.

Databases
---------

``monitorframe`` provides support for, and an interface to, two SQLite databases through the ``peewee`` ORM.
One of these databases stores monitor data, while the other stores monitoring analysis results.

Each of these databases is created automatically when configured, and the tables for those databases are also created
automatically when the ingestion methods are called (``ingest`` for a DataModel and ``store_results`` for a Monitor).

The configuration of these databases is discussed in :doc:`overview`.

DataModel Database
^^^^^^^^^^^^^^^^^^

The main difference between the DataModel database and the Monitor results database is that while the Monitor results
database is pre-defined (although broadly), the DataModel database is *not*.
The table is constructed based on the input data, which means that each DataModel can have a completely different set
of columns.

This type of implementation does have a drawback, though: because tables are defined dynamically, data can be ingested
in unintended ways.
In particular, it's possible to have duplicate entries.

To protect against this issue, it's recommended that DataModels are defined with a ``primary_key`` attribute.
This will prevent duplicate entries from being added to the database (an example of this is included in the
:doc:`creating_monitors` section).

Once the DataModel's database and table exist, the DataModel's ``model`` attribute can be utilized.
The ``model`` attribute is a ``peewee.Model`` object that represents the DataModel's table and can be used to query the
data stored there.

Users can take advantage of the DataModel's ``model`` attribute when implementing a Monitor's ``get_data`` method.

For examples of querying and filtering, see the Querying section of the peewee documentation.

Monitor Results Database
^^^^^^^^^^^^^^^^^^^^^^^^

Each Monitor that is defined will automatically create a database table based on the name of the monitor if the
``store_results`` method is called (with the default method)::

    class, MyMonitor -> results database table name, "MyMonitor"

The results table is defined with two columns:

1. ``Datetime``
2. ``Result``

The ``Datetime`` column corresponds to the date and time that the monitor was executed.
Each monitor that is derived from ``BaseMonitor`` will have a ``date`` attribute that is set when an instance of the
monitor is created.
``date`` is a Python ``datetime`` object, and will be stored in ISO format (``isoformat``).

The ``Result`` column is a JSON field.
A JSON field is used to standardize the tables for each monitor while allowing for flexibility in what exactly each
monitor stores in the table.
The only caveat is that whatever results users want to store must be compatible with Python's ``json`` encoder and
decoder, which perform the following translations:

.. table::

    ============= ======
    JSON          Python
    ============= ======
    object        dict
    array         list
    string        str
    number (int)  int
    number (real) float
    true          True
    false         False
    null          None
    ============= ======

This means that whatever is intended to be stored should be composed of those Python data structures.

There is some support for this with pandas.
Both ``Series`` and ``DataFrame`` objects have a ``to_json`` method for automatically translating those data structures
to JSON-friendly structures.

For more information on pandas' ``to_json`` method, see the pandas documentation, and for more on Python's JSON encoder
and decoder, see the documentation for Python's ``json`` module.

.. _custom-storage:

The DataModel database columns are defined based on the data recovered by the ``get_new_data`` method.

Storing and accessing results
.............................

``BaseMonitor`` does provide a "default" attempt at storing the results, but for more complicated results (or just for
more custom storage), a ``format_results`` method must be implemented.

Building off of the previous ``MyMonitor`` example:

.. code-block:: python

    def format_results(self):
        # Create a custom result with json-friendly python data structures
        results = {
            'my result 1': self.data.col1.to_json(),  # store the whole column if you want!
            'my result mean': self.results  # MyMonitor's track method returns the mean of col1
        }

        return results

The new entry will be created on execution, and if ``format_results`` has been implemented, the object it returns will
be used.

To query the Monitor's table for a specific result, ``results_table`` and the table's column definitions (which are
used in querying) are available as attributes:

.. code-block:: python

    monitor = MyMonitor()
    query_results = monitor.results_table  # Returns all results as a peewee ModelSelect object

    # Further querying
    more_specific = query_results.where(monitor.datetime_col == '2019-04-23T14:07:03.500365')

    # Format rows as a list of dictionaries
    list(more_specific.dicts())

.. note::

    If a Monitor has been defined but has not been executed (specifically the ``store_results`` method), the database
    table for that monitor will not exist yet.
    In this case, the ``results_table`` property will be ``None``.

For information on how to perform queries, see peewee's documentation.

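
Before storing a custom result, it can be useful to confirm that it survives Python's JSON encoder and decoder.
The following is a minimal sketch using only the standard library; the ``result`` dictionary and its keys are
hypothetical and are not part of ``monitorframe``:

.. code-block:: python

    import json
    from datetime import datetime

    # Hypothetical result built only from JSON-friendly Python structures
    result = {
        'mean': 2.0,
        'values': [1, 2, 3],
        'flagged': False,
        'comment': None,
    }

    encoded = json.dumps(result)   # Python -> JSON text
    decoded = json.loads(encoded)  # JSON text -> Python
    assert decoded == result       # the round trip preserves the structure

    # Objects outside the translation table are rejected by the encoder
    try:
        json.dumps({'date': datetime(2019, 4, 23)})
    except TypeError:
        # Convert to a supported type first, for example an ISO-format string
        encoded_date = json.dumps({'date': datetime(2019, 4, 23).isoformat()})

If the round trip raises ``TypeError``, the offending value likely needs to be converted to one of the supported types
before it is returned from ``format_results``.
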
Customizing Plotting
--------------------

``BaseMonitor`` provides some basic plotting functionality that produces ``plotly`` interactive plots.
There are some additional options that can be set for controlling this basic plotting.

Setting a specific output file name or destination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, the resulting figure of a monitor derived from ``BaseMonitor`` will be given a name that is a combination
of the monitor's class name and the date that the monitor instance was created, and will be placed in the current
working directory.

To change the path of the output file, assign ``output`` to a directory:

.. code-block:: python

    class MyMonitor(BaseMonitor):
        data_model = MyNewModel
        ...
        output = '/new/path/to/file/'  # For setting the path, but not the filename

To change the name of the file, assign ``output`` to a full path:

.. code-block:: python

    class MyMonitor(BaseMonitor):
        data_model = MyNewModel
        ...
        output = '/new/path/to/file/new_file_name.html'  # For setting both the path and the filename

Adding a third dimension to the output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The basic plotting functionality of ``BaseMonitor`` restricts the dimensionality to a maximum of 3 dimensions (it is
basic after all).
The third dimension is a *color* dimension, which supports an array of the same shape as ``x`` and ``y``.

To add a color dimension to the data, simply set the ``z`` attribute.

The third dimension can also be used to create an image plot.

Adding additional information to the hover labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If additional information should be displayed on hover for each data point, that information should be included in the
data retrieved by the data model.

For example, if the simple line plot created in :doc:`creating_monitors` needed to also include a "name" for each data
point, ``get_new_data`` would need to be modified like so:

.. code-block:: python

    class MyNewModel(BaseDataModel):

        def get_new_data(self):
            return {
                'col1': [1, 2, 3],
                'col2': [4, 5, 6],
                'names': ['first', 'second', 'third']
            }

In the definition of the monitor, the new "names" column would need to be identified as a label:

.. code-block:: python

    class MyMonitor(BaseMonitor):
        data_model = MyNewModel
        labels = ['names']  # List of column names in data that should be used in hover labels

        plottype = 'line'
        x = 'col1'
        y = 'col2'

        def track(self):
            """Measure the mean of the first column"""
            return self.data.col1.mean()  # Remember that data is a pandas DataFrame!

This will add each "name" to the corresponding point in the hover labels in the plotly figure.

More complex plotting
^^^^^^^^^^^^^^^^^^^^^

For more complex plotting, ``plot`` should be overridden with whatever is needed, but ``plotly`` is still required.
When a new instance of a monitor is created, a plotly figure is created automatically.

.. note::

    If subplots are needed, the ``subplots`` and ``subplot_layout`` attributes need to be defined in the monitor class.
    This is because the plotly figure object is different for subplots.

To set the monitor to use a subplots figure:

.. code-block:: python

    class MyMonitor(BaseMonitor):
        data_model = MyNewModel
        ...
        subplots = True
        subplot_layout = (2, 2)  # 2x2 grid of plots

The ``plot`` method should add whatever *traces* (``plotly``'s term) and *layouts* are necessary to the monitor's
``figure`` attribute:

.. code-block:: python

    def plot(self):
        ...  # Lots of complicated plotting stuff that results in a "plot" trace object and a new "layout" object

        self.figure.add_trace(plot)
        self.figure['layout'].update(layout)

If users want to integrate existing matplotlib plots without having to rewrite the entire plot, ``plotly``'s
``mpl_to_plotly`` function can be used:

.. code-block:: python

    import plotly.tools as tls

    new_plotly = tls.mpl_to_plotly(existing_mpl_figure)

This figure could then be assigned to the ``figure`` attribute on the monitor:

.. code-block:: python

    def plot(self):
        self.figure = new_plotly

Once plotting is all done, the figure can be written to an html file (with the default or specified path and/or name)
with the ``write_figure`` method:

.. code-block:: python

    monitor.write_figure()

Finding Outliers
----------------

If part of the monitor's job is to locate outliers, then the ``find_outliers`` method must be implemented.
If the user intends to use the basic plotting functionality, this method should return a *mask* array that can be used
with the ``data`` attribute of the Monitor; otherwise, it can return whatever is needed.

Outliers will be accessible via the ``outliers`` attribute of the monitor.

When using the basic plotting functionality, outliers will automatically be plotted in red. For more advanced plotting
that requires the ``plot`` method to be overridden, the user will have to determine how to visualize any outliers.

For example, if we add a ``find_outliers`` implementation to ``MyMonitor``:

.. code-block:: python

    def find_outliers(self):
        return self.data.col1 > 1  # Returns a pandas Series mask

After the analysis has been run, you can access the outlying data with:

.. code-block:: python

    monitor = MyMonitor()
    monitor.monitor()

    outliers = monitor.data[monitor.outliers]
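
The mask does not have to be a fixed threshold. As a rough sketch of the same idea outside of ``monitorframe``, the
example below flags values that lie more than a chosen number of standard deviations from the column mean.
The ``data`` frame, the standalone ``find_outliers`` function, and the ``n_sigma`` parameter are all hypothetical
stand-ins chosen for illustration:

.. code-block:: python

    import pandas as pd

    # Hypothetical stand-in for a monitor's ``data`` attribute
    data = pd.DataFrame({
        'col1': [1.0, 1.1, 0.9, 1.0, 5.0],
        'col2': [4, 5, 6, 7, 8],
    })


    def find_outliers(data, column='col1', n_sigma=1.5):
        """Return a boolean mask for values more than n_sigma standard deviations from the mean."""
        values = data[column]
        return (values - values.mean()).abs() > n_sigma * values.std()


    mask = find_outliers(data)  # pandas Series of booleans, aligned with ``data``
    outliers = data[mask]       # rows of ``data`` where the mask is True

Because the mask is a boolean ``Series`` with the same index as the data, it can be used to index the ``DataFrame``
in the same way as ``monitor.data[monitor.outliers]`` above.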