Data Gaps and Quality Flags#
Automated testing has found an error in this Notebook. The authors have been notified and are working on the issue; in the meantime, please use this as a reference only.
By the end of this tutorial, you will:
Have a working knowledge of Kepler quality flags and be able to access them in light curve and TPF data.
Be able to identify the cause of various types of gaps in the data.
Understand the most common reasons for individual cadence exclusions.
This notebook is the first part of a series on identifying instrumental and systematic sources of noise in Kepler and K2 data. In this tutorial, we will look at practical examples of data gaps and single-cadence quality issues in Kepler data, and learn to identify their causes. Assumed knowledge for this tutorial is a good familiarity with light curve and target pixel file (TPF) data products, and the ability to work with their metadata.
import lightkurve as lk
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
The Kepler space telescope observed the same patch of sky for four continuous years, between 2009 and 2013. Over the course of 18 observing quarters, it collected light curves and pixel data for 150,000 stars in 30-minute Long Cadence mode, and 512 stars per quarter in one-minute Short Cadence mode. Following the failure of two of the telescope’s four reaction wheels, the telescope continued as the K2 mission, which observed along the ecliptic plane for 20 campaigns.
In this tutorial, we’ll explore some quality issues that arose during both the Kepler and K2 missions, and learn how to identify and mitigate them. It’s important to note that many of these issues only appear in the calibrated pixels or simple aperture photometry (SAP) data, and more still are removed by a quality masking process which we’ll discuss in the next section. If you’re working with presearch data conditioning SAP (PDCSAP) light curves, you won’t run into many of these issues.
1.1 Quality flags#
Before we look at some practical examples in time series data, let’s familiarize ourselves with how data gaps and single-cadence quality events are identified in Kepler data files. Every Kepler FITS file has a QUALITY column, which contains a quality flag for each individual data cadence. Each flag consists of one or more binary bits, which are expressed together as a single integer. A handy reference document for Kepler’s quality flags is Table 2-3 in the MAST Kepler Archive Manual.
Let’s start by downloading the data we’ll work with throughout this tutorial:
lc = lk.search_lightcurve('KIC 2436365', mission='kepler', author='kepler', quarter=2).download()
By default, Lightkurve downloads quality-masked data. This means that a variety of cadences with a non-zero quality flag will already be removed from the light curves or TPFs you download using the instruction above.
For this tutorial, we are also going to download some TPF data with no quality mask applied. We can do this by passing the optional quality_bitmask=0 argument to the download() method as follows:
tpf = lk.search_targetpixelfile('KIC 2436365', quarter=2).download(quality_bitmask=0)
quality_bitmask=0 has the effect of including every cadence in the data, even those with serious quality issues or NaN (not a number) values in the flux. This is not necessarily recommended when using Lightkurve, but here it will allow us to explore a wide variety of data quality issues, many of which will be useful to know about if you’re working directly with FITS files from MAST.
Now that we have our data, let’s have a look at the range of quality flags present. Remember, this is the unmasked data, so every flagged cadence is included.
Here, we’re using the NumPy function unique(), which takes an array as its input and returns every unique value:
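As a minimal sketch of what this call does, the snippet below applies np.unique() to a small made-up array of quality values. The array is illustrative only and is not taken from the TPF; on the real data the equivalent call would presumably be np.unique(tpf.quality).

```python
import numpy as np

# Hypothetical quality column: illustrative values only, not real Kepler data.
quality = np.array([0, 0, 16, 0, 128, 16, 4160, 0, 32804])

# np.unique returns the sorted distinct values found in the array.
unique_flags = np.unique(quality)
print(unique_flags)
```

Repeated values such as 0 and 16 appear only once in the result, so the output is a compact list of every flag value present.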
You’ll notice that some of these appear to be integers that correspond to bits — that is, powers of two — but others are additive. This indicates that multiple quality issues are present in a particular cadence.
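To make the "additive" flags concrete, here is a small sketch that splits an integer flag into the powers of two it is built from. The helper name set_bits is hypothetical and is not part of Lightkurve; it uses only plain Python integer operations.

```python
def set_bits(flag):
    """Return the powers of two that sum to an integer quality flag."""
    return [1 << i for i in range(flag.bit_length()) if flag & (1 << i)]


# 4160 and 32804 are two of the combined flags discussed in this tutorial.
print(set_bits(4160))   # [64, 4096]
print(set_bits(32804))  # [4, 32, 32768]
```

Each entry in the returned list is one individual quality bit, so a flag with more than one entry marks a cadence affected by several issues at once.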
We can use Lightkurve’s decode() function for accessing the information stored in each quality flag:
for flag in np.unique(tpf.quality):
    print(flag, lk.KeplerQualityFlags.decode(flag))
We can use Python’s “bitwise and” operator (&) to select the cadence numbers affected by a specific quality flag as follows:
tpf.cadenceno[(tpf.quality & 64) > 0] # cadence numbers flagged for "Argabrightening" (flag 64)
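The same selection can be tried on a toy example without downloading anything. The sketch below assumes two made-up arrays standing in for tpf.cadenceno and tpf.quality; the values are illustrative only.

```python
import numpy as np

# Hypothetical cadence numbers and quality flags (illustrative values only).
cadenceno = np.array([100, 101, 102, 103, 104])
quality = np.array([0, 64, 4160, 16, 0])

# A cadence is selected when bit 64 is set, whether or not other bits are set too.
argabrightening = (quality & 64) > 0
print(cadenceno[argabrightening])
```

Note that cadence 102 is selected even though its flag is 4160 rather than 64: the "bitwise and" tests a single bit and ignores the rest of the flag.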
In the following sections, we’ll walk through the majority of these data quality events, and look at practical examples in the light curve and TPF we downloaded above.
Some of these flags are not covered in this tutorial, such as rolling bands. For more information on other data quality issues, see other tutorials in this series, as well as the various Kepler Data Handbooks.
2. Common Data Gaps#
The nominal Kepler mission observed one area continuously between 2009 and 2013 — but of course, there were various necessary breaks in that continuity. For example, the telescope rotated at the end of each quarter, which necessitated a break in data collection.
We can use quality flags to identify these various events, but often it’s more convenient to check the Kepler Mission Timeline, from the Kepler Data Characteristics Handbook:
Throughout this section we’ll explore the most common reasons for gaps in Kepler data: monthly data downlinks, safe modes, and coarse pointing/loss of fine pointing. Though we’re looking at Kepler data here, these data gaps can also be found in K2 mission data.
2.2 Safe modes#
Safe modes (Kepler Data Characteristics Handbook, Section 4.2) are another type of thermal transient that appears in Kepler data. A safe mode occurred when the telescope temporarily shut off operation due to an unexpected event, usually caused by an issue with the detector electronics.
There were eleven safe mode events throughout the Kepler mission, and three during K2, in Campaigns 0, 9, and 12. The following code zooms in on the Quarter 2 safe mode, in the same data we’ve used above; note the similarity in appearance between the two thermal transient events.
ax = lc.plot(column='sap_flux')
ax.set_xlim(170, 200)
ax.set_ylim(5450, 5800)
ax.fill_betweenx(ax.get_ylim(), 181.5, 183.8, facecolor='r', alpha=0.3);
2.3 Coarse pointing and loss of fine pointing#
Running the code below, you’ll see two highlighted regions where there is no thermal transient, but still a gap in the data. Often, gaps like this are caused by a loss of fine pointing (Kepler Data Characteristics Handbook, Section 4.3). Because there is lower photometric precision when this occurs, these cadences are not suggested for use in photometry, and are replaced with NaNs.
In the QUALITY column, there are two flags for this situation: 4 (bit 3) for coarse point, and 32768 (bit 16) for a loss of fine point. Coarse point is a manual flag, based on preprocessing, and coarse point cadences are removed based on an expected loss of fine pointing. Cadences marked as “no fine point” are due to unexpected events; in practice, these two quality issues manifest in the same manner as gapped data.
ax = lc.plot(column='sap_flux')
ax.set_xlim(215, 260)
ax.set_ylim(ymax=5550)
ax.fill_betweenx(ax.get_ylim(), 223.5, 224, facecolor='r', alpha=0.3)
ax.fill_betweenx(ax.get_ylim(), 255.3, 256.4, facecolor='r', alpha=0.3);
Let’s also look at a TPF cadence affected by a loss of fine pointing:
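# Select cadences whose flag is exactly 32804 (32768 + 32 + 4)
mask = tpf.quality == 32804
print(tpf.flux[mask])
# Plot the first matching cadence; cadenceno expects a single cadence number
tpf.plot(cadenceno=tpf.cadenceno[mask][0]);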
As you can see, there’s no data available for this cadence at the TPF level. The vast majority of data affected by a loss of fine pointing/coarse pointing during the Kepler and K2 missions is “gapped” like this. When performing photometry or light curve corrections, it’s important to pay attention to the data on either side of these gaps and make sure you don’t overcorrect it.
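One simple way to locate the edges of such gaps in quality-masked data, where flagged cadences have been dropped, is to look for jumps in the time stamps that are larger than the nominal cadence. The sketch below is a minimal illustration under that assumption; the function name find_gaps, the tolerance value, and the time stamps are all hypothetical.

```python
def find_gaps(times, cadence, tolerance=1.5):
    """Return (start, end) time pairs where consecutive samples are further
    apart than `tolerance` times the nominal cadence."""
    gaps = []
    for earlier, later in zip(times, times[1:]):
        if later - earlier > tolerance * cadence:
            gaps.append((earlier, later))
    return gaps


# Illustrative time stamps in days, with a nominal 0.02-day spacing and one gap.
times = [10.00, 10.02, 10.04, 10.50, 10.52]
print(find_gaps(times, cadence=0.02))
```

The returned pairs mark the last good cadence before each gap and the first good cadence after it, which are the regions to inspect before applying a correction.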
3. Single-Cadence Events#
We’ve now seen some situations in Kepler data where you’ll observe a gap in the data — or NaN flux — and what the time series looks like on either side of those events. But there are many quality issues which can have an impact on one or more cadences at a time, which aren’t necessarily as evident as data gaps. In the following sections, we’ll look at some of the causes behind these single-cadence quality issues, where you’ll find them, and how to mitigate them in a practical context.
3.1 Cosmic rays#
Cosmic rays on the detector (Kepler Instrument Handbook, Section 4.16) are an unavoidable source of quality issues in all space-based data. Depending on the severity of the event, an incidence of a cosmic ray can lead to sudden pixel sensitivity dropouts (SPSD), which are covered in another tutorial in this series, or even long-term damage. Here, we’ll only look at the short-term events.
The cell below plots a section of our light curve for KIC 2436365 from above, showing a spike caused by a cosmic ray hitting the detector. Note that this is PDCSAP flux: most cosmic rays are removed by presearch data conditioning (PDC), but those that aren’t can be caught by outlier clipping or, in this case, may not be a large enough spike to cause problems.
ax = lc.plot()
ax.set_xlim(243, 257)
ax.set_ylim(ymax=6540)
ax.fill_betweenx(ax.get_ylim(), 250, 250.8, facecolor='r', alpha=0.3)
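The outlier clipping mentioned above can be sketched in a few lines. This is a generic robust sigma clip based on the median and the median absolute deviation, written for illustration only; it is not Lightkurve's implementation (Lightkurve light curves provide their own remove_outliers() method), and the function name, threshold, and flux values are assumptions.

```python
import statistics


def sigma_clip_mask(flux, sigma=5.0):
    """Return True for points within `sigma` robust standard deviations of the median."""
    median = statistics.median(flux)
    # Median absolute deviation, scaled to match a Gaussian standard deviation.
    mad = statistics.median(abs(f - median) for f in flux)
    robust_std = 1.4826 * mad
    return [abs(f - median) <= sigma * robust_std for f in flux]


# Illustrative flux values with one spike, standing in for a cosmic ray hit.
flux = [6500, 6502, 6498, 6501, 6560, 6499, 6500]
keep = sigma_clip_mask(flux)
clipped = [f for f, k in zip(flux, keep) if k]
print(clipped)
```

Using the median rather than the mean keeps the spike itself from inflating the threshold, which matters when only a handful of points are available.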
3.2 Argabrightening#
Argabrightening (Kepler Data Characteristics Handbook, Section 5.8) is a single-cadence quality issue that looks similar to a cosmic ray on the detector, but has a different origin. It is thought to occur when debris hits the instrument, causing a brief increase in flux. This is distinct from the electronic event caused by a cosmic ray corrupting the pixel readout: Argabrightening is the result of physical illumination.
When we used the KeplerQualityFlags.decode() function in Section 1.1, we saw that a quality flag of 4160 (4096 + 64) indicates an Argabrightening event on the charge-coupled device (CCD) and in the optimal aperture used for photometry. Let’s see where that shows up in our data:
tpf.time.value[(tpf.quality & 4160) > 0]
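A detail worth keeping in mind with combined masks such as 4160: comparing the "bitwise and" result with zero selects cadences where either of the two bits is set, while comparing it with the mask itself selects only cadences where both are set. The sketch below illustrates the difference on made-up flag values; the constant names are hypothetical labels for the two components of 4160.

```python
FIRST_BIT = 64      # "Argabrightening" (flag 64), as used in Section 1.1
SECOND_BIT = 4096   # the other component of 4160 (4160 - 64)
combined = FIRST_BIT | SECOND_BIT  # 4160

# Illustrative flag values only, not real Kepler data.
flags = [0, 64, 4096, 4160, 4160 + 16]

any_bit = [f for f in flags if (f & combined) > 0]
all_bits = [f for f in flags if (f & combined) == combined]

print(any_bit)   # [64, 4096, 4160, 4176]
print(all_bits)  # [4160, 4176]
```

Which test is appropriate depends on whether you want every cadence touched by either kind of event or only those flagged with both.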
Argabrightening events are removed by the Kepler pipeline for both SAP and PDCSAP data. To see this Argabrightening event, let’s create our own light curve from the TPF, using custom aperture photometry. We can confirm that the spike in the data is right where we expect it:
ax = tpf.to_lightcurve().plot()
ax.set_xlim(235, 245)
ax.set_ylim(5250, 5600)
ax.fill_betweenx(ax.get_ylim(), 240.1, 240.6, facecolor='r', alpha=0.3);
3.3 Attitude tweaks#
Kepler’s orientation, or attitude, was adjusted every few days during Quarters 0, 1, and 2 of the nominal mission. Specifically, attitude tweaks ensured that no star would ever move more than 1/100th of a pixel from its expected location in each cadence (Kepler Data Characteristics Handbook, Section 4.4). From Quarter 3 onwards, changes to the telescope’s Fine Guidance Sensor (FGS) system (see the Kepler Instrument Handbook, particularly sections 2.1 and 2.5.1) led to reduced drift, which meant that attitude tweaks were no longer necessary.
Because the drift distances in Kepler’s first three quarters were so small, they’re hard to detect without checking the quality flags. Additionally, the discontinuities were for the most part corrected by the data processing pipeline, with only a few remaining:
ax = lc.plot(column='sap_flux')
ax.set_xlim(241, 251)
ax.set_ylim(5300, 5400)
ax.fill_betweenx(ax.get_ylim(), 246, 246.5, facecolor='r', alpha=0.3);
3.4 Reaction wheel events#
Kepler’s attitude was controlled initially by four reaction wheels. Bits 5 and 6 are allocated to two data quality events caused by the reaction wheels: zero crossings and momentum desaturation.
Zero crossings (bit 5/integer 16) occurred when the reaction wheels had zero angular velocity (Kepler Data Characteristics Handbook, Section 5.4). This caused the telescope’s pointing to degrade for a few minutes at a time. Because of this short timescale, reaction wheel zero crossings are mostly an issue in Short Cadence data, where they manifest as negative spikes in the flux data. Zero crossing events became less prominent after the failure of one of Kepler’s reaction wheels in Quarter 14, due to an increase in speed of the remaining reaction wheels. There were no reaction wheel zero crossings in the K2 mission.
Momentum desaturation (bit 6/integer 32) was a consequence of a buildup of torque on the reaction wheels (Kepler Data Characteristics Handbook, Section 5.3). Desaturation events occurred every 146 Long Cadences during the Kepler mission, leading to coarse pointing mode (as above) and NaN values in the light curves for one Long Cadence or several Short Cadences at a time.
3.5 Manual exclusions#
For various reasons, some cadences were manually excluded during the data processing stage. In general, these cadences were those on either side of gaps and discontinuities, but sometimes manual exclusions were used to cover specific events that didn’t fall under any other category. For example, during Quarter 12, a series of three coronal mass ejections from the Sun led to multiple cadences being manually excluded (Kepler Data Release 25 Notes, Section 12.2). Further solar flares and coronal mass ejections led to manual exclusions in Campaign 15 of K2 (K2 Data Release 22 Notes, Section 2.2). On both occasions, these events led to increased noise and reduced pointing accuracy.
The code below plots the SAP light curve for Quarter 12 of KIC 8805616 with no quality-flagged data excluded. Here, you can clearly see the data quality issues caused by the coronal mass ejections, which are highlighted:
lc_12 = lk.search_lightcurve('KIC 8805616', quarter=12).download(quality_bitmask=0)
ax = lc_12.plot(column='sap_flux')
ax.set_ylim(47200, 48200)
ax.fill_betweenx(ax.get_ylim(), 1116.2, 1118.7, facecolor='r', alpha=0.3)
ax.fill_betweenx(ax.get_ylim(), 1121.1, 1122.3, facecolor='r', alpha=0.3)
ax.fill_betweenx(ax.get_ylim(), 1160.2, 1164.0, facecolor='r', alpha=0.3)
The K2 mission relied on only two of Kepler’s reaction wheels, meaning it required an additional thruster firing every six hours to maintain pointing. This led to a six-hour drift in K2 data; while the drift is corrected in Long Cadence data by PDC, data taken during the thruster firings is treated the same way as gaps or single-cadence quality issues in Kepler data. There are two quality bits allocated to thruster firings: bit 21 for a thruster firing, and bit 20 for a possible thruster firing.
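Because the bits are numbered from 1 in this tutorial (bit 3 is the integer 4, bit 16 is 32768), a bit number has to be converted to its integer value before it can be used in a "bitwise and" test. The sketch below shows that conversion; the helper name bit_to_integer is hypothetical.

```python
def bit_to_integer(bit):
    """Convert a 1-indexed quality bit number to its integer value, 2**(bit - 1)."""
    return 1 << (bit - 1)


# Bits mentioned in this tutorial, including the two thruster firing bits.
for bit in (3, 5, 6, 16, 20, 21):
    print(bit, bit_to_integer(bit))
```

For the thruster firing flags this gives 524288 for bit 20 and 1048576 for bit 21, which are the values to use when masking the QUALITY column.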
To get an idea of what the six-hour drift looks like, let’s start by downloading some K2 data.
lc_k2 = lk.search_lightcurve('EPIC 211414081', cadence='long', campaign=5).download(quality_bitmask=0)
Even with quality_bitmask=0, the data is gapped at all cadences flagged with bit 20 or 21. You can test this yourself by running the following code, and noting that there are no time values provided for any cadences with these quality flags:
# Bits 20 and 21 correspond to the integers 2**19 = 524288 and 2**20 = 1048576
thruster_mask = (lc_k2.quality.value & (524288 | 1048576)) > 0
print(lc_k2.time[thruster_mask])
Now let’s look at the SAP light curve for this star, and see what the six-hour drift looks like in practice:
Fortunately, this systematic is well-represented by the cotrending basis vectors (CBVs) used in the PDC pipeline:
The K2 Handbook cautions that these effects can never be fully corrected — you can see this particularly to the left of the plot above. But overall, PDC successfully removes the six-hour drift from K2 Long Cadence data, providing high-quality time series data for all targets.
In the next tutorial in this series, we’ll revisit EPIC 211414081, and look at how the thruster firings lead to persistent systematics in Short Cadence data.
About this Notebook#
Author: Isabel Colman
Updated on: 2020-09-29
Citing Lightkurve and Astropy#
If you use lightkurve or astropy for published research, please cite the authors.