Welcome to isithot documentation!¶
Installation¶
via https
pip install git+https://github.com/RUBclim/isithot
via ssh
pip install git+ssh://git@github.com/RUBclim/isithot
Quick start¶
An initial app can be created quite simply by adding a single data provider.
Adding data providers¶
1. Add a new isithot.DataProvider instance. The DataProvider.get_current_data() and DataProvider.get_daily_data() methods need to be implemented.
2. Create an isithot.ColumnMapping instance which maps the columns of your data source to the columns the package expects.
3. Create a dictionary of isithot.DataProvider instances where each key must match the provider's id.
4. Register the data providers with the current app: app.config['DATA_PROVIDERS'] = data_providers.
from datetime import date

import pandas as pd
from flask import Flask

from isithot import ColumnMapping
from isithot import create_app
from isithot import DataProvider
from isithot.config import Config


class TestProvider(DataProvider):
    def get_current_data(self, d: date) -> pd.DataFrame:
        df = pd.DataFrame({
            'date': [pd.Timestamp(d)],
            'temp_max': [30.0],
            'temp_min': [20.0],
        })
        return df.set_index('date')

    def get_daily_data(self, d: date) -> pd.DataFrame:
        x = pd.read_csv(
            'testing/monthly_input_data/lmss_daily_long.csv',
            parse_dates=['date'],
            index_col='date',
        )
        x['doy'] = x.index.dayofyear
        return x


def my_app() -> Flask:
    col_map = ColumnMapping(
        datetime='date',
        temp_mean='temp_mean_mannheim',
        temp_max='temp_max',
        temp_min='temp_min',
        day_of_year='doy',
    )
    data_providers = {
        'test': TestProvider(
            col_mapping=col_map,
            name='Test',
            id='test',
            min_year=2010,
        ),
    }
    app = create_app(Config)
    app.config['DATA_PROVIDERS'] = data_providers
    return app


if __name__ == '__main__':
    app = my_app()
    app.run(debug=True)
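Since every key in the data provider dictionary must match the provider's id, a small check before registering the providers can catch mismatches early. The following is a minimal sketch and not part of isithot: StubProvider and check_provider_ids are hypothetical names, and StubProvider only stands in for a real isithot.DataProvider.

```python
from dataclasses import dataclass


@dataclass
class StubProvider:
    """Hypothetical stand-in for isithot.DataProvider (only the id matters here)."""
    id: str
    name: str


def check_provider_ids(providers: dict[str, StubProvider]) -> list[str]:
    """Return the dictionary keys that differ from the id of their provider."""
    return [key for key, provider in providers.items() if key != provider.id]


good = {'test': StubProvider(id='test', name='Test')}
bad = {'tset': StubProvider(id='test', name='Test')}  # key has a typo

mismatches_good = check_provider_ids(good)
mismatches_bad = check_provider_ids(bad)
```

An empty list means all keys and ids agree; any returned key points at an entry that should be fixed before assigning the dictionary to app.config['DATA_PROVIDERS'].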
implementing caching¶
The isithot app comes with a cache that can be applied to a function as a decorator. For example, the daily data
will likely not change very often, so it can be cached for, e.g., one hour.
from isithot.cache import cache


class TestProvider(DataProvider):
    @cache.cached(timeout=60*60, key_prefix='daily_data')
    def get_daily_data(self, d: date) -> pd.DataFrame:
        ...
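To illustrate what caching with a timeout means, here is a minimal standard-library sketch of a time-based cache decorator. It is not how isithot.cache is implemented; cached_for and its clock parameter are hypothetical names, and the sketch stores a single result regardless of the arguments passed.

```python
import time
from functools import wraps
from typing import Any, Callable


def cached_for(timeout: float, clock: Callable[[], float] = time.monotonic):
    """Keep the result of a function for ``timeout`` seconds (illustrative only)."""
    def decorator(func):
        state = {'expires': None, 'value': None}

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            now = clock()
            # recompute on the first call or once the stored value has expired
            if state['expires'] is None or now >= state['expires']:
                state['value'] = func(*args, **kwargs)
                state['expires'] = now + timeout
            return state['value']
        return wrapper
    return decorator


calls = []
fake_now = [0.0]  # a controllable clock so the example does not need to sleep


@cached_for(timeout=60 * 60, clock=lambda: fake_now[0])
def load_daily_data() -> int:
    calls.append(1)
    return len(calls)


first = load_daily_data()    # computed
second = load_daily_data()   # served from the cache
fake_now[0] = 60 * 60        # one hour later
third = load_daily_data()    # expired, computed again
```

The expensive function body only runs when no unexpired value is stored, which is why a timeout of one hour suits data that rarely changes.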
more complex data retrieval¶
A more complex example can be found in
testing/example_app.py
which uses database queries. All implementations need to consider performance, since this
code is executed while handling the HTTP request.
Another option for data retrieval is for the server to perform an API request, e.g.:
import urllib.request
from datetime import date, datetime

import pandas as pd
from flask import current_app


def get_current_data(self, d: date) -> pd.DataFrame:
    """
    fetch the latest weather data from the DWD. ``self.id`` corresponds to the
    station ID by DWD which is set during DataProvider creation.
    """
    # short timeout, since this runs while the HTTP request is handled
    ret = urllib.request.urlopen(
        f'https://dwd.api.proxy.bund.dev/v30/stationOverviewExtended?stationIds={self.id}',
        timeout=3,
    )
    data = current_app.json.loads(ret.read())
    # only the first entry in ``days`` is used
    temp_min = data[self.id]['days'][0]['temperatureMin'] / 10
    temp_max = data[self.id]['days'][0]['temperatureMax'] / 10
    # named ``day`` so it does not shadow the imported ``date`` type
    day = datetime.strptime(
        data[self.id]['days'][0]['dayDate'], '%Y-%m-%d',
    )
    return pd.DataFrame(
        {
            self.col_mapping.temp_min: temp_min,
            self.col_mapping.temp_max: temp_max,
        },
        index=pd.DatetimeIndex([day], name=self.col_mapping.datetime),
    )
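The parsing step of the example above can be tried without a network connection. The sketch below is only an illustration: the payload is made up and merely mirrors the keys accessed in the example (days, dayDate, temperatureMin, temperatureMax), the station ID '10410' is a placeholder, and parse_first_day is a hypothetical helper name.

```python
import json
from datetime import datetime

# Hypothetical sample payload; the values are invented for illustration.
SAMPLE = json.dumps({
    '10410': {
        'days': [
            {'dayDate': '2024-07-01', 'temperatureMin': 153, 'temperatureMax': 268},
        ],
    },
})


def parse_first_day(raw: str, station_id: str) -> dict:
    """Extract date, minimum and maximum temperature of the first day entry."""
    first_day = json.loads(raw)[station_id]['days'][0]
    return {
        'date': datetime.strptime(first_day['dayDate'], '%Y-%m-%d'),
        # same division by 10 as in the example above
        'temp_min': first_day['temperatureMin'] / 10,
        'temp_max': first_day['temperatureMax'] / 10,
    }


parsed = parse_first_day(SAMPLE, '10410')
```

Keeping the parsing separate from the request makes it easy to test the conversion on its own.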
API-Documentation¶
i18n¶
This web app uses internationalization (i18n) so that the page is also available in
German, since the audience will mostly be German. This is set up via Babel, and all
English text (in both .py and .html files) is wrapped in a _(...) function. These
strings can be extracted automatically via:
pybabel extract -F babel.cfg -o isithot/translations/messages.pot .
This will generate a messages.pot file which is the basis for all translations. Based
on this file, a translation can be initialized with the following command, in this case for
German (de).
pybabel init -i isithot/translations/messages.pot -d isithot/translations/ -l de
This will create a subfolder for the specific language (in this case de for
German). The messages.po file in that subfolder can now be used to translate all messages.
Finally, the translations have to be compiled into a messages.mo file. This needs to be
done manually for testing. It is done automatically for production while building the
Docker image.
pybabel compile -d isithot/translations
Important
If any of the strings wrapped in a _(...) function (in the .py or .html files)
are changed, the .pot file needs to be updated
using these commands:
pybabel extract -F babel.cfg -o isithot/translations/messages.pot .
pybabel update -i isithot/translations/messages.pot -d isithot/translations
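To illustrate what wrapping a string in _(...) does, here is a minimal sketch using only the standard-library gettext module rather than the Babel setup of the app. DictTranslations is a hypothetical in-memory stand-in for a compiled messages.mo catalogue, and the strings are placeholders.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalogue standing in for a compiled .mo file."""

    def __init__(self, messages: dict[str, str]) -> None:
        super().__init__()
        self._messages = messages

    def gettext(self, message: str) -> str:
        # untranslated strings fall back to the original English text
        return self._messages.get(message, message)


german = DictTranslations({'Yes': 'Ja'})
_ = german.gettext

translated = _('Yes')      # has an entry in the catalogue
untranslated = _('No')     # no entry, the English text is returned
```

The fallback behaviour is why new or changed strings still show up in English until the catalogue is updated and compiled again.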
app¶
blueprints¶
- isithot.blueprints.isithot.get_locale()[source]¶
Utility for getting the language from the Accept-Language header.
- isithot.blueprints.isithot.index()[source]¶
A simple route to have a nicer link to share.
- Return type:
- isithot.blueprints.isithot.last_years_calendar(station, year)[source]¶
Returns the calendar figure data as json for the specified year.
This route is cached indefinitely and does not take the locale into account, since it’s only static data.
- isithot.blueprints.isithot.plots(station)[source]¶
Renders the isithot page with all plots.
This route is cached, since compiling the data and generating the plots is quite expensive. The cache expires after 5 minutes, so the data is still almost live.
- class isithot.blueprints.plots.ColumnMapping(datetime: str, temp_mean: str, temp_max: str, temp_min: str, day_of_year: str)[source]¶
Class defining the mapping of column names to the different parameters needed.
- Parameters:
datetime – the column name of the column that stores the date (and maybe time) information
temp_mean – the column name of the column that stores the average air-temperature information
temp_max – the column name of the column that stores the maximum air-temperature information
temp_min – the column name of the column that stores the minimum air-temperature information
day_of_year – the column name of the column that stores the day of year number
- class isithot.blueprints.plots.DataProvider(col_mapping, name, id, min_year)[source]¶
Base Class for defining a custom data provider.
get_daily_data() and get_current_data() need to be overridden.
- Parameters:
col_mapping (ColumnMapping) – a ColumnMapping() mapping the column names returned by get_daily_data() or get_current_data() to variables so they can be used later
name (str) – the name of the station that is displayed on the website
id (str) – the ID of the station that is used for compiling links. If multiple DataProviders are used, each one must have a unique id.
min_year (int) – the minimum year for which data is available. This is used to determine the first year for which a calendar plot is created.
- calendar_fig(calendar_data)[source]¶
Creates a figure representing a calendar plot of the current year, indicating the percentile of each day as a color and a number.
- Parameters:
calendar_data (DataFrame) – a pd.DataFrame() containing all data necessary for creating the plot
- Return type:
Figure
- Returns:
a Figure() object that can be used as a json on the page, defining the plot including all data
- distrib_fig(fig_data)[source]¶
Creates a figure representing the distribution with the 5% and 95% percentiles and the trends for the time of year and the overall warming trend.
- Parameters:
fig_data (PlotData) – a PlotData() object containing all data necessary for creating the plot
- Return type:
Figure
- Returns:
a Figure() object that can be used as a json on the page, defining the plot including all data
- get_current_data(d)[source]¶
This needs to be implemented and will most likely be a database query or a file that is read. It might make sense to cache this function. d may be used as a cache key.
This should return a pd.DataFrame() with columns containing:
date (as a datetime object)
maximum temperature
minimum temperature
The index must be a pd.DatetimeIndex(). The column names must match those defined via col_mapping.
- get_daily_data(d)[source]¶
This needs to be implemented and will most likely be a database query or a file that is read. It might make sense to cache this function. d may be used as a cache key.
This should return a pd.DataFrame() with columns containing:
date (as a datetime object)
mean temperature
the day of the year
The index must be a pd.DatetimeIndex(). The column names must match those defined via col_mapping.
- hist_fig(fig_data)[source]¶
Creates a figure representing a histogram or, more specifically, a kernel density estimate. This includes lines for the 5% and 95% percentiles as well as the median. A red line for today’s value is added.
- Parameters:
fig_data (PlotData) – a PlotData() object containing all data necessary for creating the plot
- Return type:
Figure
- Returns:
a Figure() object that can be used as a json on the page, defining the plot including all data
- prepare_daily_and_calendar_data(d, current_avg=None)[source]¶
This gets the daily data from the database and creates the calendar plot data. This is separated from _prepare_data() so it can be used via last_years_calendar().
- Parameters:
d (date) – the date for which to prepare data. This will usually be today or, in this case, the first day of the year to prepare the calendar data for
current_avg (float | None) – This is used to add the current day, which has no entry in the daily data just yet. When working with previous years, this should be left as None (default: None)
- Return type:
- Returns:
a tuple of pd.DataFrame(): (daily, calendar_data)
- prepare_data(d)[source]¶
The purpose of this function is to compile an isithot.blueprints.plots.PlotData() object which is used for the creation of all plots.
- Parameters:
d (date) – the date for which to prepare data. This will usually be today
- Return type:
- Returns:
the data needed for creating the plots and texts, all contained in an isithot.blueprints.plots.PlotData() object
- class isithot.blueprints.plots.PlotData(current_date: date, daily: pd.DataFrame, now: pd.DataFrame, toy_data: pd.DataFrame, trend_overall_data: pd.DataFrame, trend_month_data: pd.DataFrame, calendar_data: pd.DataFrame, trend_overall_slope: float, trend_overall_intercept: float, trend_month_slope: float, trend_month_intercept: float, current_avg: float, current_avg_percentile: float, q5: float, median: float, q95: float)[source]¶
- Parameters:
current_date – The date for which the data is compiled. This is usually today
daily – A pandas dataframe containing all daily data that is available in the database
now – The latest data from the station (high resolution raw data)
toy_data – Data for the current time of year (toy). For this, a week before current_date and a week after current_date is extracted
trend_overall_data – (Yearly) data needed to calculate the overall trend since the start of the measurements
trend_month_data – Data needed for calculating the trend for the current month
calendar_data – Data needed to create a calendar plot for the current year
trend_overall_slope – The slope of the line for the overall warming trend across all years and times of year
trend_overall_intercept – The intercept of the line for the overall warming trend across all years and times of year
trend_month_slope – The slope of the line for the current warming trend across all years for the current time of year \(\pm\) 7 days
trend_month_intercept – The intercept of the line for the current warming trend across all years for the current time of year \(\pm\) 7 days
current_avg – The current average of today calculated from averaging the minimum and maximum temperature
current_avg_percentile – The percentile of current_avg
q5 – the 5% percentile for this time of the year
median – the median/50% percentile for this time of the year
q95 – the 95% percentile for this time of the year
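The summary values current_avg, current_avg_percentile, q5, median and q95 can be illustrated with the standard library. This is only a sketch under stated assumptions: the source does not say how isithot computes its percentiles, the sample temperatures are made up, and current_average and percentile_of are hypothetical helper names.

```python
from bisect import bisect_right
from statistics import median, quantiles


def current_average(temp_min: float, temp_max: float) -> float:
    """Average of the minimum and maximum temperature of the day."""
    return (temp_min + temp_max) / 2


def percentile_of(value: float, sample: list[float]) -> float:
    """Share of sample values that are <= value, in percent."""
    ordered = sorted(sample)
    return 100 * bisect_right(ordered, value) / len(ordered)


# made-up daily mean temperatures for the same time of year
toy_sample = [12.0, 14.5, 15.0, 17.5, 18.0, 19.5, 21.0, 22.5, 24.0, 26.0]

cut_points = quantiles(toy_sample, n=20)  # 19 cut points in 5% steps
q5, q95 = cut_points[0], cut_points[-1]
med = median(toy_sample)

avg = current_average(20.0, 30.0)
pct = percentile_of(avg, toy_sample)
```

With these sample values, nine of the ten days are at or below the current average of 25.0, so the day falls at the 90% mark of this simple empirical percentile.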