57 documentation update #62

Merged
merged 12 commits into from
Oct 18, 2023
Add new tutorials documentation
ecasellas committed Oct 18, 2023
commit f3242249ca0cd8729cd56d2b1ba6d7f45e7b56cf
8 changes: 0 additions & 8 deletions TODO

This file was deleted.

285 changes: 285 additions & 0 deletions docs/source/01_howto_prepare_data.rst
@@ -0,0 +1,285 @@
01. Preparing Weather Station Data for PyMica
=============================================

In this tutorial, we’ll cover the preparation of weather station data
for use in PyMica.

The data format for weather station data used by PyMica is a list
containing a dictionary for each weather station, including at least the
following variables:

- ``id``: Identification code.
- ``lon``: Longitude coordinate.
- ``lat``: Latitude coordinate.
- ``value``: Observation value.

It can also contain other keys for the variables used in the
interpolation, such as altitude or distance to the coast. Altitude must
be named ``altitude``; other explanatory variables can use any name.

An element of the list containing these variables is organized as
follows for each weather station:

::

{
"id": "id_code",
"lon": "longitude coordinate value",
"lat": "latitude coordinate value",
"value": "value",
"altitude": "altitude value"
}

The weather station data is supplied to
:py:meth:`pymica.pymica.PyMica.interpolate()` as a list of dictionaries, one
for each station.
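
Before passing the list to PyMica, it can help to check that every
dictionary carries the required keys. The following is a minimal sketch
of such a check; ``check_stations`` and ``REQUIRED_KEYS`` are
hypothetical names used for illustration and are not part of PyMica.

.. code:: python

    # Keys that every station dictionary must contain (see the list above).
    REQUIRED_KEYS = ("id", "lon", "lat", "value")


    def check_stations(stations):
        """Return a list of (index, missing_keys) for incomplete stations."""
        problems = []
        for index, station in enumerate(stations):
            missing = [key for key in REQUIRED_KEYS if key not in station]
            if missing:
                problems.append((index, missing))
        return problems


    stations = [
        {"id": "C6", "lon": 0.95172, "lat": 41.6566, "value": 8.8, "altitude": 264.0},
        {"id": "C7", "lon": 1.16234, "lat": 41.66695},  # 'value' is missing
    ]
    print(check_stations(stations))  # [(1, ['value'])]

An empty result means that all stations have the minimum set of keys.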

As an example, we’ll work with data from the Automatic Weather Station
Network (XEMA) of the Meteorological Service of Catalonia
(`XEMA <https://www.meteo.cat/observacions/xema>`__). However, you can
also provide your own data to PyMica.

First, let’s import the required library.

.. code:: python

import pandas as pd

Now, let’s suppose that our data is in a .csv format. In the
``sample-data/data`` directory, we’ll find data from the XEMA network
for 2017/02/21 12:00 UTC and its corresponding metadata.

We’ll open both .csv files, ``XEMA_20170221_1200.csv`` and
``XEMA_metadata.csv``, using the pandas library and display the first
rows of the data file.

.. code:: python

file_data = 'sample-data/data/XEMA_20170221_1200.csv'
file_metadata = 'sample-data/data/XEMA_metadata.csv'

station_data = pd.read_csv(file_data)
metadata = pd.read_csv(file_metadata)

station_data.head()




.. raw:: html

<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>key</th>
<th>altitude</th>
<th>dist</th>
<th>hr</th>
<th>lat</th>
<th>lon</th>
<th>temp</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>C6</td>
<td>264.0</td>
<td>0.858731</td>
<td>80.0</td>
<td>41.65660</td>
<td>0.95172</td>
<td>8.8</td>
</tr>
<tr>
<th>1</th>
<td>C7</td>
<td>427.0</td>
<td>0.839116</td>
<td>86.0</td>
<td>41.66695</td>
<td>1.16234</td>
<td>7.1</td>
</tr>
<tr>
<th>2</th>
<td>C8</td>
<td>554.0</td>
<td>0.825381</td>
<td>76.0</td>
<td>41.67555</td>
<td>1.29609</td>
<td>9.3</td>
</tr>
<tr>
<th>3</th>
<td>C9</td>
<td>240.0</td>
<td>0.448604</td>
<td>47.0</td>
<td>40.71825</td>
<td>0.39988</td>
<td>15.7</td>
</tr>
<tr>
<th>4</th>
<td>CC</td>
<td>626.0</td>
<td>0.849968</td>
<td>47.0</td>
<td>42.07398</td>
<td>2.20862</td>
<td>15.2</td>
</tr>
</tbody>
</table>
</div>



Next, we display the first rows of the metadata.

.. code:: python

metadata.head()




.. raw:: html

<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>key</th>
<th>altitude</th>
<th>dist</th>
<th>lat</th>
<th>lon</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>C6</td>
<td>264.0</td>
<td>0.858731</td>
<td>41.65660</td>
<td>0.95172</td>
<td>Castellnou de Seana</td>
</tr>
<tr>
<th>1</th>
<td>C7</td>
<td>427.0</td>
<td>0.839116</td>
<td>41.66695</td>
<td>1.16234</td>
<td>Tàrrega</td>
</tr>
<tr>
<th>2</th>
<td>C8</td>
<td>554.0</td>
<td>0.825381</td>
<td>41.67555</td>
<td>1.29609</td>
<td>Cervera</td>
</tr>
<tr>
<th>3</th>
<td>C9</td>
<td>240.0</td>
<td>0.448604</td>
<td>40.71825</td>
<td>0.39988</td>
<td>Mas de Barberans</td>
</tr>
<tr>
<th>4</th>
<td>CC</td>
<td>626.0</td>
<td>0.849968</td>
<td>42.07398</td>
<td>2.20862</td>
<td>Orís</td>
</tr>
</tbody>
</table>
</div>



Now, let’s prepare the data in the format required by PyMica, selecting
the air temperature variable (``temp``) and using ``altitude`` and
``dist`` as predictor variables. The variable ``dist`` is the distance
from a station to the coastline and accounts for the influence of the
sea.

.. code:: python

    data = []
    for key in station_data['key']:
        # Select the observation row and the metadata row of this station
        df_data = station_data[station_data['key'] == key]
        df_meta = metadata[metadata['key'] == key]
        data.append(
            {
                'id': key,
                'lon': float(df_meta['lon'].iloc[0]),
                'lat': float(df_meta['lat'].iloc[0]),
                'value': float(df_data['temp'].iloc[0]),
                'altitude': float(df_meta['altitude'].iloc[0]),
                'dist': float(df_meta['dist'].iloc[0])
            }
        )

If we print the first element of ``data``, we can see all the required
variables for a station, which include identification code, longitude,
latitude, temperature value, altitude, and distance to the coastline.

.. code:: python

print('Sample data: ', data[0])
print('Number of points: ', len(data))


.. parsed-literal::

Sample data: {'id': 'C6', 'lon': 0.95172, 'lat': 41.6566, 'value': 8.8, 'altitude': 264.0, 'dist': 0.8587308027349195}
Number of points: 180
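
The same list can also be built without an explicit loop by joining the
two tables on ``key``. The following is a sketch of that alternative,
not the method used in this tutorial. To keep it self-contained, the two
small DataFrames stand in for the CSV files and hold values copied from
the tables above. Note that an inner join silently drops stations that
have no metadata row.

.. code:: python

    import pandas as pd

    # Stand-ins for the two CSV files (first two rows of the tables above).
    station_data = pd.DataFrame({"key": ["C6", "C7"], "temp": [8.8, 7.1]})
    metadata = pd.DataFrame(
        {
            "key": ["C6", "C7"],
            "lon": [0.95172, 1.16234],
            "lat": [41.65660, 41.66695],
            "altitude": [264.0, 427.0],
            "dist": [0.858731, 0.839116],
        }
    )

    # Join observations and metadata on the station key.
    merged = station_data[["key", "temp"]].merge(metadata, on="key", how="inner")

    # Rename the columns to the key names expected in the station dictionaries.
    merged = merged.rename(columns={"key": "id", "temp": "value"})

    # One dictionary per row, in the same key order as the loop above.
    columns = ["id", "lon", "lat", "value", "altitude", "dist"]
    data = merged[columns].to_dict("records")

    print(data[0])
    print("Number of points:", len(data))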


We have now completed this tutorial on how to prepare raw weather
station observations for use with the PyMica class.
103 changes: 103 additions & 0 deletions docs/source/02_howto_predictor_variables.rst
@@ -0,0 +1,103 @@
02. Preparing predictor variable fields
=======================================

In this tutorial, we’ll cover the preparation of the distance to the
coastline field, using grid parameters taken from another typical
predictor variable for air temperature: altitude. The altitude predictor
variable is obtained from a Digital Elevation Model (DEM).

In PyMica, predictor variable fields used for interpolation must have
the same extent, spatial resolution, and projection. Therefore, the
DEM will be used as a reference to build the others.
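
As an illustration of this requirement, the sketch below compares two
grids through their GDAL-style geotransform (origin x, pixel width, row
rotation, origin y, column rotation, pixel height) and their size in
pixels. ``same_grid`` is a hypothetical helper, not a PyMica function.
The first geotransform is the one of the sample DEM printed later in
this tutorial; the second grid is invented for the comparison. The
projection is not covered here and would be compared separately, for
example through its EPSG code.

.. code:: python

    def same_grid(geotransform_a, size_a, geotransform_b, size_b, tol=1e-6):
        """Return True if two grids share origin, pixel size, rotation and shape."""
        same_transform = all(
            abs(a - b) <= tol for a, b in zip(geotransform_a, geotransform_b)
        )
        return same_transform and tuple(size_a) == tuple(size_b)


    dem_geotransform = (260000.0, 270.0, 0.0, 4750000.0, 0.0, -270.0)
    dem_size = (1000, 970)

    # Hypothetical second field with a coarser pixel size over the same area
    other_geotransform = (260000.0, 540.0, 0.0, 4750000.0, 0.0, -540.0)
    other_size = (500, 485)

    print(same_grid(dem_geotransform, dem_size, dem_geotransform, dem_size))      # True
    print(same_grid(dem_geotransform, dem_size, other_geotransform, other_size))  # False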

Distance to Coastline
~~~~~~~~~~~~~~~~~~~~~

PyMica provides a utility to build a distance to the coastline field
using a DEM and a coastline GeoJSON file. Let’s now import the necessary
modules.

.. code:: python

from osgeo import gdal

from pymica.utils.distance_to_coastline import get_dist_array

For the ``get_dist_array`` function, we need four parameters:
projection, geotransform, size, and a coastline file. We’ll obtain the
geotransform and size from the DEM, set the projection to the EPSG code
of the DEM (25831), and take the coastline file from the
``sample-data/explanatory`` directory.

.. code:: python

dem_file = 'sample-data/explanatory/cat_dem_25831.tif'
dem = gdal.Open(dem_file)

projection = 25831
geotransform = dem.GetGeoTransform()
size = [dem.RasterXSize, dem.RasterYSize]
coast_line = 'sample-data/explanatory/cat_coast_line.json'

Now, let’s check the values of each parameter.

.. code:: python

print('Geotransform: ', geotransform)
print('Size : ', size)


.. parsed-literal::

Geotransform: (260000.0, 270.0, 0.0, 4750000.0, 0.0, -270.0)
Size : [1000, 970]
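
To make these numbers more tangible, the short sketch below derives the
bounding box of the grid from the geotransform and size printed above.
It assumes a north-up raster (both rotation terms are zero, as is the
case here); coordinates are in metres because EPSG:25831 is a UTM
projection.

.. code:: python

    geotransform = (260000.0, 270.0, 0.0, 4750000.0, 0.0, -270.0)
    size = [1000, 970]

    # GDAL order: origin x, pixel width, rotation, origin y, rotation, pixel height
    x_origin, pixel_width, _, y_origin, _, pixel_height = geotransform
    n_cols, n_rows = size

    # The origin is the top-left corner of the top-left pixel.
    x_min = x_origin
    x_max = x_origin + n_cols * pixel_width
    y_max = y_origin
    y_min = y_origin + n_rows * pixel_height  # pixel_height is negative

    print((x_min, y_min, x_max, y_max))
    # (260000.0, 4488100.0, 530000.0, 4750000.0)

The grid therefore spans 270 km from west to east and 261.9 km from
south to north, with a pixel size of 270 m.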


Once all the parameters are set, we can call the ``get_dist_array``
function, imported above from
:py:mod:`pymica.utils.distance_to_coastline`, with the previously
defined parameters.

.. code:: python

dcoast_field = get_dist_array(proj=projection, geotransform=geotransform, size=size, dist_file=coast_line)


.. parsed-literal::

Progress: 0.1%Progress: 100%


Now, we can take a quick look at the ``dcoast_field`` array using
``matplotlib``.

.. code:: python

import matplotlib.pyplot as plt

plt.imshow(dcoast_field)
plt.colorbar()
plt.show()



.. image:: _static/02_howto_predictor_variables_9_0.png


The coastline of Catalonia can be clearly identified. Values are low
close to it and increase with distance from the coast.

Given that the distance to the coastline may be used as a predictor
variable, it is useful to save it in a raster file for future use in
multiple linear regression interpolations.

Let’s use :py:meth:`pymica.utils.geotools.save_array_as_geotiff()` to
save ``dcoast_field`` to a GeoTIFF file.

.. code:: python

from pymica.utils.geotools import save_array_as_geotiff

save_array_as_geotiff("sample-data/results/dcoast_example.tif", dcoast_field, geotransform, projection)

We have now completed this tutorial on how to prepare predictor variable
fields for use with the PyMica class.