Develop (#46)
* Develop branch.

* Feature/python 3 8 update (#25)

* docker-compose works fine and can enter docker container locally. Haven't tried running the model with updated GDAL and Python yet.

* Successfully installs the required Python packages. Haven't tried running test tile yet.

* Runs test tile 00N_000E. Didn't check that outputs were correct but did verify that the output rasters load and have values in ArcMap.

* Feature/add testing and linting (#26)

* Froze all dependencies in `requirements.txt` to their current version.
Also, added testing folder and file but haven't tried using them yet.

* Used pylint on mp_create_carbon_pools.py. Addressed pretty much all the messages I wanted to.

* Continued delinting in create_carbon_pools. Changed all obvious print statements in mp_create_carbon_pools.py and create_carbon_pools.py to f-string print statements. Added docstrings to each function. Testing carbon pool creation for 00N_000E seems to work fine.

* Experimenting with setting some variables as global so that I don't have to pass them as arguments: sensit_type, no_upload, save_intermediates, etc. For saving and modifying variables between files, this page seems to be helpful: https://thewebdev.info/2021/10/19/how-to-use-global-variables-between-files-in-python/#:~:text=To%20use%20global%20variables%20between%20files%20in%20Python%2C%20we%20can,reference%20the%20global%20variable%20directly.&text=We%20import%20the%20settings%20and,Then%20we%20call%20settings.

* Testing global variables with no_upload. I seem to be able to reset the global variable from run_full_model.py, including in the log. Need to make sure this is actually carrying through to uploading.

* Added global variables to constants_and_names.py and top of run_full_model.py.
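The pattern described in the bullets above (sharing mutable globals between files via a module) can be sketched as follows. This is a minimal, self-contained illustration, so the stand-in module is created in code; in the repo it would be constants_and_names.py, and the variable names here are only examples:

```python
import sys
import types

# Stand-in for constants_and_names.py, created in code so this snippet runs standalone.
cn = types.ModuleType("constants_and_names")
cn.NO_UPLOAD = False
cn.SENSIT_TYPE = "std"
sys.modules["constants_and_names"] = cn

# run_full_model.py would rebind the attribute on the module object itself...
import constants_and_names
constants_and_names.NO_UPLOAD = True

# ...and every file that does `import constants_and_names` sees the update.
print(cn.NO_UPLOAD)  # True

# Caveat: `from constants_and_names import NO_UPLOAD` copies the value at
# import time, so later changes to the module attribute would NOT be seen.
```

The caveat in the last comment is why files should import the module and reference `constants_and_names.NO_UPLOAD`, not import the name directly.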

* Changed run_full_model.py through carbon pool step.

* Changed carbon pool creation to use global variables. Decided to have functions pass carbon_pool_extent because it's a key parameter of carbon pool creation.

* Changed all model stages to use global variables from the command line. Still testing that I didn't break anything in local runs.

* Changed some universal_util.py functions to use the global variables instead of passing arguments to them.

* Starting to change print statements to f-string print statements throughout the model.

* Changed to f-string print statements for the model extent and forest age category steps.

* Changed to f-string print statements for the entire removals model.

* Changed to f-string print statements for carbon, emissions, and analyses. Haven't changed them in universal_util or constants_and_names. Haven't checked if everything is working alright.

* Changed to f-string print statements for universal_util.py. Didn't change arguments to gdal commands for the most part, though.

* Used pylint on all regular model steps and run_full_model.py. Fixed most messages that weren't about importing, too many variables, too many statements, or too many branches. I'll work on those structural issues later.

* Testing 00N_000E locally with linting of run_full_model.py and all model stages through net flux. Going to try running it on an ec2 instance now.

* 00N_000E works in a full local run and 00N_020E works in a full ec2 run. I've linted enough for now.

* Feature/single processing flag (#27)

* Added a command line argument `--single-processor` or `-sp` to run_full_model.py and each model step through net flux that sets whether the tile processing is done with the multiprocessing module or not. This involved adding another if...else statement (or sometimes statements) to each step to have it use the correct processing route. Also changed readme.md to add the new argument.

* Ran 00N_000E locally for all model steps with single and multiprocessing options to make sure both still worked after this reconfiguration. Both worked.
Single processing took (no uploading of outputs): 1 hour 23 minutes
Multi-processing took (no uploading of outputs):  1 hour 11 minutes
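The new flag and the dispatch it controls might look something like this. The option names come from the commit message; `process_tiles`, its arguments, and the processor count are placeholders for illustration:

```python
import argparse
import multiprocessing

def build_parser():
    parser = argparse.ArgumentParser()
    # argparse derives the attribute name `single_processor` from the long flag
    parser.add_argument('--single-processor', '-sp', action='store_true',
                        help='Process tiles without the multiprocessing module')
    return parser

def process_tiles(tiles, process_tile, single_processor, processes=4):
    """Dispatch tile processing to the single- or multi-processing route."""
    if single_processor:
        # Plain loop: easier to debug and profile
        return [process_tile(tile) for tile in tiles]
    with multiprocessing.Pool(processes) as pool:
        return pool.map(process_tile, tiles)
```

Each model step would then check `args.single_processor` in its existing if...else statements to pick a route.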

* Pairing on Carbon Pools:  2022-09-15 Tests and Refactor (#29)

* ✅ test(Carbon Pools): Mark failing tests with `xfail`

This is handy if we're writing the tests first or we have a large batch of tests failing for some reason and we want to cut down on the error output generated during a test run.
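A sketch of the marker usage; the test name and reason below are made up:

```python
import pytest

@pytest.mark.xfail(reason="deadwood equations not implemented yet")
def test_deadwood_pool_created():
    # Expected to fail for now; pytest reports it as XFAIL instead of FAILED,
    # which keeps the error output of a large failing batch manageable.
    assert False
```

Run with `pytest -rx` to get a summary line for each expected failure.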

* 🎨 refactor(Carbon Pools): Extract `deadwood_litter_equations`

This refactoring pattern is described here:
https://refactoring.guru/extract-method

* 🎨 style(Carbon Pools): Add proper spacing between functions

* Feature/carbon pool testing (#30)

* Testing not working. Import errors.

* Testing works when I run pytest from /usr/local/app/test. Added deadwood and litter pool tests for the simple numpy operations that represent the five categories of domain/elevation/precipitation. The tests are on 1x1 numpy arrays to keep things simple (not on actual tiles). Doing this testing involved refactoring the numpy parts of create_deadwood_litter into their own function that inputs and outputs just arrays of any dimension.

* Carbon pool creation still works, even with the deadwood and litter equations factored out. All tests of the different equations work, too.
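Testing the extracted equations on 1x1 arrays might look like this. The coefficients and thresholds below are invented for illustration and are not the model's actual values; only the shape of the approach (pure array-in/array-out function, trivially small inputs) reflects the commits above:

```python
import numpy as np

def deadwood_litter_equations(agb, elevation, precip):
    """Illustrative stand-in for the extracted function: assigns deadwood
    and litter as fractions of AGB by elevation/precipitation class.
    (Made-up coefficients, not the model's.)"""
    tropical_lowland_wet = (elevation < 2000) & (precip >= 1600)
    deadwood = np.where(tropical_lowland_wet, agb * 0.06, agb * 0.02)
    litter = np.where(tropical_lowland_wet, agb * 0.01, agb * 0.04)
    return deadwood, litter

# 1x1 arrays keep the test inputs trivial to reason about
agb = np.array([[100.0]])
deadwood, litter = deadwood_litter_equations(agb,
                                             np.array([[500]]),    # elevation (m)
                                             np.array([[2000]]))   # precipitation (mm)
```

Because the function only touches arrays, the same code path runs unchanged on full 40000x40000 tile arrays in production and 1x1 arrays in tests.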

* Feature/carbon pool testing (#34)

* Pairing/component testing (#33)

* ✅ test: Import sample rasters

* ✅ test: First integration test

This test module will show several components and the file system working together. These are not technically unit-tests since we're using the file system.

* ✅ test(pytest): Register custom marker

See the registered markers by running:
$> pytest --markers

Custom markers let us organize our tests and quickly run specific categories.
https://docs.pytest.org/en/7.1.x/example/markers.html#registering-markers

* ✅ test(pytest): Mark entire test module

Now, we can mark the entire module as `integration` tests.
To run tests/modules with our custom mark:
$> pytest -m integration

https://docs.pytest.org/en/7.1.x/example/markers.html#marking-whole-classes-or-modules
https://docs.pytest.org/en/7.1.x/example/markers.html#marking-test-functions-and-selecting-them-for-a-run
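Registering the marker can be done in conftest.py rather than an ini file; a sketch (the marker description text is invented):

```python
# conftest.py (sketch)
def pytest_configure(config):
    # Register the custom marker so `pytest --markers` lists it and
    # `pytest -m integration` selects it without unknown-marker warnings.
    config.addinivalue_line(
        "markers", "integration: tests that exercise the file system"
    )
```

`pytest_configure` is a standard pytest hook; pytest calls it once at startup with its config object.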

* 🎨 refactor: Remove redundant test

Removing this one since we have a separate test module that is marked as `integration`.

* ✅ test(component): Stub out universal_util

Co-authored-by: Gary Tempus Jr <gary.tempus@wri.org>

* Unit tests working. The rasterio test runs from /usr/local/app/test with pytest -m integration but doesn't pass. Not sure why.

* Changed the output paths in the fake test functions to make this work. Now it's creating deadwood and litter emissions in the correct pixels within the test area.

* Deletes the output test rasters of the previous test in the output test folder before running the test again.

* Output deadwood and litter pools from testing match the output deadwood and litter from the normal model run, for both mangrove and non-mangrove loss. Had to copy in the mangrove deadwood and litter AGB ratio dictionaries output from that specific function to get the mangrove pools to match.

* Made universal_util.py function that creates the 40000x20 test tiles. It successfully creates the mangrove test tile. Need to make it loop through input tiles for deadwood_litter creation.

* Made test tile loop in test_deadwood_litter_rasterio.py. Creates the output test tiles fine but now the deadwood and litter outputs are only correct where mangroves are and incorrect in non-mangrove loss. Don't know what happened.

* Rasterio test for deadwood_litter creation works in mangrove and non-mangrove loss pixels. Also, test function now creates the test tile fragments if they don't already exist and deletes the already created deadwood and litter output test fragments from the previous run.

* Changed other model steps to use uu.sensit_tile_rename_biomass and uu.make_tile_name that I made for deadwood_litter testing.

* Changed uu.make_test_tile to run gdalwarp on /vsis3/ rasters, so rasters are operated on directly in s3 rather than downloaded to the Docker container. This is faster. Testing itself has not changed. I confirmed that the test tile fragments created by reading with /vsis3/ are the same as when gdalwarp was used on local tiles, and that the output deadwood and litter rasters are correct in loss inside and outside mangroves.
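Building such a gdalwarp call might look like the sketch below. gdalwarp, its `-te` target-extent option, and GDAL's /vsis3/ virtual file system are real; the helper function, bucket, key, and window are hypothetical:

```python
def make_test_tile_cmd(s3_key, out_tif, bounds):
    """Build a gdalwarp command that clips a small window directly from a
    raster in s3 via /vsis3/, avoiding a full download to the container."""
    xmin, ymin, xmax, ymax = bounds
    return [
        "gdalwarp",
        "-te", str(xmin), str(ymin), str(xmax), str(ymax),  # target extent (deg)
        "-co", "COMPRESS=DEFLATE",
        f"/vsis3/{s3_key}",
        out_tif,
    ]

# Pass the list to subprocess.run(cmd, check=True) to actually execute it.
cmd = make_test_tile_cmd("my-bucket/tiles/00N_000E_mangrove.tif",
                         "test_fragment.tif", (0.0, -0.005, 0.1, 0.0))
```

For /vsis3/ reads to work, AWS credentials must be available to GDAL (e.g., via the usual AWS environment variables).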

* Test tile fragment raster suffix is its own variable now.

* Added fixtures for mangrove deadwood and litter dictionary creation. Added fixture for deleting old outputs. Decided not to make a fixture for input rasters; it didn't seem like it would make things easier to read, but did make the test tile creation function have the for loop inside it rather than outside it. Testing runs and deadwood and litter emissions are correct in loss pixels inside and outside mangroves.

* Creates test fragments from existing versions of output tiles for comparison with test version of outputs.

* Assertion works with np.testing.assert_equal(). Full deadwood tile and deadwood tile fragment pass but comparing deadwood and litter fails.

* Assertion works for deadwood and litter comparison. Tests pass when I compare old and new deadwood or old and new litter; tests fail when I compare deadwood to litter or vice versa. Also makes deadwood and litter difference rasters, which are not used in testing but are for visualizing any differences between the old and new versions of the rasters. Deadwood and litter comparisons are in the same test, which isn't great. But it seems to work alright; if either one fails, the test alerts me. If the first assert fails, I don't get notified about the second one, but that's okay with me for now.
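The assertion and difference-raster pattern, in miniature (array values invented; real comparisons run on arrays read from the output tiles):

```python
import numpy as np

old_deadwood = np.array([[5.0, 0.0], [2.5, 1.0]])
new_deadwood = np.array([[5.0, 0.0], [2.5, 1.0]])

# Fails with a detailed element-wise report if the rasters differ anywhere
np.testing.assert_equal(new_deadwood, old_deadwood)

# Difference raster: not used by the assert, but handy for visualizing
# where any mismatches are when written out and loaded in a GIS
difference = new_deadwood - old_deadwood
```

Putting the difference calculation after the assert means it still exists for passing tests; for failing tests, writing it before the assert keeps it available for inspection.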

* Putting all test helper functions in conftest.py. So far I've moved the fixtures that were already in test_deadwood_litter_rasterio.py into conftest and everything seems to still be working.

* Tried putting make_test_tiles and assert_make_test_arrays_and_difference in conftest as fixtures but they didn't work as fixtures because fixtures don't handle arguments easily (as far as I can tell). So I tried just leaving those functions in conftest as non-fixture functions but test_deadwood_litter_rasterio.py couldn't find the two functions. So then I tried making a new file called test_helpers.py in test. When I ran the test with that, test_deadwood_litter_rasterio.py said there was no module named test_helpers. So I moved test_helpers.py to the project folder (outside test) and was able to load the functions fine. Testing works with this configuration but I don't like having the test helper functions outside test, and ideally they'd all be inside conftest.py (mixture of fixture and non-fixture functions).

* Parametrized test_deadwood_litter_rasterio.py so that it will separately run the deadwood and litter tests (do separate asserts for them). However, it's not creating the deadwood comparison raster now. I don't know why, and I don't know if that was a problem before I switched to parametrization. Going to check old commits. The test otherwise seems to be working.

* Both difference rasters are available now. The problem was that when I parametrized the test, the fixture that deleted existing outputs deleted the difference output from the previous parametrization run, so the litter parametrization was deleting the deadwood difference output. I added a flag to the delete_old_outputs fixture so that it only runs on the first parametrization.

* create_deadwood_dictionary and create_litter_dictionary fixtures also only run on the first parametrization now (along with delete_old_outputs). I don't think I can make create_deadwood_litter run on just the first parametrization because then it'd have to be a fixture, and I can't figure out how to pass arguments to a fixture. All four outputs are created now (test deadwood and litter, and the deadwood and litter difference rasters). The two remaining improvements I'd like: 1) non-fixture helper functions are in test_helpers.py (rather than conftest.py), and test_helpers.py is in the project folder. 2) make_test_tiles and create_deadwood_litter run twice (once for each parametrization) because they are regular functions, not fixtures. They only need to run once per test, not once per parametrization. But because those functions take arguments, I don't see how to make them fixtures.

* Deadwood and litter comparison tiles are only each created once now, during their respective parametrization of the test.
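One standard way around the "fixtures can't take arguments" problem above is a factory returned by a module-scoped fixture: the fixture body runs once per module (not once per parametrization), and the returned function accepts arguments and caches its results. A sketch, with all names and the fake "expensive work" invented; the factory is kept as a plain function so it is also usable outside pytest:

```python
import pytest

def make_tile_factory():
    """Create test tile fragments on demand, caching so each tile is only
    built once no matter how many parametrizations ask for it."""
    cache = {}
    def make(tile_id):
        if tile_id not in cache:
            cache[tile_id] = f"fragment for {tile_id}"  # expensive creation here
        return cache[tile_id]
    return make

@pytest.fixture(scope="module")
def make_test_tile():
    # Module scope: one shared cache across all parametrizations in the module
    return make_tile_factory()

@pytest.mark.parametrize("pool", ["deadwood", "litter"])
def test_pool_output(make_test_tile, pool):
    # Both parametrizations hit the same cached fragment
    assert make_test_tile("00N_000E") == "fragment for 00N_000E"
```

This avoids the run-once flags on delete_old_outputs and the dictionary fixtures: scope the fixture to the module and let the cache do the rest.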

* Can now run tests from project directory. Had to change docker-compose.yaml for unclear reasons. Tests run correctly and use test_helpers.py. Haven't tried actually running model, though.

* Separated fixtures into two conftest.py files: carbon pool-specific ones in test\carbon_pools\conftest.py and general ones in test\conftest.py. Testing works fine.

* I couldn't figure out how to combine test_utilities.py into conftest.py in the test folder, so they're separate. Whenever I tried accessing the non-fixture functions in test_utilities.py, I got errors about not being able to find the functions. So for now I have test_utilities.py for the non-fixture functions that apply to all tests, conftest.py in test for fixtures that would apply across all test modules, and conftest.py in test/carbon_pools for the fixtures that apply just to carbon pool tests. I tested the test module and it seems fine. I also ran mp_create_carbon_pools.py and the beginning of run_full_model.py to make sure that model ran correctly even with the docker-compose changes. The model seems to run fine, at least locally. Testing seems basically complete for deadwood_litter and has the elements needed for building out tests for other model stages.

* Deleting __init__.py in the project folder and changing docker-compose.yaml back to use app instead of carbon-budget made it so that I can delete most of the sys.path.append() statements during module import. I can also run tests from the project root (as before). I can run individual modules and mp_full_model_run. But to run the model modules or full thing, I have to change how I run it: /usr/local/app# python -m carbon_pools.mp_create_carbon_pools -l 00N_000E -nu -ce loss -t std. That is, I run from /usr/local/app, use the -m flag, and do folder.file if I want to run a specific module of the model. So, /usr/local/app# python -m data_prep.mp_model_extent -t std -l 00N_000E -d 20239999
/usr/local/app# python -m run_full_model -si -t std -s all -r -d 20239999 -l 00N_000E -ce loss -p biomass_soil -tcd 30 -ln "00N_000E test"
/usr/local/app# pytest -m integration -s

* run_full_model.py works for 00N_000E locally (1 hour 9 minutes) with the changed package imports when run as: '/usr/local/app# python -m run_full_model -si -t std -s all -r -d 20239999 -l 00N_000E -ce loss -p biomass_soil -tcd 30 -ln "00N_000E test with imports changed" -nu'. Updated readme to reflect the new way of running (with -m flag) and testing ability. I think that basic testing capabilities are added now.

* Correction to readme

Co-authored-by: Gary Tempus Jr <gary.tempus@wri.org>

* Feature/tree cover loss from fires (#35)

* Split prep_other_inputs.py between one for inputs that need to be preprocessed each year (drivers and tree cover loss from fires (TCLF)) and one for inputs that are static and don't need to be preprocessed each year. Added tree cover loss from fires tile creation to mp_prep_other_inputs_annual.py. It seems to work locally on test tile 00N_000E. Will try simplifying the code a bit, then I can do a full tile run on s3.

* Simplified the TCLF pre-processing before gdal_warp a bit (don't need to rename files anymore). Checking tiles for data seems to work with the light function (for local checking).

* Changed uu.create_combined_tile_list() to take a list of input s3 paths rather than each s3 folder being its own input. This means that I can now specify a list of s3 folders of any length and it will make a consolidated tile id list. Tested that function specifically on mp_prep_other_inputs_annual.py and mp_model_extent.py and it seemed to work for both. Haven't actually tried it in a full model run yet.
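A sketch of what the reworked function might do. The real uu.create_combined_tile_list lives in universal_util.py and lists actual s3 folders; here the inputs are pre-listed file names and the tile-id parsing is illustrative:

```python
import re

def create_combined_tile_list(s3_folder_listings):
    """Given any number of tile-file listings (one list per s3 folder),
    return the sorted union of tile ids like '00N_000E'."""
    tile_id_pattern = re.compile(r"\d{2}[NS]_\d{3}[EW]")
    tile_ids = set()
    for listing in s3_folder_listings:
        for path in listing:
            match = tile_id_pattern.search(path)
            if match:
                tile_ids.add(match.group())
    return sorted(tile_ids)

combined = create_combined_tile_list([
    ["s3://bucket/drivers/00N_000E_drivers.tif"],
    ["s3://bucket/tclf/00N_000E_tclf.tif", "s3://bucket/tclf/10N_020W_tclf.tif"],
])
```

Taking one list argument of arbitrary length (rather than one parameter per folder) is what lets callers consolidate any number of s3 folders.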

* Changed emissions module to use tree cover loss from fires and ran locally; it seems to run correctly.

* uu.create_combined_tile_list() fix.

* Fixing TCLF download from data-lake.

* Increased # processors for TCLF.

* Changing processor count again.

* I experimented with changing the datatype for the emissions node code output from float32 to UInt16 and UInt8 (https://gdal.org/api/raster_c_api.html#_CPPv412GDALDataType for GDAL and https://www.tutorialspoint.com/cplusplus/cpp_data_types.htm for C++).
Float32: runtime = 9:33.5; compressed size = 8661 KB
UInt8: runtime = 9:47.5; compressed size = 5560 KB
Run times were very similar but the UInt8 test tile is smaller, so I'm going to make the node codes UInt8.
Output emissions all gases and node code rasters for 00N_000E seem correct with the TCLF and UInt16 when run locally. I'm going to try a global run of emissions to see how it goes.

* Ready for global run of emissions using TCLF and UInt16 node codes. Also, changed everything in constants_and_names.py to f formatting.

* Changed global warming potential (GWP) factors for methane and nitrous oxide to use AR6 WG1. Ran locally on 00N_000E and it seems fine. (#36)
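For reference, the AR6 WG1 100-year GWPs (Table 7.15) are CH4 (non-fossil) about 27, CH4 (fossil) 29.8, and N2O 273, versus AR5's 28 and 265. A sketch of applying them (variable and function names invented; constants_and_names.py may use a different CH4 variant):

```python
# AR6 WG1 100-year GWPs; the non-fossil CH4 value is assumed here
GWP_CH4 = 27.0
GWP_N2O = 273.0

def co2e(co2_mg, ch4_mg, n2o_mg):
    """Convert per-gas emissions (Mg of each gas) to Mg CO2-equivalent."""
    return co2_mg + ch4_mg * GWP_CH4 + n2o_mg * GWP_N2O
```

Since the GWPs are pure multipliers, swapping AR5 for AR6 values only rescales the CH4 and N2O contributions to total CO2e.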

* Feature/add soil only emis step (#37)

* Added handling of "warning: ignoring return value of 'CPLErr GDALRasterBand::RasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, GSpacing, GSpacing, GDALRasterIOExtraArg*)', declared with attribute warn_unused_result [-Wunused-result]" during soil_only emissions C++ compiling.
Working on making run_full_model.py run a separate step for soil_only emissions. But distracted now because Dockerfile doesn't seem to actually compile C++ emissions, even though the log says it is.

* run_full_model.py now automatically creates biomass_soil and soil_only emissions tiles in sequential steps. No way to make run_full_model.py do only one or the other anymore (for simplicity). However, mp_calculate_gross_emissions.py can still run just biomass_soil or soil_only, primarily for testing purposes.
Also, shiftag_flu and urb_flu in emissions C++ are defined in constants.h now, not in the gross emissions decision trees.

* soil_only emissions node code tiles are int rather than float now. More generally, biomass_soil and soil_only emissions steps seem incorporated into run_full_model.py, with a few other fixes along the way to emissions.

* Updated data_import.bat with v1.2.2 (2001-2021). I forgot to do this during the 2021 annual update.

* Updated documentation. Also, now C++ is compiled for the relevant version of emissions each time mp_calculate_gross_emissions.py is run.

* Feature/combine aggreg and supplem out steps (#38)

* Trying to consolidate mp_aggregate_results_to_4_km and mp_create_supplementary_outputs into a single step. There's unnecessary overlap between them. Making a new mp_supplementary_outputs.py to do this.

* Through the rewindow stage. So far seems to be working.

* Aggregation seems to be working. Want to change some hard-coded numbers now.

* Fixed unit conversion after aggregation.

* Removing some unit hard coding.

* Removing some unit hard coding.

* Still some aggregation issues.

* Added basic profiling to deadwood_litter creation Pytest

* Aggregation of emissions seems to mostly be working. Need to do more checks and try on removals and net flux.

* Testing mp_derivative_outputs.py on emissions, removals, and net flux.

* mp_derivative_outputs.py seems to work correctly as its own model stage. Trying it from run_full_model.py now. Updated documentation in mp_derivative_outputs.py and run_full_model.py.

* mp_derivative_outputs.py works as part of full_model_run.py. Also, had to add checks for empty output rasters to mp_derivative_outputs.py. That seems to work (although tested it on 00N_000E, which has data for the forest extent derivative outputs). As far as I can tell, the supplementary and aggregate outputs steps have been consolidated into a single derivative outputs step.

* Updated comments.

* Feature/bgb from ratio (#39)

* Working on process for rasterizing BGB:AGB.

* Rasterizing BGB:AGB seems to work.

* Going to make full set of BGB:AGB tiles on ec2 now.

* Changing processor count for BGB:AGB tile creation.

* Changing processor count for BGB:AGB tile creation.

* Made BGB:AGB 10x10 deg tiles globally on an r5d.24xlarge instance. I actually converted the BGB and AGB NetCDF files into GeoTIFFs on my computer outside Docker because that conversion kept failing inside Docker. Then I made the global BGB:AGB GeoTIFF inside Docker with gdal_calc. I only made the 10x10 tiles in ec2. Tiles look correct and there seems to be a good number of them.

* Applied BGB:AGB to carbon pool creation and made pytest module for it.

* Modified mp_annual_gain_rate_AGC_BGC_all_forest_types.py and mp_annual_gain_rate_IPCC_defaults.py to use the BGB:AGB tiles. Testing both steps now.

* mp_annual_gain_rate_IPCC works. Working on making testing work for mp_annual_gain_rate_AGC_BGC_all_forest_types. However, I realized that the Huang et al. BGC map doesn't have values for every AGC pixel and they both don't cover everywhere that the model does, so I need to fill the gaps in the BGC:AGC raster. Going to work on that now.

* Made global BGB:AGB raster that has gaps filled with gdal_fillnodata locally in Docker. Now I'm going to recreate the 10x10 tiles with the gap-filled global BGB:AGB raster.

* Tested annual_removals_all_forest_types with BGB:AGB map in 00N_000E and it's giving expected annual removals in IPCC forest, mangroves, and Cook-Patton pixels. However, getting errors when running test in 40N_090W. Need to figure out what's going on there.
Also, added some profiling to a few model steps for experimentation. Will clean up later.

* For 40N_090W, AGC changes with using BGB:AGB map because US removal factor map is AGC+BGC, so making the composite AGC tiles from that depends on the BGC ratio. 40N_090W seems to work fine.

* For 40N_020E, AGC changes with using BGB:AGB map because European removal factor tiles are AGC+BGC, so making the composite AGC tiles from that depends on the BGC ratio. 40N_020E seems to work fine.
Ran test module for 00N_000E, 40N_090W, 40N_020E and all seem to work fine. Tested those three tiles because they cover the full range of removal factor sources. For 00N_000E, only BGC and AGC+BGC changed from using BGB:AGB map. For 40N_020E and 40N_090W, all outputs except forest type changed: AGC and BGC changed because it is derived from AGC+BGC input tiles for US, European, and planted forest tiles (so AGC+BGC stayed the same for pixels that use those removal factors), while AGC+BGC changed for pixels that use IPCC and young forest removal factors (while AGC stayed the same). Also, ran full 00N_000E tile to make sure it ran completely. Did cursory output check to make sure the BGB:AGB map was still being used correctly, but main QC was with the test script. Incorporating BGB:AGB into flux model seems complete now (two removals steps and carbon pool generation).

* Feature/bgb from ratio (#40)

* Changing carbon pools in 2000 s3 directory.

* Fixing carbon pool generation for year 2000.

* Changing number of processors.

* Changing number of processors and fixing windows source.

* Fixing windows source.

* Changing number of processors.

* Getting total carbon in 2000

* Created carbon pool in 2000 tiles using the new BGB:AGB maps. It was a doozy to do. Still takes forever!

* Feature/peat update 2023 (#41)

* Turned off memory profiling.

* Peatland processing works for <40N in local test tiles 00N_000E (only Gumbricht), 00N_010E (Gumbricht and Dargie), 00N_110E (Gumbricht and Miettinen). Not working for >40N yet (Xu et al.).

* Peatland processing also works for >40N tiles that use Xu et al.

* Going to try making full peat tile set now. It works in Gumbricht-only, Gumbricht+Dargie, Gumbricht+Miettinen, and Xu areas (00N_000E, 00N_010E, 00N_110E, 50N_050W, respectively). Also, output peat is successfully used in emissions model step.
Also, updated the ec2 launch template user data/startup instructions.

* Rerunning with correct Xu et al. >40N shapefile and using more processors.

* Created peat tiles. Done with peat tile generation.

* Feature/tree cover gain 2000 2020 (#42)

* s3_file_download() downloads tree cover gain 2000-2020 tiles from gfw-data-lake and renames them with a designated pattern. Need to do the same with s3_folder_download().

* mp_model_extent.py correctly uses the new 2000-2020 gain raster for a test tile; checked a pixel that had gain but not tree cover and that's included in the model extent. Also, changed cn.pattern_gain to cn.pattern_gain_ec2 or cn.pattern_gain_data_lake throughout the model, as appropriate in each case. Need to make s3_folder_download work with the new gain data now.

* s3_folder_download() now downloads the 2000-2020 gain tile folder into its own folder on ec2, then renames those tiles and copies them into the main tile folder. Testing 00N_000E all model stages now.

* Tree cover gain now uses 2000-2020 version instead of 2000-2012. Ran 00N_000E all the way through in 1:25:33. Checked forest age category and gain year count (gain only and loss-gain pixels) to make sure they were using new gain correctly. They seemed fine. I didn't check the output of mp_derivative_outputs but I did change the derivative_outputs.py to print whether a gain tile was found or not, so I'm pretty comfortable with that step working, too. Still need to check that I can correctly download all the gain 2000-2020 tiles from s3.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Trying to use aws s3 sync instead of aws s3 cp for tree cover gain 2000-2020 folder download.

* Now using aws s3 sync to download s3 folders instead of aws s3 cp. This downloads just the files that haven't already been downloaded. Haven't tested it in every use, just in the standard model s3_folder_download() instances that I regularly use in the model (e.g., didn't test the sensitivity analysis uses).
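The switch from `aws s3 cp` to `aws s3 sync` amounts to building a different CLI command; `sync` only transfers files that differ from what is already in the destination folder. A minimal sketch, assuming illustrative flags rather than the model's exact call:

```python
import subprocess

def build_sync_cmd(s3_dir, local_dir):
    """Build an aws s3 sync command that downloads only .tif tiles not
    already present locally (flags are illustrative assumptions)."""
    return ['aws', 's3', 'sync', s3_dir, local_dir,
            '--exclude', '*', '--include', '*.tif', '--no-progress']

def s3_folder_sync(s3_dir, local_dir):
    # Runs the sync; previously downloaded tiles are skipped by the CLI.
    subprocess.check_call(build_sync_cmd(s3_dir, local_dir))
```
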

* Added source for tree cover gain.

* Modified gain_year_count loss-only and no-change gdal_calc expressions to use --hideNoData flag. Needed to do this because the 2000-2020 gain rasters don't use 0 for no gain but instead use NoData.
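The shape of such a gdal_calc invocation is sketched below. The tile names and the exact `--calc` expression are assumptions, not the model's actual ones; the point is the `--hideNoData` flag, without which gdal_calc would mask the output wherever the gain raster is NoData, i.e. every "no gain" pixel in the 2000-2020 version.

```python
# Illustrative loss-only gain year count call (names and expression assumed).
loss = '00N_000E_loss.tif'
gain = '00N_000E_gain_2000_2020.tif'
out = '00N_000E_gain_year_count_loss_only.tif'

# --hideNoData makes gdal_calc compute over NoData gain pixels instead of
# propagating NoData into the output.
cmd = (f'gdal_calc.py -A {loss} -B {gain} --calc="(A>0)*(B==0)*(A-1)" '
       f'--outfile={out} --NoDataValue=0 --overwrite --quiet --hideNoData')
```
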

* Gain year count wasn't working correctly before because "no gain" pixels were NoData instead of 0. Confirmed that the gain year count outputs are correct now. Also, forest extent derivative outputs for gross removals and net flux are correct. (I didn't need to change anything about the numpy statement to make forest extent work with the new gain rasters, though.)

* Feature/include pre 2000 plantations (#43)

* Removed pre-2000 plantations condition from model extent stage and added it to forest extent part of derivative output stage. Tested both stages and updated documentation. Changed documentation throughout repo (including readme).
Also, fixed a mistake in the carbon pool creation step that wasn't registering the gain tile.

* Changed derivative output documentation.

* Trying to get model to run on ec2.

* Trying to get model to run on ec2.

* Trying to get model to run on ec2.

* Trying to get model to run on ec2.

* Trying to get model to run on ec2.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Trying to get count_tiles_s3 to properly count gain tiles on s3.

* Gave up on trying to get count_tiles_s3 to properly count gain tiles on s3.

* Gave up on trying to get count_tiles_s3 to properly count gain tiles on s3.

* Changing aws s3 sync command to be based on --size-only and not on timestamps.

* Not using aws s3 sync anymore because I can't make it sync just based on file names, so it was redownloading tiles based on changes in size, which I didn't want. Back to the old approach of having it count the number of tiles on s3 and ec2. Bummer.
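The fallback tile-count approach can be sketched like this. It is a minimal illustration under assumed behavior; the model's real `count_tiles_s3` may differ in detail.

```python
import os
import subprocess

def matching_tiles(names, pattern):
    """Filter a list of file names down to .tif tiles containing the pattern."""
    return [n for n in names if pattern in n and n.endswith('.tif')]

def count_tiles_s3(s3_folder, pattern):
    """Count matching tiles in an s3 folder by parsing aws s3 ls output."""
    listing = subprocess.run(['aws', 's3', 'ls', f'{s3_folder}/'],
                             capture_output=True, text=True).stdout
    return len(matching_tiles(listing.split(), pattern))

def count_tiles_local(folder, pattern):
    """Count matching tiles already on the ec2 instance; comparing this with
    count_tiles_s3 decides whether the folder needs downloading."""
    return len(matching_tiles(os.listdir(folder), pattern))
```
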

* Not using aws s3 sync anymore because I can't make it sync just based on file names, so it was redownloading tiles based on changes in size, which I didn't want. Back to the old approach of having it count the number of tiles on s3 and ec2. Bummer.

* Not using aws s3 sync anymore because I can't make it sync just based on file names, so it was redownloading tiles based on changes in size, which I didn't want. Back to the old approach of having it count the number of tiles on s3 and ec2. Bummer.

* Not using aws s3 sync anymore because I can't make it sync just based on file names, so it was redownloading tiles based on changes in size, which I didn't want. Back to the old approach of having it count the number of tiles on s3 and ec2. Bummer.

* Not using aws s3 sync anymore because I can't make it sync just based on file names, so it was redownloading tiles based on changes in size, which I didn't want. Back to the old approach of having it count the number of tiles on s3 and ec2. Bummer.

* Not using aws s3 sync anymore because I can't make it sync just based on file names, so it was redownloading tiles based on changes in size, which I didn't want. Back to the old approach of having it count the number of tiles on s3 and ec2. Bummer.

* Updating output folders to 20239999

* Changing processor count.

* Changing processor count.

* Changing processor count.

* Changing processor count.

* Changing processor count.

* Changing processor count.

* Changing processor count.

* Adding 0.26 as the default BGB:AGB for numpy windows that don't have the Huang et al. BGB:AGB.

* Adding 0.26 as the default BGB:AGB for numpy windows that don't have the Huang et al. BGB:AGB (e.g., 10N_180W). Also, changing processor count.
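For a numpy window, the default substitution amounts to a single `np.where`. A minimal sketch, assuming missing Huang et al. ratios are stored as 0 in the window (the function name is illustrative):

```python
import numpy as np

DEFAULT_BGB_AGB = 0.26  # default BGB:AGB where Huang et al. has no value

def belowground_biomass(agb_window, ratio_window):
    """Compute BGB for a numpy window, substituting the 0.26 default wherever
    the Huang et al. BGB:AGB ratio is missing (assumed stored as 0, as in
    windows like 10N_180W that fall outside the Huang et al. extent)."""
    ratio = np.where(ratio_window == 0, DEFAULT_BGB_AGB, ratio_window)
    return agb_window * ratio
```
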

* Had to fix gain_year_count_all_forest_types.py to handle tiles that don't have any gain (e.g., 10S_180W). It was messing up the loss-only, no-change, and loss-and-gain year count calculations.

* Changing processor count.

* Changed rasterio lines that look for tiles to "X tile found" or "X tile not found". Also, changing processor count.

* Changing processor count

* Version 1 2 3 2022 tcl update (#45)

* TCL 2022 constants changed.

* Output paths changed for TCL 2022. Going to run TCLF processing now.

* Output paths changed for TCL 2022. Going to run TCLF processing now.

* Output paths changed for TCL 2022. Going to run TCLF processing now.

* Output paths changed for TCL 2022. Going to run TCLF processing now.

* Done with TCLF pre-processing. Local test tile is next.

* Testing peat tile creation with Crezee et al. 2022 and Hastie et al. 2022 added

* Testing peat tile creation with Crezee et al. 2022 and Hastie et al. 2022 added

* Testing peat tile creation with Crezee et al. 2022 and Hastie et al. 2022 added

* Changing peat processor count.

* Changing peat processor count.

* Changing carbon pool processor count.

* Changing carbon pool processor count.

* Testing emissions onwards.

* Time to tile drivers and run emissions, net flux and derivative outputs.

* Changing processor count for derivative output steps.

* Issue with derivative outputs: tile that doesn't exist is being included in gross removals tile list set

* Issue with derivative outputs: tile that doesn't exist is being included in gross removals tile list set

* Issue with derivative outputs: tile that doesn't exist is being included in gross removals tile list set
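One way to keep a nonexistent tile out of the stage's tile set is to intersect the tile lists of the inputs rather than union them. This is a sketch of the idea under assumed names, not necessarily how mp_derivative_outputs.py resolved it:

```python
def derivative_tile_set(*stage_tile_lists):
    """Build the set of tile ids to run through the derivative output stage
    by intersecting the inputs' tile lists, so a tile id present for one
    output (e.g. gross removals) but missing for another is dropped rather
    than processed."""
    tiles = set(stage_tile_lists[0])
    for tile_list in stage_tile_lists[1:]:
        tiles &= set(tile_list)
    return sorted(tiles)
```
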

* Going to run each model output through derivative output stage on its own now because I keep losing spot machines. Doing gross removals now.

* Going to run each model output through derivative output stage on its own now because I keep losing spot machines. Doing gross emissions now.

* Going to run each model output through derivative output stage on its own now because I keep losing spot machines. Doing net flux now.

* Creating corrected drivers tiles and rerunning emissions onwards with corrected drivers map.

* Need to recreate the aggregate maps. Accidentally modified mp_derivative_outputs.py to delete them during clean up.

* Need to recreate the aggregate maps. Accidentally modified mp_derivative_outputs.py to delete them during clean up.

* Small amendments here and there. Ready for new driver correction.

* Revised driver preprocessing and output folders.

* Version used for TCL 2022 update. Updated readme. Ready for release.

---------

Co-authored-by: Michelle Sims <michelle.sims@wri.org>

---------

Co-authored-by: Gary Tempus <gtempus@users.noreply.github.com>
Co-authored-by: Gary Tempus Jr <gary.tempus@wri.org>
Co-authored-by: Michelle Sims <michelle.sims@wri.org>
4 people committed Jul 3, 2023
1 parent 8d35b1a commit 4e29674
Showing 87 changed files with 5,254 additions and 4,961 deletions.
5 changes: 5 additions & 0 deletions .pylintrc
@@ -0,0 +1,5 @@
# .pylintrc

[MASTER]

disable=line-too-long, redefined-outer-name, invalid-name
31 changes: 12 additions & 19 deletions Dockerfile
@@ -1,8 +1,6 @@
-# Use osgeo GDAL image. It builds off Ubuntu 18.04 and uses GDAL 3.0.4
-FROM osgeo/gdal:ubuntu-small-3.0.4
-
-# # Use this if downloading hdf files for burn year analysis
-# FROM osgeo/gdal:ubuntu-full-3.0.4
+# Use osgeo GDAL image.
+#Ubuntu 20.04.4 LTS, Python 3.8.10, GDAL 3.4.2
+FROM osgeo/gdal:ubuntu-small-3.4.2

ENV DIR=/usr/local/app
ENV TMP=/usr/local/tmp
@@ -14,16 +12,17 @@ ENV SECRETS_PATH /usr/secrets
RUN ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime

# Install dependencies
+# PostGIS extension version based on https://computingforgeeks.com/how-to-install-postgis-on-ubuntu-linux/
RUN apt-get update -y && apt-get install -y \
make \
automake \
g++ \
gcc \
libpq-dev \
-postgresql-10 \
-postgresql-server-dev-10 \
-postgresql-contrib-10 \
-postgresql-10-postgis-2.4 \
+postgresql-12 \
+postgresql-server-dev-12 \
+postgresql-contrib-12 \
+postgresql-12-postgis-3 \
python3-pip \
wget \
nano \
@@ -57,7 +56,7 @@ ENV PGDATABASE=ubuntu
# Commented out the start/restart commands because even with running them, postgres isn't running when the container is created.
# So there's no point in starting posgres here if it's not active when the instance opens.
#######################################
-RUN cp pg_hba.conf /etc/postgresql/10/main/
+RUN cp pg_hba.conf /etc/postgresql/12/main/
# RUN pg_ctlcluster 10 main start
# RUN service postgresql restart

@@ -68,9 +67,9 @@ RUN pip3 install -r requirements.txt
# Link gdal libraries
RUN cd /usr/include && ln -s ./ gdal

-# Somehow, this makes gdal_calc.py accessible from anywhere in the Docker
-#https://www.continualintegration.com/miscellaneous-articles/all/how-do-you-troubleshoot-usr-bin-env-python-no-such-file-or-directory/
-RUN ln -s /usr/bin/python3 /usr/bin/python
+# # Somehow, this makes gdal_calc.py accessible from anywhere in the Docker
+# #https://www.continualintegration.com/miscellaneous-articles/all/how-do-you-troubleshoot-usr-bin-env-python-no-such-file-or-directory/
+# RUN ln -s /usr/bin/python3 /usr/bin/python

# Enable ec2 to interact with GitHub
RUN git config --global user.email dagibbs22@gmail.com
@@ -81,11 +80,5 @@ RUN git config --global user.email dagibbs22@gmail.com
## Makes sure the latest version of the current branch is downloaded
#RUN git pull origin model_v_1.2.2

-## Compile C++ scripts
-#RUN g++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_generic.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_generic.exe -lgdal && \
-#    g++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_soil_only.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_soil_only.exe -lgdal && \
-#    g++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_no_shifting_ag.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_no_shifting_ag.exe -lgdal && \
-#    g++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_convert_to_grassland.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_convert_to_grassland.exe -lgdal

# Opens the Docker shell
ENTRYPOINT ["/bin/bash"]
276 changes: 0 additions & 276 deletions analyses/aggregate_results_to_4_km.py

This file was deleted.

149 changes: 0 additions & 149 deletions analyses/create_supplementary_outputs.py

This file was deleted.
