dynCluster implements a new dynamic clustering method that can effectively summarize massive amounts of granular dyadic flow data. For more details of the method and applications, see:
- Measuring Trade Profile with Granular Product-level Trade Data

Authors: In Song Kim, Steven Liao, Kosuke Imai

We recommend installing dynCluster on Amazon Web Services (AWS). This approach allows users to scale up easily to accommodate bigger datasets. For step-by-step instructions on how to install dynCluster on AWS, see our Wiki page.

- Once dynCluster is installed on AWS, we create a small simulated dataset following the data generating process described in our paper. For details, see our Wiki page.
  - The data covers 10 countries (90 directed dyads) trading 40 products over 10 time periods.

        ##   year cty1 cty2 product_1 product_2 product_3 product_4 product_5 ...
        ## 1    1    1    2         0         0         0       0.0         0
        ## 2    1    1    3         0         0         0  664344.6         0
        ## 3    1    1    4         0         0         0       0.0         0
        ## 4    1    1    5    372390         0         0       0.0         0
        ## 5    1    1    6   3171746   2797487   4872051  981809.8   2497946
        ## ...

  - For each time period, dyads belong to 3 different clusters (or types of international trade). These data represent the "true" dyadic cluster memberships. The ultimate goal of this example is to see how well dynCluster can use the bilateral trade data above to recover the three clusters and the dyadic cluster memberships.

        ##    cty1 cty2 dyad z1 z2 z3 z4 z5 z6 z7 z8 z9 z10
        ## 1     1   10 1_10  2  2  2  2  2  2  2  2  2   2
        ## 2    10    1 1_10  2  2  2  2  2  2  2  2  2   2
        ## 3     1    2  1_2  1  1  1  1  1  1  1  1  1   1
        ## 4     2    1  1_2  1  1  1  1  1  1  1  1  1   1
        ## 5     1    3  1_3  2  2  2  2  2  2  2  2  2   2
        ## 6     3    1  1_3  2  2  2  2  2  2  2  2  2   2
        ## 7     1    4  1_4  2  2  2  2  2  1  1  1  1   1
        ## 8     4    1  1_4  2  2  2  2  2  1  1  1  1   1
        ## 9     1    5  1_5  2  2  2  2  2  2  2  2  2   2
        ## 10    5    1  1_5  2  2  2  2  2  2  2  2  2   2
        ## ...

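The sizes quoted above, and the layout of the membership table, can be checked with a few lines of base R. This is a minimal sketch rather than part of dynCluster: the object names (`z`, `long`, `switchers`) are hypothetical, the membership rows are copied from the excerpt above (one row per undirected dyad), and nothing here depends on the package being installed.

```r
n_countries <- 10
n_periods   <- 10

# Ordered (cty1, cty2) pairs, excluding a country paired with itself
directed_dyads <- n_countries * (n_countries - 1)    # 90
# Unordered pairs: both directions share one dyad id such as "1_10"
undirected_dyads <- choose(n_countries, 2)           # 45
# Cluster membership is recorded once per undirected dyad and period
dyad_periods <- undirected_dyads * n_periods         # 450

# Membership rows taken from the excerpt above (z1..z10 for five dyads)
z <- rbind(
  "1_10" = rep(2, 10),
  "1_2"  = rep(1, 10),
  "1_3"  = rep(2, 10),
  "1_4"  = c(rep(2, 5), rep(1, 5)),
  "1_5"  = rep(2, 10)
)
colnames(z) <- paste0("z", seq_len(n_periods))

# Long format: one row per dyad-period (as.vector() reads z column by column)
long <- data.frame(
  dyad    = rep(rownames(z), times = ncol(z)),
  period  = rep(seq_len(ncol(z)), each = nrow(z)),
  cluster = as.vector(z)
)

# Dyads whose cluster membership changes at least once over time
switchers <- rownames(z)[apply(z, 1, function(r) length(unique(r)) > 1)]
```

In the excerpt, only dyad `1_4` changes cluster (from 2 to 1 after period 5); the 450 dyad-periods match the total of the cross-tabulation reported later in this example.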
- We then implement dynCluster in R using the function `mainZTM`. This function wraps and calls C++ functions (e.g., `mainRcpp`) from dynCluster. Note that this toy example runs on t2.micro instances in AWS, which are available under the free tier.

      # load library
      library(dynCluster)

      # run and time dynCluster
      ptm <- proc.time()   # start the clock
      mainZTM("./example/toy", comeBack=TRUE)
      proc.time() - ptm    # stop the clock

- To assess the performance of dynCluster, we create product-trade heatmaps based on the "true" cluster membership data above and the estimated cluster membership from dynCluster. For details, see our Wiki page.
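
As a rough illustration of what a product-proportion heatmap summarizes, the sketch below computes each product's share of total trade within each cluster using base R. It is not the procedure from the Wiki page: the trade values, the `cluster` vector, and the object names are hypothetical toy inputs chosen only to show the calculation.

```r
# Hypothetical trade values: 4 dyad-periods (rows) x 3 products (columns)
trade <- matrix(
  c(10, 30, 60,
    20, 20, 60,
    40, 10,  0,
    40, 10,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("product_", 1:3))
)

# Hypothetical cluster label for each dyad-period
cluster <- c(1, 1, 2, 2)

# Sum trade by cluster, then convert each cluster's row to proportions
totals <- rowsum(trade, group = cluster)
props  <- prop.table(totals, margin = 1)
```

Running the same calculation once with the true labels and once with the estimated labels gives two matrices that can be compared directly, or drawn with a base plotting function such as `image()`.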
- A side-by-side comparison of the two heatmaps below shows that the composition of product trade is very similar. This suggests that dynCluster recovered the original clusters well.

  [Figure: two heatmaps, "True Product Proportion" and "Estimated Product Proportion"]

- The table below cross-tabulates the true vs. estimated cluster membership for each dyad-period. The cells on the diagonal show the number of dyad-periods correctly classified. Overall, dynCluster correctly recovered 98.4% (443 of 450) of the true dyadic cluster memberships.
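
  |                | Estimated Cluster 1 | Estimated Cluster 2 | Estimated Cluster 3 | Total |
  |----------------|--------------------:|--------------------:|--------------------:|------:|
  | True Cluster 1 |                 167 |                   0 |                   0 |   167 |
  | True Cluster 2 |                   1 |                 187 |                   6 |   194 |
  | True Cluster 3 |                   0 |                   0 |                  89 |    89 |
  | Total          |                 168 |                 187 |                  95 |   450 |
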
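The overall recovery rate can be reproduced from the cross-tabulation itself. The sketch below enters the counts from the table by hand (the object names are hypothetical) and assumes the estimated cluster labels have already been matched to the true ones, as they are in the table.

```r
# Counts from the table: rows are true clusters, columns are estimated clusters
confusion <- matrix(
  c(167,   0,  0,
      1, 187,  6,
      0,   0, 89),
  nrow = 3, byrow = TRUE,
  dimnames = list(true      = paste("Cluster", 1:3),
                  estimated = paste("Cluster", 1:3))
)

correct  <- sum(diag(confusion))   # dyad-periods on the diagonal: 443
total    <- sum(confusion)         # all dyad-periods: 450
accuracy <- correct / total

round(100 * accuracy, 1)           # 98.4

# Share of each true cluster that was classified correctly
per_cluster <- diag(confusion) / rowSums(confusion)
```

All seven misclassified dyad-periods belong to true Cluster 2: one was assigned to Cluster 1 and six to Cluster 3.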
- For any questions or problems when using dynCluster, please e-mail the authors.