Project Overview

Why This Project Was Chosen

This project addresses a critical gap in economic data analysis by focusing on the delay in reporting quarterly Gross Domestic Product (GDP) figures. Such delays hinder the ability of policymakers and market analysts to make timely decisions in a rapidly changing economic landscape. Our goal is to bridge this gap by leveraging high-frequency data proxies to gain faster and more precise insights into consumer behavior, thereby supporting more agile economic policy and strategy formulation.

Specific Questions or Goals

Identification of high-frequency data sources as accurate proxies for consumer spending.
Validation of these proxies against established measures of consumer expenditure.
Development of techniques to ensure these proxies provide immediate and reliable insights.
Addressing potential discrepancies and harmonizing data frequencies for accurate analysis.
Ensuring the economic relevance of the findings beyond mere statistical correlations.

Reference to Similar Studies

Inspiration for this project comes from previous research, such as McCracken, M.W., and Ng, S. (2015), "FRED-MD: A Monthly Database for Macroeconomic Research," which provides a large, regularly updated panel of monthly macroeconomic indicators and shows how data available at a higher frequency than quarterly GDP can support economic forecasting. This points to the potential of alternative data sources, such as credit card transactions and online search trends, to further improve real-time analysis of economic trends.

Data Sources Overview

Gross Domestic Product (BEA)

Short Description: Data from the U.S. Bureau of Economic Analysis (BEA) on seasonally adjusted quarterly U.S. GDP and its components.
Relevance: Crucial for nowcasting consumption: it provides the detailed quarterly time series of the consumption target.
Data Frequency: Quarterly.
Location & Access: Available for download in CSV format from the BEA website.
Variables of Interest: Personal Consumption Expenditures (PCE).

Federal Reserve Economic Data (FRED)

Short Description: Managed by the Federal Reserve Bank of St. Louis, the FRED-MD database (built from FRED series) features 123 monthly economic indicators.
Relevance: Offers a granular view of economic trends potentially impacting consumer spending.
Data Frequency: Monthly.
Location & Access: Direct download in CSV format from the FRED-MD page of the Federal Reserve Bank of St. Louis.
Variables of Interest: Indicators from various economic sectors for evaluating alternative proxies for nowcasting.

Data Preparation and Analysis

Loading and Preprocessing GDP Data

Initial Loading: The GDP data is loaded from a CSV file, skipping the first three rows (titles and summaries) and reading the next 28 rows for analysis.
Column Clean-up: Removes the index column and trims all leading and trailing spaces for data cleanliness.
Column Renaming and Adjustment: Renames the first column to 'description' and adjusts column names for clarity by concatenating them with the first row's values.
Index Reset: Resets the DataFrame index to ensure clean, sequential indexing after row modifications.
Structuring Descriptions: Implements a hierarchical naming system to reflect component hierarchies and maps full component descriptions to abbreviations for simplified future reference.
Transforming Date Formats: Standardizes date columns into 'YYYYQX' format for easier temporal analysis and transposes the dataset to prioritize time series analysis.
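The loading steps above can be sketched with pandas as follows. This is a minimal sketch, not the project's notebook code: the function name `load_bea_table`, the miniature CSV, and its values are made up, and it assumes the year labels sit in the header row and the quarter labels in the first data row. The hierarchical naming and abbreviation mapping are omitted.

```python
import io

import pandas as pd

# Hypothetical miniature of a BEA CSV export; titles and values are made up.
RAW = """Illustrative title row
Illustrative units row
Illustrative source row
Line,,2023,2023,2023
,,Q1,Q2,Q3
1,Gross domestic product,100.0,101.0,102.0
2,   Personal consumption expenditures  ,70.0,71.0,72.0
"""


def load_bea_table(source, n_rows=28):
    """Return a quarters-by-components DataFrame with a quarterly PeriodIndex."""
    # Skip the three title/summary rows and read at most n_rows rows.
    df = pd.read_csv(source, skiprows=3, nrows=n_rows)
    # Drop the "Line" index column and name the label column 'description'.
    df = df.drop(columns=df.columns[0])
    df = df.rename(columns={df.columns[0]: "description"})
    # The first row holds quarter labels; pandas suffixes duplicate year
    # headers with ".1", ".2", ..., so strip that suffix before concatenating.
    quarters = df.iloc[0]
    df.columns = ["description"] + [
        f"{str(col).split('.')[0]}{str(quarters[col]).strip()}"
        for col in df.columns[1:]
    ]
    df = df.iloc[1:].reset_index(drop=True)
    # Trim leading/trailing spaces from the component descriptions.
    df["description"] = df["description"].str.strip()
    # Transpose so each row is a quarter ('YYYYQX') and each column a component.
    out = df.set_index("description").T.astype(float)
    out.index = pd.PeriodIndex(out.index, freq="Q")
    return out


gdp = load_bea_table(io.StringIO(RAW))
print(gdp)
```

Converting the labels to a `PeriodIndex` rather than keeping plain strings lets later steps join and slice by quarter directly.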

Loading and Preprocessing FRED-MD Data

Data Retrieval: Loads the latest version of the FRED-MD dataset, drops rows that are entirely NA, and converts 'SASdate' to a PeriodIndex for time-series analysis.
Column Name Mapping: Enhances data readability by mapping FRED-MD column names to descriptive titles using a definitions file.
Transforming Monthly Data to Quarterly: Filters monthly data to the last month of each quarter and converts it to a quarterly format to align with GDP report frequencies.
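A minimal sketch of loading FRED-MD and keeping only the last month of each quarter. It assumes the FRED-MD CSV layout in which the date column is named `sasdate` and the first row after the header (labeled "Transform:") holds each series' transformation code; the function names `load_fred_md` and `to_quarterly` and the sample values are illustrative.

```python
import io

import pandas as pd

# Hypothetical miniature of a FRED-MD vintage: a tcode row, monthly
# observations (made-up values), and a trailing all-empty row.
RAW = """sasdate,INDPRO,UNRATE
Transform:,5,2
1/1/2020,100.0,3.5
2/1/2020,101.0,3.6
3/1/2020,102.0,4.4
4/1/2020,90.0,14.8
5/1/2020,92.0,13.2
6/1/2020,95.0,11.0
,,
"""


def load_fred_md(source):
    """Return (monthly data with a monthly PeriodIndex, tcode per series)."""
    raw = pd.read_csv(source)
    # The first data row stores one transformation code per series.
    tcodes = raw.iloc[0, 1:].astype(int)
    # Drop the tcode row and any rows that are entirely NA.
    data = raw.iloc[1:].dropna(how="all").copy()
    dates = pd.to_datetime(data.pop("sasdate"), format="%m/%d/%Y")
    data.index = pd.DatetimeIndex(dates).to_period("M")
    return data.astype(float), tcodes


def to_quarterly(monthly):
    """Keep the last month of each quarter and relabel it as a quarterly period."""
    quarter_end = monthly.loc[monthly.index.month % 3 == 0].copy()
    quarter_end.index = quarter_end.index.asfreq("Q")
    return quarter_end


monthly, tcodes = load_fred_md(io.StringIO(RAW))
quarterly = to_quarterly(monthly)
print(quarterly)
```

Taking the quarter-end month is one of several possible aggregations; a quarterly average would be an alternative, but the sketch follows the last-month rule described above.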

Joining Data Sources

Data Merge: After preprocessing, merges the FRED-MD dataset with the PCE data on their quarterly indices into a joined_dataset for combined analysis.
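Because both sources share a quarterly `PeriodIndex` after preprocessing, the merge can be a plain index join. In this sketch the data are made up and the inner join is an assumption: it keeps only quarters present in both sources, whereas a left join would also keep recent quarters for which PCE has not yet been published.

```python
import pandas as pd

# Made-up quarterly inputs sharing a quarterly PeriodIndex.
quarters = pd.period_range("2020Q1", periods=4, freq="Q")
fred_q = pd.DataFrame({"INDPRO": [102.0, 95.0, 99.0, 101.0]}, index=quarters)
pce = pd.Series([70.0, 65.0, 69.0], index=quarters[:3], name="PCE")

# Inner join on the shared quarterly index keeps quarters present in both.
joined_dataset = fred_q.join(pce, how="inner")
print(joined_dataset)
```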

Initial Data Visualization and Inspection

PCE Rate of Change Calculation: Calculates the rate of change for PCE to identify trends and outliers, using a custom function analyze_and_plot for comprehensive visualization.
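The rate-of-change calculation itself is a one-liner in pandas. The sketch below assumes quarter-over-quarter percentage change and uses made-up values; it does not reproduce the project's `analyze_and_plot` function, which also handles visualization.

```python
import pandas as pd

pce = pd.Series(
    [100.0, 102.0, 101.0, 104.0],
    index=pd.period_range("2021Q1", periods=4, freq="Q"),
    name="PCE",
)

# Quarter-over-quarter percentage change; the first value is NaN.
pce_qoq = pce.pct_change() * 100
print(pce_qoq.round(3))
```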

Data Filtering

Date Range Filtering: Narrows the dataset to observations from 1980 onwards to capture sufficient business cycles for long-term analysis.
Column Filtering: Removes less reliable indicators, judged by their relevance and timeliness for macroeconomic research.
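Both filters can be expressed as a row mask on the quarterly index plus a column drop. In this sketch the data and the `DROP_COLUMNS` list are hypothetical; the README does not list which indicators were removed.

```python
import pandas as pd

quarters = pd.period_range("1979Q3", periods=4, freq="Q")
joined_dataset = pd.DataFrame(
    {"PCE": [1.0, 2.0, 3.0, 4.0], "INDPRO": [5.0, 6.0, 7.0, 8.0], "LEGACY": [0.0] * 4},
    index=quarters,
)

# Hypothetical list of indicators judged unreliable or untimely.
DROP_COLUMNS = ["LEGACY"]

# Keep observations from 1980Q1 onward, then drop the excluded indicators.
mask = joined_dataset.index >= pd.Period("1980Q1", freq="Q")
filtered = joined_dataset.loc[mask].drop(columns=DROP_COLUMNS)
print(filtered)
```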

Handling Data Issues

Missing Data and Outliers: Uses visual tools such as missingno to inspect patterns of missing data, and applies the Z-score method to flag outliers. Flagged outliers are handled without removing them, preserving potentially critical information.
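A minimal sketch of Z-score flagging, plus one way to handle outliers without dropping observations. The threshold of 3 and the capping strategy (clipping each column to its mean ± 3 standard deviations) are assumptions; the README does not state the threshold or the exact treatment, and the function names are illustrative.

```python
import numpy as np
import pandas as pd


def zscore_flags(df, threshold=3.0):
    """Boolean frame marking cells whose |z-score| exceeds the threshold."""
    z = (df - df.mean()) / df.std(ddof=0)
    return z.abs() > threshold


def cap_outliers(df, threshold=3.0):
    """Clip each column to mean +/- threshold * std instead of dropping rows."""
    def cap(s):
        mean, std = s.mean(), s.std(ddof=0)
        return s.clip(lower=mean - threshold * std, upper=mean + threshold * std)

    return df.apply(cap)


# Made-up data: column "x" has a single spike, column "y" is smooth.
data = pd.DataFrame({"x": [1.0] * 20 + [100.0], "y": np.linspace(0.0, 1.0, 21)})
flags = zscore_flags(data)
capped = cap_outliers(data)
print(flags.sum())
```

Capping keeps every quarter in the sample, so later correlation and regression steps still see the period in which the outlier occurred.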

Normalisation and Data Transformation

Indicator Measurement Type Harmonization: Categorizes economic measures by type (e.g., dollar values, rates) for consistent analysis.

Data Transformation with Log and Differencing

Economic indicators often show strong trends and exponential growth that can obscure underlying patterns and make the series non-stationary. To address this, we transform each series to stabilize its variance and bring it closer to stationarity, which matters most for indicators with exponential growth or large fluctuations. We follow the transformation guidelines of McCracken and Ng (2015), so that our data handling is consistent with established practice in macroeconomic analysis.

Transformation Types

The transformation process is critical for preparing the data for in-depth analysis. The transformation applied to each series x is set by its FRED-MD transformation code (tcode):

No Transformation (tcode 1): The series is used in its original form: x_t => x.
First Difference (tcode 2): The change from one period to the next, which removes a linear trend: Δx_t => x.diff().
Second Difference (tcode 3): The change in the first difference, capturing acceleration or deceleration: Δ²x_t => x.diff().diff().
Natural Log (tcode 4): Stabilizes variance and linearizes exponential growth: log(x_t) => np.log(x).
First Difference of Log (tcode 5): Approximately the period-over-period growth rate, which typically yields a stationary series: Δlog(x_t) => np.log(x).diff().
Second Difference of Log (tcode 6): The change in the log growth rate, analogous to the second difference but applied to logged data: Δ²log(x_t) => np.log(x).diff().diff().
First Difference of Percentage Change (tcode 7): The change in the period-over-period percentage change: Δ(x_t/x_{t−1} − 1.0) => (x / x.shift(1) - 1.0).diff().

Implementation

Our implementation strategy involves mapping the FRED transformation codes to the corresponding series in our joined_dataset. This mapping is facilitated by a specialized function, modified_log_transform, which applies the selected transformation to each series based on its associated transformation code from the fred_indicator_mappings dataset.

Transformation Function

The modified_log_transform function is designed to apply the appropriate transformation to each economic indicator in the dataset. This ensures that each indicator is processed according to its specific characteristics and the transformation code it is associated with.
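A minimal sketch of such a dispatcher, written independently of the project's `modified_log_transform`, whose exact behaviour is not shown in this README. The names `TRANSFORMS` and `apply_tcode` and the sample data are illustrative; the formulas follow McCracken and Ng's seven FRED-MD codes, where tcode 7 is Δ(x_t/x_{t−1} − 1.0). Leading rows left incomplete by differencing are dropped, while missing values later in the sample are kept.

```python
import numpy as np
import pandas as pd

# One function per FRED-MD transformation code (tcode).
TRANSFORMS = {
    1: lambda x: x,
    2: lambda x: x.diff(),
    3: lambda x: x.diff().diff(),
    4: lambda x: np.log(x),
    5: lambda x: np.log(x).diff(),
    6: lambda x: np.log(x).diff().diff(),
    7: lambda x: (x / x.shift(1) - 1.0).diff(),
}


def apply_tcode(df, tcodes):
    """Transform each column with its tcode, then drop incomplete leading rows."""
    transformed = pd.DataFrame(
        {col: TRANSFORMS[int(tcodes[col])](df[col]) for col in df.columns},
        index=df.index,
    )
    # Differencing creates NaNs at the start; keep rows from the first complete one.
    first_complete = transformed.dropna().index[0]
    return transformed.loc[first_complete:]


idx = pd.period_range("2020Q1", periods=5, freq="Q")
data = pd.DataFrame(
    {"RATE": [1.0, 2.0, 4.0, 7.0, 11.0], "LEVEL": [100.0, 110.0, 121.0, 133.1, 146.41]},
    index=idx,
)
out = apply_tcode(data, {"RATE": 2, "LEVEL": 6})
print(out)
```

Keeping the transformations in a dictionary keyed by tcode makes it easy to check each formula against the FRED-MD definitions in one place.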

Resulting Adjustments

After applying the transformations, we drop the leading rows that contain NaN values, which differencing introduces at the start of each transformed series (one or two rows, depending on the order of differencing). The result is a clean dataset suited to statistical modeling.

By following the same transformation standards as FRED-MD, we keep the indicators comparable with other research that uses the database and ready for the analysis required to achieve our project goals.

Correlation Analysis

Objective

Our correlation analysis aims to uncover monotonic relationships between economic indicators and Personal Consumption Expenditures (PCE) without assuming that these relationships are linear. This matters for economic data, which often contain non-linear trends and outliers.

Methodology

Spearman's Rank Correlation: We use Spearman's rank correlation because it is non-parametric: it measures monotonic (not necessarily linear) relationships, and because it works on ranks it is robust to outliers. It can also be computed with missing values excluded pairwise.
Key Objectives:
- Identifying Influential Indicators: Sort correlations by absolute value, from highest to lowest, to pinpoint strong monotonic relationships with PCE.
- Navigating the Correlation Landscape: View the magnitude and directionality of each relationship to understand how fluctuations in indicators resonate with PCE shifts.
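The ranking step can be sketched with pandas' built-in Spearman correlation. The function name `rank_spearman`, the `top_n` default, and the synthetic columns are assumptions for illustration; `A` is a monotonic but non-linear function of PCE, so its Spearman coefficient is exactly 1.

```python
import numpy as np
import pandas as pd


def rank_spearman(df, target="PCE", top_n=5):
    """Spearman correlations with `target`, ranked by absolute value,
    plus the top_n positive and top_n negative coefficients."""
    # DataFrame.corr excludes missing values pairwise for each column pair.
    corr = df.corr(method="spearman")[target].drop(target)
    ranked = corr.sort_values(key=lambda s: s.abs(), ascending=False)
    top_positive = ranked[ranked > 0].head(top_n)
    top_negative = ranked[ranked < 0].head(top_n)
    return ranked, top_positive, top_negative


pce = np.arange(1.0, 11.0)
data = pd.DataFrame({
    "PCE": pce,
    "A": pce**3,
    "B": [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 1.0, 2.0],
    "C": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 10.0, 6.0],
})
ranked, top_positive, top_negative = rank_spearman(data, top_n=2)
print(ranked.round(3))
```

`top_positive` and `top_negative` are the series a horizontal bar plot would display, with positive bars in one colour and negative bars in another.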

Implementation

Visualization Strategy: We use horizontal bar plots, highlighting positive correlations in sky blue and negative correlations in coral, with a clear marker for zero correlation. This visual distinction helps underscore the directional influence of each indicator on PCE.
Analytical Precision: By focusing on the top N positively and negatively correlated indicators, we narrow our investigation to the variables most likely to carry predictive information about PCE.

Results

Labor and Housing Markets: These indicators stand out among the top correlated variables, emphasizing their crucial role in consumer expenditure dynamics.
Correlation Strengths: We observe moderate positive relationships (coefficients ranging from 0.39 to 0.55) and weak negative relationships (coefficients from -0.13 to -0.29) among the top indicators.

Multicollinearity Analysis with Variance Inflation Factor (VIF)

Overview

High multicollinearity among variables can obscure individual indicator impacts on PCE, making it crucial to assess and manage.

Techniques

Circular Correlation Heatmap: A visualization that orders the correlation matrix by hierarchical clustering and displays it in a circular layout. Clusters of closely correlated indicators stand out, which helps detect multicollinearity.
Variance Inflation Factor (VIF) Analysis: Quantifies how much the variance of an estimated regression coefficient is inflated by correlation with the other predictors. For predictor j, VIF_j = 1 / (1 − R_j²), where R_j² is the R² from regressing predictor j on all other predictors. A VIF above 10 is a common rule-of-thumb threshold for significant multicollinearity.
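The VIF definition translates directly into a loop of auxiliary regressions. This NumPy sketch adds an intercept to each auxiliary regression explicitly (an assumption; statsmodels' `variance_inflation_factor` uses the design matrix exactly as given). The function name `vif` and the synthetic data are illustrative.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of the 2-D array X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        # Perfect collinearity (r2 == 1) yields inf.
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic example: x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs.round(1))
```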

Observations and Strategic Steps Forward

Proxy Selection: We prioritize proxies with strong correlations to PCE and unique insights, employing linear regression to assess predictive strength and examining seasonality and stationarity.
Dimension Reduction: Utilizing techniques like Principal Component Analysis (PCA) to consolidate highly correlated variables into a manageable set of components, mitigating multicollinearity while enhancing model interpretability and efficiency.

Linear Regression Analysis for PCE Determinants

Objective

The primary goal is to utilize the R2 (coefficient of determination) metric to identify variables that significantly explain the variance in PCE, enabling us to pinpoint the most influential determinants.

Methodical Approach

Data Preparation:
- Exclusion of the dependent variable (PCE) from the pool of independent variables.
- Data cleansing to eliminate rows with NaN or infinite values.
- Model fitting: a separate linear regression of PCE on each variable.
Assessment of R2 Values:
- Generation of PCE predictions for each independent variable, followed by calculation of the R2 value to gauge explanatory power.
- Higher R2 values indicate a stronger linear relationship and a higher degree of explained variance in PCE.
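A minimal sketch of the per-variable R² screen, assuming ordinary least squares with an intercept and synthetic data; the function name `univariate_r2` is illustrative. For a single predictor with an intercept, R² equals the squared Pearson correlation, so this ranking mirrors |Pearson r| and can differ from the Spearman ranking used earlier.

```python
import numpy as np
import pandas as pd


def univariate_r2(df, target="PCE"):
    """R² of a separate OLS fit (with intercept) of the target on each other column."""
    scores = {}
    for col in df.columns.drop(target):
        # Drop rows where either series is NaN or infinite.
        pair = df[[col, target]].replace([np.inf, -np.inf], np.nan).dropna()
        x, y = pair[col].to_numpy(), pair[target].to_numpy()
        slope, intercept = np.polyfit(x, y, deg=1)
        pred = slope * x + intercept
        ss_res = np.sum((y - pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        scores[col] = 1.0 - ss_res / ss_tot
    return pd.Series(scores).sort_values(ascending=False)


rng = np.random.default_rng(1)
x = np.arange(20, dtype=float)
data = pd.DataFrame({
    "PCE": 2 * x + 1,
    "EXACT": x,
    "NOISE": rng.normal(size=20),
    "WITH_INF": np.r_[x[:19], np.inf],
})
r2 = univariate_r2(data)
print(r2.round(3))
```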

Utilising R2 and Correlation Coefficients Together

An interactive scatterplot combining R2 values and correlation coefficients with PCE enhances our understanding, enabling strategic proxy selection for further analysis. This dual-metric approach allows for a nuanced understanding of each variable's influence on PCE.

Why This Approach Is Beneficial

Identifies key PCE drivers by combining R2 values and correlation coefficients, offering a comprehensive view of each variable's influence.
Refines proxy selection, focusing on variables that provide significant insights into PCE dynamics.

Proxy Selection and Analysis

Defining Filtering Thresholds

Clear criteria based on correlation coefficients and R2 values help identify the most informative proxies for our model. Economic intuition also plays a crucial role in validating the logical connection of these proxies to PCE dynamics.
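Once each candidate has a correlation coefficient and an R² value, the thresholds become a simple boolean filter. The summary table and both threshold values below are hypothetical; the README does not state the cut-offs used, and economic intuition would still be applied to the shortlist by hand.

```python
import pandas as pd

# Made-up summary table: Spearman rho and univariate R² per candidate proxy.
summary = pd.DataFrame(
    {"spearman": [0.6, -0.3, 0.1, 0.4], "r2": [0.30, 0.05, 0.01, 0.18]},
    index=["PROXY_A", "PROXY_B", "PROXY_C", "PROXY_D"],
)

# Hypothetical thresholds.
MIN_ABS_RHO = 0.25
MIN_R2 = 0.10

mask = (summary["spearman"].abs() >= MIN_ABS_RHO) & (summary["r2"] >= MIN_R2)
candidates = summary[mask].sort_values("r2", ascending=False)
print(candidates)
```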

Seasonality and Stationarity Assessment

Seasonality Assessment: Using Autocorrelation Function (ACF) analysis to identify seasonal patterns (for quarterly data, pronounced autocorrelation at lag 4 and its multiples) and adjust for them.
Stationarity Assessment: Using the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is a unit root (non-stationarity), to confirm that each series is stationary, which is required for valid inference from our statistical models.
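The sample ACF is simple enough to compute directly. In this NumPy sketch (the function name `acf` and the synthetic series are illustrative) a series with an exact four-quarter pattern shows its largest non-zero-lag autocorrelation at lag 4. For the stationarity check, statsmodels provides the ADF test as `adfuller` in `statsmodels.tsa.stattools`, whose return value starts with the test statistic and the p-value.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([x[: len(x) - k] @ x[k:] / denom for k in range(max_lag + 1)])


# Synthetic quarterly series repeating the same pattern every four quarters.
seasonal = np.tile([3.0, -1.0, -1.0, -1.0], 10)
r = acf(seasonal, max_lag=4)
print(r.round(3))
```

As a rough guide, autocorrelations outside ±1.96/√n are commonly treated as significant under a white-noise null.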

Integration of Findings for Proxy Selection

By incorporating findings from seasonality, stationarity assessments, and correlation analysis, we refine our list of proxies, prioritizing indicators that are statistically sound and highly correlated with PCE.

Addressing Multicollinearity with PCA

Principal Component Analysis (PCA)

To tackle multicollinearity and enhance model performance, we implemented PCA for dimensionality reduction, transforming the dataset into a set of principal components used as new predictors in our regression model.

Methodological Overview

Data Preparation: Ensuring the dataset is clean and missing values are appropriately handled.
PCA Implementation: Reducing dataset complexity to address multicollinearity and improve interpretability.
Model Performance: Evaluating predictive accuracy with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) to assess the model's effectiveness.

Model Insights

Scree Plot Analysis: A Scree Plot helps identify the principal components that account for the most variance, guiding the selection of components for regression.
Regression Using Principal Components: A Linear Regression model utilizing principal components as predictors to forecast PCE, enhancing model clarity and interpretability.
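A minimal principal component regression sketch in NumPy, covering the scree-plot values, the regression on components, and the two error metrics. Standardizing the predictors before PCA, fitting on earlier observations and evaluating on later ones, and the names `fit_pcr`, `mae`, and `rmse` are assumptions for illustration; the data are synthetic, with five noisy copies of one common factor driving the target.

```python
import numpy as np


def fit_pcr(X_train, y_train, n_components):
    """Standardize, project onto the leading principal axes, then fit OLS
    with an intercept on the component scores."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    Z = (X_train - mu) / sigma
    # Rows of Vt are the principal axes; squared singular values are
    # proportional to the variance each component explains.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # explained-variance ratios (scree plot values)
    W = Vt[:n_components].T  # loadings, shape (n_features, n_components)
    design = np.column_stack([np.ones(len(Z)), Z @ W])
    beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)

    def predict(X_new):
        # Reuse the training mean, scale and loadings for new observations.
        scores = ((X_new - mu) / sigma) @ W
        return np.column_stack([np.ones(len(scores)), scores]) @ beta

    return predict, explained


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


rng = np.random.default_rng(42)
factor = rng.normal(size=120)
X = np.column_stack([factor + 0.1 * rng.normal(size=120) for _ in range(5)])
y = 2.0 * factor + 0.1 * rng.normal(size=120)

# Fit on the first 100 observations, evaluate on the last 20.
predict, explained = fit_pcr(X[:100], y[:100], n_components=1)
y_hat = predict(X[100:])
print(explained.round(3), mae(y[100:], y_hat), rmse(y[100:], y_hat))
```

Because the five predictors are nearly collinear, the first component captures almost all of their variance, which is the situation in which PCA sidesteps the unstable coefficients that multicollinearity would cause in a direct regression.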

This project identifies, analyzes, and models the determinants of PCE: it prepares and transforms the data, screens candidate proxies with correlation and regression measures, and addresses multicollinearity with PCA. The resulting proxies and models offer insight into consumer spending dynamics and a foundation for more timely, informed economic policy and strategy formulation.

janjannagtegaal/NowCasting-M1
