FinancialTextMarketSignals

All code used in this study is provided step by step, as numbered notebooks, to support transparency, reproducibility, and ease of implementation.

Data Availability

The datasets used in this project cannot be publicly released due to privacy and licensing restrictions. However, the full codebase is provided so that all processing steps and results can be reproduced once the data are obtained.

SEC 10-K Filings

The raw 10-K filings can be downloaded from the following source:

https://sraf.nd.edu/data/stage-one-10-x-parse-data/

After downloading the data, run the notebooks in this repository to reproduce all preprocessing steps used in the paper.

StockTwits Data

The StockTwits dataset used in this project can be obtained from the dataset introduced in the following paper:

Li, Xingji, Aaron R. Kaufman, and Nasser Alansari (2025). StockTwits: Comprehensive records of a financial social media platform from 2008 to 2022. Journal of Quantitative Description: Digital Media.

The dataset and download instructions are available at:

https://stocktwits-nyu.s3.us-west-2.amazonaws.com/dataset/README.md

Model Checkpoints

For convenience, the Checkpoints folder contains the trained checkpoints for the best-performing models reported in the paper. These checkpoints let users reproduce the reported results without retraining the models from scratch.

Folders and files

Checkpoints
.gitattributes
Best_Checkpoint_Finder.py
README.md
Step1_Extraction.ipynb
Step2_WRDS.ipynb
Step3_Link.ipynb
Step4_Merging.ipynb
Step5_Item7.ipynb
Step6_TrainTestSplit.ipynb
Step7_Training_Regression.ipynb
Step8_Eval_Firm_Truncated.ipynb
Step9_Firm_Chunk_Pool.ipynb
Step10_Eval_Firm_Segmentation.ipynb
Step11_SocialMedia.ipynb
Step12_Aggregation.ipynb
Step13_Merging.ipynb
Step14_FinalDataset.ipynb
Step15_Training_Regression.ipynb
Step16_Eval_Twit_Truncated.ipynb
Step17_Chunk_Pool.ipynb
Step18_Eval_Twit_Segmentation.ipynb
Step19_DAT_10KandStockTwits.ipynb
Step19_UpdatedForDeberta.ipynb
Step20_Visualisation.ipynb
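
Each notebook name carries its step number, so a plain alphabetical sort places Step10 before Step2. The sketch below lists the notebooks of a local clone in numeric step order. The helper names (`step_number`, `notebooks_in_step_order`) are illustrative and not part of the repository. The tie-break by file name for the two Step19 notebooks is an assumption, as is the idea that the step number alone determines the run order.

```python
import re
from pathlib import Path

# Matches the "Step<N>_" prefix used by the notebook file names.
STEP_PATTERN = re.compile(r"^Step(\d+)_")


def step_number(name: str) -> int:
    """Return the integer step number encoded in a notebook file name."""
    match = STEP_PATTERN.match(name)
    if match is None:
        raise ValueError(f"no step number in file name: {name!r}")
    return int(match.group(1))


def notebooks_in_step_order(repo_dir: str = ".") -> list[Path]:
    """List Step*.ipynb files in repo_dir sorted by numeric step.

    Files sharing a step number (e.g. the two Step19 notebooks) are
    ordered by file name; this tie-break is an assumption.
    """
    paths = [
        p for p in Path(repo_dir).glob("Step*.ipynb")
        if STEP_PATTERN.match(p.name)
    ]
    return sorted(paths, key=lambda p: (step_number(p.name), p.name))


if __name__ == "__main__":
    for path in notebooks_in_step_order():
        print(path.name)
```

Run from the root of a local clone, this prints the notebooks from Step1_Extraction.ipynb through Step20_Visualisation.ipynb in the order of their step numbers.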
