TradeVision

Can news sentiment predict stock prices?

Python, Pandas, VADER, FinBERT, XGBoost, LightGBM, scikit-learn
Jun 2025 – Sep 2025

Overview

TradeVision is a machine learning pipeline that scrapes financial news, extracts sentiment using NLP models, and combines it with market data to predict stock price movements.

The Problem

Financial markets react to news, but the relationship between news sentiment and price movement is noisy and non-linear. Can we build a reliable prediction pipeline that avoids data leakage?

Approach

NLP features (VADER sentiment scores, FinBERT embeddings, recency weighting, keyword hits) are aligned with OHLCV market data. Validation is walk-forward with 5 folds, so each model is trained only on data that precedes its test window, which prevents data leakage. Models tested: Logistic Regression, Random Forest, ExtraTrees, XGBoost, LightGBM.
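The walk-forward idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the function name `walk_forward_splits`, the expanding training window, and the equal-sized test windows are all assumptions. scikit-learn's `TimeSeriesSplit(n_splits=5)` provides a comparable splitter.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for expanding-window validation.

    Hypothetical helper: rows are assumed to be sorted by timestamp,
    oldest first, so a smaller index always means an earlier observation.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    test_size = n_samples // (n_folds + 1)
    for fold in range(n_folds):
        # Test windows sit at the end of the series and move forward in time.
        test_start = n_samples - (n_folds - fold) * test_size
        test_end = test_start + test_size
        # The training set is everything strictly before the test window.
        yield list(range(test_start)), list(range(test_start, test_end))


splits = list(walk_forward_splits(n_samples=12, n_folds=5))
for train_idx, test_idx in splits:
    # No training row is later than any test row, so the model never
    # trains on information from its own evaluation period.
    assert max(train_idx) < min(test_idx)
```

With 12 rows and 5 folds, the first fold trains on rows 0 and 1 and tests on rows 2 and 3; the last fold trains on rows 0 through 9 and tests on rows 10 and 11.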

Challenges

Timestamp alignment between news and market data was error-prone and required strict temporal ordering. Preventing data leakage meant ensuring the model never 'sees the future' during training. Scraping enough high-quality financial news data was a further challenge.
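One way to enforce strict temporal ordering is a backward "as-of" join: each market bar receives only the most recent news item published at or before the bar's timestamp. The sketch below uses the standard library only. The function name `align_news_to_bars`, the toy timestamps, and the sentiment values are illustrative assumptions, and all timestamps are assumed to share one timezone. In Pandas, `merge_asof` with `direction="backward"` performs the same kind of join.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def align_news_to_bars(
    news: List[Tuple[datetime, float]],
    bar_times: List[datetime],
) -> List[Optional[float]]:
    """For each bar, return the sentiment of the latest news item
    published at or before that bar, or None if there is none yet.

    Hypothetical helper for illustration; `news` holds
    (published_at, sentiment_score) pairs.
    """
    ordered = sorted(news, key=lambda item: item[0])
    news_times = [published_at for published_at, _ in ordered]
    aligned: List[Optional[float]] = []
    for bar_time in bar_times:
        # Number of news items with published_at <= bar_time.
        pos = bisect_right(news_times, bar_time)
        aligned.append(ordered[pos - 1][1] if pos > 0 else None)
    return aligned


# Toy data, invented for this example.
news = [
    (datetime(2025, 3, 3, 9, 45), 0.6),
    (datetime(2025, 3, 3, 10, 20), -0.3),
]
bars = [
    datetime(2025, 3, 3, 9, 30),
    datetime(2025, 3, 3, 10, 0),
    datetime(2025, 3, 3, 10, 30),
]
aligned = align_news_to_bars(news, bars)
# aligned == [None, 0.6, -0.3]
```

The 09:30 bar gets no sentiment because no news had been published yet. Filling that gap with the 09:45 score would be exactly the kind of leakage described above.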

Results

Best model: ROC-AUC 0.82 and accuracy 0.70 at a 240-minute prediction horizon. Hold-out period: March–September 2025.
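ROC-AUC and accuracy measure different things, which is why the two numbers can diverge: ROC-AUC scores how well the model ranks up-moves above down-moves across all thresholds, while accuracy depends on one fixed decision threshold. The sketch below computes both on invented toy data. The function names, the 0.5 threshold, and the labels and scores are assumptions for illustration and are unrelated to the project's results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random
    negative, counting ties as half a win (pairwise definition of AUC)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Share of rows where thresholding the score recovers the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy data: 1 = price went up over the horizon, 0 = it did not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
```

The pairwise loop is quadratic in the number of rows and is meant for clarity only; scikit-learn's `roc_auc_score` is the practical choice on real data.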