Data Sources
Thesis AI integrates two external data providers: the Massive API for market and company data, and the FRED API for macroeconomic indicators. All agent analysis is grounded in data from these sources; agents do not invent or interpolate figures.
Massive API
Massive is the primary market data vendor. It provides real-time and historical equity data across quotes, price bars, company fundamentals, news, and technical indicators. The Fundamentals Agent, News Agent, and Price & Trend Agent all draw from Massive.
| Data Type | Description |
|---|---|
| Quotes | Real-time bid/ask, last price, volume, and change |
| OHLC Bars | Intraday and daily candlestick data |
| Fundamentals | P/E, P/B, ROE, debt ratios, operating margins, EPS |
| News | Headline feed with relevance scoring by symbol |
| Technicals | RSI, MACD, moving averages, momentum indicators |
| Market Movers | Top gainers, losers, and volume leaders |
| Dividends & Splits | Corporate actions history and upcoming events |
Coverage
Massive API coverage is currently focused on US equities (NYSE, NASDAQ). Coverage of ETFs, international markets, and options is planned for future phases.
FRED API
The Federal Reserve Bank of St. Louis provides the FRED (Federal Reserve Economic Data) API, which is the authoritative source for US macroeconomic indicators. The Macro Agent uses a curated set of FRED series to construct its macro snapshot.
| Series ID | Description |
|---|---|
| FEDFUNDS | Federal funds effective rate |
| CPIAUCSL | Consumer Price Index (all urban consumers, all items) |
| UNRATE | US civilian unemployment rate |
| GDP | Gross domestic product (nominal, quarterly, seasonally adjusted annual rate); FRED's real GDP series is GDPC1 |
| DGS2 | 2-Year Treasury constant maturity rate |
| DGS10 | 10-Year Treasury constant maturity rate |
| T10Y2Y | 10-Year minus 2-Year Treasury spread (yield curve) |
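As a rough illustration of how one of these series could be read, the sketch below builds a request URL for the public FRED `series/observations` endpoint and extracts the newest usable value from a JSON response. The function names and the `limit` default are hypothetical and are not taken from the Thesis AI codebase; the HTTP call itself is left out so the parsing logic can be checked in isolation.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"

# Series IDs listed in the table above.
MACRO_SERIES = ["FEDFUNDS", "CPIAUCSL", "UNRATE", "GDP", "DGS2", "DGS10", "T10Y2Y"]


def build_latest_observation_url(series_id: str, api_key: str, limit: int = 5) -> str:
    """Build a URL that requests the most recent observations, newest first.

    A small `limit` above 1 is an assumption made here so that a missing
    latest value (common on market holidays for daily series) can be skipped.
    """
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "sort_order": "desc",
        "limit": limit,
    }
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_latest_value(payload: dict) -> Optional[Tuple[str, float]]:
    """Return (date, value) for the first usable observation, or None.

    FRED encodes a missing value as the string ".", so such rows are
    skipped instead of being converted to a number.
    """
    for obs in payload.get("observations", []):
        raw = obs.get("value")
        if raw not in (None, "", "."):
            return obs["date"], float(raw)
    return None
```

Returning `None` when no usable observation exists lets the caller record the gap instead of substituting a made-up figure.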
The macro snapshot is assembled each time the Macro Agent runs. A Celery background task refreshes the cached snapshot hourly during market hours to reduce per-request latency.
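The "hourly during market hours" rule can be sketched as a small gate that a periodic task might call before refreshing. This is only an illustration: it assumes "market hours" means the regular US equity session (09:30 to 16:00 Eastern, Monday to Friday), ignores exchange holidays, and expects the caller to pass a timestamp already expressed in Eastern time. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Assumption: regular US equity session, Eastern time; holidays are ignored.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def is_market_hours(now_eastern: datetime) -> bool:
    """Return True if the given Eastern-time timestamp falls within the session."""
    if now_eastern.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return MARKET_OPEN <= now_eastern.time() <= MARKET_CLOSE


def should_refresh(
    now_eastern: datetime,
    last_refresh: Optional[datetime],
    interval: timedelta = timedelta(hours=1),
) -> bool:
    """Refresh only during market hours, and at most once per interval."""
    if not is_market_hours(now_eastern):
        return False
    if last_refresh is None:
        return True
    return now_eastern - last_refresh >= interval
```

Outside the session the gate returns `False`, so the last cached snapshot would simply keep being served.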
Caching Strategy
To minimize vendor API costs and keep response times low, Thesis AI uses a two-tier caching strategy backed by Redis, with a TTL set per data type:
| Data Type | Cache TTL | Notes |
|---|---|---|
| Real-time quotes | 15 seconds | Refreshed on each request if stale |
| OHLC bars (intraday) | 1 minute | Stale bars served during off-market hours |
| News headlines | 5 minutes | Longer TTL acceptable for narrative analysis |
| Fundamentals snapshot | 24 hours | Fundamental ratios change infrequently |
| Macro snapshot (FRED) | 1 hour | Refreshed by Celery beat scheduler |
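The per-type expiry behaviour in the table can be sketched with a small in-memory cache. This is a stand-in for the Redis-backed cache, not a description of the actual implementation: it covers only TTL expiry, not the two-tier layout, and the data-type keys and class name are hypothetical. The clock is injectable so expiry can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple

# TTLs in seconds, taken from the table above. Key names are hypothetical.
CACHE_TTL_SECONDS = {
    "quote": 15,
    "ohlc_intraday": 60,
    "news": 5 * 60,
    "fundamentals": 24 * 60 * 60,
    "macro": 60 * 60,
}


class TTLCache:
    """In-memory stand-in for a cache with one TTL per data type."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps (data_type, key) to (expiry_timestamp, value).
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get_or_fetch(self, data_type: str, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value if still fresh; otherwise fetch and cache it."""
        now = self._clock()
        entry = self._store.get((data_type, key))
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch()
        self._store[(data_type, key)] = (now + CACHE_TTL_SECONDS[data_type], value)
        return value
```

With Redis, the same effect would typically come from setting an expiry on each key when it is written, so that stale entries disappear on their own.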
Data Integrity
All agent prompts explicitly constrain agents to use only the provided data snapshot. If a required data field is missing or stale beyond acceptable bounds, the agent notes the limitation in its output rather than extrapolating or inventing figures. This ensures every piece of analysis in a thesis is traceable to a real data point.
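One way to picture the "note the limitation rather than invent a figure" rule is a pre-check that scans the snapshot for missing or stale fields and returns notes for the agent to include in its output. The sketch below is illustrative only; the `SnapshotField` type, the field names, and the single `max_age` threshold are assumptions, not the actual Thesis AI schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class SnapshotField:
    """A single data point plus the time it was observed (hypothetical shape)."""
    value: Optional[float]
    as_of: Optional[datetime]


def find_limitations(
    snapshot: Dict[str, SnapshotField],
    required: List[str],
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Return one note per required field that is missing or too old.

    Nothing is filled in or estimated; the caller is expected to surface
    these notes alongside the analysis.
    """
    notes: List[str] = []
    for name in required:
        field = snapshot.get(name)
        if field is None or field.value is None:
            notes.append(f"{name}: missing from data snapshot")
        elif field.as_of is None or now - field.as_of > max_age:
            notes.append(f"{name}: stale (older than allowed age)")
    return notes
```

An empty list means every required field is present and fresh enough to cite directly.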
