Nova
Autonomous research agent that navigates academic papers, extracts claims, and cross-references findings across multiple sources
Keeping up with research across ML, computational biology, and systems is impossible for a single person. Nova is a research agent that reads papers, traverses citation graphs, extracts methodological claims, and builds a cross-referenced knowledge graph of experimental results, surfacing contradictions and gaps that would otherwise take months to notice.
- Parses and indexes research papers from arXiv, PubMed, and open-access repositories with automatic metadata extraction
- Extracts fine-grained claims from paper text using an NLI model fine-tuned on scientific discourse
- Builds a cross-referenced knowledge graph connecting claims, citations, and experimental results
- Supports natural language queries like 'what evidence exists for X' and returns cited passages with confidence scores
- Tracks contradictions and replication results across papers, flagging when new findings challenge existing claims
- Generates literature review drafts organized by thematic clusters rather than chronological order
Multi-stage ingestion pipeline: PDF parsing via GROBID, structural segmentation using layout-aware transformers, then claim extraction through a fine-tuned DeBERTa-v3 model operating on sentence windows with overlapping context. Each extracted claim is normalized to a subject-predicate-object triple and embedded using all-MiniLM-L6-v2.
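One plausible reading of "sentence windows with overlapping context" is that each sentence is classified together with up to `k` neighbouring sentences on either side, so consecutive windows share context. The sketch below assumes that reading; the naive regex splitter, the `Window` type, and the default `k=1` are hypothetical stand-ins, and in the real pipeline the sentences would come from GROBID's structured output rather than raw text.

```python
import re
from dataclasses import dataclass

# Split after ., ! or ? followed by whitespace. Naive: abbreviations like "e.g." break it.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


@dataclass(frozen=True)
class Window:
    target: str               # sentence the classifier labels as claim / non-claim
    context: tuple[str, ...]  # target plus up to k neighbours on each side


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def sentence_windows(text: str, k: int = 1) -> list[Window]:
    """Build one window per sentence; neighbouring windows overlap by design."""
    sents = split_sentences(text)
    return [
        Window(target=s, context=tuple(sents[max(0, i - k): i + k + 1]))
        for i, s in enumerate(sents)
    ]


windows = sentence_windows("A is true. B is false. C holds.", k=1)
# windows[1].context == ("A is true.", "B is false.", "C holds.")
```

Each window's context would then be joined and passed to the claim classifier, with the target sentence marked so the model knows which sentence it is labelling.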
Graph-based retrieval layer using a hybrid of dense vector search (ChromaDB) and citation graph traversal. Queries are decomposed into sub-questions, each routed to either the vector store for semantic similarity or the graph store for structural relationships. Results are re-ranked by a cross-encoder that scores relevance to the original question.
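The routing step can be sketched as follows. The description does not say how sub-questions are assigned to a store, so the keyword heuristic in `route` is a hypothetical stand-in; likewise `citation_neighbourhood` assumes the citation graph is an adjacency list mapping each paper to the papers it cites. The vector-store query (ChromaDB) and the cross-encoder re-ranking are not shown.

```python
from collections import deque
from enum import Enum


class Route(Enum):
    VECTOR = "vector"  # semantic similarity, e.g. a ChromaDB collection
    GRAPH = "graph"    # structural relationships in the citation graph


# Hypothetical cue substrings suggesting a structural (citation) question.
_STRUCTURAL_CUES = ("cite", "citing", "citation", "builds on", "extends", "replicat")


def route(sub_question: str) -> Route:
    q = sub_question.lower()
    return Route.GRAPH if any(cue in q for cue in _STRUCTURAL_CUES) else Route.VECTOR


def citation_neighbourhood(graph: dict[str, list[str]], seed: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first search from seed; returns each reachable paper's hop distance."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        if dist[paper] == max_hops:
            continue  # do not expand beyond the hop limit
        for cited in graph.get(paper, []):
            if cited not in dist:
                dist[cited] = dist[paper] + 1
                queue.append(cited)
    return dist


graph = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
hood = citation_neighbourhood(graph, "A", max_hops=2)
# hood == {"A": 0, "B": 1, "C": 1, "D": 2}
```

Bounding the traversal by hop count keeps structural results local to the seed paper; the papers it returns, together with the vector-store hits, would form the candidate set the cross-encoder re-ranks against the original question.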