The Next Frontier for AI Research Agents: A Conversation with Miranda Yang, the Product Leader Making Papers Truly Accessible

Photo: Miranda Yang headshot on a dark background. Photo courtesy of Miranda Yang.

Recognized by Nature as one of the ‘computer codes that transformed science’ and described by Wired as akin to ‘libraries and GPS for scientists,’ arXiv is de facto infrastructure for modern scientific research.

Today’s AI “research agents” can do something many grad students only dream of: they scan thousands of arXiv papers, pull out the relevant ones, and summarize them in a few seconds. Tools built on frameworks like LlamaIndex or CrewAI can search arXiv programmatically, download PDFs, and run retrieval‑augmented generation (RAG) pipelines to answer complex questions about a field.
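
For readers who want a feel for what “programmatic search” means in practice, here is a minimal sketch that queries arXiv’s public Atom API directly, before any retrieval or summarization step. The query string and result limit are placeholders, not the behavior of any particular framework:

```python
# Minimal sketch: query arXiv's public Atom API for candidate papers.
# The search terms and result limit below are illustrative placeholders.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def search_arxiv(query: str, max_results: int = 10) -> list[dict]:
    """Return title, summary, and link for the top arXiv matches."""
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())

    papers = []
    for entry in feed.findall(f"{ATOM_NS}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM_NS}title", "").strip(),
            "summary": entry.findtext(f"{ATOM_NS}summary", "").strip(),
            "link": entry.findtext(f"{ATOM_NS}id", "").strip(),
        })
    return papers

if __name__ == "__main__":
    for paper in search_arxiv("retrieval augmented generation"):
        print(paper["title"])
```

An agent framework typically wraps this kind of call, then downloads each paper and feeds it into a RAG pipeline; the interesting question, as the rest of this story explores, is what format that download takes.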

For sighted researchers, this feels like magic. But for blind and low‑vision researchers, the story is more complicated—not because of the AI models themselves, but because of the formats we’ve historically used to share scientific work.

Yang’s work sits at the center of this shift, applying specialized expertise at the intersection of accessibility, large‑scale research platforms, and AI systems to reshape how scientific knowledge is consumed.

At arXiv, this work plays out at genuine internet scale: the platform serves over 5 million monthly active users across a corpus of roughly 3 million papers, the vast majority authored in LaTeX, and functions as the default preprint home for much of today’s cutting‑edge AI research. arXiv’s most intensive users today include leading LLM companies such as Anthropic, underscoring how central the platform has become to both human and machine readers.

Against that backdrop, arXiv asked Yang to lead the effort to rethink how papers should be represented, shipped, and improved over time, starting from the lived experience of blind and low‑vision researchers.

Why Miranda Was Trusted to Lead This at arXiv

In 2022, before AI agents were the mainstream way to read research, product leader Miranda Yang served as the sole product manager for arXiv’s HTML papers initiative, leading it end to end, from discovery through post‑launch iteration, to rethink how scientific papers could be shared as native HTML rather than as static PDFs.

Yang spent time in Cornell University’s library using NVDA, a widely used open‑source screen reader, to experience research papers the way blind and low‑vision (BLV) researchers do. The exercise was simple on paper: take a typical research article and try to navigate it using NVDA, no mouse, no visual scanning, just keyboard commands and synthesized speech. In practice, the exercise surfaced critical failure points that directly affect BLV researchers’ ability to conduct multi‑year research, highlighting how fragile the experience can be when everything is locked inside a PDF. Headings and sections weren’t always exposed in a predictable way. Tables could become long, flattened strings of numbers. Most importantly, math blocks and formulas were often treated as images or poorly tagged content, which made complex arguments much harder to follow line by line.

Photo: research notes documenting good, okay, and bad formula examples as read by the NVDA screen reader. Photo courtesy of Miranda Yang.

When the team tested the HTML prototype of the same content, the experience changed measurably. With structured semantic markup for headings and sections, NVDA could move through the paper more reliably. And when formulas were encoded using MathML or similar techniques, the screen reader could present equations as meaningful expressions instead of opaque descriptions.
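
A rough way to see that difference is to check how a paper’s HTML exposes its equations. The sketch below, which assumes a locally saved copy of a paper and uses the BeautifulSoup library, simply counts equations exposed as MathML elements versus those present only as images; the image‑detection heuristic is illustrative, not a formal audit:

```python
# Sketch: report how an HTML paper exposes its equations. MathML <math>
# elements can be announced as structured expressions by screen readers;
# equations shipped only as images are opaque unless the alt text is rich.
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def audit_equations(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    mathml = soup.find_all("math")
    # Rough heuristic for image-only equations; a real audit needs more care.
    image_only = [
        img for img in soup.find_all("img")
        if "equation" in (img.get("alt") or "").lower()
    ]
    return {"mathml_equations": len(mathml), "image_equations": len(image_only)}

# "paper.html" is a hypothetical locally saved copy of an arXiv HTML paper.
with open("paper.html", encoding="utf-8") as f:
    print(audit_equations(f.read()))
```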

What Changed Because of Her Work

Having experienced this herself and deeply understanding user needs, Yang pushed for a native HTML solution rather than incremental improvements to the PDF. She then owned the initial evaluation, auditing candidate file‑format converters with automated accessibility tools and manual spot checks, with the goal of leveraging existing tools rather than building a bespoke renderer from scratch.

She defined the minimum viable product by prioritizing “imperfect HTML” based on repeated user feedback that imperfect HTML was far better than none.

Knowing that the HTML approach wouldn’t magically fix every accessibility issue overnight, and that satisfactory rendering would take a long time to achieve, she designed a feedback‑processing mechanism as part of the MVP launch scope: users self‑report the issues they experience, and each report pairs automatically collected information, such as device and browser data, with explicit user input, like which screen reader the person relied on.
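
As an illustration of what such a report might hold (the field names below are hypothetical, not arXiv’s actual schema), the key is combining context the system can capture on its own with details only the user can supply:

```python
# Illustrative shape of a rendering-feedback report; the field names and
# example values are hypothetical, not arXiv's actual schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RenderingFeedback:
    # Collected automatically from the browser session.
    paper_id: str
    browser: str
    device: str
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Supplied explicitly by the user.
    screen_reader: str | None = None   # e.g. "NVDA", "JAWS", "VoiceOver"
    description: str = ""              # what went wrong, in the user's words
    location_hint: str = ""            # e.g. "Section 3, second equation"

report = RenderingFeedback(
    paper_id="2401.00001",             # placeholder identifier
    browser="Firefox 128",
    device="Windows 11 laptop",
    screen_reader="NVDA",
    description="Equation after Theorem 2 is read as 'graphic'.",
    location_hint="Section 3",
)
```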

She then stood up and ran a post‑launch bug‑triage framework for four part‑time engineers:

  1. Cross‑check content in three formats: TeX, PDF, and HTML.
  2. Compare the differences to determine where the bug lies.
  3. Tag issues in GitHub with a controlled vocabulary such as “figure,” “fidelity,” “mathml,” and “formula,” plus labels for common math‑rendering packages (a sketch of this tagging step appears after the list).
  4. Summarize findings weekly in bucketed form so she could prioritize low‑hanging fixes.
  5. Codify the nomenclature and step‑by‑step instructions so that new contributors could quickly become productive despite constrained resources.
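
A minimal sketch of that tagging step might look like the following; the keyword lists are illustrative stand‑ins for the team’s actual rules:

```python
# Sketch of the tagging step: map a bug report's text onto a controlled
# vocabulary of GitHub labels. The keyword lists are illustrative only.
CONTROLLED_VOCAB = {
    "figure": ["figure", "image", "caption"],
    "fidelity": ["looks different", "layout", "missing text"],
    "mathml": ["mathml", "math element"],
    "formula": ["equation", "formula", "symbol"],
}

def suggest_labels(report_text: str) -> list[str]:
    """Return every label whose keywords appear in the report."""
    text = report_text.lower()
    return [
        label for label, keywords in CONTROLLED_VOCAB.items()
        if any(keyword in text for keyword in keywords)
    ]

print(suggest_labels("The equation in Section 2 renders as a broken image"))
# -> ['figure', 'formula']
```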

As this work scaled, the impact showed up in the numbers: as of today, arXiv offers HTML for about 97% of LaTeX submissions, roughly 2.7 million papers, with approximately three‑quarters rendered in a way BLV researchers reported as readable, and a clear roadmap toward 90% error‑free rendering. The HTML feedback repository has accumulated thousands of issues, nearly 3,000 open and just over 3,000 closed, as researchers and tool builders filed detailed reports into the triage system Yang had set up, turning individual pain points into a steadily improving, measurable surface.

Photo: arXiv team at the Cornell Tech campus in New York City.

Why HTML‑First Papers Matter in an AI World

At the time, Yang and the arXiv team were focused on screen readers and human readers, not AI agents. But the same structural questions they wrestled with (how math is represented, how sections are labeled, how content is chunked) are now showing up in agentic workflows. Yang noted that she is seeing three common downstream use cases: (1) corpus‑level infrastructures that run RAG over many papers, (2) personalized, generative‑AI‑powered curation tools that surface a subset of the global corpus for deep dives, and (3) paper‑specific agents that work deeply within a single article and can execute tasks.

In her experience advising teams building AI-native research tools, many current RAG implementations for research still follow a similar pattern: find a paper on arXiv, download the PDF, run it through a parser, and flatten the output into text before indexing. That’s often the easiest starting point, but it can discard the very structure that both screen readers and intelligent agents could use to reason more accurately about a document: clear headings, labeled equations, and meaningful relationships between sections and references.
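
A stripped‑down version of that pattern, sketched here with the pypdf library and a placeholder paper ID, shows how little structure survives the flattening step:

```python
# Naive pattern: download the PDF, flatten it to plain text, and split it
# into fixed-size chunks for indexing. Headings, equations, and section
# boundaries are all discarded at the extraction step.
import urllib.request
from pypdf import PdfReader  # pip install pypdf

def flatten_pdf(url: str, path: str = "paper.pdf") -> str:
    urllib.request.urlretrieve(url, path)
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk(text: str, size: int = 1000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

# Placeholder identifier; arXiv serves PDFs at /pdf/<id>.
chunks = chunk(flatten_pdf("https://arxiv.org/pdf/2401.00001"))
```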

If, instead, systems start from an HTML‑first paper with good semantic structure, both humans and AI systems gain a richer view of the same content. Screen readers can navigate predictable landmarks. Corpus‑level RAG pipelines can preserve section boundaries, equation markup, and other signals that inform chunking and retrieval, while paper‑specific agents can reliably anchor their actions to specific sections, formulas, and references. Yang emphasized that the underlying models don’t need to change to benefit; they simply receive cleaner, better‑organized input.
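
By contrast with the PDF‑flattening pattern above, a structure‑aware pass over the HTML version can keep each chunk tied to the section heading it came from and leave any MathML markup intact. This is only a sketch, again using BeautifulSoup and assuming the paper uses standard heading tags:

```python
# Structure-aware sketch: split an HTML paper into chunks keyed by the
# section heading they belong to, preserving the markup (including any
# <math> elements) inside each chunk.
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def chunk_by_section(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")
    chunks, heading, buffer = [], "Front matter", []
    for node in soup.find_all(["h1", "h2", "h3", "p"]):
        if node.name in ("h1", "h2", "h3"):
            if buffer:
                chunks.append({"heading": heading, "html": "\n".join(buffer)})
            heading, buffer = node.get_text(strip=True), []
        else:
            buffer.append(str(node))  # keep markup, incl. any MathML inside
    if buffer:
        chunks.append({"heading": heading, "html": "\n".join(buffer)})
    return chunks
```

Each chunk now carries its section heading as metadata, which a retrieval pipeline can use for filtering and which a paper‑specific agent can use to anchor its answers.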

Yang also drew our attention to a practical angle. HTML is generally more lightweight for web delivery, because text and assets are loaded separately rather than bundled into a single binary document. PDFs, especially those with embedded fonts, images, and complex layouts, can grow large and be slower to load over the network. For large‑scale RAG systems that routinely pull many documents, starting from well‑optimized HTML can help reduce the amount of data transferred and stored per paper, which in turn can improve responsiveness and cost efficiency. For individual researchers using paper‑specific agents, HTML‑first also means faster, more incremental loading of just the sections they are working with. These are not generic optimizations; they reflect specialized technical judgment about how to design infrastructures that are both accessible and efficient at global research scale.
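
One rough way to see this is to compare bytes on the wire for a single paper’s HTML page and its PDF. The sketch below uses a placeholder paper ID and measures only the top‑level HTML document, not the images and other assets that load separately:

```python
# Rough comparison of reported transfer size for one paper's HTML vs PDF.
# The paper ID is a placeholder; sizes vary widely from paper to paper,
# and servers may omit Content-Length for some responses.
import urllib.request

def content_length(url: str) -> int | None:
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        size = resp.headers.get("Content-Length")
        return int(size) if size else None

paper_id = "2401.00001"  # placeholder
for fmt, url in [("html", f"https://arxiv.org/html/{paper_id}"),
                 ("pdf", f"https://arxiv.org/pdf/{paper_id}")]:
    print(fmt, content_length(url))
```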

Staying Grounded in What We Know

Yang is cautious not to overstate what has been formally studied. She has not yet run full user studies with BLV researchers using AI research agents end‑to‑end, so she doesn’t claim to speak for every aspect of their current experience. What she does emphasize is that the structural issues her team identified in 2022, particularly around math and navigation in PDFs, are the same issues that will shape how inclusive these new tools become.

If an agent is built on top of accessible HTML with rich semantics, it has more to work with than if everything is trapped in a flat PDF. That is true whether the consumer is a screen reader, a researcher using a conversational interface, or a team trying to build better retrieval pipelines.

From One Project to a Broader Practice

Since that initial HTML paper work, Yang’s role has shifted from designing a single format to advising and guiding multiple teams on making similar decisions earlier in their product cycles, reflecting a sustained record of leadership in this specialized domain. As a Beta fellow, she works with founders building AI‑native tools for knowledge work and research, encouraging them to think beyond “can we parse the file?” to “what is the most accessible representation for the people who will rely on this?”

Across those projects, a consistent theme has emerged: accessible science isn’t just about compliance; it’s about who gets to participate in the next wave of discovery. AI research agents will almost certainly change how we read and synthesize scientific work. The open question, in Yang’s view, is whether they will widen or narrow existing gaps.

Designing the Next Generation of Research Tools

The answer will not come from one format alone, and there is still more discovery needed, especially from BLV researchers working directly with agents. Nonetheless, the lessons from HTML‑first papers and NVDA testing now serve as foundational reference points for teams seeking to design the next generation of research tools.

These principles, which Yang has articulated and championed across multiple organizations, function as practical guidelines for product and engineering leaders building AI‑native research systems:

  • Preserve as much structure as possible (headings, equations, references) rather than flattening everything into plain text.
  • Treat math accessibility as a first‑class requirement, not an edge case.
  • Choose representations, like well‑marked‑up HTML, that work for both assistive technologies and AI systems.

For Yang, the principle is simple: if an AI agent can read a paper, a blind researcher should be able to read it too. Getting there means revisiting decisions the research community made long before agents existed—starting with something as simple, and as powerful, as how we share a single scientific article.

 To learn more about Yang’s work, visit her widely cited arXiv writeup here.

Advertising disclosure: We may receive compensation for some of the links in our stories. Thank you for supporting the Village Voice and our advertisers.