Collecting training data for robots is far from glamorous work. Yet some of the world’s leading AI laboratories are already turning to XDOF to tackle the challenge.

Just two weeks ago, OpenAI announced that it would revive the robotics initiative it shut down in 2021, underscoring a broader trend among major AI organizations racing to develop machines capable of functioning in the physical world. However, creating highly capable robots requires a critical ingredient that the AI industry still lacks: the vast quantities of training data needed to support robotics models in the same way that text data supports large language models.
This shortage has opened the door for a new category of infrastructure company. Unlike LLMs, which benefited from enormous volumes of publicly available text, robots require data that reflects real-world physical interactions. Such datasets are extremely scarce. Video content from platforms like YouTube and recordings collected through gig-economy workers often lack the precision and consistency needed to accurately represent the physical environment.
XDOF (pronounced “ecks-doff”), which is emerging from stealth today, believes the next major bottleneck in AI development will not be models or hardware, but rather the data collection and feedback systems necessary to teach robots how to engage with the real world.
The startup is building data pipelines, collection infrastructure, and annotation platforms that frontier AI labs and robotics companies would struggle to create on their own. To support this vision, XDOF has secured $70 million in funding from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo. According to co-founder and CEO Philipp Wu, the company has grown to approximately 60 employees and is already serving 20 customers, including several leading AI labs, although he declined to identify them publicly.
“Every top-tier AI lab is pursuing robotics in some form,” Wu said. “We’ve already witnessed the consequences of falling behind in the language model race. No one wants to find themselves in a position where they arrive too late to the next technological shift. Right now, there’s a broad belief that physical AI represents that next frontier.”
Wu first encountered this challenge during his PhD studies at UC Berkeley, where his research focused on helping robots learn skills from large-scale datasets. The problem quickly became clear.
“We simply didn’t have large-scale datasets available,” he told TechCrunch. “There was a classic chicken-and-egg dilemma. Before we could even begin asking how to train a foundation model for robotics, we first needed to collect the data.”
Together with future XDOF co-founder and CTO Fred Shentu, Wu worked on GELLO, an affordable teleoperation system that enables human operators to control robotic arms and generate training data. “The project ultimately became highly influential within the robotics community because many researchers were facing the same limitations,” Wu explained. “A significant number of teams began using similar systems to gather data more effectively.”

Recognizing the market opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 with the goal of creating a comprehensive data ecosystem for organizations building robotics models. Understanding that simple data collection can become a commodity business, the company has expanded its focus to include data cleaning, annotation, and specialized tooling, creating a continuous and self-improving feedback loop for robotics training.
As an initial step, XDOF is collaborating with UC Berkeley’s AI Research Lab to release what it believes to be the largest collection of high-quality robotics training data ever assembled. Known as ABC, the dataset contains 130,000 robot manipulation trajectories, 300 hours of simulation data, and 100 hours of evaluation data. Academic researchers have never before had access to robotics pre-training data at this scale.
“We’ve repeatedly seen across language models, image generation, and other AI disciplines that when models and datasets become publicly available, the research community often achieves breakthroughs that would have been difficult to predict,” David McAllister, a Berkeley PhD student who helped coordinate the release, told TechCrunch.
Researchers have already leveraged the dataset to train robots on benchmark tasks such as folding T-shirts, flattening cardboard boxes, and placing AirPods into their charging cases.
Unlimited Degrees of Freedom
XDOF intends to operate across three levels of a robotics data pyramid. At the top sits the most valuable category: teleoperation data collected directly on the specific robot intended for deployment. The second layer consists of teleoperated robots gathering more generalized training data, similar to the approach pioneered through GELLO. The third layer includes “egocentric” data, captured from humans performing everyday activities, for which XDOF plans to develop proprietary wearable sensing technology.
“The choice of camera hardware directly influences the quality of the data you collect, which in turn affects the performance of hand-tracking systems and other algorithms,” Wu said. “If the hardware isn’t designed properly from the outset, the resulting dataset may contain hidden issues that only become apparent much later.”

To support this effort, XDOF plans to recruit and train large numbers of teleoperators and egocentric data collectors worldwide. The strategy is highly labor-intensive, which naturally raises an important question: why are major AI laboratories not producing this data themselves?
“You would need facilities spanning hundreds of thousands of square feet and housing hundreds of robots,” Wu explained. “Those robots must be maintained, calibrated, and monitored continuously, while operators need to be recruited and trained to work effectively with the systems.”
Building that kind of infrastructure requires substantial capital, operational expertise, and organizational focus—resources that many AI labs would prefer to allocate elsewhere. XDOF is positioning itself as the specialized partner capable of handling this complexity on their behalf.
The company’s name is derived from the robotics concept of “degrees of freedom,” which refers to the number of independent movements a robot can execute. A human arm, from shoulder to wrist, possesses seven degrees of freedom. Figure AI’s latest humanoid robot features 30 degrees of freedom.
The “X” in XDOF reflects the company’s broader ambition. As Wu describes it, the goal is to support “arbitrary degrees of freedom, unlimited degrees of freedom” — a vision centered on enabling the next generation of physical AI systems through scalable, high-quality robotics data.