How Is FARS, The Fully Automated Research System?

Introduction

A few months ago, someone was advertising a fully automated research system, FARS, that can automatically conduct research, perform experiments, and write papers at scale. Based on what was advertised, FARS can produce research papers whose average quality matches that of papers submitted to top-tier conferences. The average cost of producing a paper is around 1000 dollars. The research process is completely transparent and live-streamed, which sounded very interesting.

The papers it produced appear academically formal at first glance, and they contain good-looking figures and diagrams. Some internet influencers were introducing FARS and one of its generated papers, “Cap-and-Spill: Two-Pass CUDA-Graph MoE Dispatch Without Worst-Case Padding”, which describes a new two-pass method that accelerates the dispatch of Mixture-of-Experts (MoE) models under CUDA Graphs, and they were very excited about the breakthrough.

It turns out that, in my opinion, those internet influencers never bothered to think independently. In this blog post, I would like to share my experience of reading the “Cap-and-Spill” paper.

Cap-and-Spill

When I first read the paper, although it claims very good results, I found it difficult to understand from the technical descriptions where the improvements actually come from.

Basically, in the baseline, the MoE dispatch sends the worst-case 43 tokens to each expert, and most of those tokens are just zero-initialized padding most of the time, which is very inefficient. What Cap-and-Spill does is split the dispatch into two passes based on the statistical distribution of token counts. The first pass sends only the top 16 tokens, which is the Q99 capacity, to each expert via NCCL AllToAll communications, which sounds reasonable.

The description of the second pass, however, is vague and confusing. It says “Compact all overflow tokens across peer pairs into a contiguous buffer. Execute a second AllToAll to dispatch the overflow tokens. Since overflow is rare (5.4% at Q99), this pass handles a small volume of data.” What the fuck does that mean? How can you do AllToAll communications without segmenting the data? It also says “The first pass uses C-sized buffers (e.g., C = 16), while the second pass uses buffers sized for the maximum possible overflow volume. Since the total token count is bounded, the overflow buffer size is also bounded and can be pre-computed.” So its second pass must send 43 - 16 = 27 tokens to each expert, and the two passes together still send 43 tokens to each expert in total anyway. How the hell can it improve the latency with no work reduction at all?
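To make the arithmetic concrete, here is a minimal Python sketch of the split as I understand it from the paper’s description. The names `WORST_CASE`, `CAP`, and `two_pass_split` are my own, not the paper’s, and this is only a model of the buffer sizing, not the actual dispatch code.

```python
# Hypothetical sketch of the two-pass split described in the paper.
WORST_CASE = 43  # worst-case tokens per expert in the maxcounts baseline
CAP = 16         # Q99 capacity used by the first pass


def two_pass_split(tokens_per_expert):
    """Split a per-expert token count into a capped first pass and an
    overflow second pass."""
    first = min(tokens_per_expert, CAP)
    overflow = tokens_per_expert - first
    return first, overflow


# In the worst case, the pre-computed overflow buffer must hold
# 43 - 16 = 27 tokens, so the two passes together still move
# 16 + 27 = 43 tokens per expert, the same as the baseline.
first, overflow = two_pass_split(WORST_CASE)
```

If the second-pass buffer is sized for the maximum possible overflow, as the paper says, the total padded volume per expert is unchanged from the baseline.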

The experiment source code was also released for this paper. I asked Claude Sonnet 4.6 to carefully review the source code with all the critiques that I brought up, especially on how the dispatch performance improvement was achieved, assuming that the experiment measurements are all real. The following code analysis summary is from Claude Sonnet 4.6.

Cost                      Maxcounts      Cap-and-Spill
CPU-GPU syncs             17             4
GPU allocation per call   2.75 MB        0
D2D copies per call       2 × 2.75 MB    0
AllToAll                  1 × 2.75 MB    2 × 1.05 MB

The performance gap comes entirely from three implementation defects in the maxcounts baseline, not from the algorithm. Worse, in their Cap-and-Spill implementation, the second pass sends only 16 tokens rather than 27 to each expert, so some tokens are silently dropped. It is also very surprising that the code contains CPU-GPU synchronizations at all, given that the system supposedly uses CUDA Graphs. And these are not even all of the defects in the implementation and the latency measurements.
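The silent-drop defect is easy to see with a toy model. The following is my own reconstruction based on Claude’s summary, not the actual released code; `dispatch_with_buffers` and its parameters are hypothetical names.

```python
# Hypothetical model of the released code's behavior: each pass copies at
# most its buffer capacity, and anything beyond that is silently discarded.
def dispatch_with_buffers(tokens_per_expert, first_cap=16, second_cap=16):
    sent_first = min(tokens_per_expert, first_cap)
    remaining = tokens_per_expert - sent_first
    sent_second = min(remaining, second_cap)
    dropped = remaining - sent_second  # tokens that never reach the expert
    return sent_first + sent_second, dropped


# With a 16-token second-pass buffer instead of the required 27,
# a worst-case expert receives only 32 of its 43 tokens.
sent, dropped = dispatch_with_buffers(43)
```

Under this model, 11 tokens per worst-case expert are dropped, which also explains why the measured AllToAll volume is 2 × 1.05 MB rather than a first pass plus a larger overflow pass.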

Conclusions

The “Cap-and-Spill” paper generated by FARS is deceptively well-written and looks very formal, but it is fundamentally flawed. FARS, at least in this instance, merely tries to fool readers who do not want to think independently and do not want to understand the technical details.

Author

Lei Mao

Posted on

04-22-2026

Updated on

04-22-2026
