Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2 
Published in Inverse Problems, 2024
Recommended citation: Yang, P., & Dong, B. (2024). L2SR: Learning to Sample and Reconstruct for accelerated MRI via reinforcement learning. Inverse Problems. https://iopscience.iop.org/article/10.1088/1361-6420/ad3b34
Published in ICML 2024, 2024
Recommended citation: Dohmatob, E., Feng, Y., Yang, P., Charton, F., & Kempe, J. (2024). A Tale of Tails: Model Collapse as a Change of Scaling Laws. arXiv preprint arXiv:2402.07043. https://proceedings.mlr.press/v235/dohmatob24b.html
Published in arXiv, 2024
Recommended citation: Feng, X., Hu, W., Yang, P., Li, T., & Zhou, X. H. (2024). Identifying average causal effect in regression discontinuity design with auxiliary data. arXiv preprint arXiv:2412.20840. https://arxiv.org/abs/2412.20840
Published in arXiv, 2025
Recommended citation: Yang, P., & Dong, B. (2025). MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning. arXiv preprint arXiv:2501.01834. https://arxiv.org/abs/2501.01834
Published in ICLR 2025, 2025
Recommended citation: Feng, Y., Dohmatob, E., Yang, P., Charton, F., & Kempe, J. (2024). Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement. arXiv preprint arXiv:2406.07515. https://openreview.net/forum?id=MQXrTMonT1
Published in NeurIPS 2025 (Spotlight), 2025
Recommended citation: Yang, P., Feng, Y., Chen, Z., Wu, Y., & Li, Z. (2025). Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Bootstrapping. arXiv preprint arXiv:2501.18962. https://arxiv.org/abs/2501.18962
Published in arXiv, 2025
Recommended citation: Shen, Z., Huang, N., Yang, F., Wang, Y., Gao, G., Xu, T., Jiang, J., He, W., Yang, P., Sun, M., Ju, H., Wu, P., Dai, B., & Dong, B. (2025). REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning. arXiv preprint arXiv:2505.20613. https://arxiv.org/abs/2505.20613
Published in arXiv, 2026
Recommended citation: Wang, Y., Li, X., Xie, P., Yang, P., Nie, B., Cai, Y., Zhang, Q., Qu, C., Wu, J., Song, J., Ren, X., Huang, J., Pan, M., Feng, S., Chen, Z., & Luo, J. (2026). Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies. arXiv preprint arXiv:2605.00416. https://arxiv.org/abs/2605.00416
Published in arXiv, 2026
Recommended citation: Zhou, P., Chen, S., Chen, D., Wang, J., Jin, R., Zhu, B., Pan, Y., Gu, S., Wang, K., Nan, S., Qiu, X., Qiu, C., Yang, P., Cai, Y., Gao, J., Li, Y., Fu, Y., Yue, X., Chen, Z., & Luo, J. (2026). $\tau_0$-WM: A Unified Video-Action World Model for Robotic Manipulation. arXiv preprint arXiv:2606.01027. https://arxiv.org/abs/2606.01027