| A Benchmark for Language Models in Real-World System Building |
Weilin Jin, Chenyu Zhao, Zeshun Huang, Chaoyun Zhang, Qingwei Lin, Chetan Bansal, Saravan Rajmohan, Shenglin Zhang, Yongqian Sun, Dan Pei, Yifan Wu, Tong Jia, Ying Li, Zhonghai Wu, Minghua Ma
| A Spec-Driven Workflow for AI-Assisted Domain-Driven Development: Insights from Practice 📝 |
Jefferson de Barros Santos |
| Achieving Productivity Gains with AI-based IDE features: A Journey at Google |
Maxim Tabachnyk, Xu Shu, Alexander Frömmgen, Pavel Sychev, Vahid Meimand, Ilia Krets, Stanislav Pyatykh, Abner Araujo, Kristof Molnar, Satish Chandra |
| An Automated Methodology for Generating Labeled Datasets of Semantic Errors in Code |
Mahmoud Kassem, Francisco Ribeiro, Sarah Nadi |
| An Empirical Study of C to Rust Translation using Local Large-Language Models |
Nathan Rutherford, Dan O'Keeffe |
| An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code |
Sophie Weidmann, Fernando Castor |
| Benchmarking LLM Commit Message Generation through a Developer-centric Pairwise Preference Framework |
Lucas Aguiar, Matheus Freitas, Matheus Paixao, Rafael Carmo |
| Code Roulette: How Prompt Variability Affects LLM Code Generation |
Andrei Paleyes, Diana Robinson, Radzim Sendyka, Christian Cabrera, Neil D. Lawrence |
| Code vs Serialized AST Inputs for LLM-Based Code Summarization: An Empirical Study |
Shijia Dong, Haoruo Zhao, Paul Harvey |
| ContextPilot: Code Context Engineering with Memory-Augmented Exploration Agents 📝 |
Shuzheng Gao, Chaozheng Wang, Shuqing Li, Yun Peng, Michael R. Lyu |
| Continuous Benchmark Generation for Evaluating Enterprise-scale LLM Agents 📝 |
Divyanshu Saxena, Rishikesh Maurya, Xiaoxuan Ou, Gagan Somashekar, Shachee Mishra Gupta, Arun Iyer, Yu Kang, Chetan Bansal, Aditya Akella, Saravan Rajmohan |
| CP-Agent: Agentic Constraint Programming |
Stefan Szeider |
| Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better? |
Arastoo Zibaeirad, Marco Vieira |
| Do LLMs Dream of Energy-Efficient Code? |
Antimo Di Bernardo, Gianluca Capozzi, Pasquale De Rosa, Daniele Cono D'Elia, Leonardo Querzoni, Giuseppe Antonio Di Luna, Valerio Schiavoni |
| English or Chinese? Investigating the Impact of Prompt Language on Large Language Models for Code Summarization 📝 |
Yijia Tang, Zhiqiu Huang, Jian Xie, Yaoshen Yu, Bowei Xia, Enya Shen, Yukun Cao |
| Evaluating LLMs-Driven Java Code Refactoring from a Developer’s Perspective 💬 |
Javel Freitas, Guilherme Pereira, Lara Lima, Caio Rian de Sousa, Edivar Filho, José Cezar de Souza Filho, Paulo Henrique Maia, Carla Bezerra |
| Learning Functional Equivalence via Supervised Contrastive Code-Problem Alignment |
Siu Wun Cheung, Harshitha Menon |
| LLM-Driven SQL Remediation: Towards Safe and Explainable Code for Automated Schema Refactoring |
Antony Medeiros, Claudio Cavalcante, Nicolaas Ruberg, Sergio Lifschitz |
| LLM-Powered On-Demand Test Suites in Self-Graded Student Programming Assignments 💬 |
Chang Liu |
| MAsFL: Data-Secure, Efficient and Accurate Fault Localization with Multi-Agent Small Language Models |
Duong Pham Duc, Hiroshi Sato, Masao Kubo
| Multi-task Code LLMs: Data Mix or Model Merge? |
Mingzhi Zhu, Michele Merler, Stacy Patterson, Raju Pavuluri, Rahul Krishna, Boris Sobolev |
| Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures |
Amirkia Rafiei Oskooei, S. Selcan Yukcu, Mehmet Cevheri Bozoglan, Mehmet S. Aktas |
| RAG Against the Machine: Zero-Shot Software Vulnerabilities Classification using LLMs |
Edvin Nordqvist, Changjie Wang, Simone Ferlin, Mariano Scazzariello, Marco Chiesa |
| RubberDuckBench: A Benchmark for AI Coding Assistants |
Elizabeth Dinella, Ferida Mohammed, Fatma Ayad, Satish Chandra, Petros Maniatis |
| SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories |
Chihao Shen, Connor Dilgren, Purva Chiniya, Luke Griffith, Yu Ding, Yizheng Chen |
| Statistical Independence Aware Caching for LLM Workflows |
Yihan Dai, Dimitrios Stamatios Bouras, Haoxiang Jia, Sergey Mechtaev |
| The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution |
Norbert Tihanyi, Bilel Cherif, Mohamed Amine Ferrag, Richard A. Dubniczky, Tamas Bisztray |
| Towards Improving in-IDE Code Completion for Driver Development |
Batuhan Raif Karagoz, Mahesh Jayasankar, Saurabh Bodhe, Subhayan Roy, Lejin Varghese, Max Kiehn, Yonas Bedasso |
| Towards LLM-guided Semantic Validation of Autonomous Driving Safety Policies 📝 |
Qingzhao Zhang, Z. Morley Mao |
| TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization |
Haonan Li, Keyu Man, Partha Kanuparthy, Hanning Chen, Wei Sun, Sreen Tallam, Chenguang Zhu, Kevin Zhu, Zhiyun Qian |
| Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study |
Michele Merler, Rangeet Pan, Rahul Krishna, Tin Kam Ho, Raju Pavuluri, Maja Vukovic |