Agent-oriented software engineering (AOSE) is a new software engineering paradigm that arose to apply best practice in the development of complex Multi-Agent Systems (MAS) by focusing on the use of agents, and organizations (communities) of agents as the main abstractions. The field of Software Product Lines (SPL) covers all the software development lifecycle necessary to develop a family of products where the derivation of concrete products is made systematically and rapidly.
Commentary

With the advent of biologically inspired, pervasive, and autonomic computing, the advantages and the necessity of agent-based technologies and MASs have become apparent. Current AOSE methodologies, however, are dedicated to developing a single MAS at a time. Many MASs nevertheless share substantially the same techniques, adaptations, and approaches. The field is thus ripe for exploiting the benefits of SPL: reduced costs, improved time-to-market, and agent technology that is more industrially applicable.

Multiagent Systems Product Lines (MAS-PL) is a research field devoted to combining the two approaches: applying the SPL philosophy to building a MAS. This affords the advantages of SPLs and makes MAS development more practical.
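The core SPL idea that MAS-PL borrows is product derivation: a concrete product is obtained systematically from a feature selection over a shared core. A minimal sketch of how that could look for a family of MASs, where selected features determine which agent roles appear in the derived system (all feature and role names here are illustrative, not taken from any published MAS-PL methodology):

```python
# Hypothetical sketch of SPL-style product derivation for a family of
# multi-agent systems: a feature selection maps to the set of agent roles
# included in one concrete MAS. All names are illustrative assumptions.

# Feature model: each optional feature contributes agent roles to a product.
FEATURE_ROLES = {
    "monitoring": {"SensorAgent", "AggregatorAgent"},
    "negotiation": {"BrokerAgent"},
    "self_healing": {"DiagnosisAgent", "RepairAgent"},
}
MANDATORY_ROLES = {"CoordinatorAgent"}  # present in every product of the line

def derive_mas(selected_features):
    """Derive the agent roles of one concrete MAS from a feature selection."""
    unknown = set(selected_features) - FEATURE_ROLES.keys()
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    roles = set(MANDATORY_ROLES)
    for feature in selected_features:
        roles |= FEATURE_ROLES[feature]
    return sorted(roles)

if __name__ == "__main__":
    # One product of the line: a monitoring MAS with self-healing.
    print(derive_mas(["monitoring", "self_healing"]))
```

Because derivation is a pure function of the feature selection, every product in the family is produced systematically from the same core, which is what yields the reduced cost and time-to-market benefits mentioned above.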
Benchmarks

Several benchmarks have been developed to evaluate the capabilities of AI coding agents and large language models on software engineering tasks. Some of the key benchmarks:
| Benchmark | Description |
|---|---|
| SWE-bench | Assesses the ability of AI models to resolve real-world software engineering issues sourced from GitHub repositories. |
| ML-Agent-Bench | Evaluates AI agent performance on machine learning experimentation tasks. |
| τ-Bench | Developed by Sierra AI to evaluate AI agent performance and reliability in realistic settings involving tool use and user interaction. |
| WebArena | Evaluates AI agents on tasks in a simulated web environment. |
| AgentBench | Assesses the capabilities of LLM-based agents across a range of interactive environments. |
| MMLU-Redux | A re-annotated version of the MMLU benchmark that corrects erroneous questions; it evaluates AI models across a broad range of academic subjects. |
| McEval | A multilingual coding benchmark that tests AI models' ability to solve programming challenges across many programming languages. |
| CS-Bench | A specialized benchmark for evaluating AI performance on computer science tasks. |
| WildBench | Evaluates AI models on challenging queries drawn from real-world user conversations. |
| Test of Time | Evaluates AI models' ability to reason about temporal sequences and the ordering of events over time. |
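Agent benchmarks of this kind usually report a single headline number, such as the fraction of instances the agent resolves (for SWE-bench, the share of issues whose generated patch makes the repository's tests pass). A minimal sketch of that scoring step; the record format is an assumption for illustration, not the official harness of any benchmark:

```python
# Minimal sketch of benchmark-style scoring: compute the "resolved rate"
# over per-instance evaluation results. The record fields are illustrative;
# real harnesses define their own result formats.

def resolved_rate(results):
    """Fraction of benchmark instances whose solution passed evaluation."""
    if not results:
        return 0.0
    resolved = sum(1 for r in results if r["resolved"])
    return resolved / len(results)

if __name__ == "__main__":
    runs = [
        {"instance_id": "repo__issue-1", "resolved": True},
        {"instance_id": "repo__issue-2", "resolved": False},
        {"instance_id": "repo__issue-3", "resolved": True},
        {"instance_id": "repo__issue-4", "resolved": False},
    ]
    print(f"resolved rate: {resolved_rate(runs):.0%}")
```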
Software engineering agent systems

Several software engineering (SWE) agent systems are in development, for example:
| SWE agent system | Backend LLM |
|---|---|
| Salesforce Research DEIBASE-1 | GPT-4o |
| Cosine Genie | Fine-tuned OpenAI GPT |
| CodeStory Aide | GPT-4o + Claude 3.5 Sonnet |
| AbanteAI MentatBot | GPT-4o |
| Salesforce Research DEIBASE-2 | GPT-4o |
| Salesforce Research DEI-Open | GPT-4o |
| Bytedance MarsCode | GPT-4o |
| Alibaba Lingma | gpt-4-1106-preview |
| Factory Code Droid | Anthropic + OpenAI |
| AutoCodeRover | GPT-4o |
| Amazon Q Developer | (unknown) |
| CodeR | gpt-4-1106-preview |
| MASAI | (unknown) |
| SIMA | GPT-4o |
| Agentless | GPT-4o |
| Moatless Tools | Claude 3.5 Sonnet |
| IBM Research Agent | (unknown) |
| Aider | GPT-4o + Claude 3 Opus |
| OpenDevin + CodeAct | GPT-4o |
| AgileCoder | (various) |
| ChatDev | (unknown) |
| MetaGPT | GPT-4o |
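Despite their differences, most of these systems wrap their backend LLM in some variant of an observe-decide-act loop: the model proposes an action (edit a file, run tests), a tool executes it, and the observation is fed back until the model declares the task done. A schematic sketch of such a loop, with a deterministic stub standing in for the real LLM client (the class and tool names are illustrative, not any particular system's API):

```python
# Schematic sketch of the agent loop many SWE agent systems build on.
# StubModel is a stand-in for a real backend LLM (GPT-4o, Claude, ...).

class StubModel:
    """Deterministic stand-in for an LLM that returns scripted actions."""
    def __init__(self, scripted_actions):
        self.scripted = list(scripted_actions)

    def next_action(self, history):
        # A real system would send `history` to the LLM and parse its reply.
        return self.scripted.pop(0) if self.scripted else ("finish", "")

def run_agent(model, tools, task, max_steps=10):
    """Observe-decide-act loop; returns the transcript of steps taken."""
    history = [("task", task)]
    for _ in range(max_steps):
        action, arg = model.next_action(history)
        if action == "finish":
            break
        observation = tools[action](arg)  # e.g. run tests, apply an edit
        history.append((action, observation))
    return history

if __name__ == "__main__":
    tools = {
        "run_tests": lambda _: "1 failing test: test_parse",
        "edit_file": lambda patch: f"applied: {patch}",
    }
    model = StubModel([
        ("run_tests", ""),
        ("edit_file", "fix parser off-by-one"),
        ("run_tests", ""),
    ])
    for step in run_agent(model, tools, "fix the failing parser test"):
        print(step)
```

The systems in the table differ mainly in what surrounds this loop: how they retrieve relevant code, how many agents cooperate, and how actions are validated before being applied.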
External links

- Agent-Oriented Software Engineering: Reflections on Architectures, Methodologies, Languages, and Frameworks. ISBN 978-3642544316
References

- Michael Winikoff and Lin Padgham. Agent Oriented Software Engineering. Chapter 15 (pages 695-757) in G. Weiss (ed.), Multiagent Systems, 2nd edition. MIT Press. ISBN 978-0-262-01889-0 (a recent survey of the field)
- Site of the MaCMAS methodology, which applies MAS-PL: https://web.archive.org/web/20100922120209/http://james.eii.us.es/MaCMAS/index.php/Main_Page
- MAS Product Lines site: https://web.archive.org/web/20140518122645/http://mas-productlines.org/
- Joaquin Peña, Michael G. Hinchey, and Antonio Ruiz-Cortés. Multiagent system product lines: Challenges and benefits. Communications of the ACM, volume 49, issue 12, December 2006. doi:10.1145/1183236.1183272
- Peña, Joaquin; Hinchey, Michael G.; Resinas, Manuel; Sterritt, Roy; Rash, James L. (2007). "Designing and Managing Evolving Systems using a MAS-Product-Line Approach". Science of Computer Programming. 66: 71–86. doi:10.1016/j.scico.2006.10.007
- Joaquin Peña, Michael G. Hinchey, Antonio Ruiz-Cortés, and Pablo Trinidad. Building the Core Architecture of a NASA Multiagent System Product Line. In 7th International Workshop on Agent Oriented Software Engineering (AOSE 2006), Hakodate, Japan, May 2006. LNCS. doi:10.1007/978-3-540-70945-9_13
- Joaquin Peña, Michael G. Hinchey, Manuel Resinas, Roy Sterritt, and James L. Rash. Managing the Evolution of an Enterprise Architecture using a MAS-Product-Line Approach. 5th International Workshop on System/Software Architectures (IWSSA'06), Nevada, USA, 2006
- Soe-Tsyr Yuan. MAS Building Environments with Product-Line-Architecture Awareness.
- Josh Dehlinger and Robyn Lutz have several publications in this field.
- MAS-PL: Current research. In the Fourth Technical Forum (TF4) of AgentLink, December 2006.