-
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
arXiv:2604.09554v1 Announce Type: new Abstract: Optimism for accelerating scientific discovery with AI continues to grow. Current applications of AI in scientific research range from training dedicated foundation models …
2026-04-14T04:00:00Z
-
Linear Programming for Multi-Criteria Assessment with Cardinal and Ordinal Data: A Pessimistic Virtual Gap Analysis
arXiv:2604.09555v1 Announce Type: new Abstract: Multi-criteria Analysis (MCA) is used to rank alternatives based on various criteria. Key MCA methods, such as Multiple Criteria Decision Making (MCDM) methods, estimate pa…
2026-04-14T04:00:00Z
-
Seven simple steps for log analysis in AI systems
arXiv:2604.09563v1 Announce Type: new Abstract: AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, o…
2026-04-14T04:00:00Z
-
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
arXiv:2604.09574v1 Announce Type: new Abstract: The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critic…
2026-04-14T04:00:00Z
-
AHC: Meta-Learned Adaptive Compression for Continual Object Detection on Memory-Constrained Microcontrollers
arXiv:2604.09576v1 Announce Type: new Abstract: Deploying continual object detection on microcontrollers (MCUs) with under 100KB memory requires efficient feature compression that can adapt to evolving task distributions…
2026-04-14T04:00:00Z
-
Explainable Planning for Hybrid Systems
arXiv:2604.09578v1 Announce Type: new Abstract: The recent advancement in artificial intelligence (AI) technologies facilitates a paradigm shift toward automation. Autonomous systems are fully or partially replacing manu…
2026-04-14T04:00:00Z
-
Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement
arXiv:2604.09579v1 Announce Type: new Abstract: In large-scale cloud service platforms, thousands of customer tickets are generated daily and are typically handled through on-call dialogues. This high volume of on-call i…
2026-04-14T04:00:00Z
-
OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling
arXiv:2604.09580v1 Announce Type: new Abstract: Standard Chain-of-Thought (CoT) prompting empowers Large Language Models (LLMs) with reasoning capabilities, yet its reliance on linear natural language is inherently insuf…
2026-04-14T04:00:00Z
-
OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding
arXiv:2604.09581v1 Announce Type: new Abstract: Evaluating web usability typically requires time-consuming user studies and expert reviews, which often limits iteration speed during product development, especially for sm…
2026-04-14T04:00:00Z
-
Factorizing formal contexts from closures of necessity operators
arXiv:2604.09582v1 Announce Type: new Abstract: Factorizing datasets is an interesting process in a multitude of approaches, but it is often not possible or efficient to compute a factorization of the datase…
2026-04-14T04:00:00Z
-
Agentic Exploration of PDE Spaces using Latent Foundation Models for Parameterized Simulations
arXiv:2604.09584v1 Announce Type: new Abstract: Flow physics, and more broadly physical phenomena governed by partial differential equations (PDEs), are inherently continuous, high-dimensional and often chaotic in nature.…
2026-04-14T04:00:00Z
-
MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion
arXiv:2604.09587v1 Announce Type: new Abstract: Mobile agents can autonomously complete user-assigned tasks through GUI interactions. However, existing mainstream evaluation benchmarks, such as AndroidWorld, operate by c…
2026-04-14T04:00:00Z
-
Persistent Identity in AI Agents: A Multi-Anchor Architecture for Resilient Memory and Continuity
arXiv:2604.09588v1 Announce Type: new Abstract: Modern AI agents suffer from a fundamental identity problem: when context windows overflow and conversation histories are summarized, agents experience catastrophic forgett…
2026-04-14T04:00:00Z
-
DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review
arXiv:2604.09590v1 Announce Type: new Abstract: Automated peer review is often framed as generating fluent critique, yet reviewers and area chairs need judgments they can audit: where a concern applies, what evide…
2026-04-14T04:00:00Z
-
Spatial Competence Benchmark
arXiv:2604.09594v1 Announce Type: new Abstract: Spatial competence is the quality of maintaining a consistent internal representation of an environment and using it to infer discrete structure and plan actions under cons…
2026-04-14T04:00:00Z
-
DERM-3R: A Resource-Efficient Multimodal Agents Framework for Dermatologic Diagnosis and Treatment in Real-World Clinical Settings
arXiv:2604.09596v1 Announce Type: new Abstract: Dermatologic diseases impose a large and growing global burden, affecting billions and substantially reducing quality of life. While modern therapies can rapidly control ac…
2026-04-14T04:00:00Z
-
CID-TKG: Collaborative Historical Invariance and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning
arXiv:2604.09600v1 Announce Type: new Abstract: Temporal knowledge graph (TKG) reasoning aims to infer future facts at unseen timestamps from temporally evolving entities and relations. Despite recent progress, existing …
2026-04-14T04:00:00Z
-
Hubble: An LLM-Driven Agentic Framework for Safe and Automated Alpha Factor Discovery
arXiv:2604.09601v1 Announce Type: new Abstract: Discovering predictive alpha factors in quantitative finance remains a formidable challenge due to the vast combinatorial search space and inherently low signal-to-noise ra…
2026-04-14T04:00:00Z
-
From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express
arXiv:2604.09602v1 Announce Type: new Abstract: Leyva-Vázquez and Smarandache (2025) demonstrated that neutrosophic T/I/F evaluation, where Truth, Indeterminacy, and Falsity are independent dimensions not constrained t…
2026-04-14T04:00:00Z
-
LLMs for Text-Based Exploration and Navigation Under Partial Observability
arXiv:2604.09604v1 Announce Type: new Abstract: Exploration and goal-directed navigation in unknown layouts are central to inspection, logistics, and search-and-rescue. We ask whether large language models (LLMs) can fun…
2026-04-14T04:00:00Z
-
Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling
arXiv:2604.09606v1 Announce Type: new Abstract: Traditional benchmarks for large language models (LLMs), such as HELM and AIR-BENCH, primarily assess safety risk through breadth-oriented evaluation across diverse tasks. …
2026-04-14T04:00:00Z
-
Unifying Ontology Construction and Semantic Alignment for Deterministic Enterprise Reasoning at Scale
arXiv:2604.09608v1 Announce Type: new Abstract: While enterprises amass vast quantities of data, much of it remains chaotic and effectively dormant, preventing decision-making based on comprehensive information. Existing…
2026-04-14T04:00:00Z
-
General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging
arXiv:2604.09609v1 Announce Type: new Abstract: Human behavior models are essential as behavior references and for simulating human agents in virtual safety assessment of automated vehicles (AVs), yet current models face…
2026-04-14T04:00:00Z
-
Beyond Theory of Mind in Robotics
arXiv:2604.09612v1 Announce Type: new Abstract: Theory of Mind, the capacity to explain and predict behavior by inferring hidden mental states, has become the dominant paradigm for social interaction in robotics. Yet ToM…
2026-04-14T04:00:00Z
-
The Geometry of Knowing: From Possibilistic Ignorance to Probabilistic Certainty -- A Measure-Theoretic Framework for Epistemic Convergence
arXiv:2604.09614v1 Announce Type: new Abstract: This paper develops a measure-theoretic framework establishing when and how a possibilistic representation of incomplete knowledge contracts into a probabilistic representa…
2026-04-14T04:00:00Z
-
AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation
arXiv:2604.09617v1 Announce Type: new Abstract: Transparent and standardized documentation is essential for building trustworthy generative AI (GAI) systems. However, existing automated methods for generating model and d…
2026-04-14T04:00:00Z
-
Competing with AI Scientists: Agent-Driven Approach to Astrophysics Research
arXiv:2604.09621v1 Announce Type: new Abstract: We present an agent-driven approach to the construction of parameter inference pipelines for scientific data analysis. Our method leverages a multi-agent system, Cmbagent (…
2026-04-14T04:00:00Z
-
How LLMs Might Think
arXiv:2604.09674v1 Announce Type: new Abstract: Do large language models (LLMs) think? Daniel Stoljar and Zhihe Vincent Zhang have recently developed an argument from rationality for the claim that LLMs do not think. We …
2026-04-14T04:00:00Z
-
Belief-Aware VLM Model for Human-like Reasoning
arXiv:2604.09686v1 Announce Type: new Abstract: Traditional neural network models for intent inference rely heavily on observable states and struggle to generalize across diverse tasks and dynamic environments. Recent ad…
2026-04-14T04:00:00Z
-
Tipiano: Cascaded Piano Hand Motion Synthesis via Fingertip Priors
arXiv:2604.09692v1 Announce Type: new Abstract: Synthesizing realistic piano hand motions requires both precision and naturalness. Physics-based methods achieve precision but produce stiff motions; data-driven models lea…
2026-04-14T04:00:00Z
-
The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise
arXiv:2604.09780v1 Announce Type: new Abstract: Mixture of Experts (MoEs) are now ubiquitous in large language models, yet the mechanisms behind their "expert specialization" remain poorly understood. We show that, since…
2026-04-14T04:00:00Z
-
Pioneer Agent: Continual Improvement of Small Language Models in Production
arXiv:2604.09791v1 Announce Type: new Abstract: Small language models are attractive for production deployment due to their low cost, fast inference, and ease of specialization. However, adapting them to a specific task …
2026-04-14T04:00:00Z
-
Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
arXiv:2604.09813v1 Announce Type: new Abstract: Existing synthetic tool-use corpora are primarily designed for offline supervised fine-tuning, yet reinforcement learning (RL) requires executable environments that support…
2026-04-14T04:00:00Z
-
EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning
arXiv:2604.09815v1 Announce Type: new Abstract: Computer-use agents that combine GUI interaction with structured API calls via the Model Context Protocol (MCP) show promise for automating software tasks. However, existin…
2026-04-14T04:00:00Z
-
COMPOSITE-Stem
arXiv:2604.09836v1 Announce Type: new Abstract: AI agents hold growing promise for accelerating scientific discovery; yet, a lack of frontier evaluations hinders adoption into real workflows. Expert-written benchmarks ha…
2026-04-14T04:00:00Z
-
Steered LLM Activations are Non-Surjective
arXiv:2604.09839v1 Announce Type: new Abstract: Activation steering is a popular white-box control technique that modifies model activations to elicit an abstract change in output behavior. It has also become a standard …
2026-04-14T04:00:00Z
-
MEMENTO: Teaching LLMs to Manage Their Own Context
arXiv:2604.09852v1 Announce Type: new Abstract: Reasoning models think in long, unstructured streams with no mechanism for compressing or organizing their own intermediate state. We introduce MEMENTO: a method that teach…
2026-04-14T04:00:00Z
-
Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
arXiv:2604.09855v1 Announce Type: new Abstract: The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of …
2026-04-14T04:00:00Z
-
Evolutionary Token-Level Prompt Optimization for Diffusion Models
arXiv:2604.09861v1 Announce Type: new Abstract: Text-to-image diffusion models exhibit strong generative performance but remain highly sensitive to prompt formulation, often requiring extensive manual trial and error to …
2026-04-14T04:00:00Z
-
What do your logits know? (The answer may surprise you!)
arXiv:2604.09885v1 Announce Type: new Abstract: Recent work has shown that probing model internals can reveal a wealth of information not apparent from the model generations. This poses the risk of unintentional or malic…
2026-04-14T04:00:00Z