paper-system/arxiv-processor/20250123-papers.json

[
  {
    "title": "Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained\n  Scheduling",
    "abstract": "Effective multi-user delay-constrained scheduling is crucial in various\nreal-world applications, such as instant messaging, live streaming, and data\ncenter management. In these scenarios, schedulers must make real-time decisions\nto satisfy both delay and resource constraints without prior knowledge of\nsystem dynamics, which are often time-varying and challenging to estimate.\nCurrent learning-based methods typically require interactions with actual\nsystems during the training stage, which can be difficult or impractical, as it\nis capable of significantly degrading system performance and incurring\nsubstantial service costs. To address these challenges, we propose a novel\noffline reinforcement learning-based algorithm, named \\underline{S}cheduling By\n\\underline{O}ffline Learning with \\underline{C}ritic Guidance and\n\\underline{D}iffusion Generation (SOCD), to learn efficient scheduling policies\npurely from pre-collected \\emph{offline data}. SOCD innovatively employs a\ndiffusion-based policy network, complemented by a sampling-free critic network\nfor policy guidance. By integrating the Lagrangian multiplier optimization into\nthe offline reinforcement learning, SOCD effectively trains high-quality\nconstraint-aware policies exclusively from available datasets, eliminating the\nneed for online interactions with the system. Experimental results demonstrate\nthat SOCD is resilient to various system dynamics, including partially\nobservable and large-scale environments, and delivers superior performance\ncompared to existing methods.",
    "arxiv_id": "2501.12942v1"
  },
  {
    "title": "Evolution and The Knightian Blindspot of Machine Learning",
    "abstract": "This paper claims that machine learning (ML) largely overlooks an important\nfacet of general intelligence: robustness to a qualitatively unknown future in\nan open world. Such robustness relates to Knightian uncertainty (KU) in\neconomics, i.e. uncertainty that cannot be quantified, which is excluded from\nconsideration in ML's key formalisms. This paper aims to identify this blind\nspot, argue its importance, and catalyze research into addressing it, which we\nbelieve is necessary to create truly robust open-world AI. To help illuminate\nthe blind spot, we contrast one area of ML, reinforcement learning (RL), with\nthe process of biological evolution. Despite staggering ongoing progress, RL\nstill struggles in open-world situations, often failing under unforeseen\nsituations. For example, the idea of zero-shot transferring a self-driving car\npolicy trained only in the US to the UK currently seems exceedingly ambitious.\nIn dramatic contrast, biological evolution routinely produces agents that\nthrive within an open world, sometimes even to situations that are remarkably\nout-of-distribution (e.g. invasive species; or humans, who do undertake such\nzero-shot international driving). Interestingly, evolution achieves such\nrobustness without explicit theory, formalisms, or mathematical gradients. We\nexplore the assumptions underlying RL's typical formalisms, showing how they\nlimit RL's engagement with the unknown unknowns characteristic of an\never-changing complex world. Further, we identify mechanisms through which\nevolutionary processes foster robustness to novel and unpredictable challenges,\nand discuss potential pathways to algorithmically embody them. The conclusion\nis that the intriguing remaining fragility of ML may result from blind spots in\nits formalisms, and that significant gains may result from direct confrontation\nwith the challenge of KU.",
    "arxiv_id": "2501.13075v1"
  },
  {
    "title": "Boosting MCTS with Free Energy Minimization",
    "abstract": "Active Inference, grounded in the Free Energy Principle, provides a powerful\nlens for understanding how agents balance exploration and goal-directed\nbehavior in uncertain environments. Here, we propose a new planning framework,\nthat integrates Monte Carlo Tree Search (MCTS) with active inference objectives\nto systematically reduce epistemic uncertainty while pursuing extrinsic\nrewards. Our key insight is that MCTS already renowned for its search\nefficiency can be naturally extended to incorporate free energy minimization by\nblending expected rewards with information gain. Concretely, the Cross-Entropy\nMethod (CEM) is used to optimize action proposals at the root node, while tree\nexpansions leverage reward modeling alongside intrinsic exploration bonuses.\nThis synergy allows our planner to maintain coherent estimates of value and\nuncertainty throughout planning, without sacrificing computational\ntractability. Empirically, we benchmark our planner on a diverse set of\ncontinuous control tasks, where it demonstrates performance gains over both\nstandalone CEM and MCTS with random rollouts.",
    "arxiv_id": "2501.13083v1"
  },
  {
    "title": "A Unified Invariant Learning Framework for Graph Classification",
    "abstract": "Invariant learning demonstrates substantial potential for enhancing the\ngeneralization of graph neural networks (GNNs) with out-of-distribution (OOD)\ndata. It aims to recognize stable features in graph data for classification,\nbased on the premise that these features causally determine the target label,\nand their influence is invariant to changes in distribution. Along this line,\nmost studies have attempted to pinpoint these stable features by emphasizing\nexplicit substructures in the graph, such as masked or attentive subgraphs, and\nprimarily enforcing the invariance principle in the semantic space, i.e., graph\nrepresentations. However, we argue that focusing only on the semantic space may\nnot accurately identify these stable features. To address this, we introduce\nthe Unified Invariant Learning (UIL) framework for graph classification. It\nprovides a unified perspective on invariant graph learning, emphasizing both\nstructural and semantic invariance principles to identify more robust stable\nfeatures. In the graph space, UIL adheres to the structural invariance\nprinciple by reducing the distance between graphons over a set of stable\nfeatures across different environments. Simultaneously, to confirm semantic\ninvariance, UIL underscores that the acquired graph representations should\ndemonstrate exemplary performance across diverse environments. We present both\ntheoretical and empirical evidence to confirm our method's ability to recognize\nsuperior stable features. Moreover, through a series of comprehensive\nexperiments complemented by in-depth analyses, we demonstrate that UIL\nconsiderably enhances OOD generalization, surpassing the performance of leading\nbaseline methods. Our codes are available at https://github.com/yongduosui/UIL.",
    "arxiv_id": "2501.12595v1"
  },
  {
    "title": "Kimi k1.5: Scaling Reinforcement Learning with LLMs",
    "abstract": "Language model pretraining with next token prediction has proved effective\nfor scaling compute but is limited to the amount of available training data.\nScaling reinforcement learning (RL) unlocks a new axis for the continued\nimprovement of artificial intelligence, with the promise that large language\nmodels (LLMs) can scale their training data by learning to explore with\nrewards. However, prior published work has not produced competitive results. In\nlight of this, we report on the training practice of Kimi k1.5, our latest\nmulti-modal LLM trained with RL, including its RL training techniques,\nmulti-modal data recipes, and infrastructure optimization. Long context scaling\nand improved policy optimization methods are key ingredients of our approach,\nwhich establishes a simplistic, effective RL framework without relying on more\ncomplex techniques such as Monte Carlo tree search, value functions, and\nprocess reward models. Notably, our system achieves state-of-the-art reasoning\nperformance across multiple benchmarks and modalities -- e.g., 77.5 on AIME,\n96.2 on MATH 500, 94-th percentile on Codeforces, 74.9 on MathVista -- matching\nOpenAI's o1. Moreover, we present effective long2short methods that use\nlong-CoT techniques to improve short-CoT models, yielding state-of-the-art\nshort-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on\nLiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and\nClaude Sonnet 3.5 by a large margin (up to +550%).",
    "arxiv_id": "2501.12599v1"
  },
  {
    "title": "GATE: Adaptive Learning with Working Memory by Information Gating in\n  Multi-lamellar Hippocampal Formation",
    "abstract": "Hippocampal formation (HF) can rapidly adapt to varied environments and build\nflexible working memory (WM). To mirror the HF's mechanism on generalization\nand WM, we propose a model named Generalization and Associative Temporary\nEncoding (GATE), which deploys a 3-D multi-lamellar dorsoventral (DV)\narchitecture, and learns to build up internally representation from externally\ndriven information layer-wisely. In each lamella, regions of HF:\nEC3-CA1-EC5-EC3 forms a re-entrant loop that discriminately maintains\ninformation by EC3 persistent activity, and selectively readouts the retained\ninformation by CA1 neurons. CA3 and EC5 further provides gating function that\ncontrols these processes. After learning complex WM tasks, GATE forms neuron\nrepresentations that align with experimental records, including splitter, lap,\nevidence, trace, delay-active cells, as well as conventional place cells.\nCrucially, DV architecture in GATE also captures information, range from\ndetailed to abstract, which enables a rapid generalization ability when cue,\nenvironment or task changes, with learned representations inherited. GATE\npromises a viable framework for understanding the HF's flexible memory\nmechanisms and for progressively developing brain-inspired intelligent systems.",
    "arxiv_id": "2501.12615v1"
  },
  {
    "title": "Deep Learning-Based Identification of Inconsistent Method Names: How Far\n  Are We?",
    "abstract": "Concise and meaningful method names are crucial for program comprehension and\nmaintenance. However, method names may become inconsistent with their\ncorresponding implementations, causing confusion and errors. Several deep\nlearning (DL)-based approaches have been proposed to identify such\ninconsistencies, with initial evaluations showing promising results. However,\nthese evaluations typically use a balanced dataset, where the number of\ninconsistent and consistent names are equal. This setup, along with flawed\ndataset construction, leads to false positives, making reported performance\nless reliable in real-world scenarios, where most method names are consistent.\nIn this paper, we present an empirical study that evaluates state-of-the-art\nDL-based methods for identifying inconsistent method names. We create a new\nbenchmark by combining automatic identification from commit histories and\nmanual developer inspections, reducing false positives. We evaluate five\nrepresentative DL approaches (one retrieval-based and four generation-based) on\nthis benchmark. Our results show that performance drops substantially when\nmoving from the balanced dataset to the new benchmark. We further conduct\nquantitative and qualitative analyses to understand the strengths and\nweaknesses of the approaches. Retrieval-based methods perform well on simple\nmethods and those with popular name sub-tokens but fail due to inefficient\nrepresentation techniques. Generation-based methods struggle with inaccurate\nsimilarity calculations and immature name generation. Based on these findings,\nwe propose improvements using contrastive learning and large language models\n(LLMs). Our study suggests that significant improvements are needed before\nthese DL approaches can be effectively applied to real-world software systems.",
    "arxiv_id": "2501.12617v1"
  },
  {
    "title": "Adaptive Data Exploitation in Deep Reinforcement Learning",
    "abstract": "We introduce ADEPT: Adaptive Data ExPloiTation, a simple yet powerful\nframework to enhance the **data efficiency** and **generalization** in deep\nreinforcement learning (RL). Specifically, ADEPT adaptively manages the use of\nsampled data across different learning stages via multi-armed bandit (MAB)\nalgorithms, optimizing data utilization while mitigating overfitting. Moreover,\nADEPT can significantly reduce the computational overhead and accelerate a wide\nrange of RL algorithms. We test ADEPT on benchmarks including Procgen,\nMiniGrid, and PyBullet. Extensive simulation demonstrates that ADEPT can\nachieve superior performance with remarkable computational efficiency, offering\na practical solution to data-efficient RL. Our code is available at\nhttps://github.com/yuanmingqi/ADEPT.",
    "arxiv_id": "2501.12620v1"
  },
  {
    "title": "Towards Robust Multi-tab Website Fingerprinting",
    "abstract": "Website fingerprinting enables an eavesdropper to determine which websites a\nuser is visiting over an encrypted connection. State-of-the-art website\nfingerprinting (WF) attacks have demonstrated effectiveness even against\nTor-protected network traffic. However, existing WF attacks have critical\nlimitations on accurately identifying websites in multi-tab browsing sessions,\nwhere the holistic pattern of individual websites is no longer preserved, and\nthe number of tabs opened by a client is unknown a priori. In this paper, we\npropose ARES, a novel WF framework natively designed for multi-tab WF attacks.\nARES formulates the multi-tab attack as a multi-label classification problem\nand solves it using the novel Transformer-based models. Specifically, ARES\nextracts local patterns based on multi-level traffic aggregation features and\nutilizes the improved self-attention mechanism to analyze the correlations\nbetween these local patterns, effectively identifying websites. We implement a\nprototype of ARES and extensively evaluate its effectiveness using our\nlarge-scale datasets collected over multiple months. The experimental results\nillustrate that ARES achieves optimal performance in several realistic\nscenarios. Further, ARES remains robust even against various WF defenses.",
    "arxiv_id": "2501.12622v1"
  },
  {
    "title": "Inverse Reinforcement Learning with Switching Rewards and History\n  Dependency for Characterizing Animal Behaviors",
    "abstract": "Traditional approaches to studying decision-making in neuroscience focus on\nsimplified behavioral tasks where animals perform repetitive, stereotyped\nactions to receive explicit rewards. While informative, these methods constrain\nour understanding of decision-making to short timescale behaviors driven by\nexplicit goals. In natural environments, animals exhibit more complex,\nlong-term behaviors driven by intrinsic motivations that are often\nunobservable. Recent works in time-varying inverse reinforcement learning (IRL)\naim to capture shifting motivations in long-term, freely moving behaviors.\nHowever, a crucial challenge remains: animals make decisions based on their\nhistory, not just their current state. To address this, we introduce SWIRL\n(SWitching IRL), a novel framework that extends traditional IRL by\nincorporating time-varying, history-dependent reward functions. SWIRL models\nlong behavioral sequences as transitions between short-term decision-making\nprocesses, each governed by a unique reward function. SWIRL incorporates\nbiologically plausible history dependency to capture how past decisions and\nenvironmental contexts shape behavior, offering a more accurate description of\nanimal decision-making. We apply SWIRL to simulated and real-world animal\nbehavior datasets and show that it outperforms models lacking history\ndependency, both quantitatively and qualitatively. This work presents the first\nIRL model to incorporate history-dependent policies and rewards to advance our\nunderstanding of complex, naturalistic decision-making in animals.",
    "arxiv_id": "2501.12633v1"
  },
  {
    "title": "Dynamics of Toxicity in Political Podcasts",
    "abstract": "Toxicity in digital media poses significant challenges, yet little attention\nhas been given to its dynamics within the rapidly growing medium of podcasts.\nThis paper addresses this gap by analyzing political podcast data to study the\nemergence and propagation of toxicity, focusing on conversation\nchains-structured reply patterns within podcast transcripts. Leveraging\nstate-of-the-art transcription models and advanced conversational analysis\ntechniques, we systematically examine toxic discourse in over 30 popular\npolitical podcasts in the United States. Our key contributions include: (1)\ncreating a comprehensive dataset of transcribed and diarized political\npodcasts, identifying thousands of toxic instances using Google's Perspective\nAPI, (2) uncovering concerning trends where a majority of episodes contain at\nleast one toxic instance, (3) introducing toxic conversation chains and\nanalyzing their structural and linguistic properties, revealing characteristics\nsuch as longer durations, repetitive patterns, figurative language, and\nemotional cues tied to anger and annoyance, (4) identifying demand-related\nwords like 'want', 'like', and 'know' as precursors to toxicity, and (5)\ndeveloping predictive models to anticipate toxicity shifts based on annotated\nchange points. Our findings provide critical insights into podcast toxicity and\nestablish a foundation for future research on real-time monitoring and\nintervention mechanisms to foster healthier discourse in this influential\nmedium.",
    "arxiv_id": "2501.12640v1"
  },
  {
    "title": "The potential -- and the pitfalls -- of using pre-trained language\n  models as cognitive science theories",
    "abstract": "Many studies have evaluated the cognitive alignment of Pre-trained Language\nModels (PLMs), i.e., their correspondence to adult performance across a range\nof cognitive domains. Recently, the focus has expanded to the developmental\nalignment of these models: identifying phases during training where\nimprovements in model performance track improvements in children's thinking\nover development. However, there are many challenges to the use of PLMs as\ncognitive science theories, including different architectures, different\ntraining data modalities and scales, and limited model interpretability. In\nthis paper, we distill lessons learned from treating PLMs, not as engineering\nartifacts but as cognitive science and developmental science models. We review\nassumptions used by researchers to map measures of PLM performance to measures\nof human performance. We identify potential pitfalls of this approach to\nunderstanding human thinking, and we end by enumerating criteria for using PLMs\nas credible accounts of cognition and cognitive development.",
    "arxiv_id": "2501.12651v1"
  },
  {
    "title": "NBDI: A Simple and Efficient Termination Condition for Skill Extraction\n  from Task-Agnostic Demonstrations",
    "abstract": "Intelligent agents are able to make decisions based on different levels of\ngranularity and duration. Recent advances in skill learning enabled the agent\nto solve complex, long-horizon tasks by effectively guiding the agent in\nchoosing appropriate skills. However, the practice of using fixed-length skills\ncan easily result in skipping valuable decision points, which ultimately limits\nthe potential for further exploration and faster policy learning. In this work,\nwe propose to learn a simple and efficient termination condition that\nidentifies decision points through a state-action novelty module that leverages\nagent experience data. Our approach, Novelty-based Decision Point\nIdentification (NBDI), outperforms previous baselines in complex, long-horizon\ntasks, and remains effective even in the presence of significant variations in\nthe environment configurations of downstream tasks, highlighting the importance\nof decision point identification in skill learning.",
    "arxiv_id": "2501.12668v1"
  },
  {
    "title": "Growth strategies for arbitrary DAG neural architectures",
    "abstract": "Deep learning has shown impressive results obtained at the cost of training\nhuge neural networks. However, the larger the architecture, the higher the\ncomputational, financial, and environmental costs during training and\ninference. We aim at reducing both training and inference durations. We focus\non Neural Architecture Growth, which can increase the size of a small model\nwhen needed, directly during training using information from the\nbackpropagation. We expand existing work and freely grow neural networks in the\nform of any Directed Acyclic Graph by reducing expressivity bottlenecks in the\narchitecture. We explore strategies to reduce excessive computations and steer\nnetwork growth toward more parameter-efficient architectures.",
    "arxiv_id": "2501.12690v1"
  },
  {
    "title": "EvidenceMap: Unleashing the Power of Small Language Models with Evidence\n  Analysis for Biomedical Question Answering",
    "abstract": "Current LLM-based approaches improve question answering performance by\nleveraging the internal reasoning abilities of models or incorporating external\nknowledge. However, when humans address professional problems, it is essential\nto explicitly analyze the multifaceted relationships from multiple pieces and\ndiverse sources of evidence to achieve better answers. In this study, we\npropose a novel generative question answering framework for the biomedical\ndomain, named EvidenceMap, which explicitly learns and incorporates evidence\nanalysis with small language models (SLMs). The framework describes an evidence\nmap for each question and fully utilizes an SLM to derive the representation of\nthe supportive evaluation, the logical correlation, and the summarization of\nthe related evidence, which facilitates an analysis-augmented generation with\nanother SLM in an autoregressive way. Extensive experiments have shown that\nintroducing an evidence analysis learning process can significantly outperform\nlarger models and popular LLM reasoning methods.",
    "arxiv_id": "2501.12746v1"
  },
  {
    "title": "NExtLong: Toward Effective Long-Context Training without Long Documents",
    "abstract": "Large language models (LLMs) with extended context windows have made\nsignificant strides yet remain a challenge due to the scarcity of long\ndocuments. Existing methods tend to synthesize long-context data but lack a\nclear mechanism to reinforce the long-range dependency modeling. To address\nthis limitation, we propose NExtLong, a novel framework for synthesizing\nlong-context data through Negative document Extension. NExtLong decomposes a\ndocument into multiple meta-chunks and extends the context by interleaving hard\nnegative distractors retrieved from pretraining corpora. This approach compels\nthe model to discriminate long-range dependent context from distracting\ncontent, enhancing its ability to model long-range dependencies. Extensive\nexperiments demonstrate that NExtLong achieves significant performance\nimprovements on the HELMET and RULER benchmarks compared to existing\nlong-context synthesis approaches and leading models, which are trained on\nnon-synthetic long documents. These findings highlight NExtLong's ability to\nreduce reliance on non-synthetic long documents, making it an effective\nframework for developing advanced long-context LLMs.",
    "arxiv_id": "2501.12766v1"
  },
  {
    "title": "Revisit Self-Debugging with Self-Generated Tests for Code Generation",
    "abstract": "Large language models (LLMs) have shown significant advancements in code\ngeneration, but still face challenges on tasks beyond their basic capabilities.\nRecently, the notion of self-debugging has been proposed to boost the\nperformance of code generation by leveraging execution feedback from tests.\nDespite its promise, the availability of high-quality tests in real-world\nscenarios is limited. In this context, self-debugging with self-generated tests\nis a promising solution but lacks a full exploration of its limitations and\npractical potential. Therefore, we investigate its efficacy on diverse\nprogramming problems. To deepen our understanding, we propose two distinct\nparadigms for the process: post-execution and in-execution self-debugging.\nWithin the scope of self-contained Python programming tasks, we find that\npost-execution self-debugging struggles on basic problems but shows potential\nfor improvement on competitive ones, due to the bias introduced by\nself-generated tests. On the other hand, in-execution self-debugging enables\nLLMs to mitigate the bias by solely leveraging intermediate states during\nexecution, thereby enhancing code generation.",
    "arxiv_id": "2501.12793v1"
  },
  {
    "title": "Unveiling Zero-Space Detection: A Novel Framework for Autonomous\n  Ransomware Identification in High-Velocity Environments",
    "abstract": "Modern cybersecurity landscapes increasingly demand sophisticated detection\nframeworks capable of identifying evolving threats with precision and\nadaptability. The proposed Zero-Space Detection framework introduces a novel\napproach that dynamically identifies latent behavioral patterns through\nunsupervised clustering and advanced deep learning techniques. Designed to\naddress the limitations of signature-based and heuristic methods, it operates\neffectively in high-velocity environments by integrating multi-phase filtering\nand ensemble learning for refined decision-making. Experimental evaluation\nreveals high detection rates across diverse ransomware families, including\nLockBit, Conti, REvil, and BlackMatter, while maintaining low false positive\nrates and scalable performance. Computational overhead remains minimal, with\naverage processing times ensuring compatibility with real-time systems even\nunder peak operational loads. The framework demonstrates resilience against\nadversarial strategies such as obfuscation and encryption speed variability,\nwhich frequently challenge conventional detection systems. Analysis across\nmultiple data sources highlights its versatility in handling diverse file types\nand operational contexts. Comprehensive metrics, including detection\nprobability, latency, and resource efficiency, validate its efficacy under\nreal-world conditions. Through its modular architecture, the framework achieves\nseamless integration with existing cybersecurity infrastructures without\nsignificant reconfiguration. The results demonstrate its robustness and\nscalability, offering a transformative paradigm for ransomware identification\nin dynamic and resource-constrained environments.",
    "arxiv_id": "2501.12811v1"
  },
  {
    "title": "To Measure or Not: A Cost-Sensitive, Selective Measuring Environment for\n  Agricultural Management Decisions with Reinforcement Learning",
    "abstract": "Farmers rely on in-field observations to make well-informed crop management\ndecisions to maximize profit and minimize adverse environmental impact.\nHowever, obtaining real-world crop state measurements is labor-intensive,\ntime-consuming and expensive. In most cases, it is not feasible to gather crop\nstate measurements before every decision moment. Moreover, in previous research\npertaining to farm management optimization, these observations are often\nassumed to be readily available without any cost, which is unrealistic. Hence,\nenabling optimization without the need to have temporally complete crop state\nobservations is important. An approach to that problem is to include measuring\nas part of decision making. As a solution, we apply reinforcement learning (RL)\nto recommend opportune moments to simultaneously measure crop features and\napply nitrogen fertilizer. With realistic considerations, we design an RL\nenvironment with explicit crop feature measuring costs. While balancing costs,\nwe find that an RL agent, trained with recurrent PPO, discovers adaptive\nmeasuring policies that follow critical crop development stages, with results\naligned by what domain experts would consider a sensible approach. Our results\nhighlight the importance of measuring when crop feature measurements are not\nreadily available.",
    "arxiv_id": "2501.12823v1"
  },
  {
    "title": "GAMED-Snake: Gradient-aware Adaptive Momentum Evolution Deep Snake Model\n  for Multi-organ Segmentation",
    "abstract": "Multi-organ segmentation is a critical yet challenging task due to complex\nanatomical backgrounds, blurred boundaries, and diverse morphologies. This\nstudy introduces the Gradient-aware Adaptive Momentum Evolution Deep Snake\n(GAMED-Snake) model, which establishes a novel paradigm for contour-based\nsegmentation by integrating gradient-based learning with adaptive momentum\nevolution mechanisms. The GAMED-Snake model incorporates three major\ninnovations: First, the Distance Energy Map Prior (DEMP) generates a\npixel-level force field that effectively attracts contour points towards the\ntrue boundaries, even in scenarios with complex backgrounds and blurred edges.\nSecond, the Differential Convolution Inception Module (DCIM) precisely extracts\ncomprehensive energy gradients, significantly enhancing segmentation accuracy.\nThird, the Adaptive Momentum Evolution Mechanism (AMEM) employs cross-attention\nto establish dynamic features across different iterations of evolution,\nenabling precise boundary alignment for diverse morphologies. Experimental\nresults on four challenging multi-organ segmentation datasets demonstrate that\nGAMED-Snake improves the mDice metric by approximately 2% compared to\nstate-of-the-art methods. Code will be available at\nhttps://github.com/SYSUzrc/GAMED-Snake.",
    "arxiv_id": "2501.12844v1"
  },
  {
    "title": "As Confidence Aligns: Exploring the Effect of AI Confidence on Human\n  Self-confidence in Human-AI Decision Making",
    "abstract": "Complementary collaboration between humans and AI is essential for human-AI\ndecision making. One feasible approach to achieving it involves accounting for\nthe calibrated confidence levels of both AI and users. However, this process\nwould likely be made more difficult by the fact that AI confidence may\ninfluence users' self-confidence and its calibration. To explore these\ndynamics, we conducted a randomized behavioral experiment. Our results indicate\nthat in human-AI decision-making, users' self-confidence aligns with AI\nconfidence and such alignment can persist even after AI ceases to be involved.\nThis alignment then affects users' self-confidence calibration. We also found\nthe presence of real-time correctness feedback of decisions reduced the degree\nof alignment. These findings suggest that users' self-confidence is not\nindependent of AI confidence, which practitioners aiming to achieve better\nhuman-AI collaboration need to be aware of. We call for research focusing on\nthe alignment of human cognition and behavior with AI.",
    "arxiv_id": "2501.12868v1"
  },
  {
    "title": "Drone Carrier: An Integrated Unmanned Surface Vehicle for Autonomous\n  Inspection and Intervention in GNSS-Denied Maritime Environment",
    "abstract": "This paper introduces an innovative drone carrier concept that is applied in\nmaritime port security or offshore rescue. This system works with a\nheterogeneous system consisting of multiple Unmanned Aerial Vehicles (UAVs) and\nUnmanned Surface Vehicles (USVs) to perform inspection and intervention tasks\nin GNSS-denied or interrupted environments. The carrier, an electric catamaran\nmeasuring 4m by 7m, features a 4m by 6m deck supporting automated takeoff and\nlanding for four DJI M300 drones, along with a 10kg-payload manipulator\noperable in up to level 3 sea conditions. Utilizing an offshore gimbal camera\nfor navigation, the carrier can autonomously navigate, approach and dock with\nnon-cooperative vessels, guided by an onboard camera, LiDAR, and Doppler\nVelocity Log (DVL) over a 3 km$^2$ area. UAVs equipped with onboard\nUltra-Wideband (UWB) technology execute mapping, detection, and manipulation\ntasks using a versatile gripper designed for wet, saline conditions.\nAdditionally, two UAVs can coordinate to transport large objects to the\nmanipulator or interact directly with them. These procedures are fully\nautomated and were successfully demonstrated at the Mohammed Bin Zayed\nInternational Robotic Competition (MBZIRC2024), where the drone carrier\nequipped with four UAVS and one manipulator, automatically accomplished the\nintervention tasks in sea-level-3 (wave height 1.25m) based on the rough target\ninformation.",
    "arxiv_id": "2501.12869v1"
  },
  {
    "title": "Reinforcement learning Based Automated Design of Differential Evolution\n  Algorithm for Black-box Optimization",
    "abstract": "Differential evolution (DE) algorithm is recognized as one of the most\neffective evolutionary algorithms, demonstrating remarkable efficacy in\nblack-box optimization due to its derivative-free nature. Numerous enhancements\nto the fundamental DE have been proposed, incorporating innovative mutation\nstrategies and sophisticated parameter tuning techniques to improve\nperformance. However, no single variant has proven universally superior across\nall problems. To address this challenge, we introduce a novel framework that\nemploys reinforcement learning (RL) to automatically design DE for black-box\noptimization through meta-learning. RL acts as an advanced meta-optimizer,\ngenerating a customized DE configuration that includes an optimal\ninitialization strategy, update rule, and hyperparameters tailored to a\nspecific black-box optimization problem. This process is informed by a detailed\nanalysis of the problem characteristics. In this proof-of-concept study, we\nutilize a double deep Q-network for implementation, considering a subset of 40\npossible strategy combinations and parameter optimizations simultaneously. The\nframework's performance is evaluated against black-box optimization benchmarks\nand compared with state-of-the-art algorithms. The experimental results\nhighlight the promising potential of our proposed framework.",
    "arxiv_id": "2501.12881v1"
  },
  {
    "title": "Learning Graph Node Embeddings by Smooth Pair Sampling",
    "abstract": "Random walk-based node embedding algorithms have attracted a lot of attention\ndue to their scalability and ease of implementation. Previous research has\nfocused on different walk strategies, optimization objectives, and embedding\nlearning models. Inspired by observations on real data, we take a different\napproach and propose a new regularization technique. More precisely, the\nfrequencies of node pairs generated by the skip-gram model on random walk node\nsequences follow a highly skewed distribution which causes learning to be\ndominated by a fraction of the pairs. We address the issue by designing an\nefficient sampling procedure that generates node pairs according to their {\\em\nsmoothed frequency}. Theoretical and experimental results demonstrate the\nadvantages of our approach.",
    "arxiv_id": "2501.12884v1"
  },
  {
    "title": "Architectural Fusion Through Contextual Partitioning in Large Language\n  Models: A Novel Approach to Parameterized Knowledge Integration",
    "abstract": "Contextual Partitioning introduces an innovative approach to enhancing the\narchitectural design of large-scale computational models through the dynamic\nsegmentation of parameters into context-aware regions. This methodology\nemphasizes the importance of task-specific specialization, achieved through\nadaptive parameter allocation mechanisms that align with the linguistic\nfeatures of input data. Experimental evaluations demonstrated substantial\nimprovements in accuracy, perplexity, and contextual coherence across a variety\nof linguistic tasks, highlighting the adaptability and scalability of the\nproposed framework. By reducing redundancy and enhancing computational\nefficiency, Contextual Partitioning not only streamlines model operations but\nalso expands the scope of applications for advanced language processing\nsystems. The approach operates autonomously, requiring no external fine-tuning,\nthereby addressing a significant limitation in conventional parameter\noptimization techniques. Empirical results demonstrate the effectiveness of\ngradient-driven segmentation, enabling models to dynamically recalibrate and\nspecialize in response to task-specific demands. Furthermore, resource\nutilization metrics reveal notable reductions in memory usage and training\ntimes, confirming the efficiency of the approach. Observations from qualitative\nanalyses illustrate improved contextual coherence and logical flow in generated\noutputs, reinforcing the practical value of this technique. The findings\ncollectively demonstrate the potential for Contextual Partitioning to redefine\nthe scalability and adaptability of computational language architectures in\ndiverse and complex domains.",
    "arxiv_id": "2501.12901v1"
  },
  {
    "title": "A Novel Tracking Framework for Devices in X-ray Leveraging Supplementary\n  Cue-Driven Self-Supervised Features",
    "abstract": "To restore proper blood flow in blocked coronary arteries via angioplasty\nprocedure, accurate placement of devices such as catheters, balloons, and\nstents under live fluoroscopy or diagnostic angiography is crucial. Identified\nballoon markers help in enhancing stent visibility in X-ray sequences, while\nthe catheter tip aids in precise navigation and co-registering vessel\nstructures, reducing the need for contrast in angiography. However, accurate\ndetection of these devices in interventional X-ray sequences faces significant\nchallenges, particularly due to occlusions from contrasted vessels and other\ndevices and distractions from surrounding, resulting in the failure to track\nsuch small objects. While most tracking methods rely on spatial correlation of\npast and current appearance, they often lack strong motion comprehension\nessential for navigating through these challenging conditions, and fail to\neffectively detect multiple instances in the scene. To overcome these\nlimitations, we propose a self-supervised learning approach that enhances its\nspatio-temporal understanding by incorporating supplementary cues and learning\nacross multiple representation spaces on a large dataset. Followed by that, we\nintroduce a generic real-time tracking framework that effectively leverages the\npretrained spatio-temporal network and also takes the historical appearance and\ntrajectory data into account. This results in enhanced localization of multiple\ninstances of device landmarks. Our method outperforms state-of-the-art methods\nin interventional X-ray device tracking, especially stability and robustness,\nachieving an 87% reduction in max error for balloon marker detection and a 61%\nreduction in max error for catheter tip detection.",
    "arxiv_id": "2501.12958v1"
  },
  {
    "title": "Accessible Smart Contracts Verification: Synthesizing Formal Models with\n  Tamed LLMs",
    "abstract": "When blockchain systems are said to be trustless, what this really means is\nthat all the trust is put into software. Thus, there are strong incentives to\nensure blockchain software is correct -- vulnerabilities here cost millions and\nbreak businesses. One of the most powerful ways of establishing software\ncorrectness is by using formal methods. Approaches based on formal methods,\nhowever, induce a significant overhead in terms of time and expertise required\nto successfully employ them. Our work addresses this critical disadvantage by\nautomating the creation of a formal model -- a mathematical abstraction of the\nsoftware system -- which is often a core task when employing formal methods. We\nperform model synthesis in three phases: we first transpile the code into model\nstubs; then we \"fill in the blanks\" using a large language model (LLM);\nfinally, we iteratively repair the generated model, on both syntactical and\nsemantical level. In this way, we significantly reduce the amount of time\nnecessary to create formal models and increase accessibility of valuable\nsoftware verification methods that rely on them. The practical context of our\nwork was reducing the time-to-value of using formal models for correctness\naudits of smart contracts.",
    "arxiv_id": "2501.12972v1"
  },
  {
    "title": "Ehrenfeucht-Haussler Rank and Chain of Thought",
    "abstract": "The notion of rank of a Boolean function has been a cornerstone in the theory\nof PAC learning, enabling quasipolynomial-time learning algorithms for\npolynomial-size decision trees. We present a novel characterization of rank,\ngrounded in the well-known Transformer architecture. We show that the rank of a\nfunction $f$ corresponds to the minimum number of Chain of Thought (CoT) steps\nrequired by a single-layer transformer decoder with hard attention to compute\n$f$. Based on this characterization we establish tight bounds on the number of\nCoT steps required for specific problems, showing that $\\ell$-fold function\ncomposition necessitates exactly $\\ell$ CoT steps. Furthermore, we analyze the\nproblem of identifying the position of the $k$-th occurrence of 1 in a Boolean\nsequence, proving that it requires $k$ CoT steps.",
    "arxiv_id": "2501.12997v1"
  },
  {
    "title": "MONA: Myopic Optimization with Non-myopic Approval Can Mitigate\n  Multi-step Reward Hacking",
    "abstract": "Future advanced AI systems may learn sophisticated strategies through\nreinforcement learning (RL) that humans cannot understand well enough to safely\nevaluate. We propose a training method which avoids agents learning undesired\nmulti-step plans that receive high reward (multi-step \"reward hacks\") even if\nhumans are not able to detect that the behaviour is undesired. The method,\nMyopic Optimization with Non-myopic Approval (MONA), works by combining\nshort-sighted optimization with far-sighted reward. We demonstrate that MONA\ncan prevent multi-step reward hacking that ordinary RL causes, even without\nbeing able to detect the reward hacking and without any extra information that\nordinary RL does not get access to. We study MONA empirically in three settings\nwhich model different misalignment failure modes including 2-step environments\nwith LLMs representing delegated oversight and encoded reasoning and\nlonger-horizon gridworld environments representing sensor tampering.",
    "arxiv_id": "2501.13011v1"
  },
  {
    "title": "Provably-Safe Neural Network Training Using Hybrid Zonotope Reachability\n  Analysis",
    "abstract": "Even though neural networks are being increasingly deployed in\nsafety-critical applications, it remains difficult to enforce constraints on\ntheir output, meaning that it is hard to guarantee safety in such settings.\nTowards addressing this, many existing methods seek to verify a neural\nnetwork's satisfaction of safety constraints, but do not address how to correct\nan \"unsafe\" network. On the other hand, the few works that extract a training\nsignal from verification cannot handle non-convex sets, and are either\nconservative or slow. To address these challenges, this work proposes a neural\nnetwork training method that can encourage the exact reachable set of a\nnon-convex input set through a neural network with rectified linear unit (ReLU)\nnonlinearities to avoid a non-convex unsafe region, using recent results in\nnon-convex set representation with hybrid zonotopes and extracting gradient\ninformation from mixed-integer linear programs (MILPs). The proposed method is\nfast, with the computational complexity of each training iteration comparable\nto that of solving a linear program (LP) with number of dimensions and\nconstraints linear to the number of neurons and complexity of input and unsafe\nsets. For a neural network with three hidden layers of width 30, the method was\nable to drive the reachable set of a non-convex input set with 55 generators\nand 26 constraints out of a non-convex unsafe region with 21 generators and 11\nconstraints in 490 seconds.",
    "arxiv_id": "2501.13023v1"
  },
  {
    "title": "AdaWM: Adaptive World Model based Planning for Autonomous Driving",
    "abstract": "World model based reinforcement learning (RL) has emerged as a promising\napproach for autonomous driving, which learns a latent dynamics model and uses\nit to train a planning policy. To speed up the learning process, the\npretrain-finetune paradigm is often used, where online RL is initialized by a\npretrained model and a policy learned offline. However, naively performing such\ninitialization in RL may result in dramatic performance degradation during the\nonline interactions in the new task. To tackle this challenge, we first analyze\nthe performance degradation and identify two primary root causes therein: the\nmismatch of the planning policy and the mismatch of the dynamics model, due to\ndistribution shift. We further analyze the effects of these factors on\nperformance degradation during finetuning, and our findings reveal that the\nchoice of finetuning strategies plays a pivotal role in mitigating these\neffects. We then introduce AdaWM, an Adaptive World Model based planning\nmethod, featuring two key steps: (a) mismatch identification, which quantifies\nthe mismatches and informs the finetuning strategy, and (b) alignment-driven\nfinetuning, which selectively updates either the policy or the model as needed\nusing efficient low-rank updates. Extensive experiments on the challenging\nCARLA driving tasks demonstrate that AdaWM significantly improves the\nfinetuning process, resulting in more robust and efficient performance in\nautonomous driving systems.",
    "arxiv_id": "2501.13072v1"
  },
  {
    "title": "Attention-Driven Hierarchical Reinforcement Learning with Particle\n  Filtering for Source Localization in Dynamic Fields",
    "abstract": "In many real-world scenarios, such as gas leak detection or environmental\npollutant tracking, solving the Inverse Source Localization and\nCharacterization problem involves navigating complex, dynamic fields with\nsparse and noisy observations. Traditional methods face significant challenges,\nincluding partial observability, temporal and spatial dynamics,\nout-of-distribution generalization, and reward sparsity. To address these\nissues, we propose a hierarchical framework that integrates Bayesian inference\nand reinforcement learning. The framework leverages an attention-enhanced\nparticle filtering mechanism for efficient and accurate belief updates, and\nincorporates two complementary execution strategies: Attention Particle\nFiltering Planning and Attention Particle Filtering Reinforcement Learning.\nThese approaches optimize exploration and adaptation under uncertainty.\nTheoretical analysis proves the convergence of the attention-enhanced particle\nfilter, while extensive experiments across diverse scenarios validate the\nframework's superior accuracy, adaptability, and computational efficiency. Our\nresults highlight the framework's potential for broad applications in dynamic\nfield estimation tasks.",
    "arxiv_id": "2501.13084v1"
  },
  {
    "title": "Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at\n  CHI through a Systematic Literature Review",
    "abstract": "Large language models (LLMs) have been positioned to revolutionize HCI, by\nreshaping not only the interfaces, design patterns, and sociotechnical systems\nthat we study, but also the research practices we use. To-date, however, there\nhas been little understanding of LLMs' uptake in HCI. We address this gap via a\nsystematic literature review of 153 CHI papers from 2020-24 that engage with\nLLMs. We taxonomize: (1) domains where LLMs are applied; (2) roles of LLMs in\nHCI projects; (3) contribution types; and (4) acknowledged limitations and\nrisks. We find LLM work in 10 diverse domains, primarily via empirical and\nartifact contributions. Authors use LLMs in five distinct roles, including as\nresearch tools or simulated users. Still, authors often raise validity and\nreproducibility concerns, and overwhelmingly study closed models. We outline\nopportunities to improve HCI research with and on LLMs, and provide guiding\nquestions for researchers to consider the validity and appropriateness of\nLLM-related work.",
    "arxiv_id": "2501.12557v1"
  },
  {
    "title": "FedGrAINS: Personalized SubGraph Federated Learning with Adaptive\n  Neighbor Sampling",
    "abstract": "Graphs are crucial for modeling relational and biological data. As datasets\ngrow larger in real-world scenarios, the risk of exposing sensitive information\nincreases, making privacy-preserving training methods like federated learning\n(FL) essential to ensure data security and compliance with privacy regulations.\nRecently proposed personalized subgraph FL methods have become the de-facto\nstandard for training personalized Graph Neural Networks (GNNs) in a federated\nmanner while dealing with the missing links across clients' subgraphs due to\nprivacy restrictions. However, personalized subgraph FL faces significant\nchallenges due to the heterogeneity in client subgraphs, such as degree\ndistributions among the nodes, which complicate federated training of graph\nmodels. To address these challenges, we propose \\textit{FedGrAINS}, a novel\ndata-adaptive and sampling-based regularization method for subgraph FL.\nFedGrAINS leverages generative flow networks (GFlowNets) to evaluate node\nimportance concerning clients' tasks, dynamically adjusting the message-passing\nstep in clients' GNNs. This adaptation reflects task-optimized sampling aligned\nwith a trajectory balance objective. Experimental results demonstrate that the\ninclusion of \\textit{FedGrAINS} as a regularizer consistently improves the FL\nperformance compared to baselines that do not leverage such regularization.",
    "arxiv_id": "2501.12592v1"
  },
  {
    "title": "HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal\n  Pipelined Architecture for Generalized Advantage Estimation",
    "abstract": "This paper introduces HEPPO, an FPGA-based accelerator designed to optimize\nthe Generalized Advantage Estimation (GAE) stage in Proximal Policy\nOptimization (PPO). Unlike previous approaches that focused on trajectory\ncollection and actor-critic updates, HEPPO addresses GAE's computational\ndemands with a parallel, pipelined architecture implemented on a single\nSystem-on-Chip (SoC). This design allows for the adaptation of various hardware\naccelerators tailored for different PPO phases. A key innovation is our\nstrategic standardization technique, which combines dynamic reward\nstandardization and block standardization for values, followed by 8-bit uniform\nquantization. This method stabilizes learning, enhances performance, and\nmanages memory bottlenecks, achieving a 4x reduction in memory usage and a 1.5x\nincrease in cumulative rewards. We propose a solution on a single SoC device\nwith programmable logic and embedded processors, delivering throughput orders\nof magnitude higher than traditional CPU-GPU systems. Our single-chip solution\nminimizes communication latency and throughput bottlenecks, significantly\nboosting PPO training efficiency. Experimental results show a 30% increase in\nPPO speed and a substantial reduction in memory access time, underscoring\nHEPPO's potential for broad applicability in hardware-efficient reinforcement\nlearning algorithms.",
    "arxiv_id": "2501.12703v1"
  },
  {
    "title": "Practical quantum federated learning and its experimental demonstration",
    "abstract": "Federated learning is essential for decentralized, privacy-preserving model\ntraining in the data-driven era. Quantum-enhanced federated learning leverages\nquantum resources to address privacy and scalability challenges, offering\nsecurity and efficiency advantages beyond classical methods. However, practical\nand scalable frameworks addressing privacy concerns in the quantum computing\nera remain undeveloped. Here, we propose a practical quantum federated learning\nframework on quantum networks, utilizing distributed quantum secret keys to\nprotect local model updates and enable secure aggregation with\ninformation-theoretic security. We experimentally validate our framework on a\n4-client quantum network with a scalable structure. Extensive numerical\nexperiments on both quantum and classical datasets show that adding a quantum\nclient significantly enhances the trained global model's ability to classify\nmultipartite entangled and non-stabilizer quantum datasets. Simulations further\ndemonstrate scalability to 200 clients with classical models trained on the\nMNIST dataset, reducing communication costs by $75\\%$ through advanced model\ncompression techniques and achieving rapid training convergence. Our work\nprovides critical insights for building scalable, efficient, and quantum-secure\nmachine learning systems for the coming quantum internet era.",
    "arxiv_id": "2501.12709v1"
  },
  {
    "title": "A Call for Critically Rethinking and Reforming Data Analysis in\n  Empirical Software Engineering",
    "abstract": "Context: Empirical Software Engineering (ESE) drives innovation in SE through\nqualitative and quantitative studies. However, concerns about the correct\napplication of empirical methodologies have existed since the 2006 Dagstuhl\nseminar on SE. Objective: To analyze three decades of SE research, identify\nmistakes in statistical methods, and evaluate experts' ability to detect and\naddress these issues. Methods: We conducted a literature survey of ~27,000\nempirical studies, using LLMs to classify statistical methodologies as adequate\nor inadequate. Additionally, we selected 30 primary studies and held a workshop\nwith 33 ESE experts to assess their ability to identify and resolve statistical\nissues. Results: Significant statistical issues were found in the primary\nstudies, and experts showed limited ability to detect and correct these\nmethodological problems, raising concerns about the broader ESE community's\nproficiency in this area. Conclusions. Despite our study's eventual\nlimitations, its results shed light on recurring issues from promoting\ninformation copy-and-paste from past authors' works and the continuous\npublication of inadequate approaches that promote dubious results and\njeopardize the spread of the correct statistical strategies among researchers.\nBesides, it justifies further investigation into empirical rigor in software\nengineering to expose these recurring issues and establish a framework for\nreassessing our field's foundation of statistical methodology application.\nTherefore, this work calls for critically rethinking and reforming data\nanalysis in empirical software engineering, paving the way for our work soon.",
    "arxiv_id": "2501.12728v1"
  },
  {
    "title": "Estimating the Conformal Prediction Threshold from Noisy Labels",
    "abstract": "Conformal Prediction (CP) is a method to control prediction uncertainty by\nproducing a small prediction set, ensuring a predetermined probability that the\ntrue class lies within this set. This is commonly done by defining a score,\nbased on the model predictions, and setting a threshold on this score using a\nvalidation set. In this study, we address the problem of CP calibration when we\nonly have access to a validation set with noisy labels. We show how we can\nestimate the noise-free conformal threshold based on the noisy labeled data.\nOur solution is flexible and can accommodate various modeling assumptions\nregarding the label contamination process, without needing any information\nabout the underlying data distribution or the internal mechanisms of the\nmachine learning classifier. We develop a coverage guarantee for uniform noise\nthat is effective even in tasks with a large number of classes. We dub our\napproach Noise-Aware Conformal Prediction (NACP) and show on several natural\nand medical image classification datasets, including ImageNet, that it\nsignificantly outperforms current noisy label methods and achieves results\ncomparable to those obtained with a clean validation set.",
    "arxiv_id": "2501.12749v1"
  },
  {
    "title": "On Tradeoffs in Learning-Augmented Algorithms",
    "abstract": "The field of learning-augmented algorithms has gained significant attention\nin recent years. These algorithms, using potentially inaccurate predictions,\nmust exhibit three key properties: consistency, robustness, and smoothness. In\nscenarios where distributional information about predictions is available, a\nstrong expected performance is required. Typically, the design of these\nalgorithms involves a natural tradeoff between consistency and robustness, and\nprevious works aimed to achieve Pareto-optimal tradeoffs for specific problems.\nHowever, in some settings, this comes at the expense of smoothness. This paper\ndemonstrates that certain problems involve multiple tradeoffs between\nconsistency, robustness, smoothness, and average performance.",
    "arxiv_id": "2501.12770v1"
  },
  {
    "title": "Data re-uploading in Quantum Machine Learning for time series:\n  application to traffic forecasting",
    "abstract": "Accurate traffic forecasting plays a crucial role in modern Intelligent\nTransportation Systems (ITS), as it enables real-time traffic flow management,\nreduces congestion, and improves the overall efficiency of urban transportation\nnetworks. With the rise of Quantum Machine Learning (QML), it has emerged a new\nparadigm possessing the potential to enhance predictive capabilities beyond\nwhat classical machine learning models can achieve. In the present work we\npursue a heuristic approach to explore the potential of QML, and focus on a\nspecific transport issue. In particular, as a case study we investigate a\ntraffic forecast task for a major urban area in Athens (Greece), for which we\npossess high-resolution data. In this endeavor we explore the application of\nQuantum Neural Networks (QNN), and, notably, we present the first application\nof quantum data re-uploading in the context of transport forecasting. This\ntechnique allows quantum models to better capture complex patterns, such as\ntraffic dynamics, by repeatedly encoding classical data into a quantum state.\nAside from providing a prediction model, we spend considerable effort in\ncomparing the performance of our hybrid quantum-classical neural networks with\nclassical deep learning approaches. Our results show that hybrid models achieve\ncompetitive accuracy with state-of-the-art classical methods, especially when\nthe number of qubits and re-uploading blocks is increased. While the classical\nmodels demonstrate lower computational demands, we provide evidence that\nincreasing the complexity of the quantum model improves predictive accuracy.\nThese findings indicate that QML techniques, and specifically the data\nre-uploading approach, hold promise for advancing traffic forecasting models\nand could be instrumental in addressing challenges inherent in ITS\nenvironments.",
    "arxiv_id": "2501.12776v1"
  },
  {
    "title": "Machine Learning Modeling for Multi-order Human Visual Motion Processing",
    "abstract": "Our research aims to develop machines that learn to perceive visual motion as\ndo humans. While recent advances in computer vision (CV) have enabled DNN-based\nmodels to accurately estimate optical flow in naturalistic images, a\nsignificant disparity remains between CV models and the biological visual\nsystem in both architecture and behavior. This disparity includes humans'\nability to perceive the motion of higher-order image features (second-order\nmotion), which many CV models fail to capture because of their reliance on the\nintensity conservation law. Our model architecture mimics the cortical V1-MT\nmotion processing pathway, utilizing a trainable motion energy sensor bank and\na recurrent graph network. Supervised learning employing diverse naturalistic\nvideos allows the model to replicate psychophysical and physiological findings\nabout first-order (luminance-based) motion perception. For second-order motion,\ninspired by neuroscientific findings, the model includes an additional sensing\npathway with nonlinear preprocessing before motion energy sensing, implemented\nusing a simple multilayer 3D CNN block. When exploring how the brain acquired\nthe ability to perceive second-order motion in natural environments, in which\npure second-order signals are rare, we hypothesized that second-order\nmechanisms were critical when estimating robust object motion amidst optical\nfluctuations, such as highlights on glossy surfaces. We trained our\ndual-pathway model on novel motion datasets with varying material properties of\nmoving objects. We found that training to estimate object motion from\nnon-Lambertian materials naturally endowed the model with the capacity to\nperceive second-order motion, as can humans. The resulting model effectively\naligns with biological systems while generalizing to both first- and\nsecond-order motion phenomena in natural scenes.",
    "arxiv_id": "2501.12810v1"
  },
  {
    "title": "Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek",
    "abstract": "Natural Language Processing (NLP) for lesser-resourced languages faces\npersistent challenges, including limited datasets, inherited biases from\nhigh-resource languages, and the need for domain-specific solutions. This study\naddresses these gaps for Modern Greek through three key contributions. First,\nwe evaluate the performance of open-source (Llama-70b) and closed-source\n(GPT-4o mini) large language models (LLMs) on seven core NLP tasks with dataset\navailability, revealing task-specific strengths, weaknesses, and parity in\ntheir performance. Second, we expand the scope of Greek NLP by reframing\nAuthorship Attribution as a tool to assess potential data usage by LLMs in\npre-training, with high 0-shot accuracy suggesting ethical implications for\ndata provenance. Third, we showcase a legal NLP case study, where a Summarize,\nTranslate, and Embed (STE) methodology outperforms the traditional TF-IDF\napproach for clustering \\emph{long} legal texts. Together, these contributions\nprovide a roadmap to advance NLP in lesser-resourced languages, bridging gaps\nin model evaluation, task innovation, and real-world impact.",
    "arxiv_id": "2501.12826v1"
  },
  {
    "title": "Mutation-Guided LLM-based Test Generation at Meta",
    "abstract": "This paper describes Meta's ACH system for mutation-guided LLM-based test\ngeneration. ACH generates relatively few mutants (aka simulated faults),\ncompared to traditional mutation testing. Instead, it focuses on generating\ncurrently undetected faults that are specific to an issue of concern. From\nthese currently uncaught faults, ACH generates tests that can catch them,\nthereby `killing' the mutants and consequently hardening the platform against\nregressions. We use privacy concerns to illustrate our approach, but ACH can\nharden code against {\\em any} type of regression. In total, ACH was applied to\n10,795 Android Kotlin classes in 7 software platforms deployed by Meta, from\nwhich it generated 9,095 mutants and 571 privacy-hardening test cases. ACH also\ndeploys an LLM-based equivalent mutant detection agent that achieves a\nprecision of 0.79 and a recall of 0.47 (rising to 0.95 and 0.96 with simple\npre-processing). ACH was used by Messenger and WhatsApp test-a-thons where\nengineers accepted 73% of its tests, judging 36% to privacy relevant. We\nconclude that ACH hardens code against specific concerns and that, even when\nits tests do not directly tackle the specific concern, engineers find them\nuseful for their other benefits.",
    "arxiv_id": "2501.12862v1"
  },
  {
    "title": "PreciseCam: Precise Camera Control for Text-to-Image Generation",
    "abstract": "Images as an artistic medium often rely on specific camera angles and lens\ndistortions to convey ideas or emotions; however, such precise control is\nmissing in current text-to-image models. We propose an efficient and general\nsolution that allows precise control over the camera when generating both\nphotographic and artistic images. Unlike prior methods that rely on predefined\nshots, we rely solely on four simple extrinsic and intrinsic camera parameters,\nremoving the need for pre-existing geometry, reference 3D objects, and\nmulti-view data. We also present a novel dataset with more than 57,000 images,\nalong with their text prompts and ground-truth camera parameters. Our\nevaluation shows precise camera control in text-to-image generation, surpassing\ntraditional prompt engineering approaches. Our data, model, and code are\npublicly available at https://graphics.unizar.es/projects/PreciseCam2024.",
    "arxiv_id": "2501.12910v1"
  },
  {
    "title": "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via\n  Reinforcement Learning",
    "abstract": "We introduce our first-generation reasoning models, DeepSeek-R1-Zero and\nDeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement\nlearning (RL) without supervised fine-tuning (SFT) as a preliminary step,\ndemonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero\nnaturally emerges with numerous powerful and intriguing reasoning behaviors.\nHowever, it encounters challenges such as poor readability, and language\nmixing. To address these issues and further enhance reasoning performance, we\nintroduce DeepSeek-R1, which incorporates multi-stage training and cold-start\ndata before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217\non reasoning tasks. To support the research community, we open-source\nDeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B,\n70B) distilled from DeepSeek-R1 based on Qwen and Llama.",
    "arxiv_id": "2501.12948v1"
  },
  {
    "title": "GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models",
    "abstract": "Large Language Models (LLMs) face significant deployment challenges due to\ntheir substantial resource requirements. While low-bit quantized weights can\nreduce memory usage and improve inference efficiency, current hardware lacks\nnative support for mixed-precision General Matrix Multiplication (mpGEMM),\nresulting in inefficient dequantization-based implementations. Moreover,\nuniform quantization methods often fail to capture weight distributions\nadequately, leading to performance degradation. We propose GANQ (GPU-Adaptive\nNon-Uniform Quantization), a layer-wise post-training non-uniform quantization\nframework optimized for hardware-efficient lookup table-based mpGEMM. GANQ\nachieves superior quantization performance by utilizing a training-free,\nGPU-adaptive optimization algorithm to efficiently reduce layer-wise\nquantization errors. Extensive experiments demonstrate GANQ's ability to reduce\nthe perplexity gap from the FP16 baseline compared to state-of-the-art methods\nfor both 3-bit and 4-bit quantization. Furthermore, when deployed on a single\nNVIDIA RTX 4090 GPU, GANQ's quantized models achieve up to 2.57$\\times$ speedup\nover the baseline, advancing memory and inference efficiency in LLM deployment.",
    "arxiv_id": "2501.12956v1"
  },
  {
    "title": "It's complicated. The relationship of algorithmic fairness and\n  non-discrimination regulations in the EU AI Act",
    "abstract": "What constitutes a fair decision? This question is not only difficult for\nhumans but becomes more challenging when Artificial Intelligence (AI) models\nare used. In light of discriminatory algorithmic behaviors, the EU has recently\npassed the AI Act, which mandates specific rules for AI models, incorporating\nboth traditional legal non-discrimination regulations and machine learning\nbased algorithmic fairness concepts. This paper aims to bridge these two\ndifferent concepts in the AI Act through: First a high-level introduction of\nboth concepts targeting legal and computer science-oriented scholars, and\nsecond an in-depth analysis of the AI Act's relationship between legal\nnon-discrimination regulations and algorithmic fairness. Our analysis reveals\nthree key findings: (1.), most non-discrimination regulations target only\nhigh-risk AI systems. (2.), the regulation of high-risk systems encompasses\nboth data input requirements and output monitoring, though these regulations\nare often inconsistent and raise questions of computational feasibility. (3.)\nRegulations for General Purpose AI Models, such as Large Language Models that\nare not simultaneously classified as high-risk systems, currently lack\nspecificity compared to other regulations. Based on these findings, we\nrecommend developing more specific auditing and testing methodologies for AI\nsystems. This paper aims to serve as a foundation for future interdisciplinary\ncollaboration between legal scholars and computer science-oriented machine\nlearning researchers studying discrimination in AI systems.",
    "arxiv_id": "2501.12962v1"
  },
  {
    "title": "Galois groups of polynomials and neurosymbolic networks",
    "abstract": "This paper introduces a novel approach to understanding Galois theory, one of\nthe foundational areas of algebra, through the lens of machine learning. By\nanalyzing polynomial equations with machine learning techniques, we aim to\nstreamline the process of determining solvability by radicals and explore\nbroader applications within Galois theory. This summary encapsulates the\nbackground, methodology, potential applications, and challenges of using data\nscience in Galois theory.\n  More specifically, we design a neurosymbolic network to classify Galois\ngroups and show how this is more efficient than usual neural networks. We\ndiscover some very interesting distribution of polynomials for groups not\nisomorphic to the symmetric groups and alternating groups.",
    "arxiv_id": "2501.12978v1"
  },
  {
    "title": "FlanEC: Exploring Flan-T5 for Post-ASR Error Correction",
    "abstract": "In this paper, we present an encoder-decoder model leveraging Flan-T5 for\npost-Automatic Speech Recognition (ASR) Generative Speech Error Correction\n(GenSEC), and we refer to it as FlanEC. We explore its application within the\nGenSEC framework to enhance ASR outputs by mapping n-best hypotheses into a\nsingle output sentence. By utilizing n-best lists from ASR models, we aim to\nimprove the linguistic correctness, accuracy, and grammaticality of final ASR\ntranscriptions. Specifically, we investigate whether scaling the training data\nand incorporating diverse datasets can lead to significant improvements in\npost-ASR error correction. We evaluate FlanEC using the HyPoradise dataset,\nproviding a comprehensive analysis of the model's effectiveness in this domain.\nFurthermore, we assess the proposed approach under different settings to\nevaluate model scalability and efficiency, offering valuable insights into the\npotential of instruction-tuned encoder-decoder models for this task.",
    "arxiv_id": "2501.12979v1"
  },
  {
    "title": "Paper Quality Assessment based on Individual Wisdom Metrics from Open\n  Peer Review",
    "abstract": "This study proposes a data-driven framework for enhancing the accuracy and\nefficiency of scientific peer review through an open, bottom-up process that\nestimates reviewer quality. Traditional closed peer review systems, while\nessential for quality control, are often slow, costly, and subject to biases\nthat can impede scientific progress. Here, we introduce a method that evaluates\nindividual reviewer reliability by quantifying agreement with community\nconsensus scores and applying Bayesian weighting to refine paper quality\nassessments. We analyze open peer review data from two major scientific\nconferences, and demonstrate that reviewer-specific quality scores\nsignificantly improve the reliability of paper quality estimation. Perhaps\nsurprisingly, we find that reviewer quality scores are unrelated to authorship\nquality. Our model incorporates incentive structures to recognize high-quality\nreviewers and encourage broader coverage of submitted papers, thereby\nmitigating the common \"rich-get-richer\" pitfall of social media. These findings\nsuggest that open peer review, with mechanisms for estimating and incentivizing\nreviewer quality, offers a scalable and equitable alternative for scientific\npublishing, with potential to enhance the speed, fairness, and transparency of\nthe peer review process.",
    "arxiv_id": "2501.13014v1"
  },
  {
    "title": "Optimizing Return Distributions with Distributional Dynamic Programming",
    "abstract": "We introduce distributional dynamic programming (DP) methods for optimizing\nstatistical functionals of the return distribution, with standard reinforcement\nlearning as a special case. Previous distributional DP methods could optimize\nthe same class of expected utilities as classic DP. To go beyond expected\nutilities, we combine distributional DP with stock augmentation, a technique\npreviously introduced for classic DP in the context of risk-sensitive RL, where\nthe MDP state is augmented with a statistic of the rewards obtained so far\n(since the first time step). We find that a number of recently studied problems\ncan be formulated as stock-augmented return distribution optimization, and we\nshow that we can use distributional DP to solve them. We analyze distributional\nvalue and policy iteration, with bounds and a study of what objectives these\ndistributional DP methods can or cannot optimize. We describe a number of\napplications outlining how to use distributional DP to solve different\nstock-augmented return distribution optimization problems, for example\nmaximizing conditional value-at-risk, and homeostatic regulation. To highlight\nthe practical potential of stock-augmented return distribution optimization and\ndistributional DP, we combine the core ideas of distributional value iteration\nwith the deep RL agent DQN, and empirically evaluate it for solving instances\nof the applications discussed.",
    "arxiv_id": "2501.13028v1"
  },
  {
    "title": "Autonomy-of-Experts Models",
    "abstract": "Mixture-of-Experts (MoE) models mostly use a router to assign tokens to\nspecific expert modules, activating only partial parameters and often\noutperforming dense models. We argue that the separation between the router's\ndecision-making and the experts' execution is a critical yet overlooked issue,\nleading to suboptimal expert selection and ineffective learning. To address\nthis, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which\nexperts autonomously select themselves to process inputs. AoE is based on the\ninsight that an expert is aware of its own capacity to effectively process a\ntoken, an awareness reflected in the scale of its internal activations. In AoE,\nrouters are removed; instead, experts pre-compute internal activations for\ninputs and are ranked based on their activation norms. Only the top-ranking\nexperts proceed with the forward pass, while the others abort. The overhead of\npre-computing activations is reduced through a low-rank weight factorization.\nThis self-evaluating-then-partner-comparing approach ensures improved expert\nselection and effective learning. We pre-train language models having 700M up\nto 4B parameters, demonstrating that AoE outperforms traditional MoE models\nwith comparable efficiency.",
    "arxiv_id": "2501.13074v1"
  },
  {
    "title": "Robust Representation Consistency Model via Contrastive Denoising",
    "abstract": "Robustness is essential for deep neural networks, especially in\nsecurity-sensitive applications. To this end, randomized smoothing provides\ntheoretical guarantees for certifying robustness against adversarial\nperturbations. Recently, diffusion models have been successfully employed for\nrandomized smoothing to purify noise-perturbed samples before making\npredictions with a standard classifier. While these methods excel at small\nperturbation radii, they struggle with larger perturbations and incur a\nsignificant computational overhead during inference compared to classical\nmethods. To address this, we reformulate the generative modeling task along the\ndiffusion trajectories in pixel space as a discriminative task in the latent\nspace. Specifically, we use instance discrimination to achieve consistent\nrepresentations along the trajectories by aligning temporally adjacent points.\nAfter fine-tuning based on the learned representations, our model enables\nimplicit denoising-then-classification via a single prediction, substantially\nreducing inference costs. We conduct extensive experiments on various datasets\nand achieve state-of-the-art performance with minimal computation budget during\ninference. For example, our method outperforms the certified accuracy of\ndiffusion-based methods on ImageNet across all perturbation radii by 5.3% on\naverage, with up to 11.6% at larger radii, while reducing inference costs by\n85$\\times$ on average. Codes are available at:\nhttps://github.com/jiachenlei/rRCM.",
    "arxiv_id": "2501.13094v1"
  },
  {
    "title": "Leveraging LLMs to Create a Haptic Devices' Recommendation System",
    "abstract": "Haptic technology has seen significant growth, yet a lack of awareness of\nexisting haptic device design knowledge hinders development. This paper\naddresses these limitations by leveraging advancements in Large Language Models\n(LLMs) to develop a haptic agent, focusing specifically on Grounded Force\nFeedback (GFF) devices recommendation. Our approach involves automating the\ncreation of a structured haptic device database using information from research\npapers and product specifications. This database enables the recommendation of\nrelevant GFF devices based on user queries. To ensure precise and contextually\nrelevant recommendations, the system employs a dynamic retrieval method that\ncombines both conditional and semantic searches. Benchmarking against the\nestablished UEQ and existing haptic device searching tools, the proposed haptic\nrecommendation agent ranks in the top 10\\% across all UEQ categories with mean\ndifferences favoring the agent in nearly all subscales, and maintains no\nsignificant performance bias across different user groups, showcasing superior\nusability and user satisfaction.",
    "arxiv_id": "2501.12573v1"
  },
  {
    "title": "Guaranteed Recovery of Unambiguous Clusters",
    "abstract": "Clustering is often a challenging problem because of the inherent ambiguity\nin what the \"correct\" clustering should be. Even when the number of clusters\n$K$ is known, this ambiguity often still exists, particularly when there is\nvariation in density among different clusters, and clusters have multiple\nrelatively separated regions of high density. In this paper we propose an\ninformation-theoretic characterization of when a $K$-clustering is ambiguous,\nand design an algorithm that recovers the clustering whenever it is\nunambiguous. This characterization formalizes the situation when two high\ndensity regions within a cluster are separable enough that they look more like\ntwo distinct clusters than two truly distinct clusters in the clustering. The\nalgorithm first identifies $K$ partial clusters (or \"seeds\") using a\ndensity-based approach, and then adds unclustered points to the initial $K$\npartial clusters in a greedy manner to form a complete clustering. We implement\nand test a version of the algorithm that is modified to effectively handle\noverlapping clusters, and observe that it requires little parameter selection\nand displays improved performance on many datasets compared to widely used\nalgorithms for non-convex cluster recovery.",
    "arxiv_id": "2501.13093v1"
  }
]